CareMax OCR
CareMax Upload & OCR
Requires caremax-auth as a sibling directory (../caremax-auth/). If it is missing, tell the user to install caremax-auth first (e.g. npx skills add KittenYang/caremax-skills).
Upload medical report files (PDF, JPG, PNG, HEIC) and extract structured data via AI-powered OCR.
Session-based workflow: upload → OCR → review → confirm. All operations are on a single session.
Checkpoint & resume: Every pipeline step saves progress to the database. If OCR fails mid-way (LLM timeout, worker crash, network error), retrying automatically resumes from the last checkpoint — no work is lost.
Agent default behavior (MANDATORY)
- Upload and OCR are one continuous workflow. When the user uploads report files (or asks you to upload, scan (扫描), or recognize medical checkup reports (识别体检报告), etc.), after $UPLOAD returns successfully you must, in the same turn, run $OCRSTREAM <session_id> using the returned session_id. Do not end the task after upload.sh alone.
- Upload-only exception: Skip immediate OCR only if the user explicitly asked to upload without recognition (e.g. "upload only" (只上传), "don't recognize" (不要识别), "don't run OCR" (别跑 OCR), "just store the files" (只存文件)). If unclear, default to running OCR after upload.
- Progress: Stream each SSE line to the user as it arrives (normalize / ocr / structure / …).
- After step=done: Always continue to Step 3 (review). Do not auto-call confirm; wait for user approval before Step 4.
Prerequisites — Auto-Auth (MANDATORY)
APICALL="bash ../caremax-auth/scripts/api-call.sh"
UPLOAD="bash ../caremax-auth/scripts/upload.sh"
OCRSTREAM="bash ../caremax-auth/scripts/ocr-stream.sh"
If any script returns no_credentials → run bash ../caremax-auth/scripts/auth-flow.sh [base_url] (from this skill’s root, sibling of caremax-auth/).
Step 1: Upload (creates session)
$UPLOAD /path/to/report1.jpg /path/to/report2.jpg /path/to/report.pdf
Returns:
{
"session_id": "uuid-xxx",
"member_id": "uuid-yyy",
"files": [
{ "id": "file-1", "original_name": "report1.jpg" },
{ "id": "file-2", "original_name": "report2.jpg" },
{ "id": "file-3", "original_name": "report.pdf" }
]
}
Save the session_id.
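A minimal sketch of capturing the session_id from the upload response, so it can be passed straight to $OCRSTREAM. The helper name extract_session_id is hypothetical and not part of caremax-auth; it assumes the response has the shape shown above and that the value contains no escaped quotes.

```shell
# Hypothetical helper (not part of caremax-auth): pull the "session_id"
# string out of the upload response. Assumes the value has no escaped quotes.
extract_session_id() {
  printf '%s' "$1" | tr -d '\n' |
    sed -n 's/.*"session_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Sample response copied from the example above (files list shortened).
# In a real run this would be: response=$($UPLOAD /path/to/report1.jpg)
response='{ "session_id": "uuid-xxx", "member_id": "uuid-yyy", "files": [] }'

session_id=$(extract_session_id "$response")
echo "session_id=$session_id"
```

If jq happens to be installed, `jq -r .session_id` is a more robust way to read the same field.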
Step 2: OCR with real-time progress
$OCRSTREAM <session_id>
Outputs one JSON per line:
{"step":"resume","progress":1,"message":"Resuming from checkpoint (last completed: ocr)..."}
{"step":"normalize","progress":5,"message":"Loading file 1/3..."}
{"step":"ocr","progress":30,"message":"OCR page 2/3: report2.jpg"}
{"step":"ocr_retry","progress":35,"message":"Retrying OCR page 1/1: report1.jpg"}
{"step":"structure","progress":62,"message":"Detecting report groups..."}
{"step":"structure","progress":75,"message":"Structuring report 2/2..."}
{"step":"normalize_indicators","progress":88,"message":"Standardizing..."}
{"step":"done","progress":100,"data":{"session_id":"...","reports":[...],"resumed":true}}
Display progress to the user as each line arrives.
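One way to turn each streamed line into a short progress message is sketched below. The helpers json_str, json_num and format_progress are hypothetical names. The parsing assumes the compact one-line layout shown in the samples above (no spaces around the colon, no escaped quotes in values), and the text printed for step=done is a placeholder.

```shell
# Hypothetical helpers: read the first occurrence of a field from a
# compact one-line JSON event such as the samples above.
json_str() (
  rest=${1#*\"$2\":\"}
  if [ "$rest" != "$1" ]; then printf '%s\n' "${rest%%\"*}"; fi
)
json_num() (
  rest=${1#*\"$2\":}
  if [ "$rest" != "$1" ]; then printf '%s\n' "${rest%%[!0-9]*}"; fi
)

# Format one event as "[NN%] step: message".
format_progress() (
  step=$(json_str "$1" step)
  pct=$(json_num "$1" progress)
  msg=$(json_str "$1" message)
  # The done event carries "data" instead of "message"; this text is a placeholder.
  case "$step" in done) msg="results ready for review" ;; esac
  printf '[%s%%] %s: %s\n' "$pct" "$step" "$msg"
)

# Demo with two sample lines; a real run would pipe from:
#   $OCRSTREAM "$session_id" | while IFS= read -r line; do ...; done
printf '%s\n' \
  '{"step":"normalize","progress":5,"message":"Loading file 1/3..."}' \
  '{"step":"ocr","progress":30,"message":"OCR page 2/3: report2.jpg"}' |
while IFS= read -r line; do
  format_progress "$line"
done
```

Reading line by line with `while read` keeps the output incremental, which matches the requirement to show each event as it arrives.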
Key progress events
| step | meaning |
|---|---|
| resume | Pipeline is resuming from a saved checkpoint (not starting from zero) |
| info | Informational message (e.g. which step was resumed from) |
| normalize | Loading and preprocessing files |
| ocr | OCR text extraction per page |
| ocr_retry | Retrying previously failed pages only |
| structure | AI analyzing and grouping reports |
| normalize_indicators | Standardizing indicator names |
| done | Complete; the data field contains the full results |
| error | Pipeline failed; check message for details |
If step=resume appears, tell the user: "正在从上次的进度继续处理(不需要重新开始)" ("Continuing from the previous progress; no need to start over").
Error responses from $OCRSTREAM
| code | meaning | action |
|---|---|---|
| processing_in_progress | Another OCR run is still active | Wait and retry, or poll /status |
| ocr_limit_exceeded | Free OCR quota exhausted | Tell user to upgrade |
| (no code) | Pipeline error (LLM timeout etc.) | Retry (will auto-resume from checkpoint) |
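The table above can be expressed as a small dispatch function. This is only a sketch: ocr_error_action and the action labels it prints are hypothetical names, and the fallback branch for codes not listed in the table is an assumption.

```shell
# Hypothetical helper: map an error code from $OCRSTREAM to the action
# listed in the table above. Pass an empty string when no code is present.
ocr_error_action() {
  case "$1" in
    processing_in_progress) echo "wait_and_poll_status" ;;
    ocr_limit_exceeded)     echo "tell_user_to_upgrade" ;;
    "")                     echo "retry_ocr" ;;   # auto-resumes from checkpoint
    *)                      echo "report_unknown_error" ;;   # assumption: not in the table
  esac
}

ocr_error_action "processing_in_progress"
ocr_error_action ""
```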
Step 2b: Poll status (when SSE disconnects)
If the SSE stream disconnects (network timeout, terminal closed), use the status endpoint to check progress:
$APICALL GET "/api/skill/sessions/<session_id>/status"
Returns:
{
"session_id": "uuid",
"status": "processing",
"pipeline": {
"completedStep": "ocr",
"pageCount": 5,
"ocrCompleted": 4,
"ocrFailed": 1,
"reportCount": 0,
"errors": [{"step":"ocr","pageIndex":2,"message":"PaddleOCR timeout"}]
},
"error": null,
"is_stale": false
}
Field guide:
- status = processing + is_stale = false → OCR is still running normally
- status = processing + is_stale = true → Worker crashed/timed out, safe to retry OCR
- status = awaiting_confirm → OCR completed! Fetch session detail for results
- status = uploading + error present → Last OCR attempt failed, retry will resume from checkpoint
- pipeline.completedStep → How far the pipeline got (normalize → ocr → structure → done)
- pipeline.ocrFailed → Number of pages that failed OCR (will be retried on next attempt)
Polling workflow:
1. Call $OCRSTREAM → SSE disconnects mid-way
2. Poll GET /sessions/<id>/status every 5-10 seconds
3. When status = "awaiting_confirm" → fetch full results with GET /sessions/<id>
4. If status = "uploading" (failed) → retry with $OCRSTREAM (auto-resumes)
5. If is_stale = true → retry with $OCRSTREAM (auto-resumes from checkpoint)
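The polling workflow above can be sketched as a decision function plus a loop. The names next_poll_action and poll_session are hypothetical, the "stop" branch for any other status is an assumption, and the loop assumes jq is installed and APICALL is set as in Prerequisites. The loop is defined here but not invoked.

```shell
# Hypothetical helper: map (status, is_stale) from the status endpoint
# to the next step in the polling workflow.
next_poll_action() {
  case "$1:$2" in
    awaiting_confirm:*) echo "fetch_results" ;;
    uploading:*)        echo "retry_ocr" ;;      # last attempt failed
    processing:true)    echo "retry_ocr" ;;      # stale worker
    processing:*)       echo "keep_polling" ;;
    *)                  echo "stop" ;;           # assumption: nothing left to do
  esac
}

# Sketch of the loop (not invoked here). Assumes jq is available and
# $APICALL is defined as in Prerequisites.
poll_session() {
  while :; do
    body=$($APICALL GET "/api/skill/sessions/$1/status") || return 1
    action=$(next_poll_action \
      "$(printf '%s' "$body" | jq -r '.status')" \
      "$(printf '%s' "$body" | jq -r '.is_stale')")
    [ "$action" = "keep_polling" ] || { echo "$action"; return 0; }
    sleep 5   # the workflow above suggests 5-10 seconds
  done
}

next_poll_action processing false
next_poll_action processing true
```

Keeping the decision separate from the loop makes the mapping easy to check without calling the API.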
Step 3: Review results (MANDATORY)
Parse the step=done data. Show formatted summary. Do NOT auto-confirm.
Each report has a reportType field: lab, genetic, imaging, pathology, or other.
Lab reports (reportType = "lab")
Show indicators table:
📋 Report 1: [lab] Urine biochemistry (No.: 114431194)
Date: 2025-02-05 Doctor: 俞海瑾
Indicators: 12 (3 abnormal)
┌────────────────────┬────────┬──────────┬────────────┬──────────┐
│ Indicator          │ Result │ Unit     │ Ref. range │ Abnormal │
├────────────────────┼────────┼──────────┼────────────┼──────────┤
│ 24h urinary sodium │ 130.0  │ mmol/24h │ 137-257    │ ⬇        │
└────────────────────┴────────┴──────────┴────────────┴──────────┘
Non-lab reports (reportType = "genetic" / "imaging" / etc.)
Show summary + sections:
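📋 Report 1: [genetic] Genetic test report
Date: 2025-09-12 Testing institution: 南京申友医学检验所
Summary: 18-item cardiovascular genetic test...hypertension and coronary heart disease risk average...
Sections: 18 sections
[gene_variant] Hypertension - risk: normal
[gene_variant] Coronary heart disease - risk: average
[medication] ACEI antihypertensives - normal metabolizer
...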
Supported file types
- Images (JPG/PNG/HEIC): PaddleOCR → structure
- PDF (any size): Azure Mistral Document AI page-split → structure
- Large PDFs (e.g. 23-page gene report, 9.6MB) are fully supported
Step 4: Confirm and save
After user confirms:
$APICALL POST "/api/skill/sessions/<session_id>/confirm" '{"reports":[<reports from step 2>]}'
Returns: {"success":true,"message":"2 report(s) saved","recordIds":[...]}
Resuming incomplete sessions
When the user asks to continue/resume a previous upload, or when checking for unfinished work:
Step A: Find pending sessions
# List sessions that need OCR (uploaded but not processed)
$APICALL GET "/api/skill/sessions?status=uploading"
# List sessions stuck in processing (user exited mid-OCR)
$APICALL GET "/api/skill/sessions?status=processing"
# List sessions with OCR done but not yet confirmed
$APICALL GET "/api/skill/sessions?status=awaiting_confirm"
Show a summary of pending sessions to the user (file names, dates, status).
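The three lookups above can be wrapped in one loop, as in the sketch below. The function name list_pending_sessions is hypothetical. APICALL defaults to the value from Prerequisites, and the function is defined without being invoked so that nothing is sent until it is called.

```shell
# Default to the wrapper from Prerequisites unless APICALL is already set.
: "${APICALL:=bash ../caremax-auth/scripts/api-call.sh}"

# Hypothetical helper: query each status that can hold unfinished work
# and print the raw responses under a heading per status.
list_pending_sessions() {
  for status in uploading processing awaiting_confirm; do
    echo "== $status =="
    $APICALL GET "/api/skill/sessions?status=$status"
  done
}

# Usage (performs three API calls):
#   list_pending_sessions
```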
Step B: Resume based on status
- uploading: Start OCR directly → go to Step 2 ($OCRSTREAM <session_id>)
  - If there's a saved checkpoint (previous failed attempt), OCR auto-resumes from it
- processing: Check with the status endpoint first: $APICALL GET "/api/skill/sessions/<session_id>/status"
  - is_stale = false → still running, wait or poll
  - is_stale = true → worker died, safe to retry: $OCRSTREAM <session_id> (auto-resumes from checkpoint)
- awaiting_confirm: Get session detail → show results → go to Step 3 (review & confirm)
# Get full detail of a pending session (includes OCR results if awaiting_confirm)
$APICALL GET "/api/skill/sessions/<session_id>"
If the session is awaiting_confirm, the response includes ocr_result with the previously parsed reports. Display them for review (Step 3) and, once the user approves, confirm (Step 4).
Resume-aware response handling
When $OCRSTREAM outputs step=done:
- resumed = true in the data → tell user: "已从上次的进度恢复,OCR 结果已就绪" ("Resumed from the previous progress; OCR results are ready")
- resumed = false (or absent) → normal fresh run
When $OCRSTREAM outputs step=error:
- code = processing_in_progress → tell user OCR is still running, poll /status instead
- code = ocr_limit_exceeded → tell user to upgrade
- No code → LLM/network error, safe to retry (will auto-resume from checkpoint)
Step C: Delete individual reports or stale sessions
Delete a single report (does NOT affect other reports in the same session):
$APICALL DELETE "/api/skill/sessions/<session_id>/records/<record_id>"
Delete an entire session (cascade deletes ALL files + reports):
$APICALL DELETE "/api/skill/sessions/<session_id>"
Other session operations
# List all sessions (all statuses)
$APICALL GET /api/skill/sessions
# List sessions filtered by status: uploading | processing | awaiting_confirm | completed
$APICALL GET "/api/skill/sessions?status=<status>"
# Get session detail (includes OCR results if awaiting_confirm, saved reports if completed)
$APICALL GET "/api/skill/sessions/<session_id>"
# Poll OCR progress (lightweight, use when SSE disconnects)
$APICALL GET "/api/skill/sessions/<session_id>/status"
# Delete single report (keeps session and other reports intact)
$APICALL DELETE "/api/skill/sessions/<session_id>/records/<record_id>"
# Delete entire session (undo everything: files + reports)
$APICALL DELETE "/api/skill/sessions/<session_id>"
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install caremax-ocr
- After installation, invoke the skill by name or use /caremax-ocr
- Provide required inputs per the skill's parameter spec and get structured output
What is CareMax OCR?
Upload medical reports and run OCR recognition via CareMax Health API. After upload succeeds, agents MUST immediately run OCR on the same session unless the... It is an AI Agent Skill for Claude Code / OpenClaw, with 93 downloads so far.
How do I install CareMax OCR?
Run "/install caremax-ocr" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is CareMax OCR free?
Yes, CareMax OCR is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does CareMax OCR support?
CareMax OCR is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created CareMax OCR?
It is built and maintained by Qitao Yang (@kittenyang); the current version is v1.0.0.