Description

Upload medical reports and run OCR recognition via CareMax Health API. After upload succeeds, agents MUST immediately run OCR on the same session unless the...

README (SKILL.md)

CareMax Upload & OCR

Name: CareMax OCR
Author: kittenyang

Requires caremax-auth as a sibling directory (../caremax-auth/). If missing, tell the user to install caremax-auth first (e.g. npx skills add KittenYang/caremax-skills).

Upload medical report files (PDF, JPG, PNG, HEIC) and extract structured data via AI-powered OCR.

Session-based workflow: upload → OCR → review → confirm. All operations are on a single session.

Checkpoint & resume: Every pipeline step saves progress to the database. If OCR fails mid-way (LLM timeout, worker crash, network error), retrying automatically resumes from the last checkpoint — no work is lost.

Agent default behavior (MANDATORY)

Upload and OCR are one continuous workflow. When the user uploads report files (or asks you to upload, scan, or recognize a medical checkup report, e.g. 上传/扫描/识别体检报告), after $UPLOAD returns successfully you must, in the same turn, run $OCRSTREAM <session_id> using the returned session_id. Do not end the task after upload.sh alone.
Upload-only exception: Skip immediate OCR only if the user explicitly asked to upload without recognition (e.g. 只上传 "upload only", 不要识别 "don't recognize", 别跑 OCR "don't run OCR", 只存文件 "just store the file"). If unclear, default to running OCR after upload.
Progress: Stream each SSE line to the user as it arrives (normalize / ocr / structure / …).
After step=done: Always continue to Step 3 (review). Do not auto-call confirm — wait for user approval before Step 4.

Prerequisites — Auto-Auth (MANDATORY)

APICALL="bash ../caremax-auth/scripts/api-call.sh"
UPLOAD="bash ../caremax-auth/scripts/upload.sh"
OCRSTREAM="bash ../caremax-auth/scripts/ocr-stream.sh"

If any script returns no_credentials → run bash ../caremax-auth/scripts/auth-flow.sh [base_url] (from this skill’s root, sibling of caremax-auth/).
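The re-authentication rule above could be wrapped in a small check. This is a minimal sketch: `needs_auth` is a hypothetical helper, and it assumes the `no_credentials` marker appears literally somewhere in the script output, since the exact response shape is not documented here.

```shell
# Hypothetical helper: succeeds (exit 0) when a script's output mentions no_credentials.
# Assumption: the marker appears as a literal substring of the output.
needs_auth() {
  case "$1" in
    *no_credentials*) return 0 ;;
    *)                return 1 ;;
  esac
}

# Usage sketch (commented out; it needs the caremax-auth scripts):
# out="$($UPLOAD /path/to/report.pdf)"
# if needs_auth "$out"; then
#   bash ../caremax-auth/scripts/auth-flow.sh
#   out="$($UPLOAD /path/to/report.pdf)"
# fi
```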

Step 1: Upload (creates session)

$UPLOAD /path/to/report1.jpg /path/to/report2.jpg /path/to/report.pdf

Returns:

{
  "session_id": "uuid-xxx",
  "member_id": "uuid-yyy",
  "files": [
    { "id": "file-1", "original_name": "report1.jpg" },
    { "id": "file-2", "original_name": "report2.jpg" },
    { "id": "file-3", "original_name": "report.pdf" }
  ]
}

Save the session_id.
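One way to capture the session_id for the next step is sketched below. `extract_session_id` is a hypothetical helper, not part of caremax-auth, and it assumes the field appears once in the form shown in the sample response; a real JSON parser would be more robust.

```shell
# Hypothetical helper: pull "session_id" out of the upload response.
# Assumes the field appears once as "session_id": "<value>".
extract_session_id() {
  printf '%s' "$1" | tr -d '\n' |
    sed -n 's/.*"session_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage sketch (commented out; it needs the caremax-auth scripts):
# response="$($UPLOAD /path/to/report1.jpg)"
# session_id="$(extract_session_id "$response")"
# [ -n "$session_id" ] && $OCRSTREAM "$session_id"
```

Chaining the two commands this way keeps upload and OCR in the same turn, as the default behavior requires.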

Step 2: OCR with real-time progress

$OCRSTREAM <session_id>

Outputs one JSON per line:

{"step":"resume","progress":1,"message":"Resuming from checkpoint (last completed: ocr)..."}
{"step":"normalize","progress":5,"message":"Loading file 1/3..."}
{"step":"ocr","progress":30,"message":"OCR page 2/3: report2.jpg"}
{"step":"ocr_retry","progress":35,"message":"Retrying OCR page 1/1: report1.jpg"}
{"step":"structure","progress":62,"message":"Detecting report groups..."}
{"step":"structure","progress":75,"message":"Structuring report 2/2..."}
{"step":"normalize_indicators","progress":88,"message":"Standardizing..."}
{"step":"done","progress":100,"data":{"session_id":"...","reports":[...],"resumed":true}}

Display progress to the user as each line arrives.
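A sketch of turning each progress line into a short status message follows. The helper names (`json_string`, `json_number`, `format_progress`) are hypothetical, and the sed patterns assume flat lines shaped like the samples above.

```shell
# Hypothetical helpers: read one string or numeric field from a flat JSON line.
json_string() {
  printf '%s\n' "$1" | sed -n 's/.*"'"$2"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
json_number() {
  printf '%s\n' "$1" | sed -n 's/.*"'"$2"'"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Format one progress event as "[ NN%] step: message".
format_progress() {
  step="$(json_string "$1" step)"
  progress="$(json_number "$1" progress)"
  message="$(json_string "$1" message)"
  printf '[%3s%%] %s: %s\n' "${progress:-?}" "${step:-unknown}" "$message"
}

format_progress '{"step":"ocr","progress":30,"message":"OCR page 2/3: report2.jpg"}'
# prints: [ 30%] ocr: OCR page 2/3: report2.jpg

# Usage sketch (commented out; it needs the caremax-auth scripts):
# $OCRSTREAM "$session_id" | while IFS= read -r line; do format_progress "$line"; done
```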

Key progress events

| step | meaning |
| --- | --- |
| `resume` | Pipeline is resuming from a saved checkpoint (not starting from zero) |
| `info` | Informational message (e.g. which step was resumed from) |
| `normalize` | Loading and preprocessing files |
| `ocr` | OCR text extraction per page |
| `ocr_retry` | Retrying previously failed pages only |
| `structure` | AI analyzing and grouping reports |
| `normalize_indicators` | Standardizing indicator names |
| `done` | Complete: `data` field contains the full results |
| `error` | Pipeline failed: check `message` for details |

If step=resume appears, tell the user: "正在从上次的进度继续处理（不需要重新开始）" ("Resuming from the previous progress; no need to start over").

Error responses from `$OCRSTREAM`

| code | meaning | action |
| --- | --- | --- |
| `processing_in_progress` | Another OCR run is still active | Wait and retry, or poll `/status` |
| `ocr_limit_exceeded` | Free OCR quota exhausted | Tell user to upgrade |
| (no code) | Pipeline error (LLM timeout etc.) | Retry; will auto-resume from checkpoint |
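The error table maps naturally onto a `case` statement. In this sketch, `ocr_error_action` and the action labels it prints are hypothetical names chosen for illustration, and the fallback branch for unlisted codes is an assumption, since the table does not say how to treat them.

```shell
# Hypothetical helper: map an error code from $OCRSTREAM to the next action.
ocr_error_action() {
  case "$1" in
    processing_in_progress) echo "poll_status" ;;        # another run is active
    ocr_limit_exceeded)     echo "tell_user_upgrade" ;;  # free quota exhausted
    "")                     echo "retry_ocr" ;;          # no code: safe to retry
    *)                      echo "report_to_user" ;;     # assumption: unlisted code
  esac
}

ocr_error_action "ocr_limit_exceeded"
# prints: tell_user_upgrade
```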

Step 2b: Poll status (when SSE disconnects)

If the SSE stream disconnects (network timeout, terminal closed), use the status endpoint to check progress:

$APICALL GET "/api/skill/sessions/<session_id>/status"

Returns:

{
  "session_id": "uuid",
  "status": "processing",
  "pipeline": {
    "completedStep": "ocr",
    "pageCount": 5,
    "ocrCompleted": 4,
    "ocrFailed": 1,
    "reportCount": 0,
    "errors": [{"step":"ocr","pageIndex":2,"message":"PaddleOCR timeout"}]
  },
  "error": null,
  "is_stale": false
}

Field guide:

status = processing + is_stale = false → OCR is still running normally
status = processing + is_stale = true → Worker crashed/timed out, safe to retry OCR
status = awaiting_confirm → OCR completed! Fetch session detail for results
status = uploading + error present → Last OCR attempt failed, retry will resume from checkpoint
pipeline.completedStep → How far the pipeline got (normalize → ocr → structure → done)
pipeline.ocrFailed → Number of pages that failed OCR (will be retried on next attempt)

Polling workflow:

1. Call $OCRSTREAM → SSE disconnects mid-way
2. Poll GET /api/skill/sessions/<session_id>/status every 5-10 seconds
3. When status = "awaiting_confirm" → fetch full results with GET /api/skill/sessions/<session_id>
4. If status = "uploading" (failed) → retry with $OCRSTREAM (auto-resumes)
5. If is_stale = true → retry with $OCRSTREAM (auto-resumes from checkpoint)
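The polling decisions above can be sketched as a small dispatcher. `status_field` and `next_action` are hypothetical helpers with made-up action labels, and the sed pattern only handles flat string or boolean fields like the ones in the sample status response.

```shell
# Hypothetical helper: read a simple string/boolean field from the status JSON.
status_field() {
  printf '%s' "$1" | tr -d '\n' |
    sed -n 's/.*"'"$2"'"[[:space:]]*:[[:space:]]*"\{0,1\}\([A-Za-z_]*\)"\{0,1\}.*/\1/p'
}

# Decide what to do next. $1 = status, $2 = is_stale ("true" or "false").
next_action() {
  case "$1" in
    awaiting_confirm) echo "fetch_results" ;;  # OCR completed
    uploading)        echo "retry_ocr" ;;      # last attempt failed; retry resumes
    processing)
      if [ "$2" = "true" ]; then
        echo "retry_ocr"                       # worker crashed or timed out
      else
        echo "keep_polling"                    # still running normally
      fi ;;
    *)                echo "unknown" ;;        # assumption: status not listed above
  esac
}

# Usage sketch (commented out; it needs the caremax-auth scripts):
# resp="$($APICALL GET "/api/skill/sessions/$session_id/status")"
# next_action "$(status_field "$resp" status)" "$(status_field "$resp" is_stale)"
```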

Step 3: Review results (MANDATORY)

Parse the step=done data. Show formatted summary. Do NOT auto-confirm.

Each report has a reportType field: lab, genetic, imaging, pathology, or other.

Lab reports (reportType = "lab")

Show indicators table:

📋 Report 1: [lab] Urine biochemistry (No.: 114431194)
   Date: 2025-02-05  Doctor: 俞海瑾
   Indicators: 12 (3 abnormal)
   ┌──────────────────────┬────────┬──────────┬────────────┬──────┐
   │ Indicator            │ Result │ Unit     │ Ref. range │ Flag │
   ├──────────────────────┼────────┼──────────┼────────────┼──────┤
   │ 24H urine sodium     │ 130.0  │ mmol/24h │ 137-257    │  ⬇   │
   └──────────────────────┴────────┴──────────┴────────────┴──────┘

Non-lab reports (reportType = "genetic" / "imaging" / etc.)

Show summary + sections:

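📋 Report 1: [genetic] Genetic test report
   Date: 2025-09-12  Testing institution: 南京申友医学检验所
   Summary: 18-item cardiovascular genetic test...hypertension and coronary heart disease risk: average...
   Sections: 18 sections
     [gene_variant] Hypertension — risk: normal
     [gene_variant] Coronary heart disease — risk: average
     [medication] ACE inhibitor antihypertensives — normal metabolizer
     ...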

Supported file types

Images (JPG/PNG/HEIC): PaddleOCR → structure
PDF (any size): Azure Mistral Document AI page-split → structure
- Large PDFs (e.g. 23-page gene report, 9.6MB) are fully supported

Step 4: Confirm and save

After user confirms:

$APICALL POST "/api/skill/sessions/<session_id>/confirm" '{"reports":[<reports from step 2>]}'

Returns: {"success":true,"message":"2 report(s) saved","recordIds":[...]}
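A sketch of assembling the confirm request body follows. `build_confirm_payload` is a hypothetical helper, and it assumes the endpoint accepts the `reports` array from the step=done data unchanged.

```shell
# Hypothetical helper: wrap a JSON array of reports in the confirm request body.
# Assumption: the reports array from step=done is passed through as-is.
build_confirm_payload() {
  printf '{"reports":%s}' "$1"
}

# Usage sketch (commented out; run only after the user approves the review):
# $APICALL POST "/api/skill/sessions/$session_id/confirm" "$(build_confirm_payload "$reports")"
```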

Resuming incomplete sessions

When the user asks to continue/resume a previous upload, or when checking for unfinished work:

Step A: Find pending sessions

# List sessions that need OCR (uploaded but not processed)
$APICALL GET "/api/skill/sessions?status=uploading"

# List sessions stuck in processing (user exited mid-OCR)
$APICALL GET "/api/skill/sessions?status=processing"

# List sessions with OCR done but not yet confirmed
$APICALL GET "/api/skill/sessions?status=awaiting_confirm"

Show a summary of pending sessions to the user (file names, dates, status).

Step B: Resume based on status

uploading: Start OCR directly → go to Step 2 ($OCRSTREAM <session_id>)
- If there's a saved checkpoint (previous failed attempt), OCR auto-resumes from it
processing: Check with status endpoint first:
```
$APICALL GET "/api/skill/sessions/<session_id>/status"
```
- is_stale = false → still running, wait or poll
- is_stale = true → worker died, safe to retry: $OCRSTREAM <session_id> (auto-resumes from checkpoint)
awaiting_confirm: Get session detail → show results → go to Step 3 (review), then Step 4 (confirm) once the user approves

# Get full detail of a pending session (includes OCR results if awaiting_confirm)
$APICALL GET "/api/skill/sessions/<session_id>"

If the session is awaiting_confirm, the response includes ocr_result with the previously parsed reports. Display them for review (Step 3) and proceed to Step 4 (confirm) only after the user approves.

Resume-aware response handling

When $OCRSTREAM outputs step=done:

resumed = true in the data → tell user: "已从上次的进度恢复，OCR 结果已就绪" ("Resumed from the previous progress; OCR results are ready")
resumed = false (or absent) → normal fresh run

When $OCRSTREAM outputs step=error:

code = processing_in_progress → tell user OCR is still running, poll /status instead
code = ocr_limit_exceeded → tell user to upgrade
No code → LLM/network error, safe to retry (will auto-resume from checkpoint)

Step C: Delete individual reports or stale sessions

Delete a single report (does NOT affect other reports in the same session):

$APICALL DELETE "/api/skill/sessions/<session_id>/records/<record_id>"

Delete an entire session (cascade deletes ALL files + reports):

$APICALL DELETE "/api/skill/sessions/<session_id>"

Other session operations

# List all sessions (all statuses)
$APICALL GET /api/skill/sessions

# List sessions filtered by status: uploading | processing | awaiting_confirm | completed
$APICALL GET "/api/skill/sessions?status=<status>"

# Get session detail (includes OCR results if awaiting_confirm, saved reports if completed)
$APICALL GET "/api/skill/sessions/<session_id>"

# Poll OCR progress (lightweight, use when SSE disconnects)
$APICALL GET "/api/skill/sessions/<session_id>/status"

# Delete single report (keeps session and other reports intact)
$APICALL DELETE "/api/skill/sessions/<session_id>/records/<record_id>"

# Delete entire session (undo everything: files + reports)
$APICALL DELETE "/api/skill/sessions/<session_id>"

Usage Guidance

Before installing or invoking this skill:

1. Understand that, by default, it sends medical reports (PHI) to external CareMax APIs for processing and runs OCR immediately after upload unless the user explicitly asks to upload only. Confirm that you have user consent and comply with privacy rules.
2. The skill has no bundled code but requires a sibling directory '../caremax-auth/' and will execute its scripts. Obtain that component from a trusted source and inspect the scripts (upload.sh, ocr-stream.sh, auth-flow.sh) to see where data and credentials are sent and stored.
3. The skill does not declare required credentials; expect the auth scripts to create or use tokens. Verify how and where tokens are stored and whether they could be exfiltrated.
4. If you cannot review the caremax-auth scripts, treat this skill as risky and do not use it with real patient data.
5. If you want to proceed, ask the publisher for the official caremax-auth repo link, a list of the endpoints the scripts call, and an explanation of how credentials and session data are stored and protected.

Capability Analysis

Type: OpenClaw Skill
Name: caremax-ocr
Version: 1.0.0

The skill implements a medical report OCR workflow that relies on executing bash scripts from a sibling directory ('../caremax-auth/'), which is a high-risk architectural pattern as it delegates execution to external, unverified code. The SKILL.md file uses highly prescriptive 'MANDATORY' instructions to control agent behavior, forcing a continuous workflow (upload followed by immediate OCR) and instructing the agent to ignore potential user stopping points. While these features support the stated functionality, the combination of sensitive medical data handling, external script dependencies, and rigid prompt-based control over the agent's logic warrants a suspicious classification.

Capability Assessment

ℹ Purpose & Capability

Name/description (CareMax OCR) align with instructions to upload files and call OCR. However, the skill requires a sibling component ('../caremax-auth/') for all API/auth operations but does not declare that dependency in the registry metadata or require any credentials — an implicit external dependency that should be explicit.

⚠ Instruction Scope

The SKILL.md instructs the agent to execute bash scripts from a sibling directory (upload.sh, ocr-stream.sh, auth-flow.sh) and to automatically run OCR immediately after upload unless the user explicitly forbids it. That means the agent will, by default, transmit/process sensitive medical data to remote services and execute external code it cannot vet from this skill alone. It also instructs polling and restart behavior and to stream SSE lines directly to users. These behaviors expand scope beyond a simple local helper and require careful review of the referenced scripts and data flows.

✓ Install Mechanism

No install spec and no code files are included in this package (instruction-only). That minimizes on-disk installation risk for this skill itself, but it explicitly depends on external scripts in '../caremax-auth/' which will be executed at runtime.

⚠ Credentials

The skill declares no environment variables or primary credential, yet its runtime behavior depends entirely on external auth scripts that will handle API credentials. The registry metadata does not list or justify any credentials; this hidden credential handling (and cross-directory script execution) is a proportionality and transparency concern, especially given the sensitivity of medical data.

ℹ Persistence & Privilege

always is false and there is no install writing files from this skill. However, the skill's recommended auth-flow may create/modify credentials or token files via the sibling caremax-auth scripts. Because the skill runs external scripts, it could end up storing tokens or altering auth state outside its own directory—review the caremax-auth scripts before use.

Version History

v1.0.0

Initial publish to ClawHub

Metadata

Slug caremax-ocr

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is CareMax OCR?

Upload medical reports and run OCR recognition via CareMax Health API. After upload succeeds, agents MUST immediately run OCR on the same session unless the... It is an AI Agent Skill for Claude Code / OpenClaw, with 93 downloads so far.

How do I install CareMax OCR?

Run "/install caremax-ocr" in the OpenClaw or Claude Code chat to install it. The skill also requires caremax-auth as a sibling directory (../caremax-auth/), so install that first if it is missing.

Is CareMax OCR free?

Yes, CareMax OCR is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does CareMax OCR support?

CareMax OCR is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created CareMax OCR?

It is built and maintained by Qitao Yang (@kittenyang); the current version is v1.0.0.
