← 返回 Skills 市场
tobewin

china-doc-ocr

作者 ToBeWin · GitHub ↗ · v1.2.0 · MIT-0
cross-platform ✓ 安全检测通过
1204
总下载
0
收藏
5
当前安装
4
版本数
在 OpenClaw 中安装
/install china-doc-ocr
功能描述
智能文档OCR识别与结构化提取。Use when the user has a complex document, PDF, scanned image, photo, invoice, receipt, ID card, table, or chart that needs to be recognized a...
使用说明 (SKILL.md)

智能文档 OCR China Doc OCR

识别并提取复杂文档内容:PDF、图片、扫描件、发票、表格、证件等。 使用硅基流动 DeepSeek-OCR / PaddleOCR-VL,国内直连,无需翻墙。

模型选择与参数说明 → references/models.md 各场景提示词模板 → references/prompts.md

触发时机

  • "帮我识别这个PDF/图片里的内容"
  • "把这张发票/收据的信息提取出来"
  • "将这份扫描合同转成可编辑文字"
  • "这个表格里的数据帮我提取一下"
  • "帮我把这张截图的文字识别出来"
  • "这份报告转成 Markdown 格式"
  • "识别这张身份证/营业执照的信息"

模型选择策略(优先OCR)

OCR优先级:
1. PaddleOCR-VL-1.5 (免费、快速、专业OCR)
2. DeepSeek-OCR (免费、效果好)
3. Qwen2.5-VL-72B (视觉语言模型,OCR效果一般但可补充)

默认使用 PaddleOCR-VL-1.5
如果识别效果不好,降级到 DeepSeek-OCR
如果仍然不好,降级到 Qwen2.5-VL-72B

Step 0:环境检查

# 检查 API Key
if [ -z "$SILICONFLOW_API_KEY" ]; then
  echo "缺少 SILICONFLOW_API_KEY"
  echo "配置方法:"
  echo "  1. 访问 cloud.siliconflow.cn 注册(国内直连)"
  echo "  2. 进入「API密钥」页面创建 Key"
  echo "  3. export SILICONFLOW_API_KEY='sk-xxxxxxxx'"
  exit 1
fi

Step 1:识别内容类型,选择处理模式

用户提供文件路径或 URL → 判断类型:

文件扩展名/用户描述 → 处理模式:

.pdf                    → PDF 模式
.jpg/.jpeg/.png/.webp   → 图片模式
.bmp/.tiff/.gif         → 图片模式(先转换格式)
URL(http/https开头)   → URL 直接模式
用户粘贴了 base64       → 直接使用

用户意图 → 选择 Prompt 模式:

"转成文字/提取文字"     → 通用OCR
"转成Markdown/保留格式" → 文档转Markdown
"提取表格/表格数据"     → 图表解析
"发票/收据/单据"        → 发票识别
"身份证/证件/执照"      → 证件识别
"图表/图形/柱状图"      → 图表解析
未指定                  → 默认文档转Markdown

Step 2:图片 OCR

本地图片文件

python3 scripts/ocr.py \
  --image "/path/to/image.jpg" \
  --prompt "Convert the document to markdown." \
  --model paddleocr

图片 URL

python3 scripts/ocr.py \
  --url "https://example.com/document.jpg" \
  --prompt "Convert the document to markdown." \
  --model deepseek

指定模型

# 使用 PaddleOCR(默认,推荐)
python3 scripts/ocr.py --image photo.jpg --model paddleocr

# 使用 DeepSeek-OCR
python3 scripts/ocr.py --image photo.jpg --model deepseek

# 使用 Qwen2.5-VL
python3 scripts/ocr.py --image photo.jpg --model qwen

Step 3:PDF OCR

单页或少页 PDF

python3 scripts/ocr.py \
  --pdf "/path/to/document.pdf" \
  --prompt "Convert the document to markdown." \
  --model deepseek

多页 PDF

多页 PDF 需要分页处理。使用 Python 脚本:

  1. 使用 pypdf 分页
  2. 对每页分别调用 OCR
  3. 合并结果

Step 4:格式化输出

识别完成后根据用户需求输出:

文档转 Markdown(保留结构)

直接输出 Markdown 内容,保留:
  - 标题层级(# ## ###)
  - 列表(- * 1.)
  - 表格(| 列1 | 列2 |)
  - 代码块(```)
  - 加粗、斜体等格式

发票/证件识别(结构化输出)

发票识别结果
━━━━━━━━━━━━━━━━━━━━
发票类型:增值税专用发票
发票号码:XXXXXXXXXXXXXXXX
开票日期:2026年03月21日
购买方:[公司名称]
销售方:[公司名称]
商品/服务:[明细]
不含税金额:¥X,XXX.XX
税率:13%
税额:¥XXX.XX
价税合计:¥X,XXX.XX

表格数据(CSV 友好格式)

识别结果同时输出:
1. Markdown 表格(可读)
2. 询问用户是否需要 CSV 格式(方便导入 Excel)

输出文件保存

识别结果保存到工作区,长期保留。


错误处理

文件不存在           → 提示用户确认路径
文件过大(>10MB)    → 建议压缩或分页处理
图片分辨率过低       → 提示识别效果可能较差,建议重新拍摄
PDF 加密            → 提示需要先解密
识别结果为空         → 可能是纯图片型PDF,尝试截图后重新识别
401 错误            → API Key 失效,重新获取
429 错误            → 请求频率超限,等待后重试

注意事项

  • 图片最小 56×56,最大 3584×3584 像素,超出会自动压缩
  • PDF 支持 base64 编码输入
  • 多页 PDF 需要安装 pypdf(用户需手动安装)
  • detail=high 时按实际像素计费,detail=low 统一约256 token
  • 发票/证件等隐私文件处理后请及时删除工作区临时文件
安全使用建议
This skill is coherent for cloud OCR: it needs python3 and a SILICONFLOW_API_KEY and will upload files (as base64 data URLs) to https://api.siliconflow.cn for processing. Before installing: 1) Confirm you trust SiliconFlow (check their cloud.siliconflow.cn site, privacy/retention and billing policies). 2) Avoid sending highly sensitive PII or confidential documents unless you accept remote processing and storage risks. 3) Note SKILL.md mentions long-term workspace retention but the script only prints results — verify how your agent/platform persists outputs and delete any temporary files if needed. 4) For large/multi-page PDFs follow the SKILL.md advice to split pages (pypdf) to avoid size or timeout issues.
功能分析
Type: OpenClaw Skill Name: china-doc-ocr Version: 1.2.0 The skill bundle is a legitimate OCR tool designed to interface with the SiliconFlow API for document recognition. The core logic in `scripts/ocr.py` uses standard Python libraries to process local images/PDFs or remote URLs and send them to a verified API endpoint (api.siliconflow.cn). There is no evidence of data exfiltration, malicious code execution, or harmful prompt injection; the instructions in `SKILL.md` and the reference files are strictly aligned with the stated purpose of document structure extraction and OCR.
能力标签
cryptocan-make-purchasesrequires-sensitive-credentials
能力评估
Purpose & Capability
Name/description, required binary (python3), the single required env var (SILICONFLOW_API_KEY), and the provided script (scripts/ocr.py) are coherent with an OCR/document-extraction skill. Model choices and prompts all align with OCR use-cases.
Instruction Scope
SKILL.md and scripts/ocr.py instruct reading local files or URLs, base64-encoding them, and sending them to https://api.siliconflow.cn for OCR — this is expected for a cloud OCR service. Note: SKILL.md says results are "saved to the workspace, long-term," but the included script prints results (it does not itself persist files); verify how your agent/platform will store outputs. Also the skill will transmit full document contents (including sensitive IDs/invoices) to a third party — expected functionality but a privacy consideration.
Install Mechanism
No install spec — instruction-only with a small Python script included. No downloads from untrusted URLs or archive extraction. This is low installation risk.
Credentials
Requires only one env var (SILICONFLOW_API_KEY), which is appropriate for a hosted OCR API. However the API key grants remote service access to any data you send; ensure you trust the SiliconFlow service and check its retention/privacy/billing policies before sending sensitive documents.
Persistence & Privilege
always:false and normal invocation settings. The skill does not request elevated or system-wide privileges and does not modify other skills. The only persistence implication is data storage of OCR outputs (SKILL.md mentions retaining workspace files) — confirm where outputs are stored by your agent.
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install china-doc-ocr
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /china-doc-ocr 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v1.2.0
v1.2.0: Security hardening - removed curl dependency, replaced with Python script (ocr.py). Uses urllib for API calls, no external network tools required.
v1.1.0
优化模型选择:优先使用PaddleOCR-VL-1.5和DeepSeek-OCR,明确模型降级策略
v1.0.1
修复元数据格式:将多行YAML改为单行JSON,解决requires.env声明不被识别的安全扫描问题
v1.0.0
china-doc-ocr 1.0.0 – Initial Release - Adds intelligent OCR and structured content extraction for complex documents: PDFs, images, scans, receipts, IDs, tables, and charts. - Supports advanced layouts, multi-column text, and mixed image-text that default tools cannot process. - Integrates DeepSeek-OCR and PaddleOCR-VL models for domestic, VPN-free usage (uses same API key as china-image-gen and china-tts). - Provides fully documented environment checks, file type handling (including PDF and multipage support), and output formatting to Markdown. - Suitable for extracting structured information from complex, scanned, or graphical documents.
元数据
Slug china-doc-ocr
版本 1.2.0
许可证 MIT-0
累计安装 5
当前安装数 5
历史版本数 4
常见问题

china-doc-ocr 是什么?

智能文档OCR识别与结构化提取。Use when the user has a complex document, PDF, scanned image, photo, invoice, receipt, ID card, table, or chart that needs to be recognized a... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 1204 次。

如何安装 china-doc-ocr?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install china-doc-ocr」即可一键安装,无需额外配置。

china-doc-ocr 是免费的吗?

是的,china-doc-ocr 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。

china-doc-ocr 支持哪些平台?

china-doc-ocr 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。

谁开发了 china-doc-ocr?

由 ToBeWin(@tobewin)开发并维护,当前版本 v1.2.0。

💬 留言讨论