/install doc-ocr-skills
Document OCR Skill (docr)
Uses Gemini 2.5 Flash, PaddleOCR, or RapidOCR (local) to recognize text from scanned PDFs and images. Compiled as a single Go binary.
Prerequisites
- API key configured in ~/.ocr/config (not needed for Paddle/Rapid)
- For the RapidOCR engine: pip install rapidocr_onnxruntime
- For the PaddleOCR engine: pip install paddleocr paddlepaddle
API Key Configuration
Create the config file:
mkdir -p ~/.ocr
cat > ~/.ocr/config << EOF
# Google Gemini API Key
gemini_api_key=your_gemini_key
EOF
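To confirm the key was written as intended, the file can be read back with a small helper. This is a minimal sketch that assumes the config is a plain `key=value` file with `#` comment lines, as in the example above; `read_config_key` is a hypothetical helper name, and docr's own parser may treat whitespace or quoting differently.

```shell
# Print the value of a key from a key=value config file.
# Usage: read_config_key <key> [config-path]
# Assumption: one "key=value" pair per line, "#" starts a comment line.
read_config_key() {
    key="$1"
    file="${2:-$HOME/.ocr/config}"
    [ -r "$file" ] || return 1
    # Lines starting with "#" never match, because the pattern is anchored
    # to the key name; print everything after the first "=".
    value=$(sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | head -n 1)
    [ -n "$value" ] || return 1
    printf '%s\n' "$value"
}
```

For example, `read_config_key gemini_api_key` prints the stored key, and returns a non-zero status if the file or the key is missing.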
Quick Start
Path Variable: All commands below use $DOCR. Before running any command, set this variable:
SKILL_DIR="$(cd "$(dirname "<path-to-this-SKILL.md>")" && pwd)"
DOCR="$SKILL_DIR/scripts/docr/docr"
# OCR a single document using RapidOCR (default)
$DOCR document.pdf
$DOCR image.jpg
# Use Gemini engine
$DOCR -engine gemini document.pdf
# Use PaddleOCR local engine
$DOCR -engine paddle document.pdf
# Specify output file
$DOCR document.pdf -o result.txt
# Batch process all supported files in a directory
$DOCR -batch ./docs/ -o ./outputs/
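Because every command depends on `$DOCR` pointing at the binary, a guard can be run first to catch an unset or wrong path. This is a sketch only; `require_docr` is a hypothetical helper, not something the skill ships.

```shell
# Fail early if $DOCR is unset or does not point at an executable file.
require_docr() {
    if [ -z "${DOCR:-}" ]; then
        echo "DOCR is not set; see the Path Variable note" >&2
        return 1
    fi
    if [ ! -x "$DOCR" ]; then
        echo "DOCR is not executable: $DOCR" >&2
        return 1
    fi
}
```

A typical use is `require_docr && $DOCR document.pdf`, so the OCR command only runs when the path check passes.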
Engines
| Engine | Flag | API Key Config | Doc Handling |
|---|---|---|---|
| RapidOCR (default) | -engine rapid | None | Local OCR |
| Gemini | -engine gemini | gemini_api_key | Cloud Vision API |
| PaddleOCR (local) | -engine paddle | None | Local OCR |
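Since only the Gemini engine needs a key, a script can fall back to the local default when no key is configured. The sketch below assumes the `key=value` config format shown earlier; `pick_engine` is a hypothetical helper, and it only checks that the key is non-empty, so a placeholder such as `your_gemini_key` would still count as configured.

```shell
# Print "gemini" if the config defines a non-empty gemini_api_key,
# otherwise print the local default "rapid".
# Usage: pick_engine [config-path]
pick_engine() {
    cfg="${1:-$HOME/.ocr/config}"
    if [ -r "$cfg" ] && grep -Eq '^[[:space:]]*gemini_api_key[[:space:]]*=[[:space:]]*[^[:space:]]' "$cfg"; then
        echo gemini
    else
        echo rapid
    fi
}
```

It can then be combined with the documented flag: `$DOCR -engine "$(pick_engine)" document.pdf`.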
CLI Reference
docr [options] <file or directory>
Options:
-engine string OCR engine: rapid (default) / gemini / paddle
-e string Engine (short flag)
-o string Output file path or directory (batch mode)
-output string Output path (long flag)
-batch Batch mode: process all files in directory
-prompt string Custom recognition prompt (gemini)
Installation
We provide pre-compiled binaries to get you started quickly.
cd doc-ocr-skills/scripts
./install.sh
This script will detect your OS (darwin/linux) and architecture (amd64/arm64) and download the appropriate version of docr.
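The contents of install.sh are not shown here, but OS and architecture detection of this kind is commonly done by normalizing `uname` output. The following is an illustrative sketch under that assumption; the function names are hypothetical and the real script may differ.

```shell
# Map `uname -s` output to the OS names mentioned above.
normalize_os() {
    case "$1" in
        Darwin) echo darwin ;;
        Linux)  echo linux ;;
        *)      return 1 ;;
    esac
}

# Map `uname -m` output to the architecture names mentioned above.
normalize_arch() {
    case "$1" in
        x86_64|amd64)  echo amd64 ;;
        arm64|aarch64) echo arm64 ;;
        *)             return 1 ;;
    esac
}

# Print "<os>/<arch>" for the current machine, or fail if unsupported.
detect_platform() {
    os=$(normalize_os "$(uname -s)") || return 1
    arch=$(normalize_arch "$(uname -m)") || return 1
    echo "$os/$arch"
}
```

Running `detect_platform` before the installer shows whether the machine falls into one of the four supported combinations.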
Building from Source (Optional)
If you prefer to build from source, ensure you have Go 1.21+ installed:
cd doc-ocr-skills/scripts/docr
go build -o docr .
Error Handling
| Error | Solution |
|---|---|
| config file not found | Create ~/.ocr/config with API keys |
| gemini_api_key not found | Add gemini_api_key=VALUE to config |
| file not found | Verify the document file path |
| API timeout | Retry; large files may need longer |
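The retry advice for API timeouts can be automated with a generic wrapper. This sketch assumes docr exits with a non-zero status when a request times out, which the documentation does not state; `retry` is a hypothetical helper.

```shell
# Run a command up to <attempts> times, sleeping <delay> seconds between tries.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while :; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        # Give up once the attempt budget is used.
        [ "$n" -ge "$attempts" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
}
```

For instance, `retry 3 5 "$DOCR" -engine gemini large.pdf` would try the Gemini engine up to three times with a five-second pause between attempts.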
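- Make sure OpenClaw is installed (local or Docker deployment)
- Enter the install command in the chat box: /install doc-ocr-skills
- After installation, invoke the Skill by name or trigger it with /doc-ocr-skills
- Provide the required inputs according to the Skill's parameter description to get structured output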
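What is Doc Ocr Skills?
OCR documents (PDFs and images) using Gemini 2.5 Flash, PaddleOCR (local), or RapidOCR (local). It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 495 downloads to date.
How do I install Doc Ocr Skills?
Run the command "/install doc-ocr-skills" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.
Is Doc Ocr Skills free?
Yes. Doc Ocr Skills is completely free and released under the MIT-0 license, so you can download, install, and use it freely.
Which platforms does Doc Ocr Skills support?
Doc Ocr Skills is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed Doc Ocr Skills?
It is developed and maintained by sirk (@scottkiss); the current version is v0.1.0.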