/install doc-ocr-skills
Document OCR Skill (docr)
Uses Gemini 2.5 Flash, PaddleOCR, or RapidOCR (local) to recognize text from scanned PDFs and images. Compiled as a single Go binary.
Prerequisites
- API key configured in ~/.ocr/config (not needed for Paddle/Rapid)
- For RapidOCR engine: pip install rapidocr_onnxruntime
- For PaddleOCR engine: pip install paddleocr paddlepaddle
API Key Configuration
Create the config file:
mkdir -p ~/.ocr
cat > ~/.ocr/config << EOF
# Google Gemini API Key
gemini_api_key=your_gemini_key
EOF
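To confirm the key was written as expected, the config can be read back from the shell. The following is a minimal sketch, assuming the file uses plain key=value lines with # comments as in the example above; read_ocr_key is a hypothetical helper and not part of docr, and how docr itself parses the file is not documented here.

```shell
# Hypothetical helper: print the value of KEY from a key=value config file.
# Assumes lines starting with '#' are comments and that the first '='
# separates the key from the value.
read_ocr_key() {
    key="$1"
    file="${2:-$HOME/.ocr/config}"
    [ -r "$file" ] || return 1
    # Drop comment lines, keep the first exact key match, print what follows '='.
    value="$(grep -v '^[[:space:]]*#' "$file" | grep "^${key}=" | head -n 1 | cut -d= -f2-)"
    [ -n "$value" ] || return 1
    printf '%s\n' "$value"
}
```

For example, read_ocr_key gemini_api_key prints the stored key or returns a non-zero status if it is missing. Since the file holds an API key, restricting it with chmod 600 ~/.ocr/config is a reasonable precaution.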
Quick Start
Path Variable: All commands below use $DOCR. Before running any command, set this variable:
SKILL_DIR="$(cd "$(dirname "<path-to-this-SKILL.md>")" && pwd)"
DOCR="$SKILL_DIR/scripts/docr/docr"
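Because every command depends on $DOCR pointing at a real binary, a quick pre-flight check can catch a wrong path early. This is only a sketch; check_docr is a hypothetical helper name, not something shipped with the skill.

```shell
# Hypothetical pre-flight check: succeed only if the given path is an
# executable regular file (directories are rejected).
check_docr() {
    bin="$1"
    if [ -z "$bin" ]; then
        echo "DOCR is not set" >&2
        return 1
    fi
    if [ ! -x "$bin" ] || [ -d "$bin" ]; then
        echo "docr binary not found or not executable: $bin" >&2
        return 1
    fi
    return 0
}
```

A typical use would be check_docr "$DOCR" && "$DOCR" document.pdf, so the OCR command only runs when the binary is in place.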
# OCR a single document using RapidOCR (default)
$DOCR document.pdf
$DOCR image.jpg
# Use Gemini engine
$DOCR -engine gemini document.pdf
# Use PaddleOCR local engine
$DOCR -engine paddle document.pdf
# Specify output file
$DOCR document.pdf -o result.txt
# Batch process all supported files in a directory
$DOCR -batch ./docs/ -o ./outputs/
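When more control than -batch is wanted, for instance to log which files failed, a per-file loop is one option. The sketch below uses only the documented -o flag; out_path_for and ocr_each are hypothetical helpers, and the *.pdf / *.jpg globs are an assumption based on the examples above rather than a list of supported formats.

```shell
# Hypothetical helper: map an input file to "<outdir>/<name without extension>.txt".
out_path_for() {
    input="$1"
    outdir="$2"
    name="$(basename "$input")"
    printf '%s/%s.txt\n' "${outdir%/}" "${name%.*}"
}

# Sketch of a per-file loop. Requires DOCR to be set as described above.
ocr_each() {
    indir="$1"
    outdir="$2"
    mkdir -p "$outdir"
    for f in "$indir"/*.pdf "$indir"/*.jpg; do
        [ -f "$f" ] || continue   # skip globs that matched nothing
        out="$(out_path_for "$f" "$outdir")"
        # Report a failure but keep going with the remaining files.
        "$DOCR" "$f" -o "$out" || echo "failed: $f" >&2
    done
}
```

Calling ocr_each ./docs ./outputs would then process each matching file in turn and report failures on stderr without stopping the run.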
Engines
| Engine | Flag | API Key Config | Doc Handling |
|---|---|---|---|
| RapidOCR (default) | -engine rapid | None | Local OCR |
| Gemini | -engine gemini | gemini_api_key | Cloud Vision API |
| PaddleOCR (local) | -engine paddle | None | Local OCR |
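Since only the Gemini engine needs a key, a wrapper script could choose the engine based on what the config contains. The following is a sketch under that assumption; pick_engine is a hypothetical helper, and falling back to rapid simply mirrors the default listed in the table.

```shell
# Hypothetical helper: print "gemini" if the config holds a non-empty
# gemini_api_key entry, otherwise print "rapid" (the documented default).
pick_engine() {
    file="${1:-$HOME/.ocr/config}"
    if [ -r "$file" ] && grep -q '^gemini_api_key=..*' "$file"; then
        echo gemini
    else
        echo rapid
    fi
}
```

It could be used as $DOCR -engine "$(pick_engine)" document.pdf. Note that the check only looks for a non-empty value, so a placeholder such as your_gemini_key would still select gemini.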
CLI Reference
docr [options] <file or directory>
Options:
-engine string OCR engine: rapid (default) / gemini / paddle
-e string Engine (short flag)
-o string Output file path or directory (batch mode)
-output string Output path (long flag)
-batch Batch mode: process all files in directory
-prompt string Custom recognition prompt (gemini)
Installation
We provide pre-compiled binaries to get you started quickly.
cd doc-ocr-skills/scripts
./install.sh
This script will detect your OS (darwin/linux) and architecture (amd64/arm64) and download the appropriate version of docr.
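The contents of install.sh are not shown here, but OS and architecture detection of this kind is commonly done with uname. The sketch below illustrates the idea; the function names and the "os-arch" output format are assumptions, not the script's actual behavior.

```shell
# Map `uname -s` output to the OS names mentioned above (darwin/linux).
normalize_os() {
    case "$1" in
        Darwin) echo darwin ;;
        Linux)  echo linux ;;
        *)      echo "unsupported OS: $1" >&2; return 1 ;;
    esac
}

# Map `uname -m` output to Go-style architecture names (amd64/arm64).
normalize_arch() {
    case "$1" in
        x86_64|amd64)  echo amd64 ;;
        arm64|aarch64) echo arm64 ;;
        *)             echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Print a platform string such as "linux-amd64" for the current machine.
detect_platform() {
    os="$(normalize_os "$(uname -s)")" || return 1
    arch="$(normalize_arch "$(uname -m)")" || return 1
    echo "${os}-${arch}"
}
```

Running detect_platform before ./install.sh gives a quick indication of whether the machine falls within the darwin/linux and amd64/arm64 combinations the script targets.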
Building from Source (Optional)
If you prefer to build from source, ensure you have Go 1.21+ installed:
cd doc-ocr-skills/scripts/docr
go build -o docr .
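Before building, it may be worth checking that the installed toolchain meets the Go 1.21+ requirement. This is a minimal sketch; go_at_least_1_21 is a hypothetical helper that expects a version token in the form printed by go version, such as go1.22.3.

```shell
# Hypothetical helper: succeed if a version token like "go1.22.3" is >= 1.21.
go_at_least_1_21() {
    v="${1#go}"                 # strip the "go" prefix: "1.22.3"
    major="${v%%.*}"            # text before the first dot: "1"
    rest="${v#*.}"              # text after the first dot: "22.3"
    minor="${rest%%.*}"         # text before the next dot: "22"
    minor="${minor%%[!0-9]*}"   # drop suffixes such as "rc2"
    case "$major" in ''|*[!0-9]*) return 1 ;; esac
    case "$minor" in '') return 1 ;; esac
    [ "$major" -gt 1 ] && return 0
    [ "$major" -eq 1 ] && [ "$minor" -ge 21 ]
}
```

One way to use it is go_at_least_1_21 "$(go version | awk '{print $3}')" || echo "Go 1.21+ required" >&2, since the third field of the go version output is the version token.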
Error Handling
| Error | Solution |
|---|---|
| config file not found | Create ~/.ocr/config with API keys |
| gemini_api_key not found | Add gemini_api_key=VALUE to config |
| file not found | Verify the document file path |
| API timeout | Retry; large files may need longer |
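The "Retry" advice for API timeouts can be automated with a small wrapper. This is a sketch only: retry and the RETRY_DELAY variable are hypothetical, and because docr's exit codes are not documented here, the wrapper retries on any non-zero exit rather than on timeouts specifically.

```shell
# Hypothetical retry wrapper: run a command up to N times, pausing between
# attempts. RETRY_DELAY (seconds) is an assumed knob that defaults to 5.
retry() {
    attempts="$1"
    shift
    n=1
    while :; do
        "$@" && return 0
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempt(s): $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "${RETRY_DELAY:-5}"
    done
}
```

For example, retry 3 "$DOCR" -engine gemini document.pdf would attempt the Gemini call up to three times before reporting failure.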
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install doc-ocr-skills
- After installation, invoke the skill by name or use /doc-ocr-skills
- Provide required inputs per the skill's parameter spec and get structured output
What is Doc Ocr Skills?
OCR documents (PDFs and images) using Gemini 2.5 Flash, PaddleOCR (local), or RapidOCR (local). It is an AI Agent Skill for Claude Code / OpenClaw, with 495 downloads so far.
How do I install Doc Ocr Skills?
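Run /install doc-ocr-skills in the OpenClaw or Claude Code chat to install it in one step. The engines may still need the setup listed under Prerequisites: an API key for Gemini, or the pip packages for RapidOCR and PaddleOCR.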
Is Doc Ocr Skills free?
Yes, Doc Ocr Skills is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Doc Ocr Skills support?
Doc Ocr Skills runs anywhere OpenClaw / Claude Code is available. The install script downloads docr binaries for darwin and linux on amd64 and arm64.
Who created Doc Ocr Skills?
It is built and maintained by sirk (@scottkiss); the current version is v0.1.0.