/install agent-paddleocr-vision
Agent PaddleOCR Vision
OCR with Agent Actions — powered by PaddleOCR only. Automatically classifies documents and provides actionable prompts.
What It Does
- OCR extraction via PaddleOCR cloud API (requires credentials)
- 11 document types: invoice, business card, receipt, table, contract, ID card, passport, bank statement, driver's license, tax form, general
- Action suggestion with structured parameters
- Batch processing
- Searchable PDF generation (with bbox alignment)
Quick Start
# Install dependencies
pip3 install -r scripts/requirements.txt
# Configure PaddleOCR API
export PADDLEOCR_DOC_PARSING_API_URL=https://your-api.paddleocr.com/layout-parsing
export PADDLEOCR_ACCESS_TOKEN=your_token
# Process a file
python3 scripts/doc_vision.py --file-path ./invoice.jpg --pretty --make-searchable-pdf
Batch
python3 scripts/doc_vision.py --batch-dir ./inbox --output-dir ./out
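For agents that call the script from Python rather than a shell, the two invocations above could be wrapped roughly as follows. This is a minimal sketch: `build_doc_vision_command` and `run_doc_vision` are hypothetical helper names, only the flags shown above are used, and it assumes (unverified) that `--output-dir`, `--pretty`, and `--make-searchable-pdf` can be combined with either mode.

```python
import subprocess
import sys
from typing import List, Optional


def build_doc_vision_command(
    file_path: Optional[str] = None,
    batch_dir: Optional[str] = None,
    output_dir: Optional[str] = None,
    pretty: bool = False,
    make_searchable_pdf: bool = False,
    script: str = "scripts/doc_vision.py",
) -> List[str]:
    """Assemble an argv list for doc_vision.py (hypothetical helper)."""
    # Exactly one input mode must be chosen: a single file or a batch directory.
    if (file_path is None) == (batch_dir is None):
        raise ValueError("pass exactly one of file_path or batch_dir")
    cmd = [sys.executable, script]
    if file_path is not None:
        cmd += ["--file-path", file_path]
    else:
        cmd += ["--batch-dir", batch_dir]
    # Assumption: the remaining flags are accepted in both modes.
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if pretty:
        cmd.append("--pretty")
    if make_searchable_pdf:
        cmd.append("--make-searchable-pdf")
    return cmd


def run_doc_vision(**kwargs) -> subprocess.CompletedProcess:
    """Run the script and capture its output; raises CalledProcessError on failure."""
    return subprocess.run(
        build_doc_vision_command(**kwargs),
        check=True,
        capture_output=True,
        text=True,
    )
```

Passing an argv list instead of a shell string avoids quoting problems with file names that contain spaces.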
Output
See docs/README.zh.md for full JSON schema and integration guide.
Supported Types
| Type | Actions |
|---|---|
| Invoice | create_expense, archive, tax_report |
| Business Card | add_contact, save_vcard |
| Receipt | create_expense, split_bill |
| Table | export_csv, analyze_data |
| Contract | summarize, extract_dates, flag_obligations |
| ID Card | extract_id_info, verify_age |
| Passport | store_passport_info, check_validity |
| Bank Statement | categorize_transactions, generate_report |
| Driver's License | store_license_info, check_expiry |
| Tax Form | summarize_tax, suggest_deductions |
| General | summarize, translate, search_keywords |
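On the consuming side, the table above can be mirrored as a simple lookup. The sketch below is illustrative only: the snake_case keys, the `suggested_actions` helper, and the fallback to the general actions for unknown types are assumptions, and the labels actually emitted by the classifier may differ (see the JSON schema in docs/).

```python
from typing import Dict, List

# Action names are copied from the table; the dictionary keys are hypothetical labels.
ACTIONS_BY_TYPE: Dict[str, List[str]] = {
    "invoice": ["create_expense", "archive", "tax_report"],
    "business_card": ["add_contact", "save_vcard"],
    "receipt": ["create_expense", "split_bill"],
    "table": ["export_csv", "analyze_data"],
    "contract": ["summarize", "extract_dates", "flag_obligations"],
    "id_card": ["extract_id_info", "verify_age"],
    "passport": ["store_passport_info", "check_validity"],
    "bank_statement": ["categorize_transactions", "generate_report"],
    "driver_license": ["store_license_info", "check_expiry"],
    "tax_form": ["summarize_tax", "suggest_deductions"],
    "general": ["summarize", "translate", "search_keywords"],
}


def suggested_actions(doc_type: str) -> List[str]:
    """Return the actions for a document type, falling back to 'general'."""
    # Normalize labels such as "Business Card" or "Driver's License".
    key = doc_type.strip().lower().replace("'s", "").replace(" ", "_")
    return ACTIONS_BY_TYPE.get(key, ACTIONS_BY_TYPE["general"])
```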
Configuration
Required environment variables:
- PADDLEOCR_DOC_PARSING_API_URL: API endpoint ending in /layout-parsing
- PADDLEOCR_ACCESS_TOKEN: Access token
Optional:
- PADDLEOCR_DOC_PARSING_TIMEOUT: Defaults to 600 seconds
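A caller may want to validate these variables before invoking the script. The following is a minimal sketch, not the skill's own loader: `PaddleOCRConfig` and `load_config` are hypothetical names, and treating a URL that does not end in /layout-parsing as an error is an assumption based on the description above.

```python
import os
from dataclasses import dataclass
from typing import Mapping

URL_VAR = "PADDLEOCR_DOC_PARSING_API_URL"
TOKEN_VAR = "PADDLEOCR_ACCESS_TOKEN"
TIMEOUT_VAR = "PADDLEOCR_DOC_PARSING_TIMEOUT"


@dataclass(frozen=True)
class PaddleOCRConfig:
    api_url: str
    access_token: str
    timeout_seconds: float = 600.0


def load_config(env: Mapping[str, str] = os.environ) -> PaddleOCRConfig:
    """Read the documented variables from env and check them (hypothetical helper)."""
    missing = [name for name in (URL_VAR, TOKEN_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    api_url = env[URL_VAR]
    # Assumption: an endpoint that does not end in /layout-parsing is a misconfiguration.
    if not api_url.rstrip("/").endswith("/layout-parsing"):
        raise ValueError(URL_VAR + " should end in /layout-parsing")
    # The optional timeout falls back to the documented default of 600 seconds.
    timeout = float(env.get(TIMEOUT_VAR, "600"))
    return PaddleOCRConfig(api_url, env[TOKEN_VAR], timeout)
```

Accepting any mapping instead of reading `os.environ` directly makes the check easy to exercise with a plain dictionary.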
Searchable PDF
With --make-searchable-pdf, the script embeds an OCR text layer aligned to the original layout using bounding boxes. This requires pdf2image with poppler (a system dependency) and the Python packages reportlab, pypdf, and pillow.
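Before enabling the flag, it can help to confirm that these dependencies are present. The sketch below is one way to do that under stated assumptions: `missing_dependencies` is a hypothetical helper, pillow is checked under its import name `PIL`, and poppler is detected through its `pdftoppm` command-line tool, which pdf2image calls.

```python
import importlib.util
import shutil
from typing import List, Sequence

# Import names of the Python packages listed above (pillow is imported as PIL).
PYTHON_MODULES = ("pdf2image", "reportlab", "pypdf", "PIL")
# pdftoppm is one of the command-line tools shipped with poppler.
SYSTEM_BINARIES = ("pdftoppm",)


def missing_dependencies(
    modules: Sequence[str] = PYTHON_MODULES,
    binaries: Sequence[str] = SYSTEM_BINARIES,
) -> List[str]:
    """Return names of modules that cannot be found and binaries not on PATH."""
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    missing += [b for b in binaries if shutil.which(b) is None]
    return missing


if __name__ == "__main__":
    gaps = missing_dependencies()
    if gaps:
        print("Missing for searchable PDF:", ", ".join(gaps))
    else:
        print("All searchable PDF dependencies found.")
```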
Full Documentation
Detailed usage, troubleshooting, and development guide available in multiple languages under docs/:
- 中文: docs/README.zh.md
- English: docs/README.en.md
- Español: docs/README.es.md
- العربية: docs/README.ar.md
License
MIT-0
Made for OpenClaw. Let your agent see and act.
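- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install agent-paddleocr-vision
- Once installation finishes, invoke the Skill by name or trigger it with /agent-paddleocr-vision
- Provide the required inputs according to the Skill's parameter description to get structured output.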
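What is Agent Paddleocr Vision?
Multi-language document understanding with PaddleOCR. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 273 downloads to date.
How do I install Agent Paddleocr Vision?
Run the command "/install agent-paddleocr-vision" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is needed.
Is Agent Paddleocr Vision free?
Yes. Agent Paddleocr Vision is completely free and released under the MIT-0 license, so you can download, install, and use it freely.
Which platforms does Agent Paddleocr Vision support?
Agent Paddleocr Vision is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who develops Agent Paddleocr Vision?
It is developed and maintained by Allen Niu (@nhzallen). The current version is v1.1.0.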