nhzallen

Agent Paddleocr Vision

Author: Allen Niu · GitHub · v1.1.0 · MIT-0
cross-platform ✓ Security check passed
273 total downloads · 0 favorites · 2 current installs · 3 versions
Install in OpenClaw
/install agent-paddleocr-vision
Description
Multi-language document understanding with PaddleOCR
Usage (SKILL.md)

Agent PaddleOCR Vision

OCR with Agent Actions — powered by PaddleOCR only. Automatically classifies documents and provides actionable prompts.

What It Does

  • OCR extraction via PaddleOCR cloud API (requires credentials)
  • 11 document types: invoice, business card, receipt, table, contract, ID card, passport, bank statement, driver's license, tax form, general
  • Action suggestion with structured parameters
  • Batch processing
  • Searchable PDF generation (with bbox alignment)

Quick Start

# Install dependencies
pip3 install -r scripts/requirements.txt

# Configure PaddleOCR API
export PADDLEOCR_DOC_PARSING_API_URL=https://your-api.paddleocr.com/layout-parsing
export PADDLEOCR_ACCESS_TOKEN=your_token

# Process a file
python3 scripts/doc_vision.py --file-path ./invoice.jpg --pretty --make-searchable-pdf

Batch

python3 scripts/doc_vision.py --batch-dir ./inbox --output-dir ./out
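
When the script is driven from another program instead of a shell, the two invocations above can be assembled as argument lists. The following is a minimal sketch that uses only the flags shown in this document; the helper names `build_single_command` and `build_batch_command` are hypothetical and not part of the skill.

```python
import shlex

# Script path as it appears in the Quick Start section.
SCRIPT = "scripts/doc_vision.py"


def build_single_command(file_path, pretty=True, searchable_pdf=False):
    """Return the argv list for processing one file (hypothetical helper)."""
    cmd = ["python3", SCRIPT, "--file-path", str(file_path)]
    if pretty:
        cmd.append("--pretty")
    if searchable_pdf:
        cmd.append("--make-searchable-pdf")
    return cmd


def build_batch_command(batch_dir, output_dir):
    """Return the argv list for processing a whole directory (hypothetical helper)."""
    return [
        "python3", SCRIPT,
        "--batch-dir", str(batch_dir),
        "--output-dir", str(output_dir),
    ]


if __name__ == "__main__":
    # Print the shell-quoted form of the batch command for inspection.
    print(shlex.join(build_batch_command("./inbox", "./out")))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` avoids shell quoting problems with file names that contain spaces.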

Output

See docs/README.zh.md for full JSON schema and integration guide.

Supported Types

| Type | Actions |
| --- | --- |
| Invoice | create_expense, archive, tax_report |
| Business Card | add_contact, save_vcard |
| Receipt | create_expense, split_bill |
| Table | export_csv, analyze_data |
| Contract | summarize, extract_dates, flag_obligations |
| ID Card | extract_id_info, verify_age |
| Passport | store_passport_info, check_validity |
| Bank Statement | categorize_transactions, generate_report |
| Driver License | store_license_info, check_expiry |
| Tax Form | summarize_tax, suggest_deductions |
| General | summarize, translate, search_keywords |
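
The type-to-action mapping above can be mirrored as a lookup table on the agent side. This is an illustrative sketch only: the action names come from the table, but the snake_case keys, the `suggest_actions` helper, and the fallback to the General actions are assumptions, not the skill's actual classify.py or actions.py logic.

```python
# Action names copied from the Supported Types table; the dictionary keys
# are assumed snake_case identifiers, not confirmed by the skill's code.
SUGGESTED_ACTIONS = {
    "invoice": ["create_expense", "archive", "tax_report"],
    "business_card": ["add_contact", "save_vcard"],
    "receipt": ["create_expense", "split_bill"],
    "table": ["export_csv", "analyze_data"],
    "contract": ["summarize", "extract_dates", "flag_obligations"],
    "id_card": ["extract_id_info", "verify_age"],
    "passport": ["store_passport_info", "check_validity"],
    "bank_statement": ["categorize_transactions", "generate_report"],
    "driver_license": ["store_license_info", "check_expiry"],
    "tax_form": ["summarize_tax", "suggest_deductions"],
    "general": ["summarize", "translate", "search_keywords"],
}


def suggest_actions(doc_type):
    """Return the suggested actions for a document type label.

    Unknown labels fall back to the "general" actions (an assumption).
    """
    key = doc_type.strip().lower().replace("'s", "").replace(" ", "_")
    return list(SUGGESTED_ACTIONS.get(key, SUGGESTED_ACTIONS["general"]))


if __name__ == "__main__":
    print(suggest_actions("Business Card"))
```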

Configuration

Required environment variables:

  • PADDLEOCR_DOC_PARSING_API_URL — API endpoint ending in /layout-parsing
  • PADDLEOCR_ACCESS_TOKEN — Access token

Optional:

  • PADDLEOCR_DOC_PARSING_TIMEOUT — Default 600 seconds
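
A caller can validate these variables before sending any document. The sketch below checks the three variables listed above under the stated rules (URL ends in /layout-parsing, token present, timeout defaulting to 600 seconds); the `OcrConfig` class and `load_config` function are hypothetical names, and the skill's own scripts may validate differently.

```python
import os
from dataclasses import dataclass

DEFAULT_TIMEOUT_SECONDS = 600.0  # default stated in this section


@dataclass(frozen=True)
class OcrConfig:
    api_url: str
    access_token: str
    timeout_seconds: float


def load_config(env=None):
    """Read and validate the PaddleOCR settings from an environment mapping."""
    env = os.environ if env is None else env

    api_url = env.get("PADDLEOCR_DOC_PARSING_API_URL", "").strip()
    if not api_url:
        raise ValueError("PADDLEOCR_DOC_PARSING_API_URL is not set")
    if not api_url.rstrip("/").endswith("/layout-parsing"):
        raise ValueError("API URL must end in /layout-parsing")

    token = env.get("PADDLEOCR_ACCESS_TOKEN", "").strip()
    if not token:
        raise ValueError("PADDLEOCR_ACCESS_TOKEN is not set")

    raw_timeout = env.get("PADDLEOCR_DOC_PARSING_TIMEOUT", "").strip()
    timeout = float(raw_timeout) if raw_timeout else DEFAULT_TIMEOUT_SECONDS
    if timeout <= 0:
        raise ValueError("PADDLEOCR_DOC_PARSING_TIMEOUT must be positive")

    return OcrConfig(api_url, token, timeout)
```

Taking the environment as a parameter keeps the check testable without touching the real process environment.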

Searchable PDF

The --make-searchable-pdf flag embeds an OCR text layer aligned to the original layout using the recognized bounding boxes. It requires poppler (system package) plus the Python packages pdf2image, reportlab, pypdf, and pillow.
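
Aligning a text layer to bounding boxes usually means converting image pixel coordinates (origin at the top-left, y growing downward) into PDF points (origin at the bottom-left, y growing upward, as in the default reportlab canvas). The sketch below shows that conversion only. It assumes boxes arrive as (x0, y0, x1, y1) in pixels, which this document does not confirm, and `pixel_bbox_to_pdf_rect` is a hypothetical helper, not the code in make_searchable_pdf.py.

```python
def pixel_bbox_to_pdf_rect(bbox_px, image_size_px, page_size_pt):
    """Map a pixel bounding box onto a PDF page.

    bbox_px:        (x0, y0, x1, y1) in image pixels, top-left origin (assumed format)
    image_size_px:  (width, height) of the rendered page image in pixels
    page_size_pt:   (width, height) of the PDF page in points
    Returns (x, y, width, height) in points, where (x, y) is the
    bottom-left corner of the box in PDF coordinates.
    """
    x0, y0, x1, y1 = bbox_px
    img_w, img_h = image_size_px
    page_w, page_h = page_size_pt
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image size must be positive")

    # Scale factors from pixels to points along each axis.
    sx = page_w / img_w
    sy = page_h / img_h

    x = x0 * sx
    # Flip the vertical axis: the box's bottom edge (y1 in image space)
    # becomes its lower y coordinate in PDF space.
    y = page_h - y1 * sy
    return (x, y, (x1 - x0) * sx, (y1 - y0) * sy)


if __name__ == "__main__":
    print(pixel_bbox_to_pdf_rect((100, 50, 300, 100), (1000, 2000), (500.0, 1000.0)))
```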

Full Documentation

Detailed usage, troubleshooting, and development guide available in multiple languages under docs/:

  • 中文: docs/README.zh.md
  • English: docs/README.en.md
  • Español: docs/README.es.md
  • العربية: docs/README.ar.md

License

MIT-0


Made for OpenClaw. Let your agent see and act.

Security recommendations
This skill appears coherent for calling a PaddleOCR cloud service. Before installing:
  1. Only set PADDLEOCR_DOC_PARSING_API_URL and PADDLEOCR_ACCESS_TOKEN if you trust the endpoint, because documents and the token will be sent there. Prefer your provider's official endpoint, or a self-hosted instance if you handle sensitive documents.
  2. Inspect scripts/requirements.txt and the scripts (notably scripts/ocr_engine.py) to confirm there are no unexpected network calls and no logging of tokens.
  3. Run pip installs in an isolated environment (venv or container) and make sure poppler is the official package.
  4. Avoid processing highly sensitive documents until you have confirmed the endpoint and token policies.
  5. Verify rate limits, data retention, and token scope with the PaddleOCR provider.
Functional analysis
Type: OpenClaw Skill
Name: agent-paddleocr-vision
Version: 1.1.0

The agent-paddleocr-vision skill bundle is a legitimate tool for document OCR and analysis using the PaddleOCR cloud API. It includes robust logic for document classification (classify.py), structured data extraction (actions.py), and the generation of searchable PDFs (make_searchable_pdf.py). The code uses standard libraries like httpx for API communication and follows best practices by requiring user-configured environment variables for sensitive credentials. No evidence of malicious intent, data exfiltration, or prompt injection was found.
Capability assessment
Purpose & Capability
Name/description, required binaries (python), and required env vars (PADDLEOCR_DOC_PARSING_API_URL, PADDLEOCR_ACCESS_TOKEN) align with a cloud-OCR integration. Declared functionality (searchable PDF, classification, suggested actions) matches the included scripts.
Instruction Scope
SKILL.md and examples limit runtime activity to calling the PaddleOCR parsing endpoint, parsing OCR results, generating searchable PDFs, and writing outputs. There are no instructions to read unrelated system files or other credentials. The agent is expected to send user documents to the configured API endpoint (as intended).
Install Mechanism
The registry provides no automated install spec (instruction-only), but SKILL.md tells users to pip install -r scripts/requirements.txt and install system packages (poppler). Installing Python packages pulls code from PyPI which is routine but carries the usual supply-chain risk; inspect scripts/requirements.txt before running pip as a best practice.
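
Because installation is manual, a short preflight check can confirm that the searchable-PDF dependencies named in SKILL.md are present before a run. This is a sketch under assumptions: it looks for poppler by searching for its `pdftoppm` binary on PATH, and the `missing_dependencies` helper is hypothetical rather than part of the skill.

```python
import importlib.util
import shutil

# Import names for the Python packages listed in SKILL.md
# (pillow is imported as "PIL").
PYTHON_MODULES = {
    "pdf2image": "pdf2image",
    "reportlab": "reportlab",
    "pypdf": "pypdf",
    "pillow": "PIL",
}
POPPLER_BINARY = "pdftoppm"  # command-line tool shipped with poppler


def missing_dependencies(find_module=importlib.util.find_spec,
                         find_binary=shutil.which):
    """Return a list of dependency names that could not be located."""
    missing = [
        package
        for package, module in PYTHON_MODULES.items()
        if find_module(module) is None
    ]
    if find_binary(POPPLER_BINARY) is None:
        missing.append("poppler (pdftoppm)")
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("all dependencies found" if not problems else f"missing: {problems}")
```

The lookup functions are injectable so the check can be exercised without installing or uninstalling anything.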
Credentials
Only two environment variables are required: the PaddleOCR API URL and access token (primary credential). Both are necessary for a cloud-OCR integration. No unrelated secrets, system config paths, or extra credentials are requested.
Persistence & Privilege
The skill is not forced-always, does not request persistent elevated privileges, and does not modify other skills' configs. It runs as an invoked tool and writes outputs (JSON, PDFs) to disk as expected.
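How to use
  1. Make sure OpenClaw is installed (local or Docker deployment).
  2. Enter the install command in the chat box: /install agent-paddleocr-vision
  3. After installation, invoke the Skill by name or trigger it with /agent-paddleocr-vision
  4. Provide the required inputs according to the Skill's parameter description to get structured output.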
Version history
v1.1.0
  • Documentation moved to the new docs/ directory with multi-language support (Arabic, English, Spanish, Chinese).
  • Removed template files for document types (e.g., bank_statement, business_card, invoice, etc.).
  • Cleaned up project structure by deleting unused and redundant files.
  • README and integration details now consolidated and easier to navigate.
v1.0.1
No file changes detected for version 1.0.1.
  • No updates or modifications from the previous version.
  • Functionality, documentation, and configuration remain unchanged.
v1.0.0
Initial release of agent-paddleocr-vision.
  • Multi-language OCR extraction using PaddleOCR cloud API.
  • Supports 11 document types with automatic classification and tailored action suggestions.
  • Generates searchable PDFs with accurate layout and bounding boxes.
  • Batch processing capabilities for folders of documents.
  • Structured output with integration instructions and multi-language documentation.
  • Requires API credentials set via environment variables.
Metadata
Slug: agent-paddleocr-vision
Version: 1.1.0
License: MIT-0
Total installs: 2
Current installs: 2
Versions: 3
FAQ

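What is Agent Paddleocr Vision?

Multi-language document understanding with PaddleOCR. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 273 total downloads to date.

How do I install Agent Paddleocr Vision?

Run the command "/install agent-paddleocr-vision" in the OpenClaw or Claude Code chat box to install it in one step. Before first use, install the Python dependencies and set the PADDLEOCR_DOC_PARSING_API_URL and PADDLEOCR_ACCESS_TOKEN environment variables as described in SKILL.md.

Is Agent Paddleocr Vision free?

Yes. Agent Paddleocr Vision is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Agent Paddleocr Vision support?

Agent Paddleocr Vision is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who developed Agent Paddleocr Vision?

It is developed and maintained by Allen Niu (@nhzallen). The current version is v1.1.0.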
