← Back to Skills Marketplace
nhzallen

Agent Paddleocr Vision

by Allen Niu · GitHub ↗ · v1.1.0 · MIT-0
cross-platform ✓ Security Clean
273
Downloads
0
Stars
2
Active Installs
3
Versions
Install in OpenClaw
/install agent-paddleocr-vision
Description
Multi-language document understanding with PaddleOCR
README (SKILL.md)

Agent PaddleOCR Vision

OCR with Agent Actions — powered by PaddleOCR only. Automatically classifies documents and provides actionable prompts.

What It Does

  • OCR extraction via PaddleOCR cloud API (requires credentials)
  • 11 document types: invoice, business card, receipt, table, contract, ID card, passport, bank statement, driver's license, tax form, general
  • Action suggestion with structured parameters
  • Batch processing
  • Searchable PDF generation (with bbox alignment)

Quick Start

# Install dependencies
pip3 install -r scripts/requirements.txt

# Configure PaddleOCR API
export PADDLEOCR_DOC_PARSING_API_URL=https://your-api.paddleocr.com/layout-parsing
export PADDLEOCR_ACCESS_TOKEN=your_token

# Process a file
python3 scripts/doc_vision.py --file-path ./invoice.jpg --pretty --make-searchable-pdf

Batch

python3 scripts/doc_vision.py --batch-dir ./inbox --output-dir ./out

Output

See docs/README.zh.md for full JSON schema and integration guide.

Supported Types

Type Actions
Invoice create_expense, archive, tax_report
Business Card add_contact, save_vcard
Receipt create_expense, split_bill
Table export_csv, analyze_data
Contract summarize, extract_dates, flag_obligations
ID Card extract_id_info, verify_age
Passport store_passport_info, check_validity
Bank Statement categorize_transactions, generate_report
Driver License store_license_info, check_expiry
Tax Form summarize_tax, suggest_deductions
General summarize, translate, search_keywords

Configuration

Required environment variables:

  • PADDLEOCR_DOC_PARSING_API_URL — API endpoint ending in /layout-parsing
  • PADDLEOCR_ACCESS_TOKEN — Access token

Optional:

  • PADDLEOCR_DOC_PARSING_TIMEOUT — Default 600 seconds

Searchable PDF

With --make-searchable-pdf, embeds OCR text layer aligned to original layout using bounding boxes. Requires pdf2image + poppler (system) and reportlab, pypdf, pillow (Python).

Full Documentation

Detailed usage, troubleshooting, and development guide available in multiple languages under docs/:

  • 中文: docs/README.zh.md
  • English: docs/README.en.md
  • Español: docs/README.es.md
  • العربية: docs/README.ar.md

License

MIT-0


Made for OpenClaw. Let your agent see and act.

Usage Guidance
This skill appears coherent for calling a PaddleOCR cloud service. Before installing: (1) only set PADDLEOCR_DOC_PARSING_API_URL and PADDLEOCR_ACCESS_TOKEN if you trust the endpoint—documents and the token will be sent there; prefer your provider's official endpoint or a self-hosted instance if handling sensitive docs; (2) inspect scripts/requirements.txt and the scripts (notably scripts/ocr_engine.py) to confirm no unexpected network calls or logging of tokens; (3) run pip installs in an isolated environment (venv/container) and ensure poppler is the official package; (4) avoid processing highly sensitive documents until you confirm the endpoint and token policies; (5) verify rate limits, data retention, and token scope with the PaddleOCR provider.
Capability Analysis
Type: OpenClaw Skill Name: agent-paddleocr-vision Version: 1.1.0 The agent-paddleocr-vision skill bundle is a legitimate tool for document OCR and analysis using the PaddleOCR cloud API. It includes robust logic for document classification (classify.py), structured data extraction (actions.py), and the generation of searchable PDFs (make_searchable_pdf.py). The code uses standard libraries like httpx for API communication and follows best practices by requiring user-configured environment variables for sensitive credentials. No evidence of malicious intent, data exfiltration, or prompt injection was found.
Capability Assessment
Purpose & Capability
Name/description, required binaries (python), and required env vars (PADDLEOCR_DOC_PARSING_API_URL, PADDLEOCR_ACCESS_TOKEN) align with a cloud-OCR integration. Declared functionality (searchable PDF, classification, suggested actions) matches the included scripts.
Instruction Scope
SKILL.md and examples limit runtime activity to calling the PaddleOCR parsing endpoint, parsing OCR results, generating searchable PDFs, and writing outputs. There are no instructions to read unrelated system files or other credentials. The agent is expected to send user documents to the configured API endpoint (as intended).
Install Mechanism
The registry provides no automated install spec (instruction-only), but SKILL.md tells users to pip install -r scripts/requirements.txt and install system packages (poppler). Installing Python packages pulls code from PyPI which is routine but carries the usual supply-chain risk; inspect scripts/requirements.txt before running pip as a best practice.
Credentials
Only two environment variables are required: the PaddleOCR API URL and access token (primary credential). Both are necessary for a cloud-OCR integration. No unrelated secrets, system config paths, or extra credentials are requested.
Persistence & Privilege
The skill is not forced-always, does not request persistent elevated privileges, and does not modify other skills' configs. It runs as an invoked tool and writes outputs (JSON, PDFs) to disk as expected.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install agent-paddleocr-vision
  3. After installation, invoke the skill by name or use /agent-paddleocr-vision
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.1.0
- Documentation moved to the new docs/ directory with multi-language support (Arabic, English, Spanish, Chinese). - Removed template files for document types (e.g., bank_statement, business_card, invoice, etc.). - Cleaned up project structure by deleting unused and redundant files. - README and integration details now consolidated and easier to navigate.
v1.0.1
No file changes detected for version 1.0.1. - No updates or modifications from the previous version. - Functionality, documentation, and configuration remain unchanged.
v1.0.0
Initial release of agent-paddleocr-vision. - Multi-language OCR extraction using PaddleOCR cloud API. - Supports 11 document types with automatic classification and tailored action suggestions. - Generates searchable PDFs with accurate layout and bounding boxes. - Batch processing capabilities for folders of documents. - Structured output with integration instructions and multi-language documentation. - Requires API credentials set via environment variables.
Metadata
Slug agent-paddleocr-vision
Version 1.1.0
License MIT-0
All-time Installs 2
Active Installs 2
Total Versions 3
Frequently Asked Questions

What is Agent Paddleocr Vision?

Multi-language document understanding with PaddleOCR. It is an AI Agent Skill for Claude Code / OpenClaw, with 273 downloads so far.

How do I install Agent Paddleocr Vision?

Run "/install agent-paddleocr-vision" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Agent Paddleocr Vision free?

Yes, Agent Paddleocr Vision is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Agent Paddleocr Vision support?

Agent Paddleocr Vision is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Agent Paddleocr Vision?

It is built and maintained by Allen Niu (@nhzallen); the current version is v1.1.0.

💬 Comments