← Back to Skills Marketplace

Agent Paddleocr Vision

Name: Agent Paddleocr Vision
Author: nhzallen

by Allen Niu · GitHub ↗ · v1.1.0 · MIT-0

cross-platform ✓ Security Clean

273

Downloads

Stars

Active Installs

Versions

Install in OpenClaw

/install agent-paddleocr-vision

Description

Multi-language document understanding with PaddleOCR

README (SKILL.md)

Agent PaddleOCR Vision

OCR with Agent Actions — powered by PaddleOCR only. Automatically classifies documents and provides actionable prompts.

What It Does

OCR extraction via PaddleOCR cloud API (requires credentials)
11 document types: invoice, business card, receipt, table, contract, ID card, passport, bank statement, driver's license, tax form, general
Action suggestion with structured parameters
Batch processing
Searchable PDF generation (with bbox alignment)

Quick Start

# Install dependencies
pip3 install -r scripts/requirements.txt

# Configure PaddleOCR API
export PADDLEOCR_DOC_PARSING_API_URL=https://your-api.paddleocr.com/layout-parsing
export PADDLEOCR_ACCESS_TOKEN=your_token

# Process a file
python3 scripts/doc_vision.py --file-path ./invoice.jpg --pretty --make-searchable-pdf

Batch

python3 scripts/doc_vision.py --batch-dir ./inbox --output-dir ./out

Output

See docs/README.zh.md for full JSON schema and integration guide.

Supported Types

Type	Actions
Invoice	create_expense, archive, tax_report
Business Card	add_contact, save_vcard
Receipt	create_expense, split_bill
Table	export_csv, analyze_data
Contract	summarize, extract_dates, flag_obligations
ID Card	extract_id_info, verify_age
Passport	store_passport_info, check_validity
Bank Statement	categorize_transactions, generate_report
Driver License	store_license_info, check_expiry
Tax Form	summarize_tax, suggest_deductions
General	summarize, translate, search_keywords

Configuration

Required environment variables:

PADDLEOCR_DOC_PARSING_API_URL — API endpoint ending in /layout-parsing
PADDLEOCR_ACCESS_TOKEN — Access token

Optional:

PADDLEOCR_DOC_PARSING_TIMEOUT — Default 600 seconds

Searchable PDF

With --make-searchable-pdf, embeds OCR text layer aligned to original layout using bounding boxes. Requires pdf2image + poppler (system) and reportlab, pypdf, pillow (Python).

Full Documentation

Detailed usage, troubleshooting, and development guide available in multiple languages under docs/:

中文: docs/README.zh.md
English: docs/README.en.md
Español: docs/README.es.md
العربية: docs/README.ar.md

License

MIT-0

Made for OpenClaw. Let your agent see and act.

Usage Guidance

This skill appears coherent for calling a PaddleOCR cloud service. Before installing: (1) only set PADDLEOCR_DOC_PARSING_API_URL and PADDLEOCR_ACCESS_TOKEN if you trust the endpoint—documents and the token will be sent there; prefer your provider's official endpoint or a self-hosted instance if handling sensitive docs; (2) inspect scripts/requirements.txt and the scripts (notably scripts/ocr_engine.py) to confirm no unexpected network calls or logging of tokens; (3) run pip installs in an isolated environment (venv/container) and ensure poppler is the official package; (4) avoid processing highly sensitive documents until you confirm the endpoint and token policies; (5) verify rate limits, data retention, and token scope with the PaddleOCR provider.

Capability Analysis

Type: OpenClaw Skill Name: agent-paddleocr-vision Version: 1.1.0 The agent-paddleocr-vision skill bundle is a legitimate tool for document OCR and analysis using the PaddleOCR cloud API. It includes robust logic for document classification (classify.py), structured data extraction (actions.py), and the generation of searchable PDFs (make_searchable_pdf.py). The code uses standard libraries like httpx for API communication and follows best practices by requiring user-configured environment variables for sensitive credentials. No evidence of malicious intent, data exfiltration, or prompt injection was found.

Capability Assessment

✓ Purpose & Capability

Name/description, required binaries (python), and required env vars (PADDLEOCR_DOC_PARSING_API_URL, PADDLEOCR_ACCESS_TOKEN) align with a cloud-OCR integration. Declared functionality (searchable PDF, classification, suggested actions) matches the included scripts.

✓ Instruction Scope

SKILL.md and examples limit runtime activity to calling the PaddleOCR parsing endpoint, parsing OCR results, generating searchable PDFs, and writing outputs. There are no instructions to read unrelated system files or other credentials. The agent is expected to send user documents to the configured API endpoint (as intended).

ℹ Install Mechanism

The registry provides no automated install spec (instruction-only), but SKILL.md tells users to pip install -r scripts/requirements.txt and install system packages (poppler). Installing Python packages pulls code from PyPI which is routine but carries the usual supply-chain risk; inspect scripts/requirements.txt before running pip as a best practice.

✓ Credentials

Only two environment variables are required: the PaddleOCR API URL and access token (primary credential). Both are necessary for a cloud-OCR integration. No unrelated secrets, system config paths, or extra credentials are requested.

✓ Persistence & Privilege

The skill is not forced-always, does not request persistent elevated privileges, and does not modify other skills' configs. It runs as an invoked tool and writes outputs (JSON, PDFs) to disk as expected.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install agent-paddleocr-vision
After installation, invoke the skill by name or use /agent-paddleocr-vision
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.1.0

- Documentation moved to the new docs/ directory with multi-language support (Arabic, English, Spanish, Chinese). - Removed template files for document types (e.g., bank_statement, business_card, invoice, etc.). - Cleaned up project structure by deleting unused and redundant files. - README and integration details now consolidated and easier to navigate.

v1.0.1

No file changes detected for version 1.0.1. - No updates or modifications from the previous version. - Functionality, documentation, and configuration remain unchanged.

v1.0.0

Initial release of agent-paddleocr-vision. - Multi-language OCR extraction using PaddleOCR cloud API. - Supports 11 document types with automatic classification and tailored action suggestions. - Generates searchable PDFs with accurate layout and bounding boxes. - Batch processing capabilities for folders of documents. - Structured output with integration instructions and multi-language documentation. - Requires API credentials set via environment variables.

Metadata

Slug agent-paddleocr-vision

Version 1.1.0

License MIT-0

All-time Installs 2

Active Installs 2

Total Versions 3

Frequently Asked Questions

What is Agent Paddleocr Vision?

Multi-language document understanding with PaddleOCR. It is an AI Agent Skill for Claude Code / OpenClaw, with 273 downloads so far.

How do I install Agent Paddleocr Vision?

Run "/install agent-paddleocr-vision" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Agent Paddleocr Vision free?

Yes, Agent Paddleocr Vision is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Agent Paddleocr Vision support?

Agent Paddleocr Vision is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Agent Paddleocr Vision?

It is built and maintained by Allen Niu (@nhzallen); the current version is v1.1.0.

More Skills