
Agent Paddleocr Vision

Name: Agent Paddleocr Vision
Author: nhzallen

Author: Allen Niu · GitHub · v1.1.0 · MIT-0

cross-platform ✓ Security check passed

Total downloads: 273

Install in OpenClaw

/install agent-paddleocr-vision

Description

Multi-language document understanding with PaddleOCR

Usage (SKILL.md)

Agent PaddleOCR Vision

OCR with Agent Actions — powered by PaddleOCR only. Automatically classifies documents and provides actionable prompts.

What It Does

OCR extraction via PaddleOCR cloud API (requires credentials)
11 document types: invoice, business card, receipt, table, contract, ID card, passport, bank statement, driver's license, tax form, general
Action suggestion with structured parameters
Batch processing
Searchable PDF generation (with bbox alignment)

Quick Start

# Install dependencies
pip3 install -r scripts/requirements.txt

# Configure PaddleOCR API
export PADDLEOCR_DOC_PARSING_API_URL=https://your-api.paddleocr.com/layout-parsing
export PADDLEOCR_ACCESS_TOKEN=your_token

# Process a file
python3 scripts/doc_vision.py --file-path ./invoice.jpg --pretty --make-searchable-pdf

Batch

python3 scripts/doc_vision.py --batch-dir ./inbox --output-dir ./out
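
If the batch run needs to be triggered from another Python program, the command above can be wrapped with subprocess. This is a minimal sketch, not part of the skill: build_batch_command and run_batch are hypothetical helper names, only the two flags shown above are used, and the meaning of the script's exit code is not documented here, so it is returned unchanged.

```python
import subprocess
import sys
from typing import List


def build_batch_command(batch_dir: str, output_dir: str,
                        script: str = "scripts/doc_vision.py") -> List[str]:
    """Build the documented batch command as an argument list (hypothetical helper)."""
    # Only --batch-dir and --output-dir are used, exactly as in the Batch example.
    return [sys.executable, script,
            "--batch-dir", str(batch_dir),
            "--output-dir", str(output_dir)]


def run_batch(batch_dir: str, output_dir: str) -> int:
    """Run the batch command and return the script's exit code as-is."""
    completed = subprocess.run(build_batch_command(batch_dir, output_dir), check=False)
    return completed.returncode
```

Passing an argument list instead of a shell string avoids quoting problems with directory names that contain spaces.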

Output

See docs/README.zh.md for full JSON schema and integration guide.

Supported Types

Type	Actions
Invoice	create_expense, archive, tax_report
Business Card	add_contact, save_vcard
Receipt	create_expense, split_bill
Table	export_csv, analyze_data
Contract	summarize, extract_dates, flag_obligations
ID Card	extract_id_info, verify_age
Passport	store_passport_info, check_validity
Bank Statement	categorize_transactions, generate_report
Driver License	store_license_info, check_expiry
Tax Form	summarize_tax, suggest_deductions
General	summarize, translate, search_keywords
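
The table above can be mirrored as a plain lookup, which may be handy when routing a classified document to a follow-up step. This is a sketch only: the snake_case keys and the suggested_actions function are assumptions made for illustration, and the skill's real output format is described in docs/README.zh.md, not here.

```python
from typing import Dict, List

# Type -> suggested actions, copied from the Supported Types table.
# The snake_case keys are an assumption, not the skill's documented identifiers.
SUGGESTED_ACTIONS: Dict[str, List[str]] = {
    "invoice": ["create_expense", "archive", "tax_report"],
    "business_card": ["add_contact", "save_vcard"],
    "receipt": ["create_expense", "split_bill"],
    "table": ["export_csv", "analyze_data"],
    "contract": ["summarize", "extract_dates", "flag_obligations"],
    "id_card": ["extract_id_info", "verify_age"],
    "passport": ["store_passport_info", "check_validity"],
    "bank_statement": ["categorize_transactions", "generate_report"],
    "driver_license": ["store_license_info", "check_expiry"],
    "tax_form": ["summarize_tax", "suggest_deductions"],
    "general": ["summarize", "translate", "search_keywords"],
}


def suggested_actions(doc_type: str) -> List[str]:
    """Return the actions for a type name, falling back to the General row."""
    key = doc_type.strip().lower().replace(" ", "_")
    return SUGGESTED_ACTIONS.get(key, SUGGESTED_ACTIONS["general"])
```

For example, suggested_actions("Business Card") returns the two contact-related actions, and an unrecognized type falls back to the General actions.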

Configuration

Required environment variables:

PADDLEOCR_DOC_PARSING_API_URL — API endpoint ending in /layout-parsing
PADDLEOCR_ACCESS_TOKEN — Access token

Optional:

PADDLEOCR_DOC_PARSING_TIMEOUT — Default 600 seconds
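
Before calling the script, it can help to check these variables up front. The following is a minimal sketch under stated assumptions: PaddleOCRConfig and load_paddleocr_config are hypothetical names, and the skill's own scripts may validate their configuration differently.

```python
import os
from typing import Mapping, NamedTuple, Optional


class PaddleOCRConfig(NamedTuple):
    api_url: str
    access_token: str
    timeout_seconds: float


def load_paddleocr_config(env: Optional[Mapping[str, str]] = None) -> PaddleOCRConfig:
    """Read and validate the documented environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    api_url = env.get("PADDLEOCR_DOC_PARSING_API_URL", "").strip()
    token = env.get("PADDLEOCR_ACCESS_TOKEN", "").strip()
    if not api_url:
        raise ValueError("PADDLEOCR_DOC_PARSING_API_URL is not set")
    # The documentation says the endpoint ends in /layout-parsing.
    if not api_url.endswith("/layout-parsing"):
        raise ValueError("PADDLEOCR_DOC_PARSING_API_URL should end in /layout-parsing")
    if not token:
        raise ValueError("PADDLEOCR_ACCESS_TOKEN is not set")
    # Optional variable; 600 seconds is the documented default.
    timeout = float(env.get("PADDLEOCR_DOC_PARSING_TIMEOUT", "600"))
    return PaddleOCRConfig(api_url, token, timeout)
```

The error messages name the variable but never include the token value, so a failed check does not leak the credential into logs.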

Searchable PDF

With --make-searchable-pdf, embeds OCR text layer aligned to original layout using bounding boxes. Requires pdf2image + poppler (system) and reportlab, pypdf, pillow (Python).
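
A quick pre-flight check for these dependencies might look like the sketch below. It is not part of the skill. missing_pdf_dependencies is a hypothetical helper, pillow is looked up under its import name PIL, and poppler is detected by looking for its pdftoppm binary on PATH, which is the tool pdf2image invokes.

```python
import importlib.util
import shutil
from typing import List, Sequence

# Import names for the Python packages listed above (pillow imports as PIL).
PYTHON_MODULES = ("pdf2image", "reportlab", "pypdf", "PIL")
# poppler ships the pdftoppm command-line tool.
SYSTEM_BINARIES = ("pdftoppm",)


def missing_pdf_dependencies(modules: Sequence[str] = PYTHON_MODULES,
                             binaries: Sequence[str] = SYSTEM_BINARIES) -> List[str]:
    """Return the names of modules and binaries that cannot be found."""
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    missing += [b for b in binaries if shutil.which(b) is None]
    return missing


if __name__ == "__main__":
    problems = missing_pdf_dependencies()
    print("missing:", ", ".join(problems) if problems else "none")
```

An empty result only means the packages are importable and pdftoppm is on PATH; it does not confirm that the installed versions match scripts/requirements.txt.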

Full Documentation

Detailed usage, troubleshooting, and development guide available in multiple languages under docs/:

中文: docs/README.zh.md
English: docs/README.en.md
Español: docs/README.es.md
العربية: docs/README.ar.md

License

MIT-0

Made for OpenClaw. Let your agent see and act.

Security recommendations

This skill appears coherent for calling a PaddleOCR cloud service. Before installing:

(1) Only set PADDLEOCR_DOC_PARSING_API_URL and PADDLEOCR_ACCESS_TOKEN if you trust the endpoint: documents and the token will be sent there. Prefer your provider's official endpoint or a self-hosted instance if handling sensitive docs.
(2) Inspect scripts/requirements.txt and the scripts (notably scripts/ocr_engine.py) to confirm no unexpected network calls or logging of tokens.
(3) Run pip installs in an isolated environment (venv/container) and ensure poppler is the official package.
(4) Avoid processing highly sensitive documents until you confirm the endpoint and token policies.
(5) Verify rate limits, data retention, and token scope with the PaddleOCR provider.

Functional analysis

Type: OpenClaw Skill
Name: agent-paddleocr-vision
Version: 1.1.0

The agent-paddleocr-vision skill bundle is a legitimate tool for document OCR and analysis using the PaddleOCR cloud API. It includes robust logic for document classification (classify.py), structured data extraction (actions.py), and the generation of searchable PDFs (make_searchable_pdf.py). The code uses standard libraries like httpx for API communication and follows best practices by requiring user-configured environment variables for sensitive credentials. No evidence of malicious intent, data exfiltration, or prompt injection was found.

Capability assessment

✓ Purpose & Capability

Name/description, required binaries (python), and required env vars (PADDLEOCR_DOC_PARSING_API_URL, PADDLEOCR_ACCESS_TOKEN) align with a cloud-OCR integration. Declared functionality (searchable PDF, classification, suggested actions) matches the included scripts.

✓ Instruction Scope

SKILL.md and examples limit runtime activity to calling the PaddleOCR parsing endpoint, parsing OCR results, generating searchable PDFs, and writing outputs. There are no instructions to read unrelated system files or other credentials. The agent is expected to send user documents to the configured API endpoint (as intended).

ℹ Install Mechanism

The registry provides no automated install spec (instruction-only), but SKILL.md tells users to pip install -r scripts/requirements.txt and install system packages (poppler). Installing Python packages pulls code from PyPI which is routine but carries the usual supply-chain risk; inspect scripts/requirements.txt before running pip as a best practice.

✓ Credentials

Only two environment variables are required: the PaddleOCR API URL and access token (primary credential). Both are necessary for a cloud-OCR integration. No unrelated secrets, system config paths, or extra credentials are requested.

✓ Persistence & Privilege

The skill is not forced-always, does not request persistent elevated privileges, and does not modify other skills' configs. It runs as an invoked tool and writes outputs (JSON, PDFs) to disk as expected.

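How to use

Make sure OpenClaw is installed (locally or via Docker).
Enter the install command in the chat box: /install agent-paddleocr-vision
Once installed, invoke the skill by name or trigger it with /agent-paddleocr-vision
Provide the inputs described in the skill's parameter documentation to get structured output.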

Version history

v1.1.0

- Documentation moved to the new docs/ directory with multi-language support (Arabic, English, Spanish, Chinese).
- Removed template files for document types (e.g., bank_statement, business_card, invoice, etc.).
- Cleaned up project structure by deleting unused and redundant files.
- README and integration details now consolidated and easier to navigate.

v1.0.1

No file changes detected for version 1.0.1.
- No updates or modifications from the previous version.
- Functionality, documentation, and configuration remain unchanged.

v1.0.0

Initial release of agent-paddleocr-vision.
- Multi-language OCR extraction using PaddleOCR cloud API.
- Supports 11 document types with automatic classification and tailored action suggestions.
- Generates searchable PDFs with accurate layout and bounding boxes.
- Batch processing capabilities for folders of documents.
- Structured output with integration instructions and multi-language documentation.
- Requires API credentials set via environment variables.

Metadata

Slug agent-paddleocr-vision

Version 1.1.0

License MIT-0

Total installs 2

Current installs 2

Version count 3
