paddleocr-vl-locally
by sfresurgam · GitHub · v1.0.2 · MIT-0
288 Downloads · 0 Stars · 0 Active Installs · 3 Versions
Install in OpenClaw
/install paddleocr-vl-locally
Description
Complex document parsing with PaddleOCR. Intelligently converts complex PDFs and document images into Markdown and JSON files that preserve the original stru...
Usage Guidance
Things to check before installing or running this skill:
- Confirm environment variables: the registry lists only PADDLEOCR_DOC_PARSING_API_URL, but the code can also read PADDLEOCR_ACCESS_TOKEN, PADDLEOCR_BASIC_AUTH_USER, PADDLEOCR_BASIC_AUTH_PASSWORD, and PADDLEOCR_DOC_PARSING_TIMEOUT. If you will provide tokens/passwords, treat them as sensitive and verify the skill truly needs them.
- Understand data exposure: the SKILL.md mandates showing the COMPLETE extracted content (all text, tables, formulas). If you plan to parse sensitive documents, this behavior can leak secrets or private information. Consider whether you want the agent to automatically reveal full outputs or prefer truncation/summarization/approval steps.
- File persistence: results are saved by default under the system temp directory. Decide if that is acceptable; if not, use --stdout or a secure output path and remove temp files after processing.
- Inspect and test locally: because the skill is script-based (no automatic install), review the included scripts (vl_caller.py, lib.py) and run smoke_test.py (or --skip-api-test) in a controlled environment. The socket/URL you configure for PADDLEOCR_DOC_PARSING_API_URL should be trusted (local or internal endpoint preferred).
- Operational advice: restrict the API URL to an internal host if possible, rotate tokens used by the skill, and avoid enabling this skill for autonomous runs against sensitive data until you are comfortable with its behavior.
If you want higher assurance, ask the author to: (1) list all environment variables in the skill metadata, (2) make the 'display full content' behavior opt-in, and (3) add an option to avoid writing results to disk by default.
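As one way to act on the advice about trusting the configured endpoint, the sketch below checks whether a URL points at localhost or at a loopback/private IP literal. The helper name is_internal_url is hypothetical and not part of the skill; the sketch does not resolve DNS names, so any hostname other than localhost is reported as unverified.

```python
import ipaddress
from urllib.parse import urlparse


def is_internal_url(url: str) -> bool:
    """Return True if the URL host is localhost or a loopback/private IP literal.

    Hypothetical helper for illustration; it is not part of the skill.
    """
    host = urlparse(url).hostname
    if host is None:
        # Empty or malformed URL: nothing to trust.
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: this sketch does not resolve it, so treat it as unverified.
        return False
    return addr.is_loopback or addr.is_private
```

A check such as is_internal_url(os.environ.get("PADDLEOCR_DOC_PARSING_API_URL", "")) could run before the skill is enabled. Internal hostnames would need an explicit allowlist, which this sketch leaves out.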
Capability Analysis
Type: OpenClaw Skill
Name: paddleocr-vl-locally
Version: 1.0.2
The skill bundle is a legitimate tool for document parsing via a PaddleOCR Triton Inference Server. The Python scripts (vl_caller.py, lib.py) implement standard API interaction using httpx, while utility scripts (optimize_file.py, split_pdf.py) provide helper functions for image compression and PDF page extraction. No evidence of malicious behavior, data exfiltration, or harmful prompt injection was found; the SKILL.md instructions correctly guide the agent on tool usage, error handling, and environment configuration.
Capability Assessment
Purpose & Capability
Name/description align with the code: the scripts call a document-parsing API (Triton/PaddleOCR-style) and provide helpers to optimize/split files and save JSON results. Required binary (python) and the primary env var (PADDLEOCR_DOC_PARSING_API_URL) are appropriate. The presence of helper scripts (optimize_file.py, split_pdf.py) is consistent with supporting large/complex documents.
Instruction Scope
SKILL.md instructs the agent to ALWAYS use the external PaddleOCR Document Parsing API and NEVER parse locally (which is consistent with the code that sends files/URLs to the API). However, the SKILL.md also mandates displaying COMPLETE extracted content to the user and instructs the agent to read saved JSON files from the system temp directory before responding. These instructions broaden the agent's data exposure (showing full document text/tables/formulas without truncation) and require file I/O. The 'MANDATORY RESTRICTIONS' language is unusually prescriptive for an agent and could lead to indiscriminate disclosure of sensitive content.
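For readers who prefer a truncation or approval step over full disclosure, a minimal sketch of a preview gate follows. The function preview and its limit parameter are hypothetical; the skill itself, per its SKILL.md, displays the complete content, so this shows an alternative a cautious operator might wrap around the output rather than anything the skill does.

```python
from typing import Tuple


def preview(text: str, limit: int = 500) -> Tuple[str, bool]:
    """Return (shown_text, truncated), showing at most `limit` characters.

    Hypothetical helper: the caller can ask for approval before
    revealing the rest when `truncated` is True.
    """
    if len(text) <= limit:
        return text, False
    return text[:limit], True
```

When the second value is True, the caller can prompt for confirmation before revealing the remaining content.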
Install Mechanism
There is no automated install spec (lower risk), but SKILL.md tells users to pip install dependencies from scripts/requirements*.txt. That is expected for a Python CLI skill. The requirements are minimal (httpx, optional Pillow/pypdfium2) and come from PyPI; no external/untrusted download URLs are used.
Credentials
Registry metadata declares only PADDLEOCR_DOC_PARSING_API_URL as a required env var, but the code actually reads additional environment variables (PADDLEOCR_ACCESS_TOKEN, PADDLEOCR_BASIC_AUTH_USER, PADDLEOCR_BASIC_AUTH_PASSWORD, PADDLEOCR_DOC_PARSING_TIMEOUT). Those optional credentials are plausible for authenticating to a proxied Triton server, but the omission from the declared requires.env is an inconsistency that should be clarified before trusting the skill with secrets.
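The mismatch described above can be checked mechanically. The sketch below scans script source text for names starting with PADDLEOCR_ and reports those missing from the declared set. The function names are hypothetical, and the regex is an assumption about naming: it would miss variables built dynamically or named differently.

```python
import re

# Declared in the registry metadata, per the assessment above.
DECLARED = {"PADDLEOCR_DOC_PARSING_API_URL"}

# Assumption: every relevant variable name starts with PADDLEOCR_.
ENV_NAME = re.compile(r"\bPADDLEOCR_[A-Z0-9_]+\b")


def referenced_env_names(source: str) -> set:
    """Collect every PADDLEOCR_* name that appears in the source text."""
    return set(ENV_NAME.findall(source))


def undeclared(source: str, declared=frozenset(DECLARED)) -> set:
    """Names referenced in the source but absent from the declared set."""
    return referenced_env_names(source) - set(declared)
```

To use it, read each script (for example vl_caller.py and lib.py; their exact paths inside the bundle are not stated here) with pathlib.Path.read_text() and pass the text to undeclared().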
Persistence & Privilege
The skill writes results to the system temp directory by default and prints the saved absolute path to stderr (and SKILL.md instructs the agent to read the saved JSON before responding). Writing parsed full-document JSON to disk is expected for this tool, but it leaves persistent artifacts containing potentially sensitive data. The skill does not request elevated system privileges and always=false.
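One way to limit leftover artifacts is to keep results in a private directory that is deleted when processing ends. The sketch below uses only the Python standard library; private_workdir is a hypothetical name, and how the skill's script is pointed at this directory is not shown here, so that step is left out.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def private_workdir(prefix: str = "ocr-"):
    """Yield a fresh directory and remove it, with its contents, on exit.

    tempfile.mkdtemp creates the directory so that only the creating
    user can read, write, or search it.
    """
    path = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield path
    finally:
        # Delete results even if processing raised an exception.
        shutil.rmtree(path, ignore_errors=True)
```

Anything written inside the with block, such as a result JSON file, is gone once the block exits, so values still needed afterwards should be read into memory first.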
How to Use
- Make sure OpenClaw is installed (local or Docker).
- Run the install command in chat: /install paddleocr-vl-locally
- After installation, invoke the skill by name or use /paddleocr-vl-locally
- Provide the required inputs per the skill's parameter spec to get structured output.
Version History
v1.0.2
No user-facing changes were detected in this release.
- Internal or metadata updates may have been made without affecting usage or documentation.
v1.0.1
- Skill renamed to **paddleocr-vl-locally**.
- No longer requires `PADDLEOCR_ACCESS_TOKEN`; only `PADDLEOCR_DOC_PARSING_API_URL` is needed.
- Instructions updated for local deployment: configure the API URL to your local Triton inference endpoint.
- Simplified configuration guidance and clarified that access tokens are not required for local use.
v1.0.0
Initial release of PaddleOCR Document Parsing Skill.
- Enables advanced document parsing using the PaddleOCR Document Parsing API.
- Converts complex PDFs and document images into structured Markdown and JSON, preserving original layout (tables, formulas, charts, multi-column, etc.).
- Provides clear usage instructions: only interacts via the official API/script and never performs parsing directly.
- Returns complete, unabridged document content as requested (text, tables, formulas, etc.); does not summarize or truncate unless output is extremely long.
- Handles errors transparently and guides users on secure API and token configuration.
- Supports both URL and local file input, with customizable output modes (file, stdout).
- Emphasizes extraction completeness, structured metadata, and consistent output behavior.
- PaddleOCR-VL service adapted for localized deployment
Frequently Asked Questions
What is paddleocr-vl-locally?
Complex document parsing with PaddleOCR. Intelligently converts complex PDFs and document images into Markdown and JSON files that preserve the original stru... It is an AI Agent Skill for Claude Code / OpenClaw, with 288 downloads so far.
How do I install paddleocr-vl-locally?
Run "/install paddleocr-vl-locally" in the OpenClaw or Claude Code chat. Before first use, set PADDLEOCR_DOC_PARSING_API_URL and pip install the dependencies listed in scripts/requirements*.txt.
Is paddleocr-vl-locally free?
Yes, paddleocr-vl-locally is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does paddleocr-vl-locally support?
paddleocr-vl-locally is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created paddleocr-vl-locally?
It is built and maintained by sfresurgam (@sfresurgam); the current version is v1.0.2.