← Back to Skills Marketplace
jessy-huang

PaddleOCR-VL

by Jessy-Huang · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
0
Downloads
1
Stars
0
Active Installs
1
Versions
Install in OpenClaw
/install paddle-ocr-vl
Description
GPU-accelerated document parsing and OCR via PaddleOCR-VL. Detects layout, recognizes Chinese/English text, tables, charts, and seals in images. Use when the...
README (SKILL.md)

PaddleOCR-VL Skill

GPU-accelerated document OCR using the PaddleOCR-VL model running inside an ephemeral Docker container. Auto-detects NVIDIA GPU architecture (Blackwell SM120 vs. standard) and selects the correct official image.

When to Use

  • User provides an image path and asks to "read", "OCR", or "extract text" from it
  • User wants to parse a document screenshot, newspaper page, or classical text
  • User asks about the content of an image file

Architecture

This skill includes an MCP server (server.py) that exposes three tools:

Tool Purpose
run_ocr OCR any image — provide an absolute path
check_environment Verify Docker, GPU drivers, and image are ready
run_demo Run OCR on bundled demo images to test the setup

Setup

1. Install the MCP Server

Add to ~/.config/Claude/claude_desktop_config.json (Claude Desktop) or ~/.claude/settings.json (Claude Code):

{
  "mcpServers": {
    "paddle-ocr-vl": {
      "command": "python3",
      "args": ["\x3CINSTALL_DIR>/server.py"]
    }
  }
}

2. Pull the Docker Image (one-time)

# Blackwell GPU (RTX 50xx, B100/B200 — compute capability >= 12.0):
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-nvidia-gpu-sm120

# Other NVIDIA GPU:
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-nvidia-gpu

3. Verify

Call check_environment to verify everything is set up, then run_demo to test on the bundled sample images.

Bundled Demo Images

File Content
demo/newspaper.png People's Daily article about China-Eritrea relations
demo/classical_text.png Records of the Three Kingdoms, vertical classical Chinese

Requirements

  • Docker with nvidia-container-toolkit
  • NVIDIA GPU with drivers installed
  • Python 3.10+ with mcp SDK (pip install mcp)

Security & Privacy

  • Images are processed inside an ephemeral Docker container (--rm flag)
  • The container has no network access beyond --network host (needed for GPU)
  • No data leaves the host machine
  • The container is destroyed immediately after each OCR run

External Endpoints

None. All processing is local.

Official References

Usage Guidance
Review before installing. Only use this with trusted images and trusted file paths, and avoid running it on files with unusual or attacker-controlled filenames. Prefer a revised version that disables host networking, avoids root, mounts only the target file read-only, and safely passes the path into the container.
Capability Assessment
Purpose & Capability
Running PaddleOCR-VL in Docker matches the stated OCR purpose, and the exposed tools are limited to OCR, environment checking, and demos.
Instruction Scope
The security notes claim local/no-external behavior while the runtime uses host networking and an external Docker image, so users are not clearly informed about the actual network exposure.
Install Mechanism
Installation is manual MCP configuration plus one-time Docker image pull; no hidden installer or persistence mechanism was found.
Credentials
The container is launched with --network host, --user root, GPU access, and a read-write bind mount of the input file's directory, which is broader than ordinary OCR needs and materially expands blast radius.
Persistence & Privilege
The container is ephemeral, but it runs as root and mounts the host directory read-write; the inline Python command also embeds the image filename without escaping, creating a code-injection path for crafted filenames.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install paddle-ocr-vl
  3. After installation, invoke the skill by name or use /paddle-ocr-vl
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release of PaddleOCR-VL skill: - GPU-accelerated document OCR using PaddleOCR-VL inside Docker. - Detects layout and recognizes Chinese/English text, tables, charts, and seals. - Supports vertical classical Chinese text, modern newspaper layouts, and mixed-content documents. - Provides tools to run OCR, check environment, and demo on sample images. - No external endpoints; all processing is local and containerized for privacy and security.
Metadata
Slug paddle-ocr-vl
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is PaddleOCR-VL?

GPU-accelerated document parsing and OCR via PaddleOCR-VL. Detects layout, recognizes Chinese/English text, tables, charts, and seals in images. Use when the... It is an AI Agent Skill for Claude Code / OpenClaw, with 0 downloads so far.

How do I install PaddleOCR-VL?

Run "/install paddle-ocr-vl" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is PaddleOCR-VL free?

Yes, PaddleOCR-VL is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does PaddleOCR-VL support?

PaddleOCR-VL is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created PaddleOCR-VL?

It is built and maintained by Jessy-Huang (@jessy-huang); the current version is v1.0.0.

💬 Comments