← Back to Skills Marketplace

PaddleOCR-VL

Name: PaddleOCR-VL
Author: jessy-huang

by Jessy-Huang · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Downloads

Stars

Active Installs

Versions

Install in OpenClaw

/install paddle-ocr-vl

Description

GPU-accelerated document parsing and OCR via PaddleOCR-VL. Detects layout, recognizes Chinese/English text, tables, charts, and seals in images. Use when the...

README (SKILL.md)

PaddleOCR-VL Skill

GPU-accelerated document OCR using the PaddleOCR-VL model running inside an ephemeral Docker container. Auto-detects NVIDIA GPU architecture (Blackwell SM120 vs. standard) and selects the correct official image.

When to Use

User provides an image path and asks to "read", "OCR", or "extract text" from it
User wants to parse a document screenshot, newspaper page, or classical text
User asks about the content of an image file

Architecture

This skill includes an MCP server (server.py) that exposes three tools:

Tool	Purpose
`run_ocr`	OCR any image — provide an absolute path
`check_environment`	Verify Docker, GPU drivers, and image are ready
`run_demo`	Run OCR on bundled demo images to test the setup

Setup

1. Install the MCP Server

Add to ~/.config/Claude/claude_desktop_config.json (Claude Desktop) or ~/.claude/settings.json (Claude Code):

{
  "mcpServers": {
    "paddle-ocr-vl": {
      "command": "python3",
      "args": ["\x3CINSTALL_DIR>/server.py"]
    }
  }
}

2. Pull the Docker Image (one-time)

# Blackwell GPU (RTX 50xx, B100/B200 — compute capability >= 12.0):
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-nvidia-gpu-sm120

# Other NVIDIA GPU:
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-nvidia-gpu

3. Verify

Call check_environment to verify everything is set up, then run_demo to test on the bundled sample images.

Bundled Demo Images

File	Content
`demo/newspaper.png`	People's Daily article about China-Eritrea relations
`demo/classical_text.png`	Records of the Three Kingdoms, vertical classical Chinese

Requirements

Docker with nvidia-container-toolkit
NVIDIA GPU with drivers installed
Python 3.10+ with mcp SDK (pip install mcp)

Security & Privacy

Images are processed inside an ephemeral Docker container (--rm flag)
The container has no network access beyond --network host (needed for GPU)
No data leaves the host machine
The container is destroyed immediately after each OCR run

External Endpoints

None. All processing is local.

Official References

PaddleOCR-VL docs: https://www.paddleocr.ai/latest/version3.x/pipeline_usage/PaddleOCR-VL.html
Blackwell-specific: https://www.paddleocr.ai/latest/version3.x/pipeline_usage/PaddleOCR-VL-NVIDIA-Blackwell.html

Usage Guidance

Review before installing. Only use this with trusted images and trusted file paths, and avoid running it on files with unusual or attacker-controlled filenames. Prefer a revised version that disables host networking, avoids root, mounts only the target file read-only, and safely passes the path into the container.

Capability Assessment

ℹ Purpose & Capability

Running PaddleOCR-VL in Docker matches the stated OCR purpose, and the exposed tools are limited to OCR, environment checking, and demos.

⚠ Instruction Scope

The security notes claim local/no-external behavior while the runtime uses host networking and an external Docker image, so users are not clearly informed about the actual network exposure.

ℹ Install Mechanism

Installation is manual MCP configuration plus one-time Docker image pull; no hidden installer or persistence mechanism was found.

⚠ Credentials

The container is launched with --network host, --user root, GPU access, and a read-write bind mount of the input file's directory, which is broader than ordinary OCR needs and materially expands blast radius.

⚠ Persistence & Privilege

The container is ephemeral, but it runs as root and mounts the host directory read-write; the inline Python command also embeds the image filename without escaping, creating a code-injection path for crafted filenames.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install paddle-ocr-vl
After installation, invoke the skill by name or use /paddle-ocr-vl
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release of PaddleOCR-VL skill: - GPU-accelerated document OCR using PaddleOCR-VL inside Docker. - Detects layout and recognizes Chinese/English text, tables, charts, and seals. - Supports vertical classical Chinese text, modern newspaper layouts, and mixed-content documents. - Provides tools to run OCR, check environment, and demo on sample images. - No external endpoints; all processing is local and containerized for privacy and security.

Metadata

Slug paddle-ocr-vl

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is PaddleOCR-VL?

GPU-accelerated document parsing and OCR via PaddleOCR-VL. Detects layout, recognizes Chinese/English text, tables, charts, and seals in images. Use when the... It is an AI Agent Skill for Claude Code / OpenClaw, with 0 downloads so far.

How do I install PaddleOCR-VL?

Run "/install paddle-ocr-vl" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is PaddleOCR-VL free?

Yes, PaddleOCR-VL is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does PaddleOCR-VL support?

PaddleOCR-VL is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created PaddleOCR-VL?

It is built and maintained by Jessy-Huang (@jessy-huang); the current version is v1.0.0.

More Skills