jessy-huang

PaddleOCR-VL

Author: Jessy-Huang · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 0
Favorites: 1
Current installs: 0
Versions: 1
Install in OpenClaw
/install paddle-ocr-vl
Description
GPU-accelerated document parsing and OCR via PaddleOCR-VL. Detects layout, recognizes Chinese/English text, tables, charts, and seals in images. Use when the...
Usage (SKILL.md)

PaddleOCR-VL Skill

GPU-accelerated document OCR using the PaddleOCR-VL model running inside an ephemeral Docker container. Auto-detects NVIDIA GPU architecture (Blackwell SM120 vs. standard) and selects the correct official image.

When to Use

  • User provides an image path and asks to "read", "OCR", or "extract text" from it
  • User wants to parse a document screenshot, newspaper page, or classical text
  • User asks about the content of an image file

Architecture

This skill includes an MCP server (server.py) that exposes three tools:

| Tool | Purpose |
| --- | --- |
| run_ocr | OCR any image (provide an absolute path) |
| check_environment | Verify that Docker, GPU drivers, and the image are ready |
| run_demo | Run OCR on the bundled demo images to test the setup |

Setup

1. Install the MCP Server

Add to ~/.config/Claude/claude_desktop_config.json (Claude Desktop) or ~/.claude/settings.json (Claude Code):

{
  "mcpServers": {
    "paddle-ocr-vl": {
      "command": "python3",
      "args": ["<INSTALL_DIR>/server.py"]
    }
  }
}
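
For readers who would rather script this step, here is a minimal Python sketch that merges the entry above into an existing config file without touching other registered servers. The helper names `add_paddle_ocr_server` and `update_config_file` are hypothetical and not part of the skill; the install directory is whatever `<INSTALL_DIR>` is on your machine.

```python
import json
from pathlib import Path


def add_paddle_ocr_server(config: dict, install_dir: str) -> dict:
    """Return a copy of `config` with the paddle-ocr-vl MCP entry added."""
    updated = json.loads(json.dumps(config))  # simple deep copy of JSON data
    servers = updated.setdefault("mcpServers", {})
    servers["paddle-ocr-vl"] = {
        "command": "python3",
        "args": [f"{install_dir.rstrip('/')}/server.py"],
    }
    return updated


def update_config_file(config_path: str, install_dir: str) -> None:
    """Read the JSON config if it exists, add the entry, and write it back."""
    path = Path(config_path).expanduser()
    current = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_paddle_ocr_server(current, install_dir), indent=2))
```

Calling `update_config_file("~/.claude/settings.json", "/path/to/skill")` would then register the server for Claude Code; back up the file first, since the sketch rewrites it in place.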

2. Pull the Docker Image (one-time)

# Blackwell GPU (RTX 50xx, B100/B200 — compute capability >= 12.0):
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-nvidia-gpu-sm120

# Other NVIDIA GPU:
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-nvidia-gpu
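
The choice between the two tags can be sketched in Python as follows. This is only an illustration of the rule stated in the comments above (compute capability >= 12.0 selects the SM120 image); it is not the skill's actual detection code, the function names are hypothetical, and the `compute_cap` query field is assumed to be supported by the installed `nvidia-smi`.

```python
import subprocess

REGISTRY = "ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl"


def select_image(compute_cap: str) -> str:
    """Map a compute capability string such as '8.9' or '12.0' to an image."""
    major, _, minor = compute_cap.strip().partition(".")
    capability = (int(major), int(minor or 0))
    # Threshold taken from the pull instructions: >= 12.0 uses the SM120 image.
    tag = "latest-nvidia-gpu-sm120" if capability >= (12, 0) else "latest-nvidia-gpu"
    return f"{REGISTRY}:{tag}"


def detect_compute_cap() -> str:
    """Query the first GPU's compute capability (assumes a recent driver)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()[0].strip()
```

On a machine with a working driver, `select_image(detect_compute_cap())` yields the image name to pass to `docker pull`.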

3. Verify

Call check_environment to verify everything is set up, then run_demo to test on the bundled sample images.

Bundled Demo Images

| File | Content |
| --- | --- |
| demo/newspaper.png | People's Daily article about China-Eritrea relations |
| demo/classical_text.png | Records of the Three Kingdoms, vertical classical Chinese |

Requirements

  • Docker with nvidia-container-toolkit
  • NVIDIA GPU with drivers installed
  • Python 3.10+ with mcp SDK (pip install mcp)

Security & Privacy

  • Images are processed inside an ephemeral Docker container (--rm flag)
  • The container has no network access beyond --network host (needed for GPU)
  • No data leaves the host machine
  • The container is destroyed immediately after each OCR run

External Endpoints

None. All processing is local.

Security Recommendations
Review before installing. Only use this with trusted images and trusted file paths, and avoid running it on files with unusual or attacker-controlled filenames. Prefer a revised version that disables host networking, avoids root, mounts only the target file read-only, and safely passes the path into the container.
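As a rough illustration of what such a revised invocation might look like, the Python sketch below builds a `docker run` argument list with networking disabled, a non-root user, and a read-only bind mount of the single target file. It is a sketch under assumptions, not the skill's code: `build_docker_argv` and the `container_cmd` entry point are hypothetical, and whether the official image actually runs without root or network access has not been verified here.

```python
import os
from pathlib import Path


def build_docker_argv(image: str, image_path: str, container_cmd: list[str]) -> list[str]:
    """Build a docker run argument list following the recommendations above."""
    src = Path(image_path).resolve()
    if "," in str(src) or '"' in str(src):
        # --mount values are comma-separated, so reject paths that would confuse it.
        raise ValueError("path contains characters unsafe for --mount")
    target = "/input/image"  # fixed in-container name, independent of the host filename
    return [
        "docker", "run", "--rm",
        "--gpus", "all",
        "--network", "none",                        # no host networking
        "--user", f"{os.getuid()}:{os.getgid()}",   # run as the calling user, not root
        "--mount", f"type=bind,source={src},target={target},readonly",
        image,
        *container_cmd,  # hypothetical entry point inside the image
        target,          # path passed as a plain argument, never interpolated into code
    ]
```

Because the result is a list handed to `subprocess.run` without a shell, spaces or quotes in the host filename are never interpreted by a shell or by Python source inside the container.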
Capability Assessment
Purpose & Capability
Running PaddleOCR-VL in Docker matches the stated OCR purpose, and the exposed tools are limited to OCR, environment checking, and demos.
Instruction Scope
The security notes claim local/no-external behavior while the runtime uses host networking and an external Docker image, so users are not clearly informed about the actual network exposure.
Install Mechanism
Installation is manual MCP configuration plus one-time Docker image pull; no hidden installer or persistence mechanism was found.
Credentials
The container is launched with --network host, --user root, GPU access, and a read-write bind mount of the input file's directory, which is broader than ordinary OCR needs and materially expands blast radius.
Persistence & Privilege
The container is ephemeral, but it runs as root and mounts the host directory read-write; the inline Python command also embeds the image filename without escaping, creating a code-injection path for crafted filenames.
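The injection path can be reproduced in a few lines of Python. The skill's real inline command is not shown on this page, so the scripts below are hypothetical stand-ins; the sketch only contrasts pasting a filename into `python -c` source with passing it as a separate argument.

```python
import subprocess
import sys

# Hypothetical inline script: it reads the path from sys.argv instead of
# having the path pasted into its source text.
INLINE_SCRIPT = "import sys; print('would OCR:', sys.argv[1])"


def unsafe_command(path: str) -> list[str]:
    """Anti-pattern: the filename becomes part of the Python source."""
    return [sys.executable, "-c", f"print('would OCR: {path}')"]


def safe_command(path: str) -> list[str]:
    """The filename travels as its own argv element and is never parsed as code."""
    return [sys.executable, "-c", INLINE_SCRIPT, path]


# A crafted filename that closes the string literal and appends a statement.
crafted = "x'); print('INJECTED"
unsafe_out = subprocess.run(unsafe_command(crafted), capture_output=True, text=True).stdout
safe_out = subprocess.run(safe_command(crafted), capture_output=True, text=True).stdout
```

With the unsafe form, the second `print` runs as code; with the safe form, the whole crafted string is printed as data on a single line.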
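How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install paddle-ocr-vl
  3. Once installed, invoke the Skill by name or trigger it with /paddle-ocr-vl
  4. Provide the inputs listed in the Skill's parameter description to get structured output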
Version History
v1.0.0
Initial release of PaddleOCR-VL skill:
  • GPU-accelerated document OCR using PaddleOCR-VL inside Docker.
  • Detects layout and recognizes Chinese/English text, tables, charts, and seals.
  • Supports vertical classical Chinese text, modern newspaper layouts, and mixed-content documents.
  • Provides tools to run OCR, check environment, and demo on sample images.
  • No external endpoints; all processing is local and containerized for privacy and security.
Metadata
Slug: paddle-ocr-vl
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Versions released: 1
FAQ

What is PaddleOCR-VL?

GPU-accelerated document parsing and OCR via PaddleOCR-VL. Detects layout, recognizes Chinese/English text, tables, charts, and seals in images. Use when the... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 0 total downloads so far.

How do I install PaddleOCR-VL?

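Run "/install paddle-ocr-vl" in the OpenClaw or Claude Code chat box to install the Skill. The MCP server configuration and the one-time Docker image pull described under Setup are still required.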

Is PaddleOCR-VL free?

Yes. PaddleOCR-VL is free and released under the MIT-0 license; it can be downloaded, installed, and used freely.

Which platforms does PaddleOCR-VL support?

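PaddleOCR-VL is listed as cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed, provided the items under Requirements (Docker with nvidia-container-toolkit and an NVIDIA GPU with drivers) are available.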

Who develops PaddleOCR-VL?

It is developed and maintained by Jessy-Huang (@jessy-huang); the current version is v1.0.0.
