
pdf-ocr

Name: pdf-ocr
Author: yejinlei

by yejinlei · GitHub · v2.2.0 · MIT-0

cross-platform ⚠ suspicious

Downloads: 491

Install in OpenClaw

/install pdf-ocr-skill

Description

A dual-engine PDF OCR skill that extracts text from scanned (photocopied) PDF files and image files.

Usage Guidance

This skill appears to implement the advertised OCR functionality, but review the following before installing:
- The registry metadata declares no required environment variables, yet SKILL.md and .env expect SILICON_FLOW_API_KEY for the cloud engine. Treat the cloud engine as requiring a secret key.
- The code auto-installs Python packages at runtime by running pip in a subprocess. This can change your environment and pull code from PyPI. Prefer installing the dependencies yourself in a virtualenv, or review the required packages and versions first.
- If you enable the cloud engine, the skill uploads full images (base64-encoded) to https://api.siliconflow.cn. Do not use the cloud engine for sensitive documents unless you trust the service and the way the API key is handled. For private data, consider running only the local RapidOCR engine.
- Verify the vendor and source: the homepage is missing and the source is listed as 'unknown'. If you need to trust this skill long-term, obtain it from a known repository or author, inspect the full code (including the truncated parts), and test it in a sandbox.

If you want to proceed safely, follow these precautions:
- Run the skill in an isolated environment (virtualenv or container).
- Install and pin the dependencies from requirements.txt manually.
- Avoid configuring the cloud API key unless necessary.
- Audit network calls and logging to ensure no unexpected endpoints receive your data.

Capability Analysis

Type: OpenClaw Skill
Name: pdf-ocr-skill
Version: 2.2.0

The skill bundle contains a high-risk behavior in `scripts/pdf_ocr_processor.py`. It defines an `install_dependency` function that uses `subprocess.check_call` to run `pip install` automatically for missing libraries at runtime. The currently hardcoded packages (rapidocr_onnxruntime, pymupdf, pillow) are legitimate. However, auto-installing dependencies is a common vector for supply-chain attacks and unauthorized code execution. The rest of the bundle appears consistent with its stated purpose as an OCR utility. That includes the prompt instructions in `SKILL.md` and the integration with the SiliconFlow API (api.siliconflow.cn).
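You can avoid letting the script call pip at runtime. Check for the three packages up front, then install them yourself in a virtualenv. The sketch below is a minimal version of that check. It assumes the import names `rapidocr_onnxruntime`, `fitz` (PyMuPDF) and `PIL` (Pillow). The helpers `missing_dependencies` and `install_hint` are hypothetical and are not part of the skill.

```python
import importlib.util

# Assumed import-name -> PyPI-name mapping for the three packages listed
# in the analysis; adjust it if the skill's requirements.txt differs.
REQUIRED_PACKAGES = {
    "rapidocr_onnxruntime": "rapidocr_onnxruntime",
    "fitz": "pymupdf",
    "PIL": "pillow",
}


def missing_dependencies(required=REQUIRED_PACKAGES):
    """Return the PyPI names of packages whose modules cannot be found.

    importlib.util.find_spec only locates modules; unlike a runtime
    `pip install` via subprocess, it never installs anything.
    """
    return [
        dist_name
        for module_name, dist_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


def install_hint(missing):
    """Build a pip command for the user to review and run manually."""
    return "pip install " + " ".join(missing) if missing else ""
```

Printing `install_hint(missing_dependencies())` reports what is missing without modifying the environment. Pinning exact versions is still left to you.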

Capability Assessment

⚠ Purpose & Capability

The name, description, SKILL.md and the included Python code are coherent. They implement a PDF/image OCR processor with a local engine (RapidOCR) and an optional cloud engine (SiliconFlow). However, the registry metadata declares no required environment variables or credentials. SKILL.md and the code, by contrast, clearly expect an optional SILICON_FLOW_API_KEY for the cloud engine. This metadata omission is an inconsistency that reduces transparency.

ℹ Instruction Scope

SKILL.md and the examples stick to OCR tasks (convert PDF→images, run OCR, save text). They instruct the user to provide an API key when using the cloud engine. They do not instruct reading unrelated system files. One area to note: when the cloud engine is used, the skill sends full image data (base64) to the external SiliconFlow API. This is expected for cloud OCR. It is still sensitive, because images may contain private data. The docs do not clearly call out the privacy and data-exfiltration implications.
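This concern is concrete because base64 is a reversible encoding, not encryption. The request carries every byte of the image, at about 4/3 of the original size. The sketch below illustrates this with the standard library only. `encode_image_bytes`, `encode_image_file` and `decode_payload` are hypothetical helpers. The sketch does not reproduce the skill's actual SiliconFlow request format.

```python
import base64
from pathlib import Path


def encode_image_bytes(raw: bytes) -> str:
    """Base64-encode raw image bytes, as a cloud OCR payload would carry them."""
    return base64.b64encode(raw).decode("ascii")


def encode_image_file(path) -> str:
    """Hypothetical helper: read an entire image file and encode it."""
    return encode_image_bytes(Path(path).read_bytes())


def decode_payload(payload: str) -> bytes:
    """Whoever receives the payload can recover the exact original bytes."""
    return base64.b64decode(payload)
```

For example, a 300-byte input becomes a 400-character payload. `decode_payload` returns the original bytes unchanged, so the images should be treated as fully disclosed to the service.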

⚠ Install Mechanism

There is no install spec in the registry (the skill is instruction-only). However, the runtime code attempts to auto-install missing Python packages by invoking pip via a subprocess. Installing packages during execution can modify the runtime environment and pull arbitrary code from PyPI. This increases risk compared with a purely instruction-only skill that requires manual dependency installation.

⚠ Credentials

In practice the skill needs only one service credential (SILICON_FLOW_API_KEY), and only for the optional cloud engine, which is proportionate. However, the registry declares no required environment variables. SKILL.md and .env.example, by contrast, explicitly document SILICON_FLOW_API_KEY and OCR_ENGINE, so the omission reduces transparency. In addition, sending base64 image data to api.siliconflow.cn is a sensitive operation. Enable it only if you trust the service and how the key is used.
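The two documented variables, `OCR_ENGINE` and `SILICON_FLOW_API_KEY`, can drive engine selection. Below is a minimal sketch. It assumes `OCR_ENGINE` accepts the values `rapidocr` and `siliconflow`; the real accepted values are not shown on this page. `choose_engine` is a hypothetical name.

```python
import os

LOCAL_ENGINE = "rapidocr"      # assumed OCR_ENGINE value for the local engine
CLOUD_ENGINE = "siliconflow"   # assumed OCR_ENGINE value for the cloud engine


def choose_engine(env=None):
    """Pick an OCR engine from OCR_ENGINE, defaulting to the local one.

    The variable names come from the skill's documentation; the accepted
    values are assumptions for this sketch.
    """
    env = os.environ if env is None else env
    engine = env.get("OCR_ENGINE", LOCAL_ENGINE).strip().lower()
    if engine not in (LOCAL_ENGINE, CLOUD_ENGINE):
        raise ValueError(f"unknown OCR_ENGINE value: {engine!r}")
    # Refuse the cloud engine unless a key is explicitly configured.
    if engine == CLOUD_ENGINE and not env.get("SILICON_FLOW_API_KEY"):
        raise RuntimeError("cloud engine selected but SILICON_FLOW_API_KEY is not set")
    return engine
```

This sketch defaults to the local engine and refuses the cloud engine without a key. As a result, images stay on your machine unless you explicitly opt in.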

✓ Persistence & Privilege

Skill flags are at their defaults: not always-on, user-invocable, and autonomous invocation allowed (the platform default). The provided files do not request elevated system privileges. They also do not attempt to modify other skills or global agent settings.

How to Use

1. Make sure OpenClaw is installed (locally or via Docker).
2. Run the install command in chat: /install pdf-ocr-skill
3. After installation, invoke the skill by name or with /pdf-ocr-skill.
4. Provide the required inputs according to the skill's parameter spec to get structured output.

Version History

v2.2.0

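- Added dual OCR engine support: RapidOCR (local) and the SiliconFlow API (cloud).
- Improved automatic engine switching: if RapidOCR fails to initialize, the skill automatically falls back to the SiliconFlow API.
- Added text recognition for more image formats: JPG, PNG, BMP, GIF, TIFF, WEBP.
- Improved the documentation, adding command-line and batch-processing examples.
- Added user interaction prompts so the OCR engine can be specified through the assistant.
- Updated the troubleshooting guide to help diagnose common problems.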
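The automatic switch described in these notes can be sketched as a try-local-then-cloud initializer. In this sketch, `make_local` and `make_cloud` are hypothetical factories standing in for the RapidOCR and SiliconFlow clients. The `allow_cloud_fallback` flag is an addition not described in the notes. Because the cloud engine uploads whole images, the sketch falls back only when the caller opts in. The extension set mirrors the listed image formats, with `.jpeg` and `.tif` added as assumed aliases.

```python
from pathlib import Path

# Formats listed in the v2.2.0 notes; .jpeg and .tif are assumed aliases.
SUPPORTED_IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tif", ".tiff", ".webp"}


def is_supported_image(path):
    """Case-insensitive extension check against the listed formats."""
    return Path(path).suffix.lower() in SUPPORTED_IMAGE_EXTS


def init_engine(make_local, make_cloud, allow_cloud_fallback=False):
    """Try the local engine first; fall back to the cloud engine on failure.

    make_local and make_cloud are hypothetical zero-argument factories.
    Fallback is opt-in here because the cloud engine uploads whole images.
    """
    try:
        return "local", make_local()
    except Exception as exc:  # any initialization error from the local engine
        if not allow_cloud_fallback:
            raise RuntimeError("local OCR engine failed to initialize") from exc
        return "cloud", make_cloud()
```

For example, `init_engine(make_rapidocr, make_siliconflow, allow_cloud_fallback=True)` would reproduce the behavior described in the notes. Leaving the flag at its default keeps a local failure visible instead of silently routing images to the cloud.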

Metadata

Slug pdf-ocr-skill

Version 2.2.0

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 1

Frequently Asked Questions

What is pdf-ocr?

A dual-engine PDF OCR skill that extracts text from scanned (photocopied) PDF files and image files. It is an AI agent skill for Claude Code / OpenClaw, with 491 downloads so far.

How do I install pdf-ocr?

Run "/install pdf-ocr-skill" in the OpenClaw or Claude Code chat to install it in one step. The optional cloud engine additionally requires a SILICON_FLOW_API_KEY.

Is pdf-ocr free?

Yes, pdf-ocr is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does pdf-ocr support?

pdf-ocr is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created pdf-ocr?

It is built and maintained by yejinlei (@yejinlei); the current version is v2.2.0.
