openclawzhangchong

pdf-ocr-byzhangchong

by 张翀 · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security Clean
Downloads: 75
Stars: 1
Active Installs: 0
Versions: 1
Install in OpenClaw
/install pdf-ocr-zc
Description
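Batch OCR for scanned PDFs: automatically produces PDFs with a text layer and can export them to Markdown or plain text. A typical use case is a Teacher Agent that needs to turn large numbers of scanned textbook PDFs into searchable text.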
README (SKILL.md)

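PDF OCR Skill

When to use

  • You need to run text recognition (OCR) on a large number of scanned PDFs
  • You want a searchable PDF (with a text layer) directly, or extracted plain text/Markdown
  • You need to automate this step in a Teacher Agent workflow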

Basic usage

# Run OCR once (requires Tesseract and ocrmypdf to be installed)
openclaw exec python skills/pdf-ocr/scripts/ocr_batch.py <input-pdf> <output-pdf>
  • <input-pdf>: path to the original scanned PDF
  • <output-pdf>: output PDF with a text layer (same directory or a specified path)

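Advanced options

  • To process every PDF in a directory in one run, use the --batch-dir parameter:
openclaw exec python skills/pdf-ocr/scripts/ocr_batch.py --batch-dir <pdf-dir>
  • Add --lang chi_sim to select the Simplified Chinese model. Tesseract itself does not detect the language automatically and defaults to English (eng), so specify the language explicitly for Chinese documents.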
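
The invocations above presumably end up as an ocrmypdf command line. As a minimal sketch, assuming the script's --lang flag maps onto ocrmypdf's -l/--language option (the script source is not shown here, and build_ocr_command is a hypothetical helper), the argument list could be assembled like this:

```python
from typing import List, Optional


def build_ocr_command(input_pdf: str, output_pdf: str,
                      lang: Optional[str] = None) -> List[str]:
    """Build an ocrmypdf argument list (hypothetical helper, not the author's code)."""
    cmd = ["ocrmypdf"]
    if lang:
        # ocrmypdf's -l/--language takes Tesseract language codes such as chi_sim
        cmd += ["-l", lang]
    # ocrmypdf expects the input file first, then the output file
    cmd += [input_pdf, output_pdf]
    return cmd


if __name__ == "__main__":
    print(build_ocr_command("scan1.pdf", "scan1_text.pdf", lang="chi_sim"))
```

Passing the resulting list to subprocess.run(cmd, check=True) runs it without a shell, so file names containing spaces or special characters are not reinterpreted.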

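Script overview (scripts/ocr_batch.py)

  • Checks that ocrmypdf is available; if it is not installed, prints the install command
  • Runs OCR through ocrmypdf, which calls the installed Tesseract internally
  • Supports a batch directory mode that walks *.pdf and writes a matching text-layer file for each
  • Logs errors to logs/pdf_ocr_error.log for troubleshooting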
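
The four behaviours listed above can be sketched roughly as follows. This is not the actual ocr_batch.py: the function names, the injectable runner, and the "_ocr" output suffix (taken from the "_ocr.pdf" naming mentioned in the Usage Guidance) are assumptions made for illustration.

```python
import logging
import shutil
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence, Tuple

OCR_SUFFIX = "_ocr"  # assumed output naming: scan.pdf -> scan_ocr.pdf


def ensure_ocrmypdf() -> str:
    """Return the resolved ocrmypdf path, or exit with an install hint."""
    path = shutil.which("ocrmypdf")
    if path is None:
        raise SystemExit("ocrmypdf not found on PATH; try: pip install ocrmypdf")
    return path


def plan_batch(pdf_dir: Path) -> List[Tuple[Path, Path]]:
    """Pair each *.pdf with its output path, skipping files that are outputs."""
    jobs = []
    for src in sorted(pdf_dir.glob("*.pdf")):
        if src.stem.endswith(OCR_SUFFIX):
            continue  # produced by an earlier run
        jobs.append((src, src.with_name(src.stem + OCR_SUFFIX + ".pdf")))
    return jobs


def real_runner(cmd: Sequence[str]) -> None:
    """Run one command without a shell; raises CalledProcessError on failure."""
    subprocess.run(list(cmd), check=True)


def run_batch(pdf_dir: Path, log_file: Path,
              runner: Callable[[Sequence[str]], None] = real_runner) -> int:
    """Process every planned job, log failures, and return the failure count."""
    log_file.parent.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("pdf_ocr")
    handler = logging.FileHandler(log_file, encoding="utf-8")
    logger.addHandler(handler)
    failures = 0
    try:
        for src, dst in plan_batch(pdf_dir):
            try:
                runner(["ocrmypdf", str(src), str(dst)])
            except (subprocess.CalledProcessError, OSError) as exc:
                failures += 1  # keep going so one bad scan does not stop the batch
                logger.error("OCR failed for %s: %s", src, exc)
    finally:
        logger.removeHandler(handler)
        handler.close()
    return failures
```

A real run would call ensure_ocrmypdf() first and then run_batch(Path("/path/to/teacher-pdfs"), Path("logs/pdf_ocr_error.log")). Keeping the runner injectable makes the traversal and logging testable on a machine without ocrmypdf.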

References

  • references/ocr_tips.md: common OCR tuning tips (such as DPI and image preprocessing)
  • references/install_ocr.md: detailed steps for installing Tesseract and ocrmypdf on Windows

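Integration with the Teacher Agent

In a Teacher Agent workflow (such as auto_ingest), add the following call to HEARTBEAT.md or cron to run OCR automatically every day:

openclaw exec python skills/pdf-ocr/scripts/ocr_batch.py --batch-dir /path/to/teacher-pdfs

The PDFs then already have a text layer before the Teacher Agent ingests them, so downstream vectorization and retrieval run smoothly.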


Usage examples

  1. Single-file OCR:
openclaw exec python skills/pdf-ocr/scripts/ocr_batch.py D:\docs\scan1.pdf D:\docs\scan1_text.pdf
  2. Batch directory OCR:
openclaw exec python skills/pdf-ocr/scripts/ocr_batch.py --batch-dir D:\teacher-pdfs

For finer-grained text output (Markdown), run pdf2txt.py on the script's output.
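
The README does not say which pdf2txt.py it means; the best-known tool of that name ships with pdfminer.six. As a hedged sketch under that assumption, the same text extraction can be done from Python with pdfminer.six's extract_text, followed by a simple per-page Markdown layout. Both function names and the heading scheme are hypothetical choices, not part of the skill.

```python
from typing import List


def pages_to_markdown(title: str, pages: List[str]) -> str:
    """Join per-page text into Markdown with one heading per page."""
    parts = [f"# {title}"]
    for number, text in enumerate(pages, start=1):
        parts.append(f"## Page {number}")
        parts.append(text.strip())
    return "\n\n".join(parts) + "\n"


def pdf_to_markdown(pdf_path: str, title: str) -> str:
    """Read the text layer of an OCRed PDF and format it as Markdown."""
    # Imported lazily so the formatter above works without pdfminer.six installed
    from pdfminer.high_level import extract_text  # pip install pdfminer.six

    raw = extract_text(pdf_path)
    # pdfminer.six's text output normally ends each page with a form feed
    pages = [page for page in raw.split("\f") if page.strip()]
    return pages_to_markdown(title, pages)
```

Calling pdf_to_markdown on an OCRed file such as scan1_text.pdf returns one Markdown string that can be written next to the PDF. It only works on files that already carry a text layer.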


Note: this skill runs only on the local machine and makes no external network requests, in line with the security policy.

Usage Guidance
This skill appears to do exactly what it says: run ocrmypdf/Tesseract locally to add a text layer to PDFs and extract text. Before installing or enabling automated runs, consider the following:
  1. Install ocrmypdf and Tesseract from trusted sources and verify their checksums or official release pages.
  2. Run the script on copies of important PDFs first to avoid accidental overwrites; the script writes _ocr.pdf files next to inputs by default.
  3. Ensure the system PATH points to the genuine ocrmypdf/tesseract binaries (PATH hijacking is a general risk when running subprocesses).
  4. If you plan to schedule it (cron/HEARTBEAT), limit the watched directory to only the PDFs that should be processed and ensure the agent account has only the necessary filesystem permissions.
  5. Review logs/log path (logs/pdf_ocr_error.log) and monitor disk usage for large batches.
If you want extra isolation, run OCR jobs in a sandbox/container or a dedicated user account.
Capability Analysis
Type: OpenClaw Skill
Name: pdf-ocr-zc
Version: 1.0.0
The skill bundle provides a legitimate utility for batch OCR processing of PDF files using the 'ocrmypdf' library. The core logic in 'scripts/ocr_batch.py' uses safe subprocess calls with argument lists to avoid shell injection and contains no evidence of data exfiltration, persistence, or malicious network activity. The documentation in 'SKILL.md' and 'references/' is consistent with the stated purpose and points to reputable external tools (Tesseract) for installation.
Capability Assessment
Purpose & Capability
Name/description (batch OCR -> searchable PDFs / text/Markdown) match the included script and docs. The script calls ocrmypdf/Tesseract (expected for OCR) and includes sensible CLI options. No unrelated credentials, binaries, or configuration paths are requested.
Instruction Scope
SKILL.md limits actions to local OCR processing and shows expected commands. The script only invokes local binaries (ocrmypdf/Tesseract), traverses an input directory when asked, and writes outputs and logs locally. Caution: it executes external binaries found on PATH (ocrmypdf/tesseract) — if those binaries were replaced or malicious, the skill would run them. Also, integrating the script into HEARTBEAT/cron grants regular automated access to whatever directories are configured.
Install Mechanism
There is no install spec; the skill is instruction-only with a small helper script. The references recommend installing Tesseract (from a GitHub repo) and ocrmypdf via pip — reasonable and traceable guidance. No downloads from obscure URLs or archive extraction are present.
Credentials
The skill requests no environment variables, credentials, or config paths. This is proportionate for a local OCR utility. Note: it relies on PATH for required binaries, so PATH integrity matters but no secrets are requested/exfiltrated by the code.
Persistence & Privilege
The skill does not force persistent installation (always:false) and is user-invocable. The docs suggest adding the script to scheduled workflows (cron/HEARTBEAT), which is a design choice — scheduling gives regular file access but is not performed by the skill itself. Autonomous model invocation is allowed by default (normal) but not used in the files.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install pdf-ocr-zc
  3. After installation, invoke the skill by name or use /pdf-ocr-zc
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
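- Initial release: batch OCR for scanned PDFs, producing a searchable text layer or exporting to plain text/Markdown.
- The script detects the ocrmypdf environment automatically and supports single-file and batch directory processing.
- Error logs are saved automatically for troubleshooting.
- Suited to Teacher Agent workflows that automate converting textbook PDFs to text.
- Includes detailed installation and parameter tuning reference documents.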
Metadata
Slug pdf-ocr-zc
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is pdf-ocr-byzhangchong?

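Batch OCR for scanned PDFs: automatically produces PDFs with a text layer and can export them to Markdown or plain text. A typical use case is a Teacher Agent that needs to turn large numbers of scanned textbook PDFs into searchable text. It is an AI Agent Skill for Claude Code / OpenClaw, with 75 downloads so far.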

How do I install pdf-ocr-byzhangchong?

Run "/install pdf-ocr-zc" in the OpenClaw or Claude Code chat to install the skill in one step. Tesseract and ocrmypdf are not bundled and must be installed separately before the script can run.

Is pdf-ocr-byzhangchong free?

Yes, pdf-ocr-byzhangchong is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does pdf-ocr-byzhangchong support?

pdf-ocr-byzhangchong is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created pdf-ocr-byzhangchong?

It is built and maintained by 张翀 (@openclawzhangchong); the current version is v1.0.0.
