PDF to Markdown with OCR

Name: PDF to Markdown with OCR
Author: speech2srt

By speech2srt · GitHub · v1.0.1 · MIT-0

cross-platform · ✓ Security check passed

Install in OpenClaw

/install ocr2markdown

Description

Document OCR and parsing: converts PDFs and images to Markdown on a remote L4 GPU via Modal. Trigger when user says: OCR, PDF to markdown, parse PDF, extract text...

Security Recommendations

This skill appears to implement exactly what it claims: it uploads local PDFs to Modal volumes, runs an OCR pipeline (mineru) on a remote GPU image, and downloads Markdown outputs. Before installing: (1) ensure you trust the mineru package and the container image (it will pip-install mineru inside the remote image); (2) understand that it will create/use Modal volumes named speech2srt-data and speech2srt-models in your Modal account — these are shared/account-level resources and may already contain or be used for other data; (3) the pipeline symlinks the runtime ~/.cache into the models volume (it will remove an existing cache directory in the runtime), so check for collisions with any existing cached content you care about; (4) the skill requires a Modal account and may consume paid GPU credits, so verify billing/credits before running. If you need stronger isolation, change the volume names and review the image/pip packages used.
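The advice to change the volume names can be sketched as a small helper that derives project-scoped names, so that two pipelines in the same Modal account do not land on the same volume. This is an illustration only: the function name `scoped_volume_names` and the `-ocr-data` / `-ocr-models` suffixes are hypothetical and do not come from the skill's source.

```python
import re


def scoped_volume_names(project: str) -> dict[str, str]:
    """Derive per-project volume names (hypothetical naming scheme)."""
    # Lowercase the project name and collapse anything that is not a
    # letter or digit into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", project.lower()).strip("-")
    if not slug:
        raise ValueError("project must contain at least one letter or digit")
    # Assumption: one volume for documents, one for model caches,
    # mirroring the speech2srt-data / speech2srt-models split.
    return {"data": f"{slug}-ocr-data", "models": f"{slug}-ocr-models"}
```

The resulting names would then replace speech2srt-data and speech2srt-models wherever the skill's code declares its volumes.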

Functional Analysis

Type: OpenClaw Skill
Name: ocr2markdown
Version: 1.0.1

The skill bundle implements a legitimate OCR pipeline using the 'mineru' library on the Modal serverless platform. The code in 'src/ocr2markdown.py' and 'src/images.py' follows standard Modal patterns for GPU-accelerated workloads, including symlinking cache directories to persistent volumes for model storage. No evidence of data exfiltration, malicious execution, or harmful prompt injection was found; the instructions in 'SKILL.md' are consistent with the stated purpose of processing PDF files.

Capability Assessment

✓ Purpose & Capability

The name/description (PDF/image → Markdown via Modal L4 GPU) matches the code and SKILL.md. The code invokes a mineru CLI inside a Modal image to perform OCR, uses Modal volumes to move files, and exposes a function to run the pipeline. The included dependencies (mineru, OpenCV in the container image) are appropriate for OCR and layout extraction.
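As a rough sketch of what invoking the mineru CLI inside the container could look like, the helper below only builds the argument list. The `-p` (input path) and `-o` (output directory) flags are an assumption based on recent MinerU releases, and the helper name `build_mineru_command` is hypothetical; the skill's actual invocation in src/ocr2markdown.py may differ, so check `mineru --help` in the target image.

```python
from pathlib import PurePosixPath


def build_mineru_command(input_path: PurePosixPath,
                         output_dir: PurePosixPath) -> list[str]:
    """Build an argv list for the mineru CLI (flags are assumed)."""
    # POSIX paths are used because the command runs in a Linux container.
    # Assumption: -p selects the input file, -o the output directory.
    return ["mineru", "-p", str(input_path), "-o", str(output_dir)]
```

Passing the list to `subprocess.run(cmd, check=True)` rather than a shell string avoids quoting problems with file names that contain spaces.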

ℹ Instruction Scope

Runtime instructions operate on local PDF/image files uploaded to Modal volumes and download processed output back — this matches the skill purpose. A noteworthy behavior: the pipeline symlinks the process's ~/.cache to the mounted models volume (removing any existing cache directory first) so model caches are stored on the volume. Also, the volumes used have generic names (speech2srt-data / speech2srt-models) — these are global within the Modal account and could lead to data sharing or collisions with other projects that reuse the same volume names.
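The cache behavior described above can be sketched with the standard library. This is a minimal illustration, not the skill's code: the function name `link_cache_to_volume` and both path arguments are hypothetical. It shows why existing cache content is lost: the directory is removed before the symlink is created.

```python
import shutil
from pathlib import Path


def link_cache_to_volume(cache_dir: Path, volume_cache: Path) -> None:
    """Replace cache_dir with a symlink that points at volume_cache."""
    # Make sure the target directory on the mounted volume exists.
    volume_cache.mkdir(parents=True, exist_ok=True)
    if cache_dir.is_symlink():
        # A link from an earlier run: remove the link, keep its target.
        cache_dir.unlink()
    elif cache_dir.is_dir():
        # Destructive step: any existing cache content is deleted.
        shutil.rmtree(cache_dir)
    cache_dir.parent.mkdir(parents=True, exist_ok=True)
    cache_dir.symlink_to(volume_cache, target_is_directory=True)
```

After the call, anything a library writes under the cache path lands on the volume and persists across runs, which is the point of the pattern. A more cautious variant would move the old directory aside instead of deleting it.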

✓ Install Mechanism

There is no direct 'install' script in the registry spec; the pipeline relies on a Modal container image (vllm/vllm-openai) and runs pip to install mineru and opencv inside that image. This is a common pattern for remote container jobs and does not involve downloads from obscure/personal URLs or URL shorteners.

ℹ Credentials

The skill does not request environment variables or external credentials in the registry metadata. Inside the Modal image it sets benign env vars (e.g., MINERU_MODEL_SOURCE). The potential issue to be aware of: mineru may download models from Hugging Face; if private models are needed a HF token would be required but is not requested by the skill. Also, the shared volume names (speech2srt-*) mean the skill will read/write to account-wide volumes — consider whether those volumes already contain sensitive data or are used by other pipelines.

✓ Persistence & Privilege

always is false and the skill does not modify other skills' configurations. It defines a Modal App name (speech2srt.com) and creates/uses volumes in the user's Modal account, which is expected for Modal-based workloads and does not itself grant elevated platform privileges.

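How to Use

Make sure OpenClaw is installed (locally or via Docker).
Enter the install command in the chat box: /install ocr2markdown
Once installed, invoke the skill by name or trigger it with /ocr2markdown.
Provide the inputs described in the skill's parameter documentation to receive structured output.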

Version History

v1.0.1

- Added version field (v1.0.1) to the skill manifest.
- No other changes; functionality and workflow remain the same.

v1.0.0

Initial release of ocr2markdown skill for document OCR and PDF/image to Markdown conversion.
- Converts PDF and image files to Markdown while preserving layout, tables, formulas, and OCR data.
- Utilizes a remote Modal L4 GPU for efficient processing of large documents.
- Supports multi-file workflows: allows directory scanning, user file selection, and batch processing.
- Outputs organized results, including Markdown files and extracted images, ready for local download.
- Includes clear setup and usage instructions for seamless onboarding and operation.
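The directory-scanning step of a multi-file workflow might look like the sketch below. It is an assumption-laden illustration: the function name `find_documents` and the set of accepted suffixes are hypothetical, since the listing does not say which image formats the skill accepts.

```python
from pathlib import Path

# Assumption: the listing only says "PDF/images"; these suffixes are a guess.
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg"}


def find_documents(root: Path) -> list[Path]:
    """Recursively collect candidate documents under root, sorted by path."""
    return sorted(
        path
        for path in root.rglob("*")
        # Compare suffixes case-insensitively so SCAN.PDF is also found.
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    )
```

The sorted list can be shown to the user for selection before any file is uploaded, which keeps the batch step explicit.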

Metadata

Slug ocr2markdown

Version 1.0.1

License MIT-0

Total installs 0

Current installs 0

Versions 2

FAQ

What is PDF to Markdown with OCR?

Document OCR and parsing: converts PDFs and images to Markdown on a remote L4 GPU via Modal. Trigger when user says: OCR, PDF to markdown, parse PDF, extract text... It is an AI agent skill plugin for Claude Code / OpenClaw, with 85 total downloads to date.

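How do I install PDF to Markdown with OCR?

Run the command "/install ocr2markdown" in the OpenClaw or Claude Code chat box to install it in one step. Installation itself needs no extra configuration, but running the skill requires a Modal account.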

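Is PDF to Markdown with OCR free?

Yes. The skill itself is free under the MIT-0 license and can be downloaded, installed, and used without charge. Processing runs on Modal, however, and may consume paid GPU credits.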

Which platforms does PDF to Markdown with OCR support?

PDF to Markdown with OCR is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops PDF to Markdown with OCR?

It is developed and maintained by speech2srt (@speech2srt). The current version is v1.0.1.