PDF to Markdown with OCR

Author: speech2srt · GitHub · v1.0.1 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 85
Favorites: 1
Current installs: 0
Versions: 2
Install in OpenClaw
/install ocr2markdown
Description
Document OCR and parsing — converts PDF/images to Markdown on remote L4 GPU via Modal. Trigger when user says: OCR, PDF to markdown, parse PDF, extract text...
Security recommendations
This skill appears to implement exactly what it claims: it uploads local PDFs to Modal volumes, runs an OCR pipeline (mineru) on a remote GPU image, and downloads Markdown outputs. Before installing: (1) ensure you trust the mineru package and the container image (it will pip-install mineru inside the remote image); (2) understand that it will create/use Modal volumes named speech2srt-data and speech2srt-models in your Modal account — these are shared/account-level resources and may already contain or be used for other data; (3) the pipeline symlinks the runtime ~/.cache into the models volume (it will remove an existing cache directory in the runtime), so check for collisions with any existing cached content you care about; (4) the skill requires a Modal account and may consume paid GPU credits, so verify billing/credits before running. If you need stronger isolation, change the volume names and review the image/pip packages used.
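One way to act on the "change the volume names" advice is to derive per-project names instead of reusing the account-wide defaults. The sketch below is only an illustration: the function `scoped_volume_names`, the `-ocr-` infix, and the project label are hypothetical and do not come from the skill's source.

```python
import re

# Names reported in the review above; both live at the Modal account level.
DEFAULT_VOLUMES = {"data": "speech2srt-data", "models": "speech2srt-models"}


def scoped_volume_names(project: str) -> dict[str, str]:
    """Return per-project volume names, e.g. 'myproj-ocr-data'.

    `project` is a hypothetical label you choose; it is normalised to
    lowercase letters, digits and hyphens.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", project.lower()).strip("-")
    if not slug:
        raise ValueError("project label must contain a letter or digit")
    return {role: f"{slug}-ocr-{role}" for role in DEFAULT_VOLUMES}


names = scoped_volume_names("Invoices 2024")
print(names)
# {'data': 'invoices-2024-ocr-data', 'models': 'invoices-2024-ocr-models'}
```

These names would replace the defaults wherever the skill's code constructs its volumes. In Modal's Python SDK that is commonly done with `modal.Volume.from_name(name, create_if_missing=True)`, but the listing does not show the exact call site, so review the source before editing.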
Functional analysis
Type: OpenClaw Skill
Name: ocr2markdown
Version: 1.0.1
The skill bundle implements a legitimate OCR pipeline using the 'mineru' library on the Modal serverless platform. The code in 'src/ocr2markdown.py' and 'src/images.py' follows standard Modal patterns for GPU-accelerated workloads, including symlinking cache directories to persistent volumes for model storage. No evidence of data exfiltration, malicious execution, or harmful prompt injection was found; the instructions in 'SKILL.md' are consistent with the stated purpose of processing PDF files.
Capability assessment
Purpose & Capability
The name/description (PDF/image → Markdown via Modal L4 GPU) matches the code and SKILL.md. The code invokes a mineru CLI inside a Modal image to perform OCR, uses Modal volumes to move files, and exposes a function to run the pipeline. The included dependencies (mineru, OpenCV in the container image) are appropriate for OCR and layout extraction.
Instruction Scope
Runtime instructions operate on local PDF/image files uploaded to Modal volumes and download processed output back — this matches the skill purpose. A noteworthy behavior: the pipeline symlinks the process's ~/.cache to the mounted models volume (removing any existing cache directory first) so model caches are stored on the volume. Also, the volumes used have generic names (speech2srt-data / speech2srt-models) — these are global within the Modal account and could lead to data sharing or collisions with other projects that reuse the same volume names.
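The cache-symlink behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the skill's actual code: the function name `link_cache_to_volume` and the paths are hypothetical, and the demo runs in a temporary directory rather than against a real ~/.cache or a mounted Modal volume.

```python
import shutil
import tempfile
from pathlib import Path


def link_cache_to_volume(cache_dir: Path, volume_dir: Path) -> None:
    """Replace cache_dir with a symlink pointing at volume_dir.

    Any existing cache directory is deleted first, which is the
    collision risk described above.
    """
    volume_dir.mkdir(parents=True, exist_ok=True)
    # Check for a symlink first: is_dir() follows links.
    if cache_dir.is_symlink() or cache_dir.is_file():
        cache_dir.unlink()          # stale link or stray file
    elif cache_dir.is_dir():
        shutil.rmtree(cache_dir)    # destroys existing cached content
    cache_dir.parent.mkdir(parents=True, exist_ok=True)
    cache_dir.symlink_to(volume_dir, target_is_directory=True)


if __name__ == "__main__":
    # Demonstrate in a throwaway directory rather than the real ~/.cache.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        cache = root / "home" / ".cache"
        volume = root / "models-volume"
        cache.mkdir(parents=True)
        (cache / "old.bin").write_text("previous cache entry")

        link_cache_to_volume(cache, volume)
        (cache / "model.bin").write_text("weights")

        print(cache.is_symlink())               # True
        print((volume / "model.bin").exists())  # True: writes land on the volume
        print((volume / "old.bin").exists())    # False: old cache was removed
```

The last line of the demo shows why the review flags this step: anything in the runtime cache before the link is created is discarded, not migrated to the volume.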
Install Mechanism
There is no direct 'install' script in the registry spec; the pipeline relies on a Modal container image (vllm/vllm-openai) and runs pip to install mineru and opencv inside that image. This is a common pattern for remote container jobs and does not involve downloads from obscure/personal URLs or URL shorteners.
Credentials
The skill does not request environment variables or external credentials in the registry metadata. Inside the Modal image it sets benign env vars (e.g., MINERU_MODEL_SOURCE). The potential issue to be aware of: mineru may download models from Hugging Face; if private models are needed a HF token would be required but is not requested by the skill. Also, the shared volume names (speech2srt-*) mean the skill will read/write to account-wide volumes — consider whether those volumes already contain sensitive data or are used by other pipelines.
Persistence & Privilege
The skill's always flag is false, and the skill does not modify other skills' configurations. It defines a Modal App name (speech2srt.com) and creates/uses volumes in the user's Modal account, which is expected for Modal-based workloads and does not itself grant elevated platform privileges.
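How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install ocr2markdown
  3. Once installed, invoke the skill by name or trigger it with /ocr2markdown
  4. Provide the inputs described in the skill's parameter documentation to get structured output.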
Version history
v1.0.1
- Added version field (v1.0.1) to the skill manifest.
- No other changes; functionality and workflow remain the same.
v1.0.0
Initial release of ocr2markdown skill for document OCR and PDF/image to Markdown conversion.
- Converts PDF and image files to Markdown while preserving layout, tables, formulas, and OCR data.
- Utilizes a remote Modal L4 GPU for efficient processing of large documents.
- Supports multi-file workflows: allows directory scanning, user file selection, and batch processing.
- Outputs organized results, including Markdown files and extracted images, ready for local download.
- Includes clear setup and usage instructions for seamless onboarding and operation.
Metadata
Slug: ocr2markdown
Version: 1.0.1
License: MIT-0
Total installs: 0
Current installs: 0
Versions released: 2
FAQ

What is PDF to Markdown with OCR?

Document OCR and parsing: converts PDF/images to Markdown on remote L4 GPU via Modal. Trigger when user says: OCR, PDF to markdown, parse PDF, extract text... It is an AI agent skill plugin for Claude Code / OpenClaw, with 85 total downloads to date.

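How do I install PDF to Markdown with OCR?

Run the command "/install ocr2markdown" in the OpenClaw or Claude Code chat box to install it in one step. The install itself needs no extra configuration, but running the skill requires a Modal account.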

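Is PDF to Markdown with OCR free?

Yes. The skill itself is free and released under the MIT-0 license, so it can be downloaded, installed, and used freely. Running it may consume paid GPU credits in your Modal account.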

Which platforms does PDF to Markdown with OCR support?

PDF to Markdown with OCR is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who develops PDF to Markdown with OCR?

It is developed and maintained by speech2srt (@speech2srt). The current version is v1.0.1.
