
PDF to Markdown with OCR

by speech2srt · GitHub ↗ · v1.0.1 · MIT-0
cross-platform · ✓ Security Clean
85 Downloads · 1 Star · 0 Active Installs · 2 Versions
Install in OpenClaw
/install ocr2markdown
Description
Document OCR and parsing: converts PDFs and images to Markdown on a remote L4 GPU via Modal. Trigger when user says: OCR, PDF to markdown, parse PDF, extract text...
Usage Guidance
This skill appears to implement exactly what it claims: it uploads local PDFs to Modal volumes, runs an OCR pipeline (mineru) on a remote GPU image, and downloads the Markdown output. Before installing:
  1. Make sure you trust the mineru package and the container image (mineru is pip-installed inside the remote image).
  2. Understand that the skill creates or uses Modal volumes named speech2srt-data and speech2srt-models in your Modal account. These are account-level resources and may already hold, or be used for, other data.
  3. Note that the pipeline symlinks the runtime ~/.cache into the models volume and removes any existing cache directory in the runtime first, so check for collisions with cached content you care about.
  4. Verify billing and credits before running: the skill requires a Modal account and may consume paid GPU credits.
If you need stronger isolation, change the volume names and review the image and pip packages used.
Capability Analysis
Type: OpenClaw Skill · Name: ocr2markdown · Version: 1.0.1
The skill bundle implements a legitimate OCR pipeline using the 'mineru' library on the Modal serverless platform. The code in 'src/ocr2markdown.py' and 'src/images.py' follows standard Modal patterns for GPU-accelerated workloads, including symlinking cache directories to persistent volumes for model storage. No evidence of data exfiltration, malicious execution, or harmful prompt injection was found; the instructions in 'SKILL.md' are consistent with the stated purpose of processing PDF files.
Capability Assessment
Purpose & Capability
The name/description (PDF/image → Markdown via Modal L4 GPU) matches the code and SKILL.md. The code invokes a mineru CLI inside a Modal image to perform OCR, uses Modal volumes to move files, and exposes a function to run the pipeline. The included dependencies (mineru, OpenCV in the container image) are appropriate for OCR and layout extraction.
Instruction Scope
Runtime instructions operate on local PDF/image files uploaded to Modal volumes and download processed output back — this matches the skill purpose. A noteworthy behavior: the pipeline symlinks the process's ~/.cache to the mounted models volume (removing any existing cache directory first) so model caches are stored on the volume. Also, the volumes used have generic names (speech2srt-data / speech2srt-models) — these are global within the Modal account and could lead to data sharing or collisions with other projects that reuse the same volume names.
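The cache behavior described above can be sketched with the Python standard library. This is an illustration of the general pattern only, not the skill's actual code: the function name link_cache_to_volume and the temporary directories standing in for ~/.cache and the mounted models volume are assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def link_cache_to_volume(cache_dir: Path, volume_dir: Path) -> None:
    """Point cache_dir at volume_dir, discarding whatever cache_dir held before.

    Hypothetical helper; the skill's real implementation may differ.
    """
    volume_dir.mkdir(parents=True, exist_ok=True)
    if cache_dir.is_symlink():
        cache_dir.unlink()        # drop a stale link, leave its target alone
    elif cache_dir.exists():
        shutil.rmtree(cache_dir)  # destructive: existing cache contents are lost
    cache_dir.parent.mkdir(parents=True, exist_ok=True)
    cache_dir.symlink_to(volume_dir, target_is_directory=True)


# Demo in a throwaway directory standing in for the container filesystem.
root = Path(tempfile.mkdtemp())
cache = root / "home" / ".cache"   # stand-in for the runtime ~/.cache
volume = root / "models"           # stand-in for the mounted models volume
cache.mkdir(parents=True)
(cache / "old.bin").write_text("stale")      # pre-existing cached file

link_cache_to_volume(cache, volume)
(cache / "model.bin").write_text("weights")  # written through the link

print("cache is a symlink:", cache.is_symlink())
print("new file landed on volume:", (volume / "model.bin").exists())
print("old cache file survived:", (volume / "old.bin").exists())
```

The demo shows why the warning matters: anything that was in the cache directory before the link is created is deleted, while everything written afterwards persists on the volume.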
Install Mechanism
There is no direct 'install' script in the registry spec; the pipeline relies on a Modal container image (vllm/vllm-openai) and runs pip to install mineru and opencv inside that image. This is a common pattern for remote container jobs and does not involve downloads from obscure/personal URLs or URL shorteners.
Credentials
The skill does not request environment variables or external credentials in the registry metadata. Inside the Modal image it sets benign env vars (e.g., MINERU_MODEL_SOURCE). One potential issue: mineru may download models from Hugging Face, and if private models are needed, an HF token would be required, which the skill does not request. Also, the shared volume names (speech2srt-*) mean the skill will read from and write to account-wide volumes, so consider whether those volumes already contain sensitive data or are used by other pipelines.
Persistence & Privilege
The 'always' flag is false and the skill does not modify other skills' configurations. It defines a Modal App name (speech2srt.com) and creates/uses volumes in the user's Modal account, which is expected for Modal-based workloads and does not itself grant elevated platform privileges.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install ocr2markdown
  3. After installation, invoke the skill by name or use /ocr2markdown
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.1
- Added version field (v1.0.1) to the skill manifest.
- No other changes; functionality and workflow remain the same.
v1.0.0
Initial release of ocr2markdown skill for document OCR and PDF/image to Markdown conversion.
- Converts PDF and image files to Markdown while preserving layout, tables, formulas, and OCR data.
- Utilizes a remote Modal L4 GPU for efficient processing of large documents.
- Supports multi-file workflows: allows directory scanning, user file selection, and batch processing.
- Outputs organized results, including Markdown files and extracted images, ready for local download.
- Includes clear setup and usage instructions for seamless onboarding and operation.
Metadata
Slug ocr2markdown
Version 1.0.1
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 2
Frequently Asked Questions

What is PDF to Markdown with OCR?

Document OCR and parsing: converts PDFs and images to Markdown on a remote L4 GPU via Modal. Trigger when user says: OCR, PDF to markdown, parse PDF, extract text... It is an AI Agent Skill for Claude Code / OpenClaw, with 85 downloads so far.

How do I install PDF to Markdown with OCR?

Run "/install ocr2markdown" in the OpenClaw or Claude Code chat to install it in one step. Running the skill also requires a Modal account, since the OCR pipeline executes on a remote Modal GPU.

Is PDF to Markdown with OCR free?

The skill itself is free and licensed under MIT-0, so you can download and install it at no cost. Running it uses a remote GPU on Modal, which may consume paid GPU credits in your Modal account.

Which platforms does PDF to Markdown with OCR support?

PDF to Markdown with OCR is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created PDF to Markdown with OCR?

It is built and maintained by speech2srt (@speech2srt); the current version is v1.0.1.
