PDF to Markdown with OCR

Author: speech2srt

by speech2srt · v1.0.1 · MIT-0

cross-platform · ✓ Security Clean

Install in OpenClaw

/install ocr2markdown

Description

Document OCR and parsing — converts PDF/images to Markdown on remote L4 GPU via Modal. Trigger when user says: OCR, PDF to markdown, parse PDF, extract text...

Usage Guidance

This skill appears to implement exactly what it claims: it uploads local PDFs to Modal volumes, runs an OCR pipeline (mineru) on a remote GPU image, and downloads Markdown outputs. Before installing:

(1) Ensure you trust the mineru package and the container image (it will pip-install mineru inside the remote image).
(2) Understand that it will create/use Modal volumes named speech2srt-data and speech2srt-models in your Modal account. These are shared, account-level resources and may already contain or be used for other data.
(3) The pipeline symlinks the runtime ~/.cache into the models volume (it will remove an existing cache directory in the runtime), so check for collisions with any existing cached content you care about.
(4) The skill requires a Modal account and may consume paid GPU credits, so verify billing/credits before running.

If you need stronger isolation, change the volume names and review the image/pip packages used.

Capability Analysis

Type: OpenClaw Skill
Name: ocr2markdown
Version: 1.0.1

The skill bundle implements a legitimate OCR pipeline using the 'mineru' library on the Modal serverless platform. The code in 'src/ocr2markdown.py' and 'src/images.py' follows standard Modal patterns for GPU-accelerated workloads, including symlinking cache directories to persistent volumes for model storage. No evidence of data exfiltration, malicious execution, or harmful prompt injection was found; the instructions in 'SKILL.md' are consistent with the stated purpose of processing PDF files.

Capability Assessment

✓ Purpose & Capability

The name/description (PDF/image → Markdown via Modal L4 GPU) matches the code and SKILL.md. The code invokes a mineru CLI inside a Modal image to perform OCR, uses Modal volumes to move files, and exposes a function to run the pipeline. The included dependencies (mineru, OpenCV in the container image) are appropriate for OCR and layout extraction.

ℹ Instruction Scope

Runtime instructions operate on local PDF/image files uploaded to Modal volumes and download processed output back — this matches the skill purpose. A noteworthy behavior: the pipeline symlinks the process's ~/.cache to the mounted models volume (removing any existing cache directory first) so model caches are stored on the volume. Also, the volumes used have generic names (speech2srt-data / speech2srt-models) — these are global within the Modal account and could lead to data sharing or collisions with other projects that reuse the same volume names.
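The cache redirection described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the skill's actual code: the function name link_cache_to_volume and both path arguments are hypothetical, and the real pipeline may order or guard these steps differently.

```python
import shutil
from pathlib import Path


def link_cache_to_volume(cache_dir: Path, volume_dir: Path) -> None:
    """Make cache_dir a symlink to volume_dir (hypothetical helper).

    Anything previously stored at cache_dir is discarded, which is the
    destructive step the review above warns about.
    """
    volume_dir.mkdir(parents=True, exist_ok=True)

    if cache_dir.is_symlink() or cache_dir.is_file():
        # Drop a stale link (or stray file); a link's target is left untouched.
        cache_dir.unlink()
    elif cache_dir.is_dir():
        # Destructive: an existing cache directory and its contents are removed.
        shutil.rmtree(cache_dir)

    cache_dir.parent.mkdir(parents=True, exist_ok=True)
    cache_dir.symlink_to(volume_dir, target_is_directory=True)
```

Called as link_cache_to_volume(Path.home() / ".cache", Path("/models")), where /models stands in for wherever the models volume is mounted, every later write under ~/.cache would land on the volume and persist between runs. The rmtree branch is why any cache content already present in the runtime does not survive.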

✓ Install Mechanism

There is no direct 'install' script in the registry spec; the pipeline relies on a Modal container image (vllm/vllm-openai) and runs pip to install mineru and opencv inside that image. This is a common pattern for remote container jobs and does not involve downloads from obscure/personal URLs or URL shorteners.

ℹ Credentials

The skill does not request environment variables or external credentials in the registry metadata. Inside the Modal image it sets benign env vars (e.g., MINERU_MODEL_SOURCE). One potential issue: mineru may download models from Hugging Face; if private models are needed, an HF token would be required, but the skill does not request one. Also, the shared volume names (speech2srt-*) mean the skill will read/write to account-wide volumes, so consider whether those volumes already contain sensitive data or are used by other pipelines.
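If private or gated models were ever needed, a small preflight check like the one below could report whether a Hugging Face token is visible in the environment where the download runs (per the analysis above, the remote image rather than the local machine). This is a sketch only: hf_token_variable is a hypothetical helper, HF_TOKEN and the older HUGGING_FACE_HUB_TOKEN are the variables read by the huggingface_hub library, and whether mineru honors them in this pipeline is an assumption.

```python
import os
from typing import Mapping, Optional

# Variables read by huggingface_hub; the second is the legacy name.
HF_TOKEN_VARIABLES = ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN")


def hf_token_variable(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the name of the first non-empty token variable, or None.

    Only the variable name is returned, never the token value, so the
    result is safe to print or log.
    """
    if env is None:
        env = os.environ
    for name in HF_TOKEN_VARIABLES:
        if env.get(name, "").strip():
            return name
    return None
```

A result of None would mean only public models can be fetched; the skill as published does nothing to supply a token.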

✓ Persistence & Privilege

The 'always' flag is false and the skill does not modify other skills' configurations. It defines a Modal App name (speech2srt.com) and creates/uses volumes in the user's Modal account, which is expected for Modal-based workloads and does not itself grant elevated platform privileges.

How to Use

1. Make sure OpenClaw is installed (local or Docker).
2. Run the install command in chat: /install ocr2markdown
3. After installation, invoke the skill by name or use /ocr2markdown
4. Provide the required inputs per the skill's parameter spec and get structured output.

Version History

v1.0.1

- Added version field (v1.0.1) to the skill manifest.
- No other changes; functionality and workflow remain the same.

v1.0.0

Initial release of ocr2markdown skill for document OCR and PDF/image to Markdown conversion.
- Converts PDF and image files to Markdown while preserving layout, tables, formulas, and OCR data.
- Utilizes a remote Modal L4 GPU for efficient processing of large documents.
- Supports multi-file workflows: allows directory scanning, user file selection, and batch processing.
- Outputs organized results, including Markdown files and extracted images, ready for local download.
- Includes clear setup and usage instructions for seamless onboarding and operation.

Metadata

Slug: ocr2markdown
Version: 1.0.1
License: MIT-0
All-time Installs: 0
Active Installs: 0
Total Versions: 2

Frequently Asked Questions

What is PDF to Markdown with OCR?

Document OCR and parsing — converts PDF/images to Markdown on remote L4 GPU via Modal. Trigger when user says: OCR, PDF to markdown, parse PDF, extract text... It is an AI Agent Skill for Claude Code / OpenClaw, with 85 downloads so far.

How do I install PDF to Markdown with OCR?

Run "/install ocr2markdown" in the OpenClaw or Claude Code chat to install it in one step. Running the skill afterwards still requires a Modal account (see Usage Guidance).

Is PDF to Markdown with OCR free?

The skill itself is free and licensed under MIT-0, so you can download and install it at no cost. Running it requires a Modal account and may consume paid GPU credits.

Which platforms does PDF to Markdown with OCR support?

PDF to Markdown with OCR is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created PDF to Markdown with OCR?

It is built and maintained by speech2srt (@speech2srt); the current version is v1.0.1.
