
PDF to Markdown - Extract Text, Tables, Formulas from PDF

Author: tanis90 · GitHub · v1.0.4 · MIT-0

cross-platform ✓ Security check passed

Total downloads: 365

Install in OpenClaw

/install pdftomd

Description

PDF to Markdown converter - extract text, tables and formulas from PDF files to clean Markdown. Use when converting PDF documents, extracting PDF content, pa...

Usage (SKILL.md)

PDF to Markdown - Extract Text, Tables, Formulas from PDF

Convert PDF files to clean Markdown using MinerU Open API. No API key required.

Quick Start

# Convert a local PDF to Markdown
mineru-open-api flash-extract report.pdf

# Convert a PDF from URL (no download needed)
mineru-open-api flash-extract https://cdn-mineru.openxlab.org.cn/demo/example.pdf

# Save to file
mineru-open-api flash-extract report.pdf -o ./output/

# Convert specific pages
mineru-open-api flash-extract report.pdf --pages 1-10

Language Rule

You MUST reply to the user in the SAME language they use. This is non-negotiable.

Capabilities

Extracts text, tables, and formulas from PDF
Supports both local files and URLs directly
Page range selection with --pages
Language hint with --language (default: ch, use en for English)
No API key, no signup, no authentication
Max 10MB / 20 pages per document
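
The Quick Start shows each option on its own. The sketch below assumes they can be combined in one invocation, which the source does not confirm. `flash_cmd` is a hypothetical helper, not part of mineru-open-api: it only prints the command line, so the call can be reviewed before any document is uploaded.

```shell
# Hypothetical helper (not part of mineru-open-api): print the flash-extract
# command for one input without running it.
# Usage: flash_cmd INPUT [PAGES] [LANGUAGE] [OUTPUT_DIR]
# Assumption: --pages, --language and -o may be combined in a single call.
# Note: the printed line does not quote paths that contain spaces.
flash_cmd() (
    input=$1 pages=${2:-} language=${3:-} outdir=${4:-}
    cmd="mineru-open-api flash-extract $input"
    # Append only the options that were actually supplied.
    [ -n "$pages" ] && cmd="$cmd --pages $pages"
    [ -n "$language" ] && cmd="$cmd --language $language"
    [ -n "$outdir" ] && cmd="$cmd -o $outdir"
    printf '%s\n' "$cmd"
)

flash_cmd report.pdf 1-10 en ./output/
# prints: mineru-open-api flash-extract report.pdf --pages 1-10 --language en -o ./output/
```

Pass an empty string to skip an option in the middle, for example `flash_cmd report.pdf "" en`.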

When to Use

User asks to "read", "extract", "convert", or "parse" a PDF
User shares a PDF file or PDF link and asks for its content
User wants to summarize or analyze a PDF document
User needs PDF content in Markdown format

CLI Reference

Run mineru-open-api flash-extract --help for all available options.

Data Flow

flash-extract sends the document to the MinerU API (mineru.net) for processing and returns Markdown. This is a stateless API call — no account, no persistent storage. MinerU is an open-source project by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU

Notes

Output is Markdown only; images/tables/formulas may be replaced with placeholders
For larger files (up to 200MB/600 pages) or precision extraction with full assets, use mineru-open-api extract (requires auth via mineru-open-api auth)
If the CLI cannot be installed via npm/uv/go, download it from https://mineru.net/ecosystem?tab=cli
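
The size limits above suggest a simple preflight check before choosing a subcommand. The following is a minimal sketch, assuming "MB" means 1024 × 1024 bytes (the source does not say which unit is meant). Both function names are hypothetical, and the page limits (20 and 600 pages) are not checked because that would require a PDF parser.

```shell
# Hypothetical helper: map a size in bytes to the subcommand the Notes suggest.
# Assumption: 1 MB = 1024 * 1024 bytes.
pick_subcommand_for_bytes() (
    bytes=$1
    flash_limit=$((10 * 1024 * 1024))     # flash-extract: up to 10MB
    extract_limit=$((200 * 1024 * 1024))  # extract: up to 200MB, requires auth
    if [ "$bytes" -le "$flash_limit" ]; then
        echo flash-extract
    elif [ "$bytes" -le "$extract_limit" ]; then
        echo extract
    else
        echo "file exceeds the 200MB limit" >&2
        exit 1  # leaves only this function's subshell
    fi
)

# Hypothetical wrapper: same decision, starting from a local file path.
pick_subcommand() (
    # wc -c prints the byte count; tr strips padding that some platforms add.
    bytes=$(wc -c < "$1" | tr -d ' ')
    pick_subcommand_for_bytes "$bytes"
)

# Example: pick_subcommand report.pdf
```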

Security Recommendations

This skill appears to do what it claims (call the mineru-open-api CLI to convert PDFs to Markdown), but it uploads the PDFs to an external MinerU API without authentication. Before installing or using it:

1) Do not send sensitive or confidential PDFs unless you trust mineru.net and understand its retention/privacy policy.
2) Verify the mineru-open-api package source (npm/uv) or the GitHub repo referenced in the SKILL.md to ensure you install the official CLI and not a malicious package.
3) If you need offline/local processing for privacy, prefer local extraction tools instead.
4) Test with non-sensitive sample documents first, and inspect the installed binary (or its source) if you require higher assurance.
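
As a first step toward inspecting the installed binary, it helps to confirm which executable the shell would actually run. The sketch below uses the POSIX `command -v` builtin. `locate_cli` is a hypothetical helper name, and knowing the path does not by itself prove the binary is the official one.

```shell
# Hypothetical helper: report where a command resolves on PATH, or fail if absent.
# Defaults to mineru-open-api when no name is given.
locate_cli() (
    name=${1:-mineru-open-api}
    if path=$(command -v "$name"); then
        printf '%s resolves to %s\n' "$name" "$path"
    else
        printf '%s not found on PATH\n' "$name" >&2
        exit 1  # leaves only this function's subshell
    fi
)

# Example: locate_cli            (checks mineru-open-api)
# Example: locate_cli some-tool  (checks any other command name)
```

The reported path can then be compared against the location where npm, uv, or go is expected to install binaries on your system.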

Functional Analysis

Type: OpenClaw Skill
Name: pdftomd
Version: 1.0.4

The skill is a wrapper for the MinerU Open API (mineru.net) used to convert PDF files to Markdown. It transparently discloses that documents are sent to an external API for processing and provides standard installation methods via npm, uv, or go. No malicious patterns, hidden data exfiltration, or harmful prompt injections were found in SKILL.md or the associated metadata.

Capability Assessment

✓ Purpose & Capability

The skill is a PDF→Markdown converter and declares/uses a single CLI binary (mineru-open-api) and CLI commands that match that purpose. The install options (npm/uv/go) and referenced repo align with the MinerU project named in the README.

ℹ Instruction Scope

SKILL.md's runtime instructions are narrow and restricted to invoking mineru-open-api on local files or URLs. However, the instructions explicitly send documents to an external MinerU API (mineru.net). That is coherent with the described capability but has privacy implications: any PDF you convert is uploaded to a remote service.

ℹ Install Mechanism

Installation is via standard package ecosystems (npm, uv, go install), which is reasonable for a CLI. This carries less risk than an arbitrary download because packages come from registries and a GitHub path is provided, but the risk is still moderate: verify the package source, version, and code before installing.

✓ Credentials

The skill requests no environment variables, credentials, or config paths. That is proportionate to the stated functionality. The lack of auth is consistent with the claim that small files require no API key, but means uploads are unauthenticated.

✓ Persistence & Privilege

The skill does not request persistent/always-on privileges, does not modify other skills, and has no special system path requirements. It installs a single CLI binary into the environment, which is expected behavior.

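How to Use

Make sure OpenClaw is installed (local or Docker deployment)
Enter the install command in the chat box: /install pdftomd
Once installed, invoke the Skill by name or trigger it with /pdftomd
Provide the required inputs described in the Skill's parameter documentation to get structured output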

Version History

v1.0.4

- Added "uv" as a new install method for the CLI.
- Updated install instructions to mention downloading from the official website if package managers are unavailable.
- Removed the dedicated "Install" section to streamline documentation.
- No changes to functionality or usage.

v1.0.3

- Added npm as a new installation option for mineru-open-api.
- Updated Homebrew install method to npm and removed platform specificity.
- Minor adjustment to install instructions for broader compatibility.
- No changes to core functionality or usage.

v1.0.2

- Added installation instructions for the Go toolchain (go install) to the metadata.
- Users can now install mineru-open-api via go install in addition to Homebrew.

v1.0.1

- Homebrew is now the primary (and only) install method listed; curl and PowerShell install instructions have been removed.
- Installation information updated: references now point to the GitHub source and clarify open-source license (Apache-2.0).
- Data privacy section replaced with a clearer Data Flow section describing how documents are processed.
- Minor wording improvements for clarity and consistency throughout the documentation.
- No changes to core functionality or usage.

v1.0.0

- Initial release of PDF to Markdown converter.
- Extracts text, tables, and formulas from PDF files to clean Markdown.
- Supports both local PDF files and direct URLs.
- No authentication or API key required; open-source CLI.
- Allows page range selection and language hints.
- Maximum file size of 10MB or 20 pages per document.

Metadata

Slug: pdftomd

Version: 1.0.4

License: MIT-0

Total installs: 0

Current installs: 0

Versions: 5

FAQ