
MinerU PDF Extractor

by A-I-R · GitHub · v1.0.5
cross-platform · ✓ Security Clean
940 Downloads · 2 Stars · 2 Active Installs · 6 Versions
Install in OpenClaw
/install mineru-pdf-extractor
Description
Extract PDF content to Markdown using MinerU API. Supports formulas, tables, OCR. Provides both local file and online URL parsing methods.
Usage Guidance
This skill appears to be what it claims: a set of scripts that call the MinerU API to parse PDFs. Before installing or running it:
  1. Set MINERU_TOKEN (or MINERU_API_KEY). SKILL.md requires it, even though the top-level registry metadata omits it.
  2. Review the included shell scripts, and run them only if you trust the source and the MinerU endpoints they contact (mineru.net, mineru.oss-cn-shanghai.aliyuncs.com, cdn-mineru.openxlab.org.cn).
  3. Install curl and unzip, which the scripts require. The scripts also use jq or python3 if present; install those if you want more robust JSON handling.
  4. Treat your MinerU token as sensitive: do not expose it in public repos or logs, and consider least-privilege options with the provider.
  5. If you will process sensitive PDFs, check the provider's privacy policy before uploading.
Overall, the skill is coherent and low-risk for its stated purpose; the only issues to fix are the metadata inaccuracy and the tooling notes above.
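The tooling requirements above can be checked before running any script. A minimal preflight sketch in Python, assuming only what the listing states (curl and unzip required, jq and python3 optional); `check_tools` and `preflight_message` are hypothetical helpers, not part of the skill's scripts. It looks each tool up on PATH with `shutil.which`.

```python
import shutil
from typing import Callable, List, Optional, Tuple

# Tool lists taken from the listing: curl/unzip are required, jq/python3 optional.
REQUIRED_TOOLS = ("curl", "unzip")
OPTIONAL_TOOLS = ("jq", "python3")


def check_tools(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Tuple[List[str], List[str]]:
    """Hypothetical preflight: return (missing_required, missing_optional).

    `which` defaults to shutil.which (PATH lookup) and can be swapped out in tests.
    """
    missing_required = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    missing_optional = [tool for tool in OPTIONAL_TOOLS if which(tool) is None]
    return missing_required, missing_optional


def preflight_message(missing_required: List[str], missing_optional: List[str]) -> str:
    """Summarize the check in one human-readable line."""
    if missing_required:
        return "missing required tools: " + ", ".join(missing_required)
    if missing_optional:
        return "ok (optional tools not found: " + ", ".join(missing_optional) + ")"
    return "ok"
```

Usage: `print(preflight_message(*check_tools()))`. Injecting `which` keeps the check testable without depending on what the host actually has installed.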
Capability Analysis
Type: OpenClaw Skill · Name: mineru-pdf-extractor · Version: 1.0.5
The skill bundle is classified as benign. It demonstrates strong security practices, including explicit input sanitization (`validate_filename`, `validate_dirname`, `escape_json`), URL validation (whitelisting `cdn-mineru.openxlab.org.cn` for downloads in `scripts/local_file_step4_download.sh` and `scripts/online_file_step2_poll_result.sh`), and ZIP file integrity checks (`unzip -t`). The documentation (`SKILL.md`, `docs/*.md`) clearly outlines these security measures and their purpose, indicating a deliberate effort to prevent common vulnerabilities like directory traversal and injection attacks. No evidence of data exfiltration, malicious execution, persistence mechanisms, or prompt injection attempts against the AI agent was found.
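The analysis names `validate_filename` and `escape_json` but does not show their rules. The sketch below is a hypothetical Python analogue, assuming a conservative allow-list policy (letters, digits, dot, underscore, hyphen, space; no `..`, no leading `-`); the skill's actual shell functions may apply different rules.

```python
import json
import re

# Assumed allow-list: letters, digits, dot, underscore, hyphen and space only.
# "/" and "\" are not in the set, so directory components are rejected.
_SAFE_NAME = re.compile(r"[A-Za-z0-9._ -]+")


def validate_filename(name: str) -> bool:
    """Hypothetical analogue of the skill's validate_filename check."""
    if not name or len(name) > 255:
        return False
    if ".." in name or name == ".":  # no parent references or bare current-dir name
        return False
    if name.startswith("-"):  # would be parsed as an option by curl/unzip
        return False
    return _SAFE_NAME.fullmatch(name) is not None


def escape_json(value: str) -> str:
    """Escape a string for embedding inside a JSON string literal.

    json.dumps produces a quoted, fully escaped literal; slicing drops the quotes.
    """
    return json.dumps(value)[1:-1]
```

Delegating escaping to `json.dumps` avoids hand-written replacement rules, which is the usual source of injection bugs when JSON bodies are assembled by string concatenation.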
Capability Assessment
Purpose & Capability
The skill's name/description (PDF → Markdown using MinerU) matches the included scripts and docs: they call MinerU API endpoints, upload files to presigned OSS URLs, poll results and download a ZIP with parsed Markdown. One inconsistency: the registry metadata at the top states "Required env vars: none", but the SKILL.md and all scripts clearly require an API token (MINERU_TOKEN or MINERU_API_KEY). This is likely an authoring/metadata omission rather than malicious behavior, but users should be aware the token is required.
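Both flows end by polling the API until the parse job finishes, then downloading the ZIP. A minimal, transport-agnostic sketch of that polling loop, assuming a hypothetical `fetch_state` callable that returns a dict with a `state` field whose terminal values are `"done"` and `"failed"`; the real MinerU response fields and state names may differ, so check the API documentation before relying on them.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    fetch_state: Callable[[], Dict[str, Any]],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_state() until the task finishes, fails, or times out.

    Assumes (hypothetically) a "state" key with terminal values "done"/"failed".
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_state()
        state = result.get("state")
        if state == "done":
            return result
        if state == "failed":
            raise RuntimeError(f"parse task failed: {result!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still in state {state!r} after {timeout} s")
        sleep(interval)  # wait before the next status request
```

In practice `fetch_state` would wrap the status request the scripts make with curl; keeping it injectable lets the loop be tested without network access, and `time.monotonic` keeps the deadline immune to wall-clock changes.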
Instruction Scope
The runtime instructions and scripts operate within the stated scope: reading a local PDF path (in the local flow), validating and sanitizing inputs, calling MinerU API endpoints under MINERU_BASE_URL, uploading to presigned OSS URLs, and downloading results from the official CDN host. The scripts include input sanitization, ZIP validation, and directory traversal checks. They do not attempt to read unrelated system files or send data to unexpected external endpoints. Minor tooling note: the scripts optionally pipe responses to `python3 -m json.tool` for pretty-printing, but SKILL.md does not list python3 as a recommended or required tool.
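The download-side checks described above (restricting result URLs to the official CDN host, testing the ZIP, and rejecting entries that escape the output directory) can be sketched in Python as follows. This is an illustrative analogue, not the skill's shell code: the `https`-only rule and the helper names `is_allowed_download_url` and `safe_extract` are assumptions, and `ZipFile.testzip()` stands in for `unzip -t`.

```python
import io
import os
import zipfile
from typing import List
from urllib.parse import urlparse

# Host named in the listing; the https-only rule is an assumption.
ALLOWED_DOWNLOAD_HOSTS = {"cdn-mineru.openxlab.org.cn"}


def is_allowed_download_url(url: str) -> bool:
    """Accept only https URLs whose parsed hostname is on the allow-list."""
    parsed = urlparse(url)
    # .hostname is lowercased and excludes any "user@" prefix, so
    # "https://cdn-mineru.openxlab.org.cn@evil.example/" is rejected.
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_DOWNLOAD_HOSTS


def safe_extract(zip_bytes: bytes, dest_dir: str) -> List[str]:
    """Verify and extract a result ZIP, refusing path-traversal entries."""
    dest_root = os.path.realpath(dest_dir)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:  # BadZipFile if not a ZIP
        corrupt = archive.testzip()  # CRC check of every member, like `unzip -t`
        if corrupt is not None:
            raise ValueError(f"corrupt archive member: {corrupt}")
        names = archive.namelist()
        for name in names:
            target = os.path.realpath(os.path.join(dest_root, name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"entry escapes output directory: {name}")
        archive.extractall(dest_root)
    return names
```

Checking every entry before extracting anything means a malicious archive leaves nothing on disk, rather than being half-extracted when the bad entry is reached.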
Install Mechanism
There is no install spec; this is an instruction-only skill with included shell scripts. Nothing in the bundle downloads arbitrary code at install time. Risk is low from the install mechanism itself. However, running the provided scripts will execute code included in the repo, so users should review them before executing.
Credentials
The scripts require a single service credential (MINERU_TOKEN or MINERU_API_KEY) and optionally MINERU_BASE_URL. That is proportional for a MinerU API integration. The only notable mismatch is that the registry metadata claims no required env vars, while SKILL.md and the scripts require the token; this should be corrected. No unrelated secrets or broad cloud credentials (AWS, GCP, etc.) are requested.
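A minimal sketch of the credential lookup described above, assuming `MINERU_TOKEN` takes precedence over `MINERU_API_KEY` when both are set (the listing does not say which wins) and that `MINERU_BASE_URL` is simply optional with no built-in default; `resolve_credentials` and `redact` are hypothetical helpers. `redact` masks the token so it can appear in logs without exposing the secret, in line with the usage guidance.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_credentials(
    env: Mapping[str, str] = os.environ,
    default_base_url: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Return (token, base_url) from the environment.

    Assumption: MINERU_TOKEN wins over MINERU_API_KEY if both are set.
    """
    token = env.get("MINERU_TOKEN") or env.get("MINERU_API_KEY")
    if not token:
        raise RuntimeError("Set MINERU_TOKEN (or MINERU_API_KEY) before running the scripts")
    base_url = env.get("MINERU_BASE_URL") or default_base_url
    return token, base_url


def redact(token: str, keep: int = 4) -> str:
    """Mask a secret for logging, keeping only the last `keep` characters."""
    if len(token) <= keep:
        return "*" * len(token)
    return "*" * (len(token) - keep) + token[-keep:]
```

Passing `env` explicitly (rather than reading `os.environ` inside the function) makes the precedence rule easy to test and keeps real secrets out of test fixtures.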
Persistence & Privilege
The skill does not request permanent or always-on privileges, does not alter other skills or system-wide configs, and is user-invocable only. Autonomous invocation is allowed by default (normal for the platform), but the skill itself does not request elevated persistence.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install mineru-pdf-extractor
  3. After installation, invoke the skill by name or use /mineru-pdf-extractor
  4. Provide the required inputs per the skill's parameter spec (a local PDF file or an online PDF URL) and receive the parsed Markdown output
Version History
v1.0.5
No file changes were detected in this version.
- Version metadata updated without modifications to any skill files.
- No user-facing features, documentation, or code have changed.
v1.0.4
- Documented the security functions in the files under the docs/ folder.
v1.0.3
No functional changes in this release.
- Documentation metadata in SKILL.md was updated for improved formatting and clarity.
- Added author, version, requirements, and optional fields in YAML format.
- Linked and described both English and Chinese documentation files.
- No changes to code or features.
v1.0.2
- Added Chinese documentation files: SKILL_zh.md and two detailed workflow guides under docs/ for both online and local document parsing.
- Updated requirements to mention optional jq dependency for enhanced JSON parsing and security.
- No changes to main logic or features; update focuses on documentation and usability for Chinese-speaking users.
v1.0.1
- Added homepage and source repository links to metadata for easier access to the MinerU website and GitHub source.
- Clarified that this is a community skill and not an official MinerU product.
- No behavioral changes or new features; documentation improvements only.
v1.0.0
Initial release of mineru-pdf-extractor.
- Extract PDF content to Markdown using MinerU API, with support for formulas, tables, and OCR.
- Provides scripts and documentation for both local file and online URL parsing methods.
- Local parsing has a 4-step process; online parsing is a 2-step process.
- Output includes Markdown files, extracted images, and structured JSON data.
- Requires curl, unzip, and a MinerU API token set as an environment variable.
- Detailed usage guides and batch processing examples included.
Metadata
Slug: mineru-pdf-extractor
Version: 1.0.5
License: not listed
All-time Installs: 2
Active Installs: 2
Total Versions: 6
Frequently Asked Questions

What is MinerU PDF Extractor?

Extract PDF content to Markdown using MinerU API. Supports formulas, tables, OCR. Provides both local file and online URL parsing methods. It is an AI Agent Skill for Claude Code / OpenClaw, with 940 downloads so far.

How do I install MinerU PDF Extractor?

Run "/install mineru-pdf-extractor" in the OpenClaw or Claude Code chat to install it in one step. Before running the scripts, you still need to set MINERU_TOKEN (or MINERU_API_KEY) and have curl and unzip available.

Is MinerU PDF Extractor free?

Yes, MinerU PDF Extractor is completely free (open-source). You can download, install and use it at no cost.

Which platforms does MinerU PDF Extractor support?

MinerU PDF Extractor is cross-platform: it runs anywhere OpenClaw or Claude Code is available, provided curl and unzip are installed.

Who created MinerU PDF Extractor?

It is built and maintained by A-I-R (@a-i-r); the current version is v1.0.5.
