mineru document extractor
by MinerU-Extract · v0.1.29 · MIT-0
3200 Downloads · 6 Stars · 6 Active Installs · 38 Versions
Install in OpenClaw
/install mineru-document-extractor
Description
MinerU document extraction — convert PDFs, scanned documents, images, Word (DOC/DOCX), PowerPoint (PPT/PPTX), and web pages into clean Markdown, HTML, LaTeX,...
Usage Guidance
This skill is coherent: it simply instructs the agent to use the mineru-open-api CLI, which sends documents to mineru.net for server-side extraction. Before installing, consider:
1) Privacy: any document you submit (including sensitive content) will be uploaded to mineru.net; confirm the service's privacy policy and whether that is acceptable.
2) Token storage: if you use MINERU_TOKEN, it may be stored in ~/.mineru/config.yaml or as an env var; protect that token.
3) Package provenance: verify that the mineru-open-api npm/GitHub project is the legitimate upstream (check repository, maintainers, and package integrity) before installing in production.
4) Network fetching: the CLI can crawl arbitrary URLs; if you allow autonomous agent runs, restrict network access or avoid running crawl on untrusted inputs to reduce the risk of SSRF or accidental exfiltration to internal endpoints.
5) Sandbox installation: if you have doubts, test the CLI in an isolated environment (container or VM) and inspect which files or configs, if any, are created.
If you want a local-only option (no external upload), look for tools that run extraction locally rather than calling a remote API.
Capability Analysis
Type: OpenClaw Skill
Name: mineru-document-extractor
Version: 0.1.29
The skill is a legitimate wrapper for the MinerU document extraction CLI (mineru-open-api), used for converting PDFs and other documents into Markdown or HTML. It explicitly discloses in its metadata that data is transmitted to the mineru.net API for processing. The skill correctly restricts the agent's capabilities using the 'allowed-tools' field to only the specific 'mineru-open-api' binary, and the instructions in SKILL.md are well-aligned with the tool's stated purpose without any evidence of malicious intent or prompt injection.
Capability Assessment
Purpose & Capability
Name/description match the runtime instructions and declared dependencies: the skill is an instruction-only wrapper around the mineru-open-api CLI. The declared required binary (mineru-open-api) and optional MINERU_TOKEN env/config are appropriate for a cloud-based document extraction tool.
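The two declared dependencies can be checked before the skill is invoked. The following is a minimal sketch, not part of the skill itself: the function name `check_prerequisites` and its injectable `which` and `env` parameters are hypothetical, introduced only to keep the check testable. It looks for the `mineru-open-api` binary on PATH and for the optional `MINERU_TOKEN` environment variable; a token stored only in ~/.mineru/config.yaml would not be detected by this check.

```python
import os
import shutil


def check_prerequisites(which=shutil.which, env=None):
    """Report whether the required CLI and the optional token are present.

    `which` and `env` are injectable so the check can run without touching
    the real system (hypothetical helper, not part of the skill).
    """
    env = os.environ if env is None else env
    return {
        # The skill declares mineru-open-api as its required binary.
        "cli_found": which("mineru-open-api") is not None,
        # MINERU_TOKEN is optional; it may also live in the config file.
        "token_set": bool(env.get("MINERU_TOKEN")),
    }


if __name__ == "__main__":
    print(check_prerequisites())
```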
Instruction Scope
SKILL.md only instructs the agent to run the mineru-open-api CLI (flash-extract, extract, crawl, auth). It references the token, optional config (~/.mineru/config.yaml), and remote API host (mineru.net). This stays within the stated purpose. Note: the crawl command fetches arbitrary HTTP/HTTPS URLs (so a running agent could request internal or external endpoints if invoked), and all document data is sent to mineru.net for server-side processing per the metadata.
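One way to reduce the crawl risk noted above is to screen URLs before the agent passes them to the CLI. The sketch below is an illustration under assumptions, not a feature of the skill or of mineru-open-api: the function name `is_public_http_url` is hypothetical, and the policy (allow only http/https, reject any host that resolves to a non-global address) is one reasonable choice among several.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_http_url(url):
    """Return True only for http(s) URLs whose host resolves to public IPs.

    Hypothetical pre-flight check; rejects loopback, private, link-local
    and other non-global addresses.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parts.hostname, None)
    except (socket.gaierror, UnicodeError):
        return False
    for info in infos:
        # Drop any IPv6 zone suffix such as "%eth0" before parsing.
        address = ipaddress.ip_address(info[4][0].split("%")[0])
        if not address.is_global:
            return False
    return bool(infos)
```

A check like this narrows the exposure but does not remove it: redirects and DNS answers that change between the check and the actual fetch are not covered, so network-level restrictions remain the stronger control.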
Install Mechanism
Install options are standard package installs (npm package mineru-open-api and a Go install from github.com/opendatalab). No downloads from random shorteners or personal IP addresses are specified. As with any third-party npm/go package, verify the package source/repo and integrity before installing into sensitive environments.
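Integrity verification can be as simple as comparing a downloaded artifact against a checksum obtained from a source you trust. The sketch below assumes such a SHA-256 value is available to you; this listing does not state that the upstream project publishes one, and the function names are hypothetical.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare the file's digest with an expected value from a trusted source."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```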
Credentials
The skill does not require unrelated credentials. MINERU_TOKEN is optional and justified for higher-capability 'extract' and 'crawl' modes; config path (~/.mineru/config.yaml) is consistent with the CLI's auth behavior. No unrelated secrets or system credentials are requested.
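Because the token may end up in ~/.mineru/config.yaml, it is worth confirming that the file is readable only by its owner. The following sketch assumes a POSIX system; `is_owner_only` is a hypothetical helper, not something the CLI provides.

```python
import os
import stat


def is_owner_only(path):
    """True when the file grants no permissions to group or others (POSIX)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


if __name__ == "__main__":
    config_path = os.path.expanduser("~/.mineru/config.yaml")
    if os.path.exists(config_path):
        print(config_path, "owner-only:", is_owner_only(config_path))
```

If the check reports False, tightening the mode to 0600 (for example with `os.chmod(config_path, 0o600)`) restricts access to the owning user.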
Persistence & Privilege
always is false and the skill does not request elevated or persistent platform privileges. It does not modify other skills or agent-wide settings. Note: autonomous agent invocation (default) combined with the ability to crawl URLs means an agent could be used to fetch arbitrary endpoints if allowed to run without restrictions.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install mineru-document-extractor
- After installation, invoke the skill by name or use /mineru-document-extractor
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.1.29
No user-visible changes in this release — SKILL.md and included documentation remain unchanged.
v0.1.28
- Updated the skill metadata to remove mention of the mineru-open-api source code reference and clarify package installation steps.
- Adjusted installation info and metadata to align with current packaging and supported platforms.
- Removed references to the mineru-open-api CLI as the official open-source client in the privacy notice.
- No changes to the tool's usage, commands, or core functionality.
v0.1.27
mineru-document-extractor 0.1.27
- Added metadata section to SKILL.md to improve discoverability and clarify installation, privacy, and usage details.
- No changes to command syntax, features, or workflow.
- No code or functional changes detected.
v0.1.26
- MinerU flash-extract now features table recognition, formula recognition, and OCR, expanding its quick extraction capabilities.
- Updated comparison: both extraction modes (flash-extract and extract) now support tables, formulas, and OCR.
- Clarified that advanced features like VLM model selection, multi-format output, and batch processing remain exclusive to the precision extract mode.
- Adjusted usage guidelines and workflow examples to reflect the enhanced abilities of flash-extract for instant document conversion.
v0.1.25
- Documentation was significantly rewritten for clarity and conciseness.
- All usage instructions and workflows are now consistently branded as "MinerU".
- Reorganized sections and simplified tables for easier reading.
- Command/flag lists are streamlined; repetitive/advanced batch details were removed or condensed.
- More direct agent usage rules and examples are provided.
- Technical content and command references remain unchanged.
v0.1.24
- Skill name updated from "mineru" to "MinerU Document Extractor"
- Title and headings clarified for consistency and branding
- No changes to functionality or commands
- No code or logic modifications detected
v0.1.23
Version 0.1.23 of mineru-document-extractor
- No functional or documentation changes detected in this release.
- Version update with no file modifications.
v0.1.22
No functional changes; minor metadata and description update.
- Updated the skill description and metadata for clarity and completeness.
- No code or command changes.
v0.1.21
- Added tags section to metadata for improved discoverability and categorization.
- No functional changes to document extraction features or CLI usage.
- Documentation updated only; no code or behavior changes included.
v0.1.20
No functional changes detected in version 0.1.20.
- No code or documentation changes in this release.
- Skill content and capabilities remain unchanged.
v0.1.19
No changes detected in this version.
- Version 0.1.19 was released with no detected file or documentation updates.
v0.1.18
- Updated documentation headline from "Document Extraction with mineru-open-api" to "Document Extraction with mineru agent api"
- No code or functionality changes detected in this version
- All installation, usage, and feature notes remain unchanged
v0.1.17
- Added _meta.json file for metadata management.
- No changes to core functionality or documentation content.
v0.1.16
- Removed the _meta.json file from the skill package.
- No changes to functionality or user-facing documentation.
v0.1.15
No file changes detected in this release.
- No updates or changes; internal version bump only.
- Functionality and documentation remain the same as the previous version.
v0.1.14
No changes detected in this version.
- Version bumped to 0.1.14 with no file or documentation changes.
- No new features, fixes, or updates included in this release.
v0.1.13
- No file or documentation changes detected in this release.
- Version bump only; functionality remains unchanged.
v0.1.12
- Removed the file CONTRIBUTING.md from the project.
- No functional or user-facing changes.
v0.1.11
- Added CONTRIBUTING.md to provide contribution guidelines.
- Added _meta.json for additional skill metadata configuration.
v0.1.10
Fix: move config.yaml from requires to optional (flash-extract works without it)
Frequently Asked Questions
What is mineru document extractor?
MinerU document extraction — convert PDFs, scanned documents, images, Word (DOC/DOCX), PowerPoint (PPT/PPTX), and web pages into clean Markdown, HTML, LaTeX,... It is an AI Agent Skill for Claude Code / OpenClaw, with 3200 downloads so far.
How do I install mineru document extractor?
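Run "/install mineru-document-extractor" in the OpenClaw or Claude Code chat to install the skill in one step. The skill itself needs no further setup, but it relies on the mineru-open-api CLI being installed, and the extract and crawl modes use an optional MINERU_TOKEN.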
Is mineru document extractor free?
Yes, mineru document extractor is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does mineru document extractor support?
mineru document extractor is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created mineru document extractor?
It is built and maintained by MinerU-Extract (@mineru-extract); the current version is v0.1.29.