mzlzyca

Doc Parse

by mzlzyCA · GitHub ↗ · v0.4.0 · MIT-0
cross-platform ⚠ suspicious
Downloads: 493 · Stars: 0 · Active Installs: 2 · Versions: 6
Install in OpenClaw
/install doc-parse
Description
Parse and extract structured content from Word documents (.doc, .docx) into well-organized Markdown using MinerU. Preserves the full document hierarchy: head...
README (SKILL.md)

Doc Parse

Parse Word (.doc/.docx) documents into structured Markdown using MinerU. Preserves document hierarchy including headings, lists, tables, and paragraphs.

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Quick Start

# Quick parse from .docx (no token required)
mineru-open-api flash-extract report.docx

# Save structured Markdown to directory
mineru-open-api flash-extract report.docx -o ./out/

# Parse .doc file (requires token)
mineru-open-api extract report.doc -o ./out/

# With language hint
mineru-open-api extract report.docx --language en -o ./out/

Authentication

No token needed for flash-extract on .docx. Token required for .doc:

mineru-open-api auth             # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Create token at: https://mineru.net/apiManage/token
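
The environment-variable route above can be checked before calling extract. The following is a minimal sketch: `require_token` is a hypothetical helper, not part of the mineru-open-api CLI, and it only inspects `MINERU_TOKEN`. A token saved through the interactive `auth` command would not be detected by this check.

```shell
#!/bin/sh
# require_token: hypothetical guard, not part of mineru-open-api.
# Succeeds only if MINERU_TOKEN is set and non-empty.
require_token() {
  if [ -z "${MINERU_TOKEN:-}" ]; then
    echo "MINERU_TOKEN is not set; run 'mineru-open-api auth' or export it" >&2
    return 1
  fi
}

# Example (requires the CLI to be installed):
#   require_token && mineru-open-api extract report.doc -o ./out/
```

Failing early with a message on stderr keeps a missing token from surfacing later as an opaque CLI error.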

Capabilities

  • Supported input: .doc, .docx (local file or URL)
  • .docx: supports flash-extract (no token, max 10 MB / 20 pages) and extract
  • .doc: requires extract with token
  • Output preserves document structure as Markdown hierarchy
  • Language hint with --language (default: ch, use en for English)
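
The extension rules listed above can be sketched as a small dispatcher. `pick_subcommand` is a hypothetical helper name; only the subcommand names `flash-extract` and `extract` come from this README. The sketch looks at the file extension alone and does not check the 10 MB / 20 page limit.

```shell
#!/bin/sh
# pick_subcommand FILE: print the subcommand the capability list implies.
# Hypothetical helper; it inspects only the file extension.
pick_subcommand() {
  case "$1" in
    *.docx|*.DOCX) echo "flash-extract" ;;  # quick path, no token
    *.doc|*.DOC)   echo "extract" ;;        # token required
    *) echo "unsupported input: $1" >&2; return 1 ;;
  esac
}

# Example (requires the CLI to be installed):
#   mineru-open-api "$(pick_subcommand report.docx)" report.docx -o ./out/
```

A .docx file above the flash-extract limit would still need `extract` with a token, so treat the result as a default rather than a guarantee.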

Notes

  • .doc requires extract with token; .docx supports flash-extract for quick parsing
  • Output goes to stdout by default; use -o <dir> to save to a file or directory
  • All progress/status messages go to stderr; document content goes to stdout
  • MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
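
Because content goes to stdout and progress goes to stderr, the two streams can be captured separately with ordinary shell redirection. This is a minimal sketch; `capture_markdown` is a hypothetical wrapper and the file names in the example are placeholders.

```shell
#!/bin/sh
# capture_markdown MD_FILE LOG_FILE CMD...: hypothetical wrapper.
# Runs CMD, writing its stdout (document content) to MD_FILE
# and its stderr (progress/status messages) to LOG_FILE.
capture_markdown() {
  md_file=$1
  log_file=$2
  shift 2
  "$@" >"$md_file" 2>"$log_file"
}

# Example (requires the CLI to be installed):
#   capture_markdown report.md parse.log mineru-open-api flash-extract report.docx
```

The wrapper returns the exit status of the wrapped command, so callers can still detect a failed parse.
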
Usage Guidance
This skill appears to do what it says (wrap the MinerU CLI to parse .doc/.docx), but there are two things to check before installing: (1) credential discrepancy — the registry metadata marks MINERU_TOKEN as required, yet SKILL.md says flash-extract for .docx works without a token; verify whether you can use quick parsing without providing a token. (2) data handling/privacy — the CLI likely communicates with MinerU's service (mineru.net) for at least some operations; confirm whether document content is uploaded, how long it's retained, and whether that is acceptable for your documents. Also confirm the npm package and GitHub repo are the official MinerU releases (inspect the mineru-open-api source) and only provide a token with minimal necessary scope. If you need higher assurance, run the CLI in a sandbox or review the open-source repo and network calls before sending sensitive documents.
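
One way to test the credential discrepancy in point (1) is to run the quick parse with `MINERU_TOKEN` removed from the environment. The sketch below assumes the token is supplied only through the environment variable; `without_token` is a hypothetical helper, and a token stored by the interactive `auth` command might still be picked up by the CLI.

```shell
#!/bin/sh
# without_token CMD...: hypothetical helper.
# Runs CMD in a subshell where MINERU_TOKEN is unset,
# leaving the caller's environment unchanged.
without_token() {
  ( unset MINERU_TOKEN; "$@" )
}

# Example (requires the CLI to be installed):
#   without_token mineru-open-api flash-extract report.docx
```

If the command succeeds, quick parsing of .docx works without the token in that setup, which matches SKILL.md rather than the registry metadata.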
Capability Analysis
Type: OpenClaw Skill
Name: doc-parse
Version: 0.4.0
The doc-parse skill is a legitimate utility for converting Word documents (.doc, .docx) into Markdown using the MinerU document intelligence engine. It utilizes the 'mineru-open-api' CLI tool and requires an API token for full extraction features, which is standard for this type of service. The installation sources (npm and GitHub) and the homepage (mineru.net) are consistent with the stated purpose, and no malicious patterns, data exfiltration, or prompt injection attempts were identified in SKILL.md or _meta.json.
Capability Assessment
Purpose & Capability
Name and description match the declared requirements: the skill uses the mineru-open-api CLI to parse .doc/.docx into Markdown. Requesting the mineru-open-api binary and a MINERU_TOKEN is consistent with a cloud-backed CLI. However, SKILL.md explicitly says 'flash-extract' for .docx requires no token while the registry metadata lists MINERU_TOKEN as a required env var/primary credential — this is an internal inconsistency.
Instruction Scope
SKILL.md instructs only running the mineru-open-api CLI on local files or URLs and how to set MINERU_TOKEN. It does not direct reading unrelated system files, scanning shell history, or exfiltrating data to unexpected endpoints. The instructions do not clarify whether processing happens locally or via MinerU's API; that ambiguity is important for privacy but not a scope creep in itself.
Install Mechanism
Install options are standard: an npm package 'mineru-open-api' and a Go 'github.com/opendatalab/...' install path (GitHub). These are expected for a CLI authored by the MinerU project. No arbitrary binary downloads or obscure URLs are present in the install spec.
Credentials
Requesting MINERU_TOKEN as the primary credential is reasonable for an API-backed CLI and required for some operations (.doc extract). But marking MINERU_TOKEN as globally required conflicts with SKILL.md which says quick .docx 'flash-extract' requires no token. Also the skill does not declare additional unrelated credentials, which is good.
Persistence & Privilege
always:false (default) and autonomous invocation allowed (platform default). The skill does not request persistent system-wide privileges or modification of other skills/configuration in the provided instructions.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install doc-parse
  3. After installation, invoke the skill by name or use /doc-parse
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.4.0
SEO: expand description for better ClawHub vector search discovery
v0.3.0
Rollback to original version
v0.2.0
SEO optimization: expanded description with rich keywords, trigger phrases, and bilingual content for better ClawHub vector search ranking.
v1.1.0
Update to v1.1.0
v1.0.1
Fix: declare MINERU_TOKEN credential in metadata
v1.0.0
Doc Parse - parse Word (.doc/.docx) documents into structured Markdown using MinerU. Preserves docum
Metadata
Slug doc-parse
Version 0.4.0
License MIT-0
All-time Installs 2
Active Installs 2
Total Versions 6
Frequently Asked Questions

What is Doc Parse?

Parse and extract structured content from Word documents (.doc, .docx) into well-organized Markdown using MinerU. Preserves the full document hierarchy: head... It is an AI Agent Skill for Claude Code / OpenClaw, with 493 downloads so far.

How do I install Doc Parse?

Run "/install doc-parse" in the OpenClaw or Claude Code chat to install the skill. The skill calls the mineru-open-api CLI, so that tool also needs to be installed (via npm or Go, as shown in the README).

Is Doc Parse free?

Yes, Doc Parse is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Doc Parse support?

Doc Parse is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Doc Parse?

It is built and maintained by mzlzyCA (@mzlzyca); the current version is v0.4.0.
