← 返回 Skills 市场
mzlzyca

Doc Parse

作者 mzlzyCA · GitHub ↗ · v0.4.0 · MIT-0
cross-platform ⚠ suspicious
493
总下载
0
收藏
2
当前安装
6
版本数
在 OpenClaw 中安装
/install doc-parse
功能描述
Parse and extract structured content from Word documents (.doc, .docx) into well-organized Markdown using MinerU. Preserves the full document hierarchy: head...
使用说明 (SKILL.md)

Doc Parse

Parse Word (.doc/.docx) documents into structured Markdown using MinerU. Preserves document hierarchy including headings, lists, tables, and paragraphs.

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Quick Start

# Quick parse from .docx (no token required)
mineru-open-api flash-extract report.docx

# Save structured Markdown to directory
mineru-open-api flash-extract report.docx -o ./out/

# Parse .doc file (requires token)
mineru-open-api extract report.doc -o ./out/

# With language hint
mineru-open-api extract report.docx --language en -o ./out/

Authentication

No token needed for flash-extract on .docx. Token required for .doc:

mineru-open-api auth             # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Create token at: https://mineru.net/apiManage/token

Capabilities

  • Supported input: .doc, .docx (local file or URL)
  • .docx: supports flash-extract (no token, max 10 MB / 20 pages) and extract
  • .doc: requires extract with token
  • Output preserves document structure as Markdown hierarchy
  • Language hint with --language (default: ch, use en for English)

Notes

  • .doc requires extract with token; .docx supports flash-extract for quick parsing
  • Output goes to stdout by default; use -o \x3Cdir> to save to a file or directory
  • All progress/status messages go to stderr; document content goes to stdout
  • MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
安全使用建议
This skill appears to do what it says (wrap the MinerU CLI to parse .doc/.docx), but there are two things to check before installing: (1) credential discrepancy — the registry metadata marks MINERU_TOKEN as required, yet SKILL.md says flash-extract for .docx works without a token; verify whether you can use quick parsing without providing a token. (2) data handling/privacy — the CLI likely communicates with MinerU's service (mineru.net) for at least some operations; confirm whether document content is uploaded, how long it's retained, and whether that is acceptable for your documents. Also confirm the npm package and GitHub repo are the official MinerU releases (inspect the mineru-open-api source) and only provide a token with minimal necessary scope. If you need higher assurance, run the CLI in a sandbox or review the open-source repo and network calls before sending sensitive documents.
功能分析
Type: OpenClaw Skill Name: doc-parse Version: 0.4.0 The doc-parse skill is a legitimate utility for converting Word documents (.doc, .docx) into Markdown using the MinerU document intelligence engine. It utilizes the 'mineru-open-api' CLI tool and requires an API token for full extraction features, which is standard for this type of service. The installation sources (npm and GitHub) and the homepage (mineru.net) are consistent with the stated purpose, and no malicious patterns, data exfiltration, or prompt injection attempts were identified in SKILL.md or _meta.json.
能力评估
Purpose & Capability
Name and description match the declared requirements: the skill uses the mineru-open-api CLI to parse .doc/.docx into Markdown. Requesting the mineru-open-api binary and a MINERU_TOKEN is consistent with a cloud-backed CLI. However, SKILL.md explicitly says 'flash-extract' for .docx requires no token while the registry metadata lists MINERU_TOKEN as a required env var/primary credential — this is an internal inconsistency.
Instruction Scope
SKILL.md instructs only running the mineru-open-api CLI on local files or URLs and how to set MINERU_TOKEN. It does not direct reading unrelated system files, scanning shell history, or exfiltrating data to unexpected endpoints. The instructions do not clarify whether processing happens locally or via MinerU's API; that ambiguity is important for privacy but not a scope creep in itself.
Install Mechanism
Install options are standard: an npm package 'mineru-open-api' and a Go 'github.com/opendatalab/...' install path (GitHub). These are expected for a CLI authored by the MinerU project. No arbitrary binary downloads or obscure URLs are present in the install spec.
Credentials
Requesting MINERU_TOKEN as the primary credential is reasonable for an API-backed CLI and required for some operations (.doc extract). But marking MINERU_TOKEN as globally required conflicts with SKILL.md which says quick .docx 'flash-extract' requires no token. Also the skill does not declare additional unrelated credentials, which is good.
Persistence & Privilege
always:false (default) and autonomous invocation allowed (platform default). The skill does not request persistent system-wide privileges or modification of other skills/configuration in the provided instructions.
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install doc-parse
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /doc-parse 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v0.4.0
SEO: expand description for better ClawHub vector search discovery
v0.3.0
Rollback to original version
v0.2.0
SEO optimization: expanded description with rich keywords, trigger phrases, and bilingual content for better ClawHub vector search ranking.
v1.1.0
Update to v1.1.0
v1.0.1
Fix: declare MINERU_TOKEN credential in metadata
v1.0.0
Doc Parse - parse Word (.doc/.docx) documents into structured Markdown using MinerU. Preserves docum
元数据
Slug doc-parse
版本 0.4.0
许可证 MIT-0
累计安装 2
当前安装数 2
历史版本数 6
常见问题

Doc Parse 是什么?

Parse and extract structured content from Word documents (.doc, .docx) into well-organized Markdown using MinerU. Preserves the full document hierarchy: head... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 493 次。

如何安装 Doc Parse?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install doc-parse」即可一键安装,无需额外配置。

Doc Parse 是免费的吗?

是的,Doc Parse 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。

Doc Parse 支持哪些平台?

Doc Parse 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。

谁开发了 Doc Parse?

由 mzlzyCA(@mzlzyca)开发并维护,当前版本 v0.4.0。

💬 留言讨论