Doc Extract

Author: mzlzyCA · v0.4.0 · MIT-0
Cross-platform · ✓ Security check passed
Total downloads: 188 · Favorites: 0 · Current installs: 0 · Versions: 6
Install in OpenClaw
/install doc-extract
Description
Extract text and content from Word documents (.doc, .docx) to Markdown using MinerU. A straightforward tool for reading and extracting Word file content. Fea...
Usage (SKILL.md)

Doc Extract

Extract text and content from Word (.doc/.docx) files to Markdown using MinerU.

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Quick Start

# Quick extraction from .docx (no token required)
mineru-open-api flash-extract report.docx

# Save to directory
mineru-open-api flash-extract report.docx -o ./out/

# Extract .doc file (requires token)
mineru-open-api extract report.doc -o ./out/

# Extract with language hint
mineru-open-api extract report.docx --language en -o ./out/

Authentication

flash-extract on .docx files needs no token. A token is required for .doc files and for the extract command:

mineru-open-api auth             # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Create token at: https://mineru.net/apiManage/token
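
The token rules above can be sketched as a small shell helper. This is only an illustration: choose_subcommand is a hypothetical function, not part of mineru-open-api, and it assumes the token is supplied through the MINERU_TOKEN environment variable (a token stored by `mineru-open-api auth` would not be detected by this check).

```shell
# Hypothetical helper (not part of mineru-open-api).
# Prints the subcommand to use for a file, following the rules above:
#   .docx -> flash-extract (no token needed)
#   .doc  -> extract (token required)
choose_subcommand() {
  local file="$1"
  # Take the text after the last dot and lowercase it.
  local ext
  ext="$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    docx)
      echo "flash-extract"
      ;;
    doc)
      # Assumption: the token is provided via MINERU_TOKEN.
      if [ -z "${MINERU_TOKEN:-}" ]; then
        echo "error: .doc extraction requires MINERU_TOKEN" >&2
        return 1
      fi
      echo "extract"
      ;;
    *)
      echo "error: unsupported file type: $file" >&2
      return 2
      ;;
  esac
}

# Example usage (requires the CLI to be installed):
#   mineru-open-api "$(choose_subcommand report.docx)" report.docx -o ./out/
```

The helper always prefers flash-extract for .docx; files that exceed the flash-extract limits would still need extract with a token.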

Capabilities

  • Supported input: .doc, .docx (local file or URL)
  • .docx: supports flash-extract (no token, max 10 MB / 20 pages) and extract
  • .doc: requires extract with token
  • Language hint with --language (default: ch, use en for English)
  • Page range with --pages (e.g. 1-10)
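
Before calling flash-extract it may help to check the file against the 10 MB limit listed above. The sketch below is a hypothetical helper, not part of mineru-open-api. It assumes "10 MB" means 10 * 1024 * 1024 bytes, which the source does not specify, and it does not check the 20-page limit.

```shell
# Hypothetical helper (not part of mineru-open-api).
# Succeeds (exit 0) if the file is within the flash-extract size limit,
# fails with 1 if it is too large, and with 2 if the file does not exist.
# Assumption: "10 MB" is interpreted as 10 * 1024 * 1024 bytes.
fits_flash_size_limit() {
  local file="$1"
  local max_bytes=$((10 * 1024 * 1024))
  [ -f "$file" ] || return 2
  local size
  # Strip whitespace because some wc implementations pad their output.
  size="$(wc -c < "$file" | tr -d '[:space:]')"
  [ "$size" -le "$max_bytes" ]
}

# Example usage (requires the CLI to be installed):
#   if fits_flash_size_limit report.docx; then
#     mineru-open-api flash-extract report.docx -o ./out/
#   else
#     mineru-open-api extract report.docx -o ./out/
#   fi
```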

Notes

  • .doc requires extract with token; .docx works with flash-extract for quick extraction
  • Output goes to stdout by default; use -o <dir> to save to a file or directory
  • All progress/status messages go to stderr; document content goes to stdout
  • MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
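
Because document content goes to stdout and progress messages go to stderr, the two streams can be captured separately with ordinary shell redirection. The wrapper below is a minimal sketch; run_split is a hypothetical name, not something the CLI provides.

```shell
# Hypothetical wrapper (not part of mineru-open-api).
# Runs a command, writing its stdout to one file and its stderr to another.
# Usage: run_split <content-file> <log-file> <command> [args...]
run_split() {
  local out="$1" log="$2"
  shift 2
  "$@" > "$out" 2> "$log"
}

# Example usage (requires the CLI to be installed):
#   run_split report.md extract.log mineru-open-api flash-extract report.docx
```

With this split, report.md would hold only the extracted Markdown, while extract.log would hold the status messages.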
Security Recommendations
This skill appears to do what it claims: it invokes the MinerU CLI to extract Word content. Before installing, verify that the mineru-open-api npm/go package and the homepage (https://mineru.net) are legitimate and up to date. Provide MINERU_TOKEN only if you need full .doc extraction; avoid using a high-privilege or shared token. Remember that the CLI will read the local files you point it at; do not process sensitive documents unless you trust the installed package and the MinerU service.
Functional Analysis
Type: OpenClaw Skill
Name: doc-extract
Version: 0.4.0
The skill bundle is a documentation-only wrapper for the 'mineru-open-api' tool, used for extracting text from Word documents (.doc/.docx). It contains no executable code within the bundle itself, and the instructions in SKILL.md are strictly aligned with the stated purpose of document processing via the MinerU service (mineru.net). There are no signs of prompt injection, data exfiltration, or malicious intent.
Capability Assessment
Purpose & Capability
Name/description match the declared requirements: the skill needs the mineru-open-api CLI and an optional MINERU_TOKEN for full extraction of .doc files, which is coherent with a document-extraction utility.
Instruction Scope
SKILL.md instructs the agent to invoke mineru-open-api commands on local files or URLs and to set MINERU_TOKEN for authenticated operations; it does not request unrelated files, credentials, or system access.
Install Mechanism
Install options are standard package installs (npm or go install) for a named package that produces the expected binary; no arbitrary URL downloads or extract steps are present.
Credentials
Only MINERU_TOKEN is required and is justified by the README: flash-extract on .docx is tokenless while full .doc extraction requires authentication. No unrelated secrets or multiple credentials are requested.
Persistence & Privilege
Skill does not request always:true, does not modify other skills, and has normal autonomous-invocation defaults. It does not request elevated or persistent system privileges.
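How to Use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install doc-extract
  3. Once installation completes, invoke the Skill by name or trigger it with /doc-extract.
  4. Provide the inputs described in the Skill's parameter documentation to get structured output.
Version History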
v0.4.0
SEO: expand description for better ClawHub vector search discovery
v0.3.0
Rollback to original version
v0.2.0
SEO optimization: expanded description with rich keywords, trigger phrases, and bilingual content for better ClawHub vector search ranking.
v1.1.0
Update to v1.1.0
v1.0.1
Fix: declare MINERU_TOKEN credential in metadata
v1.0.0
Doc Extract - extract text and content from Word (.doc/.docx) documents to Markdown using MinerU. Us
Metadata
Slug: doc-extract
Version: 0.4.0
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 6
FAQ

What is Doc Extract?

Extract text and content from Word documents (.doc, .docx) to Markdown using MinerU. A straightforward tool for reading and extracting Word file content. Fea... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 188 total downloads to date.

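How do I install Doc Extract?

Run the command "/install doc-extract" in the OpenClaw or Claude Code chat box to install the skill in one step. The skill itself needs no further configuration, but it relies on the mineru-open-api CLI, and .doc extraction requires MINERU_TOKEN.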

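Is Doc Extract free?

Yes. Doc Extract is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Doc Extract support?

Doc Extract is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Doc Extract?

It is developed and maintained by mzlzyCA (@mzlzyca). The current version is v0.4.0.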
