Doc Extract

by mzlzyCA · GitHub ↗ · v0.4.0 · MIT-0
cross-platform ✓ Security Clean
Downloads: 188 · Stars: 0 · Active Installs: 0 · Versions: 6
Install in OpenClaw
/install doc-extract
Description
Extract text and content from Word documents (.doc, .docx) to Markdown using MinerU. A straightforward tool for reading and extracting Word file content. Fea...
README (SKILL.md)

Doc Extract

Extract text and content from Word (.doc/.docx) files to Markdown using MinerU.

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Quick Start

# Quick extraction from .docx (no token required)
mineru-open-api flash-extract report.docx

# Save to directory
mineru-open-api flash-extract report.docx -o ./out/

# Extract .doc file (requires token)
mineru-open-api extract report.doc -o ./out/

# Extract with language hint
mineru-open-api extract report.docx --language en -o ./out/

Authentication

flash-extract on .docx files needs no token. The extract command requires one, and extract is the only option for .doc files:

mineru-open-api auth             # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Create token at: https://mineru.net/apiManage/token
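
A script that calls extract can fail early when no token is available. The guard below is a minimal sketch: `require_token` is a hypothetical helper, not part of mineru-open-api, and it only checks the MINERU_TOKEN environment variable. It assumes the environment-variable workflow, so a token stored by the interactive `mineru-open-api auth` setup would not be detected.

```shell
#!/bin/sh
# Hypothetical guard (not part of the CLI): succeed only when MINERU_TOKEN is set.
# Assumption: the token is supplied through the environment, not through `auth`.
require_token() {
  if [ -z "${MINERU_TOKEN:-}" ]; then
    echo "MINERU_TOKEN is not set; export it or run 'mineru-open-api auth'" >&2
    return 1
  fi
  return 0
}

# Example use (commented out so the sketch has no side effects):
# require_token && mineru-open-api extract report.doc -o ./out/
```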

Capabilities

  • Supported input: .doc, .docx (local file or URL)
  • .docx: supports flash-extract (no token, max 10 MB / 20 pages) and extract
  • .doc: requires extract with token
  • Language hint with --language (default: ch, use en for English)
  • Page range with --pages (e.g. 1-10)
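
The limits above suggest a simple routing rule between the two subcommands. The sketch below is one way to express it in shell. `choose_subcommand` is a hypothetical helper, not part of mineru-open-api. It assumes a local file path, treats 10 MB as 10 × 1024 × 1024 bytes, and does not check the 20-page limit, which cannot be read cheaply from the shell.

```shell
#!/bin/sh
# Hypothetical helper (not part of the CLI): print the subcommand to use for a Word file.
# .doc always needs `extract`; .docx can use `flash-extract` when the file is small enough.
choose_subcommand() {
  cs_file="$1"
  # Lowercase the extension so REPORT.DOCX and report.docx are treated alike.
  cs_ext=$(printf '%s' "${cs_file##*.}" | tr '[:upper:]' '[:lower:]')
  case "$cs_ext" in
    doc)
      echo "extract"
      ;;
    docx)
      if [ ! -f "$cs_file" ]; then
        echo "not a local file: $cs_file" >&2
        return 1
      fi
      cs_size=$(wc -c < "$cs_file" | tr -d ' ')
      # Assumption: the 10 MB limit means 10 * 1024 * 1024 bytes.
      if [ "$cs_size" -le 10485760 ]; then
        echo "flash-extract"
      else
        echo "extract"
      fi
      ;;
    *)
      echo "unsupported file type: $cs_file" >&2
      return 1
      ;;
  esac
}

# Example use (commented out so the sketch has no side effects):
# cmd=$(choose_subcommand report.docx) && mineru-open-api "$cmd" report.docx -o ./out/
```

A .docx file under the size limit may still exceed 20 pages, so a caller may want to fall back to extract if flash-extract rejects the file.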

Notes

  • .doc requires extract with token; .docx works with flash-extract for quick extraction
  • Output goes to stdout by default; use -o <dir> to save to a file or directory
  • All progress/status messages go to stderr; document content goes to stdout
  • MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
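
Because content and progress messages travel on separate streams, ordinary shell redirection can split them. The sketch below wraps that in a small function. `capture_markdown` is a hypothetical name, and the file names in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical wrapper: run any command, sending stdout (document content)
# to the file named by $1 and stderr (progress/status) to the file named by $2.
capture_markdown() {
  cm_out="$1"
  cm_log="$2"
  shift 2
  "$@" > "$cm_out" 2> "$cm_log"
}

# Example use (commented out so the sketch has no side effects):
# capture_markdown report.md extract.log mineru-open-api flash-extract report.docx
```

The function returns the wrapped command's exit status, so a caller can still detect a failed extraction and inspect the log file.
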
Usage Guidance
This skill appears to do what it claims: it invokes the MinerU CLI to extract Word content. Before installing, verify that the mineru-open-api npm/Go package and the homepage (https://mineru.net) are legitimate and up to date. Provide MINERU_TOKEN only if you need the extract command (required for .doc files), and avoid using a high-privilege or shared token. The CLI reads whatever local files you point it at, so do not process sensitive documents unless you trust the installed package and the MinerU service.
Capability Analysis
Type: OpenClaw Skill · Name: doc-extract · Version: 0.4.0
The skill bundle is a documentation-only wrapper for the 'mineru-open-api' tool, used for extracting text from Word documents (.doc/.docx). It contains no executable code within the bundle itself, and the instructions in SKILL.md are strictly aligned with the stated purpose of document processing via the MinerU service (mineru.net). There are no signs of prompt injection, data exfiltration, or malicious intent.
Capability Assessment
Purpose & Capability
Name/description match the declared requirements: the skill needs the mineru-open-api CLI and an optional MINERU_TOKEN for full extraction of .doc files, which is coherent with a document-extraction utility.
Instruction Scope
SKILL.md instructs the agent to invoke mineru-open-api commands on local files or URLs and to set MINERU_TOKEN for authenticated operations; it does not request unrelated files, credentials, or system access.
Install Mechanism
Install options are standard package installs (npm or go install) for a named package that produces the expected binary; no arbitrary URL downloads or extract steps are present.
Credentials
Only MINERU_TOKEN is required and is justified by the README: flash-extract on .docx is tokenless while full .doc extraction requires authentication. No unrelated secrets or multiple credentials are requested.
Persistence & Privilege
Skill does not request always:true, does not modify other skills, and has normal autonomous-invocation defaults. It does not request elevated or persistent system privileges.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install doc-extract
  3. After installation, invoke the skill by name or use /doc-extract
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.4.0
SEO: expand description for better ClawHub vector search discovery
v0.3.0
Rollback to original version
v0.2.0
SEO optimization: expanded description with rich keywords, trigger phrases, and bilingual content for better ClawHub vector search ranking.
v1.1.0
Update to v1.1.0
v1.0.1
Fix: declare MINERU_TOKEN credential in metadata
v1.0.0
Doc Extract - extract text and content from Word (.doc/.docx) documents to Markdown using MinerU. Us
Metadata
Slug doc-extract
Version 0.4.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 6
Frequently Asked Questions

What is Doc Extract?

Extract text and content from Word documents (.doc, .docx) to Markdown using MinerU. A straightforward tool for reading and extracting Word file content. Fea... It is an AI Agent Skill for Claude Code / OpenClaw, with 188 downloads so far.

How do I install Doc Extract?

Run "/install doc-extract" in the OpenClaw or Claude Code chat. The skill bundle contains no executable code, so the mineru-open-api CLI must be installed separately (see Install above).

Is Doc Extract free?

Yes, Doc Extract is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Doc Extract support?

Doc Extract is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Doc Extract?

It is built and maintained by mzlzyCA (@mzlzyca); the current version is v0.4.0.
