Doc To Text

by mzlzyCA · GitHub ↗ · v0.4.0 · MIT-0
cross-platform ✓ Security Clean
186 Downloads · 0 Stars · 0 Active Installs · 6 Versions
Install in OpenClaw
/install doc-to-text
Description
Extract plain readable text from Word documents (.doc, .docx) using MinerU. Outputs Markdown (the closest plain-text format supported) for easy reading and p...
README (SKILL.md)

Doc To Text

Extract plain readable text from Word (.doc/.docx) documents using MinerU. MinerU outputs Markdown, which is the closest format to plain text it supports.

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Quick Start

# Extract text from .docx to stdout (no token required)
mineru-open-api flash-extract report.docx

# Save to file
mineru-open-api flash-extract report.docx -o ./out/

# Extract .doc (requires token)
mineru-open-api extract report.doc -o ./out/

# JSON output contains plain text fields (requires token)
mineru-open-api extract report.docx -f json -o ./out/
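
Because flash-extract writes Markdown to stdout, a rough plain-text view can be obtained by piping that output through a small filter. The sketch below is only an illustration: md_to_text is a hypothetical helper name, and it strips just a few common Markdown markers (heading hashes, bold, inline code, links) rather than parsing Markdown properly.

```shell
# md_to_text: hypothetical, deliberately rough filter (not part of mineru-open-api).
# Reads Markdown on stdin and writes a plainer version to stdout.
md_to_text() {
  # 1) drop leading heading markers, 2) unwrap **bold**,
  # 3) unwrap `inline code`, 4) keep only the label of [label](url) links
  sed -e 's/^#\{1,6\} *//' \
      -e 's/\*\*\([^*]*\)\*\*/\1/g' \
      -e 's/`\([^`]*\)`/\1/g' \
      -e 's/\[\([^]]*\)\]([^)]*)/\1/g'
}

# Possible usage, assuming the CLI is installed:
# mineru-open-api flash-extract report.docx | md_to_text > report.txt
```

Tables, images, and nested formatting pass through unchanged, so treat the result as a reading aid rather than a faithful conversion.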

Authentication

No token is needed for flash-extract on .docx files. A token is required for .doc files and for every extract command:

mineru-open-api auth             # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Create token at: https://mineru.net/apiManage/token
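
In a script, it can help to fail early when the token is missing instead of letting a token-gated command fail partway through. The following is a minimal sketch, assuming the token is supplied through the MINERU_TOKEN environment variable rather than through the interactive auth setup; require_token is a hypothetical helper name.

```shell
# require_token: hypothetical helper (not part of mineru-open-api).
# Returns non-zero and prints a hint to stderr when MINERU_TOKEN is empty or unset.
require_token() {
  if [ -z "${MINERU_TOKEN:-}" ]; then
    echo "MINERU_TOKEN is not set; run 'mineru-open-api auth' or export it" >&2
    return 1
  fi
}

# Possible usage: only run the token-gated subcommand once the check passes.
# require_token && mineru-open-api extract report.doc -o ./out/
```

If the token was configured with the interactive auth command, this check may report a false negative, since it inspects only the environment.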

Capabilities

  • Supported input: .doc, .docx (local file or URL)
  • .docx: supports flash-extract (no token, Markdown output to stdout)
  • .doc: requires extract with token
  • For truly plain text: use extract -f json and read the text fields from the JSON output
  • Language hint with --language (default: ch, use en for English)
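
The rules above (flash-extract for .docx, extract for .doc) can be sketched as a small dispatcher that picks the subcommand from the file extension. pick_subcommand is a hypothetical helper name used only for illustration; the subcommand names come from the examples in this README.

```shell
# pick_subcommand: hypothetical helper (not part of mineru-open-api).
# Prints the subcommand suited to a Word file, based on its extension.
pick_subcommand() {
  # Lowercase the name so REPORT.DOCX is treated like report.docx.
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$name" in
    *.docx) echo "flash-extract" ;;  # no token needed
    *.doc)  echo "extract" ;;        # token required
    *)      echo "unsupported file type: $1" >&2; return 1 ;;
  esac
}

# Possible usage, assuming the CLI is installed:
# mineru-open-api "$(pick_subcommand report.docx)" report.docx
```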

Notes

  • MinerU does not have a -f text option; Markdown is the closest to plain text
  • .doc requires extract with token; .docx works with flash-extract
  • Output goes to stdout by default; use -o <dir> to save to a file or directory
  • All progress/status messages go to stderr; document content goes to stdout
  • MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
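
Since document content goes to stdout and status messages go to stderr, the two streams can be captured separately with ordinary shell redirection. A minimal sketch follows; save_output is a hypothetical wrapper name, and it works with any command that separates its streams this way.

```shell
# save_output OUT_FILE LOG_FILE CMD [ARGS...]
# Hypothetical wrapper (not part of mineru-open-api): runs CMD, writing its
# stdout (document content) to OUT_FILE and its stderr (status) to LOG_FILE.
save_output() {
  out=$1
  log=$2
  shift 2
  "$@" >"$out" 2>"$log"
}

# Possible usage, assuming the CLI is installed:
# save_output report.md extract.log mineru-open-api flash-extract report.docx
```

Keeping the log in its own file means the Markdown stays free of progress text and the log can still be inspected if extraction fails.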
Usage Guidance
This skill appears to do exactly what it claims: call the mineru-open-api CLI to extract text from .doc/.docx files. Before installing, decide whether you trust the MinerU project and the npm/GitHub sources used to install the CLI. MINERU_TOKEN grants the MinerU service permission to process documents — avoid putting a high-privilege secret there, and create/restrict a token with minimal scope if possible. If you are cautious, inspect the npm package or GitHub repo (github.com/opendatalab/MinerU-Ecosystem) prior to installing, run the CLI in a sandbox or container, and revoke the token if you stop using the skill.
Capability Analysis
Type: OpenClaw Skill
Name: doc-to-text
Version: 0.4.0
The doc-to-text skill bundle provides instructions for an AI agent to use the legitimate 'mineru-open-api' CLI tool (developed by OpenDataLab/Shanghai AI Lab) for document processing. The SKILL.md and _meta.json files contain standard installation steps via npm or Go and usage examples for extracting text from Word documents, with no evidence of malicious intent, data exfiltration, or prompt injection.
Capability Assessment
Purpose & Capability
The name/description (Word -> plain text via MinerU) matches the required binary (mineru-open-api) and the single required environment variable (MINERU_TOKEN). The MINERU_TOKEN is justified for the documented 'extract' operations; no unrelated credentials or binaries are requested.
Instruction Scope
SKILL.md only instructs running the mineru-open-api CLI (flash-extract/extract), setting MINERU_TOKEN or using interactive auth, and points to mineru.net. It does not ask the agent to read unrelated files, other env vars, or transmit data to unexpected endpoints.
Install Mechanism
Install uses standard package registries: npm package 'mineru-open-api' or 'go install' from github.com/opendatalab/... — both are expected for distributing a CLI. This is normal but requires trusting those package sources; no random downloads or archive extraction from untrusted URLs are present.
Credentials
Only MINERU_TOKEN is required and is declared as the primary credential. That aligns with the documented need for a token for 'extract'/.doc processing. No extra or unrelated secrets are requested.
Persistence & Privilege
The skill is not marked always:true, does not request system-wide config changes, and is instruction-only (no bundled code). Installing the CLI is standard behavior and there is no evidence the skill modifies other skills or global agent settings.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install doc-to-text
  3. After installation, invoke the skill by name or use /doc-to-text
  4. Provide the path or URL of a .doc or .docx file and get the extracted text back as Markdown
Version History
v0.4.0
SEO: expand description for better ClawHub vector search discovery
v0.3.0
Rollback to original version
v0.2.0
SEO optimization: expanded description with rich keywords, trigger phrases, and bilingual content for better ClawHub vector search ranking.
v1.1.0
Update to v1.1.0
v1.0.1
Fix: declare MINERU_TOKEN credential in metadata
v1.0.0
Doc to Text - extract plain readable text from Word (.doc/.docx) documents using MinerU. Output is M
Metadata
Slug doc-to-text
Version 0.4.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 6
Frequently Asked Questions

What is Doc To Text?

Extract plain readable text from Word documents (.doc, .docx) using MinerU. Outputs Markdown (the closest plain-text format supported) for easy reading and p... It is an AI Agent Skill for Claude Code / OpenClaw, with 186 downloads so far.

How do I install Doc To Text?

Run "/install doc-to-text" in the OpenClaw or Claude Code chat to install the skill in one step. The skill relies on the mineru-open-api CLI, and a MinerU token is needed for extract and for .doc files.

Is Doc To Text free?

Yes, Doc To Text is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Doc To Text support?

Doc To Text is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Doc To Text?

It is built and maintained by mzlzyCA (@mzlzyca); the current version is v0.4.0.
