Doc To Text

Author: mzlzyCA · v0.4.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 186 · Favorites: 0 · Current installs: 0 · Versions: 6
Install in OpenClaw
/install doc-to-text
Description
Extract plain readable text from Word documents (.doc, .docx) using MinerU. Outputs Markdown (the closest plain-text format supported) for easy reading and p...
Usage (SKILL.md)

Doc To Text

Extract plain readable text from Word (.doc/.docx) documents using MinerU. MinerU outputs Markdown, which is the closest format to plain text it supports.

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Quick Start

# Extract text from .docx to stdout (no token required)
mineru-open-api flash-extract report.docx

# Save to file
mineru-open-api flash-extract report.docx -o ./out/

# Extract .doc (requires token)
mineru-open-api extract report.doc -o ./out/

# JSON output contains plain text fields (requires token)
mineru-open-api extract report.docx -f json -o ./out/

Authentication

flash-extract on .docx files needs no token. The extract command, which .doc files require, needs a token:

mineru-open-api auth             # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Create token at: https://mineru.net/apiManage/token

Capabilities

  • Supported input: .doc, .docx (local file or URL)
  • .docx: supports flash-extract (no token, Markdown output to stdout)
  • .doc: requires extract with token
  • For truly plain text: use extract -f json and read the text fields from the JSON output
  • Language hint with --language (default: ch, use en for English)
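
The routing rule in the list above (.docx can go through flash-extract without a token, .doc needs extract with one) can be sketched as a small shell helper. This is only an illustration, not part of the skill or the CLI: the function name pick_subcommand is hypothetical, and it looks at the file extension alone, not the file contents.

```shell
# pick_subcommand is a hypothetical helper, not part of mineru-open-api.
# It prints the subcommand that the list above associates with each extension.
pick_subcommand() {
  case "$1" in
    *.docx|*.DOCX) echo "flash-extract" ;;  # no token needed
    *.doc|*.DOC)   echo "extract" ;;        # token required
    *)             echo "unsupported input: $1" >&2; return 1 ;;
  esac
}
```

For example, `mineru-open-api "$(pick_subcommand report.doc)" report.doc -o ./out/` would run extract, so a token has to be configured beforehand.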

Notes

  • MinerU does not have a -f text option; Markdown is the closest to plain text
  • .doc requires extract with token; .docx works with flash-extract
  • Output goes to stdout by default; use -o <dir> to save to a file or directory
  • All progress/status messages go to stderr; document content goes to stdout
  • MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
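
Because document content and status messages go to separate streams, as the notes above state, the two can be redirected independently. A minimal sketch, assuming a POSIX shell; the function name extract_markdown and the file names in the usage line are hypothetical.

```shell
# extract_markdown is a hypothetical wrapper around the documented
# flash-extract subcommand.
#   $1: input .docx   $2: Markdown output file   $3: log file
extract_markdown() {
  # stdout (document content) goes to $2; stderr (progress/status) goes to $3
  mineru-open-api flash-extract "$1" > "$2" 2> "$3"
}
```

Calling `extract_markdown report.docx report.md extract.log` would leave only the document text in report.md and the progress messages in extract.log.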
Security recommendations
This skill appears to do exactly what it claims: call the mineru-open-api CLI to extract text from .doc/.docx files. Before installing, decide whether you trust the MinerU project and the npm/GitHub sources used to install the CLI. MINERU_TOKEN grants the MinerU service permission to process documents — avoid putting a high-privilege secret there, and create/restrict a token with minimal scope if possible. If you are cautious, inspect the npm package or GitHub repo (github.com/opendatalab/MinerU-Ecosystem) prior to installing, run the CLI in a sandbox or container, and revoke the token if you stop using the skill.
Functional analysis
Type: OpenClaw Skill
Name: doc-to-text
Version: 0.4.0
The doc-to-text skill bundle provides instructions for an AI agent to use the legitimate 'mineru-open-api' CLI tool (developed by OpenDataLab/Shanghai AI Lab) for document processing. The SKILL.md and _meta.json files contain standard installation steps via npm or Go and usage examples for extracting text from Word documents, with no evidence of malicious intent, data exfiltration, or prompt injection.
Capability assessment
Purpose & Capability
The name/description (Word -> plain text via MinerU) matches the required binary (mineru-open-api) and the single required environment variable (MINERU_TOKEN). The MINERU_TOKEN is justified for the documented 'extract' operations; no unrelated credentials or binaries are requested.
Instruction Scope
SKILL.md only instructs running the mineru-open-api CLI (flash-extract/extract), setting MINERU_TOKEN or using interactive auth, and points to mineru.net. It does not ask the agent to read unrelated files, other env vars, or transmit data to unexpected endpoints.
Install Mechanism
Install uses standard package registries: npm package 'mineru-open-api' or 'go install' from github.com/opendatalab/... — both are expected for distributing a CLI. This is normal but requires trusting those package sources; no random downloads or archive extraction from untrusted URLs are present.
Credentials
Only MINERU_TOKEN is required and is declared as the primary credential. That aligns with the documented need for a token for 'extract'/.doc processing. No extra or unrelated secrets are requested.
Persistence & Privilege
The skill is not marked always:true, does not request system-wide config changes, and is instruction-only (no bundled code). Installing the CLI is standard behavior and there is no evidence the skill modifies other skills or global agent settings.
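How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install doc-to-text
  3. Once installation finishes, invoke the skill by name or trigger it with /doc-to-text
  4. Provide the inputs listed in the skill's parameter description to get structured output
Version history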
v0.4.0
SEO: expand description for better ClawHub vector search discovery
v0.3.0
Rollback to original version
v0.2.0
SEO optimization: expanded description with rich keywords, trigger phrases, and bilingual content for better ClawHub vector search ranking.
v1.1.0
Update to v1.1.0
v1.0.1
Fix: declare MINERU_TOKEN credential in metadata
v1.0.0
Doc to Text - extract plain readable text from Word (.doc/.docx) documents using MinerU. Output is M
Metadata
Slug: doc-to-text
Version: 0.4.0
License: MIT-0
Total installs: 0
Current installs: 0
Version count: 6
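FAQ

What is Doc To Text?

Extract plain readable text from Word documents (.doc, .docx) using MinerU. Outputs Markdown (the closest plain-text format supported) for easy reading and p... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 186 times in total.

How do I install Doc To Text?

Run the command "/install doc-to-text" in the OpenClaw or Claude Code chat box for one-step installation. The skill itself needs no further configuration; the mineru-open-api CLI and, for extract, a MinerU token are set up as described in the Install and Authentication sections of SKILL.md.

Is Doc To Text free?

Yes. Doc To Text is completely free and is released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Doc To Text support?

Doc To Text is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who develops Doc To Text?

It is developed and maintained by mzlzyCA (@mzlzyca); the current version is v0.4.0.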
