
Doc OCR

by mzlzyCA · GitHub ↗ · v0.4.0 · MIT-0
cross-platform · ✓ Security Clean
201 Downloads · 0 Stars · 0 Active Installs · 6 Versions
Install in OpenClaw
/install doc-ocr
Description
OCR (Optical Character Recognition) for Word documents (.docx) containing scanned pages or image-embedded content. Uses MinerU to extract text from Word file...
README (SKILL.md)

Doc OCR

Use OCR to extract text from Word (.docx) files that contain scanned pages or image-embedded content, using MinerU.

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Quick Start

# OCR extraction from .docx (requires token)
mineru-open-api extract report.docx --ocr -o ./out/

# With VLM model for better accuracy on complex image layouts
mineru-open-api extract report.docx --ocr --model vlm -o ./out/
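
The single-file commands above can be wrapped in a loop for a folder of scanned documents. The following is a minimal sketch, not part of the skill: the function name ocr_all_docx and the MINERU_CLI override variable are hypothetical, introduced only so the loop can be dry-run with a stand-in command such as echo.

```shell
# Hypothetical helper: run the Quick Start OCR command on every .docx in a directory.
# MINERU_CLI is an assumed override (not a MinerU setting); it defaults to the real CLI.
ocr_all_docx() {
  local in_dir="$1" out_dir="$2"
  local cli="${MINERU_CLI:-mineru-open-api}"
  local f
  mkdir -p "$out_dir" || return 1
  for f in "$in_dir"/*.docx; do
    [ -e "$f" ] || continue   # skip the unexpanded pattern when no .docx exists
    # Same flags as the Quick Start example: OCR enabled, output saved under out_dir
    "$cli" extract "$f" --ocr -o "$out_dir/" || return 1
  done
}
```

Calling `MINERU_CLI=echo ocr_all_docx ./scans ./out` prints the commands that would run without contacting the service; dropping the override runs the real CLI and stops at the first failure.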

Authentication

Token required:

mineru-open-api auth             # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Create token at: https://mineru.net/apiManage/token
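
When the token is supplied through the environment variable, a script can fail early if it is missing. This is a small sketch under that assumption; the function name require_mineru_token is hypothetical, and the check cannot see a token stored by the interactive `mineru-open-api auth` flow, so it only suits the `export MINERU_TOKEN=...` route.

```shell
# Hypothetical pre-flight check for the environment-variable route.
# Returns non-zero and explains the fix on stderr when MINERU_TOKEN is empty or unset.
require_mineru_token() {
  if [ -z "${MINERU_TOKEN:-}" ]; then
    echo "MINERU_TOKEN is not set; export it or run 'mineru-open-api auth'" >&2
    return 1
  fi
}
```

A script might call `require_mineru_token || exit 1` before any extract command, so a missing token is reported before any document is processed.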

Capabilities

  • Supported input: .docx (local file or URL)
  • OCR is only available via extract (requires token)
  • Use --ocr flag to enable OCR on image-embedded content
  • Use --model vlm for complex or mixed-content documents
  • Language hint with --language (default: ch, use en for English)

Notes

  • OCR is NOT available in flash-extract — use extract with --ocr
  • If the .docx has a normal text layer, OCR is not needed — use doc-extract instead
  • Output goes to stdout by default; use -o <dir> to save to a file or directory
  • All progress/status messages go to stderr; document content goes to stdout
  • MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
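
Because document content goes to stdout and progress messages go to stderr, ordinary shell redirection can keep the two apart. The sketch below assumes exactly that behavior as stated in the notes; the function name ocr_to_file and the MINERU_CLI override variable are hypothetical and exist only to make the example testable without the real CLI.

```shell
# Hypothetical helper: write extracted content and progress messages to separate files.
# MINERU_CLI is an assumed override (not a MinerU setting); it defaults to the real CLI.
ocr_to_file() {
  local input="$1" content_file="$2" log_file="$3"
  local cli="${MINERU_CLI:-mineru-open-api}"
  # stdout (document content) -> content_file, stderr (progress/status) -> log_file
  "$cli" extract "$input" --ocr >"$content_file" 2>"$log_file"
}
```

For example, `ocr_to_file report.docx report.out extract.log` leaves only the extracted text in report.out, which keeps status lines out of anything piped into later processing.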
Usage Guidance
This skill appears to do what it says: it runs the MinerU CLI to OCR .docx files and requires a MinerU API token. Before installing:
  1. Confirm you trust the npm package or GitHub repo (inspect the source if you need high assurance).
  2. Treat MINERU_TOKEN like a secret: use a token with minimal scope and do not store it in shared places.
  3. Assume processed documents may be uploaded to MinerU's servers; do not OCR highly sensitive documents unless you verify local-only processing or run your own MinerU instance.
  4. Prefer installing from official project releases or from source if you want to audit behavior (npm installs can run scripts).
Capability Analysis
Type: OpenClaw Skill
Name: doc-ocr
Version: 0.4.0
The skill provides instructions for using the MinerU OCR service to extract text from Word documents via the 'mineru-open-api' CLI tool. It requires a legitimate API token and points to official resources from OpenDataLab (Shanghai AI Lab). No malicious code, obfuscation, or prompt injection attempts were found; the behavior is entirely consistent with the stated purpose of document OCR.
Capability Assessment
Purpose & Capability
Name/description (OCR for .docx using MinerU) matches the declared requirements: a mineru-open-api binary and a MINERU_TOKEN. The install options (npm or go install for mineru-open-api) are the expected way to obtain that CLI.
Instruction Scope
SKILL.md only instructs running mineru-open-api on local files or URLs and configuring MINERU_TOKEN. It does not ask the agent to read unrelated files or environment variables. Important caveat: the docs and auth flow imply processing via MinerU's service (token management and API token creation), so document contents may be uploaded to an external service—review privacy requirements before OCRing sensitive documents.
Install Mechanism
Install spec uses npm (mineru-open-api) or go install from a GitHub path — both are reasonable for a CLI. Note that global npm installs run package scripts and that npm packages come from the public registry; if you need higher assurance, inspect the package source or install from the project repo directly.
Credentials
Only MINERU_TOKEN is required and set as the primary credential, which is proportionate for a remote OCR API. Keep the token secret and limit its scope if possible.
Persistence & Privilege
Skill is not always-enabled and does not request system config paths or other skills' credentials. It is user-invocable and can be autonomously called by the agent (normal behavior) but does not request elevated persistence.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install doc-ocr
  3. After installation, invoke the skill by name or use /doc-ocr
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.4.0
SEO: expand description for better ClawHub vector search discovery
v0.3.0
Rollback to original version
v0.2.0
SEO optimization: expanded description with rich keywords, trigger phrases, and bilingual content for better ClawHub vector search ranking.
v1.1.0
Update to v1.1.0
v1.0.1
Fix: declare MINERU_TOKEN credential in metadata
v1.0.0
Doc OCR - use OCR to extract text from Word (.docx) files with scanned or image-embedded content usi
Metadata
Slug: doc-ocr
Version: 0.4.0
License: MIT-0
All-time Installs: 0
Active Installs: 0
Total Versions: 6
Frequently Asked Questions

What is Doc OCR?

Doc OCR is an AI agent skill for Claude Code / OpenClaw that uses MinerU to run OCR (Optical Character Recognition) on Word documents (.docx) containing scanned pages or image-embedded content. It has 201 downloads so far.

How do I install Doc OCR?

Run "/install doc-ocr" in the OpenClaw or Claude Code chat to install the skill. The skill also needs the mineru-open-api CLI and a MinerU token (MINERU_TOKEN), as described in the README.

Is Doc OCR free?

Yes. The skill is licensed under MIT-0, so you can download, install and use it at no cost. It relies on the MinerU service, which requires an API token.

Which platforms does Doc OCR support?

Doc OCR is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Doc OCR?

It is built and maintained by mzlzyCA (@mzlzyca); the current version is v0.4.0.
