
Doc OCR

Name: Doc OCR
Author: mzlzyca

by mzlzyCA · v0.4.0 · MIT-0

cross-platform ✓ Security Clean

201 Downloads

Install in OpenClaw

/install doc-ocr

Description

OCR (Optical Character Recognition) for Word documents (.docx) containing scanned pages or image-embedded content. Uses MinerU to extract text from Word file...

README (SKILL.md)

Doc OCR

Use OCR to extract text from Word (.docx) files that contain scanned pages or image-embedded content, using MinerU.

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Quick Start

# OCR extraction from .docx (requires token)
mineru-open-api extract report.docx --ocr -o ./out/

# With VLM model for better accuracy on complex image layouts
mineru-open-api extract report.docx --ocr --model vlm -o ./out/

Authentication

Token required:

mineru-open-api auth             # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Create token at: https://mineru.net/apiManage/token
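
When the token is supplied through the environment variable rather than the interactive setup, a small pre-flight check can catch a missing token before any document is sent. The sketch below is an assumption-laden illustration: check_mineru_token is a hypothetical helper, not part of mineru-open-api, and it only inspects MINERU_TOKEN (it cannot see a token stored by mineru-open-api auth).

```shell
# Hypothetical helper (not part of mineru-open-api).
# Returns 0 if MINERU_TOKEN is set and non-empty, 1 otherwise.
check_mineru_token() {
  if [ -z "${MINERU_TOKEN:-}" ]; then
    echo "MINERU_TOKEN is not set; run 'mineru-open-api auth' or export it" >&2
    return 1
  fi
  return 0
}

# Usage: run extract only when the check passes.
# check_mineru_token && mineru-open-api extract report.docx --ocr -o ./out/
```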

Capabilities

Supported input: .docx (local file or URL)
OCR is only available via extract (requires token)
Use --ocr flag to enable OCR on image-embedded content
Use --model vlm for complex or mixed-content documents
Language hint with --language (default: ch, use en for English)
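
The options listed above can be combined in one call. The following is a minimal sketch, not an official wrapper: ocr_docx and the DRY_RUN variable are hypothetical names introduced here, and the sketch assumes --language takes its value as a separate argument in the same way --model does.

```shell
# Hypothetical wrapper around the flags listed above.
# Arguments: input (.docx path or URL), output dir,
#            language (default: ch), model (optional, e.g. vlm).
# Set DRY_RUN=1 to print the command instead of running it.
ocr_docx() {
  input=$1
  outdir=$2
  language=${3:-ch}
  model=${4:-}

  # Build the command in the function's own positional parameters.
  set -- mineru-open-api extract "$input" --ocr --language "$language"
  if [ -n "$model" ]; then
    set -- "$@" --model "$model"
  fi
  set -- "$@" -o "$outdir"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: English document with a complex layout.
# ocr_docx report.docx ./out/ en vlm
```

Printing the command in dry-run mode makes it easy to confirm which flags will be sent before a document leaves the machine.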

Notes

OCR is NOT available in flash-extract — use extract with --ocr
If the .docx has a normal text layer, OCR is not needed — use doc-extract instead
Output goes to stdout by default; use -o <dir> to save to a file or directory
All progress/status messages go to stderr; document content goes to stdout
MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
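
Because document content and progress messages travel on separate streams, ordinary shell redirection is enough to keep them apart. The sketch below assumes nothing about the CLI beyond that split; run_split is a hypothetical helper and the file names are placeholders.

```shell
# Hypothetical helper: run any command, saving stdout (document content)
# to one file and stderr (progress/status messages) to another.
run_split() {
  content_file=$1
  log_file=$2
  shift 2
  "$@" >"$content_file" 2>"$log_file"
}

# Usage (assumes mineru-open-api is installed and a token is configured):
# run_split report.out extract.log mineru-open-api extract report.docx --ocr
```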

Usage Guidance

This skill appears to do what it says: it runs the MinerU CLI to OCR .docx files and requires a MinerU API token. Before installing:

(1) Confirm you trust the npm package or GitHub repo (inspect the source if you need high assurance).
(2) Treat MINERU_TOKEN like a secret: use a token with minimal scope and do not store it in shared places.
(3) Assume processed documents may be uploaded to MinerU's servers. Do not OCR highly sensitive documents unless you verify local-only processing or run your own MinerU instance.
(4) Prefer installing from official project releases or from source if you want to audit behavior (npm installs can run scripts).

Capability Analysis

Type: OpenClaw Skill
Name: doc-ocr
Version: 0.4.0

The skill provides instructions for using the MinerU OCR service to extract text from Word documents via the 'mineru-open-api' CLI tool. It requires a legitimate API token and points to official resources from OpenDataLab (Shanghai AI Lab). No malicious code, obfuscation, or prompt injection attempts were found; the behavior is entirely consistent with the stated purpose of document OCR.

Capability Assessment

✓ Purpose & Capability

Name/description (OCR for .docx using MinerU) matches the declared requirements: a mineru-open-api binary and a MINERU_TOKEN. The install options (npm or go install for mineru-open-api) are the expected way to obtain that CLI.

ℹ Instruction Scope

SKILL.md only instructs running mineru-open-api on local files or URLs and configuring MINERU_TOKEN. It does not ask the agent to read unrelated files or environment variables. Important caveat: the docs and auth flow imply processing via MinerU's service (token management and API token creation), so document contents may be uploaded to an external service—review privacy requirements before OCRing sensitive documents.

ℹ Install Mechanism

Install spec uses npm (mineru-open-api) or go install from a GitHub path — both are reasonable for a CLI. Note that global npm installs run package scripts and that npm packages come from the public registry; if you need higher assurance, inspect the package source or install from the project repo directly.

✓ Credentials

Only MINERU_TOKEN is required and set as the primary credential, which is proportionate for a remote OCR API. Keep the token secret and limit its scope if possible.

✓ Persistence & Privilege

Skill is not always-enabled and does not request system config paths or other skills' credentials. It is user-invocable and can be autonomously called by the agent (normal behavior) but does not request elevated persistence.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install doc-ocr
After installation, invoke the skill by name or use /doc-ocr
Provide required inputs per the skill's parameter spec and get structured output

Version History

v0.4.0

SEO: expand description for better ClawHub vector search discovery

v0.3.0

Rollback to original version

v0.2.0

SEO optimization: expanded description with rich keywords, trigger phrases, and bilingual content for better ClawHub vector search ranking.

v1.1.0

Update to v1.1.0

v1.0.1

Fix: declare MINERU_TOKEN credential in metadata

v1.0.0

Doc OCR - use OCR to extract text from Word (.docx) files with scanned or image-embedded content usi

Metadata

Slug doc-ocr

Version 0.4.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 6

Frequently Asked Questions

What is Doc OCR?

OCR (Optical Character Recognition) for Word documents (.docx) containing scanned pages or image-embedded content. Uses MinerU to extract text from Word file... It is an AI Agent Skill for Claude Code / OpenClaw, with 201 downloads so far.

How do I install Doc OCR?

Run "/install doc-ocr" in the OpenClaw or Claude Code chat to install the skill in one step. To use it, you also need the mineru-open-api CLI and a MinerU token (see Install and Authentication above).

Is Doc OCR free?

Yes. The Doc OCR skill is licensed under MIT-0, so you can download, install, and use it at no cost. Note that it depends on the MinerU API, which requires a token.

Which platforms does Doc OCR support?

Doc OCR is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Doc OCR?

It is built and maintained by mzlzyCA (@mzlzyca); the current version is v0.4.0.
