HTML Extract
Extract text and content from HTML files (local files or remote HTML URLs) to Markdown using MinerU. For live web pages, use mineru-open-api crawl instead.
Install
npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest
Quick Start
# Extract from a local HTML file (requires token)
mineru-open-api extract page.html -o ./out/
# Extract from a remote HTML URL (requires token)
mineru-open-api extract https://example.com/page.html -o ./out/
# Extract web page content via crawl (requires token)
mineru-open-api crawl https://example.com/article -o ./out/
# With language hint
mineru-open-api extract page.html --language en -o ./out/
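The Quick Start commands convert one input at a time. The following is a minimal batch sketch that runs extract on every .html file in a directory. It assumes a token is already configured. The function name extract_all and the per-file output sub-directory layout are hypothetical choices, not features of mineru-open-api.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of mineru-open-api): run `extract` on every
# .html file in a directory. Giving each file its own sub-directory of the
# output root is an assumption, so results cannot overwrite each other.
extract_all() {
  local src_dir="$1" out_root="${2:-./out}" lang="${3:-ch}" f name
  for f in "$src_dir"/*.html; do
    [ -e "$f" ] || continue   # glob matched nothing: skip the literal pattern
    name=$(basename "$f" .html)
    # --language defaults to ch in the CLI; pass en for English pages
    mineru-open-api extract "$f" --language "$lang" -o "$out_root/$name/" \
      || echo "extract failed: $f" >&2
  done
}

# Usage: extract_all ./pages ./out en
```

A failed file is reported on stderr and the loop moves on to the next one, so a single bad page does not abort the batch.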
Authentication
Token required:
mineru-open-api auth # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable
Create a token at: https://mineru.net/apiManage/token
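In scripts or CI jobs that pass the token through MINERU_TOKEN, failing early with a clear message is friendlier than letting the first extract call fail. The guard below is a minimal sketch; the function name require_mineru_token is hypothetical. It only inspects the environment variable, so it does not detect a token saved with mineru-open-api auth.

```bash
#!/usr/bin/env bash
# Hypothetical guard (not part of mineru-open-api) for non-interactive use.
# It checks only the MINERU_TOKEN environment variable; a token stored by
# `mineru-open-api auth` is not visible to this check.
require_mineru_token() {
  if [ -z "${MINERU_TOKEN:-}" ]; then
    echo "MINERU_TOKEN is not set: run 'mineru-open-api auth' or export it" >&2
    return 1
  fi
}

# Usage in a script:
#   require_mineru_token || exit 1
#   mineru-open-api extract page.html -o ./out/
```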
Capabilities
- Supported input: local .html file or remote HTML URL
- HTML requires extract (token required); it is not supported by flash-extract
- For live web pages, use mineru-open-api crawl <URL> (also requires token)
- Language hint with --language (default: ch; use en for English)
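The capability rules can be sketched as a small dispatcher that picks extract or crawl for a given input. This is a hedged illustration: the function names pick_subcommand and convert_html are hypothetical. Treating a URL that ends in .html as a remote HTML file, and any other http(s) URL as a live web page, is a heuristic assumption rather than documented CLI behavior.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of mineru-open-api): choose a subcommand.
# Heuristic assumption: an http(s) URL ending in .html is a remote HTML file
# (extract); any other http(s) URL is a live web page (crawl).
pick_subcommand() {
  case "$1" in
    http://*.html | https://*.html) echo extract ;;  # remote HTML file
    http://* | https://*)           echo crawl ;;    # live web page
    *.html)                         echo extract ;;  # local HTML file
    *) echo "unsupported input: $1" >&2; return 1 ;;
  esac
}

# Both subcommands require a token; flash-extract is never used for HTML.
convert_html() {
  local input="$1" out_dir="${2:-./out/}" sub
  sub=$(pick_subcommand "$input") || return 1
  mineru-open-api "$sub" "$input" -o "$out_dir"
}

# Usage: convert_html https://example.com/article ./out/
```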
Notes
- HTML is NOT supported by flash-extract; always use extract or crawl
- Output goes to stdout by default; use -o <dir> to save to a file or directory
- All progress/status messages go to stderr; document content goes to stdout
- MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install html-extract
- After installation, invoke the skill by name or use /html-extract
- Provide the required inputs per the skill's parameter spec to get structured output
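The Notes say that, without -o, document content goes to stdout and progress/status messages go to stderr. This means the Markdown and the log can be captured separately with ordinary shell redirection. The wrapper below is a minimal sketch of that; the name html_to_md and its argument order are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of mineru-open-api). Without -o, extract
# writes the document to stdout and progress/status messages to stderr,
# so each stream is redirected to its own file here.
html_to_md() {
  local input="$1" out_md="$2" log="${3:-/dev/null}"
  mineru-open-api extract "$input" >"$out_md" 2>"$log"
}

# Usage: html_to_md page.html page.md extract.log
```

Sending the log to /dev/null by default keeps the Markdown file clean when progress output is not needed.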
What is HTML Extract?
Extract content from HTML pages and files using MinerU. Converts HTML to clean, structured Markdown preserving headings, lists, tables, and text hierarchy. F... It is an AI Agent Skill for Claude Code / OpenClaw, with 176 downloads so far.
How do I install HTML Extract?
Run "/install html-extract" in the OpenClaw or Claude Code chat to install the skill in one step. The underlying mineru-open-api CLI and a MinerU token are still required (see Install and Authentication).
Is HTML Extract free?
Yes, HTML Extract is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does HTML Extract support?
HTML Extract is cross-platform: it runs anywhere OpenClaw or Claude Code is available.
Who created HTML Extract?
It is built and maintained by mzlzyCA (@mzlzyca); the current version is v0.4.0.