HTML Extract

by mzlzyCA · v0.4.0 · MIT-0

Cross-platform · Security: Clean · 176 downloads

Install in OpenClaw

/install html-extract

Description

Extract content from HTML pages and files using MinerU. Converts HTML to clean, structured Markdown preserving headings, lists, tables, and text hierarchy. F...

README (SKILL.md)

HTML Extract

Extract text and content from local HTML files or remote HTML URLs to Markdown using MinerU. For live web pages, use mineru-open-api crawl.

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Quick Start

# Extract from a local HTML file (requires token)
mineru-open-api extract page.html -o ./out/

# Extract from a remote HTML URL (requires token)
mineru-open-api extract https://example.com/page.html -o ./out/

# Extract web page content via crawl (requires token)
mineru-open-api crawl https://example.com/article -o ./out/

# With language hint
mineru-open-api extract page.html --language en -o ./out/
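
To convert several local files in one pass, the extract command above can be wrapped in a loop. The following is a minimal sketch and not part of the skill: the function name extract_all, the MINERU_CLI override variable, and the directory names are illustrative assumptions. Only the extract subcommand and the -o option come from the commands shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the skill): run `extract` on every
# .html file in a directory. MINERU_CLI is an assumed override so that a
# different binary path can be substituted.
MINERU_CLI="${MINERU_CLI:-mineru-open-api}"

extract_all() {
    src_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for f in "$src_dir"/*.html; do
        # If the glob matched nothing, the literal pattern is left in place; skip it.
        [ -e "$f" ] || continue
        "$MINERU_CLI" extract "$f" -o "$out_dir" || return 1
    done
}

# Usage: extract_all ./pages ./out
```

The loop stops at the first failing file so that an authentication or network problem is not repeated for every input.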

Authentication

Token required:

mineru-open-api auth             # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Create token at: https://mineru.net/apiManage/token
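
Because both extract and crawl require a token, a script may want to fail early when none is exported. This is a minimal sketch under that assumption; the function name require_mineru_token is hypothetical, and the check only covers the environment-variable route, not a token saved through the interactive auth setup.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of the skill or the CLI).
# Only covers the MINERU_TOKEN environment-variable route; a token saved
# via the interactive `mineru-open-api auth` setup is not detected here.
require_mineru_token() {
    if [ -z "${MINERU_TOKEN:-}" ]; then
        echo "MINERU_TOKEN is not set; create one at https://mineru.net/apiManage/token" >&2
        return 1
    fi
}

# Usage: require_mineru_token && mineru-open-api extract page.html -o ./out/
```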

Capabilities

Supported input: local .html file or remote HTML URL
HTML requires extract (token required) — not supported by flash-extract
For live web pages, use mineru-open-api crawl <URL> (also requires token)
Language hint with --language (default: ch, use en for English)

Notes

HTML is NOT supported by flash-extract — always use extract or crawl
Output goes to stdout by default; use -o <dir> to save to a file or directory
All progress/status messages go to stderr; document content goes to stdout
MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
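
Since document content goes to stdout and progress messages go to stderr, the two streams can be redirected separately. The wrapper below is a sketch, not part of the skill: the function name extract_to_markdown, the MINERU_CLI override variable, and the file names are assumptions made for illustration.

```shell
#!/bin/sh
# MINERU_CLI is an assumed override so that a different binary path can be substituted.
MINERU_CLI="${MINERU_CLI:-mineru-open-api}"

# Hypothetical wrapper: Markdown (stdout) goes to the file named in $2,
# progress and status messages (stderr) go to the file named in $3.
extract_to_markdown() {
    "$MINERU_CLI" extract "$1" > "$2" 2> "$3"
}

# Usage: extract_to_markdown page.html page.md extract.log
```

Keeping the log in its own file means the Markdown output stays clean even when the CLI reports warnings.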

Usage Guidance

This skill is internally consistent with its stated purpose, but you should verify the mineru-open-api package before installing: check the npm package page and the GitHub repo linked from the MinerU homepage (https://mineru.net / https://github.com/opendatalab). Treat MINERU_TOKEN as a secret (do not reuse highly privileged credentials), create a token with least privilege if possible, and rotate it if you later stop using the skill. If you're cautious, install the CLI in an isolated environment (container or VM) and inspect its behavior (requests it makes) before using with sensitive data.

Capability Analysis

Type: OpenClaw Skill
Name: html-extract
Version: 0.4.0

The skill provides instructions and metadata for using the MinerU document intelligence engine (by Shanghai AI Lab) to convert HTML content into Markdown. It utilizes the legitimate 'mineru-open-api' CLI tool, requires a standard API token (MINERU_TOKEN), and points to official project resources on GitHub and mineru.net. No malicious code, obfuscation, or suspicious prompt-injection attempts were found in SKILL.md or the associated metadata.

Capability Assessment

✓ Purpose & Capability

The name/description (HTML extraction via MinerU) align with the declared runtime requirement (mineru-open-api) and the single required env var (MINERU_TOKEN). Requiring a MinerU CLI and token is expected for this functionality.

✓ Instruction Scope

SKILL.md contains explicit commands using mineru-open-api (extract, crawl) and only references local HTML files, URLs, and the MINERU_TOKEN. It does not instruct reading unrelated system files, other environment variables, or exfiltrating data to unexpected endpoints.

ℹ Install Mechanism

Installers are npm (mineru-open-api) and go install from the GitHub repo — these are standard package sources. Installing third-party packages runs remote code at install/runtime, so verify the npm package and GitHub repository are the legitimate MinerU project before installing.

✓ Credentials

Only one credential (MINERU_TOKEN) is required and is declared as primaryEnv. This is proportionate to a CLI that calls a remote MinerU API. No unrelated secrets or broad filesystem config paths are requested.

✓ Persistence & Privilege

The skill does not request always:true or other elevated persistence. It is user-invocable and allows normal autonomous invocation, which is the platform default and reasonable for this capability.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install html-extract
After installation, invoke the skill by name or use /html-extract
Provide required inputs per the skill's parameter spec and get structured output

Version History

v0.4.0

SEO: expand description for better ClawHub vector search discovery

v0.3.0

Rollback to original version

v0.2.0

SEO optimization v0.2.0

v1.0.1

Fix: declare MINERU_TOKEN credential in metadata

v1.0.0

HTML Extract - extract text and content from local HTML files to Markdown using MinerU. For live web

Metadata

Slug html-extract

Version 0.4.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 5

Frequently Asked Questions

What is HTML Extract?

Extract content from HTML pages and files using MinerU. Converts HTML to clean, structured Markdown preserving headings, lists, tables, and text hierarchy. F... It is an AI Agent Skill for Claude Code / OpenClaw, with 176 downloads so far.

How do I install HTML Extract?

Run "/install html-extract" in the OpenClaw or Claude Code chat to install the skill in one step. To use it, you also need the mineru-open-api CLI and a MINERU_TOKEN, as described under Install and Authentication.

Is HTML Extract free?

Yes, HTML Extract is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does HTML Extract support?

HTML Extract is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created HTML Extract?

It is built and maintained by mzlzyCA (@mzlzyca); the current version is v0.4.0.
