
HTML Parse

by mzlzyCA · GitHub ↗ · v0.4.0 · MIT-0
cross-platform ✓ Security Clean
Downloads: 190 · Stars: 0 · Active Installs: 1 · Versions: 5
Install in OpenClaw
/install html-parse
Description
Parse HTML documents into structured Markdown using MinerU. Analyzes HTML structure and converts it into well-organized Markdown preserving hierarchy and for...
README (SKILL.md)

HTML Parse

Parse local HTML files into structured Markdown using MinerU. Preserves document hierarchy. For live web pages, use mineru-open-api crawl.

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Quick Start

# Parse a local HTML file (requires token)
mineru-open-api extract page.html -o ./out/

# Parse a remote HTML URL (requires token)
mineru-open-api extract https://example.com/page.html -o ./out/

# Parse a live web page (requires token)
mineru-open-api crawl https://example.com/article -o ./out/
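
The choice between extract and crawl in the commands above can be sketched as a small shell helper. This is only an illustration: pick_subcommand is a hypothetical name, and the rule it applies (local paths and URLs ending in .html or .htm go to extract, any other http(s) URL is treated as a live page) is an assumed heuristic, not something MinerU defines.

```shell
# Hypothetical helper: print the subcommand to use for a given target.
# Assumed heuristic, not MinerU's own rule:
#   - URL ending in .html/.htm -> extract
#   - any other http(s) URL    -> crawl (treated as a live page)
#   - everything else          -> extract (treated as a local file)
pick_subcommand() {
  case $1 in
    http://*.html|https://*.html|http://*.htm|https://*.htm) echo extract ;;
    http://*|https://*) echo crawl ;;
    *) echo extract ;;
  esac
}
```

With the helper defined, a call such as mineru-open-api "$(pick_subcommand "$target")" "$target" -o ./out/ would follow the same shape as the Quick Start commands.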

Authentication

Token required:

mineru-open-api auth             # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Create token at: https://mineru.net/apiManage/token
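
Because every command needs a token, a script may want to fail early when none is configured. The following is a minimal sketch, assuming the token is supplied through the MINERU_TOKEN environment variable; require_mineru_token is a hypothetical helper name, and the check would not detect a token saved by the interactive auth command.

```shell
# Hypothetical guard: succeed only when MINERU_TOKEN is set and non-empty.
# Only the environment variable is checked; a token stored by
# `mineru-open-api auth` is not visible to this function.
require_mineru_token() {
  if [ -z "${MINERU_TOKEN:-}" ]; then
    echo "MINERU_TOKEN is not set; export it or run 'mineru-open-api auth'" >&2
    return 1
  fi
}
```

A script could then run require_mineru_token || exit 1 before its first extract or crawl call.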

Capabilities

  • Supported input: local .html file or remote HTML URL
  • HTML requires extract or crawl (token required)
  • HTML is NOT supported by flash-extract
  • Language hint with --language (default: ch, use en for English)
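
As an illustration of the --language hint, the sketch below runs extract over every .html file in a directory. Several details are assumptions rather than documented behaviour: parse_html_dir and the MINERU_CLI override are hypothetical names, the flag is assumed to accept its value as a separate argument after the input path, and the CLI is assumed to write a distinct output per input file when several runs share one output directory.

```shell
# Hypothetical batch helper: extract each .html file in a directory.
#   $1 = source directory, $2 = output directory, $3 = language hint (default: ch)
# MINERU_CLI is an assumed override so the helper can be tried against a stub.
parse_html_dir() {
  src_dir=$1
  out_dir=$2
  language_hint=${3:-ch}
  for html_file in "$src_dir"/*.html; do
    # Skip the unexpanded pattern when the directory has no .html files.
    [ -e "$html_file" ] || continue
    "${MINERU_CLI:-mineru-open-api}" extract "$html_file" \
      --language "$language_hint" -o "$out_dir/" || return 1
  done
}
```

For English documents this could be called as parse_html_dir ./pages ./out en.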

Notes

  • HTML is NOT supported by flash-extract — use extract or crawl
  • For live web pages with dynamic content, use crawl instead of extract
  • Output goes to stdout by default; use -o <dir> to save to a file or directory
  • All progress/status messages go to stderr; document content goes to stdout
  • MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
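
Since document content goes to stdout and progress messages go to stderr, the two streams can be redirected separately. A minimal sketch, assuming the stream split described in the notes above; html_to_markdown and the MINERU_CLI override are hypothetical names introduced for illustration.

```shell
# Hypothetical wrapper: Markdown (stdout) goes to $2, progress (stderr) to $3.
#   $1 = input HTML file or URL
# MINERU_CLI is an assumed override so the wrapper can be tried against a stub.
html_to_markdown() {
  input=$1
  markdown_out=$2
  log_out=$3
  "${MINERU_CLI:-mineru-open-api}" extract "$input" >"$markdown_out" 2>"$log_out"
}
```

For example, html_to_markdown page.html page.md progress.log would leave only the converted document in page.md.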
Usage Guidance
This skill is coherent: it delegates HTML parsing to the MinerU CLI and requires only the MINERU_TOKEN. Before installing, verify the provenance of the mineru-open-api package (npm page or GitHub repo) and the trustworthiness of mineru.net. Keep your MINERU_TOKEN secret, review the token's permissions, and avoid parsing sensitive or private HTML unless you accept that the content will be sent to MinerU's service, where it may be stored or incur charges. If you prefer more control, consider running an audited local parser instead of a remote API.
Capability Analysis
Type: OpenClaw Skill · Name: html-parse · Version: 0.4.0
The skill provides a legitimate interface for the MinerU HTML parsing service developed by OpenDataLab (Shanghai AI Lab). It requires the 'mineru-open-api' CLI and a 'MINERU_TOKEN' for authentication, which is standard for this API-based tool. The instructions in SKILL.md are consistent with the tool's purpose, and no evidence of malicious behavior, data exfiltration, or harmful prompt injection was found.
Capability Assessment
Purpose & Capability
Name/description (HTML → structured Markdown) match the declared binary (mineru-open-api) and the single required env var (MINERU_TOKEN). The requested binaries and token are what this CLI-based parsing workflow would legitimately need.
Instruction Scope
SKILL.md only instructs using mineru-open-api commands (extract, crawl, auth), installing the CLI, and setting MINERU_TOKEN. It does not direct the agent to read unrelated files or credentials, nor to exfiltrate data to unexpected endpoints.
Install Mechanism
Install options are standard package sources: npm package 'mineru-open-api' and a Go install from github.com/opendatalab. No arbitrary download URLs or extract-from-unknown-host steps are present.
Credentials
Only MINERU_TOKEN is required (declared as primary credential), which is proportionate. Caution: using the skill will send HTML content to MinerU's service (remote API), so the token grants API access and may expose uploaded content or incur usage costs—avoid sending sensitive documents unless you trust MinerU and the token's permissions.
Persistence & Privilege
Skill is not always-enabled and uses normal agent invocation. It does not request persistent system-wide changes or access other skills' config.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install html-parse
  3. After installation, invoke the skill by name or use /html-parse
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.4.0
SEO: expand description for better ClawHub vector search discovery
v0.3.0
Rollback to original version
v0.2.0
SEO optimization v0.2.0
v1.0.1
Fix: declare MINERU_TOKEN credential in metadata
v1.0.0
HTML Parse - parse local HTML files into structured Markdown using MinerU. Preserves document hierar
Metadata
Slug html-parse
Version 0.4.0
License MIT-0
All-time Installs 1
Active Installs 1
Total Versions 5
Frequently Asked Questions

What is HTML Parse?

Parse HTML documents into structured Markdown using MinerU. Analyzes HTML structure and converts it into well-organized Markdown preserving hierarchy and for... It is an AI Agent Skill for Claude Code / OpenClaw, with 190 downloads so far.

How do I install HTML Parse?

Run "/install html-parse" in the OpenClaw or Claude Code chat to install the skill in one step. To actually parse HTML, the skill also needs the mineru-open-api CLI and a MINERU_TOKEN, as described in the README.

Is HTML Parse free?

The skill itself is free and licensed under MIT-0, so you can download, install and use it at no cost. It relies on MinerU's remote API, which requires a token and may incur usage charges.

Which platforms does HTML Parse support?

HTML Parse is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created HTML Parse?

It is built and maintained by mzlzyCA (@mzlzyca); the current version is v0.4.0.
