
HTML to HTML

by mzlzyCA · GitHub ↗ · v0.4.0 · MIT-0
Cross-platform · ✓ Security Clean
167 Downloads · 0 Stars · 0 Active Installs · 5 Versions
Install in OpenClaw: /install html-to-html
Description
Clean and restructure HTML documents using MinerU. Takes messy or complex HTML and produces clean, well-formatted HTML output with proper structure preserved...
README (SKILL.md)

HTML to HTML

Fetch a remote web page or local HTML file and convert it to clean structured HTML using MinerU. Strips noise and preserves semantic content.

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest
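After installing, you can check that the CLI is actually reachable on PATH. The sketch below is a minimal example and `check_mineru_cli` is a hypothetical helper, not part of the CLI. It relies on standard Go behavior: `go install` puts binaries in `$GOBIN`, or in the `bin` directory under the first `GOPATH` entry when `GOBIN` is unset, and that directory is not always on PATH.

```shell
# Hypothetical helper (not part of mineru-open-api): verify the CLI is on PATH.
check_mineru_cli() {
  if command -v mineru-open-api >/dev/null 2>&1; then
    return 0
  fi
  echo "mineru-open-api not found on PATH" >&2
  # Only relevant for the Go install route: suggest Go's binary directory.
  if command -v go >/dev/null 2>&1; then
    gobin=$(go env GOBIN)
    if [ -z "$gobin" ]; then
      gp=$(go env GOPATH)
      gobin="${gp%%:*}/bin"   # go install uses the first GOPATH entry
    fi
    echo "if you used 'go install', try: export PATH=\"\$PATH:$gobin\"" >&2
  fi
  return 1
}
```

For example, `check_mineru_cli && mineru-open-api auth` runs the token setup only once the binary is found.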

Quick Start

# Crawl a web page and output clean HTML (requires token)
mineru-open-api crawl https://example.com/article -f html -o ./out/

# Re-extract a local HTML file to clean HTML (requires token)
mineru-open-api extract page.html -f html -o ./out/

# Batch crawl multiple URLs to HTML (requires token)
mineru-open-api crawl url1 url2 -f html -o ./pages/

Authentication

Token required:

mineru-open-api auth             # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Create a token at: https://mineru.net/apiManage/token
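If you authenticate through the environment variable, a script can fail early when the token is missing. This is a sketch under one assumption: you rely on `MINERU_TOKEN` rather than the interactive `auth` flow. A token saved by `mineru-open-api auth` may be stored elsewhere, and this check would not see it. `require_mineru_token` is a hypothetical helper name.

```shell
# Hypothetical helper: stop early if MINERU_TOKEN is empty or unset.
# Assumption: the token is supplied via the environment, not via `auth`.
require_mineru_token() {
  if [ -z "${MINERU_TOKEN:-}" ]; then
    echo "MINERU_TOKEN is not set: run 'mineru-open-api auth' or export it" >&2
    return 1
  fi
  return 0
}
```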

Capabilities

  • Input: remote web page URL or local .html file
  • Output: clean structured HTML (-f html)
  • For remote URLs: use crawl -f html
  • For local HTML files: use extract -f html
  • Requires a token; not available in flash-extract
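The routing rule above (URL to `crawl`, local .html file to `extract`) can be wrapped in a small helper. This is a minimal sketch. The helper names and the default output directory `./out/` are illustrative choices. The only CLI flags it uses are the `-f html` and `-o` flags shown in Quick Start.

```shell
# Hypothetical helper: pick the subcommand per the Capabilities list.
mineru_html_subcommand() {
  case "$1" in
    http://*|https://*) echo "crawl" ;;     # remote web page
    *.html) echo "extract" ;;               # local HTML file
    *) echo "unsupported input: $1" >&2; return 1 ;;
  esac
}

# Run the conversion with the Quick Start flags; ./out/ is an arbitrary default.
mineru_to_html() {
  sub=$(mineru_html_subcommand "$1") || return 1
  mineru-open-api "$sub" "$1" -f html -o "${2:-./out/}"
}
```

For example, `mineru_to_html https://example.com/article ./pages/` runs `crawl`, and `mineru_to_html page.html` runs `extract`.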

Notes

  • HTML output (-f html) requires a token; not available in flash-extract
  • crawl supports output formats: md, html, json
  • extract supports output formats: md, html, latex, docx, json
  • Output goes to stdout by default; use -o <dir> to save to a file or directory
  • All progress/status messages go to stderr; document content goes to stdout
  • MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
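The format lists in these notes can be checked before invoking the CLI. This is a hypothetical helper whose lists are copied from this README. The CLI itself may accept different formats in other versions.

```shell
# Hypothetical check mirroring the Notes: which -f values each subcommand lists.
# Usage: mineru_format_supported <crawl|extract> <format>
mineru_format_supported() {
  case "$1:$2" in
    crawl:md|crawl:html|crawl:json) return 0 ;;
    extract:md|extract:html|extract:latex|extract:docx|extract:json) return 0 ;;
    *) return 1 ;;
  esac
}
```

Document content goes to stdout and progress messages go to stderr, so ordinary redirection keeps them apart: `mineru-open-api crawl https://example.com/article -f html > article.html 2> progress.log`.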
Usage Guidance
This skill appears coherent: it runs the mineru-open-api CLI and needs a MINERU_TOKEN from mineru.net. Before installing, verify the npm package and GitHub repo are legitimate (check publisher, recent commits, and npm download counts). Treat MINERU_TOKEN like any API credential: only provide a token with the minimal needed scopes, avoid using it with highly sensitive local HTML unless you accept sending content to the MinerU service, and rotate/delete the token if you stop using the skill.
Capability Analysis
Type: OpenClaw Skill · Name: html-to-html · Version: 0.4.0
The html-to-html skill is a legitimate wrapper for the MinerU document intelligence engine (by OpenDataLab). It cleans and restructures HTML via the 'mineru-open-api' CLI tool. The SKILL.md file contains standard installation instructions (npm/go) and usage examples for crawling URLs or extracting local files. It requires a MINERU_TOKEN for authentication but shows no signs of data exfiltration, malicious execution, or prompt injection.
Capability Assessment
Purpose & Capability
The name and description (HTML cleanup via MinerU) align with the required binary (mineru-open-api) and the required environment variable (MINERU_TOKEN). The primary credential and declared binaries are exactly what the CLI needs to function.
Instruction Scope
SKILL.md only instructs the agent to run mineru-open-api commands against remote URLs or local HTML files, use the auth flow, and write output to stdout or files. It does not ask the agent to read unrelated system files, other credentials, or post data to unexpected endpoints beyond MinerU's API.
Install Mechanism
Installation options are standard package installs (npm package and Go install from a GitHub repo). These are expected for a CLI; no arbitrary download URLs, extract steps, or personal servers are used.
Credentials
Only MINERU_TOKEN is required and declared as the primary credential, which is proportionate for a hosted extraction/processing service. No unrelated secrets or config paths are requested.
Persistence & Privilege
The skill is not marked always-on; it is user-invocable and does not request elevated persistent presence or modifications to other skills or system-wide configs.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install html-to-html
  3. After installation, invoke the skill by name or use /html-to-html
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.4.0
SEO: expand description for better ClawHub vector search discovery
v0.3.0
Rollback to original version
v0.2.0
SEO optimization v0.2.0
v1.0.1
Fix: declare MINERU_TOKEN credential in metadata
v1.0.0
HTML to HTML - fetch a remote HTML page (URL) and convert it to clean structured HTML using MinerU c
Metadata
Slug html-to-html
Version 0.4.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 5
Frequently Asked Questions

What is HTML to HTML?

Clean and restructure HTML documents using MinerU. Takes messy or complex HTML and produces clean, well-formatted HTML output with proper structure preserved... It is an AI Agent Skill for Claude Code / OpenClaw, with 167 downloads so far.

How do I install HTML to HTML?

Run "/install html-to-html" in the OpenClaw or Claude Code chat to install the skill in one step. The skill still needs the mineru-open-api CLI and a MinerU token (see Install and Authentication above).

Is HTML to HTML free?

Yes. The skill itself is licensed under MIT-0 and is free to download, install, and use. It calls the hosted MinerU service, which requires a token.

Which platforms does HTML to HTML support?

HTML to HTML is marked cross-platform and runs wherever OpenClaw / Claude Code and the mineru-open-api CLI are available.

Who created HTML to HTML?

It is built and maintained by mzlzyCA (@mzlzyca); the current version is v0.4.0.
