
HTML to Text

by mzlzyCA · GitHub ↗ · v0.4.0 · MIT-0
cross-platform · Security: Clean · 159 downloads · 0 stars · 0 active installs · 5 versions
Install in OpenClaw: /install html-to-text
Description
Convert HTML to plain readable text using MinerU. Strips HTML markup and extracts clean text content from web pages and HTML files. Features: HTML to text co...
README (SKILL.md)

HTML to Text

Extract plain readable text from HTML files or web pages using MinerU. MinerU outputs Markdown as the closest format to plain text.

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Quick Start

# Extract text from a local HTML file (requires token)
mineru-open-api extract page.html -o ./out/

# Extract text from a web page (requires token)
mineru-open-api crawl https://example.com/article

# JSON output contains text fields (requires token)
mineru-open-api extract page.html -f json -o ./out/
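
To run the local-file command over a whole directory, a loop along these lines may help. This is a minimal sketch: the extract_all function and the MINERU_CLI override variable are hypothetical names introduced here and are not part of mineru-open-api. Only the extract <file> -o <dir> form is taken from the commands above.

```shell
# Hypothetical helper (not part of mineru-open-api): run `extract` on every
# .html file in a directory. MINERU_CLI is an assumed override so the binary
# can be swapped out, e.g. for a dry run with `echo`.
extract_all() {
    src_dir=$1
    out_dir=$2
    cli=${MINERU_CLI:-mineru-open-api}
    mkdir -p "$out_dir"
    for f in "$src_dir"/*.html; do
        [ -e "$f" ] || continue            # the glob matched nothing
        "$cli" extract "$f" -o "$out_dir" || echo "failed: $f" >&2
    done
}

# Usage (requires the CLI and a token):
# extract_all ./pages ./out
```

Setting MINERU_CLI=echo prints the commands that would run without calling the API, which is a cheap way to check paths before spending token quota.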

Authentication

Token required:

mineru-open-api auth             # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Create token at: https://mineru.net/apiManage/token
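
In scripts it can be useful to fail early when the environment variable is missing. The guard below is a sketch only: require_mineru_token is a hypothetical helper, and it checks nothing but the MINERU_TOKEN variable, so a token stored through the interactive auth command would not be detected by it.

```shell
# Hypothetical guard (not part of mineru-open-api): stop before calling the
# CLI when MINERU_TOKEN is empty or unset. It only inspects the environment
# variable; it cannot see a token saved by `mineru-open-api auth`.
require_mineru_token() {
    if [ -z "${MINERU_TOKEN:-}" ]; then
        echo "MINERU_TOKEN is not set; export it or run 'mineru-open-api auth'" >&2
        return 1
    fi
    return 0
}

# Usage:
# require_mineru_token && mineru-open-api extract page.html -o ./out/
```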

Capabilities

  • Supported input: local .html file or web page URL
  • HTML requires extract or crawl (token required) — not supported by flash-extract
  • MinerU does not have a -f text option; Markdown is the closest plain-text output
  • For truly plain text: use extract -f json and read the text fields from JSON output
  • Language hint with --language (default: ch, use en for English)
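
When the Markdown output is close enough and only the most common markup needs to go, a sed filter is one rough alternative to parsing the JSON. This is a sketch under stated assumptions: md_to_plain is a hypothetical helper, it handles only headings, links, images, bold, inline code, and bullet markers, and it is not a full Markdown parser.

```shell
# Hypothetical filter (not part of mineru-open-api): strip common Markdown
# markup from stdin. Covers a handful of constructs only; tables, nested
# emphasis, and fenced blocks pass through unchanged.
md_to_plain() {
    sed -E \
        -e 's/^#{1,6}[[:space:]]+//' \
        -e 's/!\[([^]]*)\]\([^)]*\)/\1/g' \
        -e 's/\[([^]]*)\]\([^)]*\)/\1/g' \
        -e 's/(\*\*|__)([^*_]+)(\*\*|__)/\2/g' \
        -e 's/`([^`]*)`/\1/g' \
        -e 's/^[[:space:]]*[-*+][[:space:]]+//'
}

# Usage (requires the CLI and a token):
# mineru-open-api crawl https://example.com/article | md_to_plain > article.txt
```

Links and images are reduced to their visible text, so URLs are dropped; keep the Markdown if the link targets matter.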

Notes

  • MinerU has no -f text format; use Markdown output or -f json for text fields
  • HTML is NOT supported by flash-extract
  • Output goes to stdout by default; use -o <dir> to save to a file or directory
  • All progress/status messages go to stderr; document content goes to stdout
  • MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
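
Because content and progress messages travel on separate streams, they can be captured independently with ordinary shell redirection. The wrapper below is a minimal sketch; save_streams is a hypothetical helper name, and the file names in the usage line are placeholders.

```shell
# Hypothetical wrapper (not part of mineru-open-api): run any command,
# writing its stdout (document content) to one file and its stderr
# (progress/status messages) to another.
save_streams() {
    out_file=$1
    log_file=$2
    shift 2
    "$@" >"$out_file" 2>"$log_file"
}

# Usage (requires the CLI and a token):
# save_streams article.md crawl.log mineru-open-api crawl https://example.com/article
```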
Usage Guidance
This skill is coherent: it simply wraps the MinerU CLI and needs a MinerU API token. Before installing, verify the source of the mineru-open-api package (the npm package name and the GitHub repo) to make sure it is the official project, obtain your MINERU_TOKEN only from the official mineru.net site, and avoid pasting that token into untrusted places. Installing globally (-g) adds a system-wide binary; use a container or another isolated environment if you prefer isolation. If you allow the agent to call this skill autonomously, it can run mineru-open-api commands whenever the skill is invoked, so make sure you trust the agent and understand the token's permissions.
Capability Analysis
Type: OpenClaw Skill
Name: html-to-text
Version: 0.4.0
The skill provides instructions and metadata for using the MinerU API (developed by OpenDataLab) to convert HTML content into plain text or Markdown. It utilizes the 'mineru-open-api' CLI tool and requires a 'MINERU_TOKEN' for authentication. All documented behaviors in SKILL.md and _meta.json are consistent with the stated purpose of document processing, and no malicious patterns, unauthorized data access, or harmful prompt injections were identified.
Capability Assessment
Purpose & Capability
The skill is an instruction-only wrapper to run the mineru-open-api CLI to extract text from HTML/URLs. Requiring the mineru-open-api binary and MINERU_TOKEN is consistent with that purpose.
Instruction Scope
SKILL.md only instructs using mineru-open-api commands (extract, crawl, auth), creating/setting MINERU_TOKEN, and saving outputs. It does not ask the agent to read unrelated files, other env vars, or exfiltrate data to unexpected endpoints.
Install Mechanism
Installers are standard package flows: npm package and go install from a GitHub repo. No arbitrary downloads, no URL shorteners or unknown extract steps are used.
Credentials
Only MINERU_TOKEN is required and declared as the primary credential. That single token is proportional to a CLI that authenticates to MinerU's API.
Persistence & Privilege
The always setting is false, and the skill does not request system-wide changes or access to other skills' config. Autonomous invocation is allowed (platform default) but not excessive for this integration.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install html-to-text
  3. After installation, invoke the skill by name or use /html-to-text
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.4.0
SEO: expand description for better ClawHub vector search discovery
v0.3.0
Rollback to original version
v0.2.0
SEO optimization v0.2.0
v1.0.1
Minor update
v1.0.0
Initial release
Metadata
Slug html-to-text
Version 0.4.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 5
Frequently Asked Questions

What is HTML to Text?

Convert HTML to plain readable text using MinerU. Strips HTML markup and extracts clean text content from web pages and HTML files. Features: HTML to text co... It is an AI Agent Skill for Claude Code / OpenClaw, with 159 downloads so far.

How do I install HTML to Text?

Run "/install html-to-text" in the OpenClaw or Claude Code chat to install the skill in one step. The mineru-open-api CLI and a MinerU token must still be set up separately, as described in the README.

Is HTML to Text free?

Yes, HTML to Text is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does HTML to Text support?

HTML to Text is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created HTML to Text?

It is built and maintained by mzlzyCA (@mzlzyca); the current version is v0.4.0.
