mzlzyca

HTML Parse

Author: mzlzyCA · GitHub · v0.4.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 190
Favorites: 0
Current installs: 1
Versions: 5
Install in OpenClaw
/install html-parse
Description
Parse HTML documents into structured Markdown using MinerU. Analyzes HTML structure and converts it into well-organized Markdown preserving hierarchy and for...
Usage (SKILL.md)

HTML Parse

Parse local HTML files into structured Markdown using MinerU. Preserves document hierarchy. For live web pages, use mineru-open-api crawl.

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Quick Start

# Parse a local HTML file (requires token)
mineru-open-api extract page.html -o ./out/

# Parse a remote HTML URL (requires token)
mineru-open-api extract https://example.com/page.html -o ./out/

# Parse a live web page (requires token)
mineru-open-api crawl https://example.com/article -o ./out/
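
The three Quick Start cases can be folded into one small dispatcher. The sketch below is only an illustration: `pick_subcommand` is a hypothetical helper, not part of mineru-open-api, and the rule "URLs ending in .html or .htm are static files, every other URL is a live page" is an assumed heuristic rather than documented behavior.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of mineru-open-api): choose a subcommand
# from the shape of the input. The URL-suffix rule is an assumption.
pick_subcommand() {
  case "$1" in
    http://*.html|https://*.html|http://*.htm|https://*.htm)
      echo extract ;;   # remote static HTML file
    http://*|https://*)
      echo crawl ;;     # any other URL is treated as a live web page
    *)
      echo extract ;;   # anything else is treated as a local file path
  esac
}

# Example use (requires the CLI and a token):
#   input="https://example.com/article"
#   mineru-open-api "$(pick_subcommand "$input")" "$input" -o ./out/
```

A URL with a query string after `.html` falls through to `crawl` under this rule, which may or may not be what you want; adjust the patterns to suit your inputs.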

Authentication

Token required:

mineru-open-api auth             # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Create token at: https://mineru.net/apiManage/token
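
When the token is supplied through the environment, a script can fail early instead of waiting for the CLI to reject the request. The guard below is a minimal sketch: `require_token` is a hypothetical function name, and it only checks the `MINERU_TOKEN` variable, so it would report a missing token even if one had been stored through the interactive `auth` setup.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not part of mineru-open-api): stop early when the
# MINERU_TOKEN environment variable is empty or unset.
require_token() {
  if [ -z "${MINERU_TOKEN:-}" ]; then
    echo "MINERU_TOKEN is not set; export it or run 'mineru-open-api auth'" >&2
    return 1
  fi
}

# Example use (requires the CLI):
#   require_token && mineru-open-api extract page.html -o ./out/
```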

Capabilities

  • Supported input: local .html file or remote HTML URL
  • HTML requires extract or crawl (token required)
  • HTML is NOT supported by flash-extract
  • Language hint with --language (default: ch, use en for English)
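
To illustrate the language hint, the sketch below assembles an `extract` command line and prints it without running it. `extract_cmd` is a hypothetical helper; it uses only the options named above (`--language`, `-o`), and placing `--language` after the input path is an assumption about what the CLI accepts.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the extract command for an input file,
# a language hint (default ch, as stated above) and an output directory.
extract_cmd() {
  local input="$1" lang="${2:-ch}" outdir="${3:-./out/}"
  printf 'mineru-open-api extract %s --language %s -o %s\n' \
    "$input" "$lang" "$outdir"
}

# Print the command for an English document.
extract_cmd page.html en
```

Printing the command first makes it easy to check the arguments before spending an API call on them.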

Notes

  • HTML is NOT supported by flash-extract — use extract or crawl
  • For live web pages with dynamic content, use crawl instead of extract
  • Output goes to stdout by default; use -o <dir> to save to a file or directory
  • All progress/status messages go to stderr; document content goes to stdout
  • MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
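
Because document content and progress messages travel on separate streams, ordinary shell redirection can capture each one in its own file. The sketch below assumes nothing about the CLI beyond the stdout/stderr split described in the notes; `run_split` is a hypothetical wrapper that works with any command.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command, saving stdout (document content)
# to one file and stderr (progress/status messages) to another.
run_split() {
  local out="$1" log="$2"
  shift 2
  "$@" >"$out" 2>"$log"
}

# Example use (requires the CLI and a token):
#   run_split page.md progress.log mineru-open-api extract page.html
```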
Security Recommendations
This skill is coherent: it delegates HTML parsing to the MinerU CLI and requires only the MINERU_TOKEN. Before installing, verify the mineru-open-api package's provenance (npm page or the GitHub repo) and trustworthiness of mineru.net. Keep your MINERU_TOKEN secret, review token permissions, and avoid parsing sensitive or private HTML unless you accept that content will be sent to MinerU's service and may incur charges or be stored by that service. If you prefer more control, consider running an audited local parser instead of a remote API.
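
One way to keep the token out of shell history and scripts is to store it in a file readable only by your user and load it at run time. This is a sketch of that idea, not something the skill prescribes: `load_token` and the file path in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the token from a private file and export it
# as MINERU_TOKEN. Create the file with restrictive permissions first,
# for example: chmod 600 "$HOME/.mineru_token"
load_token() {
  local file="$1"
  if [ ! -r "$file" ]; then
    echo "cannot read token file: $file" >&2
    return 1
  fi
  # Strip line endings so a trailing newline does not end up in the token.
  MINERU_TOKEN="$(tr -d '\r\n' < "$file")"
  export MINERU_TOKEN
}

# Example use (the path is an assumption):
#   load_token "$HOME/.mineru_token" && mineru-open-api extract page.html -o ./out/
```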
Functional Analysis
Type: OpenClaw Skill · Name: html-parse · Version: 0.4.0
The skill provides a legitimate interface for the MinerU HTML parsing service developed by OpenDataLab (Shanghai AI Lab). It requires the 'mineru-open-api' CLI and a 'MINERU_TOKEN' for authentication, which is standard for this API-based tool. The instructions in SKILL.md are consistent with the tool's purpose, and no evidence of malicious behavior, data exfiltration, or harmful prompt injection was found.
Capability Assessment
Purpose & Capability
Name/description (HTML → structured Markdown) match the declared binary (mineru-open-api) and the single required env var (MINERU_TOKEN). The requested binaries and token are what this CLI-based parsing workflow would legitimately need.
Instruction Scope
SKILL.md only instructs using mineru-open-api commands (extract, crawl, auth), installing the CLI, and setting MINERU_TOKEN. It does not direct the agent to read unrelated files or credentials, nor to exfiltrate data to unexpected endpoints.
Install Mechanism
Install options are standard package sources: npm package 'mineru-open-api' and a Go install from github.com/opendatalab. No arbitrary download URLs or extract-from-unknown-host steps are present.
Credentials
Only MINERU_TOKEN is required (declared as primary credential), which is proportionate. Caution: using the skill will send HTML content to MinerU's service (remote API), so the token grants API access and may expose uploaded content or incur usage costs—avoid sending sensitive documents unless you trust MinerU and the token's permissions.
Persistence & Privilege
Skill is not always-enabled and uses normal agent invocation. It does not request persistent system-wide changes or access other skills' config.
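How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install html-parse
  3. Once installed, invoke the Skill by name or trigger it with /html-parse
  4. Provide the inputs listed in the Skill's parameter documentation to get structured output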
Version History
v0.4.0
SEO: expand description for better ClawHub vector search discovery
v0.3.0
Rollback to original version
v0.2.0
SEO optimization v0.2.0
v1.0.1
Fix: declare MINERU_TOKEN credential in metadata
v1.0.0
HTML Parse - parse local HTML files into structured Markdown using MinerU. Preserves document hierar
Metadata
Slug html-parse
Version 0.4.0
License MIT-0
Total installs 1
Current installs 1
Version count 5
FAQ

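What is HTML Parse?

Parse HTML documents into structured Markdown using MinerU. Analyzes HTML structure and converts it into well-organized Markdown preserving hierarchy and for... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 190 total downloads to date.

How do I install HTML Parse?

Run the command "/install html-parse" in the OpenClaw or Claude Code chat box to install the skill in one step. Using it still requires the mineru-open-api CLI and a MinerU token, as described in SKILL.md.

Is HTML Parse free?

Yes. HTML Parse is free and released under the MIT-0 license, so it can be downloaded, installed, and used freely. The MinerU service it calls is separate and, per the security advice above, may incur usage charges.

Which platforms does HTML Parse support?

HTML Parse is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who developed HTML Parse?

It is developed and maintained by mzlzyCA (@mzlzyca); the current version is v0.4.0.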
