jllyzzd2023

clean-web-fetch

by 晨冬 · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Downloads 795
Stars 0
Active Installs 4
Versions 1
Install in OpenClaw
/install clean-web-fetch
Description
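Fetches clean, readable main-body content from modern web pages, with support for scraping WeChat Official Account articles and stripping trailing noise, which cuts useless information and token consumption. Suited to news, blogs, announcements, and many pages where an ordinary fetch is unreliable or is disrupted by anti-scraping measures or dynamic rendering. Clean readable web fetch for modern pages, with WeChat cleanup,...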
README (SKILL.md)

Scrapling Web Fetch

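Prefer this skill when the user wants to fetch web page content, extract the main text, convert a page to Markdown or plain text, or grab the body of an article.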

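Default workflow

  1. Run python3 scripts/scrapling_fetch.py <url> <max_chars>
  2. Default content selectors, in priority order:
    • article
    • main
    • .post-content
    • [class*="body"]
  3. Once a selector matches the main content, convert it to Markdown with html2text
  4. If none of them match, fall back to body
  5. Finally, truncate the output to max_chars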
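The workflow above can be sketched as follows. The skill's actual scrapling_fetch.py is not part of this listing, so this is only an illustration of the selector priority, the body fallback, and the final truncation. The names extract_main, select_first, and convert are hypothetical: select_first stands in for whatever element lookup the fetching library provides, and convert stands in for the HTML-to-Markdown step.

```python
from typing import Callable, Optional, Tuple

# Selector priority as listed in the README; "body" is the fallback.
SELECTORS = ["article", "main", ".post-content", '[class*="body"]']


def extract_main(
    select_first: Callable[[str], Optional[str]],
    convert: Callable[[str], str],
    max_chars: int,
) -> Tuple[str, str]:
    """Return (matched_selector, converted_text_truncated_to_max_chars).

    select_first(selector) is a hypothetical hook that returns the HTML of
    the first matching element, or None when nothing matches.
    convert(html) is a hypothetical hook for the HTML-to-Markdown step.
    """
    for selector in SELECTORS + ["body"]:
        html = select_first(selector)
        if html:
            return selector, convert(html)[:max_chars]
    # Nothing matched at all, not even body.
    return "", ""


# Demo with a dict standing in for a parsed page and an identity converter.
page = {"main": "<main><p>Hello world</p></main>", "body": "<body>...</body>"}
selector, text = extract_main(page.get, lambda html: html, max_chars=10)
print(selector, text)
```

In a real run, convert could be html2text.html2text from the html2text package, and select_first would wrap the element lookup of the fetched page.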

Usage

python3 /Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/scripts/scrapling_fetch.py <url> 30000

Dependencies

Check for these first:

  • scrapling
  • html2text

If either is missing, install with:

python3 -m pip install scrapling html2text
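The dependency check can also be done from Python before anything is installed. This is a minimal sketch using only the standard library. The helper name missing_packages is hypothetical, and it assumes the import names match the pip package names.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level module names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumption: both packages are imported under their pip names.
missing = missing_packages(["scrapling", "html2text"])
if missing:
    print("Missing:", ", ".join(missing))
    print("Install with: python3 -m pip install " + " ".join(missing))
else:
    print("All dependencies are available.")
```

Checking with find_spec avoids importing the packages, so nothing from them is executed during the check.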

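Output conventions

By default the script prints the main content as Markdown. Append --json for structured output. To debug which selector the extraction matched, check the stderr output.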
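One way such an output convention could be wired up is sketched below with argparse. The script itself is not included in this listing, so the JSON field names (url, selector, content), the default of 30000 for max_chars, and the helper names build_parser and emit are all assumptions made for illustration.

```python
import argparse
import json
import sys


def build_parser():
    """CLI shape taken from the README: <url> <max_chars> [--json]."""
    parser = argparse.ArgumentParser(prog="scrapling_fetch.py")
    parser.add_argument("url")
    # Assumed default; the README's usage example passes 30000 explicitly.
    parser.add_argument("max_chars", type=int, nargs="?", default=30000)
    parser.add_argument("--json", action="store_true",
                        help="emit structured output instead of Markdown")
    return parser


def emit(args, selector, markdown, out=sys.stdout, err=sys.stderr):
    """Write content to out and the matched selector to err for debugging."""
    print(f"selector: {selector}", file=err)
    if args.json:
        # Hypothetical field names; the real script may use different ones.
        payload = {"url": args.url, "selector": selector, "content": markdown}
        print(json.dumps(payload, ensure_ascii=False), file=out)
    else:
        print(markdown, file=out)


if __name__ == "__main__":
    demo_args = build_parser().parse_args(["https://example.com", "--json"])
    emit(demo_args, "article", "# Example title")
```

Keeping diagnostics on stderr means the Markdown or JSON on stdout can be piped to another tool without filtering.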

Additional resources

  • Usage reference: /Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/references/usage.md
  • Selector strategy: /Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/references/selectors.md
  • Unified entry point: /Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/scripts/fetch-web-content

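When to use this skill

  • Fetching the main text of an article
  • Scraping the body of a blog post, news story, or announcement
  • Converting a web page to Markdown for later summarization
  • When a regular fetch performs poorly and you want more reliable fetching of modern web pages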

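When not to use it

  • When you need full browser interaction (clicking, logging in, paginating): use browser automation instead
  • When you only need JSON from an API: requesting the API directly is more appropriate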
Usage Guidance
Do not install or run this skill as-is. The SKILL.md expects you to run a local Python script and points to absolute paths under /Users/zzd that are not included in the package — this may be a leftover from the developer's environment. Before using: (1) ask the publisher to provide the actual scripts and any referenced 'references' files (or include them in the skill bundle); (2) inspect those Python scripts manually to verify they only fetch and parse the target URL and do not read unrelated files or exfiltrate data; (3) prefer vetted install instructions (e.g., a packaged script or a container) and check the pip packages (scrapling/html2text) on PyPI to confirm they are legitimate; (4) run the tool in a sandbox environment the first time. If the author cannot provide the scripts or a clear explanation for the absolute paths, treat the skill as untrusted.
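As a first step in that review, the SKILL.md text can be scanned for absolute paths under a user's home directory. The sketch below is a rough heuristic only: the helper name find_user_paths and the regular expression are illustrative assumptions, and the pattern can also match URL paths that happen to contain /home/.

```python
import re

# Matches paths that start at a macOS or Linux per-user home directory.
ABS_HOME = re.compile(r"(?:/Users|/home)/[^\s/]+(?:/\S*)?")


def find_user_paths(text):
    """Return every user-specific absolute path found in the text."""
    return ABS_HOME.findall(text)


sample = (
    "python3 /Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/"
    "scripts/scrapling_fetch.py <url> 30000"
)
for path in find_user_paths(sample):
    print("user-specific path:", path)
```

Any hit is a prompt to ask whether the referenced file ships with the skill bundle or exists only on the author's machine.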
Capability Analysis
Type: OpenClaw Skill
Name: clean-web-fetch
Version: 1.0.0
The skill bundle describes a utility for fetching and cleaning web content using the 'scrapling' and 'html2text' libraries. The instructions in SKILL.md outline a standard workflow for converting web pages to Markdown and include typical dependency installation steps; no evidence of malicious intent, data exfiltration, or harmful prompt injection was found in the provided documentation.
Capability Assessment
Purpose & Capability
The name/description describe a web-page-to-markdown fetcher which is coherent. However, the skill declares no code, no install, and no environment requirements, yet the SKILL.md instructs running a local Python script (scripts/scrapling_fetch.py) that is not included in the package. That mismatch (declared nothing vs. instructions requiring local files) is inconsistent.
Instruction Scope
The instructions tell the agent to execute a Python script at absolute/user-specific paths (/Users/zzd/.openclaw/...) and reference local 'references' files. Those paths are outside the declared scope and would cause the agent to access arbitrary local files if present. The SKILL.md also allows installing Python packages, but the primary runtime behavior depends on running an external script that is not bundled or verified here.
Install Mechanism
No install spec is provided (instruction-only), which reduces installer risk. The SKILL.md suggests installing pip packages (scrapling and html2text) if missing — this is normal for a Python-based fetcher, but the pip package 'scrapling' is referenced without verification and could be any third-party package.
Credentials
The skill declares no required env vars or config paths, yet the instructions reference absolute local filesystem paths under a specific user's home. That is inconsistent: the instructions implicitly require access to those local files. No credentials are requested, but the implicit filesystem access is disproportionate to the package metadata.
Persistence & Privilege
The skill is not marked 'always: true' and does not request persistent privileges. It is user-invocable and can be run autonomously (default), which is normal. There is no evidence it modifies other skills or system-wide settings.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install clean-web-fetch
  3. After installation, invoke the skill by name or use /clean-web-fetch
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial publish
Metadata
Slug clean-web-fetch
Version 1.0.0
License MIT-0
All-time Installs 4
Active Installs 4
Total Versions 1
Frequently Asked Questions

What is clean-web-fetch?

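Fetches clean, readable main-body content from modern web pages, with support for scraping WeChat Official Account articles and stripping trailing noise, which cuts useless information and token consumption. Suited to news, blogs, announcements, and many pages where an ordinary fetch is unreliable or is disrupted by anti-scraping measures or dynamic rendering. Clean readable web fetch for modern pages, with WeChat cleanup,... It is an AI Agent Skill for Claude Code / OpenClaw, with 795 downloads so far.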

How do I install clean-web-fetch?

Run "/install clean-web-fetch" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is clean-web-fetch free?

Yes, clean-web-fetch is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does clean-web-fetch support?

clean-web-fetch is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created clean-web-fetch?

It is built and maintained by 晨冬 (@jllyzzd2023); the current version is v1.0.0.
