/install clean-web-fetch
Scrapling Web Fetch
当用户要获取网页内容、正文提取、把网页转成 markdown/text、抓取文章主体时,优先使用此技能。
默认流程
- 使用
python3 scripts/scrapling_fetch.py \x3Curl> \x3Cmax_chars> - 默认正文选择器优先级:
articlemain.post-content[class*="body"]
- 命中正文后,使用
html2text转 Markdown - 若都未命中,回退到
body - 最终按
max_chars截断输出
用法
python3 /Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/scripts/scrapling_fetch.py \x3Curl> 30000
依赖
优先检查:
scraplinghtml2text
若缺失,可安装:
python3 -m pip install scrapling html2text
输出约定
脚本默认输出 Markdown 正文内容。
如需结构化输出,可追加 --json。
如需调试提取命中了哪个 selector,可查看 stderr 输出。
附加资源
- 用法参考:
/Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/references/usage.md - 选择器策略:
/Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/references/selectors.md - 统一入口:
/Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/scripts/fetch-web-content
何时用这个技能
- 获取文章正文
- 抓博客/新闻/公告正文
- 将网页转成 Markdown 供后续总结
- 常规 fetch 效果差,希望提升现代网页抓取稳定性
何时不用
- 需要完整浏览器交互、点击、登录、翻页时:改用浏览器自动化
- 只是简单获取 API JSON:直接请求 API 更合适
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat:
/install clean-web-fetch - After installation, invoke the skill by name or use
/clean-web-fetch - Provide required inputs per the skill's parameter spec and get structured output
What is clean-web-fetch?
获取干净、可读的现代网页正文内容,支持微信公众号文章抓取与尾部噪音清洗,减少无用信息与 token 消耗;适合新闻、博客、公告及许多普通 fetch 不稳定、存在反爬或动态渲染干扰的网页。Clean readable web fetch for modern pages, with WeChat cleanup,... It is an AI Agent Skill for Claude Code / OpenClaw, with 795 downloads so far.
How do I install clean-web-fetch?
Run "/install clean-web-fetch" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is clean-web-fetch free?
Yes, clean-web-fetch is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does clean-web-fetch support?
clean-web-fetch is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).
Who created clean-web-fetch?
It is built and maintained by 晨冬 (@jllyzzd2023); the current version is v1.0.0.