Total downloads: 841
Favorites: 1
Current installs: 5
Versions: 1
Install in OpenClaw
/install web-anti-crawl-fetch
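Description
Fetches web page content with a headless browser that uses the Stealth plugin and converts it to Markdown. Use it when you need the main text of a specific page, such as a news article, a company financial report, or other long-form web content. It can bypass most basic anti-crawler detection.
Usage instructions (SKILL.md)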
Web Fetch Skill
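Fetches the content of a given URL with a headless browser (Playwright + Stealth Plugin) and automatically converts it to Markdown for easier reading and further processing.
Key features
- Anti-crawler bypass: integrates `playwright-extra` and `puppeteer-extra-plugin-stealth` to handle browser fingerprinting and automation-detection checks automatically.
- Content conversion: uses the `turndown` library to convert complex HTML pages into concise Markdown.
- Environment simulation: simulates a real user's viewport size and headless browser configuration.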
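The three features above could be combined roughly as follows. This is a minimal sketch under stated assumptions, not the contents of the skill's `scripts/fetch.js`: the function name `fetchAsMarkdown`, the viewport size, the `waitUntil` setting, and the use of Turndown's `remove()` to drop script, style, and iframe elements are all illustrative choices.

```javascript
// Hypothetical sketch; not the skill's actual scripts/fetch.js.
let stealthRegistered = false;

async function fetchAsMarkdown(url) {
  // Loaded lazily so this file can be required even before `npm install`.
  const { chromium } = require('playwright-extra');
  const StealthPlugin = require('puppeteer-extra-plugin-stealth');
  const TurndownService = require('turndown');

  // Register the stealth evasions once per process, before launching.
  if (!stealthRegistered) {
    chromium.use(StealthPlugin());
    stealthRegistered = true;
  }

  const browser = await chromium.launch({ headless: true });
  try {
    // Assumed viewport size; the real script may use different values.
    const page = await browser.newPage({ viewport: { width: 1280, height: 800 } });
    await page.goto(url, { waitUntil: 'domcontentloaded' });

    const title = await page.title();
    const bodyHtml = await page.locator('body').innerHTML();

    // Drop elements that carry no readable content, then convert to Markdown.
    const turndown = new TurndownService();
    turndown.remove(['script', 'style', 'iframe']);
    return { title, markdown: turndown.turndown(bodyHtml) };
  } finally {
    // Always release the browser, even if navigation fails.
    await browser.close();
  }
}
```

Calling `fetchAsMarkdown("https://finance.sina.com.cn/stock/")` would resolve to an object holding the page title and the converted Markdown, assuming the dependencies and browser binaries are installed.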
Usage
Run the fetch script:
cd /Users/wuwei/.openclaw/workspace/skills/web-fetch/scripts
node fetch.js <url>
Parameters
url: the full URL of the page to fetch, including the http:// or https:// scheme.
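Because the script expects a complete URL with its scheme, it can help to validate the argument before a browser is launched. The helper below is a hypothetical sketch (the name `parseTargetUrl` is not taken from the skill) that relies only on Node's built-in `URL` class.

```javascript
// Hypothetical helper: validate the CLI argument before launching a browser.
function parseTargetUrl(input) {
  let parsed;
  try {
    // Throws for relative or malformed input such as "finance.sina.com.cn".
    parsed = new URL(input);
  } catch {
    throw new Error(`Not a valid absolute URL: ${input}`);
  }
  // Only http and https pages make sense for this scraper.
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return parsed.href;
}
```

A caller could pass `process.argv[2]` through this function and exit with a usage message when it throws.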
Examples
# Fetch Sina Finance
node fetch.js "https://finance.sina.com.cn/stock/"
# Fetch a specific news page
node fetch.js "https://finance.eastmoney.com/a/202403143012345678.html"
Output
The script prints the following to the console:
- Progress messages for the fetch.
- The page title.
- The converted Markdown body (long content is truncated).
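The listing does not say where long output is cut off. The sketch below shows one way such truncation might work; the helper name `truncateForConsole` and the 2000-character default are assumptions, not values from the skill.

```javascript
// Hypothetical helper; the 2000-character default is an assumed value.
function truncateForConsole(markdown, maxChars = 2000) {
  // Short content is printed unchanged.
  if (markdown.length <= maxChars) return markdown;
  const omitted = markdown.length - maxChars;
  // Keep the head of the document and note how much was dropped.
  return `${markdown.slice(0, maxChars)}\n... [truncated, ${omitted} more characters]`;
}
```

Reporting the number of omitted characters lets the reader judge whether a second, untruncated run is worthwhile.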
Dependencies
- playwright-extra: Playwright core with plugin support.
- puppeteer-extra-plugin-stealth: provides the evasion strategies.
- turndown: HTML-to-Markdown converter.
Install the dependencies:
cd /Users/wuwei/.openclaw/workspace/skills/web-fetch/scripts
npm install
Security recommendations
This skill appears to do what it says: run a headless Playwright browser with stealth evasion, fetch a page, strip scripts/iframes/ads, convert HTML to Markdown, and print it. Before installing:
1) Be aware that it includes stealth evasion code; using it may violate website terms of service or local law. Only use it on sites you are allowed to scrape.
2) npm install will download Playwright and browser binaries (large network downloads); run it in an isolated environment (container or VM) if you want to limit risk.
3) The SKILL.md examples use a hard-coded local path; verify where you run the commands and adjust paths accordingly.
4) If you need stronger assurance, inspect the package.json versions and run the code in a sandbox to confirm there are no unexpected network calls beyond visiting the target URL.
If any additional metadata (an explicit install spec, source homepage, or maintainer contact) becomes available, re-evaluate to increase confidence.
Functional analysis
Type: OpenClaw Skill
Name: web-anti-crawl-fetch
Version: 1.0.0
The skill is a standard web scraping utility that uses Playwright with a stealth plugin to bypass anti-bot measures and convert HTML content into Markdown via the Turndown library. The code in `scripts/fetch.js` and instructions in `SKILL.md` align perfectly with the stated purpose, showing no signs of data exfiltration, credential theft, or unauthorized system access.
Capability assessment
Purpose & Capability
The name/description (stealth headless scraping and HTML→Markdown conversion) align with the included code and package.json dependencies (playwright, playwright-extra, puppeteer-extra-plugin-stealth, turndown). Minor inconsistency: registry metadata said 'instruction-only' / no install spec, but the repo contains a package.json and a runnable script—so the skill requires installing Node deps even though there's no formal install spec in the registry.
Instruction Scope
SKILL.md instructs running the provided Node script and to npm install the listed dependencies. The runtime instructions only open the target URL in a headless browser, remove certain DOM elements, convert the body HTML to Markdown, and print to console. The only oddity is an absolute example path (/Users/wuwei/.openclaw/...) which is a local author path and not required for functionality.
Install Mechanism
There is no formal install spec in the registry (highest-signal install section is empty), but the package.json declares npm dependencies. Installing will pull Playwright and browser binaries (large downloads from official hosts). This is expected for a browser-based scraper but the lack of an explicit install spec in the skill metadata is a packaging/UX inconsistency.
Credentials
The skill requests no environment variables, no credentials, and references no config paths. The code does not read other env vars or credentials—its network access is limited to visiting the target URL(s) supplied by the caller.
Persistence & Privilege
always:false and standard model invocation settings. The skill does not request persistent system-wide privileges or modify other skills. Autonomous invocation is allowed (platform default) but not combined with other concerning flags.
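How to use
- Make sure OpenClaw is installed (local or Docker deployment).
- Enter the install command in the chat box: /install web-anti-crawl-fetch
- Once installation finishes, invoke the Skill by name or trigger it with /web-anti-crawl-fetch.
- Provide the inputs listed in the Skill's parameter description to get structured output.
Version history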
v1.0.0
- Initial release of the web-fetch skill.
- Fetches webpage content using a stealth-enabled headless browser.
- Converts captured HTML content into readable Markdown format.
- Bypasses most basic anti-crawling protections.
- Supports user simulation and outputs progress, page title, and Markdown to the console.
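Metadata
FAQ
What is web-fetch?
It fetches web page content with a headless browser that uses the Stealth plugin and converts it to Markdown. Use it when you need the main text of a specific page, such as a news article, a company financial report, or other long-form web content. It can bypass most basic anti-crawler detection. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 841 times so far.
How do I install web-fetch?
Run the command "/install web-anti-crawl-fetch" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is required.
Is web-fetch free?
Yes. web-fetch is completely free and released under the MIT-0 license; you can download, install, and use it freely.
Which platforms does web-fetch support?
web-fetch is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who developed web-fetch?
It is developed and maintained by wei.wu (@dlutwuwei). The current version is v1.0.0.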