Total downloads: 247
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install anti-bot-scraper
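Description
A Playwright-based anti-bot web scraping skill. Supports normal, stealth, and batch modes. Built-in anti-detection techniques (hiding webdriver, random UA, Canvas/WebGL fingerprint protection) get around common anti-bot mechanisms. Use cases: scraping page content, scraping sites with anti-bot protection, batch collection, saving screenshots. Trigger keywords: scrape, 抓取, 爬虫, 反爬, stealt...
Usage instructions (SKILL.md)
Stealth Scraper
A Playwright-based anti-bot web scraping skill that supports normal, stealth, and batch modes.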
Description
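A web scraping tool with built-in anti-detection techniques that get around common anti-bot mechanisms. The anti-detection code is hand-written and does not depend on any third-party stealth plugin.
Core capabilities:
- 🕵️ Stealth mode: hides the webdriver fingerprint, randomizes the UA, adds random delays, blocks fingerprint tracking
- 📄 Normal mode: fast page content scraping
- 📦 Batch mode: scrapes multiple URLs concurrently with automatic rate limiting
- 🎯 Targeted extraction: scrape specific elements by CSS selector
- 📸 Screenshot and HTML saving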
Configuration
bins: ["node"]
Setup
Install the dependencies before first use:
cd ~/.openclaw/workspace/skills/stealth-scraper
npm install
If Chromium was not installed automatically, run:
node scripts/setup.js
Usage
Normal mode (fast scraping)
node scripts/scraper-simple.js <URL> [options]
Options:
| Option | Description | Default |
|---|---|---|
| `--wait <ms>` | Time to wait after page load, in milliseconds | 2000 |
| `--selector <css>` | CSS selector; only the content of matching elements is extracted | none (extract everything) |
Examples:
# Basic scrape
node scripts/scraper-simple.js https://example.com
# Wait 5 seconds before extracting
node scripts/scraper-simple.js https://example.com --wait 5000
# Extract only the article body
node scripts/scraper-simple.js https://example.com --selector "article.content"
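The two normal-mode options could be parsed along the following lines. This is a minimal sketch, not the code in scripts/scraper-simple.js: the function name `parseSimpleArgs` and its error messages are hypothetical, and only the flag names and defaults are taken from the option table.

```javascript
// Hypothetical helper: parses the documented normal-mode flags.
// Only the flag names and defaults (--wait 2000, --selector none) come from the table.
function parseSimpleArgs(argv) {
  const options = { url: null, wait: 2000, selector: null };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--wait") {
      const ms = Number(argv[++i]);
      if (!Number.isFinite(ms) || ms < 0) {
        throw new Error("--wait expects a non-negative number of milliseconds");
      }
      options.wait = ms;
    } else if (arg === "--selector") {
      const selector = argv[++i];
      if (selector === undefined) throw new Error("--selector expects a CSS selector");
      options.selector = selector;
    } else if (!arg.startsWith("--") && options.url === null) {
      options.url = arg; // the first positional argument is treated as the URL
    } else {
      throw new Error(`Unknown argument: ${arg}`);
    }
  }
  if (options.url === null) throw new Error("Missing URL");
  return options;
}

// Example: the arguments from the "extract only the article body" command.
console.log(parseSimpleArgs(["https://example.com", "--selector", "article.content"]));
```

In a real script the input would typically be `process.argv.slice(2)`.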
Stealth mode (anti-bot)
node scripts/scraper-stealth.js <URL> [options]
Options:
| Option | Description | Default |
|---|---|---|
| `--wait <ms>` | Time to wait after page load, in milliseconds | 2000 |
| `--selector <css>` | CSS selector for targeted extraction | none |
| `--proxy <url>` | Proxy server address | none |
| `--screenshot <path>` | Save a screenshot to the given path | none |
| `--html <path>` | Save the full HTML to the given path | none |
| `--cookie <json>` | Custom cookies (JSON format) | none |
| `--scroll` | Scroll the page automatically to trigger lazy-loaded content | false |
Examples:
# Stealth scrape
node scripts/scraper-stealth.js https://example.com
# Use a proxy and take a screenshot
node scripts/scraper-stealth.js https://example.com --proxy http://127.0.0.1:7890 --screenshot ./shot.png
# Auto-scroll and save the HTML
node scripts/scraper-stealth.js https://example.com --scroll --html ./page.html
# Visit with cookies
node scripts/scraper-stealth.js https://example.com --cookie '[{"name":"token","value":"abc123","domain":".example.com"}]'
# Targeted extraction with a wait
node scripts/scraper-stealth.js https://example.com --selector "div.main" --wait 5000
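The `--cookie` value is a JSON array of cookie objects. Playwright's `browserContext.addCookies` expects each cookie to carry either a `url` or a `domain`/`path` pair, so a script might validate and normalize the argument before using it. The sketch below shows one way to do that. The helper name `parseCookieArg` and the choice to default `path` to `"/"` are assumptions for illustration; the listing does not show how scripts/scraper-stealth.js handles this.

```javascript
// Hypothetical helper, not taken from scripts/scraper-stealth.js.
// Parses the --cookie JSON string and checks the fields Playwright needs.
function parseCookieArg(json) {
  let cookies;
  try {
    cookies = JSON.parse(json);
  } catch (err) {
    throw new Error(`--cookie is not valid JSON: ${err.message}`);
  }
  if (!Array.isArray(cookies)) throw new Error("--cookie must be a JSON array");
  return cookies.map((cookie, index) => {
    if (cookie === null || typeof cookie !== "object") {
      throw new Error(`cookie #${index} must be an object`);
    }
    if (typeof cookie.name !== "string" || typeof cookie.value !== "string") {
      throw new Error(`cookie #${index} needs string "name" and "value" fields`);
    }
    if (!cookie.url && !cookie.domain) {
      throw new Error(`cookie #${index} needs either "url" or "domain"`);
    }
    // Assumption: when only a domain is given, default the path to "/".
    if (!cookie.url && !cookie.path) return { ...cookie, path: "/" };
    return cookie;
  });
}

// Example: the cookie string from the command above.
console.log(parseCookieArg('[{"name":"token","value":"abc123","domain":".example.com"}]'));
```

The returned array has the shape that could then be passed to `context.addCookies(...)` on a Playwright browser context.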
Batch mode
node scripts/scraper-batch.js [options] <URL1> <URL2> ...
# or
node scripts/scraper-batch.js --file urls.txt [options]
Options:
| Option | Description | Default |
|---|---|---|
| `--file <path>` | URL list file (one URL per line) | none |
| `--concurrency <n>` | Number of concurrent scrapes | 3 |
| `--stealth` | Use stealth mode | false |
| `--wait <ms>` | Wait time per page, in milliseconds | 2000 |
| `--selector <css>` | CSS selector | none |
| `--output <path>` | Output JSON file path | stdout |
Examples:
# Scrape several URLs in one batch
node scripts/scraper-batch.js https://a.com https://b.com https://c.com
# Read the URL list from a file, in stealth mode
node scripts/scraper-batch.js --file urls.txt --stealth --concurrency 2
# Write the output to a file
node scripts/scraper-batch.js --file urls.txt --output results.json
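To make the `--file` and `--concurrency` options concrete, here is a minimal sketch of a one-URL-per-line parser and a bounded worker pool. It is not the implementation in scripts/scraper-batch.js: the names `parseUrlList` and `runWithConcurrency` are hypothetical, and the `error` field on failed items is an assumption, since the listing only documents the success shape.

```javascript
// Hypothetical helpers; the real scraper-batch.js may work differently.

// Turn the contents of a URL list file (one URL per line) into an array,
// trimming whitespace and skipping blank lines.
function parseUrlList(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Run `worker` over `items` with at most `concurrency` calls in flight.
// Results keep the input order. A rejected call is recorded as
// { success: false, url, error } (the `error` field is an assumption).
async function runWithConcurrency(items, concurrency, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function lane() {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = await worker(items[index], index);
      } catch (err) {
        results[index] = { success: false, url: items[index], error: String(err.message || err) };
      }
    }
  }
  const laneCount = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}

// Demo with a stand-in worker; a real worker would scrape each URL.
const urls = parseUrlList("https://a.com\nhttps://b.com\n\nhttps://c.com\n");
runWithConcurrency(urls, 3, async (url) => ({ success: true, url })).then((results) => {
  console.log(JSON.stringify(results, null, 2));
});
```

Each lane pulls the next unclaimed index, so a slow page holds up only its own lane rather than the whole batch.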
Output Format
All modes output the same JSON structure:
{
  "success": true,
  "url": "https://example.com",
  "title": "Example Domain",
  "content": "Page text content...",
  "links": [{"text": "More info", "href": "https://..."}],
  "images": [{"alt": "Logo", "src": "https://..."}],
  "metadata": {
    "description": "...",
    "keywords": "...",
    "author": "..."
  },
  "elapsedSeconds": 2.35
}
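A downstream script can consume this structure with a plain `JSON.parse`. The sketch below builds a one-line summary from a result. The helper name `summarizeResult` is hypothetical, and the handling of `success: false` is an assumption, because the listing only shows a successful result.

```javascript
// Hypothetical consumer of the documented JSON structure.
function summarizeResult(jsonText) {
  const result = JSON.parse(jsonText);
  // Assumption: a failed scrape reports success !== true; only the success shape is documented.
  if (result.success !== true) {
    return `FAILED ${result.url ?? "(unknown URL)"}`;
  }
  const links = Array.isArray(result.links) ? result.links : [];
  const images = Array.isArray(result.images) ? result.images : [];
  return `${result.title} (${result.url}): ${links.length} links, ${images.length} images, ${result.elapsedSeconds}s`;
}

// Sample input mirroring the documented fields.
const sample = JSON.stringify({
  success: true,
  url: "https://example.com",
  title: "Example Domain",
  content: "Page text content...",
  links: [{ text: "More info", href: "https://..." }],
  images: [{ alt: "Logo", src: "https://..." }],
  metadata: { description: "...", keywords: "...", author: "..." },
  elapsedSeconds: 2.35,
});

console.log(summarizeResult(sample));
// prints "Example Domain (https://example.com): 1 links, 1 images, 2.35s"
```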
Anti-Detection Features
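| Technique | Description |
|---|---|
| navigator.webdriver hiding | Removes the webdriver property so the browser looks like a regular one |
| User-Agent rotation | 10+ real UAs covering Chrome/Safari/Firefox, desktop and mobile |
| Random delay | Random wait of 1-3 seconds to mimic human browsing |
| Random viewport | Random resolution to avoid a fixed-window fingerprint |
| WebGL fingerprint protection | Injects noise to interfere with WebGL fingerprinting |
| Canvas fingerprint protection | Adds slight noise to Canvas data |
| Plugin spoofing | Fakes the navigator.plugins array |
| Language spoofing | Fakes navigator.languages |
| Hardware concurrency spoofing | Randomizes navigator.hardwareConcurrency |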
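Three of the rows above (User-Agent rotation, random delay, random viewport) amount to picking values at random per session. The sketch below illustrates the idea under stated assumptions: the helper names are hypothetical, and the UA strings and viewport sizes are illustrative placeholders, not the skill's actual lists. Only the 1-3 second delay range comes from the table.

```javascript
// Hypothetical helpers; the lists below are placeholders, not the skill's real data.
const USER_AGENTS = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
  "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
];

const VIEWPORTS = [
  { width: 1920, height: 1080 },
  { width: 1536, height: 864 },
  { width: 1366, height: 768 },
];

// Pick one element uniformly; `random` is injectable so the choice can be made deterministic.
function pickRandom(list, random = Math.random) {
  return list[Math.floor(random() * list.length)];
}

// Delay in milliseconds within [minMs, maxMs]; defaults match the 1-3 second row.
function randomDelayMs(minMs = 1000, maxMs = 3000, random = Math.random) {
  return Math.round(minMs + random() * (maxMs - minMs));
}

// Bundle one random choice of each kind for a scraping session.
function randomProfile(random = Math.random) {
  return {
    userAgent: pickRandom(USER_AGENTS, random),
    viewport: pickRandom(VIEWPORTS, random),
    delayMs: randomDelayMs(1000, 3000, random),
  };
}

console.log(randomProfile());
```

With Playwright, the `userAgent` and `viewport` values map onto the options of the same names in `browser.newContext({ userAgent, viewport })`.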
Notes
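- The anti-detection code is hand-written and uses no third-party stealth plugin (such as puppeteer-extra-plugin-stealth), to avoid being recognized by bot-detection systems
- Uses Playwright rather than Puppeteer, because Playwright provides a better foundation for anti-detection
- All anti-detection code is injected via addInitScript before the page loads
- Please respect the target site's robots.txt and terms of service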
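As an illustration of injecting patches before page scripts run, the sketch below builds a script string that could be handed to Playwright's `addInitScript`, which accepts a string. It is not the skill's code, which this listing does not show. The names `patchNavigator` and `buildInitScript` are hypothetical, and the sketch overrides `navigator.webdriver` with a getter returning `undefined` rather than deleting the property as the feature table describes.

```javascript
// Hypothetical patch function. It takes the navigator object as a parameter
// so it can be exercised against a plain object in Node.
function patchNavigator(nav, opts) {
  // Make navigator.webdriver read as undefined (an override, not a deletion).
  Object.defineProperty(nav, "webdriver", { get: () => undefined, configurable: true });
  // Report the configured language list.
  Object.defineProperty(nav, "languages", { get: () => opts.languages.slice(), configurable: true });
  // Report the configured CPU core count.
  Object.defineProperty(nav, "hardwareConcurrency", {
    get: () => opts.hardwareConcurrency,
    configurable: true,
  });
}

// Serialize the function and its options into a self-contained script string.
// In the page, `navigator` resolves to the real browser object.
function buildInitScript(opts) {
  return `(${patchNavigator.toString()})(navigator, ${JSON.stringify(opts)});`;
}

// Node-only demo against a fake navigator object.
const fakeNavigator = { webdriver: true };
patchNavigator(fakeNavigator, { languages: ["en-US", "en"], hardwareConcurrency: 8 });
console.log(fakeNavigator.webdriver, fakeNavigator.languages, fakeNavigator.hardwareConcurrency);
```

With Playwright, the string would be registered before navigation, for example `await context.addInitScript(buildInitScript({ languages: ["en-US", "en"], hardwareConcurrency: 8 }))`, so the patch is in place before any page script can read these properties.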
Security recommendations
This package appears to be what it says: a Playwright-based stealth scraper. Before installing or running it, consider the following: (1) install in a controlled/sandbox environment (container or VM) because npm install and npx playwright install chromium will download and install packages and a browser binary to disk; (2) review the remainder of the stealth injection code (the SKILL listing was truncated) to ensure there are no unexpected network callbacks or telemetry; (3) be aware that fingerprint-evasion code is sensitive and may violate target sites' terms of service or local law—use responsibly and ethically; (4) verify package sources (npm registry/mirror) and run 'npm audit' if possible; (5) when running batch jobs, avoid providing sensitive credentials via command-line/cookie arguments and monitor outgoing network traffic (proxy usage) to ensure no data exfiltration. If you need higher assurance, request a full untruncated review of scripts/scraper-stealth.js and monitor the first npm install in an isolated environment.
Functional analysis
Type: OpenClaw Skill
Name: anti-bot-scraper
Version: 1.0.0
The skill bundle is a legitimate web scraping toolset using Playwright, featuring simple, stealth, and batch processing modes. The 'stealth' functionality in `scripts/scraper-stealth.js` implements standard anti-fingerprinting techniques (spoofing User-Agents, navigator properties, and adding Canvas/WebGL noise) to bypass bot detection on target websites. The code is well-structured, lacks obfuscation, and contains no evidence of data exfiltration, malicious command execution, or prompt injection.
Capability assessment
Purpose & Capability
Name/description (anti-bot Playwright scraper) match the included files and runtime behavior: scripts implement simple, stealth, and batch scraping, with UA/viewports/fingerprint tweaks and optional proxy/cookie input. There are no unrelated credentials, binaries, or config paths requested.
Instruction Scope
SKILL.md instructs the agent/user to run npm install, optionally run scripts/setup.js which itself may run npm install and npx playwright install chromium, and to execute the provided node scripts. The scripts read URL lists, accept proxy/cookie inputs, and write screenshots/HTML/JSON output to disk — all consistent with scraping functionality. One caveat: the provided stealth injection code (truncated in the package listing) modifies many browser-exposed APIs to evade detection; this is expected for the stated purpose but is also sensitive behavior (fingerprint evasion).
Install Mechanism
No platform install spec is present, but package.json and package-lock.json require npm install and the postinstall runs 'npx playwright install chromium' (plus scripts/setup.js also runs npm and npx). This will download packages and browser binaries (Playwright/Chromium) from registries/mirrors. This is expected for a Playwright-based tool but is a higher-risk install action than an instruction-only skill because it writes binaries to disk and runs lifecycle scripts.
Credentials
The skill requests no environment variables or credentials. Command-line options accept proxy URLs and cookie JSON (user-supplied); that is appropriate for a scraper. There are no hidden env accesses in the visible code. No broad credential access or unrelated env vars are requested.
Persistence & Privilege
Skill flags are default (always: false, user-invocable true) and it does not request permanent agent presence or modify other skills' configs. It does install local dependencies and browser binaries into the user's environment when npm install / npx playwright runs, which is normal for Playwright tools but should be run with user consent.
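How to use
- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: /install anti-bot-scraper
- Once installed, call the Skill by name or trigger it with /anti-bot-scraper
- Provide the required inputs according to the Skill's option tables to get structured output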
Version history
v1.0.0
v1.0.0: 11 anti-detection techniques, batch mode, CSS selector, proxy support. Tested on Xiaohongshu.
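FAQ
What is Anti-Bot Scraper?
A Playwright-based anti-bot web scraping skill. Supports normal, stealth, and batch modes. Built-in anti-detection techniques (hiding webdriver, random UA, Canvas/WebGL fingerprint protection) get around common anti-bot mechanisms. Use cases: scraping page content, scraping sites with anti-bot protection, batch collection, saving screenshots. Trigger keywords: scrape, 抓取, 爬虫, 反爬, stealt... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 247 total downloads so far.
How do I install Anti-Bot Scraper?
Run the command "/install anti-bot-scraper" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.
Is Anti-Bot Scraper free?
Yes. Anti-Bot Scraper is completely free and released under the MIT-0 license; it can be downloaded, installed, and used freely.
Which platforms does Anti-Bot Scraper support?
Anti-Bot Scraper is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who develops Anti-Bot Scraper?
It is developed and maintained by tttyix (@tttyix); the current version is v1.0.0.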