tttyix

Anti-Bot Scraper

by tttyix · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Downloads: 247 · Stars: 0 · Active Installs: 0 · Versions: 1
Install in OpenClaw
/install anti-bot-scraper
Description
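A Playwright-based anti-bot web scraping skill. Supports normal, stealth, and batch modes. Built-in anti-detection techniques (hiding webdriver, random User-Agent, Canvas/WebGL fingerprint protection) get past common anti-scraping mechanisms. Use cases: scraping page content, scraping sites with anti-bot protection, batch collection, saving screenshots. Trigger keywords: scrape, 抓取, 爬虫, 反爬, stealt...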
README (SKILL.md)

Stealth Scraper

A Playwright-based anti-bot web scraping skill with normal, stealth, and batch modes.

Description

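A web scraping tool with built-in anti-detection techniques for getting past common anti-scraping mechanisms. The anti-detection code is written by hand and does not depend on any third-party stealth plugin.

Core capabilities:

  • 🕵️ Stealth mode: hides the webdriver fingerprint, randomizes the User-Agent, adds random delays, and blocks fingerprint tracking
  • 📄 Normal mode: fetches page content quickly
  • 📦 Batch mode: scrapes multiple URLs concurrently with automatic rate limiting
  • 🎯 Precise extraction: targets content with CSS selectors
  • 📸 Screenshot and HTML saving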

Configuration

bins: ["node"]

Setup

Install the dependencies before first use:

cd ~/.openclaw/workspace/skills/stealth-scraper
npm install

If Chromium was not installed automatically, run:

node scripts/setup.js

Usage

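Normal mode (fast scraping)

node scripts/scraper-simple.js <URL> [options]

Options:

| Option | Description | Default |
| --- | --- | --- |
| `--wait <ms>` | Time to wait after the page loads, in milliseconds | 2000 |
| `--selector <css>` | CSS selector; only the content of matching elements is extracted | none (extracts everything) |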

Examples:

# Basic scrape
node scripts/scraper-simple.js https://example.com

# Wait 5 seconds before extracting
node scripts/scraper-simple.js https://example.com --wait 5000

# Extract only the article body
node scripts/scraper-simple.js https://example.com --selector "article.content"

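Stealth mode (anti-bot)

node scripts/scraper-stealth.js <URL> [options]

Options:

| Option | Description | Default |
| --- | --- | --- |
| `--wait <ms>` | Time to wait after the page loads, in milliseconds | 2000 |
| `--selector <css>` | CSS selector for precise extraction | |
| `--proxy <url>` | Proxy server address | |
| `--screenshot <path>` | Save a screenshot to the given path | |
| `--html <path>` | Save the full HTML to the given path | |
| `--cookie <json>` | Custom cookies (JSON format) | |
| `--scroll` | Scroll the page automatically to trigger lazy-loaded content | false |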

Examples:

# Stealth scrape
node scripts/scraper-stealth.js https://example.com

# Use a proxy and take a screenshot
node scripts/scraper-stealth.js https://example.com --proxy http://127.0.0.1:7890 --screenshot ./shot.png

# Auto-scroll and save the HTML
node scripts/scraper-stealth.js https://example.com --scroll --html ./page.html

# Visit with cookies
node scripts/scraper-stealth.js https://example.com --cookie '[{"name":"token","value":"abc123","domain":".example.com"}]'

# Precise extraction with a wait
node scripts/scraper-stealth.js https://example.com --selector "div.main" --wait 5000
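
The `--cookie` value is a JSON array, so a malformed argument is easy to produce on the command line. The sketch below shows one way such an argument could be checked before use. `parseCookieArg` is a hypothetical helper, not part of the skill, and how `scraper-stealth.js` actually handles cookies is not shown in this listing. The `path` default reflects that Playwright's `addCookies` expects either a `url` or a `domain`/`path` pair for each cookie.

```javascript
// Hypothetical helper (not from the skill): parse and validate a --cookie value.
function parseCookieArg(jsonText) {
  let cookies;
  try {
    cookies = JSON.parse(jsonText);
  } catch (err) {
    throw new Error(`--cookie is not valid JSON: ${err.message}`);
  }
  if (!Array.isArray(cookies)) {
    throw new Error("--cookie must be a JSON array of cookie objects");
  }
  return cookies.map((cookie, i) => {
    if (cookie === null || typeof cookie !== "object") {
      throw new Error(`cookie #${i} is not an object`);
    }
    if (typeof cookie.name !== "string" || typeof cookie.value !== "string") {
      throw new Error(`cookie #${i} needs string "name" and "value" fields`);
    }
    if (!cookie.url && !cookie.domain) {
      throw new Error(`cookie #${i} needs either "url" or "domain"`);
    }
    // With a domain but no path, default the path to "/" (assumed default).
    return cookie.url ? { ...cookie } : { path: "/", ...cookie };
  });
}
```

Validating up front turns a quoting mistake in the shell into a clear error message instead of a failure partway through a browser session.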

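Batch mode

node scripts/scraper-batch.js [options] <URL1> <URL2> ...
# or
node scripts/scraper-batch.js --file urls.txt [options]

Options:

| Option | Description | Default |
| --- | --- | --- |
| `--file <path>` | URL list file (one URL per line) | |
| `--concurrency <n>` | Number of concurrent pages | 3 |
| `--stealth` | Use stealth mode | false |
| `--wait <ms>` | Wait time for each page, in milliseconds | 2000 |
| `--selector <css>` | CSS selector | |
| `--output <path>` | Path of the output JSON file | stdout |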
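
A URL list file holds one URL per line. As a rough illustration of how such a file could be cleaned before a batch run, the sketch below trims lines, drops duplicates, and separates out entries that are not http(s) URLs. `parseUrlList` is a hypothetical helper, and skipping blank lines and `#` comment lines is an assumption; the listing does not say how `scraper-batch.js` treats them.

```javascript
// Hypothetical helper (not from the skill): validate the text of a URL list file.
function parseUrlList(text) {
  const urls = [];
  const rejected = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Assumption: blank lines and "#" comment lines are ignored.
    if (line === "" || line.startsWith("#")) continue;
    try {
      const url = new URL(line); // throws on malformed input
      if (url.protocol !== "http:" && url.protocol !== "https:") {
        throw new Error("unsupported scheme");
      }
      urls.push(url.href);
    } catch {
      rejected.push(line);
    }
  }
  // Drop duplicates while keeping first-seen order.
  return { urls: [...new Set(urls)], rejected };
}
```

Reporting the rejected lines separately makes it easier to fix the list before spending time on a long batch job.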

Examples:

# Scrape several URLs in one batch
node scripts/scraper-batch.js https://a.com https://b.com https://c.com

# Read the URL list from a file, in stealth mode
node scripts/scraper-batch.js --file urls.txt --stealth --concurrency 2

# Write the output to a file
node scripts/scraper-batch.js --file urls.txt --output results.json
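
The `--concurrency` option caps how many pages are processed at once. The sketch below illustrates the general worker-pool pattern behind such a cap. It is not the skill's implementation, which is not shown in this listing, and `mapWithConcurrency` and its result shape are hypothetical.

```javascript
// Hypothetical helper: run `task` over `items` with at most `limit` in flight.
async function mapWithConcurrency(items, limit, task) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function worker() {
    // Each worker keeps claiming items until none are left.
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { success: true, value: await task(items[index]) };
      } catch (err) {
        // A failed item is recorded and does not stop the batch.
        const message = err instanceof Error ? err.message : String(err);
        results[index] = { success: false, error: message };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // same order as `items`
}
```

A call such as `mapWithConcurrency(urls, 3, scrapeOne)` would keep three pages in flight, where `scrapeOne` stands in for whatever function scrapes a single URL. Results are stored by index, so the output order matches the input order even when pages finish out of order.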

Output Format

All modes emit the same JSON structure:

{
  "success": true,
  "url": "https://example.com",
  "title": "Example Domain",
  "content": "Page text content...",
  "links": [{"text": "More info", "href": "https://..."}],
  "images": [{"alt": "Logo", "src": "https://..."}],
  "metadata": {
    "description": "...",
    "keywords": "...",
    "author": "..."
  },
  "elapsedSeconds": 2.35
}
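
As a small illustration of consuming this structure, the sketch below reduces one result object to a short summary. `summarizeResult` is a hypothetical helper. The field names come from the sample above, while the shape of a failed result is not documented here, so treating anything other than `success: true` as a failure is an assumption.

```javascript
// Hypothetical helper: summarize one scraper result given as JSON text.
function summarizeResult(jsonText) {
  const result = JSON.parse(jsonText);
  // Assumption: any value other than `success: true` means the scrape failed.
  if (result.success !== true) {
    return { ok: false, url: result.url ?? null };
  }
  return {
    ok: true,
    url: result.url,
    title: result.title,
    contentLength: (result.content ?? "").length,
    linkCount: (result.links ?? []).length,
    imageCount: (result.images ?? []).length,
    hasDescription: Boolean(result.metadata && result.metadata.description),
    elapsedSeconds: result.elapsedSeconds,
  };
}
```

The defaults for `content`, `links`, and `images` keep the summary from throwing if a field is missing, for example when a selector matches nothing.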

Anti-Detection Features

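| Technique | Description |
| --- | --- |
| navigator.webdriver hiding | Removes the webdriver property so the page looks like a regular browser |
| User-Agent rotation | 10+ real User-Agent strings covering Chrome, Safari, and Firefox on desktop and mobile |
| Random delay | Random waits of 1-3 seconds that mimic human browsing |
| Random viewport | Random resolution, avoiding a fixed-window fingerprint |
| WebGL fingerprint protection | Injects noise to disrupt WebGL fingerprinting |
| Canvas fingerprint protection | Adds slight noise to Canvas data |
| Plugin spoofing | Fakes the navigator.plugins array |
| Language spoofing | Fakes navigator.languages |
| Hardware concurrency spoofing | Randomizes navigator.hardwareConcurrency |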

Notes

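  • The anti-detection code is written by hand and uses no third-party stealth plugin (such as puppeteer-extra-plugin-stealth), to avoid being recognized by bot-detection systems
  • Uses Playwright rather than Puppeteer, because Playwright provides a better foundation for anti-detection
  • All anti-detection code is injected before the page loads via addInitScript
  • Respect the target site's robots.txt and terms of service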
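
To make the robots.txt note concrete, the sketch below checks a path against robots.txt rules before a URL is queued. It is a deliberately simplified illustration and not part of the skill: `isPathAllowed` is a hypothetical helper that handles only plain prefix rules (no `*` or `$` wildcards), picks the longest matching rule, and lets `Allow` win ties.

```javascript
// Hypothetical helper: simplified robots.txt check (prefix rules only).
function isPathAllowed(robotsTxt, userAgent, path) {
  const groups = []; // each group: { agents: [...], rules: [{ allow, prefix }] }
  let current = null;
  let lastWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // strip comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group.
      if (!lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;
    if ((field === "allow" || field === "disallow") && current && value !== "") {
      current.rules.push({ allow: field === "allow", prefix: value });
    }
  }
  const ua = userAgent.toLowerCase();
  const specific = groups.filter((g) =>
    g.agents.some((a) => a !== "*" && a !== "" && ua.includes(a)));
  // Fall back to the "*" group when no named group matches.
  const chosen = specific.length > 0
    ? specific
    : groups.filter((g) => g.agents.includes("*"));
  let best = null;
  for (const rule of chosen.flatMap((g) => g.rules)) {
    if (!path.startsWith(rule.prefix)) continue;
    const longer = best === null || rule.prefix.length > best.prefix.length;
    const tieAllow = best !== null &&
      rule.prefix.length === best.prefix.length && rule.allow;
    if (longer || tieAllow) best = rule;
  }
  return best === null ? true : best.allow; // no matching rule means allowed
}
```

For production use, a maintained robots.txt parser that implements wildcard matching would be a safer choice than this sketch.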
Usage Guidance
This package appears to be what it says: a Playwright-based stealth scraper. Before installing or running it, consider the following:
  1. Install in a controlled or sandboxed environment (container or VM), because npm install and npx playwright install chromium download and install packages and a browser binary to disk.
  2. Review the remainder of the stealth injection code (the SKILL listing was truncated) to ensure there are no unexpected network callbacks or telemetry.
  3. Be aware that fingerprint-evasion code is sensitive and may violate target sites' terms of service or local law. Use it responsibly and ethically.
  4. Verify package sources (npm registry or mirror) and run 'npm audit' if possible.
  5. When running batch jobs, avoid providing sensitive credentials via command-line or cookie arguments, and monitor outgoing network traffic (proxy usage) to ensure no data exfiltration.
If you need higher assurance, request a full untruncated review of scripts/scraper-stealth.js and monitor the first npm install in an isolated environment.
Capability Analysis
Type: OpenClaw Skill
Name: anti-bot-scraper
Version: 1.0.0

The skill bundle is a legitimate web scraping toolset using Playwright, featuring simple, stealth, and batch processing modes. The 'stealth' functionality in `scripts/scraper-stealth.js` implements standard anti-fingerprinting techniques (spoofing User-Agents, navigator properties, and adding Canvas/WebGL noise) to bypass bot detection on target websites. The code is well-structured, lacks obfuscation, and contains no evidence of data exfiltration, malicious command execution, or prompt injection.
Capability Assessment
Purpose & Capability
Name/description (anti-bot Playwright scraper) match the included files and runtime behavior: scripts implement simple, stealth, and batch scraping, with UA/viewports/fingerprint tweaks and optional proxy/cookie input. There are no unrelated credentials, binaries, or config paths requested.
Instruction Scope
SKILL.md instructs the agent/user to run npm install, optionally run scripts/setup.js which itself may run npm install and npx playwright install chromium, and to execute the provided node scripts. The scripts read URL lists, accept proxy/cookie inputs, and write screenshots/HTML/JSON output to disk — all consistent with scraping functionality. One caveat: the provided stealth injection code (truncated in the package listing) modifies many browser-exposed APIs to evade detection; this is expected for the stated purpose but is also sensitive behavior (fingerprint evasion).
Install Mechanism
No platform install spec is present, but package.json and package-lock.json require npm install and the postinstall runs 'npx playwright install chromium' (plus scripts/setup.js also runs npm and npx). This will download packages and browser binaries (Playwright/Chromium) from registries/mirrors. This is expected for a Playwright-based tool but is a higher-risk install action than an instruction-only skill because it writes binaries to disk and runs lifecycle scripts.
Credentials
The skill requests no environment variables or credentials. Command-line options accept proxy URLs and cookie JSON (user-supplied); that is appropriate for a scraper. There are no hidden env accesses in the visible code. No broad credential access or unrelated env vars are requested.
Persistence & Privilege
Skill flags are default (always: false, user-invocable true) and it does not request permanent agent presence or modify other skills' configs. It does install local dependencies and browser binaries into the user's environment when npm install / npx playwright runs, which is normal for Playwright tools but should be run with user consent.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install anti-bot-scraper
  3. After installation, invoke the skill by name or use /anti-bot-scraper
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
v1.0.0: 11 anti-detection techniques, batch mode, CSS selector, proxy support. Tested on Xiaohongshu.
Metadata
Slug anti-bot-scraper
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Anti-Bot Scraper?

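A Playwright-based anti-bot web scraping skill. Supports normal, stealth, and batch modes. Built-in anti-detection techniques (hiding webdriver, random User-Agent, Canvas/WebGL fingerprint protection) get past common anti-scraping mechanisms. Use cases: scraping page content, scraping sites with anti-bot protection, batch collection, saving screenshots. Trigger keywords: scrape, 抓取, 爬虫, 反爬, stealt... It is an AI Agent Skill for Claude Code / OpenClaw, with 247 downloads so far.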

How do I install Anti-Bot Scraper?

Run "/install anti-bot-scraper" in the OpenClaw or Claude Code chat to install it. Before first use, install the dependencies with npm install as described under Setup.

Is Anti-Bot Scraper free?

Yes, Anti-Bot Scraper is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Anti-Bot Scraper support?

Anti-Bot Scraper is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Anti-Bot Scraper?

It is built and maintained by tttyix (@tttyix); the current version is v1.0.0.
