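Description

Avoid being identified and blocked by anti-crawler mechanisms. Use when the user needs web crawling, data collection, or API access, or asks how to bypass anti-crawler measures, avoid IP bans, or hide a crawler's identity. Trigger words include: 反爬, 反爬虫, 绕过反爬, 避免封禁, 爬虫伪装, 隐身爬取 (anti-crawling, anti-crawler, bypass anti-crawling, avoid bans, crawler disguise, stealth crawling).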

Usage (SKILL.md)

Anti-Crawler Evasion Skill

Name: Anti-Crawler Evasion
Author: plover061

Core strategy overview

1. Request disguise strategies

User-Agent rotation

import random

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36"
]

def get_random_ua():
    return random.choice(USER_AGENTS)

Complete request headers

HEADERS = {
    "User-Agent": get_random_ua(),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Cache-Control": "max-age=0"
}

2. IP rotation

Proxy pool strategy

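import random

import requests

def _proxy(url):
    # Map both schemes so that HTTPS requests also go through the proxy.
    return {"http": url, "https": url}

# Placeholder endpoints: substitute your own proxy hosts and credentials.
PROXY_POOL = [
    _proxy("http://user:pass@proxy1.example.com:8080"),
    _proxy("http://user:pass@proxy2.example.com:8080"),
    _proxy("http://user:pass@proxy3.example.com:8080"),
]

def get_random_proxy():
    return random.choice(PROXY_POOL)

def fetch_with_proxy(url):
    proxy = get_random_proxy()
    # HEADERS fixes one User-Agent at import time, so pick a fresh one per request.
    headers = {**HEADERS, "User-Agent": get_random_ua()}
    # A timeout keeps a dead proxy from hanging the request indefinitely.
    return requests.get(url, proxies=proxy, headers=headers, timeout=30)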

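Choosing a proxy type

Proxy type	Anonymity	Use case	Cost
Residential proxy	High	Bypassing advanced anti-crawler defenses	High
Datacenter proxy	Medium	Routine crawling	Medium
Rotating proxy	High	Large-scale collection	High
Free proxy	Low	Testing/demos	None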

3. Request rate control

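import random
import time

class RateLimiter:
    def __init__(self, min_delay=3, max_delay=10):
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.last_request = 0

    def wait(self):
        """Sleep for a random interval between min_delay and max_delay seconds."""
        delay = random.uniform(self.min_delay, self.max_delay)
        time.sleep(delay)

    def adaptive_wait(self, response):
        """Adapt the delay to the response status."""
        if response.status_code == 429:
            # Rate limited: honor a numeric Retry-After header, otherwise wait 60 s.
            retry_after = response.headers.get("Retry-After", "")
            time.sleep(int(retry_after) if retry_after.isdigit() else 60)
        elif response.status_code == 200:
            # After a successful request, wait somewhat longer to lower the risk of a ban.
            delay = random.uniform(self.min_delay, self.max_delay) * 1.5
            time.sleep(delay)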
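
The fixed 60-second pause on HTTP 429 can also be replaced by a delay that grows with each consecutive failure. The following is a minimal sketch of exponential backoff with jitter, not part of the original skill: the names `backoff_delay` and `parse_retry_after`, and the `base` and `cap` defaults, are assumptions chosen for illustration.

```python
import random


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Return a delay in seconds for a 0-based retry attempt.

    The upper bound doubles with each attempt and is capped at `cap`;
    the actual delay is drawn uniformly from [0, bound] ("full jitter").
    """
    bound = min(cap, base * (2 ** attempt))
    return random.uniform(0, bound)


def parse_retry_after(value, default=60.0):
    """Interpret a Retry-After header given in whole seconds.

    Falls back to `default` when the header is missing or is an HTTP date.
    """
    if value is not None and value.strip().isdigit():
        return float(value.strip())
    return default
```

A caller would sleep for `parse_retry_after(response.headers.get("Retry-After"))` when the server supplies the header, and for `backoff_delay(attempt)` otherwise.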

4. Browser fingerprint evasion

Fingerprint randomization

from selenium import webdriver
from selenium_stealth import stealth

def create_stealth_driver():
    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument('--disable-blink-features=AutomationControlled')

    driver = webdriver.Chrome(options=options)

    stealth(driver,
        languages=["en-US", "en", "zh-CN", "zh"],
        vendor="Google Inc.",
        platform="Win32",
        webgl_vendor="Intel Inc.",
        renderer="Intel Iris OpenGL Engine",
        fix_hairline=True,
    )

    return driver

5. Cookie and session management

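import requests

# A requests.Session already keeps cookies across requests,
# so no separate cookie jar needs to be assigned.
session = requests.Session()

# Import cookies from a real browser
def import_browser_cookies(session, browser="chrome"):
    """Import cookies from a browser to pass the initial check."""
    # Implementation details depend on the target browser.
    pass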

6. JavaScript challenge bypass

Handling JS rendering with Playwright/Selenium

from playwright.sync_api import sync_playwright

def scrape_dynamic_page(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            user_agent=get_random_ua(),
            viewport={'width': 1920, 'height': 1080}
        )
        page = context.new_page()

        # Simulate human behavior
        page.goto(url)
        page.mouse.wheel(0, 500)  # simulate scrolling
        page.wait_for_timeout(2000)  # pause for a fixed 2 seconds

        content = page.content()
        browser.close()
        return content

7. CAPTCHA handling strategies

Third-party CAPTCHA services

# 2Captcha API integration
import time

import requests

def solve_captcha(site_key, page_url):
    """Solve a CAPTCHA through 2Captcha."""
    api_key = "YOUR_API_KEY"

    # Submit the CAPTCHA
    submit_url = f"http://2captcha.com/in.php?key={api_key}&method=userrecaptcha&googlekey={site_key}&pageurl={page_url}"
    resp = requests.get(submit_url)
    if '|' not in resp.text:
        return None  # no "|<id>" part in the reply, so there is no job to poll
    captcha_id = resp.text.split('|')[1]

    # Poll for the result
    for _ in range(30):
        time.sleep(5)
        result_url = f"http://2captcha.com/res.php?key={api_key}&action=get&id={captcha_id}"
        result = requests.get(result_url)
        if result.text.startswith('OK'):
            return result.text.split('|')[1]

    return None

Advanced evasion techniques

Distributed crawling architecture

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Master    │────▶│   Workers   │────▶│  Proxy Pool │
│   Server    │     │ (multiple)  │     │ (rotate IP) │
└─────────────┘     └─────────────┘     └─────────────┘
       │                   │                   │
       └───────────────────┴───────────────────┘
                           │
                    ┌──────▼──────┐
                    │  Task Queue │
                    │  (Redis)    │
                    └─────────────┘
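
The master/worker/queue relationship in the diagram can be illustrated in a single process. The sketch below is an assumption-laden stand-in, not the skill's implementation: it uses the standard library's `queue.Queue` and threads in place of Redis and separate worker machines, and `run_workers` and `handler` are hypothetical names.

```python
import queue
import threading


def run_workers(urls, handler, num_workers=3):
    """Distribute `urls` across worker threads through a shared task queue.

    `handler` is any callable that takes a URL and returns a result.
    Returns a list of (url, result) pairs in completion order.
    """
    tasks = queue.Queue()
    results = []
    lock = threading.Lock()

    # The "master" role: enqueue every task up front.
    for url in urls:
        tasks.put(url)

    def worker():
        # Each worker pulls tasks until the queue is empty.
        while True:
            try:
                url = tasks.get_nowait()
            except queue.Empty:
                return
            try:
                outcome = handler(url)
                with lock:
                    results.append((url, outcome))
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

In a real deployment the queue would live in an external store so that workers on different hosts can share it; the pull-until-empty loop stays the same.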

Behavior simulation

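from mouse import move, click, get_position
from keyboard import write, press
import random
import time

def human_behavior_simulation(driver, element):
    """Simulate human-like interaction with an element."""
    rect = element.rect

    # Random offset for the mouse target
    x = rect['x'] + random.randint(5, 20)
    y = rect['y'] + random.randint(5, 20)

    # Move the mouse along a non-straight path
    move_in_human_pattern(x, y)

    # Random delay
    time.sleep(random.uniform(0.1, 0.3))

    # Click at the current pointer position
    # (mouse.click takes a button name, not coordinates)
    click()

def move_in_human_pattern(target_x, target_y):
    """Move the mouse along a human-like path."""
    # Add random intermediate points.
    # generate_human_path is not defined in this skill; it is expected to
    # return a list of (x, y) points from the current position to the target.
    current_x, current_y = get_position()
    points = generate_human_path(current_x, current_y, target_x, target_y)

    for x, y in points:
        move(x, y)
        time.sleep(random.uniform(0.01, 0.03))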

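Detection evasion checklist

Detection type	Evasion method	Priority
IP rate detection	Proxy rotation + delays	High
User-Agent detection	UA rotation pool	High
Cookie/session detection	Real browser cookies	Medium
Behavior pattern detection	Human-like interaction simulation	High
Browser fingerprint detection	Stealth mode	Medium
JavaScript detection	Use a real browser	High
CAPTCHA	Third-party solving service	As needed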

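Best practices

Gradual rollout: start at a low request rate and adjust the strategy step by step
Monitor responses: watch HTTP status codes and response times closely
Fallbacks: prepare multiple data sources to avoid depending on a single one
Follow the rules: give priority to complying with robots.txt and the site's terms
Logging: record requests and responses in detail to make troubleshooting easier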
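
The robots.txt practice above can be checked in code before any request is sent. This is a minimal sketch using Python's standard `urllib.robotparser`; the helper name `build_robot_parser`, the user agent `example-bot`, and the sample rules are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser


def build_robot_parser(robots_txt):
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical rules; a real crawler would download the site's own file.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = build_robot_parser(EXAMPLE_ROBOTS)
allowed = parser.can_fetch("example-bot", "https://example.com/public/page")
blocked = parser.can_fetch("example-bot", "https://example.com/private/page")
delay = parser.crawl_delay("example-bot")
```

With these sample rules, `allowed` is true, `blocked` is false, and `delay` is the Crawl-delay value, which can serve as the lower bound for the pause between requests.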

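Common anti-crawler scenarios

Scenario 1: IP ban

Symptom: HTTP 403/429
Solution: connect a proxy pool and lower the request rate

Scenario 2: CAPTCHA block

Symptom: a CAPTCHA appears on the page
Solution: use a CAPTCHA solving service or handle it manually

Scenario 3: JavaScript rendering

Symptom: page content is empty or encrypted
Solution: use Playwright/Selenium

Scenario 4: Behavior analysis block

Symptom: no obvious error, but the data is abnormal
Solution: add human-like behavior simulation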
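
The status-code symptoms in these scenarios can be turned into an explicit decision step. The sketch below is one possible mapping, not something the skill prescribes: `next_action` and its return labels are hypothetical, and treating 403 as a signal to stop and review is an assumption of this example.

```python
def next_action(status_code):
    """Map an HTTP status code to a coarse crawler action label."""
    if status_code == 429:
        return "back_off"      # rate limited: wait before retrying
    if status_code == 403:
        return "stop"          # access refused: pause and review before continuing
    if 500 <= status_code < 600:
        return "retry_later"   # server-side error: retry after a delay
    if 200 <= status_code < 300:
        return "continue"      # success: proceed at the normal rate
    return "inspect"           # anything else: log it and look manually
```

Keeping the mapping in one function makes it easy to log each decision alongside the status code, which fits the monitoring and logging practices listed earlier.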

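Recommended tools

Tool	Purpose	Scenario
Playwright	Browser automation	JS-rendered pages
Selenium + Stealth	Browser disguise	Pages that require login
ScraperAPI	Cloud proxy service	Quick integration
Crawlera	Smart proxy pool	Enterprise applications
2Captcha	CAPTCHA solving	CAPTCHA blocks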

Safe usage advice

This skill provides step-by-step techniques to evade anti-scraping defenses (proxies, browser fingerprinting, cookie import, CAPTCHA solving). Before installing or running it, consider: (1) legal and ethical risk—bypassing site protections can violate terms of service and laws; (2) do not supply browser cookies, proxy credentials, or API keys to an untrusted skill—those can expose your sessions and accounts; (3) the skill omits required binaries and env vars, so you would need to install packages and drivers yourself—avoid copy/pasting unknown install commands; (4) prefer using sanctioned APIs or obtaining permission from target sites; (5) if you must test, run in an isolated sandbox with no real credentials and disable autonomous invocation so the agent cannot act without your explicit approval. Ask the publisher for a clear list of required packages, exact env vars, and a justification for any operation that reads local cookies or system devices before proceeding.

Functional analysis

Type: OpenClaw Skill Name: anti-crawler-evasion Version: 1.0.0 The skill bundle is designed to bypass anti-bot and anti-crawler security measures using techniques such as IP rotation, browser fingerprinting evasion (via selenium-stealth), and human behavior simulation. While these capabilities are aligned with the stated purpose of 'anti-crawler evasion' in SKILL.md, the inclusion of code to simulate hardware-level mouse and keyboard movements (using the 'mouse' and 'keyboard' libraries) and the explicit focus on bypassing security controls constitute high-risk behaviors. There is no evidence of intentional malice such as data exfiltration or backdoors, but the automation of security bypasses warrants a suspicious classification.

Capability assessment

⚠ Purpose & Capability

The name and description match the content (anti-crawler evasion). However, the SKILL.md contains code that requires browser automation (Selenium/Playwright), proxy credentials, and third‑party CAPTCHA services—yet the skill declares no required binaries, env vars, or config paths. Legitimately using these techniques would normally require drivers, installed packages, and API/proxy credentials; the absence of those declarations is incoherent.

⚠ Instruction Scope

The runtime instructions go beyond simple guidance: they include code to import browser cookies, create headless browsers and stealth drivers, simulate mouse/keyboard events, rotate proxies with credentials, and call external CAPTCHA solving APIs. Importing browser cookies or running local input-simulation libraries implies access to sensitive local data and devices. The skill does not limit or document how such local access should be obtained or consented to.

ℹ Install Mechanism

The skill is instruction-only (no install spec), which lowers direct install risk. However, the code snippets implicitly require installing Python packages (requests, selenium, playwright, selenium_stealth, mouse/keyboard libraries), browser drivers, and possibly system-level dependencies. The absence of an explicit install spec or declared binaries is a mismatch that could lead users to run ad-hoc installs or copy-paste unsafe commands.

⚠ Credentials

The skill declares no required environment variables or credentials, but examples use proxy credentials (user:pass@...), and a placeholder API key for 2Captcha. It also recommends importing browser cookies (session tokens). Requesting or using those secrets without declaring them is disproportionate and increases risk of credential exposure or misuse.

ℹ Persistence & Privilege

always:false (good) and autonomous invocation is allowed (platform default). Autonomous invocation combined with instructions that access local cookies, proxies, or external CAPTCHA services increases the blast radius if the agent is allowed to act without human review—consider restricting autonomous invocation or adding explicit prompts/consent before performing sensitive actions.

Version history

v1.0.0

Initial release of anti-crawler-evasion skill. - Provides practical strategies for bypassing anti-crawling detection and bans, including user-agent rotation, proxy/IP rotation, and request header completeness. - Includes rate limiting techniques and adaptive delays to control request frequency and avoid bans. - Offers browser fingerprint randomization, cookie/session management, and JavaScript rendering challenge handling (with Selenium/Playwright). - Covers automated CAPTCHA solving via 2Captcha integration. - Details distributed crawling architecture and human-like behavior simulation for advanced evasion. - Lists common anti-crawler scenarios and countermeasures, and provides recommended tools for implementation.

Metadata

Slug anti-crawler-evasion

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Number of versions 1

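FAQ

What is Anti-Crawler Evasion?

It helps avoid being identified and blocked by anti-crawler mechanisms. Use it when the user needs web crawling, data collection, or API access, or asks how to bypass anti-crawler measures, avoid IP bans, or hide a crawler's identity. Trigger words include: 反爬, 反爬虫, 绕过反爬, 避免封禁, 爬虫伪装, 隐身爬取. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 175 times to date.

How do I install Anti-Crawler Evasion?

Run the command "/install anti-crawler-evasion" in the OpenClaw or Claude Code chat box to install it in one step, with no extra configuration.

Is Anti-Crawler Evasion free?

Yes. Anti-Crawler Evasion is completely free under the MIT-0 license and can be downloaded, installed, and used freely.

Which platforms does Anti-Crawler Evasion support?

Anti-Crawler Evasion is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Anti-Crawler Evasion?

It is developed and maintained by plover061 (@plover061); the current version is v1.0.0.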
