Description

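A high-performance, adaptive Python web scraping framework with built-in anti-bot bypass (Cloudflare Turnstile), smart element relocation, a full spider framework, and an MCP server. It is suited to AI-assisted data extraction and large-scale crawling.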

README (SKILL.md)

Scrapling: Adaptive Web Scraping Framework

Name: Skill
Author: cn-big-cabbage

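Scrapling is one of the most capable Python web scraping frameworks outside the Google Chrome DevTools ecosystem, covering everything from a single HTTP request to large-scale concurrent crawling. Its adaptive parsing engine automatically relocates elements after a site redesign, it has built-in Cloudflare Turnstile bypass, its Spider framework supports pause/resume, and it ships an MCP server that lets AI tools assist with data extraction directly, cutting token consumption at the source.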

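Core Use Cases

Scraping anti-bot-protected sites: StealthyFetcher has built-in Cloudflare Turnstile bypass and supports TLS fingerprint spoofing and browser automation
Adaptive data collection: auto_save=True saves a snapshot of the matched elements; after a site redesign, adaptive=True automatically relocates the elements that changed
Large-scale concurrent crawling: the Spider framework supports multiple sessions, proxy rotation, and pause/resume, with spiders defined much as in Scrapy
AI-assisted extraction: a built-in MCP server lets AI tools such as Claude and Cursor call Scrapling directly to extract target content
Dynamic page handling: DynamicFetcher is built on Playwright and supports full browser automation and waiting for network idle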

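AI-Assisted Workflow

Install dependencies: the AI runs pip install scrapling and installs browser drivers as needed
Choose a fetcher: the AI recommends Fetcher, StealthyFetcher, or DynamicFetcher based on the type of target site
Write the scraping logic: the AI generates CSS/XPath selector code and configures auto_save for adaptive scraping
Debug and optimize: the AI analyzes the responses, then adjusts selectors or switches fetcher strategy
Scale up to a Spider: the AI extends the single-page scrape into a full Spider class and configures concurrency and proxies
MCP mode: start the Scrapling MCP server so the AI can drive the browser directly to extract data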
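
The "choose a fetcher" step above can be sketched as a small decision rule. This is only an illustration: `choose_fetcher` and its two flags are hypothetical names, not part of Scrapling, and the rule simply mirrors the roles this README gives the three fetcher classes.

```python
def choose_fetcher(needs_javascript: bool, has_antibot: bool) -> str:
    """Return the name of the fetcher class that fits the target site.

    Hypothetical helper: the mapping follows the README's description of
    Fetcher (fast HTTP), StealthyFetcher (anti-bot bypass) and
    DynamicFetcher (browser automation).
    """
    if has_antibot:
        # Anti-bot protection takes priority over everything else.
        return "StealthyFetcher"
    if needs_javascript:
        # Content rendered client-side needs a real browser.
        return "DynamicFetcher"
    # Static HTML: a plain HTTP request is the fastest option.
    return "Fetcher"


print(choose_fetcher(needs_javascript=False, has_antibot=False))
```

Starting with the plain Fetcher and escalating only when a site demands it keeps the common case fast.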

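Key Sections

Installation guide: pip install, browser drivers, Docker image
Quick start: choosing a fetcher, CSS/XPath selectors, adaptive scraping
Advanced usage: Spider framework, proxy rotation, MCP server, CLI tool
Troubleshooting: anti-bot, browser driver, timeout, and proxy issues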

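AI Assistant Capabilities

With this skill, the AI can:

✅ Install Scrapling and configure browser drivers (scrapling install playwright / scrapling install camoufox)
✅ Automatically pick the most suitable Fetcher class for the target site
✅ Write CSS/XPath selectors to extract the target data
✅ Configure auto_save=True and adaptive=True for adaptive scraping
✅ Build a full Spider class for concurrent crawling, with pause/resume configured
✅ Set up proxy rotation and DNS-leak protection (DoH mode)
✅ Start and configure the Scrapling MCP server
✅ Use the CLI tool to quickly test how a URL scrapes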

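Core Features

✅ Three fetchers: Fetcher (fast HTTP), StealthyFetcher (anti-bot bypass), DynamicFetcher (browser automation)
✅ Adaptive parsing: elements are relocated automatically after a site redesign, lowering maintenance cost
✅ Cloudflare bypass: built-in handling for Turnstile/Interstitial challenges, with no extra service required
✅ Spider framework: Scrapy-style API with concurrency, multiple sessions, and pause/resume
✅ Streaming output: spider.stream() pushes results as they are scraped, which suits large-scale jobs
✅ MCP server: AI tools call Scrapling directly to extract data, reducing token consumption
✅ Proxy rotation: built-in ProxyRotator with round-robin or custom strategies
✅ Session management: FetcherSession/StealthySession/DynamicSession keep state across requests
✅ Development mode: responses are cached on the first run and replayed offline afterwards, for fast iteration on parsing logic
✅ CLI tool: scrape pages straight from the terminal without writing code
✅ IPython shell: interactive debugging with a built-in curl conversion tool
✅ Docker image: production-ready image with all browsers preinstalled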
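
To make the round-robin strategy mentioned under proxy rotation concrete, here is a minimal standard-library sketch. It is not Scrapling's ProxyRotator: `RoundRobinProxies` and `next_proxy` are hypothetical names, and the proxy URLs are placeholders.

```python
from itertools import cycle


class RoundRobinProxies:
    """Hand out proxies in a fixed, repeating order (illustrative only)."""

    def __init__(self, proxies):
        proxies = list(proxies)
        if not proxies:
            raise ValueError("at least one proxy is required")
        # cycle() repeats the list forever, which is all round-robin needs.
        self._pool = cycle(proxies)

    def next_proxy(self) -> str:
        """Return the proxy to use for the next request."""
        return next(self._pool)


rotator = RoundRobinProxies([
    "http://proxy-a.example.com:8080",  # placeholder URLs
    "http://proxy-b.example.com:8080",
])
for _ in range(3):
    print(rotator.next_proxy())
```

A custom strategy would replace `next_proxy` with logic such as skipping proxies that recently failed.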

Quick Example

from scrapling.fetchers import Fetcher, StealthyFetcher, DynamicFetcher

# Plain HTTP fetch (fastest)
page = Fetcher.get('https://quotes.toscrape.com/')
quotes = page.css('.quote .text::text').getall()

# Stealth mode to bypass Cloudflare
page = StealthyFetcher.fetch('https://protected-site.com', headless=True)
data = page.css('.content::text').get()

# Adaptive scraping (elements are relocated automatically after a redesign)
page = Fetcher.get('https://example.com/products')
products = page.css('.product', auto_save=True)   # first run: save an element snapshot
# After the site is redesigned:
products = page.css('.product', adaptive=True)    # relocate the elements automatically

# Quick test from the CLI (no code required)
scrapling fetch https://quotes.toscrape.com/ --css ".quote .text"

# Start the MCP server
scrapling mcp
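
The idea behind auto_save and adaptive can be illustrated without the library: remember a few properties of an element, then pick the most similar candidate once the original selector stops matching. The sketch below uses difflib for similarity. It is a conceptual illustration under that assumption, not Scrapling's actual algorithm, and `fingerprint`, `relocate`, and the dict-based elements are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional


def fingerprint(element: dict) -> str:
    """Flatten the remembered properties into one comparable string."""
    classes = " ".join(sorted(element.get("classes", [])))
    return f"{element.get('tag', '')}|{classes}|{element.get('text', '')}"


def relocate(saved: dict, candidates: list, threshold: float = 0.5) -> Optional[dict]:
    """Return the candidate most similar to the saved snapshot, or None."""
    target = fingerprint(saved)
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, target, fingerprint(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    # Refuse to guess when nothing on the page resembles the snapshot.
    return best if best_score >= threshold else None


# Snapshot taken before the redesign (what auto_save would remember).
saved = {"tag": "div", "classes": ["product"], "text": "Blue Widget $10"}
# Elements found on the redesigned page.
page_elements = [
    {"tag": "nav", "classes": ["menu"], "text": "Home"},
    {"tag": "article", "classes": ["product-card"], "text": "Blue Widget $10"},
]
print(relocate(saved, page_elements))
```

The threshold trades false matches against missed ones; a real implementation would compare many more properties, such as attributes and position in the tree.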

Requirements

Dependency	Version
Python	>= 3.9
pip	any version
Playwright	optional (used by DynamicFetcher)
Camoufox	optional (used by StealthyFetcher)
Docker	optional (for the official image)
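
The requirements above can be checked before installing anything. The sketch below uses only the standard library. The helper names are hypothetical, and the import names `playwright` and `camoufox` are assumed to match the optional packages.

```python
import sys
from importlib.util import find_spec

MINIMUM_PYTHON = (3, 9)  # taken from the requirements table


def python_ok(version=None, minimum=MINIMUM_PYTHON) -> bool:
    """True if the interpreter (or the given version tuple) is new enough."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= tuple(minimum)


def optional_packages(names=("playwright", "camoufox")) -> dict:
    """Map each import name to whether it can be found in this environment."""
    return {name: find_spec(name) is not None for name in names}


print("Python OK:", python_ok())
print("Optional packages:", optional_packages())
```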

Project Links

GitHub: https://github.com/D4Vinci/Scrapling
Documentation: https://scrapling.readthedocs.io/en/latest/
PyPI: https://pypi.org/project/scrapling/
MCP documentation: https://scrapling.readthedocs.io/en/latest/ai/mcp-server.html
Discord: https://discord.gg/EMgGbDceNQ

Usage Guidance

This skill appears internally consistent for a web-scraping framework, but before installing or letting an agent run these steps, consider:
- Source verification: the SKILL.md references GitHub, PyPI, and Docker images; verify those upstream projects and package authors on PyPI/GitHub to avoid installing trojanized packages.
- Pip/docker risk: running pip install or docker pull installs third-party code on your machine. Only run them for trusted packages and review package releases.
- MCP server exposure: starting scrapling mcp opens a local service that can be called by AI tooling. Ensure you understand which clients are allowed to connect and that you don't expose it to untrusted networks.
- Secrets and proxies: examples use proxy URLs with credentials and show how to configure MCP in AI clients. Do not provide credentials to the agent or paste them into configs unless you trust the package and know where the credentials will be stored/transmitted.
- Legal/ethical: the docs show ways to bypass anti-bot protections (Cloudflare Turnstile) and to disable robots.txt. Ensure your scraping activity complies with target site policies and laws.

If you want a stronger assessment, provide the actual PyPI package page, the upstream GitHub repo contents (so we can inspect package code), or any release tarball URLs: with those we can scan the package code for dangerous behaviors (exfil endpoints, obfuscated code, unexpected credential access).
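
One low-cost way to act on the secrets-and-proxies point is to strip credentials from a proxy URL before it is logged or shown to an agent. The following is a standard-library sketch. `redact_proxy_url` is a hypothetical helper, and the URL is a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_proxy_url(url: str) -> str:
    """Replace any user:password in a proxy URL with '***'."""
    parts = urlsplit(url)
    if parts.username is None and parts.password is None:
        return url  # nothing to hide
    host = parts.hostname or ""
    if ":" in host:
        host = f"[{host}]"  # restore brackets around an IPv6 literal
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, f"***@{host}", parts.path, parts.query, parts.fragment)
    )


print(redact_proxy_url("http://user:pass@proxy.example.com:8080"))
```

The real credentials still have to reach the scraper somehow, so this only limits where they end up in logs and transcripts.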

Capability Analysis

Type: OpenClaw Skill
Name: cn-scrapling
Version: 0.1.0

The skill bundle provides documentation and instructions for 'Scrapling', a legitimate Python web scraping framework. The instructions guide the AI agent to install the package via pip, configure browser drivers (Playwright/Camoufox), and perform web scraping tasks. No evidence of data exfiltration, malicious execution, or harmful prompt injection was found; the capabilities described (e.g., Cloudflare bypass, proxy rotation, and MCP server integration) are standard features for a modern scraping tool and align with the project's stated purpose.

Capability Assessment

✓ Purpose & Capability

The name/description (Scrapling — adaptive web-scraper with Cloudflare/Turnstile handling, adaptive selectors, Spider and MCP server) matches the content of SKILL.md and included guides. Commands and examples (pip install, scrapling fetch, scrapling mcp, Docker image, Playwright/Camoufox, proxies, adaptive=True) are all coherent with a scraping framework.

ℹ Instruction Scope

The SKILL.md instructs the AI to install packages (pip install scrapling), install browser drivers, run the MCP server (scrapling mcp), and check local files (e.g., ~/.scrapling/storage.db). Those filesystem reads and local server configuration are within the framework's purpose, but they do give the agent permission to run installers, read a user home path, and start a local service — items the user should explicitly approve. The docs also include examples that disable robots.txt and enable proxy credentials; these are behavioral choices (ethical/legal) but technically consistent with the stated purpose.

✓ Install Mechanism

This is an instruction-only skill (no install spec). The docs recommend standard install methods (pip install, playwright install, docker pull). Those are common and expected; there are no opaque download URLs or archive/extract steps embedded in the skill artifact itself.

ℹ Credentials

The skill declares no required environment variables or credentials, which matches the metadata. Example usage, however, shows proxy URLs with user:pass and MCP configuration snippets for AI integrations; the skill does not automatically request or store secrets, but the user or agent will need to supply proxy credentials or modify MCP/AI config if they follow those examples. Users should not provide credentials unless they trust the package and understand where they will be stored or transmitted.

ℹ Persistence & Privilege

always is false (normal). The skill documents starting an MCP server (local service) and integrating it into an AI/MCP configuration which opens a local control surface for AI client tools. That is consistent with the feature set but increases runtime exposure (a local HTTP/IPC endpoint that can control browser automation). The skill does not indicate modifying other skills or system-wide agent settings beyond its own MCP entry, which is expected for this use case.
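
When a local service such as an MCP server listens on a network address, a quick check that the address is loopback-only helps keep the exposure described above local. This is a generic standard-library sketch. `is_loopback_bind` is a hypothetical helper, and nothing here inspects how Scrapling's MCP server is actually configured.

```python
import ipaddress


def is_loopback_bind(host: str) -> bool:
    """True if a bind address is reachable only from this machine."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a hostname): treat it as not local.
        return False


for address in ("127.0.0.1", "::1", "0.0.0.0"):
    print(address, is_loopback_bind(address))
```

An address like 0.0.0.0 binds every interface, which is the case to avoid on untrusted networks.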

Version History

v0.1.0

Initial release of Scrapling, a high-performance adaptive Python web scraping framework.
- Supports automated anti-bot bypass (Cloudflare Turnstile), adaptive element re-location, and complete spider framework.
- Includes MCP server for AI-assisted data extraction to reduce token consumption.
- Offers three Fetcher classes (Fetcher, StealthyFetcher, DynamicFetcher) for HTTP, stealth/anti-bot, and browser automation tasks.
- Features proxy rotation, session management, pause/resume for spiders, and CLI utilities.
- Provides comprehensive documentation and integration guides for rapid use and extension.

Metadata

Slug cn-scrapling

Version 0.1.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Skill?

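A high-performance, adaptive Python web scraping framework with built-in anti-bot bypass (Cloudflare Turnstile), smart element relocation, a full spider framework, and an MCP server, suited to AI-assisted data extraction and large-scale crawling. It is an AI Agent Skill for Claude Code / OpenClaw, with 92 downloads so far.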

How do I install Skill?

Run "/install cn-scrapling" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Skill free?

Yes, Skill is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Skill support?

Skill is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Skill?

It is built and maintained by CN-big-cabbage (@cn-big-cabbage); the current version is v0.1.0.
