bryantegomoh

Crawlee Web Scraper

By Bryan Tegomoh, MD, MPH · GitHub · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 187
Favorites: 0
Current installs: 1
Versions: 1
Install in OpenClaw
/install crawlee-web-scraper
Description
Resilient web scraper with bot-detection evasion using the Crawlee library. Use when web_fetch is blocked by rate limits or bot detection. Supports single UR...
Usage instructions (SKILL.md)

crawlee-web-scraper

Drop-in replacement for web_fetch when sites block automated requests. Crawlee handles session management, retry logic, and bot-detection evasion automatically.

Scripts

  • crawlee_fetch.py — main scraper; accepts a single URL or a file of URLs; returns JSON
  • crawlee_http.py — library helper; tries requests first, falls back to Crawlee on 403/429/503
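
The fallback behaviour described for crawlee_http.py can be sketched roughly as follows. This is an illustration only, not the skill's actual code: `Response`, `fetch_with_fallback_sketch`, and the two stand-in fetchers are hypothetical names, and the fetchers are passed in as plain callables so the logic can be exercised without network access.

```python
from typing import Callable, NamedTuple

# Status codes the script description lists as fallback triggers.
FALLBACK_STATUSES = frozenset({403, 429, 503})


class Response(NamedTuple):
    """Hypothetical minimal response shape, used only in this sketch."""
    status_code: int
    text: str


Fetcher = Callable[[str], Response]


def fetch_with_fallback_sketch(url: str, primary: Fetcher, fallback: Fetcher) -> Response:
    """Try the cheap fetcher first; switch to the fallback on a blocking status."""
    resp = primary(url)
    if resp.status_code in FALLBACK_STATUSES:
        return fallback(url)
    return resp


# Stand-in fetchers (assumptions): one simulates a block, one a successful retry.
def blocked(url: str) -> Response:
    return Response(403, "Forbidden")


def rescued(url: str) -> Response:
    return Response(200, "<html>ok</html>")


result = fetch_with_fallback_sketch("https://example.com", blocked, rescued)
print(result.status_code)
```

In a real setup the primary fetcher would presumably wrap requests and the fallback would wrap the Crawlee-based script; how the skill wires these together is not shown in the source.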

Usage

# Single URL, return HTML preview
python3 scripts/crawlee_fetch.py --url "https://example.com"

# Single URL, extract text (strips HTML tags)
python3 scripts/crawlee_fetch.py --url "https://example.com" --extract-text

# Bulk scrape from file
python3 scripts/crawlee_fetch.py --urls-file urls.txt --output results.json
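
The --extract-text option above is described as stripping HTML tags. How the script does this is not shown, so the following is only a minimal sketch of the general idea using the standard library's html.parser; `strip_tags` and `_TextExtractor` are hypothetical names.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces, then collapse runs of whitespace.
        # Note: this can split a word that is broken up by inline tags.
        return " ".join(" ".join(self._chunks).split())


def strip_tags(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


print(strip_tags("<p>Hello <b>world</b></p>"))
```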

Library usage

from crawlee_http import fetch_with_fallback

resp = fetch_with_fallback("https://example.com")
print(resp.status_code, resp.text[:500])

Output

JSON array with one object per URL:

[
  {
    "url": "https://example.com",
    "status": 200,
    "fetched_at": "2026-01-01T00:00:00Z",
    "length": 12345,
    "text": "Page content..."
  }
]
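
A results file in the format above could be post-processed along these lines. The field names come from the example output; the second record (a 403 with empty text) and the `summarize` helper are assumptions made for illustration, since the source does not show what a failed entry looks like.

```python
import json

# Inline sample; in practice this would be the contents of results.json.
sample = """
[
  {"url": "https://example.com", "status": 200,
   "fetched_at": "2026-01-01T00:00:00Z", "length": 12345, "text": "Page content..."},
  {"url": "https://example.org", "status": 403,
   "fetched_at": "2026-01-01T00:00:05Z", "length": 0, "text": ""}
]
"""


def summarize(results_json):
    """Split records into successful URLs and (url, status) pairs for the rest."""
    records = json.loads(results_json)
    ok = [r["url"] for r in records if r.get("status") == 200]
    failed = [(r.get("url"), r.get("status")) for r in records if r.get("status") != 200]
    return ok, failed


ok, failed = summarize(sample)
print("fetched:", ok)
print("failed:", failed)
```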

Installation

pip install crawlee requests

When to use

  • web_fetch returns 403 / 429 / empty
  • Bulk scraping 10+ URLs
  • Sites using Cloudflare or similar bot protection
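
The first two criteria above can be expressed as a small predicate, sketched here under assumptions: `should_use_crawlee` is a hypothetical helper, not part of the skill, and the third criterion (bot protection such as Cloudflare) is left out because it cannot be detected reliably from a status code alone.

```python
BLOCKED_STATUSES = {403, 429}  # statuses listed in the criteria above
BULK_THRESHOLD = 10            # "10+ URLs"


def should_use_crawlee(status=None, body="", url_count=1):
    """Return True when the criteria above suggest switching away from web_fetch."""
    if url_count >= BULK_THRESHOLD:
        return True
    if status in BLOCKED_STATUSES:
        return True
    # A response that came back but carried no content counts as "empty".
    if status is not None and not (body or "").strip():
        return True
    return False


print(should_use_crawlee(status=429, body="Too Many Requests"))
```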
Safe usage advice
This skill appears to be what it says: a Crawlee-based fallback scraper. Before installing, be aware of the following:
  1. It requires 'pip install crawlee requests'. Crawlee may install or later download browser tooling (Playwright or similar), which can add network activity and disk artifacts.
  2. The scripts perform HTTP requests to any URL you provide, so do not give it URLs containing secrets, credentials, or private tokens.
  3. Scraping sites may violate terms of service or legal rules; use it responsibly.
  4. The fallback uses a subprocess with a 30s timeout and caps extracted text at 10k characters; adjust these if you need longer fetches.
If you need stricter controls, run this in an isolated environment and audit installed Python packages (or pin package versions) before use.
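
The subprocess pattern mentioned in this advice (list-form arguments, a 30s timeout, a 10k-character cap) might look roughly like the sketch below. `build_command` and `run_capped` are hypothetical names, and whether the skill's helper passes --extract-text is an assumption; the demonstration runs a harmless child process instead of the real script.

```python
import subprocess
import sys

TIMEOUT_SECONDS = 30     # timeout mentioned in the advice
MAX_TEXT_CHARS = 10_000  # text cap mentioned in the advice


def build_command(url):
    """Build the argument list; no shell is involved, so the URL is never shell-parsed."""
    return [sys.executable, "scripts/crawlee_fetch.py", "--url", url, "--extract-text"]


def run_capped(cmd, timeout=TIMEOUT_SECONDS, max_chars=MAX_TEXT_CHARS):
    """Run cmd and return at most max_chars of stdout, or None if it timed out."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout[:max_chars]


# Demonstration with a stand-in child process rather than the real scraper.
demo = run_capped([sys.executable, "-c", "print('x' * 50)"], max_chars=10)
print(demo)
```

Raising either constant trades longer waits or larger outputs for fewer truncated fetches.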
Functional analysis
Type: OpenClaw Skill
Name: crawlee-web-scraper
Version: 1.0.0
The skill is a legitimate web scraping utility designed to bypass bot detection using the Crawlee library. It consists of a main scraper (crawlee_fetch.py) and a helper (crawlee_http.py) that provides a fallback mechanism from standard requests to Crawlee. The code uses safe subprocess execution (passing arguments as a list) and performs standard file and network operations consistent with its stated purpose without any signs of malicious intent or prompt injection.
Capability assessment
Purpose & Capability
Name/description (Crawlee-based scraper) matches the delivered artifacts: two Python scripts that use requests and Crawlee to fetch pages and a SKILL.md describing exactly that. No unrelated credentials, binaries, or config paths are requested.
Instruction Scope
SKILL.md and the scripts are specific and scoped: they document usage, install (pip install crawlee requests), and show that fetching is targeted at user-supplied URLs. The code only reads a provided URLs file, runs a subprocess to call the included script, and returns JSON. There are no instructions to read unrelated system files, environment variables, or to transmit data to unexpected remote endpoints.
Install Mechanism
No install spec beyond the SKILL.md recommendation 'pip install crawlee requests'. Using pip is expected for a Python library, but installing Crawlee may pull additional runtime deps (Playwright/browser components) which can download browser binaries at install or first-run time. This is typical for headless-browser scrapers but may have additional network/activity implications.
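
One way to see what the pip install actually pulled in is to query installed package versions from Python. This is a minimal sketch using the standard library's importlib.metadata; `installed_versions` is a hypothetical helper, and listing playwright is an assumption based on the note above that Crawlee may bring in browser tooling.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["crawlee", "requests", "playwright"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

The reported versions can then be pinned in a requirements file if stricter control is needed.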
Credentials
The skill declares no required environment variables or credentials and the code does not read secrets or unrelated env vars. All requests are to user-provided target URLs, which is proportionate to a scraping tool.
Persistence & Privilege
Skill does not request always: true and is user-invocable. It does not modify other skills or system-wide agent settings. Autonomous invocation is allowed by default but not combined with other red flags.
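How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install crawlee-web-scraper
  3. Once installation finishes, invoke the Skill by name or trigger it with /crawlee-web-scraper
  4. Provide the inputs described in the Skill's parameter documentation to get structured output.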
Version history
v1.0.0
Initial release of crawlee-web-scraper.
  • Provides resilient web scraping with evasion for bot detection and rate limits using Crawlee.
  • Supports both single URLs and bulk file input for scraping.
  • Implements automatic fallback: tries regular requests, then uses Crawlee on 403/429/503 errors.
  • Returns standardized JSON output per URL with metadata and extracted content.
  • Drop-in replacement for web_fetch, with simple command-line and Python library usage.
Metadata
Slug crawlee-web-scraper
Version 1.0.0
License MIT-0
Total installs 2
Current installs 1
Versions 1
FAQ

What is Crawlee Web Scraper?

Resilient web scraper with bot-detection evasion using the Crawlee library. Use when web_fetch is blocked by rate limits or bot detection. Supports single UR... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 187 total downloads to date.

How do I install Crawlee Web Scraper?

Run the command "/install crawlee-web-scraper" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is Crawlee Web Scraper free?

Yes. Crawlee Web Scraper is completely free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does Crawlee Web Scraper support?

Crawlee Web Scraper is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Crawlee Web Scraper?

It is developed and maintained by Bryan Tegomoh, MD, MPH (@bryantegomoh); the current version is v1.0.0.
