← Back to Skills Marketplace

Crawlee Web Scraper

Name: Crawlee Web Scraper
Author: bryantegomoh

by Bryan Tegomoh, MD, MPH · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ✓ Security Clean

187

Downloads

Stars

Active Installs

Versions

Install in OpenClaw

/install crawlee-web-scraper

Description

Resilient web scraper with bot-detection evasion using the Crawlee library. Use when web_fetch is blocked by rate limits or bot detection. Supports single UR...

README (SKILL.md)

crawlee-web-scraper

Drop-in replacement for web_fetch when sites block automated requests. Crawlee handles session management, retry logic, and bot-detection evasion automatically.

Scripts

crawlee_fetch.py — main scraper; accepts a single URL or a file of URLs; returns JSON
crawlee_http.py — library helper; tries requests first, falls back to Crawlee on 403/429/503

Usage

# Single URL, return HTML preview
python3 scripts/crawlee_fetch.py --url "https://example.com"

# Single URL, extract text (strips HTML tags)
python3 scripts/crawlee_fetch.py --url "https://example.com" --extract-text

# Bulk scrape from file
python3 scripts/crawlee_fetch.py --urls-file urls.txt --output results.json

Library usage

from crawlee_http import fetch_with_fallback

resp = fetch_with_fallback("https://example.com")
print(resp.status_code, resp.text[:500])

Output

JSON array with one object per URL:

[
  {
    "url": "https://example.com",
    "status": 200,
    "fetched_at": "2026-01-01T00:00:00Z",
    "length": 12345,
    "text": "Page content..."
  }
]

Installation

pip install crawlee requests

When to use

web_fetch returns 403 / 429 / empty
Bulk scraping 10+ URLs
Sites using Cloudflare or similar bot protection

Usage Guidance

This skill appears to be what it says: a Crawlee-based fallback scraper. Before installing, be aware: (1) it requires 'pip install crawlee requests' — Crawlee may install or later download browser tooling (Playwright or similar) which can add network activity and disk artifacts; (2) the scripts will perform HTTP requests to any URL you provide (so don’t give it URLs containing secrets, credentials, or private tokens); (3) scraping sites may violate terms of service or legal rules—use responsibly; (4) the fallback uses a subprocess with a 30s timeout and caps extracted text (10k chars) — adjust if you need longer fetches. If you need stricter controls, run this in an isolated environment and audit installed Python packages (or pin package versions) before use.

Capability Analysis

Type: OpenClaw Skill Name: crawlee-web-scraper Version: 1.0.0 The skill is a legitimate web scraping utility designed to bypass bot detection using the Crawlee library. It consists of a main scraper (crawlee_fetch.py) and a helper (crawlee_http.py) that provides a fallback mechanism from standard requests to Crawlee. The code uses safe subprocess execution (passing arguments as a list) and performs standard file and network operations consistent with its stated purpose without any signs of malicious intent or prompt injection.

Capability Assessment

✓ Purpose & Capability

Name/description (Crawlee-based scraper) matches the delivered artifacts: two Python scripts that use requests and Crawlee to fetch pages and a SKILL.md describing exactly that. No unrelated credentials, binaries, or config paths are requested.

✓ Instruction Scope

SKILL.md and the scripts are specific and scoped: they document usage, install (pip install crawlee requests), and show that fetching is targeted at user-supplied URLs. The code only reads a provided URLs file, runs a subprocess to call the included script, and returns JSON. There are no instructions to read unrelated system files, environment variables, or to transmit data to unexpected remote endpoints.

ℹ Install Mechanism

No install spec beyond the SKILL.md recommendation 'pip install crawlee requests'. Using pip is expected for a Python library, but installing Crawlee may pull additional runtime deps (Playwright/browser components) which can download browser binaries at install or first-run time. This is typical for headless-browser scrapers but may have additional network/activity implications.

✓ Credentials

The skill declares no required environment variables or credentials and the code does not read secrets or unrelated env vars. All requests are to user-provided target URLs, which is proportionate to a scraping tool.

✓ Persistence & Privilege

Skill does not request always: true and is user-invocable. It does not modify other skills or system-wide agent settings. Autonomous invocation is allowed by default but not combined with other red flags.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install crawlee-web-scraper
After installation, invoke the skill by name or use /crawlee-web-scraper
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release of crawlee-web-scraper. - Provides resilient web scraping with evasion for bot detection and rate limits using Crawlee. - Supports both single URLs and bulk file input for scraping. - Implements automatic fallback: tries regular requests, then uses Crawlee on 403/429/503 errors. - Returns standardized JSON output per URL with metadata and extracted content. - Drop-in replacement for web_fetch, with simple command-line and Python library usage.

Metadata

Slug crawlee-web-scraper

Version 1.0.0

License MIT-0

All-time Installs 2

Active Installs 1

Total Versions 1

Frequently Asked Questions

What is Crawlee Web Scraper?

Resilient web scraper with bot-detection evasion using the Crawlee library. Use when web_fetch is blocked by rate limits or bot detection. Supports single UR... It is an AI Agent Skill for Claude Code / OpenClaw, with 187 downloads so far.

How do I install Crawlee Web Scraper?

Run "/install crawlee-web-scraper" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Crawlee Web Scraper free?

Yes, Crawlee Web Scraper is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Crawlee Web Scraper support?

Crawlee Web Scraper is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Crawlee Web Scraper?

It is built and maintained by Bryan Tegomoh, MD, MPH (@bryantegomoh); the current version is v1.0.0.

More Skills