/install cf-crawl
Cloudflare /crawl
Async site crawler via CF Browser Rendering API. Start a job → poll for results → get markdown/HTML/JSON per page.
Quick Start
# Crawl a site (5 pages, markdown, no JS rendering = fast + free)
bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com" --limit 5 --format markdown
# With JS rendering (for SPAs, dynamic content)
bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com" --render --limit 10
# Start only (get job ID, poll later)
bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com" --limit 100 --start-only
# Poll existing job
bash ~/clawd/skills/cf-crawl/scripts/poll.sh \x3Cjob-id>
Credentials
Stored at ~/.clawdbot/secrets/cloudflare-crawl.env:
CF_ACCOUNT_ID=\x3Caccount_id>
CF_CRAWL_API_TOKEN=\x3Ctoken_with_read_and_edit>
Key Options
| Option | Description |
|---|---|
--limit N |
Max pages (default 10) |
--depth N |
Max link depth (default 10) |
--format markdown|html|json |
Output format (default markdown) |
--render |
Enable headless browser (default: off = fast fetch, free during beta) |
--include PAT |
Wildcard URL pattern to include (repeatable) |
--exclude PAT |
Wildcard URL pattern to exclude (repeatable) |
--external |
Follow external domain links |
--subdomains |
Follow subdomain links |
--source all|sitemaps|links |
URL discovery method |
--json-prompt "..." |
AI extraction prompt (with --format json) |
--json-schema file.json |
JSON schema for structured extraction |
--timeout SEC |
Max poll wait (default 300s) |
--output FILE |
Write full results to file |
--raw |
Output raw API response |
--start-only |
Print job ID without polling |
Common Patterns
Crawl docs site for knowledge base
bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://docs.example.com/" \
--limit 50 --depth 3 --format markdown --output docs.json
Crawl with URL filtering
bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com/" \
--include "/docs/**" --exclude "/docs/archive/**" --limit 20
AI-powered structured extraction
bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com/products" \
--format json --render \
--json-prompt "Extract product name, price, and description" \
--json-schema schema.json
Long-running crawl (background)
JOB_ID=$(bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://big-site.com" \
--limit 1000 --start-only)
# Check later:
bash ~/clawd/skills/cf-crawl/scripts/poll.sh "$JOB_ID"
Cost Notes
render: false(default) — fast HTML fetch, free during betarender: true— uses Browser Rendering minutes (paid)format json— uses Workers AI tokens for extraction (paid)- Results cached in R2 with
--max-age(default 24hr)
API Details
See references/api-reference.md for full parameter documentation, response schema, and lifecycle details.
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat:
/install cf-crawl - After installation, invoke the skill by name or use
/cf-crawl - Provide required inputs per the skill's parameter spec and get structured output
What is cf-crawl?
Crawl websites using Cloudflare Browser Rendering /crawl API. Async multi-page crawl with markdown/HTML/JSON output, link following, pattern filtering, and A... It is an AI Agent Skill for Claude Code / OpenClaw, with 237 downloads so far.
How do I install cf-crawl?
Run "/install cf-crawl" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is cf-crawl free?
Yes, cf-crawl is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does cf-crawl support?
cf-crawl is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).
Who created cf-crawl?
It is built and maintained by bill492 (@bill492); the current version is v1.0.0.