bill492

cf-crawl

by bill492 · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
237 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install cf-crawl
Description
Crawl websites using Cloudflare Browser Rendering /crawl API. Async multi-page crawl with markdown/HTML/JSON output, link following, pattern filtering, and A...
README (SKILL.md)

Cloudflare /crawl

Async site crawler via CF Browser Rendering API. Start a job → poll for results → get markdown/HTML/JSON per page.
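The start → poll → results flow comes down to a polling loop with a timeout. Below is a minimal bash sketch of that loop. It is not the skill's actual poll.sh: poll_until_done is a hypothetical helper that repeatedly runs a caller-supplied status command. The status strings completed and errored are placeholders only, since the real /crawl job states are documented in references/api-reference.md, not here.

```bash
#!/usr/bin/env bash
# Hypothetical polling helper (NOT the skill's poll.sh).
# Usage: poll_until_done TIMEOUT_SEC INTERVAL_SEC STATUS_CMD [ARGS...]
# STATUS_CMD must print the current job status on stdout.
# ASSUMPTION: "completed" and "errored" stand in for the real /crawl job states.
# Returns 0 when done, 1 on job error or status-command failure, 2 on timeout.
poll_until_done() {
  local timeout=$1 interval=$2
  shift 2
  local start=$SECONDS status
  while (( SECONDS - start < timeout )); do
    # Run the status command; treat its own failure as a hard error.
    status=$("$@") || return 1
    case "$status" in
      completed) return 0 ;;
      errored)   return 1 ;;
    esac
    sleep "$interval"
  done
  return 2
}
```

In a real script the status command would presumably wrap an authenticated curl call for the job ID, and the 300-second default of --timeout maps naturally onto TIMEOUT_SEC.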

Quick Start

# Crawl a site (5 pages, markdown, no JS rendering = fast + free)
bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com" --limit 5 --format markdown

# With JS rendering (for SPAs, dynamic content)
bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com" --render --limit 10

# Start only (get job ID, poll later)
bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com" --limit 100 --start-only

# Poll existing job
bash ~/clawd/skills/cf-crawl/scripts/poll.sh <job-id>

Credentials

Stored at ~/.clawdbot/secrets/cloudflare-crawl.env:

CF_ACCOUNT_ID=<account_id>
CF_CRAWL_API_TOKEN=<token_with_read_and_edit>
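A script that uses this file should fail early when a value is missing rather than send an unauthenticated request. The sketch below shows one way to do that in bash. load_crawl_creds is a hypothetical helper, not part of crawl.sh. It defaults to the documented path and only checks that both variables are non-empty; it does not validate the token's scope.

```bash
#!/usr/bin/env bash
# Hypothetical loader for the credentials file described above (NOT crawl.sh).
# Sources the env file into the current shell (variables are not exported)
# and returns non-zero if the file is unreadable or a variable is empty.
load_crawl_creds() {
  local file=${1:-"$HOME/.clawdbot/secrets/cloudflare-crawl.env"}
  if [[ ! -r "$file" ]]; then
    echo "credentials file not readable: $file" >&2
    return 1
  fi
  # shellcheck source=/dev/null
  source "$file"
  local var missing=0
  for var in CF_ACCOUNT_ID CF_CRAWL_API_TOKEN; do
    # ${!var} is bash indirect expansion: the value of the variable named $var.
    if [[ -z "${!var:-}" ]]; then
      echo "missing $var in $file" >&2
      missing=1
    fi
  done
  return "$missing"
}
```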

Key Options

Option                         Description
--limit N                      Max pages (default 10)
--depth N                      Max link depth (default 10)
--format markdown|html|json    Output format (default markdown)
--render                       Enable headless-browser (JS) rendering (default off: fast fetch, free during beta)
--include PAT                  Wildcard URL pattern to include (repeatable)
--exclude PAT                  Wildcard URL pattern to exclude (repeatable)
--external                     Follow links to external domains
--subdomains                   Follow links to subdomains
--source all|sitemaps|links    URL discovery method
--json-prompt "..."            AI extraction prompt (with --format json)
--json-schema file.json        JSON schema for structured extraction
--timeout SEC                  Max time to wait while polling, in seconds (default 300)
--output FILE                  Write full results to a file
--raw                          Output the raw API response
--start-only                   Print the job ID without polling
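To make the difference between value-taking, boolean, and repeatable options concrete, here is a sketch of how a subset of the table could be parsed in bash. It is hypothetical and not crawl.sh's actual implementation: parse_crawl_opts and the global names LIMIT, INCLUDES, and so on are illustrative. The defaults mirror the table, and repeatable flags append to arrays instead of overwriting.

```bash
#!/usr/bin/env bash
# Hypothetical parser for a subset of the options above; NOT the real crawl.sh.
# Defaults mirror the table; repeatable flags (--include/--exclude) append to arrays.
parse_crawl_opts() {
  LIMIT=10
  DEPTH=10
  FORMAT=markdown
  RENDER=false
  INCLUDES=()
  EXCLUDES=()
  URL=""
  while (( $# > 0 )); do
    case "$1" in
      --limit|--depth|--format|--include|--exclude)
        # Value-taking options: refuse to run off the end of the argument list.
        (( $# >= 2 )) || { echo "$1 needs a value" >&2; return 1; }
        case "$1" in
          --limit)   LIMIT=$2 ;;
          --depth)   DEPTH=$2 ;;
          --format)  FORMAT=$2 ;;
          --include) INCLUDES+=("$2") ;;
          --exclude) EXCLUDES+=("$2") ;;
        esac
        shift 2 ;;
      --render) RENDER=true; shift ;;
      --*) echo "unsupported option in this sketch: $1" >&2; return 1 ;;
      *) URL=$1; shift ;;
    esac
  done
  case "$FORMAT" in
    markdown|html|json) ;;
    *) echo "invalid --format: $FORMAT" >&2; return 1 ;;
  esac
  [[ -n "$URL" ]] || { echo "missing URL" >&2; return 1; }
}
```

Quoting patterns such as "/docs/**" at the call site matters: unquoted, the shell may expand them against local files before the script ever sees them.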

Common Patterns

Crawl docs site for knowledge base

bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://docs.example.com/" \
  --limit 50 --depth 3 --format markdown --output docs.json

Crawl with URL filtering

bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com/" \
  --include "/docs/**" --exclude "/docs/archive/**" --limit 20
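Before launching a large crawl it can help to preview which paths a pattern set would keep. The bash sketch below models the filtering locally under stated assumptions: patterns are matched against the URL path, an exclude match always wins, and an empty include list means "include everything". path_allowed is a hypothetical helper. Note that in bash [[ == ]] matching, * (and therefore **) also matches "/"; the Cloudflare API may treat * and ** differently, so check references/api-reference.md for the authoritative semantics.

```bash
#!/usr/bin/env bash
# Hypothetical local model of --include/--exclude filtering (NOT the API's logic).
# ASSUMPTIONS: match against the URL path; exclude beats include;
# no include patterns means every non-excluded path is allowed.
INCLUDE_PATTERNS=("/docs/**")
EXCLUDE_PATTERNS=("/docs/archive/**")

path_allowed() {
  local path=$1 pat
  for pat in "${EXCLUDE_PATTERNS[@]}"; do
    # $pat is intentionally unquoted so [[ ]] treats it as a glob pattern.
    [[ $path == $pat ]] && return 1
  done
  (( ${#INCLUDE_PATTERNS[@]} == 0 )) && return 0
  for pat in "${INCLUDE_PATTERNS[@]}"; do
    [[ $path == $pat ]] && return 0
  done
  return 1
}
```

With the patterns from the command above, /docs/intro is kept, while /docs/archive/2019/old and /blog/post are dropped.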

AI-powered structured extraction

bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com/products" \
  --format json --render \
  --json-prompt "Extract product name, price, and description" \
  --json-schema schema.json
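The command above expects a schema.json file next to it. The bash sketch below writes an illustrative one for the product prompt. It assumes the /crawl API accepts a standard JSON Schema object for --json-schema; the exact expected shape, and whether price should be a number or a string, are assumptions to check against references/api-reference.md. write_product_schema is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Writes an example schema.json for the product-extraction command above.
# ASSUMPTION: a plain JSON Schema object is what --json-schema expects.
write_product_schema() {
  local out=${1:-schema.json}
  # Quoted 'EOF' delimiter: the body is written literally, with no expansion.
  cat > "$out" <<'EOF'
{
  "type": "object",
  "properties": {
    "name":        { "type": "string" },
    "price":       { "type": "number" },
    "description": { "type": "string" }
  },
  "required": ["name", "price"]
}
EOF
}
```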

Long-running crawl (background)

JOB_ID=$(bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://big-site.com" \
  --limit 1000 --start-only)
# Check later:
bash ~/clawd/skills/cf-crawl/scripts/poll.sh "$JOB_ID"

Cost Notes

  • render: false (default) — fast HTML fetch, free during beta
  • render: true — uses Browser Rendering minutes (paid)
  • format json — uses Workers AI tokens for extraction (paid)
  • Results cached in R2 with --max-age (default 24hr)
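The Cost Notes reduce to two flags that switch on paid usage. As a pre-flight check, the hypothetical bash helper below scans a crawl.sh argument list and prints which paid resources it would touch. It models only the two rules stated above (--render and --format json); it does not account for R2 caching or beta pricing changes.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check based only on the Cost Notes above.
# Prints one line per paid resource the given crawl.sh arguments would use.
paid_resources() {
  local render=false json=false
  while (( $# > 0 )); do
    case "$1" in
      --render) render=true ;;
      --format) [[ ${2:-} == json ]] && json=true ;;
    esac
    shift
  done
  $render && echo "browser-rendering-minutes"
  $json && echo "workers-ai-tokens"
  return 0
}
```

For example, paid_resources "https://example.com" --limit 5 --format markdown prints nothing, matching the free default fetch.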

API Details

See references/api-reference.md for full parameter documentation, response schema, and lifecycle details.

Usage Guidance
This skill's code implements exactly what it claims (starting and polling Cloudflare crawl jobs), but the package metadata omits important operational requirements. Before installing or running it:
  1. Verify and add the required credentials: CF_ACCOUNT_ID and a CF_CRAWL_API_TOKEN with the minimal required scope (prefer a least-privilege token).
  2. Confirm that the documented secrets file path (~/.clawdbot/secrets/cloudflare-crawl.env) is acceptable, or change it to a safe location you control.
  3. Ensure the required binaries (curl, jq, bash) are available, or update the manifest to declare them.
  4. Review the scripts locally to confirm they make only Cloudflare API calls and do not transmit data elsewhere.
  5. Consider a first run in a restricted environment (container or VM) with a test Cloudflare account and token.
If the publisher updates the registry metadata to declare the env vars and binaries, and documents the token scope clearly, this concern would be reduced.
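The binary check in this guidance is easy to automate. The bash sketch below uses the shell builtin command -v to report every missing tool at once instead of failing on the first one. require_bins is a hypothetical helper name, not something the skill ships.

```bash
#!/usr/bin/env bash
# Hypothetical dependency check: prints each binary not found on PATH
# to stderr and returns 1 if any are missing, 0 otherwise.
require_bins() {
  local bin missing=0
  for bin in "$@"; do
    if ! command -v "$bin" >/dev/null 2>&1; then
      echo "missing required binary: $bin" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A script could call require_bins curl jq bash || exit 1 before sourcing credentials or contacting the API.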
Capability Analysis
Type: OpenClaw Skill · Name: cf-crawl · Version: 1.0.0
The cf-crawl skill is a legitimate tool for performing web crawls and structured data extraction via the Cloudflare Browser Rendering API. The implementation consists of bash scripts (crawl.sh, poll.sh) that safely interact with the official Cloudflare API endpoint using curl and jq, following standard practices for credential management and input sanitization.
Capability Assessment
Purpose & Capability
The skill's name and description match the included scripts: they start and poll Cloudflare Browser Rendering /crawl jobs and produce markdown/HTML/JSON output. However, the registry metadata declares no required environment variables or binaries, while the scripts require CF_ACCOUNT_ID and CF_CRAWL_API_TOKEN (sourced from ~/.clawdbot/secrets/cloudflare-crawl.env) and rely on curl and jq. These credential and binary requirements are expected for the stated purpose, but their absence from the manifest is an incoherence.
Instruction Scope
SKILL.md and the scripts limit actions to starting/polling the Cloudflare API and writing results to stdout or a user-specified file. They source a local secrets file (~/.clawdbot/secrets/cloudflare-crawl.env) for credentials (documented in SKILL.md). There are no instructions to read unrelated system files or send data to third-party endpoints outside Cloudflare. Still, the explicit path to a secrets file is noteworthy and should be confirmed acceptable to the user.
Install Mechanism
This is instruction-only (no install spec), which is low risk, but the included scripts depend on curl and jq. The manifest neither declares these binaries nor provides an install step for them. That mismatch increases the chance that a user runs the scripts in an unexpected environment or with missing tools.
Credentials
The scripts require CF_ACCOUNT_ID and a CF_CRAWL_API_TOKEN with read+edit permissions (the API needs Browser Rendering read+edit). Those credentials are appropriate for controlling Cloudflare crawl jobs, but the manifest does not declare them. The edit scope is also broader than read-only, so a least-privilege token scoped only to crawl operations is recommended where possible. The skill also documents a specific secrets file path, which gives it implicit access to that file; users should verify that path and its contents.
Persistence & Privilege
The skill is not marked always: true, does not request persistent platform-level privileges, and does not modify other skills or global agent settings. It only runs scripts that call Cloudflare APIs.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install cf-crawl
  3. After installation, invoke the skill by name or use /cf-crawl
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release of the cf-crawl skill.
  • Enables async, multi-page web crawling using the Cloudflare Browser Rendering /crawl API.
  • Supports markdown, HTML, and JSON output with flexible link following, pattern filtering, and paging controls.
  • Includes JS rendering and AI-powered structured data extraction.
  • Provides credential management, quick start scripts, and cost control options.
  • Ideal for building knowledge bases or extracting structured data from sites where a basic web fetch is insufficient.
Metadata
Slug cf-crawl
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is cf-crawl?

Crawl websites using Cloudflare Browser Rendering /crawl API. Async multi-page crawl with markdown/HTML/JSON output, link following, pattern filtering, and A... It is an AI Agent Skill for Claude Code / OpenClaw, with 237 downloads so far.

How do I install cf-crawl?

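Run "/install cf-crawl" in the OpenClaw or Claude Code chat to install it in one step. Before crawling, you still need to provide Cloudflare credentials (CF_ACCOUNT_ID and CF_CRAWL_API_TOKEN) in ~/.clawdbot/secrets/cloudflare-crawl.env.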

Is cf-crawl free?

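Yes, the skill itself is free and licensed under MIT-0, so you can download, install, and use it at no cost. Cloudflare usage can still be billed: JS rendering (--render) consumes Browser Rendering minutes and JSON extraction (--format json) consumes Workers AI tokens.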

Which platforms does cf-crawl support?

cf-crawl is listed as cross-platform and runs anywhere OpenClaw / Claude Code is available, provided bash, curl, and jq are installed.

Who created cf-crawl?

It is built and maintained by bill492 (@bill492); the current version is v1.0.0.
