
cf-crawl

by bill492 · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Downloads: 237

Install in OpenClaw

/install cf-crawl

Description

Crawl websites using Cloudflare Browser Rendering /crawl API. Async multi-page crawl with markdown/HTML/JSON output, link following, pattern filtering, and A...

README (SKILL.md)

Cloudflare /crawl

Async site crawler via CF Browser Rendering API. Start a job → poll for results → get markdown/HTML/JSON per page.

Quick Start

# Crawl a site (5 pages, markdown, no JS rendering = fast + free)
bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com" --limit 5 --format markdown

# With JS rendering (for SPAs, dynamic content)
bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com" --render --limit 10

# Start only (get job ID, poll later)
bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com" --limit 100 --start-only

# Poll existing job
bash ~/clawd/skills/cf-crawl/scripts/poll.sh <job-id>

Credentials

Stored at ~/.clawdbot/secrets/cloudflare-crawl.env:

CF_ACCOUNT_ID=<account_id>
CF_CRAWL_API_TOKEN=<token_with_read_and_edit>
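
One way to create this file without leaving the token world-readable is sketched below. The helper name `write_cf_env` and the owner-only permissions are assumptions added for illustration; the skill itself only documents the path and the two variable names.

```shell
# Hypothetical helper (not part of the skill): write the two documented
# variables into an env file that only the owner can read.
write_cf_env() {
  # $1 = target file, $2 = account ID, $3 = API token
  env_file=$1
  mkdir -p "$(dirname "$env_file")"
  # Create the file with owner-only permissions before any secret lands in it.
  ( umask 077 && : > "$env_file" )
  chmod 600 "$env_file"
  printf 'CF_ACCOUNT_ID=%s\nCF_CRAWL_API_TOKEN=%s\n' "$2" "$3" > "$env_file"
}
```

A call would look like `write_cf_env ~/.clawdbot/secrets/cloudflare-crawl.env "<account_id>" "<token>"`, with the placeholders replaced by real values.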

Key Options

| Option | Description |
| --- | --- |
| `--limit N` | Max pages (default 10) |
| `--depth N` | Max link depth (default 10) |
| `--format markdown\|html\|json` | Output format (default markdown) |
| `--render` | Enable headless browser (default: off = fast fetch, free during beta) |
| `--include PAT` | Wildcard URL pattern to include (repeatable) |
| `--exclude PAT` | Wildcard URL pattern to exclude (repeatable) |
| `--external` | Follow external domain links |
| `--subdomains` | Follow subdomain links |
| `--source all\|sitemaps\|links` | URL discovery method |
| `--json-prompt "..."` | AI extraction prompt (with `--format json`) |
| `--json-schema file.json` | JSON schema for structured extraction |
| `--timeout SEC` | Max poll wait (default 300s) |
| `--output FILE` | Write full results to file |
| `--raw` | Output raw API response |
| `--start-only` | Print job ID without polling |

Common Patterns

Crawl docs site for knowledge base

bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://docs.example.com/" \
  --limit 50 --depth 3 --format markdown --output docs.json

Crawl with URL filtering

bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com/" \
  --include "/docs/**" --exclude "/docs/archive/**" --limit 20

AI-powered structured extraction

bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com/products" \
  --format json --render \
  --json-prompt "Extract product name, price, and description" \
  --json-schema schema.json
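
The command above reads `schema.json`, which the README does not show. Below is a minimal sketch of a file that would fit the example prompt. It assumes the script accepts a plain JSON Schema object, which the README does not confirm (check references/api-reference.md for the expected shape). The helper name `write_example_schema` and the field names are hypothetical.

```shell
# Hypothetical helper: write an illustrative JSON Schema to the given path.
# Assumption: --json-schema takes a plain JSON Schema object. The field names
# mirror the example prompt and are not taken from the skill itself.
write_example_schema() {
  cat > "$1" <<'EOF'
{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "price": { "type": "string" },
    "description": { "type": "string" }
  },
  "required": ["name", "price"]
}
EOF
}
```

Running `write_example_schema schema.json` in the working directory would produce the file the command refers to.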

Long-running crawl (background)

JOB_ID=$(bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://big-site.com" \
  --limit 1000 --start-only)
# Check later:
bash ~/clawd/skills/cf-crawl/scripts/poll.sh "$JOB_ID"
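
The README documents `--timeout` for crawl.sh but does not say how long poll.sh waits or what its exit status means. For jobs that outlast a single poll, a generic retry wrapper is one option. The sketch below assumes you supply a command that exits 0 once the job is finished; `retry_until` is a hypothetical helper, not part of the skill.

```shell
# Hypothetical helper: run a command repeatedly until it succeeds or a
# deadline passes.
# Usage: retry_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
retry_until() {
  timeout=$1
  interval=$2
  shift 2
  deadline=$(( $(date +%s) + $timeout ))
  while true; do
    # Success: the wrapped command reported completion.
    if "$@"; then
      return 0
    fi
    # Give up once the deadline has been reached.
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval"
  done
}
```

For example, `retry_until 1800 30 job_is_done "$JOB_ID"` would check every 30 seconds for up to 30 minutes, where `job_is_done` is a function you would write around poll.sh after inspecting its output.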

Cost Notes

render: false (default) — fast HTML fetch, free during beta
render: true — uses Browser Rendering minutes (paid)
format json — uses Workers AI tokens for extraction (paid)
Results cached in R2 with --max-age (default 24hr)

API Details

See references/api-reference.md for full parameter documentation, response schema, and lifecycle details.

Usage Guidance

This skill's code implements exactly what it claims (start/poll Cloudflare crawl jobs), but the package metadata omitted important operational requirements. Before installing or running:

1) Verify and add required credentials: CF_ACCOUNT_ID and a CF_CRAWL_API_TOKEN with the minimal required scope (prefer least-privilege token).
2) Confirm the documented secrets file path (~/.clawdbot/secrets/cloudflare-crawl.env) is acceptable or change it to a safe location you control.
3) Ensure required binaries (curl, jq, bash) are available or update the manifest to declare them.
4) Review the scripts locally to confirm they do only Cloudflare API calls and do not transmit data elsewhere.
5) Consider running first in a restricted environment (container or VM) and use a test Cloudflare account/token.

If the publisher updates the registry metadata to declare the env vars and binaries and documents token scope clearly, this would reduce the concern.
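
The binary and credential checks in the guidance above can be scripted as a small preflight. This is a sketch only; `missing_tools` and `missing_vars` are hypothetical helpers, and the skill does not ship them.

```shell
# Hypothetical helper: print each named command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Hypothetical helper: print each named variable that is unset or empty.
# The names are passed by the caller, so the eval only sees fixed identifiers.
missing_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}
```

`missing_tools curl jq bash` lists any absent binaries, and `missing_vars CF_ACCOUNT_ID CF_CRAWL_API_TOKEN` (run after loading the secrets file) lists any credentials that are still unset. Empty output from both means the prerequisites named above are in place.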

Capability Analysis

Type: OpenClaw Skill
Name: cf-crawl
Version: 1.0.0

The cf-crawl skill is a legitimate tool for performing web crawls and structured data extraction via the Cloudflare Browser Rendering API. The implementation consists of bash scripts (crawl.sh, poll.sh) that safely interact with the official Cloudflare API endpoint using curl and jq, following standard practices for credential management and input sanitization.

Capability Assessment

⚠ Purpose & Capability

The skill's name/description match the included scripts: they start and poll Cloudflare Browser Rendering /crawl jobs and produce markdown/html/json. However, the registry metadata declares no required environment variables or binaries, while the scripts require CF_ACCOUNT_ID and CF_CRAWL_API_TOKEN (sourced from ~/.clawdbot/secrets/cloudflare-crawl.env) and rely on curl and jq. The credential and binary requirements are expected for the stated purpose, but their absence from the manifest is an inconsistency.

ℹ Instruction Scope

SKILL.md and the scripts limit actions to starting/polling the Cloudflare API and writing results to stdout or a user-specified file. They source a local secrets file (~/.clawdbot/secrets/cloudflare-crawl.env) for credentials (documented in SKILL.md). There are no instructions to read unrelated system files or send data to third-party endpoints outside Cloudflare. Still, the explicit path to a secrets file is noteworthy and should be confirmed acceptable to the user.

⚠ Install Mechanism

This is instruction-only (no install spec), which is low risk, but the included scripts depend on curl and jq being present. The manifest neither declares these binaries nor provides an install step for them. That mismatch increases the chance a user will run the scripts in an unexpected environment or with missing tools.

⚠ Credentials

The scripts require CF_ACCOUNT_ID and CF_CRAWL_API_TOKEN with read+edit permissions (the API needs Browser Rendering read+edit). Those credentials are appropriate for controlling Cloudflare crawl jobs, but the manifest did not declare them. Also the token scope (edit) is broader than read-only; recommend least-privilege token scoped only to crawl operations if possible. The skill also documents a specific secrets file path which gives it implicit access to that file — users should verify that path and contents.
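
Because the scripts source the secrets file, any shell code in it would be executed, so it can be worth checking the contents first. The sketch below accepts a file only if every line is blank, a comment, or one of the two expected assignments. The helper `env_file_is_plain` is hypothetical, and the allowed value characters (letters, digits, `_`, `-`) are an assumption about what Cloudflare IDs and tokens look like.

```shell
# Hypothetical helper: succeed only if the env file holds nothing but blank
# lines, comments, and unquoted assignments to the two documented variables.
env_file_is_plain() {
  # A missing or unreadable file is treated as a failure.
  [ -r "$1" ] || return 1
  # Print every line that does NOT match an allowed form; any output means
  # the file contains something unexpected.
  if grep -Ev '^[[:space:]]*(#.*)?$|^(CF_ACCOUNT_ID|CF_CRAWL_API_TOKEN)=[A-Za-z0-9_-]+$' "$1" | grep -q .; then
    return 1
  fi
  return 0
}
```

The check is deliberately strict: quoted values, `export` prefixes, and extra variables are all rejected, so loosen the pattern if your file legitimately uses them.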

✓ Persistence & Privilege

The skill is not always:true, does not request persistent platform-level privileges, and does not modify other skills or global agent settings. It only runs scripts to call Cloudflare APIs.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install cf-crawl
After installation, invoke the skill by name or use /cf-crawl
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release of cf-crawl skill.
- Enables async, multi-page web crawling using the Cloudflare Browser Rendering /crawl API.
- Supports markdown, HTML, and JSON output with flexible link following, pattern filtering, and paging controls.
- Includes JS rendering and AI-powered structured data extraction.
- Provides credential management, quick start scripts, and cost control options.
- Ideal for building knowledge bases or extracting structured data from sites where basic web fetch is insufficient.

Metadata

Slug cf-crawl

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is cf-crawl?

Crawl websites using Cloudflare Browser Rendering /crawl API. Async multi-page crawl with markdown/HTML/JSON output, link following, pattern filtering, and A... It is an AI Agent Skill for Claude Code / OpenClaw, with 237 downloads so far.

How do I install cf-crawl?

Run "/install cf-crawl" in the OpenClaw or Claude Code chat to install it. Before the scripts will run, you also need the Cloudflare credentials file described under Credentials and the curl and jq binaries.

Is cf-crawl free?

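The skill itself is free and licensed under MIT-0, so you can download, install, and use it at no cost. Cloudflare usage can still be billed: per the Cost Notes, JS rendering and JSON extraction are paid features.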

Which platforms does cf-crawl support?

cf-crawl is cross-platform and runs anywhere OpenClaw / Claude Code is available, provided bash, curl, and jq are installed.

Who created cf-crawl?

It is built and maintained by bill492 (@bill492); the current version is v1.0.0.
