bill492

cf-crawl

By bill492 · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
237 total downloads · 0 favorites · 0 current installs · 1 version
Install in OpenClaw
/install cf-crawl
Description
Crawl websites using Cloudflare Browser Rendering /crawl API. Async multi-page crawl with markdown/HTML/JSON output, link following, pattern filtering, and A...
Instructions (SKILL.md)

Cloudflare /crawl

Asynchronous site crawler built on the Cloudflare Browser Rendering API. Start a job → poll for results → get markdown/HTML/JSON for each page.
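The start → poll flow comes down to re-checking a job until it finishes or a deadline passes, which is what the --timeout option bounds. The sketch below is a generic illustration under that assumption, not the logic inside crawl.sh or poll.sh. The names poll_until, POLL_TIMEOUT, and POLL_INTERVAL are hypothetical, and the status check is whatever command you supply.

```shell
#!/usr/bin/env bash
set -euo pipefail

# poll_until CMD [ARGS...]   (hypothetical helper)
# Re-runs CMD until it exits 0, sleeping between attempts, and gives up once
# the deadline passes. POLL_TIMEOUT / POLL_INTERVAL (seconds) are hypothetical
# knobs; the 300 s default mirrors the documented --timeout default.
poll_until() {
  local timeout="${POLL_TIMEOUT:-300}"
  local interval="${POLL_INTERVAL:-5}"
  # SECONDS is bash's built-in count of seconds since the shell started
  local deadline=$(( SECONDS + timeout ))

  until "$@"; do
    if (( SECONDS >= deadline )); then
      echo "poll_until: gave up after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

For example, `poll_until my_status_check "$JOB_ID"` would retry a hypothetical my_status_check command every 5 seconds for up to 300 seconds. A hard deadline keeps a stuck job from blocking an agent indefinitely.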

Quick Start

# Crawl a site (5 pages, markdown, no JS rendering: fast, and free during the beta)
bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com" --limit 5 --format markdown

# With JS rendering (for SPAs, dynamic content)
bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com" --render --limit 10

# Start only (get job ID, poll later)
bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com" --limit 100 --start-only

# Poll an existing job
bash ~/clawd/skills/cf-crawl/scripts/poll.sh <job-id>

Credentials

Stored at ~/.clawdbot/secrets/cloudflare-crawl.env:

CF_ACCOUNT_ID=<account_id>
CF_CRAWL_API_TOKEN=<token_with_read_and_edit>
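Because this file holds an API token, it is worth creating it so that only your user can read it. Here is a minimal sketch. write_crawl_secrets is a hypothetical helper, and only the path and the two variable names come from this document.

```shell
#!/usr/bin/env bash
set -euo pipefail

# write_crawl_secrets FILE ACCOUNT_ID TOKEN   (hypothetical helper)
# Writes the two variables the crawl scripts expect, readable by the owner only.
write_crawl_secrets() {
  local file="$1" account_id="$2" token="$3"

  # Create the secrets directory (mode 700) if it does not exist yet
  mkdir -p -m 700 "$(dirname "$file")"

  # umask 077 inside a subshell: a newly created file gets mode 0600
  (
    umask 077
    printf 'CF_ACCOUNT_ID=%s\nCF_CRAWL_API_TOKEN=%s\n' "$account_id" "$token" > "$file"
  )

  # Also tighten permissions in case the file already existed with looser ones
  chmod 600 "$file"
}

# Example (placeholders, substitute your own values):
# write_crawl_secrets "$HOME/.clawdbot/secrets/cloudflare-crawl.env" "<account_id>" "<token>"
```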

Key Options

Option                       Description
--limit N                    Max pages to crawl (default 10)
--depth N                    Max link depth (default 10)
--format markdown|html|json  Output format (default markdown)
--render                     Enable headless-browser rendering (default off: plain fetch, fast and free during the beta)
--include PAT                Wildcard URL pattern to include (repeatable)
--exclude PAT                Wildcard URL pattern to exclude (repeatable)
--external                   Follow links to external domains
--subdomains                 Follow links to subdomains
--source all|sitemaps|links  URL discovery method
--json-prompt "..."          AI extraction prompt (use with --format json)
--json-schema file.json      JSON schema for structured extraction
--timeout SEC                Max time to wait while polling, in seconds (default 300)
--output FILE                Write full results to FILE
--raw                        Output the raw API response
--start-only                 Print the job ID and exit without polling
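Since --include and --exclude are repeatable, scripted calls have to pass one flag per pattern. A bash array keeps each wildcard pattern as a single argument that the shell does not glob-expand. The sketch below assumes the script path from Quick Start, and its patterns are illustrative. It only prints the assembled command rather than running a crawl.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Repeatable flags are passed once per pattern. Building them in a bash array
# keeps each pattern a single, unexpanded argument (no word splitting or globbing).
CRAWL_SH="$HOME/clawd/skills/cf-crawl/scripts/crawl.sh"

include_patterns=("/docs/**" "/guides/**")   # illustrative patterns
exclude_patterns=("/docs/archive/**")

args=(--limit 20 --depth 3 --format markdown)
for pat in "${include_patterns[@]}"; do args+=(--include "$pat"); done
for pat in "${exclude_patterns[@]}"; do args+=(--exclude "$pat"); done

# Dry run: print the command with shell quoting (printf %q) instead of executing it.
# To actually crawl, run:  bash "$CRAWL_SH" "https://example.com/" "${args[@]}"
printf '%q ' bash "$CRAWL_SH" "https://example.com/" "${args[@]}"
echo
```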

Common Patterns

Crawl docs site for knowledge base

bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://docs.example.com/" \
  --limit 50 --depth 3 --format markdown --output docs.json

Crawl with URL filtering

bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com/" \
  --include "/docs/**" --exclude "/docs/archive/**" --limit 20

AI-powered structured extraction

bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com/products" \
  --format json --render \
  --json-prompt "Extract product name, price, and description" \
  --json-schema schema.json
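The command above references a schema.json that is not included. Below is a minimal sketch of one. It assumes that --json-schema accepts a plain JSON Schema object describing a single extracted record; check references/api-reference.md for the exact shape the API expects. The field names simply mirror the --json-prompt text.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: --json-schema takes a plain JSON Schema object for one record.
# Fields mirror the prompt ("product name, price, and description"); price is a
# string so values with currency symbols survive extraction unchanged.
cat > schema.json <<'EOF'
{
  "type": "object",
  "properties": {
    "name":        { "type": "string" },
    "price":       { "type": "string" },
    "description": { "type": "string" }
  },
  "required": ["name", "price"]
}
EOF

# jq is already needed by the crawl scripts; `jq empty` exits non-zero on
# malformed JSON, so a typo is caught before a paid crawl is started.
if command -v jq >/dev/null 2>&1; then
  jq empty schema.json
fi
```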

Long-running crawl (background)

JOB_ID=$(bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://big-site.com" \
  --limit 1000 --start-only)
# Check later:
bash ~/clawd/skills/cf-crawl/scripts/poll.sh "$JOB_ID"

Cost Notes

  • Without --render (render: false, the default): fast HTML fetch, free during the beta
  • With --render (render: true): uses Browser Rendering minutes (paid)
  • --format json: uses Workers AI tokens for extraction (paid)
  • Results are cached in R2; --max-age sets how long (default 24 hours)

API Details

See references/api-reference.md for full parameter documentation, response schema, and lifecycle details.

Safety recommendations
This skill's code implements exactly what it claims (starting and polling Cloudflare crawl jobs), but the package metadata omits important operational requirements. Before installing or running it:
  1. Verify and add the required credentials: CF_ACCOUNT_ID and a CF_CRAWL_API_TOKEN with the minimum required scope (prefer a least-privilege token).
  2. Confirm that the documented secrets file path (~/.clawdbot/secrets/cloudflare-crawl.env) is acceptable, or change it to a safe location you control.
  3. Ensure the required binaries (curl, jq, bash) are available, or update the manifest to declare them.
  4. Review the scripts locally to confirm they make only Cloudflare API calls and do not transmit data elsewhere.
  5. Consider a first run in a restricted environment (container or VM) with a test Cloudflare account and token.
If the publisher updates the registry metadata to declare the env vars and binaries and documents the token scope clearly, the concern would be reduced.
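Points 1 and 3 can be checked before the first run. The sketch below is a hypothetical preflight helper and is not shipped with the skill. It only assumes the binaries and the secrets path and variable names listed in this document, and it loads the file in a subshell so the token does not leak into the calling shell.

```shell
#!/usr/bin/env bash
set -euo pipefail

# crawl_preflight [SECRETS_FILE]   (hypothetical helper, not part of the skill)
# Checks the binaries and credentials the scripts rely on; returns 0 if all is well.
crawl_preflight() {
  local secrets="${1:-$HOME/.clawdbot/secrets/cloudflare-crawl.env}"
  local status=0 bin

  for bin in bash curl jq; do
    if ! command -v "$bin" >/dev/null 2>&1; then
      echo "missing binary: $bin" >&2
      status=1
    fi
  done

  if [[ ! -r "$secrets" ]]; then
    echo "secrets file not readable: $secrets" >&2
    return 1
  fi

  # Sourcing executes the file as shell code (as the scripts do), so review it first.
  # The subshell keeps CF_* variables out of the caller's environment.
  if ! ( . "$secrets"; [[ -n "${CF_ACCOUNT_ID:-}" && -n "${CF_CRAWL_API_TOKEN:-}" ]] ); then
    echo "CF_ACCOUNT_ID or CF_CRAWL_API_TOKEN missing or empty in $secrets" >&2
    return 1
  fi

  return "$status"
}

# Example: crawl_preflight && echo "ready to crawl"
```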
Functional analysis
Type: OpenClaw Skill · Name: cf-crawl · Version: 1.0.0
The cf-crawl skill is a legitimate tool for performing web crawls and structured data extraction via the Cloudflare Browser Rendering API. The implementation consists of bash scripts (crawl.sh, poll.sh) that safely interact with the official Cloudflare API endpoint using curl and jq, following standard practices for credential management and input sanitization.
Capability assessment
Purpose & Capability
The skill's name and description match the included scripts: they start and poll Cloudflare Browser Rendering /crawl jobs and produce markdown/HTML/JSON. However, the registry metadata declares no required environment variables or binaries, while the scripts require CF_ACCOUNT_ID and CF_CRAWL_API_TOKEN (sourced from ~/.clawdbot/secrets/cloudflare-crawl.env) and rely on curl and jq. These credential and binary requirements are expected for the stated purpose but are missing from the manifest, which is an inconsistency.
Instruction Scope
SKILL.md and the scripts limit actions to starting/polling the Cloudflare API and writing results to stdout or a user-specified file. They source a local secrets file (~/.clawdbot/secrets/cloudflare-crawl.env) for credentials (documented in SKILL.md). There are no instructions to read unrelated system files or send data to third-party endpoints outside Cloudflare. Still, the explicit path to a secrets file is noteworthy and should be confirmed acceptable to the user.
Install Mechanism
This is instruction-only (no install spec), which is low risk, but the included scripts call curl and jq and expect both to be installed. The manifest does not declare the required binaries or provide an install step for dependencies. That mismatch increases the chance that a user runs the scripts in an unexpected environment or with missing tools.
Credentials
The scripts require CF_ACCOUNT_ID and CF_CRAWL_API_TOKEN with read+edit permissions (the API needs Browser Rendering read+edit). These credentials are appropriate for controlling Cloudflare crawl jobs, but the manifest does not declare them. The token scope (edit) is also broader than read-only; if possible, use a least-privilege token scoped only to crawl operations. The skill also documents a specific secrets file path, which gives it implicit access to that file, so users should verify that path and its contents.
Persistence & Privilege
The skill is not marked always: true, does not request persistent platform-level privileges, and does not modify other skills or global agent settings. It only runs scripts that call Cloudflare APIs.
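How to use
  1. Make sure OpenClaw is installed (local or Docker deployment).
  2. In the chat box, enter the install command: /install cf-crawl
  3. After installation, call the skill by name or trigger it with /cf-crawl.
  4. Provide the inputs described in the skill's parameter documentation to get structured output.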
Version history
v1.0.0
Initial release of the cf-crawl skill.
  - Enables async, multi-page web crawling using the Cloudflare Browser Rendering /crawl API.
  - Supports markdown, HTML, and JSON output with flexible link following, pattern filtering, and paging controls.
  - Includes JS rendering and AI-powered structured data extraction.
  - Provides credential management, quick start scripts, and cost control options.
  - Ideal for building knowledge bases or extracting structured data from sites where a basic web fetch is insufficient.
Metadata
Slug: cf-crawl
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 1
FAQ

What is cf-crawl?

Crawl websites using Cloudflare Browser Rendering /crawl API. Async multi-page crawl with markdown/HTML/JSON output, link following, pattern filtering, and A... It is an AI agent skill plugin for Claude Code / OpenClaw and has been downloaded 237 times to date.

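How do I install cf-crawl?

Run the command "/install cf-crawl" in an OpenClaw or Claude Code chat to install it in one step. Before the first crawl, you still need to create the Cloudflare credentials file described under Credentials.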

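Is cf-crawl free?

The skill itself is free: it is released under the MIT-0 license and can be freely downloaded, installed, and used. Cloudflare usage is billed separately; JS rendering and JSON extraction are paid (see Cost Notes).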

Which platforms does cf-crawl support?

cf-crawl is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed, provided bash, curl, and jq are available.

Who develops cf-crawl?

It is developed and maintained by bill492 (@bill492). The current version is v1.0.0.
