simonpierreboucher02

Firecrawl

Author: Simon-Pierre Boucher · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 46
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install firecrawl-ws
Description
AI-native web scraping, crawling, domain mapping, and structured extraction. Use for converting websites into LLM-ready Markdown, scraping pages with dynamic...
Usage Instructions (SKILL.md)

Firecrawl Skill

This skill extends Manus with the capability to search, scrape, crawl, and extract structured data from any website using Firecrawl [1] [2].

  • Author: Simon-Pierre Boucher
  • Target Audience: AI Engineers, Agent Developers, Data Engineers, Web Scraping Engineers

1. Core Workflows

1.1 Scraping a Single URL (/scrape)

Use when you need the text content, Markdown, HTML, or screenshots of a specific webpage [2].

  1. Initialize the Firecrawl client with your API key [1].
  2. Specify output formats (e.g., ["markdown"] or ["markdown", "screenshot"]) [2].
  3. Apply custom browser actions (e.g., click, wait, write) if the page has dynamic content or requires interaction [2].
  4. Optionally filter the DOM using includeTags or excludeTags [2].
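
The four steps above can be sketched as a plain HTTP request body. This is a minimal sketch, not the official SDK: the base URL `https://api.firecrawl.dev/v1`, the Bearer-token header, the `click` action shape, and the helper names `build_scrape_payload` and `build_request` are assumptions made for illustration. The request is built but never sent, so the block runs offline.

```python
import json
import urllib.request

# Assumed base URL of the hosted REST API; a self-hosted instance would differ.
API_BASE = "https://api.firecrawl.dev/v1"

def build_scrape_payload(url, formats=("markdown",), actions=None,
                         include_tags=None, exclude_tags=None):
    """Assemble the JSON body for a /scrape request (hypothetical helper)."""
    payload = {"url": url, "formats": list(formats)}
    if actions:
        payload["actions"] = list(actions)          # browser actions, run in order
    if include_tags:
        payload["includeTags"] = list(include_tags)  # keep only these DOM tags
    if exclude_tags:
        payload["excludeTags"] = list(exclude_tags)  # drop these DOM tags
    return payload

def build_request(path, payload, api_key):
    """Wrap a payload in an authenticated POST request without sending it."""
    return urllib.request.Request(
        f"{API_BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

payload = build_scrape_payload(
    "https://example.com/pricing",
    formats=["markdown", "screenshot"],
    actions=[
        {"type": "wait", "selector": "#loaded-element"},
        {"type": "click", "selector": "#show-more"},  # assumed action shape
    ],
    exclude_tags=["nav", "footer"],
)
request = build_request("/scrape", payload, api_key="fc-YOUR-KEY")
print(request.full_url)  # https://api.firecrawl.dev/v1/scrape
# urllib.request.urlopen(request) would send it; omitted here on purpose.
```

Keeping payload construction separate from sending makes the options easy to inspect and unit-test before any credits are spent.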

1.2 Crawling a Domain (/crawl)

Use when you need to discover and scrape all pages under a specific domain or path recursively [1] [2].

  1. Start an asynchronous crawl job by specifying the starting url [2].
  2. Set depth limits (maxDepth) and page limits (limit) to control token and credit usage [2].
  3. Configure scrapeOptions to ensure each crawled page is parsed with the correct format (e.g., Markdown only) [2].
  4. Poll the crawl status using the jobId until completed [2].
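
The start-then-poll flow above might look like the following sketch. The HTTP call is replaced by an injected `fetch_status` callable so the example runs offline; `build_crawl_payload`, `poll_crawl`, and the status strings `"scraping"` and `"failed"` are assumptions for illustration (only `completed` appears in the text above).

```python
import time

def build_crawl_payload(url, max_depth=2, limit=50):
    """JSON body for starting an asynchronous crawl job (hypothetical helper)."""
    return {
        "url": url,
        "maxDepth": max_depth,  # how many link hops from the start URL
        "limit": limit,         # hard cap on pages, and therefore on credits
        "scrapeOptions": {"formats": ["markdown"]},  # Markdown only per page
    }

def poll_crawl(fetch_status, job_id, interval=2.0, max_polls=150, sleep=time.sleep):
    """Call fetch_status(job_id) until the job reaches a terminal state."""
    for _ in range(max_polls):
        job = fetch_status(job_id)
        # "failed" is an assumed terminal state alongside "completed".
        if job.get("status") in ("completed", "failed"):
            return job
        sleep(interval)
    raise TimeoutError(f"crawl job {job_id} still running after {max_polls} polls")

# Offline demonstration: a fake status source stands in for the HTTP call.
_responses = iter([
    {"status": "scraping"},
    {"status": "scraping"},
    {"status": "completed", "data": [{"markdown": "# Home"}]},
])
result = poll_crawl(lambda job_id: next(_responses), "job-123", sleep=lambda s: None)
print(result["status"])  # completed
```

Bounding the loop with `max_polls` keeps a stuck job from blocking the caller indefinitely.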

1.3 Mapping a Domain (/map)

Use when you need to quickly discover all URLs belonging to a domain without scraping page content [1] [2].

  1. Provide the base url [2].
  2. Optionally provide a search filter to only return URLs matching a specific keyword or path [2].
  3. Set includeSubdomains to true if you need sub-domain discovery [2].
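
As a rough illustration of the steps above, the sketch below builds a /map request body and then narrows a list of discovered URLs before any scraping happens. The helper names and the sample `links` list are hypothetical; a real list would come from the API response.

```python
from urllib.parse import urlparse

def build_map_payload(url, search=None, include_subdomains=False):
    """JSON body for a /map request (hypothetical helper)."""
    payload = {"url": url}
    if search:
        payload["search"] = search            # keyword or path filter
    if include_subdomains:
        payload["includeSubdomains"] = True   # also discover sub-domains
    return payload

def select_urls(links, path_prefix):
    """Keep only mapped URLs whose path starts with path_prefix."""
    return [u for u in links if urlparse(u).path.startswith(path_prefix)]

payload = build_map_payload("https://example.com", search="docs",
                            include_subdomains=True)

# Made-up /map result used only to exercise the filter.
links = [
    "https://example.com/docs/intro",
    "https://example.com/blog/launch",
    "https://docs.example.com/docs/api",
]
to_scrape = select_urls(links, "/docs")
print(to_scrape)
# ['https://example.com/docs/intro', 'https://docs.example.com/docs/api']
```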

1.4 Structured Extraction (/extract)

Use when you need to parse raw web pages and extract structured JSON data conforming to a specific schema [3].

  1. Provide an array of urls and a natural language extraction prompt [3].
  2. Define the target schema using a JSON Schema, Pydantic model (Python), or Zod schema (TypeScript) [3].
  3. Run the extraction to retrieve JSON that conforms to the target schema [3].
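
To make the schema step concrete, here is a minimal sketch that pairs a JSON Schema with an extraction request body and sanity-checks a returned record. `PRODUCT_SCHEMA`, `build_extract_payload`, and `check_against_schema` are hypothetical names, and the checker covers only flat objects with a few primitive types; a full validator or a Pydantic model would be the sturdier choice in practice.

```python
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["name", "price"],
}

def build_extract_payload(urls, prompt, schema):
    """JSON body for an /extract request (hypothetical helper)."""
    return {"urls": list(urls), "prompt": prompt, "schema": schema}

_JSON_TYPES = {"string": str, "number": (int, float), "boolean": bool}

def check_against_schema(record, schema):
    """Return a list of problems; an empty list means the flat record fits."""
    problems = [f"missing required field: {key}"
                for key in schema.get("required", []) if key not in record]
    for key, spec in schema.get("properties", {}).items():
        if key not in record:
            continue
        value = record[key]
        ok = isinstance(value, _JSON_TYPES[spec["type"]])
        # bool is a subclass of int in Python, so reject it for "number".
        if spec["type"] == "number" and isinstance(value, bool):
            ok = False
        if not ok:
            problems.append(
                f"{key}: expected {spec['type']}, got {type(value).__name__}")
    return problems

payload = build_extract_payload(
    ["https://example.com/products/widget"],
    "Extract the product name, price, and stock status.",
    PRODUCT_SCHEMA,
)

# Made-up response record: the price arrived as a string.
print(check_against_schema({"name": "Widget", "price": "19.99"}, PRODUCT_SCHEMA))
# ['price: expected number, got str']
```

Checking the response locally is cheap insurance before the data reaches downstream code that assumes the schema holds.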

2. Resource Guides

For comprehensive API parameters, SDK code templates, and configuration options, read the following reference files:

  • API Reference & SDK Snippets: Read references/api_reference.md for complete endpoint request/response schemas, Python SDK templates, and TypeScript/Zod snippets.
  • Self-Hosting & Docker: Read references/self_hosting.md for production-ready Docker Compose configurations, environment variables, and scaling guidelines.

3. Best Practices & Anti-Patterns

3.1 Best Practices

  • Always use onlyMainContent: true to strip out navigation bars, headers, and footers. This dramatically reduces downstream LLM token costs and keeps context windows clean [2].
  • Leverage /map before /crawl if you only need to discover pages or filter specific URLs to scrape. Mapping is significantly faster and cheaper than full crawls [1] [2].
  • Implement exponential backoff with jitter when handling rate limits (429) or transient server errors (5xx) to ensure scraping resiliency [4].
  • Set explicit CPU and RAM limits on your containers if self-hosting to prevent headless Chromium from consuming all host system resources [5].
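
The backoff recommendation above can be sketched with the common "full jitter" variant, where each wait is a random fraction of an exponentially growing ceiling. The names `backoff_delay` and `call_with_retries`, the default `base` and `cap` values, and the `(status, body)` return convention are assumptions for illustration; the sleep and random sources are injectable so the example runs instantly.

```python
import random
import time

def is_retryable(status):
    """Rate limits (429) and server errors (5xx) are worth retrying."""
    return status == 429 or 500 <= status < 600

def backoff_delay(attempt, base=1.0, cap=30.0, rng=random.random):
    """Full jitter: a random value in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))

def call_with_retries(send, max_attempts=5, sleep=time.sleep, rng=random.random):
    """Call send() -> (status, body), retrying retryable statuses with backoff."""
    for attempt in range(max_attempts):
        status, body = send()
        if not is_retryable(status):
            return status, body
        if attempt < max_attempts - 1:  # no point sleeping after the last try
            sleep(backoff_delay(attempt, rng=rng))
    return status, body  # still failing: hand the last response to the caller

# Offline demonstration: two transient failures, then success.
_replies = iter([(429, "slow down"), (503, "unavailable"), (200, "ok")])
waits = []
status, body = call_with_retries(
    lambda: next(_replies),
    sleep=waits.append,   # record the delay instead of sleeping
    rng=lambda: 0.5,      # fixed "random" value for a reproducible run
)
print(status, body, waits)  # 200 ok [0.5, 1.0]
```

The jitter spreads retries from concurrent workers over time, so they do not all hit the rate limit again at the same instant.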

3.2 Anti-Patterns

  • Do not use hard-coded waitFor delays when scraping dynamic content. Instead, use selector-based waits (e.g., {"type": "wait", "selector": "#loaded-element"}) to minimize request latency [2].
  • Do not run synchronous crawls. Crawling is an inherently long-running process; always use the asynchronous /crawl endpoint and poll for results or use webhooks [2].
  • Do not reuse browser sessions across unrelated scraping tasks if security isolation is required. Firecrawl relies on ephemeral containers to prevent session contamination [5].

References

[1] Firecrawl Homepage, "The API to search, scrape, and interact with the web at scale." URL: https://github.com/firecrawl/firecrawl
[2] Firecrawl Documentation, "Advanced Scraping Guide." URL: https://docs.firecrawl.dev/advanced-scraping-guide
[3] Firecrawl Documentation, "Agent Endpoint." URL: https://docs.firecrawl.dev/features/agent
[4] Firecrawl Documentation, "Rate Limits." URL: https://docs.firecrawl.dev/rate-limits
[5] Firecrawl GitHub Repository, "Self-hosting Firecrawl Guide." URL: https://raw.githubusercontent.com/firecrawl/firecrawl/main/SELF_HOST.md

Security Recommendations
Before installing, understand that Firecrawl requests may send target URLs, prompts, schemas, and scraped page content to Firecrawl unless you self-host. Do not submit secrets, authenticated pages, or internal-only URLs without authorization. If self-hosting, do not expose the provided Docker example to the internet as-is; bind to localhost or a private network and enable authentication or a protected reverse proxy.
Capability Tags
requires-sensitive-credentials
Capability Assessment
Purpose & Capability
The skill clearly describes Firecrawl workflows for scraping, crawling, URL mapping, and structured extraction, and its API-key requirement matches that purpose.
Instruction Scope
Instructions are scoped to user-directed Firecrawl API and SDK usage; the references show external Firecrawl endpoints, but they do not include a prominent privacy warning about sending URLs and retrieved page content to a third-party service.
Install Mechanism
The artifact contains only Markdown files and no executable install scripts, dependencies, or automatic runtime hooks.
Credentials
The self-hosting reference includes a minimal Docker Compose example with HOST=0.0.0.0 and USE_DB_AUTHENTICATION=false, which is risky if copied onto an exposed host, but it is documentation rather than automatic installation behavior.
Persistence & Privilege
The skill itself has no persistence or privilege escalation; the self-hosting example uses restart: always and a Redis volume as expected for a service deployment, with user action required.
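How to Use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install firecrawl-ws
  3. Once installation finishes, call the skill by name or trigger it with /firecrawl-ws.
  4. Provide the inputs described in the skill's parameter notes to receive structured output.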
Version History
v1.0.0
Firecrawl Skill 1.0.0: Initial Release
  • Adds AI-native web scraping, crawling, domain mapping, and structured extraction capabilities via Firecrawl.
  • Supports scraping single pages, crawling entire domains, mapping URLs, and extracting structured JSON according to user-defined schemas.
  • Includes best-practice guidance for efficient, scalable, and resilient data extraction.
  • Provides resource guides for API integration, SDK templates, and self-hosted deployments.
  • Details anti-patterns and optimization strategies for cost-effective LLM-ready data processing.
Metadata
Slug: firecrawl-ws
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Versions released: 1
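FAQ

What is Firecrawl?

AI-native web scraping, crawling, domain mapping, and structured extraction. Use for converting websites into LLM-ready Markdown, scraping pages with dynamic... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 46 total downloads so far.

How do I install Firecrawl?

Run the command "/install firecrawl-ws" in the OpenClaw or Claude Code chat box to install it in one step, with no additional configuration required.

Is Firecrawl free?

Yes. The Firecrawl skill is free under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does Firecrawl support?

Firecrawl is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Firecrawl?

It is developed and maintained by Simon-Pierre Boucher (@simonpierreboucher02). The current version is v1.0.0.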
