Firecrawl Scrape Cn

Name: Firecrawl Scrape Cn
Author: yang1002378395-cmyk

v1.0.0 · MIT-0

cross-platform ✓ Security check passed

827 total downloads

Install in OpenClaw

/install firecrawl-scrape-cn

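Description

Extracts clean Markdown content from any URL, including JS-rendered SPAs. Use this Skill when the user provides a URL and wants its content, or says "scrape" (抓取), "grab the web page" (抓网页), "fetch the page" (获取页面), "extract from URL" (从 URL 提取), or "read the web page" (读取网页). Supports JS-rendered pages and multiple concurrent URLs, and returns LLM-optimized Markdown.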

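Usage (SKILL.md)

Firecrawl web scraping

Scrape one or more URLs and return clean, LLM-optimized Markdown. Multiple URLs are scraped concurrently.

When to use

You have a specific URL and want its content
The page is static or JS-rendered (SPA)
Step 2 of the workflow escalation pattern: search → scrape → map → crawl → interact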

Quick start

# Basic Markdown extraction
firecrawl scrape "<url>" -o .firecrawl/page.md

# Main content only, no navigation or footer
firecrawl scrape "<url>" --only-main-content -o .firecrawl/page.md

# Wait for JS rendering before scraping
firecrawl scrape "<url>" --wait-for 3000 -o .firecrawl/page.md

# Multiple URLs (each saved to .firecrawl/)
firecrawl scrape https://example.com https://example.com/blog https://example.com/docs

# Get Markdown and links
firecrawl scrape "<url>" --format markdown,links -o .firecrawl/page.json

# Ask a question about the page content
firecrawl scrape "https://example.com/pricing" --query "What is the enterprise plan price?"
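
The multi-URL form above can also be fed from a file. The sketch below is one possible approach and is not part of the firecrawl CLI: `scrape_url_list` and `urls.txt` are hypothetical names, and the only thing it assumes about the CLI is what the quick start shows, namely that `firecrawl scrape` accepts several URLs as separate arguments. Each URL is passed as a single quoted argument, so `?` and `&` reach the CLI untouched.

```shell
# Hypothetical helper (not part of the firecrawl CLI): scrape every URL
# listed in a text file, one URL per line. Blank lines and lines starting
# with '#' are skipped.
scrape_url_list() (
  # The ( ) body runs in a subshell, so the variables below stay local.
  list="$1"
  set --                                  # start with an empty argument list
  while IFS= read -r url || [ -n "$url" ]; do
    case "$url" in
      ''|'#'*) continue ;;                # skip blanks and comments
    esac
    set -- "$@" "$url"                    # append the URL as one argument
  done < "$list"
  if [ "$#" -eq 0 ]; then
    echo "no URLs found in $list" >&2
    exit 1
  fi
  # Assumption from the quick start: several URLs may follow 'scrape'.
  firecrawl scrape "$@"
)
```

With a file such as `urls.txt` holding one URL per line, `scrape_url_list urls.txt` would issue a single `firecrawl scrape` call for all of them.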

Options

Option	Description
`-f, --format <formats>`	Output formats: markdown, html, rawHtml, links, screenshot, json
`-Q, --query <prompt>`	Ask a question about the page content (5 credits)
`-H`	Include HTTP headers in the output
`--only-main-content`	Strip navigation, footer, and sidebar; main content only
`--wait-for <ms>`	Wait for JS rendering before scraping
`--include-tags <tags>`	Include only these HTML tags
`--exclude-tags <tags>`	Exclude these HTML tags
`-o, --output <path>`	Output file path

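Tips

Prefer a plain scrape over --query. Scrape to a file, then use grep, head, or read the Markdown directly; you can search and reason over the full content yourself. Use --query only when you want a single targeted answer without saving the page (it costs an extra 5 credits).
Try scrape before interact. Scrape handles both static pages and JS-rendered SPAs. Escalate to interact only when you need interaction (clicks, form filling, pagination).
Multiple URLs are scraped concurrently. Check your concurrency limit with firecrawl --status.
A single format outputs raw content. Multiple formats (e.g. --format markdown,links) output JSON.
Always quote URLs: the shell treats ? and & as special characters.
Naming convention: .firecrawl/{site}-{path}.md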
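
The naming convention and the scrape-then-search tip can be combined in a small wrapper. This is a sketch under stated assumptions: `firecrawl_output_path` and `scrape_and_grep` are hypothetical helper names, the rule for turning a URL into a file name (drop the scheme, query string, and fragment, then replace `/` with `-`) is one reasonable reading of `.firecrawl/{site}-{path}.md` rather than something the skill specifies, and only the `--only-main-content` and `-o` options listed above are used.

```shell
# Hypothetical helper: map a URL to .firecrawl/{site}-{path}.md.
# Assumed rules (not specified by the skill): scheme, query string and
# fragment are dropped, and '/' in the path becomes '-'.
firecrawl_output_path() (
  rest="${1#*://}"            # strip the scheme, e.g. "https://"
  rest="${rest%%[?#]*}"       # strip the query string and fragment
  site="${rest%%/*}"          # host name
  case "$rest" in
    */*) path="${rest#*/}" ;; # everything after the first '/'
    *)   path="" ;;
  esac
  path="${path%/}"            # drop one trailing slash
  path=$(printf '%s' "$path" | tr '/' '-')
  if [ -n "$path" ]; then
    printf '.firecrawl/%s-%s.md\n' "$site" "$path"
  else
    printf '.firecrawl/%s.md\n' "$site"
  fi
)

# Hypothetical helper: scrape a URL to its conventional path, then search
# the saved Markdown locally instead of spending credits on --query.
scrape_and_grep() (
  url="$1"
  pattern="$2"
  out=$(firecrawl_output_path "$url")
  mkdir -p "$(dirname "$out")"
  firecrawl scrape "$url" --only-main-content -o "$out" &&
    grep -n -i -- "$pattern" "$out"
)
```

For example, `scrape_and_grep "https://example.com/pricing?plan=team" "enterprise"` would target `.firecrawl/example.com-pricing.md` and print the matching lines with their line numbers. The URL is quoted at every step, so the `?` and `&` never reach the shell unprotected.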

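See also

firecrawl-search: find pages when you don't have a URL
firecrawl-browser: when scrape cannot get the content, use interact to click, fill forms, and so on
firecrawl-download: bulk-download an entire site to local files

Chinese translation | Original: firecrawl-scrape version: "1.0.0"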

Security recommendations

This skill is coherent for scraping web pages into Markdown, but before installing or invoking it: (1) Prefer a vetted binary — if you must use 'npx firecrawl', understand npx will download and run code from the npm registry at runtime; only run it if you trust the package and maintainer. (2) Consider pre-installing the firecrawl CLI from a trusted release (and verify signatures/checksums) instead of using npx. (3) Scraped content will be written to .firecrawl/ in the agent environment — review filesystem permissions and sensitive data that might be captured. (4) Be mindful of legal and robots.txt/crawling rules for target sites and avoid providing credentials to the scraper unless necessary. (5) If you need higher assurance, request the skill source or an install spec pointing to an official release host (GitHub release or package registry info) before enabling.
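
Point (2) above can be made concrete with a checksum comparison before a downloaded CLI archive is installed. The sketch below is illustrative only: `verify_sha256` is a hypothetical helper, the archive and the expected digest must come from whatever release source you actually trust (this listing does not name one), and it assumes `sha256sum` from GNU coreutils is available (on macOS, `shasum -a 256` is the usual substitute).

```shell
# Hypothetical helper: succeed only if FILE's SHA-256 digest equals EXPECTED.
# The comparison is case-sensitive; sha256sum prints lowercase hex.
verify_sha256() (
  file="$1"
  expected="$2"
  # sha256sum prints "<digest>  <file>"; keep only the digest.
  actual=$(sha256sum "$file" | cut -d ' ' -f 1)
  if [ -n "$actual" ] && [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    exit 1
  fi
)
```

A call such as `verify_sha256 ./downloaded-archive "<digest published by the maintainer>"` returns a non-zero status when the digests differ or the file cannot be read, so it can gate an install step with `&&`.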

Capability assessment

✓ Purpose & Capability

Name/description describe scraping arbitrary URLs (including JS-rendered SPA) into Markdown. The SKILL.md only asks the agent to invoke a 'firecrawl' CLI (or 'npx firecrawl'), save outputs to .firecrawl/, and optionally query results — these are appropriate for the stated purpose.

✓ Instruction Scope

Runtime instructions are limited to running the firecrawl CLI with flags, writing output files (e.g., .firecrawl/page.md), and optionally asking a query. The docs do not instruct the agent to read unrelated system files, environment variables, or exfiltrate agent state. They advise quoting URLs and using grep/head, which is expected.

ℹ Install Mechanism

There is no install spec (instruction-only), which is low-risk, but the SKILL.md and allowed-tools explicitly reference 'npx firecrawl'. Using npx will fetch and execute a package from the npm registry at runtime — a legitimate delivery method for CLI tools but a moderate risk if the package/source is not verified. The skill does not declare an authoritative package source or checksum.

✓ Credentials

The skill requests no environment variables, no credentials, and no config paths. This is proportionate for a web-scraping CLI that operates against arbitrary URLs provided by the user.

✓ Persistence & Privilege

always:false and normal agent invocation; the skill writes its own output files under .firecrawl/ which is reasonable. It does not request permanent system-wide presence or modify other skills' configs.

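How to use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install firecrawl-scrape-cn
Once installed, invoke the Skill by name or trigger it with /firecrawl-scrape-cn
Provide the required input according to the Skill's parameter documentation to get structured output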

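Version history

v1.0.0

firecrawl-scrape-cn v1.0.0
- Initial release, providing Chinese documentation and usage notes for the firecrawl scrape tool.
- Supports extracting clean Markdown content from any URL, including JS-rendered pages and concurrent multi-URL scraping.
- Lists the main usage patterns, options, and common use cases to help Chinese-speaking users get started quickly.
- Provides reference links to related firecrawl skills, along with usage tips.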

Metadata

Slug firecrawl-scrape-cn

Version 1.0.0

License MIT-0

Total installs 3

Current installs 1

Versions 1

Frequently asked questions