← 返回 Skills 市场
angusthefuzz

Crawl4AI Web Scraper

作者 angusthefuzz · GitHub ↗ · v1.0.1
cross-platform ✓ 安全检测通过
2867
总下载
5
收藏
15
当前安装
2
版本数
在 OpenClaw 中安装
/install crawl-for-ai
功能描述
Full web page scraping with JavaScript rendering via local Crawl4AI instance, delivering clean markdown or detailed JSON including links and media.
使用说明 (SKILL.md)

Crawl4AI Web Scraper

Local Crawl4AI instance for full web page extraction with JavaScript rendering.

Endpoints

Proxy (port 11234) — Clean output, OpenWebUI-compatible

  • Returns: [{page_content, metadata}]
  • Use for: Simple content extraction

Direct (port 11235) — Full output with all data

  • Returns: {results: [{markdown, html, links, media, ...}]}
  • Use for: When you need links, media, or other metadata

Usage

# Via script
node {baseDir}/scripts/crawl4ai.js "url"
node {baseDir}/scripts/crawl4ai.js "url" --json

Script options:

  • --json — Full JSON response

Output: Clean markdown from the page.

Configuration

Required environment variable:

  • CRAWL4AI_URL — Your Crawl4AI instance URL (e.g., http://localhost:11235)

Optional:

  • CRAWL4AI_KEY — API key if your instance requires authentication

Features

  • JavaScript rendering — Handles dynamic content
  • Unlimited usage — Local instance, no API limits
  • Full content — HTML, markdown, links, media, tables
  • Better than Tavily for complex pages with JS

API

Uses your local Crawl4AI instance REST API. Auth header only sent if CRAWL4AI_KEY is set.

安全使用建议
This skill appears to do what it claims, but review these points before installing: - You must set CRAWL4AI_URL to a trusted Crawl4AI endpoint (for local usage set it to http://localhost:11235). The skill will send the URL(s) you want scraped to that endpoint, so pointing it at an untrusted remote service could leak the pages you request and any scraped content. - The script only supports plain HTTP (uses Node's http module). If you provide an https:// URL the script may fail; treat that as an implementation bug, not a security feature. - The only environment variables used are CRAWL4AI_URL and optional CRAWL4AI_KEY. If your Crawl4AI instance requires a key, use a secret you trust. There are no other hidden env reads or file accesses. - Metadata mismatch: the registry metadata omitted the required env var that the SKILL.md and script require. This is likely a packaging oversight—confirm CRAWL4AI_URL will be supplied before use. - If you plan to let the agent invoke skills autonomously, remember the agent could call this skill and thereby send target URLs to the configured Crawl4AI endpoint; ensure that behavior is acceptable in your environment. Overall: coherent and low-risk provided you point it to a trusted Crawl4AI instance and are aware of the HTTP/HTTPS limitation.
功能分析
Type: OpenClaw Skill Name: crawl-for-ai Version: 1.0.1 This skill bundle is designed to interact with a user-configured local Crawl4AI instance for web scraping. The `scripts/crawl4ai.js` file makes HTTP POST requests to the `CRAWL4AI_URL` (an environment variable) and includes an optional `CRAWL4AI_KEY` for authentication. All network activity is directed to the explicitly configured endpoint, which is essential for its stated purpose. There is no evidence of data exfiltration to unauthorized destinations, malicious execution, persistence mechanisms, or prompt injection attempts in `SKILL.md` designed to manipulate the AI agent into harmful actions. The code is transparent and aligns with the description of a tool for a self-hosted service.
能力评估
Purpose & Capability
Name/description match what the files do: the SKILL.md and included Node script send URLs to a Crawl4AI instance and return markdown/JSON. Declared required binary (node) is appropriate. One minor inconsistency: registry metadata earlier listed no required env vars, but SKILL.md (and the script) require CRAWL4AI_URL (and optionally CRAWL4AI_KEY). This appears to be a metadata omission rather than malicious.
Instruction Scope
Instructions and the script remain within scope: they POST {urls: [...] } to the configured Crawl4AI URL and print returned markdown/JSON. The script only reads CRAWL4AI_URL and optional CRAWL4AI_KEY. Notes: the script uses Node's http module (not https) which means HTTPS endpoints will not be handled correctly; the help text mentions a default API key of '1234' (harmless but confusing). There are no instructions to read unrelated files or environment variables.
Install Mechanism
No install spec or external downloads; the skill is instruction+embedded script only. No third-party packages are fetched at install time, so install risk is low.
Credentials
Only CRAWL4AI_URL (required) and CRAWL4AI_KEY (optional) are used. These directly map to the stated purpose (target instance URL and optional auth). No unrelated secrets or config paths are requested.
Persistence & Privilege
Skill does not request always:true and does not attempt to modify other skills or system-wide settings. Agent-autonomous invocation is allowed by default (normal for skills) but not specially privileged here.
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install crawl-for-ai
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /crawl-for-ai 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v1.0.1
Security: removed default API key values, added explicit env validation
v1.0.0
Initial release - local Crawl4AI instance client for JS-rendered web scraping
元数据
Slug crawl-for-ai
版本 1.0.1
许可证
累计安装 15
当前安装数 15
历史版本数 2
常见问题

Crawl4AI Web Scraper 是什么?

Full web page scraping with JavaScript rendering via local Crawl4AI instance, delivering clean markdown or detailed JSON including links and media. 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 2867 次。

如何安装 Crawl4AI Web Scraper?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install crawl-for-ai」即可一键安装,无需额外配置。

Crawl4AI Web Scraper 是免费的吗?

是的,Crawl4AI Web Scraper 完全免费(开源免费),可自由下载、安装和使用。

Crawl4AI Web Scraper 支持哪些平台?

Crawl4AI Web Scraper 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。

谁开发了 Crawl4AI Web Scraper?

由 angusthefuzz(@angusthefuzz)开发并维护,当前版本 v1.0.1。

💬 留言讨论