← 返回 Skills 市场

Crawl4AI Web Scraper

Name: Crawl4AI Web Scraper
Author: angusthefuzz

作者 angusthefuzz · GitHub ↗ · v1.0.1

cross-platform ✓ 安全检测通过

2867

总下载

当前安装

版本数

在 OpenClaw 中安装

/install crawl-for-ai

功能描述

Full web page scraping with JavaScript rendering via local Crawl4AI instance, delivering clean markdown or detailed JSON including links and media.

使用说明 (SKILL.md)

Crawl4AI Web Scraper

Local Crawl4AI instance for full web page extraction with JavaScript rendering.

Endpoints

Proxy (port 11234) — Clean output, OpenWebUI-compatible

Returns: [{page_content, metadata}]
Use for: Simple content extraction

Direct (port 11235) — Full output with all data

Returns: {results: [{markdown, html, links, media, ...}]}
Use for: When you need links, media, or other metadata

Usage

# Via script
node {baseDir}/scripts/crawl4ai.js "url"
node {baseDir}/scripts/crawl4ai.js "url" --json

Script options:

--json — Full JSON response

Output: Clean markdown from the page.

Configuration

Required environment variable:

CRAWL4AI_URL — Your Crawl4AI instance URL (e.g., http://localhost:11235)

Optional:

CRAWL4AI_KEY — API key if your instance requires authentication

Features

JavaScript rendering — Handles dynamic content
Unlimited usage — Local instance, no API limits
Full content — HTML, markdown, links, media, tables
Better than Tavily for complex pages with JS

API

Uses your local Crawl4AI instance REST API. Auth header only sent if CRAWL4AI_KEY is set.

安全使用建议

This skill appears to do what it claims, but review these points before installing: - You must set CRAWL4AI_URL to a trusted Crawl4AI endpoint (for local usage set it to http://localhost:11235). The skill will send the URL(s) you want scraped to that endpoint, so pointing it at an untrusted remote service could leak the pages you request and any scraped content. - The script only supports plain HTTP (uses Node's http module). If you provide an https:// URL the script may fail; treat that as an implementation bug, not a security feature. - The only environment variables used are CRAWL4AI_URL and optional CRAWL4AI_KEY. If your Crawl4AI instance requires a key, use a secret you trust. There are no other hidden env reads or file accesses. - Metadata mismatch: the registry metadata omitted the required env var that the SKILL.md and script require. This is likely a packaging oversight—confirm CRAWL4AI_URL will be supplied before use. - If you plan to let the agent invoke skills autonomously, remember the agent could call this skill and thereby send target URLs to the configured Crawl4AI endpoint; ensure that behavior is acceptable in your environment. Overall: coherent and low-risk provided you point it to a trusted Crawl4AI instance and are aware of the HTTP/HTTPS limitation.

功能分析

Type: OpenClaw Skill Name: crawl-for-ai Version: 1.0.1 This skill bundle is designed to interact with a user-configured local Crawl4AI instance for web scraping. The `scripts/crawl4ai.js` file makes HTTP POST requests to the `CRAWL4AI_URL` (an environment variable) and includes an optional `CRAWL4AI_KEY` for authentication. All network activity is directed to the explicitly configured endpoint, which is essential for its stated purpose. There is no evidence of data exfiltration to unauthorized destinations, malicious execution, persistence mechanisms, or prompt injection attempts in `SKILL.md` designed to manipulate the AI agent into harmful actions. The code is transparent and aligns with the description of a tool for a self-hosted service.

能力评估

✓ Purpose & Capability

Name/description match what the files do: the SKILL.md and included Node script send URLs to a Crawl4AI instance and return markdown/JSON. Declared required binary (node) is appropriate. One minor inconsistency: registry metadata earlier listed no required env vars, but SKILL.md (and the script) require CRAWL4AI_URL (and optionally CRAWL4AI_KEY). This appears to be a metadata omission rather than malicious.

ℹ Instruction Scope

Instructions and the script remain within scope: they POST {urls: [...] } to the configured Crawl4AI URL and print returned markdown/JSON. The script only reads CRAWL4AI_URL and optional CRAWL4AI_KEY. Notes: the script uses Node's http module (not https) which means HTTPS endpoints will not be handled correctly; the help text mentions a default API key of '1234' (harmless but confusing). There are no instructions to read unrelated files or environment variables.

✓ Install Mechanism

No install spec or external downloads; the skill is instruction+embedded script only. No third-party packages are fetched at install time, so install risk is low.

✓ Credentials

Only CRAWL4AI_URL (required) and CRAWL4AI_KEY (optional) are used. These directly map to the stated purpose (target instance URL and optional auth). No unrelated secrets or config paths are requested.

✓ Persistence & Privilege

Skill does not request always:true and does not attempt to modify other skills or system-wide settings. Agent-autonomous invocation is allowed by default (normal for skills) but not specially privileged here.

如何使用

确保已安装 OpenClaw（本地或 Docker 部署）
在对话框中输入安装命令：/install crawl-for-ai
安装完成后，直接呼叫该 Skill 的名称或使用 /crawl-for-ai 触发
根据 Skill 的参数说明提供必要输入，即可获得结构化输出

版本历史

v1.0.1

Security: removed default API key values, added explicit env validation

v1.0.0

Initial release - local Crawl4AI instance client for JS-rendered web scraping

元数据

Slug crawl-for-ai

版本 1.0.1

许可证 —

累计安装 15

当前安装数 15

历史版本数 2

常见问题