Crawl4ai Docker Skill

Author: Orange · v1.0.0 · MIT-0
Tags: cross-platform · ⚠ suspicious
Total downloads: 131 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw
/install crawl4ai-docker-skill
Description
Dockerized web crawling and scraping service with REST API | Web crawler, web scraper, REST API service. Intelligent content extraction with L...
Usage (SKILL.md)

Crawl4AI Docker Skill - Web Crawler & Scraper Service

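Dockerized web crawling service | REST API web scraping | LLM smart extraction

A Docker-deployed Crawl4AI web crawling service that exposes a complete REST API and supports smart content extraction with LLM-optimized output.

🚀 Core Features

  • 🐳 Docker deployment - containerized service on port 11235
  • 🔌 REST API - complete HTTP interface
  • 🤖 LLM smart extraction - supports multiple LLM providers
  • 📊 Real-time monitoring - built-in monitoring dashboard and API
  • High performance - asynchronous processing with support for concurrent requests

📋 Quick Start

Prerequisites

Make sure the Docker Compose service is running:

# Check service status
docker compose ps

# Health check
curl http://localhost:11235/health

# Open the monitoring dashboard (macOS; use xdg-open on Linux)
open http://localhost:11235/dashboard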
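
A container can take a few seconds to start answering after `docker compose ps` shows it as up, so a script may want to poll the health endpoint rather than call it once. The following is a minimal sketch, assuming `/health` returns a 2xx status once the service is ready; `wait_for_health` is a hypothetical helper name, not something the skill ships.

```shell
#!/bin/sh
# wait_for_health URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until curl gets a successful HTTP response or attempts run out.
# wait_for_health is a hypothetical helper name, not part of Crawl4AI.
wait_for_health() {
  url=$1
  attempts=${2:-10}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: fail on HTTP 4xx/5xx, -s: no progress output, -o: discard the body
    if curl -fs -o /dev/null "$url"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    # Pause between attempts, but not after the final one
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "not healthy after $attempts attempt(s)" >&2
  return 1
}
```

For example, `wait_for_health http://localhost:11235/health 15 2 && echo ready` would wait up to roughly 30 seconds before giving up.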

🔌 REST API Usage Guide

Basic Web Crawling

Simple Markdown extraction

curl -X POST "http://localhost:11235/crawl" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com"],
    "extraction_strategy": "markdown"
  }'

With browser configuration

curl -X POST "http://localhost:11235/crawl" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com"],
    "extraction_strategy": "markdown",
    "browser_config": {
      "headless": true,
      "viewport_width": 1280,
      "viewport_height": 720
    }
  }'

LLM Smart Extraction

Content summarization

curl -X POST "http://localhost:11235/crawl" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com"],
    "extraction_strategy": {
      "type": "llm",
      "provider": "openrouter/free",
      "instruction": "Summarize the main content of the page",
      "max_tokens": 1000
    }
  }'

Structured data extraction

curl -X POST "http://localhost:11235/crawl" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com/products"],
    "extraction_strategy": {
      "type": "llm",
      "provider": "openrouter/free",
      "instruction": "Extract all product names, prices, and descriptions, and return them as JSON",
      "max_tokens": 1500,
      "temperature": 0.1
    }
  }'
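
When the instruction text comes from a variable, hand-writing the JSON inside single quotes breaks as soon as the text contains a double quote. The sketch below builds the same request body as the LLM extraction examples above and escapes the instruction first. The field names are copied from those examples; `json_escape` and `build_llm_payload` are hypothetical helper names, and the escaping only covers backslashes and double quotes in single-line text.

```shell
#!/bin/sh
# json_escape TEXT: escape backslashes and double quotes for a JSON string.
# Assumption: TEXT is a single line; newlines and control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# build_llm_payload INSTRUCTION MAX_TOKENS URL...
# Prints a /crawl request body shaped like the examples in this document.
# The body runs in a subshell, so its variables do not leak to the caller.
build_llm_payload() (
  instruction=$(json_escape "$1")
  max_tokens=$2
  shift 2
  urls=""
  for u in "$@"; do
    # Append each URL as a quoted JSON string, comma-separated
    urls="${urls:+$urls, }\"$(json_escape "$u")\""
  done
  printf '{"urls": [%s], "extraction_strategy": {"type": "llm", "provider": "openrouter/free", "instruction": "%s", "max_tokens": %s}}\n' \
    "$urls" "$instruction" "$max_tokens"
)
```

The output can be piped straight into curl, which reads the body from standard input with `-d @-`: `build_llm_payload 'Extract "title" and price' 1500 https://example.com/products | curl -X POST "http://localhost:11235/crawl" -H "Content-Type: application/json" -d @-`.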

Advanced Features

Web page screenshot

curl -X POST "http://localhost:11235/screenshot" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "options": {
      "full_page": true,
      "quality": 80
    }
  }'

PDF generation

curl -X POST "http://localhost:11235/pdf" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com"
  }'

📊 API Endpoints Reference

Core Endpoints

| Endpoint | Method | Purpose |
|---|---|---|
| /crawl | POST | Web crawling and content extraction |
| /health | GET | Service health check |
| /dashboard | GET | Monitoring dashboard |

Monitoring Endpoints

| Endpoint | Method | Purpose |
|---|---|---|
| /monitor/health | GET | System health status |
| /monitor/browsers | GET | Browser pool status |
| /monitor/requests | GET | Request statistics |
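
The three monitoring endpoints can be probed in one pass to see which ones respond. This is a rough sketch that assumes each endpoint answers a plain GET with HTTP 200 when healthy; `http_status` and `check_monitor` are hypothetical helper names.

```shell
#!/bin/sh
# http_status URL: print the HTTP status code of a GET request.
# curl prints 000 when it cannot connect at all.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# check_monitor BASE_URL: probe each monitoring endpoint listed in this
# document and return non-zero if any of them is not HTTP 200.
check_monitor() {
  base=$1
  failed=0
  for path in /monitor/health /monitor/browsers /monitor/requests; do
    code=$(http_status "$base$path") || code=000
    echo "$path -> $code"
    [ "$code" = "200" ] || failed=1
  done
  return "$failed"
}
```

Running `check_monitor http://localhost:11235` prints one line per endpoint, so a failing endpoint is visible without opening the dashboard.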

Utility Endpoints

| Endpoint | Method | Purpose |
|---|---|---|
| /screenshot | POST | Web page screenshot |
| /pdf | POST | PDF generation |
| /execute_js | POST | JavaScript execution |

🎯 Use Cases

Scenario 1: Documentation Site Crawling

curl -X POST "http://localhost:11235/crawl" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://docs.openclaw.ai/zh-CN"],
    "extraction_strategy": "markdown"
  }'

Scenario 2: News Article Extraction

curl -X POST "http://localhost:11235/crawl" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://news-site.com/article"],
    "extraction_strategy": {
      "type": "llm",
      "provider": "openrouter/free",
      "instruction": "Extract the article title, author, publication time, and main content",
      "max_tokens": 1500
    }
  }'

Scenario 3: Product Information Scraping

curl -X POST "http://localhost:11235/crawl" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://ecommerce-site.com/products"],
    "extraction_strategy": {
      "type": "llm",
      "provider": "openrouter/free",
      "instruction": "Extract the name, price, description, and image link of every product",
      "max_tokens": 2000
    }
  }'

⚙️ Configuration

LLM Provider Configuration

Create a .llm.env file:

# OpenRouter configuration
OPENROUTER_API_KEY=your-api-key
LLM_PROVIDER=openrouter/free
LLM_MAX_TOKENS=2000
LLM_TEMPERATURE=0.7

# Or use another provider
# OPENAI_API_KEY=sk-your-key
# OPENAI_BASE_URL=https://your-custom-api.com/v1
# LLM_PROVIDER=openai/gpt-4o-mini

Browser Configuration

{
  "browser_config": {
    "headless": true,
    "viewport_width": 1280,
    "viewport_height": 720,
    "user_agent": "Mozilla/5.0..."
  }
}

📈 Response Format

Success Response

{
  "success": true,
  "results": [
    {
      "url": "https://example.com",
      "markdown": "# Extracted Markdown content...",
      "metadata": {
        "title": "Page title",
        "description": "Page description",
        "url": "https://example.com"
      },
      "extracted_content": {
        "summary": "LLM-extracted content..."
      }
    }
  ]
}

Error Response

{
  "success": false,
  "error": "Error message",
  "code": "ERROR_CODE"
}
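
A calling script usually needs to branch on the `success` flag before using the results. The following is a rough sketch that relies on pattern matching and assumes the response shapes shown above; `crawl_succeeded` and `error_code` are hypothetical helper names. A pattern match can be fooled if the crawled content itself contains the text `"success": true`, so where jq is installed, `jq -e '.success == true'` is a sturdier check.

```shell
#!/bin/sh
# crawl_succeeded: read a JSON response on stdin and succeed only if it
# contains a "success": true pair (whitespace around the colon is allowed).
crawl_succeeded() {
  grep -Eq '"success"[[:space:]]*:[[:space:]]*true'
}

# error_code: read a JSON response on stdin and print the string value
# of its "code" field, if one is present.
error_code() {
  sed -n 's/.*"code"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

One way to use these is to save the response first, for example `resp=$(curl -s ...)`, then run `printf '%s' "$resp" | crawl_succeeded || printf '%s' "$resp" | error_code`.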

🔧 Troubleshooting

Common Issues

1. Service not running

# Check container status
docker compose ps

# View logs
docker compose logs crawl4ai

# Restart the service
docker compose restart crawl4ai

2. LLM extraction fails

  • Check the .llm.env configuration
  • Verify the API key
  • Try a different LLM provider

3. Network connectivity problems

# Test network connectivity
curl -I https://example.com

# Check proxy configuration
env | grep -i proxy

Monitoring & Debugging

# Open the monitoring dashboard (macOS; use xdg-open on Linux)
open http://localhost:11235/dashboard

# Check system health
curl http://localhost:11235/monitor/health

# Check browser pool status
curl http://localhost:11235/monitor/browsers

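🎉 Why choose the Docker version?

  • Containerized deployment - one-command startup, isolated environment
  • REST API - standard HTTP interface, easy to integrate
  • Real-time monitoring - built-in monitoring dashboard and API
  • Resource management - automatic browser pool management
  • Production-ready - enterprise-grade stability and performance

Get started with the Dockerized Crawl4AI service! 🚀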

Security Recommendations
This package appears to be legitimate documentation and helper scripts for running a Dockerized Crawl4AI service, but there are a few things to check before installing:
  • Metadata mismatch: the skill's SKILL.md tells you to create a .llm.env containing LLM API keys (OPENROUTER_API_KEY, OPENAI_API_KEY, etc.), but the registry metadata does not declare any required environment variables. Treat LLM API keys as sensitive and only provide them if you trust the Crawl4AI image and its source.
  • Missing binary declarations: the scripts call curl and jq (and the examples use open). Make sure those binaries are present and trustworthy in your environment.
  • Network exposure / SSRF risk: a crawler that fetches arbitrary URLs can probe internal network services from the host or container. Run it in a network-restricted environment (isolated Docker network, no privileged host networking) if you don't want it to access internal resources.
  • Container image provenance: example-config.json references the image unclecode/crawl4ai:latest. Verify the Docker image source (official repo, signed image, or an audit of the image) before pulling or running it, especially if you provide API keys.
  • /execute_js and LLM providers: endpoints that execute JavaScript and submit scraped content to external LLM providers can leak sensitive data. Review what you send to external LLMs and redact secrets.
If you want to proceed, confirm the Docker image origin, run the service in an isolated environment, only supply LLM keys you control, and ensure curl and jq are installed from trusted sources.
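
As a complement to network isolation, a wrapper script can refuse obviously internal targets before they ever reach /crawl. The sketch below is a textual check only: it does not resolve DNS names, so a public hostname that points at a private address still passes, and it rejects every IPv6 literal for simplicity. `is_blocked_target` is a hypothetical helper name, and the isolated Docker network recommended above remains the actual safeguard.

```shell
#!/bin/sh
# is_blocked_target URL: succeed (exit 0) when the URL's host is "localhost",
# an IPv6 literal, or a literal loopback, link-local, or RFC 1918 address.
# Conservative by design: a hostname that merely starts with "10." is also blocked.
is_blocked_target() {
  host=${1#*://}          # drop the scheme
  host=${host%%[/?#]*}    # drop path, query, and fragment
  host=${host##*@}        # drop userinfo such as user:pass@
  case $host in
    \[*) return 0 ;;      # IPv6 literal, e.g. [::1]
  esac
  host=${host%%:*}        # drop the port
  case $host in
    localhost|0.0.0.0|127.*|10.*|192.168.*|169.254.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller might guard each request with `is_blocked_target "$url" && { echo "refusing $url" >&2; exit 1; }`.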
Functional Analysis
Type: OpenClaw Skill
Name: crawl4ai-docker-skill
Version: 1.0.0
The skill provides a legitimate set of tools and instructions for interacting with a Dockerized Crawl4AI web scraping service. It includes helper scripts (scripts/crawl4ai-docker.sh and scripts/test-crawl4ai.sh) that facilitate standard API calls for web crawling, LLM-based content extraction, and service monitoring via a local endpoint (http://localhost:11235). No evidence of data exfiltration, malicious execution, or harmful prompt injection was found; the code and instructions are well-aligned with the stated purpose of managing a containerized scraper.
Capability Assessment
Purpose & Capability
Name/description match the files: this is a Dockerized Crawl4AI crawler that exposes a REST API and supports LLM-based extraction. That purpose justifies the examples, endpoints, and LLM provider configuration shown in SKILL.md and example-config.json.
Instruction Scope
SKILL.md and scripts instruct the agent/user to run curl requests against a local service, create a .llm.env with LLM API keys, and call endpoints that can execute JS (/execute_js). The runtime instructions will cause the service to fetch arbitrary URLs (user-supplied), which can reach internal network hosts if the service runs in a permissive network environment. The docs reference reading .llm.env values not declared in the registry metadata (see environment_proportionality).
Install Mechanism
There is no install spec (instruction-only plus utility scripts). Nothing is downloaded or extracted by the skill bundle itself, so no high-risk install actions are present in the package.
Credentials
SKILL.md documents LLM API keys (OPENROUTER_API_KEY, OPENAI_API_KEY, etc.) and instructs creating .llm.env, but the registry metadata lists no required environment variables or primary credential. The scripts also assume binaries (curl, jq, maybe open) but the skill metadata doesn't declare required binaries. This mismatch between documentation and declared requirements is a coherence issue and a potential operational risk.
Persistence & Privilege
always is false and the skill does not request elevated platform privileges. It does not modify other skills or global configuration in the bundle. Normal autonomous invocation is allowed (default) but not combined with other high-risk flags.
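How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install crawl4ai-docker-skill
  3. Once installed, invoke the skill by name or trigger it with /crawl4ai-docker-skill
  4. Provide the inputs described in the skill's parameter notes to get structured output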
Version History
v1.0.0
Initial release of crawl4ai-docker-skill.
  • Provides a Dockerized web crawling and scraping service with a full REST API.
  • Includes intelligent content extraction using LLM optimization.
  • Features real-time monitoring with built-in dashboard and API endpoints.
  • Supports advanced operations like webpage screenshots and PDF generation.
  • Flexible browser and LLM provider configuration.
  • Comprehensive usage examples and troubleshooting included in documentation.
Metadata
Slug: crawl4ai-docker-skill
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 1
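FAQ

What is Crawl4ai Docker Skill?

Dockerized web crawling and scraping service with REST API | Web crawler, web scraper, REST API service. Intelligent content extraction with L... It is an AI agent skill plugin for Claude Code / OpenClaw, with 131 total downloads so far.

How do I install Crawl4ai Docker Skill?

Run /install crawl4ai-docker-skill in the OpenClaw or Claude Code chat box to install it in one step. Installing the skill itself needs no extra configuration, but the Crawl4AI Docker service it calls must already be running (see Prerequisites).

Is Crawl4ai Docker Skill free?

Yes. Crawl4ai Docker Skill is free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Crawl4ai Docker Skill support?

Crawl4ai Docker Skill is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Crawl4ai Docker Skill?

It is developed and maintained by Orange (@orange-afk). The current version is v1.0.0.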
