Description

Web crawling and scraping tool with LLM-optimized output. Web crawler, web scraper, spider. DuckDuckGo search, site crawling, dynamic page scrapin...

README (SKILL.md)

Crawl4AI Skill - Web Crawler & Scraper

Name: Crawl4ai Skill
Author: lancelin111

Web Crawling | Web Scraping | LLM-Optimized Output

A free, intelligent web crawler and scraper that supports web search, full-site crawling, and dynamic page scraping, and produces LLM-optimized Markdown output.

Core Features

🔍 Web Search - DuckDuckGo search, no API key required
🕷️ Web Crawling - Site crawler (spider) with sitemap detection
📝 Web Scraping - Smart scraper for data extraction
📄 LLM-Optimized Output - Fit Markdown, saves about 80% of tokens
⚡ Dynamic Page Scraping - Scrapes JavaScript-rendered pages

Quick Start

Installation

pip install crawl4ai-skill

Web Search

# Search the web with DuckDuckGo
crawl4ai-skill search "python web scraping"

Web Scraping (Single Page)

# Scrape a single web page
crawl4ai-skill crawl https://example.com

Web Crawling (Entire Site)

# Crawl an entire website (spider)
crawl4ai-skill crawl-site https://docs.python.org --max-pages 50

Use Cases

Scenario 1: Web Crawler for Documentation Sites

# Crawl a documentation site (up to 100 pages)
crawl4ai-skill crawl-site https://docs.fastapi.com --max-pages 100

Crawler output:

❌ Removed: navigation bars, sidebars, ads
✅ Kept: headings, body text, code blocks
📊 Tokens: 50,000 → 10,000 (-80%)
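To make the "removed / kept" split above concrete, here is a toy sketch built only on Python's standard-library `html.parser`. It is an illustration under stated assumptions, not crawl4ai's fit_markdown algorithm: the tag lists and the names `ToyContentExtractor` and `extract_core_text` are hypothetical, and the reduction is measured in characters as a rough stand-in for tokens.

```python
from html.parser import HTMLParser

# Hypothetical tag lists for this toy; the tool's real filtering rules are not documented here.
NOISE_TAGS = {"nav", "aside", "header", "footer", "script", "style"}
KEEP_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "pre"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class ToyContentExtractor(HTMLParser):
    """Collects text from KEEP_TAGS elements that are not nested inside NOISE_TAGS."""

    def __init__(self):
        super().__init__()
        self.noise_depth = 0   # > 0 while inside a noise element
        self.keep_depth = 0    # > 0 while inside a kept element
        self.open_tag = None   # outermost kept tag currently open
        self.blocks = []       # extracted Markdown-ish text blocks
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1
        elif tag in KEEP_TAGS and self.noise_depth == 0:
            if self.keep_depth == 0:
                self.open_tag = tag
            self.keep_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.noise_depth > 0:
            self.noise_depth -= 1
        elif tag in KEEP_TAGS and self.keep_depth > 0 and self.noise_depth == 0:
            self.keep_depth -= 1
            if self.keep_depth == 0:
                self._flush()

    def handle_data(self, data):
        if self.keep_depth > 0 and self.noise_depth == 0:
            self._buffer.append(data)

    def _flush(self):
        raw = "".join(self._buffer)
        # Keep <pre> whitespace intact; collapse whitespace everywhere else.
        text = raw.strip() if self.open_tag == "pre" else " ".join(raw.split())
        if text:
            if self.open_tag in HEADINGS:
                text = "#" * int(self.open_tag[1]) + " " + text
            self.blocks.append(text)
        self._buffer = []


def extract_core_text(html: str) -> tuple[str, float]:
    """Return the kept text and its size reduction relative to the raw HTML (0.0 to 1.0)."""
    parser = ToyContentExtractor()
    parser.feed(html)
    parser.close()
    core = "\n\n".join(parser.blocks)
    reduction = 1 - len(core) / len(html) if html else 0.0
    return core, reduction


page = (
    "<html><body>"
    "<nav><a href='/'>Home</a> | <a href='/docs'>Docs</a></nav>"
    "<aside><p>Sponsored: try our product</p></aside>"
    "<h1>Install</h1><p>Run pip install crawl4ai-skill.</p>"
    "<footer><p>Copyright notice</p></footer>"
    "</body></html>"
)
core, reduction = extract_core_text(page)
print(core)
print(f"size reduction: {reduction:.0%} of characters")
```

The navigation links, the sponsored aside, and the footer disappear while the heading and paragraph survive; real pages would need far more robust rules than this depth counter.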

Scenario 2: Search + Scrape

# Search, then scrape the top 3 results
crawl4ai-skill search-and-crawl "Vue 3 best practices" --crawl-top 3

Scenario 3: Dynamic Page Scraping

Scraping JavaScript-rendered pages (e.g., Xueqiu, Zhihu):

# Scrape JavaScript-heavy pages
crawl4ai-skill crawl https://xueqiu.com/S/BIDU --wait-until networkidle --delay 2
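If an agent or script needs to drive the CLI from Python, one option is to build the argument list explicitly and hand it to `subprocess.run`. This sketch uses only the subcommand and flags documented on this page (`crawl`, `--format`, `--output`, `--wait-until`, `--delay`, `--wait-for`); the helper names `build_crawl_args` and `run_crawl` are hypothetical, and it assumes the `crawl4ai-skill` executable is on PATH.

```python
import subprocess
from typing import Optional


def build_crawl_args(
    url: str,
    fmt: str = "fit_markdown",
    output: Optional[str] = None,
    wait_until: Optional[str] = None,
    delay: Optional[float] = None,
    wait_for: Optional[str] = None,
) -> list[str]:
    """Assemble argv for `crawl4ai-skill crawl` using only the documented flags."""
    args = ["crawl4ai-skill", "crawl", url, "--format", fmt]
    if output is not None:
        args += ["--output", output]
    if wait_until is not None:
        args += ["--wait-until", wait_until]  # e.g. "networkidle" for JS-heavy pages
    if delay is not None:
        args += ["--delay", str(delay)]       # additional wait, in seconds
    if wait_for is not None:
        args += ["--wait-for", wait_for]      # CSS selector to wait for
    return args


def run_crawl(args: list[str], timeout: float = 120.0) -> subprocess.CompletedProcess:
    """Run the CLI and capture what it prints; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(args, capture_output=True, text=True, timeout=timeout, check=True)


# Same options as the Xueqiu example above, plus the recommended --format.
args = build_crawl_args("https://xueqiu.com/S/BIDU", wait_until="networkidle", delay=2)
print(args)
```

Passing a list rather than a single shell string keeps URLs and selectors such as `".selector"` from being altered by shell quoting.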

Command Reference

| Command | Description |
|---|---|
| `search <query>` | Web search |
| `crawl <url>` | Scrape a single page |
| `crawl-site <url>` | Crawl an entire site |
| `search-and-crawl <query>` | Search, then scrape the results |

Common Options

# Web search
--num-results 10          # Number of search results

# Web scraping
--format fit_markdown     # Output format
--output result.md        # Output file
--wait-until networkidle  # Wait strategy for dynamic pages
--delay 2                 # Additional wait time (seconds)
--wait-for ".selector"    # Wait for a specific element

# Web crawling
--max-pages 100           # Maximum number of pages to crawl
--max-depth 3             # Maximum crawl depth
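The options above do not spell out how `--max-pages` and `--max-depth` interact. As a generic illustration, not the tool's confirmed traversal strategy, a breadth-first crawl commonly treats depth as the number of link hops from the start URL and stops once the page budget is spent. The sketch runs over a hypothetical in-memory link graph so it needs no network; `bounded_crawl` and the example paths are made up.

```python
from collections import deque


def bounded_crawl(start: str, links: dict[str, list[str]], max_pages: int, max_depth: int) -> list[str]:
    """Breadth-first visit order limited by page count and link depth.

    `links` maps a URL to the URLs it links to (a stand-in for fetching and
    parsing each page). The start URL is at depth 0.
    """
    visited: list[str] = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)  # "fetch" the page
        if depth == max_depth:
            continue  # do not follow links past the depth limit
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return visited


site = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/a/2"],
    "/b": ["/b/1"],
    "/a/1": ["/a/1/x"],
}
print(bounded_crawl("/", site, max_pages=50, max_depth=1))  # → ['/', '/a', '/b']
print(bounded_crawl("/", site, max_pages=4, max_depth=3))   # → ['/', '/a', '/b', '/a/1']
```

Whichever limit is reached first ends the crawl: in the first call the depth limit stops it, in the second the page budget does.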

Output Formats

fit_markdown (recommended)

Smart extraction that saves about 80% of tokens.

crawl4ai-skill crawl https://example.com --format fit_markdown

raw_markdown

Preserves the full page structure.

crawl4ai-skill crawl https://example.com --format raw_markdown

Why This Crawler?

✅ Free Crawler - No API key needed; works out of the box
✅ Smart Scraper - Automatically removes noise and extracts the core content
✅ Site Crawler - Sitemap support and recursive crawling
✅ Dynamic Scraping - Supports JavaScript-rendered pages
✅ Search Integration - Built-in DuckDuckGo search

Links

📦 PyPI
💻 GitHub
🦞 ClawHub

Usage Guidance

This skill appears internally consistent for web crawling/scraping. Before installing: (1) review the crawl4ai-skill PyPI/GitHub source or the package wheel to confirm what code will run, (2) prefer installing in a virtualenv or sandbox, (3) be mindful of legal/ethical constraints and robots.txt for target sites, and (4) limit use against any sensitive endpoints or credentials. If you need higher assurance, inspect the package's GitHub repo and the code for any unexpected network calls or telemetry.
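One way to honor robots.txt before pointing the crawler at a site is Python's standard-library `urllib.robotparser`. The sketch parses an inline, made-up robots.txt so it runs offline; against a real site you would call `set_url(...)` with the site's robots.txt URL and then `read()`. The user-agent string `crawl4ai-skill` is an assumption for illustration, since this page does not say which user agent the tool sends.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real check would use
# parser.set_url("https://example.com/robots.txt") followed by parser.read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, url: str, user_agent: str = "crawl4ai-skill") -> bool:
    """True if robots.txt permits `user_agent` to fetch `url`."""
    return parser.can_fetch(user_agent, url)


parser = make_parser(ROBOTS_TXT)
print(allowed(parser, "https://example.com/docs/intro"))    # → True
print(allowed(parser, "https://example.com/private/data"))  # → False
print(parser.crawl_delay("crawl4ai-skill"))                 # → 2
```

Running a check like this before `crawl` or `crawl-site` keeps disallowed paths out of the URL list you pass to the tool.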

Capability Analysis

Type: OpenClaw Skill
Name: crawl4ai-skill
Version: 1.0.10

The skill bundle defines a wrapper for the 'crawl4ai-skill' CLI tool, providing web search via DuckDuckGo and web scraping/crawling. The documentation in SKILL.md is well structured and gives the AI agent clear instructions for legitimate tasks such as site crawling and dynamic page scraping, with no evidence of prompt injection, data exfiltration, or malicious intent.

Capability Assessment

✓ Purpose & Capability

The name and description describe a web crawler/scraper, and SKILL.md only asks for the crawl4ai-skill binary or pip package and shows crawling/scraping commands; these are consistent with the stated purpose.

✓ Instruction Scope

Runtime instructions and examples are limited to searching, crawling, and scraping behavior (including dynamic JS pages). They do not ask the agent to read unrelated files, environment variables, or to exfiltrate data to third-party endpoints.

ℹ Install Mechanism

This is an instruction-only skill; SKILL.md recommends 'pip install crawl4ai-skill' (PyPI). That is an expected install path for a Python CLI, but installing from PyPI executes third-party code, so review the package source before installing or run it in an isolated environment.

✓ Credentials

No environment variables, credentials, or config paths are requested. The lack of extra secrets is proportionate to a web crawler skill.

✓ Persistence & Privilege

The skill is not marked always:true and does not request any persistent system-wide privileges or modify other skills. Autonomous invocation is allowed (the platform default) and is not combined with other concerning requests.

Version History

v1.0.10

**Crawl4AI Skill 1.1.0: Bilingual, Expanded Docs, and Dynamic Scraping**
- Major rewrite of the documentation for improved clarity; now fully bilingual (English and Chinese).
- Expanded feature list to highlight DuckDuckGo search, full-site crawling, dynamic page (JavaScript) scraping, and LLM-optimized output.
- Added command examples and use cases for both static and dynamic web pages.
- Introduced more tags for better discoverability (e.g., spider, crawler, 爬虫 "crawler").
- Added a comparison and value-proposition section: free, no API key, smart extraction, dynamic content support.

v1.0.9

- Update version to 0.3.0
- Remove references to login features and credentials from the documentation
- Simplify requirements (remove mentions of playwright and Twitter cookies)
- Drop advanced features: authenticated crawling, Twitter/X and Xiaohongshu login, and session management sections
- Streamline usage examples, focusing only on core search and crawl capabilities

v1.0.8

- SKILL.md content streamlined and significantly shortened for clarity and ease of use.
- License field changed from MIT to MIT-0.
- Dependency and tag lists simplified; advanced security and input details removed.
- Usage instructions, command examples, and credential requirements are now brief and direct.
- Security and privacy explanations condensed but still clear about local data storage and encryption.
- Redundant or verbose sections removed; focus shifted to quick start, core features, and key commands.

v1.0.7

Version 1.0.7
- Major update to the documentation (SKILL.md) with a new concise structure emphasizing use cases and practical scenarios.
- Enhanced quick start, usage, and feature examples for clarity and accessibility.
- Expanded explanation of output formats and advanced parameters.
- Streamlined safety and login instructions for Twitter/X and Xiaohongshu.
- No functional changes to code; documentation only.

v1.0.6

Version 1.0.6
- Updated SKILL.md for enhanced clarity and completeness.
- Added more metadata fields, including license, homepage, and package verification details.
- Improved security documentation: detailed encryption, storage, and automated security scanning methods.
- Clarified and annotated installation and credential-handling recommendations.
- No changes to core functionality; documentation only.

v1.0.5

**Session cookie storage is now AES-128 encrypted and install method updated.**
- Session cookies are now AES-128 encrypted and stored as <platform>_session.enc, with a key derived from the machine's identifier for added security.
- Changed the install method to use PyPI by default; now recommended: pip install crawl4ai-skill.
- Updated documentation to describe encryption, file locations, and new install instructions.
- Minor improvements and clarifications to security and usage documentation.

v1.0.4

- Enhanced security documentation: added detailed guidelines for credential management, recommended environment variable usage, data privacy statements, and explicit permissions.
- Updated installation instructions to emphasize code review and security scanning (e.g., with bandit).
- Added a responsibility disclaimer and explicit warnings about security risks.
- Provided best practices for safe cookie handling and recommended not using public computers.
- Deprecated inline cookie passing in favor of safer methods (environment variables, protected files, and interactive input).

v1.0.3

Major update: skill redesigned with expanded features, login support, and new usage.
- Rebranded as crawl4ai-skill, with a new author and repository.
- Now supports search, full-site crawling, and LLM-optimized Markdown output that reduces token usage.
- Added built-in login and session management for Twitter/X and Xiaohongshu, allowing extraction from login-protected content.
- Detailed command-line usage, install instructions, and security practices are now documented.
- Dropped the old JavaScript script and Node.js dependency in favor of a Python + Playwright implementation.

v1.0.2

- Added an explicit security section detailing session and browser data storage locations and privacy assurances.
- Updated installation instructions to recommend inspecting the code by cloning the repository before installation.
- Improved installation steps: clarified pip and playwright install commands.
- Declared the playwright binary under required bins for better dependency guidance.

v1.0.1

No visible changes were made in this release.
- Version bumped from 0.1.0 to 1.0.1, but no content or file changes were detected.

v1.0.0

Initial release of crawl4ai-skill.
- Provides an intelligent tool for searching, crawling, and saving tokens, with support for popular login-required sites.
- Includes DuckDuckGo search, site-wide crawling, and LLM-optimized Markdown output to reduce token usage.
- Built-in support for login crawling on Twitter/X and Xiaohongshu.
- CLI commands cover search, crawl (single/multi-page), login, session management, and Twitter tweet extraction.
- Simple installation via shell script or pip.

Metadata

Slug crawl4ai-skill

Version 1.0.10

License MIT-0

All-time Installs 12

Active Installs 12

Total Versions 11

Frequently Asked Questions

What is Crawl4ai Skill?

Web crawling and scraping tool with LLM-optimized output. Web crawler, web scraper, spider. DuckDuckGo search, site crawling, dynamic page scrapin... It is an AI agent skill for Claude Code / OpenClaw, with 2666 downloads so far.

How do I install Crawl4ai Skill?

Run "/install crawl4ai-skill" in the OpenClaw or Claude Code chat to install it in one step; no extra setup is required.

Is Crawl4ai Skill free?

Yes, Crawl4ai Skill is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Crawl4ai Skill support?

Crawl4ai Skill is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Crawl4ai Skill?

It is built and maintained by lance (@lancelin111); the current version is v1.0.10.
