← 返回 Skills 市场
techlaai

AnyCrawl-API

作者 techlaai · GitHub ↗ · v1.0.1
cross-platform ⚠ suspicious
2945
总下载
1
收藏
10
当前安装
2
版本数
在 OpenClaw 中安装
/install anycrawl
功能描述
Perform high-performance web scraping, crawling, and Google search with multi-engine support and structured data extraction via AnyCrawl API.
使用说明 (SKILL.md)

AnyCrawl Skill

AnyCrawl API integration for OpenClaw - Scrape, Crawl, and Search web content with high-performance multi-threaded crawling.

Setup

Method 1: Environment variable (Recommended)

export ANYCRAWL_API_KEY="your-api-key"

Make it permanent by adding to ~/.bashrc or ~/.zshrc:

echo 'export ANYCRAWL_API_KEY="your-api-key"' >> ~/.bashrc
source ~/.bashrc

Get your API key at: https://anycrawl.dev

Method 2: OpenClaw gateway config

openclaw config.patch --set ANYCRAWL_API_KEY="your-api-key"

Functions

1. anycrawl_scrape

Scrape a single URL and convert to LLM-ready structured data.

Parameters:

  • url (string, required): URL to scrape
  • engine (string, optional): Scraping engine - "cheerio" (default), "playwright", "puppeteer"
  • formats (array, optional): Output formats - ["markdown"], ["html"], ["text"], ["json"], ["screenshot"]
  • timeout (number, optional): Timeout in milliseconds (default: 30000)
  • wait_for (number, optional): Delay before extraction in ms (browser engines only)
  • wait_for_selector (string/object/array, optional): Wait for CSS selectors
  • include_tags (array, optional): Include only these HTML tags (e.g., ["h1", "p", "article"])
  • exclude_tags (array, optional): Exclude these HTML tags
  • proxy (string, optional): Proxy URL (e.g., "http://proxy:port")
  • json_options (object, optional): JSON extraction with schema/prompt
  • extract_source (string, optional): "markdown" (default) or "html"

Examples:

// Basic scrape with default cheerio
anycrawl_scrape({ url: "https://example.com" })

// Scrape SPA with Playwright
anycrawl_scrape({ 
  url: "https://spa-example.com",
  engine: "playwright",
  formats: ["markdown", "screenshot"]
})

// Extract structured JSON
anycrawl_scrape({
  url: "https://product-page.com",
  engine: "cheerio",
  json_options: {
    schema: {
      type: "object",
      properties: {
        product_name: { type: "string" },
        price: { type: "number" },
        description: { type: "string" }
      },
      required: ["product_name", "price"]
    },
    user_prompt: "Extract product details from this page"
  }
})

2. anycrawl_search

Search Google and return structured results.

Parameters:

  • query (string, required): Search query
  • engine (string, optional): Search engine - "google" (default)
  • limit (number, optional): Max results per page (default: 10)
  • offset (number, optional): Number of results to skip (default: 0)
  • pages (number, optional): Number of pages to retrieve (default: 1, max: 20)
  • lang (string, optional): Language locale (e.g., "en", "zh", "vi")
  • safe_search (number, optional): 0 (off), 1 (medium), 2 (high)
  • scrape_options (object, optional): Scrape each result URL with these options

Examples:

// Basic search
anycrawl_search({ query: "OpenAI ChatGPT" })

// Multi-page search in Vietnamese
anycrawl_search({ 
  query: "hướng dẫn Node.js",
  pages: 3,
  lang: "vi"
})

// Search and auto-scrape results
anycrawl_search({
  query: "best AI tools 2026",
  limit: 5,
  scrape_options: {
    engine: "cheerio",
    formats: ["markdown"]
  }
})

3. anycrawl_crawl_start

Start crawling an entire website (async job).

Parameters:

  • url (string, required): Seed URL to start crawling
  • engine (string, optional): "cheerio" (default), "playwright", "puppeteer"
  • strategy (string, optional): "all", "same-domain" (default), "same-hostname", "same-origin"
  • max_depth (number, optional): Max depth from seed URL (default: 10)
  • limit (number, optional): Max pages to crawl (default: 100)
  • include_paths (array, optional): Path patterns to include (e.g., ["/blog/*"])
  • exclude_paths (array, optional): Path patterns to exclude (e.g., ["/admin/*"])
  • scrape_paths (array, optional): Only scrape URLs matching these patterns
  • scrape_options (object, optional): Per-page scrape options

Examples:

// Crawl entire website
anycrawl_crawl_start({ 
  url: "https://docs.example.com",
  engine: "cheerio",
  max_depth: 5,
  limit: 50
})

// Crawl only blog posts
anycrawl_crawl_start({
  url: "https://example.com",
  strategy: "same-domain",
  include_paths: ["/blog/*"],
  exclude_paths: ["/blog/tags/*"],
  scrape_options: {
    formats: ["markdown"]
  }
})

// Crawl product pages only
anycrawl_crawl_start({
  url: "https://shop.example.com",
  strategy: "same-domain",
  scrape_paths: ["/products/*"],
  limit: 200
})

4. anycrawl_crawl_status

Check crawl job status.

Parameters:

  • job_id (string, required): Crawl job ID

Example:

anycrawl_crawl_status({ job_id: "7a2e165d-8f81-4be6-9ef7-23222330a396" })

5. anycrawl_crawl_results

Get crawl results (paginated).

Parameters:

  • job_id (string, required): Crawl job ID
  • skip (number, optional): Number of results to skip (default: 0)

Example:

// Get first 100 results
anycrawl_crawl_results({ job_id: "xxx", skip: 0 })

// Get next 100 results
anycrawl_crawl_results({ job_id: "xxx", skip: 100 })

6. anycrawl_crawl_cancel

Cancel a running crawl job.

Parameters:

  • job_id (string, required): Crawl job ID

7. anycrawl_search_and_scrape

Quick helper: Search Google then scrape top results.

Parameters:

  • query (string, required): Search query
  • max_results (number, optional): Max results to scrape (default: 3)
  • scrape_engine (string, optional): Engine for scraping (default: "cheerio")
  • formats (array, optional): Output formats (default: ["markdown"])
  • lang (string, optional): Search language

Example:

anycrawl_search_and_scrape({
  query: "latest AI news",
  max_results: 5,
  formats: ["markdown"]
})

Engine Selection Guide

Engine Best For Speed JS Rendering
cheerio Static HTML, news, blogs ⚡ Fastest ❌ No
playwright SPAs, complex web apps 🐢 Slower ✅ Yes
puppeteer Chrome-specific, metrics 🐢 Slower ✅ Yes

Response Format

All responses follow this structure:

{
  "success": true,
  "data": { ... },
  "message": "Optional message"
}

Error response:

{
  "success": false,
  "error": "Error type",
  "message": "Human-readable message"
}

Common Error Codes

  • 400 - Bad Request (validation errors)
  • 401 - Unauthorized (invalid API key)
  • 402 - Payment Required (insufficient credits)
  • 404 - Not Found
  • 429 - Rate limit exceeded
  • 500 - Internal server error

API Limits

  • Rate limits apply based on your plan
  • Crawl jobs expire after 24 hours
  • Max crawl limit: depends on credits

Links

安全使用建议
What to check before installing: - Confirm the API key requirement: this skill expects ANYCRAWL_API_KEY (set in your environment or via openclaw config). The registry metadata failing to list it is an inconsistency you should verify with the author. - Verify provenance: source/homepage are missing in the registry entry. Check the referenced repository (package.json) and project site (https://anycrawl.dev / https://api.anycrawl.dev) to ensure they are legitimate and match the package contents. - Least privilege for the API key: only provide a key scoped to the minimum needed for your use; do not reuse high-privilege credentials. - Network/exfiltration reminder: this skill sends scraped content and requests to an external API host (api.anycrawl.dev). Treat any external-call-capable skill as able to send data off your system — avoid sending sensitive, private, or secret content through it unless you trust the service. - Audit the code yourself (index.js is short and readable): there is no obfuscation and no other env vars are accessed, but if you need higher assurance confirm the API base and repository are legitimate. - If you are uncomfortable with the provenance or the missing registry declaration of the API key, ask the author/maintainer for clarification or do not install.
功能分析
Type: OpenClaw Skill Name: anycrawl Version: 1.0.1 The OpenClaw skill integrates with the AnyCrawl API for web scraping, crawling, and searching. It securely handles the API key via environment variables and makes standard HTTP requests to `https://api.anycrawl.dev/v1`. The `SKILL.md` and `index.js` files show no evidence of prompt injection, malicious execution, data exfiltration (beyond sending the API key to the legitimate AnyCrawl service), persistence mechanisms, or obfuscation. The functionality is clearly aligned with its stated purpose.
能力评估
Purpose & Capability
Name, README, SKILL.md, and index.js all describe a web-scrape/crawl/search integration against https://api.anycrawl.dev — the declared capabilities match the implementation. However the registry metadata lists no required env vars/credentials even though the code and SKILL.md require ANYCRAWL_API_KEY; also the package has no published homepage and the registry 'source' is unknown, reducing provenance.
Instruction Scope
SKILL.md instructs how to set an API key and how to call the provided functions; it does not instruct the agent to read unrelated local files or exfiltrate arbitrary system data. Example usages and parameters closely match the code. It does, however, recommend persisting the API key in shell rc files (common but worth noting).
Install Mechanism
There is no install spec (instruction-only in registry), which is low-risk. The package contains index.js and package.json (no dependencies). No external downloads or extract steps are present.
Credentials
The skill requires an API key (ANYCRAWL_API_KEY) as enforced by index.js and documented in SKILL.md, but the registry metadata omitted required env vars/primary credential. That mismatch is confusing and could lead users to miss that a secret is required. No other unrelated credentials are requested.
Persistence & Privilege
always:false and normal autonomous invocation are set (platform defaults). The skill does not request persistent system-wide changes or access to other skills' configs. It does suggest adding the API key to ~/.bashrc or using openclaw config.patch (both standard setup actions).
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install anycrawl
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /anycrawl 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v1.0.1
anycrawl 1.0.1 - Added `.env.example` file to make API key setup easier for users. - Added `README.md` (contents not shown). - Updated SKILL.md with detailed `.env` setup instructions, including multiple ways to set the API key. - No changes to function parameters or API usage.
v1.0.0
Initial release of AnyCrawl skill. - Integrates AnyCrawl API for high-performance multi-threaded web scraping, crawling, and search. - Supports scraping single URLs, Google search, full-website crawling, crawl status/results retrieval, and canceling crawl jobs. - Allows structured data extraction (markdown, HTML, text, JSON, screenshots) and advanced scrape/search options. - Includes Google search + scraping helper for quick info retrieval. - Provides robust engine selection, response format, and error handling guidance.
元数据
Slug anycrawl
版本 1.0.1
许可证
累计安装 10
当前安装数 10
历史版本数 2
常见问题

AnyCrawl-API 是什么?

Perform high-performance web scraping, crawling, and Google search with multi-engine support and structured data extraction via AnyCrawl API. 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 2945 次。

如何安装 AnyCrawl-API?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install anycrawl」即可一键安装,无需额外配置。

AnyCrawl-API 是免费的吗?

是的,AnyCrawl-API 完全免费(开源免费),可自由下载、安装和使用。

AnyCrawl-API 支持哪些平台?

AnyCrawl-API 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。

谁开发了 AnyCrawl-API?

由 techlaai(@techlaai)开发并维护,当前版本 v1.0.1。

💬 留言讨论