← 返回 Skills 市场
hlyylly

crawl

作者 hlyylly · GitHub ↗ · v1.0.2 · MIT-0
cross-platform ⚠ suspicious
227
总下载
1
收藏
0
当前安装
3
版本数
在 OpenClaw 中安装
/install chromeopencrawl
功能描述
Crawl any JavaScript-rendered webpage through distributed real Chrome browsers. No local browser needed — perfect for headless VPS.
使用说明 (SKILL.md)

OpenCrawl Skill

Use this skill to crawl any JavaScript-rendered webpage using real Chrome browsers from a distributed worker pool. Unlike headless browser solutions (Puppeteer/Playwright), OpenCrawl requires zero local browser installation — ideal for VPS and cloud environments.

Quick Start (use our public server)

  1. Visit http://39.105.206.76:9877 and click "Register" to get a free API Key (100 credits included)
  2. Set environment variables:
    OPENCRAWL_API_KEY=ak_your_key_here
    OPENCRAWL_API_URL=http://39.105.206.76:9877
    
  3. Start crawling!

Self-hosted (deploy your own server)

If you prefer to run your own OpenCrawl server, see the full deployment guide: https://github.com/hlyylly/OpenCrawl

Then set OPENCRAWL_API_URL to your own server address.


How it works: Your request → OpenCrawl server → dispatched to a real Chrome browser worker → page rendered with full JavaScript → content extracted → uploaded to Cloudflare R2 → download URL returned to you.

Errors: On failure the script writes a JSON error to stderr and exits with code 1.


Tools

1. Crawl Page

Use this to get the full rendered text content of any webpage, including JavaScript-rendered content that simple HTTP requests cannot retrieve.

Command:

python3 {baseDir}/tools/crawl.py --url "https://example.com"

Examples:

# Crawl a full page
python3 {baseDir}/tools/crawl.py --url "https://www.smzdm.com/p/170177008/"

# Crawl with CSS selector to extract specific content
python3 {baseDir}/tools/crawl.py --url "https://example.com" --selector ".article-content"

# Output raw JSON response (includes downloadUrl)
python3 {baseDir}/tools/crawl.py --url "https://example.com" --raw

Optional flags:

  • --selector ".css-selector" — Extract only matching elements
  • --mode lite — Lite mode: no images/CSS, faster, 0.1 credit (default: full)
  • --raw — Output full JSON response instead of just the text content
  • --timeout 60 — Custom timeout in seconds (default: 60)

2. Search (Brave Search API Compatible)

Use this to search the web using multiple search engines (DuckDuckGo + Google + Bing + Baidu) through real Chrome browsers. Returns structured results compatible with Brave Search API format.

Command:

python3 {baseDir}/tools/crawl.py --search "your search query"

Examples:

# Lite search — DuckDuckGo only (0.1 credit)
python3 {baseDir}/tools/crawl.py --search "python web scraping"

# Full search — 4 engines parallel (3 credits, 20-30 deduplicated results)
python3 {baseDir}/tools/crawl.py --search "python web scraping" --mode full

4. Check Balance

Use this to check how many credits remain on the API key.

Command:

python3 {baseDir}/tools/crawl.py --balance

5. Check Status

Use this to check the OpenCrawl platform status — how many workers are online, tasks completed, etc.

Command:

python3 {baseDir}/tools/crawl.py --status

Summary

Action Argument Example
Crawl (full) --url python3 {baseDir}/tools/crawl.py --url "https://example.com"
Crawl (lite) --url --mode lite python3 {baseDir}/tools/crawl.py --url "https://example.com" --mode lite
Search (lite) --search python3 {baseDir}/tools/crawl.py --search "python tutorial"
Search (full) --search --mode full python3 {baseDir}/tools/crawl.py --search "python tutorial" --mode full
Check balance --balance python3 {baseDir}/tools/crawl.py --balance
Check status --status python3 {baseDir}/tools/crawl.py --status

Output: Crawl → rendered page text (or JSON with --raw). Search → JSON with web.results[] (Brave compatible). Balance → JSON. Status → JSON.

Requirements: Python 3.8+, requests library. No browser installation needed.

安全使用建议
This skill functions as a remote crawling client, but it relies on a public server at a raw IP and uses HTTP by default — that means: (1) any pages you ask it to crawl (including sensitive pages) will be transmitted to and stored by that third-party service (Cloudflare R2 storage is mentioned), and (2) your API key and requests may be sent in plaintext if you use the HTTP endpoint. Before installing, consider these steps: (a) do not use the public server for sensitive URLs or credentials; (b) prefer self-hosting the OpenCrawl server from the linked GitHub repo and set OPENCRAWL_API_URL to an HTTPS endpoint you control; (c) if you must use the public server, confirm it supports HTTPS and that you use an https:// URL — otherwise do not use it; (d) rotate API keys regularly and keep them scoped/limited; (e) inspect the upstream OpenCrawl server code (the GitHub repo) to verify what data is logged/stored and how workers are provisioned; and (f) if unsure, run the tool in an isolated environment and avoid crawling pages that contain secrets or personal data.
功能分析
Type: OpenClaw Skill Name: chromeopencrawl Version: 1.0.2 The skill is a standard API client for the OpenCrawl service, which provides JavaScript-rendered web crawling and multi-engine search capabilities via a remote worker pool. The core logic in `tools/crawl.py` is a straightforward wrapper that communicates with a backend server (defaulting to `39.105.206.76`) to submit tasks and retrieve results from Cloudflare R2. No evidence of data exfiltration, malicious execution, or prompt injection was found, and the code's behavior is fully aligned with the documentation in `SKILL.md` and `README.md`.
能力评估
Purpose & Capability
The skill's functionality (distributed real Chrome crawling) matches the code and instructions: the CLI posts to an OpenCrawl API and downloads results. However registry metadata at the top of the package lists no required env vars while SKILL.md and tools/crawl.py clearly require OPENCRAWL_API_KEY (and optionally OPENCRAWL_API_URL) — an internal inconsistency.
Instruction Scope
SKILL.md explicitly instructs users to register at a raw IP (http://39.105.206.76:9877) and use that public server. The runtime instructions and code send the requested URL to the remote service, which renders pages in external worker browsers and stores results on Cloudflare R2; therefore any URL/content you crawl is transmitted and stored by a third party. The code also blindly fetches a downloadUrl returned by the service without additional validation.
Install Mechanism
No install spec (instruction-only skill) and only a minimal Python dependency (requests). Nothing is downloaded or executed at install time from untrusted URLs.
Credentials
Requesting an API key (OPENCRAWL_API_KEY) is proportionate to a remote crawling service. But SKILL.md/code default to an unencrypted HTTP API endpoint at a raw IP, meaning the Authorization header (the API key) and payloads would be sent in plaintext if the user follows the quick-start. Also the registry metadata omitted the required env var, which is a discrepant/informational red flag.
Persistence & Privilege
The skill does not request always:true, does not modify other skills or system-wide settings, and does not require persistent elevated privileges.
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install chromeopencrawl
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /chromeopencrawl 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v1.0.2
- Added support for a new web search feature using multiple search engines with Brave Search API-compatible results. - Introduced "lite" and "full" modes for both crawling and searching, allowing faster, lower-cost lite operations. - Updated summary and usage examples to reflect new search and mode options. - No code changes detected; changes are documentation updates reflecting new features and options.
v1.0.1
- Added Quick Start instructions for using the public OpenCrawl server, including registration and API key setup. - Included steps for self-hosted deployment and custom server configuration. - No changes to core features or usage commands; documentation improvements only. - Useful connection details, such as example server URL and environment variables, are now provided upfront for easier onboarding.
v1.0.0
Initial release — crawl JavaScript-rendered webpages remotely with real Chrome browsers. - Introduced tools to crawl any webpage (incl. dynamic JS content) with a single Python command via distributed Chrome workers. - No local browser/headless setup required — ideal for cloud/server use. - Supports content extraction with CSS selector, raw JSON output, and custom timeout. - Includes balance and platform status check commands. - Requires only Python 3.8+ and `requests`, plus OPENCRAWL_API_KEY.
元数据
Slug chromeopencrawl
版本 1.0.2
许可证 MIT-0
累计安装 0
当前安装数 0
历史版本数 3
常见问题

crawl 是什么?

Crawl any JavaScript-rendered webpage through distributed real Chrome browsers. No local browser needed — perfect for headless VPS. 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 227 次。

如何安装 crawl?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install chromeopencrawl」即可一键安装,无需额外配置。

crawl 是免费的吗?

是的,crawl 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。

crawl 支持哪些平台?

crawl 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。

谁开发了 crawl?

由 hlyylly(@hlyylly)开发并维护,当前版本 v1.0.2。

💬 留言讨论