jinkang19940922

Web Crawler

by 噢福阔斯KANG · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
317 Downloads · 0 Stars · 2 Active Installs · 1 Version
Install in OpenClaw
/install web-crawler
Description
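A web crawler tool that supports static and dynamic page crawling, media downloading, and evasion of anti-crawling measures. Activation triggers: the user mentions 爬虫 ("crawler"), 爬取 ("crawl"), crawler, scraper, 抓取网页 ("scrape web pages"), or 下载媒体 ("download media").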
Usage Guidance
Do not enable or run this skill until the author/source is verified and missing pieces are resolved. Ask for: (1) source code (the referenced ./src/index.js and package.json), (2) an install spec or explicit list of runtime dependencies (Node version, Puppeteer, Chrome) and how Puppeteer/Chrome will be provided, (3) explanation and justification for the hardcoded proxies (why those IPs, who controls them), and (4) where outputs are stored, retention policy, and limits on media downloads. If you must test it, run it in an isolated VM or sandbox with restricted network access, remove/replace hardcoded proxies, and avoid granting broad autonomous invocation or access to sensitive internal networks. Because the package is instruction-only and inconsistent, proceed cautiously.
Capability Analysis
Type: OpenClaw Skill · Name: web-crawler · Version: 1.0.0

The skill bundle describes a web crawler with Puppeteer and proxy-rotation capabilities. It is classified as suspicious because SKILL.md hardcodes internal IP addresses (192.168.10.222) in its proxy configuration, which could be used to pivot into an internal network. Furthermore, the actual implementation in src/index.js and the configuration in config/default.json are missing from the provided files, so the crawler's behavior cannot be verified.
Capability Assessment
Purpose & Capability
The described capability (static/dynamic crawling, media download, anti-bot evasion) matches the SKILL.md content. However, the skill requires a local Node module ('./src/index.js'), Puppeteer, and a system Chrome binary, and it expects config files under the workspace; none of these artifacts or required binaries are declared in the registry metadata. That mismatch suggests incomplete packaging or sloppy metadata.
Instruction Scope
The SKILL.md instructs the agent to cd into /home/node/.openclaw/workspace/web-crawler, require local code, read config/default.json, use proxy lists (including hardcoded 192.168.x.x addresses), and write scraped HTML/media/screenshots into outputs/. Those are file-system and network operations that go beyond a simple instruction: they create persistent output directories and rely on local binaries and proxies. The skill does not include safeguards or explain consent/permissions for writing or large data downloads.
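The safeguards this paragraph finds missing could take roughly the following shape. This is a minimal Python sketch for illustration only (the skill itself is Node-based and its source is not available); the function names and the 50 MiB per-file cap are hypothetical, not taken from the skill. It refuses to write outside the outputs root and rejects downloads whose declared size is unknown or too large.

```python
from pathlib import Path

# Hypothetical per-file cap; the skill documents no limit on media downloads.
MAX_DOWNLOAD_BYTES = 50 * 1024 * 1024  # 50 MiB


def safe_output_path(output_root: str, relative_name: str) -> Path:
    """Resolve relative_name inside output_root, refusing any path that escapes it."""
    root = Path(output_root).resolve()
    target = (root / relative_name).resolve()  # normalizes ".." and existing symlinks
    if target != root and root not in target.parents:
        raise ValueError(f"refusing to write outside {root}: {relative_name}")
    return target


def check_download_size(content_length, limit: int = MAX_DOWNLOAD_BYTES) -> None:
    """Raise ValueError if the declared size is missing or larger than limit."""
    if content_length is None:
        raise ValueError("missing Content-Length; refusing unbounded download")
    if int(content_length) > limit:
        raise ValueError(f"{content_length} bytes exceeds the {limit}-byte limit")
```

A declared Content-Length can be absent or wrong, so a real downloader would also count bytes while streaming and abort once the cap is passed.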
Install Mechanism
There is no install spec (instruction-only), which is low-risk in general. But because the instructions expect Node/Puppeteer/Chrome and local source files, the absence of an install step is an inconsistency: a consumer would need to manually install dependencies and supply the missing code and browser, increasing the chance of misconfiguration or supply-chain risk.
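Before trusting the skill, a consumer could run a preflight check along these lines. It is a hypothetical Python sketch: the workspace path and file names come from SKILL.md as described above, while the Chrome executable names and the assumption that Puppeteer is installed under the workspace's node_modules are guesses that may not match a given system.

```python
import shutil
from pathlib import Path
from typing import List

# Workspace path named in SKILL.md.
WORKSPACE = Path("/home/node/.openclaw/workspace/web-crawler")

# Files the instructions reference but the package does not ship.
REQUIRED_FILES = ["src/index.js", "package.json", "config/default.json"]

# Assumed executable names; Chrome/Chromium binaries are named differently per platform.
CHROME_CANDIDATES = ["google-chrome", "google-chrome-stable", "chromium", "chromium-browser"]


def preflight(workspace: Path = WORKSPACE) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems: List[str] = []
    for rel in REQUIRED_FILES:
        if not (workspace / rel).is_file():
            problems.append(f"missing file: {workspace / rel}")
    # Assumes Puppeteer is installed locally with npm into the workspace.
    if not (workspace / "node_modules" / "puppeteer").is_dir():
        problems.append("missing directory: node_modules/puppeteer")
    if shutil.which("node") is None:
        problems.append("executable not on PATH: node")
    if not any(shutil.which(name) for name in CHROME_CANDIDATES):
        problems.append("no Chrome/Chromium executable found on PATH")
    return problems
```

Running `for problem in preflight(): print(problem)` lists everything that would have to be supplied manually.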
Credentials
No environment variables or credentials are declared, yet the skill expects proxy configuration (antiBot.proxyList) and access to system browser executables and the workspace filesystem. Hardcoded proxies pointing at private IPs are suspicious (they may route traffic through an internal host). The skill will download media and write structured data locally, which could be used for large-scale scraping or exfiltration if misused.
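One way to act on the proxy concern is to screen antiBot.proxyList for non-public addresses before any traffic is sent. The following Python sketch assumes each entry is a string such as "http://host:port" or "host:port" (the real format is unknown, since config/default.json is missing), and the port 8080 in the usage example is made up. Entries given as hostnames are skipped, because judging them would require DNS resolution.

```python
import ipaddress
from urllib.parse import urlsplit


def internal_proxies(proxy_list):
    """Return the entries whose host is a private, loopback, link-local,
    or otherwise non-global IP literal."""
    flagged = []
    for entry in proxy_list:
        # Assumed entry format: "scheme://host:port"; bare "host:port" gets a scheme added.
        url = entry if "://" in entry else "http://" + entry
        host = urlsplit(url).hostname
        try:
            addr = ipaddress.ip_address(host)
        except ValueError:
            continue  # a hostname, not an IP literal; it could still resolve internally
        if not addr.is_global:
            flagged.append(entry)
    return flagged
```

For example, `internal_proxies(["http://192.168.10.222:8080", "8.8.8.8:3128"])` returns `["http://192.168.10.222:8080"]`. Using `is_global` rather than `is_private` also catches ranges such as the 100.64.0.0/10 shared address space.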
Persistence & Privilege
The skill does not request always:true and does not declare changes to other skills or system-wide settings. It will create output files under its workspace, which is normal for a crawler, but that is not a platform-level persistence privilege.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install web-crawler
  3. After installation, invoke the skill by name or use /web-crawler
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Web Crawler Skill v1.0.0: initial release.
- Supports static and dynamic webpage crawling, including JS-rendered pages.
- Automatic media downloading (images, video, audio) to the outputs directory.
- Evasion of anti-crawling measures: user-agent rotation, request delay, proxy rotation.
- Easy configuration of crawl depth, page limits, media download, and proxies.
- Organized output: HTML, text, screenshots, media files, and structured data.
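The crawl-depth and page-limit options listed in this release note bound a crawl roughly as follows. This is an illustrative breadth-first sketch in Python, not the skill's implementation (which is not included); the parameter names max_depth, max_pages, and delay_seconds are hypothetical, since the real option names live in the missing config/default.json. Link extraction is passed in as a function so the sketch runs offline.

```python
import time
from collections import deque
from typing import Callable, Iterable, List


def bounded_crawl(start: str,
                  get_links: Callable[[str], Iterable[str]],
                  max_depth: int,
                  max_pages: int,
                  delay_seconds: float = 0.0) -> List[str]:
    """Return URLs in breadth-first visit order, visiting each URL at most once,
    not following links from pages at max_depth, and stopping after max_pages."""
    visited = {start}
    order: List[str] = []
    queue = deque([(start, 0)])  # (url, depth) pairs; the start page is depth 0
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        if order and delay_seconds > 0:
            time.sleep(delay_seconds)  # polite pause between page visits
        order.append(url)  # a real crawler would fetch and save the page here
        if depth >= max_depth:
            continue  # depth limit reached: record the page but do not expand it
        for link in get_links(url):
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return order
```

With a toy site `{"/": ["/a", "/b"], "/a": ["/a1"], "/b": [], "/a1": []}` and `get_links=lambda u: site.get(u, [])`, a call with `max_depth=1, max_pages=10` returns `["/", "/a", "/b"]`: the depth limit stops expansion before `/a1`, and the page limit would cut the list short independently.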
Metadata
Slug web-crawler
Version 1.0.0
License MIT-0
All-time Installs 2
Active Installs 2
Total Versions 1
Frequently Asked Questions

What is Web Crawler?

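A web crawler tool that supports static and dynamic page crawling, media downloading, and evasion of anti-crawling measures. Activation triggers: the user mentions 爬虫 ("crawler"), 爬取 ("crawl"), crawler, scraper, 抓取网页 ("scrape web pages"), or 下载媒体 ("download media"). It is an AI Agent Skill for Claude Code / OpenClaw, with 317 downloads so far.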

How do I install Web Crawler?

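Run "/install web-crawler" in the OpenClaw or Claude Code chat to install it in one step. Note, however, that the skill's instructions also expect Node, Puppeteer, a system Chrome binary, and local source files (./src/index.js, package.json) that the package does not include, so some manual setup is likely required.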

Is Web Crawler free?

Yes, Web Crawler is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Web Crawler support?

Web Crawler is listed as cross-platform and should run anywhere OpenClaw / Claude Code is available, provided its Node, Puppeteer, and Chrome dependencies are present.

Who created Web Crawler?

It is built and maintained by 噢福阔斯KANG (@jinkang19940922); the current version is v1.0.0.
