
Smart Web Scraper (智能网页爬虫)

Name: Smart Web Scraper (智能网页爬虫)
Author: cjstate

By CJstate · GitHub · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 109

Install in OpenClaw

/install xh-smart-scraper

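Description

A smart web data collector. It automatically recognizes page structure, batch-scrapes data from list, table, and detail pages, and supports export to JSON, CSV, and Excel. It includes built-in adaptation to anti-scraping measures.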

Security recommendations

This skill contains plausible scraper code (Puppeteer + Cheerio), and its npm install pulls in Puppeteer, which downloads Chromium. However, the README and metadata overstate its capabilities: proxy pools, retries, database writes, and randomized anti-bot strategies are advertised but not implemented. Before installing or using it: (1) review scraper.js yourself or run it in a sandboxed environment; (2) avoid running npm install or the scraper as root, since the code launches Chromium with --no-sandbox, which disables the browser sandbox; (3) if you need proxy or DB features, expect to modify the code and add secure credential handling; (4) heed legal and robots.txt constraints for scraping targets. If you want a fully featured scraper, request clarification or a version that actually implements the advertised features and documents how credentials and config are provided.
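As an illustration of point (4), the following is a minimal sketch of a robots.txt pre-check that could run before scraping a path. It is not part of scraper.js. The function names `disallowedPrefixes` and `isPathAllowed` are hypothetical, and the parser is deliberately simplified: it only honours `User-agent: *` groups and plain-prefix `Disallow` rules, and ignores `Allow`, wildcards, and `$` anchors.

```javascript
// Simplified robots.txt check (hypothetical helpers, not from scraper.js).
// Only `User-agent: *` groups and plain-prefix `Disallow` rules are honoured.
function disallowedPrefixes(robotsTxt) {
  const prefixes = [];
  let inStarGroup = false;  // true while inside a group that applies to "*"
  let lastWasAgent = false; // consecutive User-agent lines share one group
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*/, '').trim(); // strip comments
    const sep = line.indexOf(':');
    if (sep === -1) continue; // blank or malformed line
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === 'user-agent') {
      // A User-agent line that follows a rule line starts a new group.
      if (!lastWasAgent) inStarGroup = false;
      if (value === '*') inStarGroup = true;
      lastWasAgent = true;
    } else {
      lastWasAgent = false;
      // An empty Disallow value means "nothing is disallowed".
      if (field === 'disallow' && inStarGroup && value !== '') {
        prefixes.push(value);
      }
    }
  }
  return prefixes;
}

// Returns false when the path starts with any disallowed prefix.
function isPathAllowed(robotsTxt, path) {
  return !disallowedPrefixes(robotsTxt).some((p) => path.startsWith(p));
}
```

A caller would fetch `/robots.txt` from the target site first and skip any URL whose path fails `isPathAllowed`. For production use, a full parser that follows the Robots Exclusion Protocol would be a safer choice than this sketch.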

Functional analysis

Type: OpenClaw Skill
Name: xh-smart-scraper
Version: 1.0.0

The skill is a standard web scraper implementation using Puppeteer and Cheerio. The code in scraper.js performs legitimate data extraction and file export (JSON/CSV/Excel) based on user-provided configurations, with no evidence of data exfiltration, malicious execution, or prompt injection in SKILL.md.

Capability assessment

⚠ Purpose & Capability

Name/description promise: auto-recognition, anti-bot adaptations, proxy pool support, automatic retries, and database direct storage. The code implements basic Puppeteer fetching, Cheerio parsing, simple file export, and a static random User-Agent list. It does NOT implement proxy pool usage, DB storage, retry logic, or true randomized delays despite these appearing in the documentation—this is a mismatch between stated purpose and actual capability.
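To make the gap concrete, here is a minimal sketch of what retry logic with a randomized delay between attempts could look like if someone added it. It is not the author's code. The names `sleep`, `randomDelay`, and `withRetry`, and all default values, are assumptions chosen for illustration.

```javascript
// Hypothetical helpers; scraper.js does not contain these.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Uniform random integer delay in [minMs, maxMs].
// `rng` is injectable so the function can be tested deterministically.
function randomDelay(minMs, maxMs, rng = Math.random) {
  return Math.floor(minMs + rng() * (maxMs - minMs + 1));
}

// Runs `task` up to `retries + 1` times, waiting a random delay
// between failed attempts. Rethrows the last error if all attempts fail.
async function withRetry(task, options = {}) {
  const {
    retries = 3,        // assumed default
    minDelayMs = 500,   // assumed default
    maxDelayMs = 2000,  // assumed default
    wait = sleep,       // injectable for tests
  } = options;
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        await wait(randomDelay(minDelayMs, maxDelayMs));
      }
    }
  }
  throw lastError;
}
```

With Puppeteer, a page load could then be wrapped as `withRetry(() => page.goto(url))`. The random wait spreads requests out in time, which is the behaviour the documentation describes but the code, per this assessment, does not provide.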

⚠ Instruction Scope

SKILL.md instructs npm install and running scraper.js (consistent). However the documentation advertises features (IP proxy pool, DB direct store, configurable randomized delays/retries) that the runtime instructions/code do not actually support. The runtime code reads a local config file and writes outputs to local files (JSON/CSV/Excel) only — it does not access external endpoints other than the target URLs, nor does it read environment variables or other system config.
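The local file export described above can be pictured with a small sketch. The code below is not taken from scraper.js. It shows one way to turn an array of scraped records into CSV text, with RFC 4180 style quoting. The function names `escapeCsvField` and `toCsv` and the sample field names are hypothetical.

```javascript
// Hypothetical helpers illustrating a local CSV export step.

// Quote a field when it contains a comma, quote, or line break;
// embedded quotes are doubled.
function escapeCsvField(value) {
  const text = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Convert an array of plain objects to CSV text.
// Columns are the union of all keys, in first-seen order.
function toCsv(records) {
  if (records.length === 0) return '';
  const columns = [...new Set(records.flatMap((r) => Object.keys(r)))];
  const header = columns.map(escapeCsvField).join(',');
  const rows = records.map((r) =>
    columns.map((c) => escapeCsvField(r[c])).join(',')
  );
  return [header, ...rows].join('\r\n');
}
```

Writing the result to disk would be a single call such as `require('node:fs').writeFileSync('output.csv', toCsv(records))`, where `output.csv` is an assumed file name. Nothing in this step touches the network, which is consistent with the observation that outputs go to local files only.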

ℹ Install Mechanism

No explicit install spec in registry (instruction-only), but package.json depends on puppeteer (which will download Chromium during npm install). This is expected for a scraper but increases install size and can pull large binaries. No external, untrusted download URLs; standard npm dependencies are used.

⚠ Credentials

Requires no environment variables or credentials in metadata, which matches the code. However, the documentation claims proxy pool and DB direct-storage features that typically require credentials or config; those are neither requested nor implemented. This mismatch can mislead users about what secrets and config are needed, and may lead them to add credentials later without clear handling in the code.
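If proxy or database support were added later, one common way to handle the credentials is to read them from environment variables rather than from the config file, and to mask them before logging. The sketch below assumes this approach. The variable names `SCRAPER_PROXY_URL` and `SCRAPER_DB_URL` and the helpers `loadSecrets` and `redactUrl` are hypothetical; the current code reads none of them.

```javascript
// Hypothetical credential handling; scraper.js reads no environment variables.

// Read optional secrets from the environment (names are assumptions).
// `env` is injectable so the function can be tested without real secrets.
function loadSecrets(env = process.env) {
  return {
    proxyUrl: env.SCRAPER_PROXY_URL || null,
    dbUrl: env.SCRAPER_DB_URL || null,
  };
}

// Mask the username and password of a URL so it is safe to log.
function redactUrl(url) {
  if (!url) return url;
  const parsed = new URL(url);
  if (parsed.username) parsed.username = '***';
  if (parsed.password) parsed.password = '***';
  return parsed.toString();
}
```

A caller would log `redactUrl(secrets.proxyUrl)` instead of the raw value, so that connection strings with embedded passwords do not end up in console output or log files.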

✓ Persistence & Privilege

Does not request persistent/always-on privilege. It is user-invocable and not set to always: true. The skill only runs when invoked and writes output files to disk, which is expected behavior for a CLI scraper.

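How to use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install xh-smart-scraper
Once installation finishes, invoke the Skill by name or trigger it with /xh-smart-scraper
Provide the required inputs according to the Skill's parameter documentation to get structured output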

Version history

v1.0.0

- Initial release of Smart Web Scraper (智能网页数据采集器)
- Features intelligent structure recognition for list, table, and detail pages
- Automatically extracts key fields such as titles, prices, and authors
- Supports anti-crawling strategies: User-Agent rotation, request delay, proxy pool (optional), and auto-retry
- Exports data in JSON, CSV, Excel, and supports direct database storage (MySQL/MongoDB)
- Provides command-line and config file usage with sample scenarios

Metadata

Slug xh-smart-scraper

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Version count 1

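FAQ

What is Smart Web Scraper?

A smart web data collector. It automatically recognizes page structure, batch-scrapes data from list, table, and detail pages, and supports export to JSON, CSV, and Excel. It includes built-in adaptation to anti-scraping measures. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 109 times so far.

How do I install Smart Web Scraper?

Run the command "/install xh-smart-scraper" in the OpenClaw or Claude Code chat box for a one-step install; no extra configuration is required.

Is Smart Web Scraper free?

Yes. Smart Web Scraper is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Smart Web Scraper support?

Smart Web Scraper is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Smart Web Scraper?

It is developed and maintained by CJstate (@cjstate). The current version is v1.0.0.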
