cjstate

Smart Web Scraper

Author: CJstate · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 109
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install xh-smart-scraper
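Description
An intelligent web data collector. It automatically recognizes page structure, batch-scrapes data from list, table, and detail pages, and exports to JSON, CSV, or Excel. It also includes built-in adaptation to anti-scraping measures.
Safety Recommendations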
This skill contains plausible scraper code (Puppeteer + Cheerio) and will npm install Puppeteer (which downloads Chromium). However the README/metadata overstate capabilities — proxy pools, retries, database writes and randomized anti-bot strategies are advertised but not implemented. Before installing or using: (1) review scraper.js yourself or run it in a sandboxed environment; (2) avoid running npm install as root because Puppeteer/Chromium can require special flags (--no-sandbox is used in the code); (3) if you need proxy or DB features, expect to modify the code and add secure credential handling; (4) heed legal/robots.txt constraints for scraping targets. If you want a fully-featured scraper, request clarification or a version that actually implements the advertised features and documents how credentials/config are provided.
Functional Analysis
Type: OpenClaw Skill
Name: xh-smart-scraper
Version: 1.0.0
The skill is a standard web scraper implementation using Puppeteer and Cheerio. The code in scraper.js performs legitimate data extraction and file export (JSON/CSV/Excel) based on user-provided configurations, with no evidence of data exfiltration, malicious execution, or prompt injection in SKILL.md.
Capability Assessment
Purpose & Capability
Name/description promise: auto-recognition, anti-bot adaptations, proxy pool support, automatic retries, and database direct storage. The code implements basic Puppeteer fetching, Cheerio parsing, simple file export, and a static random User-Agent list. It does NOT implement proxy pool usage, DB storage, retry logic, or true randomized delays despite these appearing in the documentation—this is a mismatch between stated purpose and actual capability.
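For readers who plan to add the missing retry and randomized-delay behavior themselves, the following is a minimal sketch of one common approach (retries with jittered exponential backoff). It is not taken from scraper.js; the names `sleep`, `randomDelay`, and `withRetry` and the `maxAttempts` and `baseMs` options are hypothetical.

```javascript
// Hypothetical helpers: not part of xh-smart-scraper's scraper.js.

// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Pick a delay at random between minMs (inclusive) and maxMs (exclusive).
function randomDelay(minMs, maxMs) {
  return minMs + Math.random() * (maxMs - minMs);
}

// Call `task` up to `maxAttempts` times. After each failure except the last,
// wait a randomized delay whose upper bound doubles with every attempt.
async function withRetry(task, { maxAttempts = 3, baseMs = 500 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        const ceiling = baseMs * 2 ** (attempt - 1);
        await sleep(randomDelay(ceiling / 2, ceiling));
      }
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

Assuming a Puppeteer `page` object is in scope, a navigation could then be wrapped as `await withRetry(() => page.goto(url))`. Randomizing the wait keeps request timing from forming a fixed, easily detected pattern.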
Instruction Scope
SKILL.md instructs npm install and running scraper.js (consistent). However the documentation advertises features (IP proxy pool, DB direct store, configurable randomized delays/retries) that the runtime instructions/code do not actually support. The runtime code reads a local config file and writes outputs to local files (JSON/CSV/Excel) only — it does not access external endpoints other than the target URLs, nor does it read environment variables or other system config.
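To make the local-file export step concrete, here is a minimal sketch of turning scraped records into CSV text. It is an illustration only and does not reproduce the skill's actual export code; `escapeCsv` and `toCsv` are hypothetical names, and the quoting follows the usual RFC 4180 conventions.

```javascript
// Hypothetical helpers: illustrate CSV export, not the skill's actual code.

// Quote a value when it contains a comma, double quote, or line break,
// doubling any embedded double quotes.
function escapeCsv(value) {
  const text = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Convert an array of flat objects to CSV text. The header row is the
// union of all keys in first-seen order; missing fields become empty cells.
function toCsv(rows) {
  const headers = [...new Set(rows.flatMap((row) => Object.keys(row)))];
  const lines = [headers, ...rows.map((row) => headers.map((h) => row[h]))];
  return lines.map((cells) => cells.map(escapeCsv).join(',')).join('\r\n');
}
```

The resulting string could be written to disk with Node's `fs.writeFileSync`. Records with different field sets (for example, some with `price` and some with `author`) still line up under a single header row.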
Install Mechanism
No explicit install spec in registry (instruction-only), but package.json depends on puppeteer (which will download Chromium during npm install). This is expected for a scraper but increases install size and can pull large binaries. No external, untrusted download URLs; standard npm dependencies are used.
Credentials
Requires no environment variables or credentials in metadata, which matches the code. However the documentation claims proxy pool and DB direct-storage features that typically require credentials/config; those are not requested or implemented—this mismatch can mislead users about what secrets/config are needed and may result in attempts to add credentials later without clear handling in the code.
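If proxy or database support is added later, one way to keep credentials out of the config file is to read them from environment variables. The sketch below is an assumption about how that could look, not something the skill provides; the variable names `SCRAPER_PROXY_URL` and `SCRAPER_DB_URL` and the functions `loadSecrets` and `redact` are hypothetical.

```javascript
// Hypothetical configuration loader: variable names are illustrative only.

// Read optional proxy and database URLs from an environment-like object
// and check that each value that was provided is a well-formed URL.
function loadSecrets(env) {
  const secrets = {
    proxyUrl: env.SCRAPER_PROXY_URL || null,
    dbUrl: env.SCRAPER_DB_URL || null,
  };
  for (const value of Object.values(secrets)) {
    if (value !== null) new URL(value); // throws TypeError on a malformed URL
  }
  return secrets;
}

// Mask the password portion of a URL so it can be logged safely.
function redact(urlText) {
  const url = new URL(urlText);
  if (url.password) url.password = '***';
  return url.toString();
}
```

In a real run the loader would be called as `loadSecrets(process.env)`; passing the environment in as a parameter keeps the function easy to test. Logging only `redact(...)` output avoids leaking passwords into scraper logs.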
Persistence & Privilege
Does not request persistent/always-on privilege. It is user-invocable and not set to always: true. The skill only runs when invoked and writes output files to disk, which is expected behavior for a CLI scraper.
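How to Use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install xh-smart-scraper
  3. Once installation completes, invoke the Skill by name or trigger it with /xh-smart-scraper.
  4. Provide the inputs described in the Skill's parameter documentation to receive structured output.
Version History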
v1.0.0
- Initial release of Smart Web Scraper (智能网页数据采集器)
- Features intelligent structure recognition for list, table, and detail pages
- Automatically extracts key fields such as titles, prices, and authors
- Supports anti-crawling strategies: User-Agent rotation, request delay, proxy pool (optional), and auto-retry
- Exports data in JSON, CSV, Excel, and supports direct database storage (MySQL/MongoDB)
- Provides command-line and config file usage with sample scenarios
Metadata
Slug: xh-smart-scraper
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Version count: 1
FAQ

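What is Smart Web Scraper?

An intelligent web data collector. It automatically recognizes page structure, batch-scrapes data from list, table, and detail pages, and exports to JSON, CSV, or Excel, with built-in adaptation to anti-scraping measures. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has 109 total downloads to date.

How do I install Smart Web Scraper?

Run the command "/install xh-smart-scraper" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is required.

Is Smart Web Scraper free?

Yes. Smart Web Scraper is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used without restriction.

Which platforms does Smart Web Scraper support?

Smart Web Scraper is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Smart Web Scraper?

It is developed and maintained by CJstate (@cjstate); the current version is v1.0.0.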
