
E-commerce Data Scraper Pro

by chungvic · GitHub · v1.0.0 · MIT-0
Cross-platform · ✓ Security Clean
108 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install ecommerce-data-scraper-pro
Description
Intelligent data scraping tool: extracts structured data from web pages and APIs, with batch processing support
README (SKILL.md)

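Data Scraper - Intelligent Data Scraping Tool

Automatically extracts structured data from web pages and APIs, with support for batch processing and multiple output formats.

Features

  • 🕷️ Web scraping - automatically identifies and extracts the target data
  • 📊 Structured output - JSON, CSV, and Excel formats
  • 🔄 Batch processing - scrape multiple pages or URLs in one run
  • 🛡️ Anti-blocking - smart request rate control
  • 🔌 API integration - REST and GraphQL APIs
  • 📝 Data cleaning - automatic deduplication and formatting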
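The data-cleaning feature is only described at a high level. As a rough illustration, deduplication and formatting could work like the stdlib-only Python sketch below. It assumes each scraped record is a dict and treats two records as duplicates when a chosen key field (the hypothetical default `url`) matches; the script's real cleaning rules may differ.

```python
from typing import Iterable


def clean_records(records: Iterable[dict], key: str = "url") -> list:
    """Trim string values and drop records whose `key` value was already seen.

    The first occurrence wins and input order is preserved. Records that lack
    `key` entirely are kept as-is, since there is nothing to compare them on.
    """
    seen = set()
    cleaned = []
    for record in records:
        # Formatting step: strip surrounding whitespace from every string value.
        normalized = {
            k: v.strip() if isinstance(v, str) else v for k, v in record.items()
        }
        marker = normalized.get(key)
        if marker is not None:
            if marker in seen:
                continue  # duplicate of an earlier record
            seen.add(marker)
        cleaned.append(normalized)
    return cleaned


rows = [
    {"title": " Widget ", "url": "https://example.com/a"},
    {"title": "Widget", "url": "https://example.com/a "},
    {"title": "Gadget", "url": "https://example.com/b"},
]
print(clean_records(rows))
```

Stripping before comparing means that trailing whitespace in a URL does not defeat deduplication.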

Usage

Basic Usage

# Scrape a single web page
uv run scripts/data-scraper.py scrape --url "https://example.com/products" --selector ".product"

# Scrape multiple pages
uv run scripts/data-scraper.py scrape --urls-file urls.txt --output data.json

# Fetch data from an API
uv run scripts/data-scraper.py api --endpoint "https://api.example.com/data" --auth "Bearer TOKEN"
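The `--selector ".product"` option implies that each element matching the selector becomes one result. The sketch below shows that idea using only Python's standard-library `html.parser`, so it runs without extra packages. It is a stand-in built on assumptions, not the script's actual code (which, per the capability analysis further down, uses BeautifulSoup). It supports only a bare `.class` selector, takes an HTML string instead of fetching a URL, and expects reasonably well-formed markup.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the visible text of every element that carries `class_name`."""

    def __init__(self, class_name: str) -> None:
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # > 0 while inside a matching element
        self.buffer = []    # text fragments of the current match
        self.results = []   # one whitespace-normalized string per match

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested tag inside a match
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Join fragments and collapse runs of whitespace to single spaces.
            self.results.append(" ".join(" ".join(self.buffer).split()))

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract_by_class(html: str, selector: str) -> list:
    """Return the text of each element matching a '.class' selector."""
    if not selector.startswith("."):
        raise ValueError("this sketch only supports '.class' selectors")
    parser = ClassTextExtractor(selector[1:])
    parser.feed(html)
    parser.close()
    return parser.results


page = ('<ul><li class="product"><b>Widget</b> <i>$9.99</i></li>'
        '<li class="ad">Sponsored</li>'
        '<li class="product featured">Gadget<br>$19.99</li></ul>')
print(extract_by_class(page, ".product"))  # ['Widget $9.99', 'Gadget $19.99']
```

A full CSS or XPath engine (BeautifulSoup's `select()`, lxml) is the practical choice for real pages. This version only shows the "one match, one record" mapping.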

Advanced Options

# Specify the output format
uv run scripts/data-scraper.py scrape --url "https://example.com" --format csv --output products.csv

# Set a delay between requests (to avoid being blocked)
uv run scripts/data-scraper.py scrape --url "https://example.com" --delay 2

# Use a proxy
uv run scripts/data-scraper.py scrape --url "https://example.com" --proxy "http://proxy:port"

# Scheduled scraping (cron syntax; "0 */6 * * *" runs every 6 hours)
uv run scripts/data-scraper.py scrape --url "https://example.com" --schedule "0 */6 * * *"
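The `--delay` option suggests a pause between consecutive requests. A minimal throttling sketch follows. It assumes the delay is in seconds (the unit is not stated above). It spaces out the *start* of each request instead of always sleeping the full delay, so time already spent handling a slow response counts toward the delay. The `sleep` and `clock` parameters exist only so the function can be tested without real waiting.

```python
import time
from typing import Callable, Iterable, Iterator


def throttled(items: Iterable[str], delay: float,
              sleep: Callable[[float], None] = time.sleep,
              clock: Callable[[], float] = time.monotonic) -> Iterator[str]:
    """Yield items so that consecutive yields are at least `delay` seconds apart."""
    last_start = None
    for item in items:
        if last_start is not None:
            remaining = delay - (clock() - last_start)
            if remaining > 0:
                sleep(remaining)
        last_start = clock()
        yield item


# Hypothetical usage: the caller would fetch each URL inside the loop.
for url in throttled(["https://example.com/page1",
                      "https://example.com/page2"], delay=0.5):
    print("fetching", url)
```

`time.monotonic` is used instead of `time.time` so that system clock adjustments cannot shorten or lengthen the pause.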

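Supported Data Types

Type         | Description           | Example fields
product      | E-commerce products   | price, name, rating, stock
article      | News/blog posts       | title, author, date, content
job          | Job listings          | position, company, salary, requirements
real_estate  | Real estate listings  | price, area, location, floor plan
social       | Social media          | posts, comments, like counts
custom       | Custom                | defined via CSS/XPath selectors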
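The table pairs each `--type` with typical fields, while the examples pass an explicit `--fields` list. The sketch below shows one plausible way the two could interact: an explicit `--fields` value wins, and otherwise the type's defaults apply. The `DEFAULT_FIELDS` identifiers are hypothetical English names derived from the table, not the script's actual field keys.

```python
# Hypothetical default fields per data type, derived from the table above.
DEFAULT_FIELDS = {
    "product": ["name", "price", "rating", "stock"],
    "article": ["title", "author", "date", "content"],
    "job": ["position", "company", "salary", "requirements"],
    "real_estate": ["price", "area", "location", "floor_plan"],
    "social": ["post", "comments", "likes"],
}


def resolve_fields(data_type: str, fields_arg: str = "") -> list:
    """Return the field list for a scrape job.

    An explicit comma-separated `fields_arg` (as in --fields "title,price")
    takes precedence; otherwise the defaults for `data_type` are used.
    """
    if fields_arg:
        return [f.strip() for f in fields_arg.split(",") if f.strip()]
    if data_type == "custom":
        raise ValueError("type 'custom' needs explicit fields/selectors")
    if data_type not in DEFAULT_FIELDS:
        raise ValueError(f"unknown data type: {data_type!r}")
    return list(DEFAULT_FIELDS[data_type])


def project(record: dict, fields: list) -> dict:
    """Keep only the requested fields, filling missing ones with None."""
    return {field: record.get(field) for field in fields}


fields = resolve_fields("product", "title,price,rating,reviews")
print(fields)  # ['title', 'price', 'rating', 'reviews']
print(project({"title": "Sony WH-1000XM5", "price": "$349.99", "sku": "x"}, fields))
```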

Output Formats

JSON (default)

{
  "url": "https://example.com",
  "scrapedAt": "2026-02-28T01:13:00Z",
  "data": [
    {
      "title": "Product title",
      "price": "$99.99",
      "rating": 4.5
    }
  ]
}
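The default JSON output wraps the records in an envelope containing the source `url` and a UTC `scrapedAt` timestamp. The sketch below shows one way to produce that shape; `build_envelope` is a hypothetical helper name. When serializing, `json.dumps(..., ensure_ascii=False)` keeps non-ASCII text (such as Chinese product titles) readable instead of escaping it as `\uXXXX` sequences.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def build_envelope(url: str, records: list,
                   now: Optional[datetime] = None) -> dict:
    """Wrap scraped records in the {url, scrapedAt, data} structure."""
    now = now or datetime.now(timezone.utc)
    # Normalize to UTC (a naive datetime is treated as local time) and
    # format like 2026-02-28T01:13:00Z.
    scraped_at = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {"url": url, "scrapedAt": scraped_at, "data": records}


envelope = build_envelope(
    "https://example.com",
    [{"title": "Product title", "price": "$99.99", "rating": 4.5}],
    now=datetime(2026, 2, 28, 1, 13, tzinfo=timezone.utc),
)
print(json.dumps(envelope, ensure_ascii=False, indent=2))
```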

CSV

title,price,rating,url
Product title,$99.99,4.5,https://...
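The CSV layout above (a header row followed by one record per line) maps directly onto Python's `csv.DictWriter`. The following is an illustration, not the script's code. `to_csv` is a hypothetical helper, and any field missing from a record is written as an empty cell.

```python
import csv
import io


def to_csv(records: list, fieldnames: list) -> str:
    """Serialize dict records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            extrasaction="ignore",  # drop keys not in fieldnames
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # missing keys become empty cells (restval="")
    return buffer.getvalue()


rows = [{"title": "Product title", "price": "$99.99", "rating": 4.5,
         "url": "https://example.com/p/1"}]
print(to_csv(rows, ["title", "price", "rating", "url"]), end="")
```

When writing to a file, open it with `newline=""` as the `csv` module documentation recommends. The `utf-8-sig` encoding adds a byte-order mark that helps Excel recognize UTF-8 for non-ASCII titles.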

Excel

  • Multiple worksheets
  • Automatic formatting
  • Pivot tables

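Suggested Pricing

Edition       | Features                                           | Price
Basic         | Single-run scraping, 100 pages/month               | $49
Professional  | Batch scraping, 1,000 pages/month, scheduled jobs  | $149
Enterprise    | Unlimited scraping, API access, custom support     | $499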

Examples

E-commerce product price monitoring

Input:

uv run scripts/data-scraper.py scrape \
  --url "https://amazon.com/s?k=wireless+headphones" \
  --type product \
  --fields "title,price,rating,reviews" \
  --output headphones.json

Output:

{
  "scrapedAt": "2026-02-28T01:13:00Z",
  "count": 50,
  "data": [
    {
      "title": "Sony WH-1000XM5",
      "price": "$349.99",
      "rating": 4.7,
      "reviews": 12453
    }
  ]
}
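Before two runs of the price monitor can be compared, the scraped price strings (such as `"$349.99"`) must be converted to numbers. The sketch below rests on these assumptions:

- Prices use a dot as the decimal separator and commas only as thousands separators, so a format like `1.299,00` would be misread.
- Records are matched across runs by `title`.
- `parse_price` and `price_changes` are hypothetical helper names.

`Decimal` avoids floating-point rounding in currency values.

```python
import re
from decimal import Decimal
from typing import Optional

# First number in the string, allowing thousands commas and a decimal part.
PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text: str) -> Optional[Decimal]:
    """Extract a numeric price from text like '$1,299.00'; None if absent."""
    match = PRICE_RE.search(text or "")
    return Decimal(match.group().replace(",", "")) if match else None


def price_changes(old: list, new: list, key: str = "title") -> dict:
    """Map each item present in both runs to (old_price, new_price) if it changed."""
    before = {r[key]: parse_price(r.get("price", "")) for r in old}
    changes = {}
    for record in new:
        previous = before.get(record[key])
        current = parse_price(record.get("price", ""))
        if previous is not None and current is not None and previous != current:
            changes[record[key]] = (previous, current)
    return changes


# Hypothetical snapshots from two runs of the headphones example.
run1 = [{"title": "Sony WH-1000XM5", "price": "$349.99"}]
run2 = [{"title": "Sony WH-1000XM5", "price": "$329.99"}]
print(price_changes(run1, run2))
```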

Job listing scraping

Input:

uv run scripts/data-scraper.py scrape \
  --url "https://linkedin.com/jobs/search?keywords=python+developer" \
  --type job \
  --fields "title,company,location,salary" \
  --output jobs.csv

Technical Implementation

  • Web page parsing with Playwright/BeautifulSoup
  • Support for JavaScript-rendered pages
  • Smart retries and error handling
  • Can be integrated into OpenClaw workflows
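"Smart retries" are not specified further. A common pattern is to retry transient failures with exponential backoff, sketched below as an assumption about the approach rather than a description of the script. Only exceptions listed in `retry_on` are retried. `OSError` covers the standard library's `urllib.error.URLError`, connection errors, and socket timeouts. The `sleep` parameter lets tests skip real waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], attempts: int = 3,
                 base_delay: float = 1.0,
                 retry_on: Tuple[Type[BaseException], ...] = (OSError,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fn`, retrying on `retry_on` errors after delays of base, 2*base, 4*base..."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


calls = []


def flaky_fetch() -> str:
    # Hypothetical fetch that fails twice before succeeding.
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("temporary network failure")
    return "<html>ok</html>"


print(with_retries(flaky_fetch, base_delay=0.1))
```

Errors outside `retry_on` (for example a parsing bug raising `KeyError`) propagate immediately, because retrying them would only repeat the same failure.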

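Notes

⚠️ Use legally and in compliance with applicable rules

  • Respect the target site's robots.txt
  • Do not send so many requests that they strain the server
  • Respect data copyright and privacy
  • Scrape only publicly available data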
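Python's standard `urllib.robotparser` can check robots.txt before scraping. The sketch below parses robots.txt text that has already been downloaded, which keeps it testable offline. In practice you could instead call `set_url()` and `read()` on the parser to fetch the file. The helper name `allowed_by_robots` and the default user-agent string are assumptions.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str,
                      user_agent: str = "data-scraper") -> bool:
    """Return True if robots.txt rules permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


rules = "User-agent: *\nDisallow: /private/\n"
print(allowed_by_robots(rules, "https://example.com/products"))      # True
print(allowed_by_robots(rules, "https://example.com/private/data"))  # False
```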

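Changelog

v0.1.0 (2026-02-28)

  • Initial release
  • Basic web page scraping
  • JSON/CSV output
  • Batch processing

Planned Features

  • Graphical configuration UI
  • Data visualization
  • Automatic field detection
  • Cloud storage integration
  • Real-time monitoring and alerts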

Developer: VIC ai-company
License: MIT
Support: contact the main agent

Usage Guidance
This skill appears to be what it says: a local Python-based web/API scraper. Before installing or running it:
  1. Inspect the script yourself (it's included) and run it in a sandboxed environment first.
  2. Install dependencies from trusted package indexes (pip), preferably inside a virtualenv.
  3. Avoid passing long-lived secrets on the command line (use environment variables or files with restricted permissions), because shell history can leak tokens.
  4. Be mindful of legality and the target site's terms of service when scraping (robots.txt, rate limits, TOS).
  5. Note the small metadata inconsistencies (version/homepage); ask the publisher for clarification if you need provenance before using it in production.
Capability Analysis
Type: OpenClaw Skill · Name: ecommerce-data-scraper-pro · Version: 1.0.0
The skill bundle provides a legitimate data scraping utility for extracting structured data from websites and APIs. The core logic in `scripts/data-scraper.py` uses standard libraries (requests, BeautifulSoup) to perform its stated functions, including support for batch processing, API authentication, and local file output. No evidence of data exfiltration, malicious execution, or prompt injection was found in the code or documentation.
Capability Assessment
Purpose & Capability
The name/description (web/API data scraper) match the included Python script, README, and SKILL.md instructions. Required capabilities (HTTP requests, HTML parsing, optional Excel/pandas support) align with the declared requirements.txt. Minor metadata inconsistencies exist (registry lists version 1.0.0 while SKILL.md/_meta.json indicate 0.1.0 and SKILL metadata/homepage fields differ), but this appears to be a publishing/versioning mismatch rather than a capability mismatch.
Instruction Scope
Runtime instructions and examples only tell the agent to run the included script, provide URLs or a URLs file, and optionally supply auth/proxy/delay/schedule options. The SKILL.md does not instruct reading system files beyond the specified URLs file or sending scraped data to unexpected external endpoints. It does accept runtime auth tokens and proxy URIs (expected for API access and proxied scraping).
Install Mechanism
No install spec is provided (instruction-only), and the included requirements.txt lists typical Python libraries (requests, beautifulsoup4, pandas, openpyxl). Nothing is downloaded from arbitrary URLs or installed from untrusted hosts. The user will need to pip-install the dependencies themselves.
Credentials
The skill does not request environment variables or credentials in its metadata. It accepts auth and proxy values at runtime via command-line arguments (reasonable for a scraper). No unrelated secrets are requested. Note: passing sensitive tokens on the command line can expose them via shell history — a usage risk, not an inherent incoherence.
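Following the advice to keep tokens off the command line, an API call could read its credential from an environment variable and use an explicit `--auth`-style value only when one is passed. This is a hedged sketch: the variable name `SCRAPER_AUTH_TOKEN` and the helper `build_api_request` are hypothetical, and the script as described reads auth only from command-line arguments. The function builds a `urllib.request.Request` without sending it.

```python
import os
import urllib.request
from typing import Optional

TOKEN_ENV_VAR = "SCRAPER_AUTH_TOKEN"  # hypothetical variable name


def build_api_request(endpoint: str,
                      auth: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request whose Authorization header comes from `auth`
    or, if that is not given, from the environment."""
    # Full header value expected, e.g. "Bearer <token>", mirroring --auth.
    auth = auth or os.environ.get(TOKEN_ENV_VAR)
    headers = {"Accept": "application/json"}
    if auth:
        headers["Authorization"] = auth
    return urllib.request.Request(endpoint, headers=headers)


request = build_api_request("https://api.example.com/data")
# Report only whether a credential is present; never print the token itself.
print(request.full_url, "authenticated:", request.has_header("Authorization"))
```

The environment variable can then be populated from a file with restricted permissions or from a secrets manager, as the usage guidance recommends.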
Persistence & Privilege
The skill is not always-enabled and does not request elevated platform privileges. It contains no install-time hooks that modify other skills or global agent configuration. Autonomous invocation remains allowed (platform default) but is not combined with other red flags.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install ecommerce-data-scraper-pro
  3. After installation, invoke the skill by name or use /ecommerce-data-scraper-pro
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
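Initial release of ecommerce-data-scraper-pro.
  • Web/API data scraping with automatic detection and structured output to JSON, CSV, and Excel
  • Batch processing, multi-page scraping, and configurable request delays
  • Built-in anti-blocking measures (request rate control, proxy support)
  • Customizable scrape fields and data types (products, articles, job listings, real estate, etc.)
  • API scraping and data cleaning
  • Multiple output formats and basic scheduled-task support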
Metadata
Slug: ecommerce-data-scraper-pro
Version: 1.0.0
License: MIT-0
All-time Installs: 0
Active Installs: 0
Total Versions: 1
Frequently Asked Questions

What is E-commerce Data Scraper Pro?

An intelligent data scraping tool that extracts structured data from web pages and APIs, with batch processing support. It is an AI Agent Skill for Claude Code / OpenClaw, with 108 downloads so far.

How do I install E-commerce Data Scraper Pro?

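Run "/install ecommerce-data-scraper-pro" in the OpenClaw or Claude Code chat to install it in one step. The Python dependencies listed in requirements.txt (requests, beautifulsoup4, pandas, openpyxl) still need to be installed separately with pip.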

Is E-commerce Data Scraper Pro free?

Yes, E-commerce Data Scraper Pro is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does E-commerce Data Scraper Pro support?

E-commerce Data Scraper Pro is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created E-commerce Data Scraper Pro?

It is built and maintained by chungvic (@chungvic); the current version is v1.0.0.
