Data Scraper
/install data-scraper
Web Data Scraper: extracts structured data from web pages using curl plus parsing. Lightweight, no browser required. Supports HTML-to-text conversion, table extraction, price monitoring, and batch scraping.
When to Use
- Extract text content from web pages (articles, blogs, docs)
- Scrape product prices, reviews, or listings
- Monitor pages for changes (price drops, new content)
- Batch-collect data from multiple URLs
- Convert HTML tables to structured formats (JSON/CSV)
Quick Start
# Extract readable text from URL
data-scraper fetch "https://example.com/article"
# Extract specific elements
data-scraper extract "https://example.com" --selector "h2, .price"
# Monitor for changes
data-scraper watch "https://example.com/product" --interval 3600
Extraction Modes
Text Mode (default)
Fetches the page and extracts readable content, stripping HTML tags, scripts, and styles, similar to a browser's reader mode.
data-scraper fetch URL
# Output: clean markdown text
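To illustrate the idea behind text mode, here is a minimal Python sketch that strips tags and drops script and style contents using only the standard library. It is not data-scraper's actual implementation; the names TextExtractor and html_to_text are hypothetical, and it emits plain text rather than markdown.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping <script>/<style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML string, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

A real reader-mode extractor would also discard navigation and boilerplate blocks; this sketch only shows the tag-stripping step.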
Selector Mode
Target specific CSS selectors for precise extraction.
data-scraper extract URL --selector ".product-title, .price, .rating"
# Output: matched elements as structured data
Table Mode
Extract HTML tables into structured formats.
data-scraper table URL --index 0
# Output: JSON array of row objects (header → value mapping)
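The header → value mapping can be sketched in Python as follows. This is an illustration under assumptions, not the tool's own code: the names TableParser and table_to_records are hypothetical, the first row is assumed to be the header, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Hypothetical helper: collects each table as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_records(html, index=0):
    """Map each body row of the index-th table to a {header: value} dict."""
    parser = TableParser()
    parser.feed(html)
    rows = parser.tables[index]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]
```

The index argument mirrors the role of --index 0 in the command above: it selects which table on the page to convert.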
Link Mode
Extract all links from a page with optional filtering.
data-scraper links URL --filter "*.pdf"
# Output: filtered list of absolute URLs
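As a rough sketch of what link extraction with a glob filter involves, the Python below resolves every href against the page URL and keeps those whose path matches the pattern. LinkCollector and extract_links are hypothetical names, and matching the glob against the URL path (rather than the full URL) is an assumption about how such a filter might behave.

```python
from fnmatch import fnmatchcase
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: gathers absolute URLs from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url, pattern="*"):
    """Return absolute links whose URL path matches the glob pattern."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [u for u in collector.links if fnmatchcase(urlparse(u).path, pattern)]
```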
Batch Scraping
# Scrape multiple URLs
data-scraper batch urls.txt --output results/
# With rate limiting
data-scraper batch urls.txt --delay 2000 --output results/
urls.txt format:
https://site1.com/page
https://site2.com/page
https://site3.com/page
Change Monitoring
# Watch for changes, alert on diff
data-scraper watch URL --selector ".price" --interval 3600
# Compare with previous snapshot
data-scraper diff URL
Stores timestamped snapshots in data-scraper/snapshots/. Sends alerts via notification-hub when changes are detected.
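The snapshot-and-compare approach could look roughly like the Python sketch below. The file naming scheme (one .txt file per integer timestamp) and the function names are assumptions made for illustration; the tool's real snapshot layout is not documented here.

```python
import difflib
import time
from pathlib import Path


def save_snapshot(snapshot_dir, content, timestamp=None):
    """Write content to <snapshot_dir>/<timestamp>.txt (assumed layout)."""
    snapshot_dir = Path(snapshot_dir)
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    stamp = int(time.time()) if timestamp is None else timestamp
    path = snapshot_dir / f"{stamp}.txt"
    path.write_text(content, encoding="utf-8")
    return path


def latest_snapshot(snapshot_dir):
    """Return the text of the newest snapshot, or None if there is none."""
    files = sorted(Path(snapshot_dir).glob("*.txt"), key=lambda p: int(p.stem))
    return files[-1].read_text(encoding="utf-8") if files else None


def diff_against_latest(snapshot_dir, new_content):
    """Return unified-diff lines versus the newest snapshot ([] if unchanged)."""
    previous = latest_snapshot(snapshot_dir)
    if previous is None or previous == new_content:
        return []
    return list(
        difflib.unified_diff(
            previous.splitlines(),
            new_content.splitlines(),
            "previous",
            "current",
            lineterm="",
        )
    )
```

A watcher would call diff_against_latest on each interval, raise an alert when the result is non-empty, and then save the new content as the next snapshot.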
Output Formats
| Format | Flag | Use Case |
|---|---|---|
| Text | --format text | Reading, summarization |
| JSON | --format json | Data processing |
| CSV | --format csv | Spreadsheets |
| Markdown | --format md | Documentation |
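For readers who post-process results themselves, the Python sketch below shows how a list of row records might be rendered as JSON or CSV. The render function is hypothetical and only illustrates the two data-oriented formats; it is not how data-scraper produces its output.

```python
import csv
import io
import json


def render(records, fmt):
    """Serialize a list of dicts as 'json' or 'csv' text."""
    if fmt == "json":
        return json.dumps(records, indent=2)
    if fmt == "csv":
        buffer = io.StringIO()
        # Column order follows the keys of the first record.
        fieldnames = list(records[0].keys()) if records else []
        writer = csv.DictWriter(buffer, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")
```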
Headers & Auth
# Custom headers
data-scraper fetch URL --header "Authorization: Bearer TOKEN"
# Cookie-based auth
data-scraper fetch URL --cookie "session=abc123"
# User-Agent override
data-scraper fetch URL --ua "Mozilla/5.0..."
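These options correspond to ordinary HTTP request headers. As an illustration only, the Python sketch below builds an equivalent request object with the standard library; build_request and its parameters are hypothetical names, and no request is actually sent.

```python
from urllib.request import Request


def build_request(url, token=None, cookie=None, user_agent=None):
    """Build a Request carrying optional auth, cookie, and User-Agent headers."""
    headers = {}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    if cookie:
        headers["Cookie"] = cookie
    if user_agent:
        headers["User-Agent"] = user_agent
    return Request(url, headers=headers)
```

Passing the resulting object to urllib.request.urlopen would perform the fetch with those headers attached.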
Rate Limiting & Ethics
- Default: 1 request per second per domain
- Respects robots.txt when the --polite flag is set
- Configurable delay between requests
- Stops on 429 (Too Many Requests) and backs off
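The two politeness rules in this list, a minimum gap between requests to the same domain and a robots.txt check, can be sketched in Python as below. DomainThrottle and allowed_by_robots are hypothetical names; the clock and sleep parameters exist only so the behavior can be exercised without real waiting.

```python
import time
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class DomainThrottle:
    """Hypothetical helper: enforces a minimum interval between requests per domain."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = {}  # domain -> time of the most recent request

    def wait(self, url):
        """Sleep if needed before requesting url; return the delay applied."""
        domain = urlparse(url).netloc
        now = self._clock()
        last = self._last.get(domain)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (now - last))
            if delay > 0:
                self._sleep(delay)
        self._last[domain] = now + delay
        return delay


def allowed_by_robots(robots_txt, user_agent, url):
    """Check a URL against already-downloaded robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Calling throttle.wait(url) before every fetch keeps each domain at or below one request per min_interval seconds, while requests to different domains do not delay each other.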
Error Handling
| Error | Behavior |
|---|---|
| 404 | Log and skip |
| 403/401 | Warn about auth requirement |
| 429 | Exponential backoff (max 3 retries) |
| Timeout | Retry once with longer timeout |
| SSL error | Warn, option to proceed with --insecure |
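The 429 row describes exponential backoff with at most 3 retries. A minimal Python sketch of that policy follows; TooManyRequests is a hypothetical exception standing in for an HTTP 429 response, and the 1-second base delay is an assumption, since the source does not state the actual delays.

```python
import time


class TooManyRequests(Exception):
    """Hypothetical exception standing in for an HTTP 429 response."""


def fetch_with_backoff(fetch, url, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on TooManyRequests with doubling delays.

    Makes one initial attempt plus up to max_retries retries, sleeping
    base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except TooManyRequests:
            if attempt == max_retries:
                raise  # retries exhausted; surface the error to the caller
            sleep(base_delay * 2 ** attempt)
```

Any callable that raises TooManyRequests on a 429 can be passed as fetch, which keeps the retry policy separate from the HTTP client in use.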
Integration
- web-claude: Use as fallback when web_fetch isn't enough
- competitor-watch: Feed scraped data into competitor analysis
- seo-audit: Scrape competitor pages for SEO comparison
- performance-tracker: Collect social metrics from public profiles
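Installation and Usage
- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: /install data-scraper
- Once installation finishes, invoke the Skill by name or trigger it with /data-scraper
- Provide the inputs described in the Skill's parameter documentation to get structured output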
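What is Data Scraper?
Data Scraper collects web page data and extracts structured text. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 1427 downloads to date.
How do I install Data Scraper?
Run the command /install data-scraper in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is needed.
Is Data Scraper free?
Yes. Data Scraper is completely free (free and open source) and can be downloaded, installed, and used freely.
Which platforms does Data Scraper support?
Data Scraper is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who develops Data Scraper?
It is developed and maintained by mupengi-bot (@mupengi-bot). The current version is v1.0.0.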