Data Scraper
/install data-scraper
Web Data Scraper — Extract structured data from web pages using curl + parsing. Lightweight, no browser required. Supports HTML-to-text, table extraction, price monitoring, and batch scraping.
When to Use
- Extract text content from web pages (articles, blogs, docs)
- Scrape product prices, reviews, or listings
- Monitor pages for changes (price drops, new content)
- Batch-collect data from multiple URLs
- Convert HTML tables to structured formats (JSON/CSV)
Quick Start
# Extract readable text from URL
data-scraper fetch "https://example.com/article"
# Extract specific elements
data-scraper extract "https://example.com" --selector "h2, .price"
# Monitor for changes
data-scraper watch "https://example.com/product" --interval 3600
Extraction Modes
Text Mode (default)
Fetches page and extracts readable content, stripping HTML tags, scripts, and styles. Similar to reader mode.
data-scraper fetch URL
# Output: clean markdown text
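As a rough illustration of what text mode does, the following Python sketch strips tags, scripts, and styles using only the standard library. It is not the skill's actual implementation (which is described as curl + parsing); the names `TextExtractor` and `html_to_text` are hypothetical, and it returns plain text rather than markdown.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text outside skipped elements, with whitespace collapsed.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(" ".join(data.split()))

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Title</h1><script>var x = 1;</script><p>Hello   world</p>"
    "</body></html>"
)
print(html_to_text(sample))
```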
Selector Mode
Target specific CSS selectors for precise extraction.
data-scraper extract URL --selector ".product-title, .price, .rating"
# Output: matched elements as structured data
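The idea behind selector mode can be sketched in Python as below. This is a deliberately simplified matcher, assuming only bare tag names and single `.class` selectors in a comma-separated list; full CSS selector support would need a dedicated library. `SimpleSelectorExtractor` and `extract` are hypothetical names, and the output shape is an assumption, not the skill's documented format.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not tracked as open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class SimpleSelectorExtractor(HTMLParser):
    def __init__(self, selector):
        super().__init__()
        parts = [p.strip() for p in selector.split(",") if p.strip()]
        self._tags = {p.lower() for p in parts if not p.startswith(".")}
        self._classes = {p[1:] for p in parts if p.startswith(".")}
        self._open = []    # (tag, result index or None) for each open element
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag in self._tags or self._classes.intersection(classes):
            self.results.append({"tag": tag, "classes": classes, "text": ""})
            self._open.append((tag, len(self.results) - 1))
        else:
            self._open.append((tag, None))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag to tolerate unclosed elements.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                del self._open[i:]
                break

    def handle_data(self, data):
        # Text belongs to every matched element that is currently open.
        for _, index in self._open:
            if index is not None:
                self.results[index]["text"] += data


def extract(html, selector):
    parser = SimpleSelectorExtractor(selector)
    parser.feed(html)
    parser.close()
    for item in parser.results:
        item["text"] = " ".join(item["text"].split())
    return parser.results


page = '<div><h2>Widget</h2><span class="price sale">$9.99</span><p>ignored</p></div>'
print(extract(page, "h2, .price"))
```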
Table Mode
Extract HTML tables into structured formats.
data-scraper table URL --index 0
# Output: JSON array of row objects (header → value mapping)
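The header → value mapping can be illustrated with a short Python sketch. It assumes the first row of the table holds the headers and does not handle nested tables or colspan/rowspan; `TableExtractor` and `table_to_records` are hypothetical names, not part of the skill.

```python
import json
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_records(html, index=0):
    """Return the table at `index` as a list of {header: value} dicts."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    rows = parser.tables[index]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


html = (
    "<table><tr><th>Item</th><th>Price</th></tr>"
    "<tr><td>Pen</td><td>1.50</td></tr>"
    "<tr><td>Notebook</td><td>4.00</td></tr></table>"
)
print(json.dumps(table_to_records(html, index=0), indent=2))
```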
Link Mode
Extract all links from a page with optional filtering.
data-scraper links URL --filter "*.pdf"
# Output: filtered list of absolute URLs
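A minimal Python sketch of link extraction with glob-style filtering follows. It assumes the filter is a shell-style pattern matched against the full absolute URL, which is a guess at how `--filter` behaves; `LinkExtractor` and `extract_links` are hypothetical names.

```python
from fnmatch import fnmatchcase
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url, pattern=None):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    if pattern is None:
        return parser.links
    # Assumption: the pattern is matched against the whole absolute URL.
    return [url for url in parser.links if fnmatchcase(url, pattern)]


page = (
    '<a href="/docs/report.pdf">Report</a>'
    '<a href="about.html">About</a>'
    '<a href="https://cdn.example.org/x.pdf">External</a>'
)
print(extract_links(page, "https://example.com/docs/index.html", "*.pdf"))
```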
Batch Scraping
# Scrape multiple URLs
data-scraper batch urls.txt --output results/
# With rate limiting
data-scraper batch urls.txt --delay 2000 --output results/
urls.txt format:
https://site1.com/page
https://site2.com/page
https://site3.com/page
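The batch loop can be sketched in Python as follows. It assumes `--delay` is given in milliseconds (the value 2000 suggests this, but the source does not say so) and that one output file is written per URL; the file naming and the names `read_url_list` and `scrape_batch` are hypothetical. The fetch function is passed in so the sketch stays independent of any particular HTTP client.

```python
import time
from pathlib import Path


def read_url_list(text):
    """Parse a urls.txt body: one URL per line, blank lines ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def scrape_batch(urls, fetch, output_dir, delay_ms=0, sleep=time.sleep):
    """Fetch each URL in order and write the result to output_dir.

    `fetch` is any callable that takes a URL and returns text.
    `delay_ms` is the pause between consecutive requests (assumed milliseconds).
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, url in enumerate(urls):
        if i > 0 and delay_ms > 0:
            sleep(delay_ms / 1000)
        target = out / f"{i:04d}.txt"
        target.write_text(fetch(url), encoding="utf-8")
        written.append(target)
    return written
```

Passing `sleep` as a parameter makes the delay easy to stub out when trying the loop without waiting.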
Change Monitoring
# Watch for changes, alert on diff
data-scraper watch URL --selector ".price" --interval 3600
# Compare with previous snapshot
data-scraper diff URL
Stores snapshots in data-scraper/snapshots/ with timestamps. Alerts via notification-hub when changes detected.
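Snapshot-and-diff monitoring can be sketched in Python like this. The timestamped file naming and the helper names (`save_snapshot`, `load_latest`, `diff_with_latest`) are assumptions for illustration; the source only says that snapshots are stored with timestamps, and the notification-hub alert is left out.

```python
import difflib
from datetime import datetime, timezone
from pathlib import Path


def save_snapshot(snapshot_dir, content, now=None):
    """Write content to a file named after the current UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    directory = Path(snapshot_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / (now.strftime("%Y%m%dT%H%M%S%f") + ".txt")
    path.write_text(content, encoding="utf-8")
    return path


def load_latest(snapshot_dir):
    """Return the newest snapshot's text, or None if there is none yet."""
    # Timestamped names sort chronologically, so the last one is the newest.
    files = sorted(Path(snapshot_dir).glob("*.txt"))
    return files[-1].read_text(encoding="utf-8") if files else None


def diff_with_latest(snapshot_dir, current):
    """Return unified-diff lines between the latest snapshot and `current`.

    An empty list means no previous snapshot exists or nothing changed.
    """
    previous = load_latest(snapshot_dir)
    if previous is None:
        return []
    return list(difflib.unified_diff(
        previous.splitlines(), current.splitlines(),
        fromfile="previous", tofile="current", lineterm="",
    ))
```

A watcher would call `diff_with_latest` on each interval, raise an alert when the list is non-empty, and then save the new content as the next snapshot.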
Output Formats
| Format | Flag | Use Case |
|---|---|---|
| Text | --format text | Reading, summarization |
| JSON | --format json | Data processing |
| CSV | --format csv | Spreadsheets |
| Markdown | --format md | Documentation |
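To make the difference between the formats concrete, here is a Python sketch that renders the same list of records in each one. The `render` function and the exact layout of the text and markdown output are assumptions; the skill's real output may differ.

```python
import csv
import io
import json


def render(records, fmt="json"):
    """Render a list of {column: value} dicts as json, csv, md, or text."""
    if fmt == "json":
        return json.dumps(records, indent=2)
    headers = list(records[0]) if records else []
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=headers, lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()
    if fmt == "md":
        lines = ["| " + " | ".join(headers) + " |", "|" + "---|" * len(headers)]
        lines += ["| " + " | ".join(str(r[h]) for h in headers) + " |" for r in records]
        return "\n".join(lines)
    if fmt == "text":
        return "\n".join(", ".join(f"{h}: {r[h]}" for h in headers) for r in records)
    raise ValueError(f"unknown format: {fmt}")


records = [{"name": "Pen", "price": "1.50"}]
for fmt in ("text", "json", "csv", "md"):
    print(render(records, fmt))
```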
Headers & Auth
# Custom headers
data-scraper fetch URL --header "Authorization: Bearer TOKEN"
# Cookie-based auth
data-scraper fetch URL --cookie "session=abc123"
# User-Agent override
data-scraper fetch URL --ua "Mozilla/5.0..."
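For reference, the same three options map onto ordinary HTTP request headers. The Python sketch below builds a request with the standard library's `urllib.request.Request` without sending it; `build_request` is a hypothetical helper, and the skill itself is described as using curl rather than Python.

```python
from urllib.request import Request


def build_request(url, headers=None, cookie=None, user_agent=None):
    """Build (but do not send) a request carrying custom headers.

    `headers` is a list of "Name: value" strings, mirroring the --header style.
    """
    request = Request(url)
    for line in headers or []:
        name, _, value = line.partition(":")
        request.add_header(name.strip(), value.strip())
    if cookie:
        request.add_header("Cookie", cookie)
    if user_agent:
        request.add_header("User-Agent", user_agent)
    return request


req = build_request(
    "https://example.com/private",
    headers=["Authorization: Bearer TOKEN"],
    cookie="session=abc123",
    user_agent="Mozilla/5.0",
)
print(req.header_items())
```

The request could then be sent with `urllib.request.urlopen(req, timeout=10)`.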
Rate Limiting & Ethics
- Default: 1 request per second per domain
- Respects robots.txt when the --polite flag is set
- Configurable delay between requests
- Stops on 429 (Too Many Requests) and backs off
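The per-domain limit and the robots.txt check above could be sketched in Python as follows. `DomainRateLimiter` and `is_allowed` are hypothetical names; the robots.txt parsing uses the standard library's `urllib.robotparser`, and the one-second interval mirrors the stated default.

```python
import time
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class DomainRateLimiter:
    """Enforce a minimum interval between requests to the same host."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = {}  # host -> time of the most recent request

    def wait(self, url):
        """Sleep if needed before requesting `url`; return the pause in seconds."""
        host = urlsplit(url).netloc
        now = self._clock()
        pause = 0.0
        if host in self._last:
            pause = max(0.0, self.min_interval - (now - self._last[host]))
            if pause:
                self._sleep(pause)
        self._last[host] = now + pause
        return pause


def is_allowed(robots_txt, url, user_agent="*"):
    """Check a URL against the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


rules = "User-agent: *\nDisallow: /private/"
print(is_allowed(rules, "https://example.com/private/report"))
print(is_allowed(rules, "https://example.com/public"))
```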
Error Handling
| Error | Behavior |
|---|---|
| 404 | Log and skip |
| 403/401 | Warn about auth requirement |
| 429 | Exponential backoff (max 3 retries) |
| Timeout | Retry once with longer timeout |
| SSL error | Warn, option to proceed with --insecure |
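The 429 row in the table can be illustrated with a small Python sketch of exponential backoff capped at three retries. The one-second base delay and the doubling factor are assumptions, since the source does not give the schedule, and `fetch_with_backoff` is a hypothetical helper. The fetch callable returns a `(status, body)` pair so the sketch does not depend on a specific HTTP client.

```python
import time


def fetch_with_backoff(fetch, url, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) and retry on HTTP 429 with exponentially growing pauses.

    Returns the last (status, body) pair, which is still a 429 if every
    retry was rate limited.
    """
    for attempt in range(max_retries + 1):
        status, body = fetch(url)
        if status != 429:
            return status, body
        if attempt < max_retries:
            # Pauses of base_delay * 1, 2, 4, ... between attempts.
            sleep(base_delay * 2 ** attempt)
    return status, body
```

With the defaults, a server that keeps answering 429 is tried four times in total (one initial request plus three retries), with pauses of 1, 2, and 4 seconds in between.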
Integration
- web-claude: Use as fallback when web_fetch isn't enough
- competitor-watch: Feed scraped data into competitor analysis
- seo-audit: Scrape competitor pages for SEO comparison
- performance-tracker: Collect social metrics from public profiles
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install data-scraper
- After installation, invoke the skill by name or use /data-scraper
- Provide required inputs per the skill's parameter spec and get structured output
What is Data Scraper?
Data Scraper is an AI Agent Skill for Claude Code / OpenClaw that collects data from web pages and extracts structured text. It has 1427 downloads so far.
How do I install Data Scraper?
Run "/install data-scraper" in the OpenClaw or Claude Code chat to install it in one step; no extra setup is required.
Is Data Scraper free?
Yes, Data Scraper is completely free (open-source). You can download, install and use it at no cost.
Which platforms does Data Scraper support?
Data Scraper is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Data Scraper?
It is built and maintained by mupengi-bot (@mupengi-bot); the current version is v1.0.0.