Web Scraper
by jpengcheng523-netizen · v1.0.0 · MIT-0
200 downloads · 0 stars · 0 active installs · 1 version
Install in OpenClaw
/install jpeng-web-scraper
Description
Web scraping skill with JavaScript rendering support. Extract data from websites using CSS selectors, XPath, or AI-powered extraction.
README (SKILL.md)
Web Scraper
Extract data from websites with support for dynamic content.
When to Use
- User wants to scrape data from a website
- Extract structured data from HTML
- Handle JavaScript-rendered pages
- Crawl multiple pages
Features
- Static pages: Fast HTML parsing
- Dynamic pages: Playwright/Puppeteer rendering
- Selectors: CSS, XPath, regex
- AI extraction: Auto-detect data patterns
Usage
Simple scrape
python3 scripts/scrape.py \
--url "https://example.com/products" \
--selector ".product-name" \
--output ./products.json
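The bundle does not include scripts/scrape.py, so how --selector is actually evaluated cannot be confirmed. As a minimal sketch, assuming the selector is a plain class selector such as ".product-name", the extraction step could look like this using only the standard library. The names ClassTextExtractor and select_by_class are hypothetical and not part of the skill.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.items = []    # text of each finished matching element
        self._depth = 0    # > 0 while inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1          # nested tag inside a match
        elif self.class_name in classes:
            self._depth = 1           # a new match begins
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.items.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def select_by_class(html, selector):
    """Hypothetical helper: handles only simple '.class' selectors."""
    parser = ClassTextExtractor(selector.lstrip("."))
    parser.feed(html)
    parser.close()
    return parser.items
```

Full CSS or XPath support would need a third-party parser; this sketch only illustrates the shape of the static-page path.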
With JavaScript rendering
python3 scripts/scrape.py \
--url "https://spa-example.com/data" \
--render \
--wait 2000 \
--selector ".data-item"
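The README names Playwright/Puppeteer but ships no rendering code, so the following is only a sketch of what --render and --wait might do, assuming Playwright for Python is installed (pip install playwright, then playwright install chromium). The function name render_and_select is hypothetical.

```python
def render_and_select(url, selector, wait_ms=2000):
    """Render a page in headless Chromium and return the text of matches."""
    # Imported lazily so this module still loads where Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url)
            # Fixed pause, mirroring the documented "--wait 2000" flag.
            page.wait_for_timeout(wait_ms)
            return [el.inner_text() for el in page.query_selector_all(selector)]
        finally:
            browser.close()
```

A fixed pause is the simplest reading of --wait; waiting for a specific selector to appear is usually more reliable on slow pages.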
Extract multiple fields
python3 scripts/scrape.py \
--url "https://example.com/listings" \
--fields '{
"title": "h1.title",
"price": ".price",
"description": ".desc"
}'
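Since the script itself is missing, the flag handling can only be guessed at. The sketch below shows one way the documented flags, including the JSON value of --fields, could be parsed with argparse. The helper names fields_spec and build_parser, and the defaults, are assumptions.

```python
import argparse
import json


def fields_spec(raw):
    """Parse --fields: a JSON object mapping field names to selectors."""
    spec = json.loads(raw)  # invalid JSON raises ValueError, which argparse reports
    if not isinstance(spec, dict) or not all(isinstance(v, str) for v in spec.values()):
        raise argparse.ArgumentTypeError(
            "--fields must be a JSON object mapping names to selector strings")
    return spec


def build_parser():
    """Hypothetical parser covering the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(prog="scrape.py")
    parser.add_argument("--url", required=True)
    parser.add_argument("--selector")
    parser.add_argument("--fields", type=fields_spec)
    parser.add_argument("--render", action="store_true")
    parser.add_argument("--wait", type=int, default=0, help="milliseconds")
    parser.add_argument("--output")
    return parser
```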
Crawl multiple pages
python3 scripts/scrape.py \
--url "https://example.com/page/1" \
--crawl 'a[href*="/page/"]' \
--max-pages 10 \
--selector ".item"
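The crawl behavior is likewise undocumented beyond the flags. As a rough sketch, a breadth-first crawl bounded by --max-pages could be structured as below, assuming the --crawl selector a[href*="/page/"] means "anchors whose href contains /page/". The fetch argument is an injected callable, so no network access is implied; all names are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that contain a given substring."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href and self.needle in href:
            self.hrefs.append(href)


def crawl(start_url, fetch, needle="/page/", max_pages=10):
    """Visit at most max_pages URLs breadth-first; return {url: html}."""
    queue = deque([start_url])
    seen = {start_url}          # URLs already queued, to avoid revisits
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        collector = LinkCollector(needle)
        collector.feed(html)
        for href in collector.hrefs:
            absolute = urljoin(url, href)   # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```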
AI-powered extraction
python3 scripts/scrape.py \
--url "https://example.com/article" \
--ai-extract "Extract the title, author, and publication date"
Output
{
"success": true,
"url": "https://example.com/products",
"items": [
{"name": "Product 1", "price": "$99"},
{"name": "Product 2", "price": "$149"}
],
"scraped_at": "2024-01-15T10:30:00Z"
}
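For illustration, a result envelope with the shape shown above could be assembled as follows. This is a sketch under the assumption that scraped_at is a UTC timestamp in the format of the sample; build_result is a hypothetical name.

```python
import json
from datetime import datetime, timezone


def build_result(url, items, now=None):
    """Assemble an output object with the keys shown in the sample."""
    now = now or datetime.now(timezone.utc)
    return {
        "success": True,
        "url": url,
        "items": items,
        # Normalize to UTC and use the trailing "Z" form from the sample.
        "scraped_at": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


if __name__ == "__main__":
    result = build_result("https://example.com/products",
                          [{"name": "Product 1", "price": "$99"}])
    print(json.dumps(result, indent=2))
```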
Rate Limiting
- Default delay: 1 second between requests
- Respects robots.txt
- Customizable user agent
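None of these guarantees can be verified without the script. As a sketch of how they could be enforced, the standard library's urllib.robotparser can check robots.txt rules, and a small gate can space out requests. The class name PoliteGate, the default user agent string, and the injectable clock and sleep parameters are assumptions made for illustration.

```python
import time
from urllib.robotparser import RobotFileParser


class PoliteGate:
    """Check robots.txt rules and enforce a minimum delay between requests."""

    def __init__(self, robots_txt, user_agent="example-scraper", delay=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.robots = RobotFileParser()
        self.robots.parse(robots_txt.splitlines())  # parse already-fetched text
        self.user_agent = user_agent
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last = None   # time of the previous request, if any

    def allowed(self, url):
        return self.robots.can_fetch(self.user_agent, url)

    def wait(self):
        """Block until at least `delay` seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A caller would check allowed(url) and then call wait() before each request.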
Usage Guidance
This skill is incomplete and ambiguous: it documents commands that run scripts/scrape.py and references Playwright/Puppeteer, but the package contains no code, no install instructions, and no trusted source URL. Before installing or enabling it:
1) Ask the publisher for the source code or a real homepage/README and a dependency list (Python version, required pip packages or npm packages, Playwright/browser binaries).
2) Require an explicit install spec or packaged binary from a trusted host (GitHub release, PyPI, npm); do not allow the agent to fetch arbitrary URLs to satisfy missing deps.
3) Verify the scripts/scrape.py file and inspect it for data exfiltration, credential access, or remote callbacks.
4) Run the tool in a sandboxed environment first and avoid providing any site credentials until you confirm necessity.
5) Consider legal/ethical constraints of scraping target sites and ensure the tool honors robots.txt and rate limits.
Given the unknown source and the mismatch between claimed capabilities and the bundle contents, treat this skill as unready for production.
Capability Analysis
Type: OpenClaw Skill
Name: jpeng-web-scraper
Version: 1.0.0
The skill bundle documentation (SKILL.md) and metadata (_meta.json) describe a standard web scraping tool with support for dynamic content rendering and AI-powered extraction. There is no evidence of malicious intent, prompt injection, or unauthorized data access in the provided files, and the described functionality aligns with the tool's stated purpose.
Capability Assessment
Purpose & Capability
The skill claims JavaScript rendering support (Playwright/Puppeteer) and crawling features, but declares no required binaries, no environment variables, and provides no code or install spec. A scraping tool that needs browser automation would normally list Node/Python packages, a browser driver, or an install step. Their absence is out of proportion to the claimed capabilities and makes the bundle incoherent.
Instruction Scope
SKILL.md instructs the agent to run 'python3 scripts/scrape.py' with various flags (rendering, crawling, AI extraction). There is no scripts/scrape.py in the bundle. The instructions therefore point to executing local code that doesn't exist. The doc also implies use of heavy runtime components (Playwright/Puppeteer) but gives no guidance on installing or sandboxing them.
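A mismatch like this can be checked mechanically before installing. The sketch below scans the SKILL.md text for referenced script paths and reports those missing from the bundle directory. The function name missing_scripts and the path pattern are assumptions; it only recognizes Python files under scripts/.

```python
import re
from pathlib import Path

# Matches relative paths such as "scripts/scrape.py" in the documentation.
SCRIPT_REF = re.compile(r"\bscripts/[\w./-]+\.py\b")


def missing_scripts(skill_md_text, bundle_dir):
    """Return referenced script paths that do not exist under bundle_dir."""
    referenced = sorted(set(SCRIPT_REF.findall(skill_md_text)))
    root = Path(bundle_dir)
    return [ref for ref in referenced if not (root / ref).is_file()]
```

For this bundle as described, the check would be expected to report scripts/scrape.py.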
Install Mechanism
There is no install spec. Given the stated features (JS rendering, Playwright/Puppeteer), an installation step is expected (pip/npm installs, browser binaries). The absence of an install mechanism leaves ambiguity about where the code would come from and how dependencies would be provisioned — increasing risk if an agent tries to fetch/install packages at runtime.
Credentials
The skill declares no environment variables or credentials, which is consistent with a simple, local scraper. However, it also omits declaring expected system binaries or package requirements (python, node, playwright browsers). If the scraper needs authentication for target sites, those credentials aren't declared. The lack of declared dependencies is the main proportionality issue.
Persistence & Privilege
The 'always' setting is false and there are no claims of modifying other skills or agent-wide config. Autonomous invocation is allowed (the default), but that alone is not a red flag.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install jpeng-web-scraper
- After installation, invoke the skill by name or use /jpeng-web-scraper
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release of jpeng-web-scraper.
- Supports web scraping for static and JavaScript-rendered pages.
- Flexible data extraction using CSS selectors, XPath, or regex.
- Includes AI-powered extraction for structured information.
- Allows crawling multiple pages with rate limiting and robots.txt respect.
- Provides simple command-line usage examples for various scenarios.
Frequently Asked Questions
What is Web Scraper?
Web scraping skill with JavaScript rendering support. Extract data from websites using CSS selectors, XPath, or AI-powered extraction. It is an AI Agent Skill for Claude Code / OpenClaw, with 200 downloads so far.
How do I install Web Scraper?
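Run "/install jpeng-web-scraper" in the OpenClaw or Claude Code chat to install it. The bundle declares no install spec, so dependencies such as Python packages or Playwright browser binaries may need to be provisioned separately.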
Is Web Scraper free?
Yes, Web Scraper is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Web Scraper support?
Web Scraper is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Web Scraper?
It is built and maintained by jpengcheng523-netizen (@jpengcheng523-netizen); the current version is v1.0.0.