
Web Scraper

Name: Web Scraper
Author: jpengcheng523-netizen

by jpengcheng523-netizen · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Downloads: 200

Install in OpenClaw

/install jpeng-web-scraper

Description

Web scraping skill with JavaScript rendering support. Extract data from websites using CSS selectors, XPath, or AI-powered extraction.

README (SKILL.md)

Web Scraper

Extract data from websites with support for dynamic content.

When to Use

The user wants to scrape data from a website
The user needs structured data extracted from HTML
The target page is rendered with JavaScript
The task involves crawling multiple pages

Features

Static pages: Fast HTML parsing
Dynamic pages: Playwright/Puppeteer rendering
Selectors: CSS, XPath, regex
AI extraction: Auto-detect data patterns

Usage

Simple scrape

python3 scripts/scrape.py \
  --url "https://example.com/products" \
  --selector ".product-name" \
  --output ./products.json
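
The bundle ships no scripts/scrape.py, so how --selector is actually evaluated is unknown. As a rough illustration only, a single class selector such as .product-name could be matched with Python's standard library. The names ClassTextExtractor and extract_by_class are hypothetical, and a real scraper would more likely rely on a full CSS selector engine.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element carrying a given class (hypothetical helper)."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.items = []     # text of each matched element
        self._depth = 0     # nesting depth inside the current match (0 = outside)
        self._chunks = []   # text fragments of the current match

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            # Already inside a match: just track nesting.
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.items.append("".join(self._chunks).strip())

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_by_class(html, class_name):
    """Return the text of each element whose class list contains class_name."""
    parser = ClassTextExtractor(class_name)
    parser.feed(html)
    parser.close()
    return parser.items
```

This sketch assumes well-formed markup; unbalanced tags would throw off the depth counter.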

With JavaScript rendering

python3 scripts/scrape.py \
  --url "https://spa-example.com/data" \
  --render \
  --wait 2000 \
  --selector ".data-item"

Extract multiple fields

python3 scripts/scrape.py \
  --url "https://example.com/listings" \
  --fields '{
    "title": "h1.title",
    "price": ".price",
    "description": ".desc"
  }'

Crawl multiple pages

python3 scripts/scrape.py \
  --url "https://example.com/page/1" \
  --crawl 'a[href*="/page/"]' \
  --max-pages 10 \
  --selector ".item"
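
Since no implementation is included, the behavior of --crawl and --max-pages can only be guessed at. One plausible reading is a bounded breadth-first crawl, sketched below. The crawl function, the injected fetch callable, and the substring test standing in for the a[href*="/page/"] selector are all assumptions of this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Gather href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawl(start_url, fetch, href_contains, max_pages):
    """Visit at most max_pages URLs breadth-first, following matching links.

    fetch is any callable mapping a URL to an HTML string (an assumption of
    this sketch, so it can be exercised without network access).
    """
    queue = deque([start_url])
    seen = {start_url}          # URLs already queued, to avoid revisits
    visited = []                # URLs actually fetched, in order
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        visited.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            absolute = urljoin(url, href)   # resolve relative links
            if href_contains in absolute and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

Passing fetch in as a parameter keeps the traversal logic separate from networking, which also makes it easy to run against canned pages in a sandbox.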

AI-powered extraction

python3 scripts/scrape.py \
  --url "https://example.com/article" \
  --ai-extract "Extract the title, author, and publication date"
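
The usage examples above imply a command-line interface, but the script that would define it is absent from the bundle. The sketch below shows how those documented flags could be declared with argparse. The defaults, the JSON parsing of --fields, and the reading of --wait as milliseconds are assumptions, not confirmed behavior.

```python
import argparse
import json


def build_parser():
    """Declare the flags that appear in the documented usage examples."""
    parser = argparse.ArgumentParser(prog="scrape.py")
    parser.add_argument("--url", required=True, help="page to scrape")
    parser.add_argument("--selector", help="selector for the items to extract")
    # Assumed: --fields carries a JSON object mapping field names to selectors.
    parser.add_argument("--fields", type=json.loads)
    parser.add_argument("--render", action="store_true",
                        help="render JavaScript before extracting")
    # Assumed unit: milliseconds (the documentation does not say).
    parser.add_argument("--wait", type=int, default=0)
    parser.add_argument("--crawl", help="selector for links to follow")
    parser.add_argument("--max-pages", type=int, default=1)
    parser.add_argument("--ai-extract", help="natural-language extraction prompt")
    parser.add_argument("--output", help="path of the JSON file to write")
    return parser
```

Using json.loads as the argparse type means a malformed --fields value is rejected at parse time with a usage error rather than failing later.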

Output

{
  "success": true,
  "url": "https://example.com/products",
  "items": [
    {"name": "Product 1", "price": "$99"},
    {"name": "Product 2", "price": "$149"}
  ],
  "scraped_at": "2024-01-15T10:30:00Z"
}
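
For readers consuming this output, the envelope above can be reproduced with a few lines of Python. This is a sketch of the documented shape only; build_result and dump_result are hypothetical names, and the real script may add or omit fields.

```python
import json
from datetime import datetime, timezone


def build_result(url, items, scraped_at=None):
    """Wrap scraped items in the envelope shown in the sample output."""
    when = scraped_at or datetime.now(timezone.utc)
    return {
        "success": True,
        "url": url,
        "items": items,
        # ISO 8601 in UTC with a trailing "Z", matching the sample timestamp.
        "scraped_at": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def dump_result(result):
    """Serialize the envelope as indented JSON text."""
    return json.dumps(result, indent=2)
```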

Rate Limiting

Default delay: 1 second between requests
Respects robots.txt
Customizable user agent
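
None of these behaviors can be verified without the source, but the three claims map onto standard-library pieces. The following sketch shows one way a scraper could combine a minimum delay, a robots.txt check, and a configurable user agent. USER_AGENT, RateLimiter, load_robots, and allowed_urls are hypothetical names introduced for illustration.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-scraper/0.1"  # hypothetical user agent string


def load_robots(robots_txt):
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class RateLimiter:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, delay_seconds=1.0, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay_seconds
        self._clock = clock     # injectable so the limiter can be tested
        self._sleep = sleep
        self._last = None       # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def allowed_urls(urls, robots, limiter, user_agent=USER_AGENT):
    """Yield only the URLs robots.txt permits, pausing between them."""
    for url in urls:
        if not robots.can_fetch(user_agent, url):
            continue            # disallowed paths are skipped, not fetched
        limiter.wait()
        yield url
```

When auditing the real script, this is the behavior to look for: the robots.txt check happens before any request is made, and the delay applies to every request rather than only to crawled pages.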

Usage Guidance

This skill is incomplete and ambiguous: it documents commands that run scripts/scrape.py and references Playwright/Puppeteer, but the package contains no code, no install instructions, and no trusted source URL. Before installing or enabling it:

1) Ask the publisher for the source code or a real homepage/README and a dependency list (Python version, required pip or npm packages, Playwright/browser binaries).
2) Require an explicit install spec or packaged binary from a trusted host (GitHub release, PyPI, npm); do not allow the agent to fetch arbitrary URLs to satisfy missing dependencies.
3) Verify the scripts/scrape.py file and inspect it for data exfiltration, credential access, or remote callbacks.
4) Run the tool in a sandboxed environment first and avoid providing any site credentials until you confirm they are necessary.
5) Consider the legal and ethical constraints of scraping target sites, and ensure the tool honors robots.txt and rate limits.

Given the unknown source and the mismatch between claimed capabilities and the bundle contents, treat this skill as unready for production.

Capability Analysis

Type: OpenClaw Skill
Name: jpeng-web-scraper
Version: 1.0.0

The skill bundle documentation (SKILL.md) and metadata (_meta.json) describe a standard web scraping tool with support for dynamic content rendering and AI-powered extraction. There is no evidence of malicious intent, prompt injection, or unauthorized data access in the provided files, and the described functionality aligns with the tool's stated purpose.

Capability Assessment

⚠ Purpose & Capability

The skill claims JavaScript rendering support (Playwright/Puppeteer) and crawling features, but declares no required binaries, no environment variables, and provides no code or install spec. A scraping tool that needs browser automation would normally list Node/Python packages, a browser driver, or an install step. Those are missing, so the declared requirements do not match the claimed capabilities.

⚠ Instruction Scope

SKILL.md instructs the agent to run 'python3 scripts/scrape.py' with various flags (rendering, crawling, AI extraction). There is no scripts/scrape.py in the bundle. The instructions therefore point to executing local code that doesn't exist. The doc also implies use of heavy runtime components (Playwright/Puppeteer) but gives no guidance on installing or sandboxing them.

⚠ Install Mechanism

There is no install spec. Given the stated features (JS rendering, Playwright/Puppeteer), an installation step is expected (pip/npm installs, browser binaries). The absence of an install mechanism leaves ambiguity about where the code would come from and how dependencies would be provisioned — increasing risk if an agent tries to fetch/install packages at runtime.
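
Until an install spec exists, a pre-flight check can at least confirm whether the pieces the documentation relies on are present before anything is run. The sketch below is one way to do that with the standard library. The preflight function is hypothetical, and the module and binary names passed to it are guesses at what such a scraper would need, not a declared dependency list.

```python
import importlib.util
import shutil
from pathlib import Path


def preflight(script_path, modules, binaries):
    """Return a list of human-readable problems; an empty list means ready.

    modules must be top-level module names (dotted names would trigger an
    import of the parent package).
    """
    problems = []
    if not Path(script_path).is_file():
        problems.append(f"missing script: {script_path}")
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing Python module: {name}")
    for name in binaries:
        if shutil.which(name) is None:
            problems.append(f"missing binary on PATH: {name}")
    return problems
```

A call such as preflight("scripts/scrape.py", ["playwright"], ["python3"]) reports what is absent without downloading or installing anything, which avoids the runtime fetch risk described above.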

ℹ Credentials

The skill declares no environment variables or credentials, which is consistent with a simple, local scraper. However, it also omits declaring expected system binaries or package requirements (python, node, playwright browsers). If the scraper needs authentication for target sites, those credentials aren't declared. The lack of declared dependencies is the main proportionality issue.

✓ Persistence & Privilege

The always setting is false, and there are no claims of modifying other skills or agent-wide config. Autonomous invocation is allowed (the default), but that alone is not a red flag.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install jpeng-web-scraper
After installation, invoke the skill by name or use /jpeng-web-scraper
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release of jpeng-web-scraper.

Supports web scraping for static and JavaScript-rendered pages.
Flexible data extraction using CSS selectors, XPath, or regex.
Includes AI-powered extraction for structured information.
Allows crawling multiple pages with rate limiting and robots.txt respect.
Provides simple command-line usage examples for various scenarios.

Metadata

Slug jpeng-web-scraper

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Web Scraper?

Web scraping skill with JavaScript rendering support. Extract data from websites using CSS selectors, XPath, or AI-powered extraction. It is an AI Agent Skill for Claude Code / OpenClaw, with 200 downloads so far.

How do I install Web Scraper?

Run "/install jpeng-web-scraper" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Web Scraper free?

Yes, Web Scraper is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Web Scraper support?

Web Scraper is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Web Scraper?

It is built and maintained by jpengcheng523-netizen (@jpengcheng523-netizen); the current version is v1.0.0.
