
Web Scraper

by Guilherme Favaron · GitHub ↗ · v0.1.1 · MIT-0
cross-platform ⚠ suspicious
Downloads: 8181
Stars: 3
Active Installs: 125
Versions: 2
Install in OpenClaw
/install web-scraper
Description
Web scraping and content comprehension agent — multi-strategy extraction with cascade fallback, news detection, boilerplate removal, structured metadata, and...
Usage Guidance
Install only if you will use it on sites and content you are authorized to scrape. Review generated scripts before running them, avoid the soft-paywall reveal workflow, and do not use optional OpenRouter entity extraction on confidential or access-controlled content unless that external data flow is acceptable.
Capability Analysis
Type: OpenClaw Skill
Name: web-scraper
Version: 0.1.1
The 'web-scraper' skill is a well-structured and documented tool for multi-stage web content extraction and analysis. It follows industry best practices such as rate limiting, robots.txt compliance, and a cascade approach to resource usage. It also explicitly instructs the agent to avoid sensitive files like .env and to handle API keys securely via environment variables (SKILL.md, claw.json).
Capability Assessment
Purpose & Capability
Network scraping, local output files, Playwright rendering, and optional LLM entity extraction fit the stated web-scraper purpose, but the soft-paywall workflow goes beyond normal public-content extraction by directing DOM manipulation to expose hidden subscriber-gated material.
Instruction Scope
The skill is generally scoped with planning, rate limiting, robots.txt guidance, and credential-file avoidance, but its paywall instructions are under-scoped because they permit revealing hidden paywalled text without a clear authorization requirement.
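The robots.txt and rate-limiting guidance mentioned above could look roughly like the following in a generated script. This is a minimal sketch using only the Python standard library, not the skill's actual template: the helper names (`build_robots_parser`, `RateLimiter`), the `example-bot` user agent, and the sample robots.txt are all hypothetical.

```python
import time
from urllib.robotparser import RobotFileParser


def build_robots_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class RateLimiter:
    """Enforce a minimum delay between consecutive requests (hypothetical helper)."""

    def __init__(self, min_interval_s: float) -> None:
        self.min_interval_s = min_interval_s
        self._last = None  # monotonic timestamp of the previous request

    def wait(self) -> float:
        """Sleep if the previous request was too recent; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval_s - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last = time.monotonic()
        return slept


# Assumed sample rules; a real script would fetch the target site's robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

parser = build_robots_parser(SAMPLE_ROBOTS)
allowed = parser.can_fetch("example-bot", "https://example.com/news/story.html")
blocked = parser.can_fetch("example-bot", "https://example.com/private/page.html")
```

Before each request, a script would call `parser.can_fetch(...)` and skip disallowed URLs, then call `RateLimiter.wait()` so that requests to the same host stay at least `min_interval_s` apart.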
Install Mechanism
The package contains markdown instructions, a changelog, and claw.json metadata only; no executable installer, hidden script, or automatic startup mechanism was found.
Credentials
Filesystem and network permissions, Python/pip/npx requirements, and generated scraping scripts are proportionate for this purpose; optional Stage 5 sends cleaned article text to OpenRouter when used.
Persistence & Privilege
The skill creates scripts, YAML configs, JSON outputs, and checkpoints, and it says not to read or modify .env or credential files; no background persistence or privilege escalation is evident.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install web-scraper
  3. After installation, invoke the skill by name or use /web-scraper
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.1.1
web-scraper 0.1.1
  - Added a CHANGELOG.md file for better tracking of updates.
  - Clarified credential handling: the skill itself never makes direct API calls, and only generated scripts reference `OPENROUTER_API_KEY` if LLM extraction is used.
  - Updated environment survey instructions to check for presence of `OPENROUTER_API_KEY` (for generated code), not the value.
  - Improved documentation for credential scope and clarified planning protocol steps.
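Checking for the presence of `OPENROUTER_API_KEY` rather than its value can be sketched as below. The function name `has_env_var` and its optional `environ` parameter are illustrative assumptions, not part of the skill; the point is only that the check returns a boolean and never exposes the secret.

```python
import os
from typing import Mapping, Optional


def has_env_var(name: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Report whether a variable is set and non-empty, without returning its value."""
    env = os.environ if environ is None else environ
    return bool(env.get(name))


# Only the presence flag is reported; the key itself is never printed or stored.
status = "set" if has_env_var("OPENROUTER_API_KEY") else "not set"
print(f"OPENROUTER_API_KEY is {status}")
```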
v0.1.0
web-scraper 0.1.0: Initial release
  - Introduces a web scraping and content comprehension agent with multi-strategy extraction and cascade fallback.
  - Implements a mandatory planning protocol for all scraping requests, emphasizing environment and target analysis before action.
  - Features a 5-stage pipeline: News/Article detection, multi-strategy extraction, cleaning/normalization, structured metadata extraction, and optional entity extraction using LLMs.
  - Ensures safe credential handling by referencing `OPENROUTER_API_KEY` only in template code.
  - Outputs structured JSON files with detailed content and quality metadata.
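The listing does not say how the cascade fallback is implemented. One plausible reading is that extraction strategies are tried in order, cheapest first, until one yields enough text. The sketch below illustrates that idea under stated assumptions: the strategy names, the `min_chars` threshold, and the toy strategies are hypothetical stand-ins, not the skill's real extractors.

```python
import re
from typing import Callable, Dict, Optional, Sequence, Tuple

# A strategy is a (name, function) pair; the function returns text or None.
Strategy = Tuple[str, Callable[[str], Optional[str]]]


def extract_with_cascade(
    html: str, strategies: Sequence[Strategy], min_chars: int = 200
) -> Optional[Dict[str, str]]:
    """Try each strategy in order; return the first result with enough text."""
    for name, strategy in strategies:
        try:
            text = strategy(html)
        except Exception:
            continue  # a failing strategy falls through to the next one
        cleaned = " ".join((text or "").split())  # collapse whitespace
        if len(cleaned) >= min_chars:
            return {"strategy": name, "text": cleaned}
    return None  # every strategy failed or returned too little text


# Toy strategies for illustration only (assumptions, not the skill's extractors).
def failing_strategy(html: str) -> Optional[str]:
    raise ValueError("no article container found")


def short_strategy(html: str) -> Optional[str]:
    return "Subscribe"


def tag_strip_strategy(html: str) -> Optional[str]:
    return re.sub(r"<[^>]+>", " ", html)


STRATEGIES = [
    ("article_container", failing_strategy),
    ("headline_only", short_strategy),
    ("tag_strip", tag_strip_strategy),
]

sample_html = "<html><body><p>" + "Sentence of article text. " * 20 + "</p></body></html>"
result = extract_with_cascade(sample_html, STRATEGIES)
```

Recording which strategy succeeded alongside the text is one way to feed the quality metadata that the JSON output is said to include.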
Metadata
Slug web-scraper
Version 0.1.1
License MIT-0
All-time Installs 287
Active Installs 125
Total Versions 2
Frequently Asked Questions

What is Web Scraper?

Web scraping and content comprehension agent — multi-strategy extraction with cascade fallback, news detection, boilerplate removal, structured metadata, and... It is an AI Agent Skill for Claude Code / OpenClaw, with 8181 downloads so far.

How do I install Web Scraper?

Run "/install web-scraper" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Web Scraper free?

Yes, Web Scraper is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Web Scraper support?

Web Scraper is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Web Scraper?

It is built and maintained by Guilherme Favaron (@guifav); the current version is v0.1.1.
