Web Scraper

Name: Web Scraper
Author: guifav

by Guilherme Favaron · GitHub ↗ · v0.1.1 · MIT-0

cross-platform ⚠ suspicious

Downloads 8181

Active Installs 125

Install in OpenClaw

/install web-scraper

Description

Web scraping and content comprehension agent — multi-strategy extraction with cascade fallback, news detection, boilerplate removal, structured metadata, and...

Usage Guidance

Install only if you will use it on sites and content you are authorized to scrape. Review generated scripts before running them, avoid the soft-paywall reveal workflow, and do not use optional OpenRouter entity extraction on confidential or access-controlled content unless that external data flow is acceptable.

Capability Analysis

Type: OpenClaw Skill

Name: web-scraper

Version: 0.1.1

The 'web-scraper' skill is a well-structured and documented tool for multi-stage web content extraction and analysis. It follows industry best practices such as rate limiting, robots.txt compliance, and a cascade approach to resource usage. It also explicitly instructs the agent to avoid sensitive files like .env and to handle API keys securely via environment variables (SKILL.md, claw.json).
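The robots.txt compliance and rate limiting mentioned above can be illustrated with a minimal sketch. This is not the skill's own code: the sample robots.txt text, the `build_parser` helper, the `RateLimiter` class, and the `demo-bot` user agent are all hypothetical names chosen for illustration. Only `urllib.robotparser` from the Python standard library is assumed.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text into a standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


class RateLimiter:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> float:
        """Sleep if the previous request was too recent; return the delay used."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    # Fall back to one second when the site declares no Crawl-delay (an assumption).
    limiter = RateLimiter(parser.crawl_delay("demo-bot") or 1.0)
    for url in ("https://example.com/articles/1", "https://example.com/private/x"):
        if parser.can_fetch("demo-bot", url):
            limiter.wait()
            print("would fetch", url)
        else:
            print("skipping (disallowed by robots.txt)", url)
```

A generated scraping script would check `can_fetch` before every request and call `wait()` immediately before fetching, so disallowed URLs never reach the network.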

Capability Assessment

⚠ Purpose & Capability

Network scraping, local output files, Playwright rendering, and optional LLM entity extraction fit the stated web-scraper purpose, but the soft-paywall workflow goes beyond normal public-content extraction by directing DOM manipulation to expose hidden subscriber-gated material.

⚠ Instruction Scope

The skill is generally scoped with planning, rate limiting, robots.txt guidance, and credential-file avoidance, but its paywall instructions are under-scoped because they permit revealing hidden paywalled text without a clear authorization requirement.

✓ Install Mechanism

The package contains markdown instructions, a changelog, and claw.json metadata only; no executable installer, hidden script, or automatic startup mechanism was found.

ℹ Credentials

Filesystem and network permissions, Python/pip/npx requirements, and generated scraping scripts are proportionate for this purpose; optional Stage 5 sends cleaned article text to OpenRouter when used.

ℹ Persistence & Privilege

The skill creates scripts, YAML configs, JSON outputs, and checkpoints, and it says not to read or modify .env or credential files; no background persistence or privilege escalation is evident.

How to Use

1. Make sure OpenClaw is installed (local or Docker).
2. Run the install command in chat: /install web-scraper
3. After installation, invoke the skill by name or use /web-scraper
4. Provide the required inputs per the skill's parameter spec to get structured output.

Version History

v0.1.1

web-scraper 0.1.1
- Added a CHANGELOG.md file for better tracking of updates.
- Clarified credential handling: the skill itself never makes direct API calls, and only generated scripts reference `OPENROUTER_API_KEY` if LLM extraction is used.
- Updated environment survey instructions to check for presence of `OPENROUTER_API_KEY` (for generated code), not the value.
- Improved documentation for credential scope and clarified planning protocol steps.
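As an illustration of the presence-only check described in this release, the sketch below reports whether `OPENROUTER_API_KEY` is set without ever reading or printing its value. The function names `has_openrouter_key` and `survey_environment` are hypothetical and do not come from the skill.

```python
import os
from typing import Mapping


def has_openrouter_key(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True if the variable is defined; the value itself is never read."""
    return "OPENROUTER_API_KEY" in environ


def survey_environment(environ: Mapping[str, str] = os.environ) -> dict:
    """Build a report that is safe to log because it contains no secret."""
    status = "set" if has_openrouter_key(environ) else "missing"
    return {"OPENROUTER_API_KEY": status}


if __name__ == "__main__":
    print(survey_environment())
```

Passing the mapping as a parameter keeps the check easy to test with a plain dictionary instead of the real process environment.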

v0.1.0

web-scraper 0.1.0: Initial release
- Introduces a web scraping and content comprehension agent with multi-strategy extraction and cascade fallback.
- Implements a mandatory planning protocol for all scraping requests, emphasizing environment and target analysis before action.
- Features a 5-stage pipeline: News/Article detection, multi-strategy extraction, cleaning/normalization, structured metadata extraction, and optional entity extraction using LLMs.
- Ensures safe credential handling by referencing `OPENROUTER_API_KEY` only in template code.
- Outputs structured JSON files with detailed content and quality metadata.
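The "multi-strategy extraction and cascade fallback" idea from this release can be sketched as a loop that tries strategies in order and stops at the first one that yields enough text. The listing does not document the skill's actual strategies or output schema, so everything here is an assumption for illustration: the two regex-based strategies, the `min_chars` threshold, and the result fields `content`, `strategy`, and `attempts`.

```python
import json
import re
from typing import Callable, List, Optional, Tuple

Strategy = Callable[[str], Optional[str]]


def article_tag_strategy(html: str) -> Optional[str]:
    """Hypothetical first strategy: take text inside an <article> element."""
    match = re.search(r"<article[^>]*>(.*?)</article>", html, re.S | re.I)
    return re.sub(r"<[^>]+>", " ", match.group(1)) if match else None


def strip_all_tags_strategy(html: str) -> Optional[str]:
    """Hypothetical last-resort strategy: strip every tag from the page."""
    return re.sub(r"<[^>]+>", " ", html)


STRATEGIES: List[Tuple[str, Strategy]] = [
    ("article_tag", article_tag_strategy),
    ("strip_all_tags", strip_all_tags_strategy),
]


def extract_with_cascade(html: str,
                         strategies: List[Tuple[str, Strategy]],
                         min_chars: int = 200) -> dict:
    """Try each strategy in order; return the first result with enough text."""
    attempts = []
    for name, strategy in strategies:
        try:
            text = strategy(html)
        except Exception as exc:  # a failing strategy should not end the cascade
            attempts.append({"strategy": name, "ok": False, "error": str(exc)})
            continue
        cleaned = (text or "").strip()
        if len(cleaned) >= min_chars:
            attempts.append({"strategy": name, "ok": True})
            return {"content": cleaned, "strategy": name, "attempts": attempts}
        attempts.append({"strategy": name, "ok": False, "error": "too little text"})
    return {"content": None, "strategy": None, "attempts": attempts}


if __name__ == "__main__":
    page = "<div><p>" + "word " * 50 + "</p></div>"
    result = extract_with_cascade(page, STRATEGIES, min_chars=100)
    print(json.dumps(result, indent=2))
```

Recording every attempt alongside the winning strategy gives the JSON output a simple quality trail: a reader can see which methods failed and why before the fallback succeeded.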

Metadata

Slug web-scraper

Version 0.1.1

License MIT-0

All-time Installs 287

Active Installs 125

Total Versions 2

Frequently Asked Questions

What is Web Scraper?

Web scraping and content comprehension agent — multi-strategy extraction with cascade fallback, news detection, boilerplate removal, structured metadata, and... It is an AI Agent Skill for Claude Code / OpenClaw, with 8181 downloads so far.

How do I install Web Scraper?

Run "/install web-scraper" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Web Scraper free?

Yes, Web Scraper is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Web Scraper support?

Web Scraper is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Web Scraper?

It is built and maintained by Guilherme Favaron (@guifav); the current version is v0.1.1.
