
Scrapling Web Extractor


by yumiu8103-hue · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ✓ Security Clean

Downloads 462

Install in OpenClaw

/install web-markdown-scraper

Description

Fetch one or more public webpages with Scrapling, extract the main content, and convert HTML into Markdown using html2text. Supports static HTTP, concurrent...

Usage Guidance

This skill appears internally consistent, but check the following before installing:
1) The script dynamically imports the external 'scrapling' package and relies on Playwright; audit those packages, or confirm you trust them, before installing.
2) Stealth mode and proxies have legitimate uses for reaching pages behind anti-bot protection, but you must not use the tool to bypass login walls, CAPTCHAs, or paywalls, or to access restricted content (the SKILL.md states this).
3) Installing Playwright downloads a Chromium binary; make sure you accept that download.
4) Proxy credentials passed at runtime are used to route requests; keep them secure and do not route traffic through proxies you don't trust.
5) The tool writes Markdown files and an automatch DB to the output directory; review and manage those local files as needed.

Capability Analysis

Type: OpenClaw Skill
Name: web-markdown-scraper
Version: 1.0.0

The skill is a legitimate web-to-Markdown scraper built on the 'scrapling' and 'html2text' libraries. The core script (scripts/scrape_to_markdown.py) is well-structured, validates URLs, and sanitizes filenames to prevent path traversal. There is a significant discrepancy between the advanced features described in the documentation (SKILL.md and README.md), such as stealth mode, proxy support, and anti-bot bypass, and the actual implementation in the Python script, but this appears to be a functional bug or incomplete implementation rather than a security threat. There is no evidence of data exfiltration, malicious execution, or prompt injection attacks against the agent.
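The two safeguards mentioned above, URL validation and filename sanitization, can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/scrape_to_markdown.py: the function names (is_valid_public_url, safe_filename, output_path), the character whitelist, and the 150-character cap are all assumptions made for the example.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def is_valid_public_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that carry a hostname (hypothetical helper)."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.hostname)


def safe_filename(url: str, suffix: str = ".md") -> str:
    """Derive a flat filename from a URL, with no path separators (hypothetical helper)."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}".strip("/") or "index"
    # Replace everything outside a conservative whitelist, including "/" and "\".
    name = re.sub(r"[^A-Za-z0-9._-]+", "_", raw)
    # Drop leading dots and underscores so hidden or dot-only names cannot occur.
    name = name.lstrip("._") or "index"
    return name[:150] + suffix


def output_path(out_dir: Path, url: str) -> Path:
    """Resolve the target path and refuse anything that lands outside out_dir."""
    root = out_dir.resolve()
    target = (root / safe_filename(url)).resolve()
    if root not in target.parents:
        raise ValueError(f"refusing to write outside {root}")
    return target
```

With this approach a URL such as https://example.com/../../etc/passwd maps to a single flat file inside the output directory, because every slash is replaced before the path is joined.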

Capability Assessment

✓ Purpose & Capability

Name, description, README, SKILL.md and the included Python script all align: they implement fetching public web pages (static or JS), extracting main content and converting HTML to Markdown. Features like stealth, proxies, Playwright, and automatch are legitimate for robust scraping and are consistent with the stated purpose.

ℹ Instruction Scope

SKILL.md and the script limit network calls to user-supplied URLs and an optional proxy. The skill provides flags to enable stealth, proxying, and Playwright rendering; these are powerful but described and constrained (rules state not to bypass logins/paywalls). The code dynamically imports the 'scrapling' package at runtime, so actual fetching behavior depends on that external dependency.
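Because the fetching behavior depends on whatever 'scrapling' package is importable at runtime, it can help to check what would actually be loaded before running the skill. The sketch below uses only the standard library's importlib; the helper names (load_optional, describe_dependency) are hypothetical and are not part of the skill.

```python
import importlib
from types import ModuleType
from typing import Optional


def load_optional(module_name: str) -> Optional[ModuleType]:
    """Import a module by name at runtime; return None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def describe_dependency(module_name: str) -> str:
    """Report whether a dependency is present and where it was loaded from."""
    module = load_optional(module_name)
    if module is None:
        return f"{module_name}: not installed"
    # Not every package exposes __version__, so fall back to "unknown".
    version = getattr(module, "__version__", "unknown")
    location = getattr(module, "__file__", "built-in")
    return f"{module_name}: version {version}, loaded from {location}"


if __name__ == "__main__":
    for dep in ("scrapling", "html2text", "playwright"):
        print(describe_dependency(dep))
```

Running it prints one line per dependency, which makes it easy to confirm that each package resolves to the environment you expect.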

✓ Install Mechanism

No install spec is included (instruction-only); the README suggests installing third-party Python packages (scrapling, html2text, Playwright). That is a normal, low-risk pattern for an instruction-only Python skill, but it does mean the fetched packages and Playwright binaries will be installed separately by the user.

✓ Credentials

The skill declares no required environment variables or credentials. Proxy credentials can be supplied as runtime flags (appropriate for a scraper). The script's security manifest claims it reads only user-provided URL/file inputs and writes only to the chosen output directory and the Scrapling-managed local DB—no unexpected secrets are requested.

ℹ Persistence & Privilege

always is false and the skill is user-invocable. It writes local output files and (per its manifest) a Scrapling automatch SQLite DB; this is reasonable for its functionality but does create persistent local artifacts that a user should be aware of.

How to Use

1) Make sure OpenClaw is installed (local or Docker).
2) Run the install command in chat: /install web-markdown-scraper
3) After installation, invoke the skill by name or use /web-markdown-scraper
4) Provide the required inputs per the skill's parameter spec and get structured output.

Version History

v1.0.0

Initial release.
- 4 fetcher modes: http, async, stealth (Camoufox), dynamic (Playwright)
- CSS selector-based content extraction with auto_save / auto_match
- Proxy support with humanize, geoip, block-webrtc options
- --disable-resources and --block-images for faster scraping
- --retry N with exponential backoff
- Structured JSON output with per-page title, markdown, and status
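The "--retry N with exponential backoff" and "structured JSON output" items could work roughly like the sketch below. This is an assumption about the general technique, not the skill's actual code: the function name fetch_with_retry, the 1-second base delay, the doubling factor, and the result keys other than markdown and status are all hypothetical.

```python
import json
import time
from typing import Callable, Dict


def fetch_with_retry(
    fetch: Callable[[str], str],
    url: str,
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, object]:
    """Call fetch(url), retrying up to `retries` times with doubling delays."""
    last_error = ""
    for attempt in range(retries + 1):
        try:
            markdown = fetch(url)
            return {"url": url, "status": "ok", "attempts": attempt + 1, "markdown": markdown}
        except Exception as exc:  # a real script would catch narrower error types
            last_error = str(exc)
            if attempt < retries:
                # Wait base_delay, then 2x, 4x, ... before the next attempt.
                sleep(base_delay * (2 ** attempt))
    return {"url": url, "status": "error", "attempts": retries + 1, "error": last_error}


if __name__ == "__main__":
    def always_fails(url: str) -> str:
        raise ConnectionError("simulated network failure")

    # Pass a no-op sleep so the demo finishes immediately.
    result = fetch_with_retry(always_fails, "https://example.com", retries=2, sleep=lambda s: None)
    print(json.dumps(result, indent=2))
```

Passing the sleep function as a parameter keeps the backoff schedule testable without real waiting.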

Metadata

Slug web-markdown-scraper

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Scrapling Web Extractor?

Fetch one or more public webpages with Scrapling, extract the main content, and convert HTML into Markdown using html2text. Supports static HTTP, concurrent... It is an AI Agent Skill for Claude Code / OpenClaw, with 462 downloads so far.

How do I install Scrapling Web Extractor?

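Run "/install web-markdown-scraper" in the OpenClaw or Claude Code chat to install the skill in one step. The skill includes no install spec, so the third-party Python packages it relies on (scrapling, html2text, Playwright) must be installed separately.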

Is Scrapling Web Extractor free?

Yes, Scrapling Web Extractor is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Scrapling Web Extractor support?

Scrapling Web Extractor is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Scrapling Web Extractor?

It is built and maintained by yumiu8103-hue (@yumiu8103-hue); the current version is v1.0.0.
