lilw-yezi

Webpage Export

by Yeziwnl · GitHub ↗ · v1.0.1 · MIT-0
cross-platform ⚠ suspicious
Downloads 237
Stars 0
Active Installs 0
Versions 2
Install in OpenClaw
/install webpage-export
Description
Export webpages into clean local TXT, DOCX, and PDF files with source metadata, fallback extraction logic, and browser-assisted recovery for difficult pages....
Usage Guidance
This skill appears to do what it claims, but take these precautions before installing or using it:
  1. Ensure python3, curl, node, the Node 'playwright' package, and Chrome/Chromium (plus textutil on macOS) are installed. The registry metadata does not list these, so failures are likely if any are missing.
  2. Run the tool in a controlled workspace with an explicit --outdir, and avoid running it against untrusted or internal URLs: the headless browser executes page JavaScript, which can trigger network calls or other side effects originating from the target page.
  3. Because the skill owner is unknown, review the script and test it on safe pages first.
  4. If you need automated dependency installation, add an install spec or request one from the publisher before wider deployment.
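The first precaution can be checked before the skill is ever invoked. The following is a minimal sketch, not part of the skill: `missing_tools` is a hypothetical helper, and the Chrome/Chromium binary names are assumptions that vary by platform (on macOS, Chrome is usually not on PATH at all, so a miss there is not conclusive). It does not check for the Node 'playwright' package.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Tool names come from the usage guidance; the Chrome/Chromium names below
# are assumed candidates, not an exhaustive or authoritative list.
REQUIRED_TOOLS = ["python3", "curl", "node"]
CHROME_CANDIDATES = ["google-chrome", "chromium", "chromium-browser", "chrome"]


def missing_tools(
    required: Iterable[str] = REQUIRED_TOOLS,
    any_of: Iterable[str] = CHROME_CANDIDATES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names of tools that could not be found on PATH.

    Every name in `required` must be present; at least one name in
    `any_of` must be present (an empty `any_of` disables that check).
    """
    missing = [name for name in required if which(name) is None]
    candidates = list(any_of)
    if candidates and not any(which(name) for name in candidates):
        missing.append(" or ".join(candidates))
    return missing


if __name__ == "__main__":
    problems = missing_tools()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("All listed tools were found on PATH.")
```

The `which` parameter is injectable only so the check can be exercised without depending on what is installed on the machine running it.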
Capability Analysis
Type: OpenClaw Skill
Name: webpage-export
Version: 1.0.1
The skill bundle provides a tool for exporting webpages using curl, headless Chrome, and Node.js/Playwright. While the logic is aligned with the stated purpose, the script `scripts/export_webpage.py` lacks URL scheme validation, allowing for potential Local File Inclusion (LFI) via 'file://' URLs. It also permits writing files to arbitrary locations through the '--outdir' parameter and executes an embedded Node.js script, which increases the attack surface for an AI agent if it is manipulated into accessing sensitive local resources.
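The missing scheme check could be approximated by a guard that runs before any URL reaches the script. This is a minimal sketch under stated assumptions: `is_allowed_url` and `ALLOWED_SCHEMES` are hypothetical names, not part of `scripts/export_webpage.py`, and the guard only rejects non-http(s) schemes such as 'file://'. It does not detect URLs that point at internal hosts.

```python
from urllib.parse import urlsplit

# Assumption: only plain web URLs are legitimate inputs for a webpage exporter.
ALLOWED_SCHEMES = {"http", "https"}


def is_allowed_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that name a host."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # urlsplit raises ValueError on some malformed inputs,
        # for example an unterminated IPv6 literal.
        return False
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.hostname)
```

With this guard, `is_allowed_url("file:///etc/passwd")` is false because the scheme is not allow-listed, while an ordinary `https://` article URL passes.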
Capability Assessment
Purpose & Capability
Name/description (export webpages to TXT/DOCX/PDF) align with the included script and reference docs. The script fetches arbitrary URLs, extracts text/metadata, optionally uses Chrome/Chromium and Playwright for rendering, and emits JSON metadata — all expected for this purpose. Minor mismatch: registry metadata lists no required binaries, but SKILL.md and the script clearly require python3, curl, node+playwright, and Chrome/Chromium (and textutil on macOS).
Instruction Scope
SKILL.md instructions are narrowly scoped to running scripts/export_webpage.py with flags and reading the included references. The runtime behavior (curl fetch, HTML parsing, optional headless browser execution, local file writes) is explicitly documented. The SKILL.md warns the browser fallback will execute page JavaScript. The instructions do not direct the agent to read unrelated system files or transmit data to unexpected external endpoints.
Install Mechanism
This is an instruction-only skill with a bundled script and no install spec. That is low-risk, but practical friction exists: the skill expects runtime dependencies (python3, curl, node, the Node 'playwright' package, and Chrome/Chromium) without providing automated installation. There's no evidence of downloads from untrusted hosts or hidden installers in the bundle.
Credentials
The skill declares no required credentials or special env vars; the script only reads PATH and HOME from the environment and sets CHROME_BIN for child processes (pointing to a local Chrome path it finds). No secrets or unrelated credentials are requested.
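Since the script is reported to read only PATH and HOME, a caller that launches it as a child process could pass a reduced environment rather than the full one. The sketch below assumes that report is accurate; `minimal_env` and `PASSED_THROUGH` are hypothetical names, and some platforms may need additional variables for child processes to start correctly.

```python
import os
from typing import Dict, Mapping

# Per the assessment above, the script reads only these two variables.
PASSED_THROUGH = ("PATH", "HOME")


def minimal_env(source: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Copy only the allow-listed variables from `source`."""
    return {key: source[key] for key in PASSED_THROUGH if key in source}
```

The resulting dictionary can be supplied as the `env=` argument of `subprocess.run`, so that unrelated secrets in the parent environment are not inherited by the script or by the browser it starts.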
Persistence & Privilege
The skill is not marked always:true and does not request system- or agent-wide configuration changes. It runs on-demand and writes outputs under local output folders; no elevated persistence or cross-skill config writes are present in the provided files.
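Because the capability analysis notes that '--outdir' accepts arbitrary locations, a wrapper could confine the output directory to a chosen workspace before passing it on. This is a minimal sketch of that idea; `resolve_outdir` is a hypothetical helper and is not part of the skill.

```python
from pathlib import Path


def resolve_outdir(requested: str, workspace: str) -> Path:
    """Return the resolved output directory, or raise if it escapes `workspace`."""
    root = Path(workspace).resolve()
    # An absolute `requested` replaces `root` in the join, so it is
    # checked against the workspace just like a relative path.
    target = (root / requested).resolve()
    if target != root and root not in target.parents:
        raise ValueError(f"output directory {target} is outside {root}")
    return target
```

Resolving both paths first means that `..` segments and symlinks are taken into account before the containment check runs.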
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install webpage-export
  3. After installation, invoke the skill by name or use /webpage-export
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.1
- Updates default output folder behavior: if no --outdir is specified, outputs now default to a local export folder under the current working directory.
- Adds runtime requirements section clarifying dependencies such as python3, curl, Chrome/Chromium, node, and playwright for various export functions.
- Adds safety and execution notes, particularly regarding headless browser usage and best practices for production environments.
- Example commands and documentation reflect new output and requirements, replacing hardcoded paths with generic, workspace-relative locations.
- No functional code changes: documentation update for improved clarity and user guidance.
v1.0.0
Initial release of webpage-export. Export webpages into clean TXT, DOCX, and PDF files. Preserve source metadata including title, URL, source, and publish-time when available. Add fallback extraction logic and browser-assisted recovery for difficult pages. Support webpage archiving for articles, policy pages, WeChat posts, and official notices.
Metadata
Slug webpage-export
Version 1.0.1
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 2
Frequently Asked Questions

What is Webpage Export?

Export webpages into clean local TXT, DOCX, and PDF files with source metadata, fallback extraction logic, and browser-assisted recovery for difficult pages.... It is an AI Agent Skill for Claude Code / OpenClaw, with 237 downloads so far.

How do I install Webpage Export?

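Run "/install webpage-export" in the OpenClaw or Claude Code chat to install it. The skill ships no install spec, so its runtime dependencies (python3, curl, node, the Node 'playwright' package, and Chrome/Chromium) must be installed separately.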

Is Webpage Export free?

Yes, Webpage Export is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Webpage Export support?

Webpage Export is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Webpage Export?

It is built and maintained by Yeziwnl (@lilw-yezi); the current version is v1.0.1.
