hostilespider

Lightpanda Scraper

by HostileSpider · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
170 Downloads · 1 Star · 2 Active Installs · 1 Version
Install in OpenClaw
/install lightpanda-scraper
Description
Fast headless browser web scraping using Lightpanda (0.5s page loads, 90x faster than Chromium). Perfect for OSINT recon, link extraction, and content scrapi...
README (SKILL.md)

Lightpanda Scraper — Fast Headless Browser for OSINT

Blazing-fast web scraping using Lightpanda, a Zig-based headless browser: 0.5s per page vs 45s for Chromium/Playwright. Perfect for OSINT recon, link extraction, and content scraping.

Prerequisites

Install Lightpanda binary:

mkdir -p ~/.local/bin
curl -fL https://github.com/nicholasgasior/lightpanda-browser/releases/latest/download/lightpanda-linux-x86_64 -o ~/.local/bin/lightpanda
chmod +x ~/.local/bin/lightpanda

Quick Start

# Dump page as markdown
python3 {baseDir}/scripts/lp-scrape.py https://target.com

# Extract all links
python3 {baseDir}/scripts/lp-scrape.py https://target.com --links

# Get raw HTML
python3 {baseDir}/scripts/lp-scrape.py https://target.com --html

Options

  • --links — Extract and categorize all links from the page
  • --html — Dump raw HTML instead of markdown
  • --frames — Include iframe content
  • --js "code" — Evaluate JavaScript on the page
  • --output FILE — Save output to file
  • --wait MODE — Wait condition: networkidle (default), load, domcontentloaded
  • --strip TYPES — Comma-separated resource types to strip: js, css, images
  • --proxy URL — Use proxy (e.g., socks5://127.0.0.1:9050 for Tor)
  • --timeout SECS — Request timeout (default: 30)
  • --serve --port PORT — Start CDP server mode
  • --mcp — Start as MCP server (stdio)
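If you drive the wrapper from other Python code, the options above map naturally onto an argv builder. This is a minimal sketch, not part of the skill: `build_args` and `run` are hypothetical helper names, `script` stands in for whatever `{baseDir}/scripts/lp-scrape.py` resolves to, and `sys.executable` replaces a bare `python3`. Only flags documented in the list are emitted, and `--wait`/`--timeout` are left out when they equal the documented defaults. The server modes (`--serve`, `--mcp`) are deliberately not covered.

```python
import subprocess
import sys

# Values accepted by --wait and --strip, as listed in the Options section.
WAIT_MODES = {"networkidle", "load", "domcontentloaded"}
STRIP_TYPES = {"js", "css", "images"}


def build_args(script, url, *, links=False, html=False, frames=False,
               js=None, output=None, wait="networkidle", strip=(),
               proxy=None, timeout=30):
    """Return an argv list for lp-scrape.py using only the documented flags."""
    if wait not in WAIT_MODES:
        raise ValueError(f"unknown --wait mode: {wait!r}")
    unknown = set(strip) - STRIP_TYPES
    if unknown:
        raise ValueError(f"unknown --strip types: {sorted(unknown)}")

    args = [sys.executable, script, url]
    if links:
        args.append("--links")
    if html:
        args.append("--html")
    if frames:
        args.append("--frames")
    if js is not None:
        args += ["--js", js]
    if output is not None:
        args += ["--output", output]
    if wait != "networkidle":  # networkidle is the documented default
        args += ["--wait", wait]
    if strip:
        args += ["--strip", ",".join(strip)]
    if proxy is not None:
        args += ["--proxy", proxy]
    if timeout != 30:  # 30 seconds is the documented default
        args += ["--timeout", str(timeout)]
    return args


def run(args):
    """Run the wrapper and return stdout; raises CalledProcessError on failure."""
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout
```

For example, `run(build_args("scripts/lp-scrape.py", "https://target.com", links=True, strip=["images"]))` corresponds to the `--links` Quick Start command with images stripped.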

Use Cases

OSINT Recon

# Quick page dump for analysis
python3 {baseDir}/scripts/lp-scrape.py https://target.com > recon.md

# Extract all endpoints from a site
python3 {baseDir}/scripts/lp-scrape.py https://target.com --links | grep -i api

# Fetch a page through Tor (assumes Tor's SOCKS proxy on 127.0.0.1:9050)
python3 {baseDir}/scripts/lp-scrape.py https://target.com --proxy socks5://127.0.0.1:9050
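The `--links` mode extracts and categorizes links, but the listing does not say which categories lp-scrape.py uses or what its output looks like. As a stand-in, here is a stdlib-only sketch that works on HTML saved with `--html`: it collects `<a href>` values, resolves relative links against the page URL, and buckets them by hostname. The `internal`/`external`/`api` buckets are assumptions for illustration, not the script's actual output format; the `api` bucket mirrors the `grep -i api` example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def categorize_links(html, base_url):
    """Split links into internal/external by hostname and flag API-looking URLs."""
    parser = LinkCollector()
    parser.feed(html)
    base_host = urlparse(base_url).hostname
    result = {"internal": [], "external": [], "api": []}
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel:, etc.
        key = "internal" if parsed.hostname == base_host else "external"
        result[key].append(absolute)
        if "api" in absolute.lower():  # same match as `grep -i api`
            result["api"].append(absolute)
    return result
```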

Bug Bounty Recon

# Fast subdomain content grab
for sub in api admin dev staging; do
  python3 {baseDir}/scripts/lp-scrape.py "https://$sub.target.com" --links 2>/dev/null
done
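The shell loop above fetches one subdomain at a time. A Python version can run a few fetches in parallel with a capped worker pool; this is a sketch under assumptions: `fetch_links` and `grab_subdomains` are hypothetical names, the script path defaults to a placeholder you should replace, and failures map to `None` in the same spirit as the loop's `2>/dev/null`. Keeping `workers` small avoids hammering a single target.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def fetch_links(url, script="scripts/lp-scrape.py", timeout=60):
    """Run lp-scrape.py --links on one URL; return stdout, or None on failure."""
    try:
        proc = subprocess.run([sys.executable, script, url, "--links"],
                              capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    # stderr is captured and discarded, like `2>/dev/null` in the shell loop.
    return proc.stdout if proc.returncode == 0 else None


def grab_subdomains(subs, domain, fetch=fetch_links, workers=4):
    """Fetch each https://<sub>.<domain> concurrently; map URL -> output or None."""
    urls = [f"https://{sub}.{domain}" for sub in subs]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its result.
        return dict(zip(urls, pool.map(fetch, urls)))
```

Passing `fetch` in as a parameter keeps the concurrency logic testable without the real binary.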

Content Extraction

# Save clean markdown
python3 {baseDir}/scripts/lp-scrape.py https://article.com --output article.md

# JavaScript evaluation
python3 {baseDir}/scripts/lp-scrape.py https://app.com --js "document.querySelectorAll('a').length"

CDP Server Mode

# Start server for programmatic access
python3 {baseDir}/scripts/lp-scrape.py --serve --port 9222
# Then connect with any CDP client
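Once the CDP server is up, JavaScript evaluation (what `--js` does) comes down to a Chrome DevTools Protocol `Runtime.evaluate` command over a WebSocket. The sketch below shows only that message exchange. It assumes you already hold a connection on which Runtime commands reach a page; how you discover and attach to a target on Lightpanda's endpoint is not described here. `ws` can be any object with `send(str)` and `recv() -> str`, such as a connection from websocket-client's `create_connection`; `cdp_evaluate` is a hypothetical name.

```python
import itertools
import json

_ids = itertools.count(1)  # CDP matches replies to requests by "id"


def cdp_evaluate(ws, expression):
    """Send Runtime.evaluate and return the JSON value of the result."""
    msg_id = next(_ids)
    ws.send(json.dumps({
        "id": msg_id,
        "method": "Runtime.evaluate",
        "params": {"expression": expression, "returnByValue": True},
    }))
    while True:
        reply = json.loads(ws.recv())
        if reply.get("id") != msg_id:
            continue  # skip events and replies to other commands
        if "error" in reply:
            raise RuntimeError(reply["error"])  # protocol-level error
        result = reply["result"]
        if "exceptionDetails" in result:  # the JavaScript itself threw
            raise RuntimeError(result["exceptionDetails"].get("text", "JS exception"))
        return result["result"].get("value")
```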

Speed Comparison

Tool                   Page load   Memory    Binary size
Lightpanda             ~0.5s       ~50MB     ~100MB
Chromium/Playwright    ~45s        ~500MB    ~300MB
curl/wget              ~0.3s       ~5MB      N/A

Lightpanda gives you Playwright-like page rendering at near-curl speeds. The catch: no complex JS interactions (use Playwright for those).
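The "90x faster" figure in the description follows directly from the table's approximate numbers. A quick check, using only those table values:

```python
# Approximate figures from the Speed Comparison table.
TABLE = {
    "Lightpanda": {"load_s": 0.5, "memory_mb": 50},
    "Chromium/Playwright": {"load_s": 45.0, "memory_mb": 500},
}


def ratio(metric, slow="Chromium/Playwright", fast="Lightpanda"):
    """How many times larger the slower tool's value is for a metric."""
    return TABLE[slow][metric] / TABLE[fast][metric]


print(f"page load: {ratio('load_s'):.0f}x faster")   # page load: 90x faster
print(f"memory: {ratio('memory_mb'):.0f}x smaller")  # memory: 10x smaller
```

These are ratios of the listing's own rough figures, not independent measurements.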

Notes

  • Lightpanda is in active development; some complex SPAs may not render perfectly
  • For authenticated sessions or complex JS interactions, use Playwright instead
  • The binary (~100MB of Zig-compiled native code) runs on Linux x86_64, and the install command above fetches only that build
  • Supports HTTP/SOCKS5 proxies for Tor or VPN routing
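Before passing a `--proxy` value, it can help to validate it locally so a typo fails fast instead of as a timeout. A minimal sketch: `check_proxy` is a hypothetical helper, and the allowed schemes reflect only the "HTTP/SOCKS5" note above; whether lp-scrape.py also accepts forms such as `https://` or `socks5h://` is not documented, so treat the set as an assumption.

```python
from urllib.parse import urlparse

# Based on the "HTTP/SOCKS5 proxies" note; other schemes are untested assumptions.
ALLOWED_SCHEMES = {"http", "socks5"}


def check_proxy(url):
    """Validate a --proxy value and return (scheme, host, port)."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported proxy scheme: {parsed.scheme!r}")
    # .port raises ValueError by itself if the port is out of range.
    if not parsed.hostname or parsed.port is None:
        raise ValueError(f"proxy URL needs host and port: {url!r}")
    return parsed.scheme, parsed.hostname, parsed.port
```

For the Tor example, `check_proxy("socks5://127.0.0.1:9050")` returns `("socks5", "127.0.0.1", 9050)`.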
Usage Guidance
This skill is internally consistent and appears to implement the advertised scraping functionality. The main caution: SKILL.md instructs you to download and run a prebuilt native binary from GitHub Releases without providing a checksum or signature, so whatever that binary contains runs as native code on your machine. Before installing, consider the following:
  1. Verify the release and its author on the repository.
  2. Prefer building from source if you can.
  3. Run the binary in a sandbox or non-sensitive environment.
  4. Inspect and test it in a VM or container.
  5. Be cautious when connecting it to sensitive networks or data.
Also note that the Python script may ask for the websocket-client package at runtime for JS evaluation; run it in a controlled environment and do not give it credentials or secrets.
Capability Analysis
Type: OpenClaw Skill · Name: lightpanda-scraper · Version: 1.0.0

The skill's installation process in SKILL.md involves downloading and executing a pre-compiled binary from a third-party GitHub repository (nicholasgasior/lightpanda-browser), which introduces a significant supply chain risk. While the Python wrapper (scripts/lp-scrape.py) appears to be a legitimate interface for the Lightpanda browser, it includes high-risk capabilities such as arbitrary JavaScript evaluation and the ability to start a CDP server, which could be leveraged for unintended purposes despite being aligned with the tool's stated functionality.
Capability Assessment
Purpose & Capability
Name/description match the code and instructions: a Python CLI that invokes a Lightpanda binary for scraping, link extraction, JS evaluation, and optional CDP/MCP server modes. Requiring python3 and a Lightpanda binary is proportionate and expected.
Instruction Scope
SKILL.md and the script keep scope to web scraping and server modes. The runtime instructions and examples only reference network targets, proxies, and starting local CDP/MCP servers. The script does start a local server (127.0.0.1) and connects via WebSocket for JS evaluation, and it may suggest installing the websocket-client Python package if missing — these behaviors are coherent with the described functionality.
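The optional websocket-client dependency is typically handled by checking for the module before use and printing an install hint. We have not seen lp-scrape.py's code, so this is a sketch of the common pattern, not the script's implementation; `has_module` and `require_websocket_client` are hypothetical names. Note that the PyPI package websocket-client is imported as `websocket`.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


def require_websocket_client():
    """Exit with an install hint if websocket-client is missing."""
    # The websocket-client distribution provides the importable package `websocket`.
    if not has_module("websocket"):
        raise SystemExit(
            "JS evaluation needs websocket-client: pip install websocket-client"
        )
```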
Install Mechanism
The recommended install uses curl to download a prebuilt native binary from GitHub Releases, writes it to ~/.local/bin, and marks it executable. GitHub Releases is a common host, but the binary is downloaded and executed without checksum or signature verification; this is a moderate risk (remote native code execution if the binary is malicious or the release is compromised). Writing the binary to a user-local path rather than system-wide is standard and appropriate; the lack of integrity checks is the main concern.
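One way to close the integrity gap is to compare the downloaded file's SHA-256 digest against a trusted value before running `chmod +x`. SKILL.md provides no checksum, so the expected digest has to come from a source you trust, for example a digest you computed from your own build of the source. This is a minimal stdlib sketch; `sha256_file` and `verify` are hypothetical helper names.

```python
import hashlib
import hmac


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 in 1 MiB chunks and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_hex):
    """True if the file matches the trusted digest (case and whitespace ignored)."""
    # compare_digest avoids short-circuiting on the first differing character.
    return hmac.compare_digest(sha256_file(path), expected_hex.strip().lower())
```

For example, check `verify(os.path.expanduser("~/.local/bin/lightpanda"), trusted_digest)` and delete the file instead of making it executable if it returns False.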
Credentials
No environment variables, credentials, or unrelated config paths are requested. The script accepts a proxy argument and can be used with Tor, which matches the stated use cases. No excessive or unexplained secrets access is present.
Persistence & Privilege
The skill is not always-enabled, does not request elevated privileges, and does not modify other skills or system-wide agent settings. It can run a local server on configurable ports, which is expected for CDP mode.
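Because CDP mode binds a local port (`--serve --port PORT`), a quick pre-flight check can tell you whether the port is already taken, for instance by a Chrome instance on the conventional 9222. A minimal sketch, assuming the server binds 127.0.0.1 as the analysis above describes; `port_is_free` is a hypothetical name.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently bind host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False  # in use (or otherwise unavailable)
        return True
```

The check is advisory: another process can grab the port between the check and the server starting, so the server's own bind error remains the final word.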
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install lightpanda-scraper
  3. After installation, invoke the skill by name or use /lightpanda-scraper
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release — fast headless browser for OSINT recon, 0.5s page loads
Metadata
Slug lightpanda-scraper
Version 1.0.0
License MIT-0
All-time Installs 2
Active Installs 2
Total Versions 1
Frequently Asked Questions

What is Lightpanda Scraper?

Fast headless browser web scraping using Lightpanda (0.5s page loads, 90x faster than Chromium). Perfect for OSINT recon, link extraction, and content scrapi... It is an AI Agent Skill for Claude Code / OpenClaw, with 170 downloads so far.

How do I install Lightpanda Scraper?

Run "/install lightpanda-scraper" in the OpenClaw or Claude Code chat to install the skill in one step. The Lightpanda binary it wraps is not bundled and must be installed separately (see Prerequisites).

Is Lightpanda Scraper free?

Yes, Lightpanda Scraper is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Lightpanda Scraper support?

The listing tags Lightpanda Scraper as cross-platform, but the skill depends on the Lightpanda binary, and the documented install fetches only the Linux x86_64 build.

Who created Lightpanda Scraper?

It is built and maintained by HostileSpider (@hostilespider); the current version is v1.0.0.
