
Lightpanda Scraper

Name: Lightpanda Scraper
Author: hostilespider

By HostileSpider · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 170

Install in OpenClaw

/install lightpanda-scraper

Description

Fast headless browser web scraping using Lightpanda (0.5s page loads, 90x faster than Chromium). Perfect for OSINT recon, link extraction, and content scrapi...

Usage (SKILL.md)

Lightpanda Scraper — Fast Headless Browser for OSINT

Fast web scraping with Lightpanda, a headless browser written in Zig. The author reports roughly 0.5s per page load versus 45s for Chromium/Playwright, which is the source of the 90x figure. Suited to OSINT recon, link extraction, and content scraping.

Prerequisites

Install Lightpanda binary:

mkdir -p ~/.local/bin
curl -L https://github.com/nicholasgasior/lightpanda-browser/releases/latest/download/lightpanda-linux-x86_64 -o ~/.local/bin/lightpanda
chmod +x ~/.local/bin/lightpanda

Quick Start

# Dump page as markdown
python3 {baseDir}/scripts/lp-scrape.py https://target.com

# Extract all links
python3 {baseDir}/scripts/lp-scrape.py https://target.com --links

# Get raw HTML
python3 {baseDir}/scripts/lp-scrape.py https://target.com --html

Options

--links — Extract and categorize all links from the page
--html — Dump raw HTML instead of markdown
--frames — Include iframe content
--js "code" — Evaluate JavaScript on the page
--output FILE — Save output to file
--wait MODE — Wait condition: networkidle (default), load, domcontentloaded
--strip TYPES — Comma-separated resource types to strip: js, css, images
--proxy URL — Use proxy (e.g., socks5://127.0.0.1:9050 for Tor)
--timeout SECS — Request timeout (default: 30)
--serve --port PORT — Start CDP server mode
--mcp — Start as MCP server (stdio)
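To make the defaults and value types in the list above concrete, here is a minimal sketch of how a CLI with these flags could be declared with argparse. It is not the actual lp-scrape.py: the function name build_parser, the integer type for --timeout, and the choice to leave --port without a default are assumptions for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the real script)."""
    parser = argparse.ArgumentParser(prog="lp-scrape.py")
    # The URL is optional here because the server modes are shown without one.
    parser.add_argument("url", nargs="?")
    parser.add_argument("--links", action="store_true")
    parser.add_argument("--html", action="store_true")
    parser.add_argument("--frames", action="store_true")
    parser.add_argument("--js", metavar="CODE")
    parser.add_argument("--output", metavar="FILE")
    parser.add_argument(
        "--wait",
        choices=["networkidle", "load", "domcontentloaded"],
        default="networkidle",
    )
    # "js,css" becomes ["js", "css"]; empty items are dropped.
    parser.add_argument(
        "--strip",
        metavar="TYPES",
        type=lambda s: [t.strip() for t in s.split(",") if t.strip()],
        default=[],
    )
    parser.add_argument("--proxy", metavar="URL")
    parser.add_argument("--timeout", metavar="SECS", type=int, default=30)
    parser.add_argument("--serve", action="store_true")
    # The default port is not documented, so none is assumed here.
    parser.add_argument("--port", type=int)
    parser.add_argument("--mcp", action="store_true")
    return parser
```

Calling build_parser().parse_args(["https://target.com", "--links", "--strip", "js,css"]) yields links=True, strip=["js", "css"], and the documented defaults wait="networkidle" and timeout=30.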

Use Cases

OSINT Recon

# Quick page dump for analysis
python3 {baseDir}/scripts/lp-scrape.py https://target.com > recon.md

# Extract all endpoints from a site
python3 {baseDir}/scripts/lp-scrape.py https://target.com --links | grep -i api

# Crawl with Tor
python3 {baseDir}/scripts/lp-scrape.py https://target.com --proxy socks5://127.0.0.1:9050
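The grep -i api step above can also be done in Python once the --links output has been captured as text. The sketch below assumes only that the output is line-oriented; the exact format that lp-scrape.py prints is not documented here, and filter_lines is a hypothetical helper name.

```python
def filter_lines(text: str, needle: str) -> list[str]:
    """Return the lines of `text` containing `needle`, ignoring case (like grep -i)."""
    needle = needle.lower()
    return [line for line in text.splitlines() if needle in line.lower()]


# Hypothetical captured output, used only to illustrate the filter.
sample_output = "https://target.com/API/v1/users\nhttps://target.com/about\n"
api_lines = filter_lines(sample_output, "api")
```

With the sample above, api_lines contains only the /API/v1/users line.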

Bug Bounty Recon

# Fast subdomain content grab
for sub in api admin dev staging; do
  python3 {baseDir}/scripts/lp-scrape.py https://$sub.target.com --links 2>/dev/null
done
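The same loop can be driven from Python when the per-subdomain output needs to be kept separately. This is a sketch under assumptions: build_commands and run_all are hypothetical helpers, the script path is passed in rather than resolved from {baseDir}, and stderr is discarded to mirror the 2>/dev/null in the shell version.

```python
import subprocess
import sys


def build_commands(script: str, domain: str, subdomains: list[str]) -> list[list[str]]:
    """Build one argv list per subdomain, equivalent to the shell loop above."""
    return [
        [sys.executable, script, f"https://{sub}.{domain}", "--links"]
        for sub in subdomains
    ]


def run_all(commands: list[list[str]], timeout: float = 60.0) -> dict[str, tuple]:
    """Run each command; map its third argv item (the URL) to (returncode, stdout)."""
    results = {}
    for cmd in commands:
        try:
            proc = subprocess.run(
                cmd,
                stdout=subprocess.PIPE,
                stderr=subprocess.DEVNULL,  # same effect as 2>/dev/null
                text=True,
                timeout=timeout,
                check=False,
            )
            results[cmd[2]] = (proc.returncode, proc.stdout)
        except subprocess.TimeoutExpired:
            # A hung target is recorded rather than aborting the whole run.
            results[cmd[2]] = (None, "")
    return results
```

A typical call would be run_all(build_commands("scripts/lp-scrape.py", "target.com", ["api", "admin", "dev", "staging"])), with the script path adjusted to wherever the skill is installed.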

Content Extraction

# Save clean markdown
python3 {baseDir}/scripts/lp-scrape.py https://article.com --output article.md

# JavaScript evaluation
python3 {baseDir}/scripts/lp-scrape.py https://app.com --js "document.querySelectorAll('a').length"

CDP Server Mode

# Start server for programmatic access
python3 {baseDir}/scripts/lp-scrape.py --serve --port 9222
# Then connect with any CDP client

Speed Comparison

Tool	Page Load	Memory	Binary Size
Lightpanda	~0.5s	~50MB	~100MB
Chromium/Playwright	~45s	~500MB	~300MB
curl/wget	~0.3s	~5MB	N/A

Lightpanda gives you Playwright-like page rendering at near-curl speeds. The catch: no complex JS interactions (use Playwright for those).
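The 90x figure quoted for this skill follows directly from the table. The short check below uses only the approximate values listed there, which are the author's numbers and not independent measurements; the ratio helper is a hypothetical name.

```python
# Approximate values copied from the comparison table above.
page_load_s = {"Lightpanda": 0.5, "Chromium/Playwright": 45.0, "curl/wget": 0.3}
memory_mb = {"Lightpanda": 50, "Chromium/Playwright": 500, "curl/wget": 5}


def ratio(table: dict, larger: str, smaller: str) -> float:
    """How many times larger one tool's figure is than another's."""
    return table[larger] / table[smaller]


speedup = ratio(page_load_s, "Chromium/Playwright", "Lightpanda")  # 45 / 0.5 = 90.0
memory_saving = ratio(memory_mb, "Chromium/Playwright", "Lightpanda")  # 500 / 50 = 10.0
```

By the same table, Lightpanda's page load is a little under twice curl's, which is what "near-curl speeds" refers to.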

Notes

Lightpanda is in active development; some complex SPAs may not render perfectly
For authenticated sessions or complex JS interactions, use Playwright instead
Binary is ~100MB Zig-compiled native code, runs on Linux x86_64
Supports HTTP/SOCKS5 proxies for Tor or VPN routing

Security recommendations

This skill is internally consistent and appears to implement the advertised scraping functionality. The main caution: the SKILL.md instructs you to curl and execute a prebuilt native binary from GitHub Releases without providing a checksum or signature — that can run arbitrary native code. Before installing, consider: (1) verify the release and author on the repository, (2) prefer building from source if you can, (3) run the binary in a sandbox or non-sensitive environment, (4) inspect and test in a VM or container, and (5) be cautious when connecting it to sensitive networks or data. Also note the Python script may request the websocket-client package at runtime for JS evaluation; run it in a controlled environment and avoid giving it credentials or secrets.

Functional analysis

Type: OpenClaw Skill
Name: lightpanda-scraper
Version: 1.0.0

The skill's installation process in SKILL.md involves downloading and executing a pre-compiled binary from a third-party GitHub repository (nicholasgasior/lightpanda-browser), which introduces a significant supply chain risk. While the Python wrapper (scripts/lp-scrape.py) appears to be a legitimate interface for the Lightpanda browser, it includes high-risk capabilities such as arbitrary JavaScript evaluation and the ability to start a CDP server, which could be leveraged for unintended purposes despite being aligned with the tool's stated functionality.

Capability assessment

✓ Purpose & Capability

Name/description match the code and instructions: a Python CLI that invokes a Lightpanda binary for scraping, link extraction, JS evaluation, and optional CDP/MCP server modes. Requiring python3 and a Lightpanda binary is proportionate and expected.

✓ Instruction Scope

SKILL.md and the script keep scope to web scraping and server modes. The runtime instructions and examples only reference network targets, proxies, and starting local CDP/MCP servers. The script does start a local server (127.0.0.1) and connects via WebSocket for JS evaluation, and it may suggest installing the websocket-client Python package if missing — these behaviors are coherent with the described functionality.

⚠ Install Mechanism

The recommended install uses curl to download a prebuilt native binary from GitHub Releases and writes it to ~/.local/bin, then executes it. While GitHub releases is a common host, the binary is downloaded and executed without checksum or signature verification — this is a moderate risk (remote native code execution if the binary is malicious or the release is compromised). The install writes a binary to a user-local path (standard) rather than system-wide, which is appropriate, but lack of integrity checks is the main concern.
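Because no checksum is published with this skill, any integrity check has to compare against a digest obtained some other trusted way, for example one recorded from a binary you built from source or previously inspected. The sketch below shows that comparison using SHA-256; sha256_of and matches are hypothetical helper names, and the expected digest is something the reader must supply.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a ~100MB binary is never read into memory at once."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with an expected hex digest from a trusted source."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

One way to use it is to call matches on the downloaded file with the trusted digest and run chmod +x only if it returns True. A matching digest shows the file is the one you expected, not that the file is safe.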

✓ Credentials

No environment variables, credentials, or unrelated config paths are requested. The script accepts a proxy argument and can be used with Tor, which matches the stated use cases. No excessive or unexplained secrets access is present.

✓ Persistence & Privilege

The skill is not always-enabled, does not request elevated privileges, and does not modify other skills or system-wide agent settings. It can run a local server on configurable ports, which is expected for CDP mode.

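How to use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install lightpanda-scraper
Once installed, invoke the Skill by name or trigger it with /lightpanda-scraper
Provide the inputs described in the Skill's parameter documentation to get structured output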

Version history

v1.0.0

Initial release — fast headless browser for OSINT recon, 0.5s page loads

Metadata

Slug lightpanda-scraper

Version 1.0.0

License MIT-0

Total installs 2

Current installs 2

Versions 1

FAQ