neekey

browser scraper

Author: neekey · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 116
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install browser-scraper
Description
Scrape websites using a real Chrome browser with the user's Chrome profile — shares cookies, auth, and fingerprint to bypass bot detection (Cloudflare, Reddi...
Usage (SKILL.md)

Browser Scraper

Scrapes web pages using Playwright with a real Chrome/Chromium binary and an existing user profile. Bypasses bot detection by sharing existing cookies, fingerprint, and session.

Profiles

The scraper supports multiple Chrome profiles:

  • Default (no --profile flag): Uses the system's default Chrome profile

    • macOS: ~/Library/Application Support/Google/Chrome/Default
    • Linux: ~/.config/google-chrome/Default
    • Windows: %LOCALAPPDATA%\Google\Chrome\User Data\Default
  • Named profile (--profile <name>): Uses profiles/<name>/ under the skill directory

    • Create a profile by launching Chrome with --profile-directory="Profile 1" or similar (quote the value, since it contains a space), then point the scraper at that folder
    • Useful for: isolating logins, avoiding conflicts with your main Chrome session, scraping without auth
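
The profile lookup described above can be sketched as a small pure function. This is an illustration only, not the code in scripts/scrape.mjs: the function name `resolveProfileDir` and its parameter names are hypothetical, and the paths are the ones listed above. It takes the platform and base directories as arguments and builds paths with plain strings, so it has no imports.

```javascript
// Hypothetical helper: maps the rules above to a profile directory.
// platform follows Node's process.platform values ('darwin', 'linux', 'win32').
function resolveProfileDir({ platform, homeDir, localAppData, skillDir, profileName }) {
  // Named profile: profiles/<name>/ under the skill directory.
  if (profileName) {
    return `${skillDir}/profiles/${profileName}`;
  }
  // Default profile: the system Chrome profile for the current platform.
  switch (platform) {
    case 'darwin':
      return `${homeDir}/Library/Application Support/Google/Chrome/Default`;
    case 'linux':
      return `${homeDir}/.config/google-chrome/Default`;
    case 'win32':
      return `${localAppData}\\Google\\Chrome\\User Data\\Default`;
    default:
      throw new Error(`Unsupported platform: ${platform}`);
  }
}
```

In a real script the arguments would typically come from `process.platform`, `os.homedir()`, and `process.env.LOCALAPPDATA`.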

Script

# Default profile (system Chrome)
node scripts/scrape.mjs <url> [css_selector]

# Named profile (profiles/<name>/)
node scripts/scrape.mjs <url> [css_selector] --profile <name>

# Headless mode (faster, higher block risk)
node scripts/scrape.mjs <url> --headless --profile <name>

# Keep browser open after scraping (for interactive use)
node scripts/scrape.mjs <url> --profile <name> --keep-open

# Extra wait for lazy-loaded content (default: 3000ms)
node scripts/scrape.mjs <url> --profile <name> --wait 6000

Run from the skill directory:

cd ~/.openclaw-yekeen/workspace/skills/browser-scraper/
node scripts/scrape.mjs https://www.reddit.com/
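
As a rough guide to how the flags above fit together, here is a minimal sketch of a parser for the same command line. It is an assumption about the argument shape, not the parser in scripts/scrape.mjs: the function name `parseScrapeArgs` and the option property names are hypothetical, while the flags and the 3000 ms default come from the usage lines above.

```javascript
// Hypothetical parser for: <url> [css_selector] [--profile <name>] [--headless] [--keep-open] [--wait <ms>]
function parseScrapeArgs(argv) {
  const options = {
    url: undefined,
    selector: undefined,
    profile: undefined, // undefined means "use the default system profile"
    headless: false,
    keepOpen: false,
    waitMs: 3000, // default wait stated in the usage above
  };
  const positional = [];
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === '--headless') {
      options.headless = true;
    } else if (arg === '--keep-open') {
      options.keepOpen = true;
    } else if (arg === '--profile') {
      options.profile = argv[++i]; // the next token is the profile name
    } else if (arg === '--wait') {
      options.waitMs = Number(argv[++i]); // the next token is a duration in ms
    } else {
      positional.push(arg);
    }
  }
  // First positional is the URL, the optional second one is the CSS selector.
  [options.url, options.selector] = positional;
  return options;
}
```

A script would normally call it as `parseScrapeArgs(process.argv.slice(2))`.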

Output

  • JSON to stdout: matched elements or page preview
  • Screenshot saved to /tmp/browser-scraper-last.png

Key Design

  • channel: 'chrome' — launches real Chrome when available, falls back to system Chromium
  • launchPersistentContext with the profile directory
  • --disable-blink-features=AutomationControlled + navigator.webdriver patch
  • headless: false by default to avoid SingletonLock conflicts
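
The design points above can be put together roughly as follows. This is a sketch under stated assumptions, not the skill's actual source: the function names `buildLaunchOptions` and `openPage` are hypothetical, the webdriver patch shown is one common way to write it, and the sketch does not implement the Chromium fallback. Playwright is loaded lazily inside `openPage`, so the option builder can be used without it.

```javascript
// Hypothetical helper: collects the launch settings listed above.
function buildLaunchOptions({ headless = false } = {}) {
  return {
    channel: 'chrome', // ask Playwright for the installed Chrome build
    headless, // headed by default, as in the design notes
    args: ['--disable-blink-features=AutomationControlled'],
  };
}

// Hypothetical helper: opens a URL in a persistent context backed by profileDir.
async function openPage(profileDir, url, options = {}) {
  const { chromium } = await import('playwright');
  const context = await chromium.launchPersistentContext(
    profileDir,
    buildLaunchOptions(options),
  );
  // Runs before any page script, so navigator.webdriver reads as undefined.
  await context.addInitScript(() => {
    Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
  });
  // A persistent context usually starts with one page already open; reuse it.
  const page = context.pages()[0] ?? (await context.newPage());
  await page.goto(url, { waitUntil: 'domcontentloaded' });
  return { context, page };
}
```

The caller is responsible for `await context.close()` when finished, unless the browser is meant to stay open.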

Requirements

  • Playwright installed: npm install playwright
  • Chrome or Chromium installed on the system
  • On macOS/Linux: the channel: 'chrome' option requires Chrome (not Chromium) to be installed

Tips

  • Chrome must not already be open with the target profile (SingletonLock error). Close Chrome first, or use a named profile to avoid conflicts.
  • If you get a SingletonLock error with a named profile, delete the SingletonLock file in that profile directory and try again.
  • Use --keep-open to leave the browser open for interactive use after scraping — Ctrl+C to close.
  • For sites with lazy-loaded content: use the --wait <ms> flag or modify the script to increase waitForTimeout
  • For Reddit: use selector shreddit-post and read attributes (post-title, author, score, permalink)
  • To create a fresh isolated profile: run Chrome from the terminal with --profile-directory="Profile X" (quoted, since the value contains a space) and log in, then point the scraper at that directory
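
For the Reddit tip above, reading the attributes might look like the sketch below. The function name `readPostAttributes` is hypothetical; the selector and the attribute names (post-title, author, score, permalink) are the ones given in the tip. The attribute list lives inside the function so that the function stays self-contained if it is serialized and run in the page.

```javascript
// Hypothetical extractor: takes an array of elements (e.g. shreddit-post nodes)
// and returns one plain object per element with the attributes named in the tip.
function readPostAttributes(elements) {
  const names = ['post-title', 'author', 'score', 'permalink'];
  return elements.map((el) => {
    const post = {};
    for (const name of names) {
      post[name] = el.getAttribute(name); // null when the attribute is absent
    }
    return post;
  });
}
```

With a Playwright `page` object this could be used as `await page.$$eval('shreddit-post', readPostAttributes)`, which passes the matched elements to the function inside the browser. Attribute values arrive as strings, so a numeric score would still need `Number(...)`.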
Security Recommendations
This skill intentionally launches Chrome with your profile and will share cookies, auth tokens and browser fingerprint to evade bot detection — that means any site you visit via the skill can observe your logged-in session. The script also deletes 'SingletonLock' and session files inside the profile directory to avoid launch conflicts; that can remove session state or cause unexpected browser behavior. Before using: (1) review the code yourself or run it in an isolated account/container, (2) prefer using a named skill-local profile instead of your system default profile to avoid exposing your main browsing sessions, (3) back up your Chrome profile if you plan to run it against your default profile, (4) ensure you have Node >=18 and install Playwright per SKILL.md, and (5) do not run it under privileged accounts. The skill's registry entry did not declare these filesystem accesses or destructive actions — treat that omission as a red flag and proceed cautiously.
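
When reviewing or adapting the script, one non-destructive alternative to deleting the lock is to check for it and stop. The sketch below only illustrates that idea and is not taken from the skill: the function name `hasSingletonLock` is hypothetical. It uses `lstat` rather than an existence check so that the entry is still detected if it is a symbolic link whose target does not exist.

```javascript
// Hypothetical check: reports whether a SingletonLock entry exists in profileDir.
// It never deletes or modifies anything in the profile.
async function hasSingletonLock(profileDir) {
  const { lstat } = await import('node:fs/promises');
  const { join } = await import('node:path');
  try {
    // lstat inspects the directory entry itself and does not follow symlinks.
    await lstat(join(profileDir, 'SingletonLock'));
    return true;
  } catch (err) {
    if (err.code === 'ENOENT') return false; // no lock entry present
    throw err; // permission problems etc. should surface to the caller
  }
}
```

A caller could refuse to launch and ask the user to close Chrome when this returns true, instead of removing the file.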
Capability Assessment
Purpose & Capability
The code and SKILL.md align with the declared purpose: it launches Playwright with a real Chrome profile to share cookies/auth and patch navigator.webdriver. However the implementation deletes stale lock and session files in the target profile directory (unlinkSync calls) — this is functionally related to using a persistent profile but is a potentially destructive side-effect that users may not expect.
Instruction Scope
The runtime instructions and script access the user's system Chrome profile directories, clean up (delete) SingletonLock/Session files, and may read cookies/auth state implicitly by launching a persistent profile. While reading the profile is part of bypassing bot detection, deleting session files and altering a user's profile is beyond passive scraping and carries data-loss/privacy risk. The SKILL.md does not adequately enumerate these destructive file operations.
Install Mechanism
There is no remote download/install step; the package lists Playwright as a dependency (package.json/lock present) and the SKILL.md instructs installing Playwright via npm. No external URLs or extract-from-URL installations were used.
Credentials
The skill metadata declares no required config paths or credentials, yet the script directly reads and modifies standard Chrome profile paths (system default and skill-local profiles). Access to those profile directories can expose sensitive cookies, session tokens, and other private data. The fact these filesystem accesses are not declared in the registry metadata is an incoherence and raises privacy risk.
Persistence & Privilege
The skill is not always-enabled and does not request special agent privileges. Still, it modifies user files in the browser profile (deleting lock/session files). That is a non-trivial privilege to exercise on a user's machine and should be considered before running.
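How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install browser-scraper
  3. Once installed, invoke the Skill by name or trigger it with /browser-scraper
  4. Provide the inputs described in the Skill's parameter documentation to get structured output
Version History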
v1.0.0
Initial release of browser-scraper.
  • Enables scraping of websites using a real Chrome browser and user Chrome profile to bypass bot detection and access authenticated content.
  • Supports both default system Chrome profiles and custom named profiles for isolated sessions.
  • Offers optional features: headless mode, adjustable wait times for dynamic content, and interactive mode keeping the browser open.
  • Outputs extracted data as JSON and saves page screenshots.
  • Requires Playwright and a local Chrome/Chromium installation.
  • Includes troubleshooting and usage tips for avoiding profile/lock conflicts and improving scrape results.
Metadata
Slug: browser-scraper
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Historical versions: 1
FAQ

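What is browser scraper?

Scrape websites using a real Chrome browser with the user's Chrome profile; shares cookies, auth, and fingerprint to bypass bot detection (Cloudflare, Reddi... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 116 total downloads so far.

How do I install browser scraper?

Run the command "/install browser-scraper" in the OpenClaw or Claude Code chat box to install it in one step, with no additional configuration.

Is browser scraper free?

Yes. browser scraper is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.

Which platforms does browser scraper support?

browser scraper is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed browser scraper?

It is developed and maintained by neekey (@neekey). The current version is v1.0.0.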
