neekey

browser scraper

by neekey · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
116 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install browser-scraper
Description
Scrape websites using a real Chrome browser with the user's Chrome profile — shares cookies, auth, and fingerprint to bypass bot detection (Cloudflare, Reddi...
README (SKILL.md)

Browser Scraper

Scrapes web pages using Playwright with a real Chrome/Chromium binary and an existing user profile. Bypasses bot detection by sharing existing cookies, fingerprint, and session.

Profiles

The scraper supports multiple Chrome profiles:

  • Default (no --profile flag): Uses the system's default Chrome profile

    • macOS: ~/Library/Application Support/Google/Chrome/Default
    • Linux: ~/.config/google-chrome/Default
    • Windows: %LOCALAPPDATA%\Google\Chrome\User Data\Default
  • Named profile (--profile <name>): Uses profiles/<name>/ under the skill directory

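    • Create a profile by launching Chrome with --user-data-dir pointing at profiles/<name>/ (the --profile-directory flag only selects a subfolder of Chrome's own user data directory), log in as needed, then close Chrome before scraping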
    • Useful for: isolating logins, avoiding conflicts with your main Chrome session, scraping without auth
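The profile rules above can be sketched as a pure function. This is a hypothetical helper (`resolveProfileDir` is not taken from the skill's code): it only builds the paths listed above, and the profile-name check is an added assumption that keeps named profiles inside profiles/.

```javascript
// Hypothetical helper: map the --profile flag to a profile directory,
// using the locations listed under Profiles. Pure string logic, no I/O.
function resolveProfileDir({ profile, platform, env, skillDir }) {
  if (profile) {
    // Assumption: reject names like "../x" so a profile cannot escape profiles/
    if (!/^[A-Za-z0-9_-]+$/.test(profile)) {
      throw new Error(`Invalid profile name: ${profile}`);
    }
    return `${skillDir}/profiles/${profile}`;
  }
  // No --profile flag: fall back to the system's default Chrome profile
  switch (platform) {
    case 'darwin':
      return `${env.HOME}/Library/Application Support/Google/Chrome/Default`;
    case 'linux':
      return `${env.HOME}/.config/google-chrome/Default`;
    case 'win32':
      return `${env.LOCALAPPDATA}\\Google\\Chrome\\User Data\\Default`;
    default:
      throw new Error(`Unsupported platform: ${platform}`);
  }
}
```

For example, `resolveProfileDir({ profile: 'work', platform: process.platform, env: process.env, skillDir: process.cwd() })` yields `<cwd>/profiles/work`, which matches running the script from the skill directory.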

Script

# Default profile (system Chrome)
node scripts/scrape.mjs <url> [css_selector]

# Named profile (profiles/<name>/)
node scripts/scrape.mjs <url> [css_selector] --profile <name>

# Headless mode (faster, higher block risk)
node scripts/scrape.mjs <url> --headless --profile <name>

# Keep browser open after scraping (for interactive use)
node scripts/scrape.mjs <url> --profile <name> --keep-open

# Extra wait for lazy-loaded content (default: 3000ms)
node scripts/scrape.mjs <url> --profile <name> --wait 6000
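A minimal sketch of how the flags above could be parsed, assuming the positional order `<url> [css_selector]` and the 3000 ms default noted in the comments. `parseScrapeArgs` is a hypothetical name; the real scripts/scrape.mjs may behave differently, for instance around unknown flags.

```javascript
// Hypothetical argument parser for the flags listed above.
// Call with process.argv.slice(2).
function parseScrapeArgs(argv) {
  const opts = { url: null, selector: null, profile: null, headless: false, keepOpen: false, waitMs: 3000 };
  const positionals = [];
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === '--headless') {
      opts.headless = true;
    } else if (arg === '--keep-open') {
      opts.keepOpen = true;
    } else if (arg === '--profile' || arg === '--wait') {
      // Both flags consume the next token as their value
      const value = argv[++i];
      if (value === undefined || value.startsWith('--')) {
        throw new Error(`${arg} needs a value`);
      }
      if (arg === '--profile') {
        opts.profile = value;
      } else {
        const ms = Number(value);
        if (!Number.isInteger(ms) || ms < 0) throw new Error(`--wait expects milliseconds, got "${value}"`);
        opts.waitMs = ms;
      }
    } else if (arg.startsWith('--')) {
      throw new Error(`Unknown flag: ${arg}`);
    } else {
      positionals.push(arg);
    }
  }
  if (positionals.length < 1 || positionals.length > 2) {
    throw new Error('Usage: scrape.mjs <url> [css_selector] [--profile <name>] [--headless] [--keep-open] [--wait <ms>]');
  }
  opts.url = positionals[0];
  opts.selector = positionals[1] ?? null;
  return opts;
}
```

Rejecting unknown flags and non-numeric `--wait` values is a design choice here: a typo fails loudly instead of silently running with defaults.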

Run from the skill directory:

cd ~/.openclaw-yekeen/workspace/skills/browser-scraper/
node scripts/scrape.mjs https://www.reddit.com/

Output

  • JSON to stdout: matched elements or page preview
  • Screenshot saved to /tmp/browser-scraper-last.png

Key Design

  • channel: 'chrome' — launches real Chrome when available, falls back to system Chromium
  • launchPersistentContext with the profile directory
  • --disable-blink-features=AutomationControlled + navigator.webdriver patch
  • headless: false by default to avoid SingletonLock conflicts
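The Key Design points map onto Playwright's `launchPersistentContext(userDataDir, options)` roughly as sketched below. This is not the skill's code: `launchProfile` is a hypothetical name, the retry-on-failure strategy is an assumption about how the Chromium fallback might work, and the navigator.webdriver patch is left out. `chromium` is passed in so the fallback logic can be exercised without a browser.

```javascript
// Sketch of the Key Design as Playwright launch options.
// `chromium` is Playwright's BrowserType (e.g. import { chromium } from 'playwright').
async function launchProfile(chromium, userDataDir, { headless = false } = {}) {
  const base = {
    headless, // false by default, as in the Key Design list
    args: ['--disable-blink-features=AutomationControlled'],
  };
  try {
    // channel: 'chrome' asks Playwright for the installed Google Chrome build
    return await chromium.launchPersistentContext(userDataDir, { ...base, channel: 'chrome' });
  } catch (err) {
    // Assumed fallback: retry without a channel, i.e. Playwright's own Chromium
    console.error(`Chrome launch failed (${err.message}); retrying with Chromium`);
    return await chromium.launchPersistentContext(userDataDir, base);
  }
}
```

Note that dropping `channel` selects Playwright's bundled Chromium, not a system Chromium; using a system binary would need the `executablePath` option instead. If the first attempt failed because of a lock conflict, the retry fails too and the error propagates.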

Requirements

  • Playwright installed: npm install playwright
  • Chrome or Chromium installed on the system
  • The channel: 'chrome' option requires Google Chrome (not Chromium) to be installed; Chromium only serves as the fallback
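A hypothetical preflight check for these requirements, plus the Node >=18 note from Usage Guidance. The Chrome paths are common default install locations, not an exhaustive list, and `exists` is injected (pass `fs.existsSync`) so the function does not depend on a module format.

```javascript
// Hypothetical preflight: returns a list of problems (empty means OK).
function preflight({ nodeVersion, platform, env, exists }) {
  const problems = [];
  const major = Number(String(nodeVersion).replace(/^v/, '').split('.')[0]);
  if (!(major >= 18)) problems.push(`Node ${nodeVersion} found; >= 18 needed`);

  // Typical Google Chrome install locations (assumption: default installers)
  const candidates = {
    darwin: ['/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'],
    linux: ['/usr/bin/google-chrome', '/usr/bin/google-chrome-stable', '/opt/google/chrome/chrome'],
    win32: [
      `${env.PROGRAMFILES}\\Google\\Chrome\\Application\\chrome.exe`,
      `${env.LOCALAPPDATA}\\Google\\Chrome\\Application\\chrome.exe`,
    ],
  }[platform] ?? [];

  if (!candidates.some((p) => exists(p))) {
    problems.push("Google Chrome not found in common locations; channel: 'chrome' will likely fail");
  }
  return problems;
}
```

Usage: `preflight({ nodeVersion: process.versions.node, platform: process.platform, env: process.env, exists: fs.existsSync })`.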

Tips

  • Chrome must not already be open with the target profile (SingletonLock error). Close Chrome first, or use a named profile to avoid conflicts.
  • If you get a SingletonLock error with a named profile, make sure no Chrome window is still using that profile, then delete the SingletonLock file in that profile directory and try again.
  • Use --keep-open to leave the browser open for interactive use after scraping — Ctrl+C to close.
  • For sites with lazy-loaded content, use the --wait <ms> flag or modify the script to increase waitForTimeout
  • For Reddit: use selector shreddit-post and read attributes (post-title, author, score, permalink)
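  • To create a fresh isolated profile: run Chrome from the terminal with --user-data-dir=<skill dir>/profiles/<name> (--profile-directory only picks a profile inside Chrome's existing user data directory), log in, close Chrome, then run the scraper with --profile <name>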
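The Reddit tip can be sketched as a page function. `extractRedditPosts` is a hypothetical name; the attribute names come from the tip above, and values are returned as the raw attribute strings, so `score` is not converted to a number.

```javascript
// Hypothetical extractor: read the attributes named in the Reddit tip from
// shreddit-post elements. Only uses getAttribute(), so it is self-contained.
function extractRedditPosts(elements) {
  return elements.map((el) => ({
    title: el.getAttribute('post-title'),
    author: el.getAttribute('author'),
    score: el.getAttribute('score'), // raw string, or null if absent
    permalink: el.getAttribute('permalink'),
  }));
}
```

Because the function references nothing outside its argument, it can be passed straight to Playwright, e.g. `await page.$$eval('shreddit-post', extractRedditPosts)`, which serializes it and runs it in the page.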
Usage Guidance
This skill intentionally launches Chrome with your profile and will share cookies, auth tokens and browser fingerprint to evade bot detection — that means any site you visit via the skill can observe your logged-in session. The script also deletes 'SingletonLock' and session files inside the profile directory to avoid launch conflicts; that can remove session state or cause unexpected browser behavior. Before using: (1) review the code yourself or run it in an isolated account/container, (2) prefer using a named skill-local profile instead of your system default profile to avoid exposing your main browsing sessions, (3) back up your Chrome profile if you plan to run it against your default profile, (4) ensure you have Node >=18 and install Playwright per SKILL.md, and (5) do not run it under privileged accounts. The skill's registry entry did not declare these filesystem accesses or destructive actions — treat that omission as a red flag and proceed cautiously.
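If you would rather not have lock files deleted automatically, a non-destructive alternative is to report them first. This hypothetical helper checks for SingletonLock and its companions SingletonSocket and SingletonCookie (which Chrome also creates on macOS and Linux). It uses `lstat` rather than `existsSync` because SingletonLock is normally a symlink whose target is not a real file, so `existsSync` would report it as missing. `lstatSync` is injected (pass `fs.lstatSync`).

```javascript
// Hypothetical, non-destructive lock check: lists lock files, deletes nothing.
function findProfileLocks(profileDir, lstatSync) {
  const names = ['SingletonLock', 'SingletonSocket', 'SingletonCookie'];
  return names.filter((name) => {
    try {
      // lstat does not follow the symlink, so a dangling link still counts
      lstatSync(`${profileDir}/${name}`);
      return true;
    } catch (err) {
      if (err.code === 'ENOENT') return false;
      throw err; // permission errors etc. should surface, not be hidden
    }
  });
}
```

A non-empty result from `findProfileLocks(dir, fs.lstatSync)` suggests Chrome may still be running with that profile; closing Chrome is safer than removing the files.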
Capability Assessment
Purpose & Capability
The code and SKILL.md align with the declared purpose: it launches Playwright with a real Chrome profile to share cookies/auth and patch navigator.webdriver. However, the implementation deletes stale lock and session files in the target profile directory (unlinkSync calls). This is functionally related to using a persistent profile, but it is a potentially destructive side-effect that users may not expect.
Instruction Scope
The runtime instructions and script access the user's system Chrome profile directories, clean up (delete) SingletonLock/Session files, and may read cookies/auth state implicitly by launching a persistent profile. While reading the profile is part of bypassing bot detection, deleting session files and altering a user's profile is beyond passive scraping and carries data-loss/privacy risk. The SKILL.md does not adequately enumerate these destructive file operations.
Install Mechanism
There is no remote download/install step; the package lists Playwright as a dependency (package.json/lock present) and the SKILL.md instructs installing Playwright via npm. No external URLs or extract-from-URL installations were used.
Credentials
The skill metadata declares no required config paths or credentials, yet the script directly reads and modifies standard Chrome profile paths (system default and skill-local profiles). Access to those profile directories can expose sensitive cookies, session tokens, and other private data. The fact these filesystem accesses are not declared in the registry metadata is an incoherence and raises privacy risk.
Persistence & Privilege
The skill is not always-enabled and does not request special agent privileges. Still, it modifies user files in the browser profile (deleting lock/session files). That is a non-trivial privilege to exercise on a user's machine and should be considered before running.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install browser-scraper
  3. After installation, invoke the skill by name or use /browser-scraper
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release of browser-scraper.
  • Enables scraping of websites using a real Chrome browser and user Chrome profile to bypass bot detection and access authenticated content.
  • Supports both default system Chrome profiles and custom named profiles for isolated sessions.
  • Offers optional features: headless mode, adjustable wait times for dynamic content, and interactive mode keeping the browser open.
  • Outputs extracted data as JSON and saves page screenshots.
  • Requires Playwright and a local Chrome/Chromium installation.
  • Includes troubleshooting and usage tips for avoiding profile/lock conflicts and improving scrape results.
Metadata
Slug browser-scraper
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is browser scraper?

Scrape websites using a real Chrome browser with the user's Chrome profile — shares cookies, auth, and fingerprint to bypass bot detection (Cloudflare, Reddi... It is an AI Agent Skill for Claude Code / OpenClaw, with 116 downloads so far.

How do I install browser scraper?

Run "/install browser-scraper" in the OpenClaw or Claude Code chat to install it in one step. The scraper itself still needs Playwright and a local Chrome or Chromium installation (see Requirements).

Is browser scraper free?

Yes, browser scraper is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does browser scraper support?

browser scraper is marked cross-platform and runs anywhere OpenClaw / Claude Code is available, as long as Playwright and Chrome or Chromium are installed.

Who created browser scraper?

It is built and maintained by neekey (@neekey); the current version is v1.0.0.
