Description

Automate web browser interactions using natural language via CLI commands. Use when the user asks to browse websites, navigate web pages, extract data from w...

README (SKILL.md)

Browser Automation

Name: Browse 2.0.2
Author: radical7vii

Automate browser interactions using the browse CLI with Claude.

Setup check

Before running any browser commands, verify the CLI is available:

which browse || npm install -g @browserbasehq/browse-cli
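
The same check can be written as a small guard function. This is a minimal sketch, assuming a POSIX shell; `have_cmd` and `ensure_browse` are hypothetical helper names, and the only install command used is the one shown above.

```shell
# Hypothetical helpers; these names are not part of the browse CLI.

# have_cmd NAME: succeed if NAME resolves to a command in the current shell.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# ensure_browse: install the CLI only when it is missing, using the
# npm command from the setup check above.
ensure_browse() {
  if have_cmd browse; then
    return 0
  fi
  npm install -g @browserbasehq/browse-cli
}
```

Calling `ensure_browse` once at the top of a script has the same effect as the one-liner. `command -v` is specified by POSIX, whereas `which` is an external tool that is not present on every system.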

Environment Selection (Local vs Remote)

The CLI automatically selects between local and remote browser environments based on available configuration:

Local mode (default)

Uses local Chrome — no API keys needed
Best for: development, simple pages, trusted sites with no bot protection

Remote mode (Browserbase)

Activated when BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID are set
Provides: anti-bot stealth, automatic CAPTCHA solving, residential proxies, session persistence
Use remote mode when: the target site has bot detection, CAPTCHAs, IP rate limiting, Cloudflare protection, or requires geo-specific access
Get credentials at https://browserbase.com/settings
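
To predict which mode a shell session will get, the rule above can be mirrored in a few lines. This is a sketch only: `expected_mode` is a hypothetical helper, it treats an empty variable as unset (an assumption), and it does not reproduce the CLI's own logic.

```shell
# expected_mode: print "remote" when both Browserbase variables are set and
# non-empty, otherwise "local". Treating empty values as unset is an
# assumption; the CLI itself remains the authority on the active mode.
expected_mode() {
  if [ -n "${BROWSERBASE_API_KEY:-}" ] && [ -n "${BROWSERBASE_PROJECT_ID:-}" ]; then
    echo remote
  else
    echo local
  fi
}
```

Once the daemon is running, `browse env` reports the environment actually in use.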

When to choose which

Simple browsing (docs, wikis, public APIs): local mode is fine
Protected sites (login walls, CAPTCHAs, anti-scraping): use remote mode
If local mode fails with bot detection or access denied: switch to remote mode

Commands

All commands work identically in both modes. The daemon auto-starts on first command.

Navigation

browse open <url>                        # Go to URL (alias: goto)
browse reload                            # Reload current page
browse back                              # Go back in history
browse forward                           # Go forward in history

Page state (prefer snapshot over screenshot)

browse snapshot                          # Get accessibility tree with element refs (fast, structured)
browse screenshot [path]                 # Take visual screenshot (slow, uses vision tokens)
browse get url                           # Get current URL
browse get title                         # Get page title
browse get text <selector>               # Get text content (use "body" for all text)
browse get html <selector>               # Get HTML content of element
browse get value <selector>              # Get form field value

Use browse snapshot as your default for understanding page state — it returns the accessibility tree with element refs you can use to interact. Only use browse screenshot when you need visual context (layout, images, debugging).

Interaction

browse click <ref>                       # Click element by ref from snapshot (e.g., @0-5)
browse type <text>                       # Type text into focused element
browse fill <selector> <value>           # Fill input and press Enter
browse select <selector> <values...>     # Select dropdown option(s)
browse press <key>                       # Press key (Enter, Tab, Escape, Cmd+A, etc.)
browse drag <fromX> <fromY> <toX> <toY>  # Drag from one point to another
browse scroll <x> <y> <deltaX> <deltaY>  # Scroll at coordinates
browse highlight <selector>              # Highlight element on page
browse is visible <selector>             # Check if element is visible
browse is checked <selector>             # Check if element is checked
browse wait <type> [arg]                 # Wait for: load, selector, timeout
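
Several of these commands can be chained into a single step, for example submitting a search box and reading the result. The following is a minimal sketch: `search_and_read` is a hypothetical helper, the selectors are supplied by the caller, and the `browse wait selector <selector>` form is an assumption based on the `browse wait <type> [arg]` syntax listed above.

```shell
# search_and_read URL INPUT_SELECTOR QUERY RESULT_SELECTOR
# Hypothetical helper; stops at the first command that fails.
search_and_read() {
  browse open "$1" || return 1
  browse fill "$2" "$3" || return 1        # fills the input and presses Enter
  browse wait selector "$4" || return 1    # assumed form of: browse wait <type> [arg]
  browse get text "$4"
}
```

A call might look like `search_and_read https://example.com '#q' 'hello world' '#results'`, where both selectors are placeholders for whatever the target page uses.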

Session management

browse stop                              # Stop the browser daemon
browse status                            # Check daemon status (includes env)
browse env                               # Show current environment (local or remote)
browse env local                         # Switch to local Chrome
browse env remote                        # Switch to Browserbase (requires API keys)
browse pages                             # List all open tabs
browse tab_switch <index>                # Switch to tab by index
browse tab_close [index]                 # Close tab

Typical workflow

1. browse open <url>: navigate to the page
2. browse snapshot: read the accessibility tree to understand page structure and get element refs
3. browse click <ref> / browse type <text> / browse fill <selector> <value>: interact using refs from snapshot
4. browse snapshot: confirm the action worked
5. Repeat steps 3-4 as needed
6. browse stop: close the browser when done

Quick Example

browse open https://example.com
browse snapshot                          # see page structure + element refs
browse click @0-5                        # click element with ref 0-5
browse get title
browse stop

Mode Comparison

Feature	Local	Browserbase
Speed	Faster	Slightly slower
Setup	Chrome required	API key required
Stealth mode	No	Yes (custom Chromium, anti-bot fingerprinting)
CAPTCHA solving	No	Yes (automatic reCAPTCHA/hCaptcha)
Residential proxies	No	Yes (201 countries, geo-targeting)
Session persistence	No	Yes (cookies/auth persist across sessions)
Best for	Development/simple pages	Protected sites, bot detection, production scraping

Best Practices

Always browse open first before interacting
Use browse snapshot to check page state — it's fast and gives you element refs
Only screenshot when visual context is needed (layout checks, images, debugging)
Use refs from snapshot to click/interact — e.g., browse click @0-5
browse stop when done to clean up the browser session
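
One way to make the final cleanup step hard to forget is to wrap a session in a function that always stops the daemon, even when an earlier command fails. This is a sketch under that assumption; `with_browser` is a hypothetical helper name and uses only the commands listed above.

```shell
# with_browser URL: open URL, print the snapshot, then stop the daemon
# whether or not the earlier commands succeeded.
with_browser() {
  browse open "$1" && browse snapshot
  wb_status=$?            # remember the result of open/snapshot
  browse stop             # cleanup always runs
  return "$wb_status"
}
```

`with_browser https://example.com` returns the exit status of the open/snapshot step, so callers can still detect a failed navigation after the cleanup has run.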

Troubleshooting

"No active page": Run browse stop, then check browse status. If it still says running, kill the zombie daemon with pkill -f "browse.*daemon", then retry browse open
Chrome not found: Install Chrome or use browse env remote
Action fails: Run browse snapshot to see available elements and their refs
Browserbase fails: Verify API key and project ID are set
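
The "No active page" recovery steps can be scripted roughly as follows. This is a sketch with a loud assumption: the output format of `browse status` is not documented here, so the match on the words "running" and "not running" is a guess that may need adjusting. `recover_daemon` is a hypothetical helper name.

```shell
# recover_daemon URL: follow the "No active page" steps above.
# ASSUMPTION: `browse status` prints "running" while the daemon is alive
# and "not running" (or no "running" at all) once it has stopped.
recover_daemon() {
  browse stop
  case "$(browse status 2>&1)" in
    *"not running"*) ;;                      # assumed wording for a stopped daemon
    *running*) pkill -f "browse.*daemon" ;;  # kill the zombie daemon
  esac
  browse open "$1"
}
```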

Switching to Remote Mode

Switch to remote when you detect: CAPTCHAs (reCAPTCHA, hCaptcha, Turnstile), bot detection pages ("Checking your browser..."), HTTP 403/429, empty pages on sites that should have content, or the user asks for it.

Don't switch for simple sites (docs, wikis, public APIs, localhost).

browse env remote            # switch to Browserbase
browse env local             # switch back to local Chrome

The switch is sticky until you run browse stop or switch again.
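
The detection signals listed above can be approximated with a simple check. This is a minimal sketch: `needs_remote` is a hypothetical helper, the caller must supply the HTTP status and page text from whatever source is available, and the matched phrases are examples rather than a complete list. The empty-page signal is not covered.

```shell
# needs_remote STATUS [PAGE_TEXT]: succeed when the status code or page text
# shows one of the signals listed above (HTTP 403/429, a "Checking your
# browser" interstitial, or any mention of a CAPTCHA).
needs_remote() {
  case "${1:-}" in
    403|429) return 0 ;;
  esac
  case "${2:-}" in
    *"Checking your browser"*|*[Cc][Aa][Pp][Tt][Cc][Hh][Aa]*) return 0 ;;
  esac
  return 1
}
```

For example, `needs_remote "" "$(browse get text body)" && browse env remote` would switch modes only when the current page text looks like a challenge page.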

For detailed examples, see EXAMPLES.md. For API reference, see REFERENCE.md.

Usage Guidance

This skill appears to do what it says: it installs a CLI (via npm) and runs browser automation commands. Before installing: (1) verify the npm package name and publisher (@browserbasehq) and review the package repository and maintainers; (2) be cautious installing global npm CLIs (they run code on your machine and may include postinstall scripts); (3) understand that switching to remote Browserbase mode requires giving API keys to a third-party service which will see your browsing activity and may persist sessions/cookies — only provide those keys if you trust the provider and understand its privacy/terms; (4) local mode requires Chrome/Chromium present (ensure that dependency); (5) avoid using remote mode to scrape sites where you lack permission — there are legal and ethical risks; (6) stop the daemon and clear captured network data when finished (use `browse stop` and `browse network clear`).

Capability Analysis

Type: OpenClaw Skill
Name: browse-2-0-2
Version: 1.0.0

The skill bundle provides a legitimate and well-documented interface for the `@browserbasehq/browse-cli` tool, enabling browser automation and web scraping. It includes comprehensive instructions in `SKILL.md` and `REFERENCE.md` for navigation, element interaction via accessibility trees, and session management. While the tool utilizes high-risk capabilities such as JavaScript evaluation (`browse eval`), network capture, and environment variable management for API keys, these features are standard for browser automation and are strictly aligned with the stated purpose. No evidence of malicious intent, unauthorized data exfiltration, or harmful prompt injection was found.

Capability Assessment

✓ Purpose & Capability

Name/description, required binary 'browse', and the install (npm package @browserbasehq/browse-cli) align with a CLI browser-automation tool. The SKILL.md repeatedly documents local Chrome vs a Browserbase remote service, which is coherent with the stated capability. Minor note: the skill expects a local Chrome/Chromium for local mode but does not declare the browser binary as a required system dependency.

ℹ Instruction Scope

Instructions limit the agent to running the browse CLI and describe page navigation, snapshotting, clicking, typing, network capture to disk, and optional remote sessions. They do encourage escalation to a remote Browserbase service for bypassing bot protection and CAPTCHA solving — this is within the skill's browsing/scraping scope but increases privacy/abuse risk (e.g., scraping protected sites). The instructions do not ask the agent to read unrelated files or environment variables beyond the optional Browserbase keys.

ℹ Install Mechanism

Install uses an npm package (@browserbasehq/browse-cli) which will create the browse binary — this is a standard, traceable mechanism but carries the usual npm risk surface (post-install scripts, arbitrary code executed at install/run). No downloads from untrusted URLs or archive extracts are present.

✓ Credentials

No required environment variables are declared. The SKILL.md documents two optional env vars (BROWSERBASE_API_KEY, BROWSERBASE_PROJECT_ID) that are directly relevant to enabling remote Browserbase sessions; requiring those keys would be proportionate for remote operation. Be aware that supplying those keys hands browsing activity and session persistence to the third‑party service.

ℹ Persistence & Privilege

The skill does not request 'always: true' and is user-invocable. The browse CLI runs a background daemon that persists across commands and (if remote) the Browserbase service offers session persistence — this is expected for a browser tool but means cookies/session state and captured network files may persist until you explicitly stop/clear them.

Version History

v1.0.0

Initial release of the "browser" skill for natural language browser automation via CLI:

- Automates web browser interactions for browsing, navigation, data extraction, screenshots, form-filling, and more.
- Supports both local Chrome and remote Browserbase sessions with anti-bot stealth, CAPTCHA solving, and residential proxies.
- Provides a full command set for page interaction, session management, and state inspection using the `browse` CLI.
- Includes best practices, troubleshooting tips, and clear guidance on when to select local vs remote mode.
- Ideal for scraping protected sites, bypassing bot detection, and interacting with JavaScript-heavy pages.

Metadata

Slug browse-2-0-2

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Browse 2.0.2?

Automate web browser interactions using natural language via CLI commands. Use when the user asks to browse websites, navigate web pages, extract data from w... It is an AI Agent Skill for Claude Code / OpenClaw, with 191 downloads so far.

How do I install Browse 2.0.2?

Run "/install browse-2-0-2" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Browse 2.0.2 free?

Yes, Browse 2.0.2 is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Browse 2.0.2 support?

Browse 2.0.2 is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Browse 2.0.2?

It is built and maintained by Othniel su (@radical7vii); the current version is v1.0.0.
