Description

Browser automation CLI for AI agents using Google Chrome via CDP. Connects to a running Chrome instance started by chrome_for_openclaw.sh inside an XRDP sess...

README (SKILL.md)

Chrome Browser Skill (CDP Mode)

Name: Chrome CDP for OpenClaw
Author: joustonhuang

Read first: Complete setup guide, Chrome launcher script, and all reference docs at https://github.com/joustonhuang/chrome_for_openclaw

Browser automation via agent-browser connected to a real, running Google Chrome instance over CDP. All commands connect to Chrome at localhost:9222 — no separate Chromium download needed. Your existing Chrome login sessions (Gmail, GitHub, etc.) are immediately available.

Also install this skill in OpenClaw: openclaw-agent-browser-clawdbot — the original agent-browser skill that this one is based on and compatible with.

Prerequisites

Everything is handled by a single script: chrome_for_openclaw.sh.

Step 1 — One-time system setup (run once, requires sudo)

Installs Google Chrome if missing, then configures XRDP + XFCE. OpenClaw can run this step itself:

bash \x3C(curl -fsSL https://raw.githubusercontent.com/joustonhuang/chrome_for_openclaw/main/chrome_for_openclaw.sh) --install

Other options:

# Reinstall everything from scratch
bash \x3C(curl -fsSL https://raw.githubusercontent.com/joustonhuang/chrome_for_openclaw/main/chrome_for_openclaw.sh) --reinstall

# Undo all changes made by --install
bash \x3C(curl -fsSL https://raw.githubusercontent.com/joustonhuang/chrome_for_openclaw/main/chrome_for_openclaw.sh) --uninstall

After --install completes, log out and reconnect via RDP to get a fresh XFCE session.

Step 2 — Start Chrome (run when Chrome is not already running)

Launch Chrome in CDP debug mode. Only needed if Chrome is not already open. OpenClaw can run this step itself:

bash \x3C(curl -fsSL https://raw.githubusercontent.com/joustonhuang/chrome_for_openclaw/main/chrome_for_openclaw.sh)

Optional environment variable overrides:

DEBUG_PORT=9222 START_URL=https://gmail.com \
bash \x3C(curl -fsSL https://raw.githubusercontent.com/joustonhuang/chrome_for_openclaw/main/chrome_for_openclaw.sh)

This kills any existing Chrome processes, starts Google Chrome with --remote-debugging-port=9222, and waits for the CDP endpoint to respond.

Step 3 — Install agent-browser

WARNING: Do NOT install agent-browser by cloning or building from the GitHub repo (https://github.com/vercel-labs/agent-browser). Doing so is known to break XRDP on Debian/Ubuntu. Install via npm only:

npm install -g agent-browser
# Do NOT run `agent-browser install` — we connect to Chrome via CDP, no Chromium needed.

CDP Connection

Every command uses --cdp 9222 to connect to the running Chrome:

agent-browser --cdp 9222 \x3Ccommand>

To avoid repeating it, export once for the session:

export AGENT_BROWSER_CDP_URL=http://127.0.0.1:9222
agent-browser \x3Ccommand>   # automatically connects to Chrome

Auto-discovery also works if you are unsure of the port:

agent-browser --auto-connect \x3Ccommand>

Core Workflow

Every browser automation follows this pattern:

Navigate: agent-browser --cdp 9222 open \x3Curl>
Snapshot: agent-browser --cdp 9222 snapshot -i (get element refs like @e1, @e2)
Interact: Use refs to click, fill, select
Re-snapshot: After navigation or DOM changes, get fresh refs

export AGENT_BROWSER_CDP_URL=http://127.0.0.1:9222

agent-browser open https://example.com/form
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"

agent-browser fill @e1 "[email protected]"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait 2000
agent-browser snapshot -i  # Check result

Command Chaining

Commands can be chained with &&. The browser persists between commands via a background daemon.

agent-browser --cdp 9222 open https://example.com && agent-browser --cdp 9222 snapshot -i
agent-browser --cdp 9222 fill @e1 "[email protected]" && agent-browser --cdp 9222 fill @e2 "password123" && agent-browser --cdp 9222 click @e3

When to chain: Use && when you don't need to read intermediate output before proceeding. Run commands separately when you need to parse snapshot output to discover refs first.

Handling Authentication

Because Chrome is already running with your login sessions, you are already authenticated on sites you have previously logged into. No login flow needed in most cases.

Option 1: Use the existing Chrome session (default — zero setup)

# Chrome already has your Gmail/GitHub/etc. cookies — just open and automate
agent-browser --cdp 9222 open https://mail.google.com/mail/u/0/#inbox
agent-browser --cdp 9222 snapshot -i

Option 2: Save session state for use outside CDP

# Export auth from the running Chrome
agent-browser --cdp 9222 state save ./auth.json
# Use it later in a standalone agent-browser session
agent-browser --state ./auth.json open https://app.example.com/dashboard

Option 3: Chrome profile reuse

agent-browser --cdp 9222 profiles  # List profiles in the running Chrome

Option 4: Auth vault (for recurring credentials)

echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/login --username user --password-stdin
agent-browser --cdp 9222 auth login myapp

See references/authentication.md for OAuth, 2FA, and cookie patterns.

Essential Commands

# Batch: ALWAYS use batch for 2+ sequential commands
agent-browser --cdp 9222 batch "open https://example.com" "snapshot -i"
agent-browser --cdp 9222 batch "open https://example.com" "screenshot"
agent-browser --cdp 9222 batch "click @e1" "wait 1000" "screenshot"

# Navigation
agent-browser --cdp 9222 open \x3Curl>              # Navigate (aliases: goto, navigate)
agent-browser --cdp 9222 close                   # Close browser
agent-browser --cdp 9222 close --all             # Close all active sessions

# Snapshot
agent-browser --cdp 9222 snapshot -i             # Interactive elements with refs (recommended)
agent-browser --cdp 9222 snapshot -i --urls      # Include href URLs for links
agent-browser --cdp 9222 snapshot -s "#selector" # Scope to CSS selector

# Interaction (use @refs from snapshot)
agent-browser --cdp 9222 click @e1               # Click element
agent-browser --cdp 9222 click @e1 --new-tab     # Click and open in new tab
agent-browser --cdp 9222 fill @e2 "text"         # Clear and type text
agent-browser --cdp 9222 type @e2 "text"         # Type without clearing
agent-browser --cdp 9222 select @e1 "option"     # Select dropdown option
agent-browser --cdp 9222 check @e1               # Check checkbox
agent-browser --cdp 9222 press Enter             # Press key
agent-browser --cdp 9222 keyboard type "text"    # Type at current focus (no selector)
agent-browser --cdp 9222 scroll down 500         # Scroll page

# Get information
agent-browser --cdp 9222 get text @e1            # Get element text
agent-browser --cdp 9222 get url                 # Get current URL
agent-browser --cdp 9222 get title               # Get page title
agent-browser --cdp 9222 get cdp-url             # Get CDP WebSocket URL

# Wait
agent-browser --cdp 9222 wait @e1                # Wait for element
agent-browser --cdp 9222 wait 2000               # Wait milliseconds
agent-browser --cdp 9222 wait --url "**/page"    # Wait for URL pattern
agent-browser --cdp 9222 wait --text "Welcome"   # Wait for text to appear
agent-browser --cdp 9222 wait --fn "!document.body.innerText.includes('Loading...')"

# Downloads
agent-browser --cdp 9222 download @e1 ./file.pdf
agent-browser --cdp 9222 wait --download ./output.zip

# Tab management
agent-browser --cdp 9222 tab list
agent-browser --cdp 9222 tab new https://example.com
agent-browser --cdp 9222 tab 2
agent-browser --cdp 9222 tab close

# Network
agent-browser --cdp 9222 network requests
agent-browser --cdp 9222 network requests --type xhr,fetch
agent-browser --cdp 9222 network route "**/ads/*" --abort

# Capture
agent-browser --cdp 9222 screenshot
agent-browser --cdp 9222 screenshot --full
agent-browser --cdp 9222 screenshot --annotate
agent-browser --cdp 9222 pdf output.pdf

Batch Execution

ALWAYS use batch when running 2+ commands in sequence:

# Navigate and snapshot
agent-browser --cdp 9222 batch "open https://example.com" "snapshot -i"

# Navigate, snapshot, and screenshot
agent-browser --cdp 9222 batch "open https://example.com" "snapshot -i" "screenshot"

# Click, wait, screenshot
agent-browser --cdp 9222 batch "click @e1" "wait 1000" "screenshot"

# With --bail to stop on first error
agent-browser --cdp 9222 batch --bail "open https://example.com" "click @e1" "screenshot"

Only use a single command (not batch) when you need to read the output before deciding the next step — e.g., after snapshot -i to discover refs. After reading, batch the remaining steps.

Efficiency Strategies

Use --urls to avoid re-navigation. Get all href URLs from one snapshot, then open each directly instead of clicking refs and navigating back.

Snapshot once, act many times. Extract all refs from a single snapshot, then batch actions.

Multi-page workflow:

# 1. Get all URLs in one call
agent-browser --cdp 9222 batch "open https://news.ycombinator.com" "snapshot -i --urls"
# Read output to extract URLs, then visit each directly:
agent-browser --cdp 9222 batch "open https://github.com/example/repo" "screenshot"
agent-browser --cdp 9222 batch "open https://example.com/article" "screenshot"

Common Patterns

Form Submission

agent-browser --cdp 9222 batch "open https://example.com/signup" "snapshot -i"
# Read snapshot to identify form refs, then fill and submit
agent-browser --cdp 9222 batch "fill @e1 \"Jane Doe\"" "fill @e2 \"[email protected]\"" "click @e3" "wait 2000"

Gmail Workflow (already logged in via Chrome)

export AGENT_BROWSER_CDP_URL=http://127.0.0.1:9222

agent-browser batch "open https://mail.google.com/mail/u/0/#inbox" "wait 2000" "snapshot -i"
# Agent identifies Compose button ref @e4
agent-browser batch "click @e4" "wait 1000" "snapshot -i"
# Agent fills To/Subject/Body
agent-browser batch \
  "fill @e5 \"[email protected]\"" \
  "fill @e6 \"Subject\"" \
  "fill @e7 \"Body text\"" \
  "click @e8"

Save Auth State from Running Chrome

# Export your current Chrome login cookies for later reuse
agent-browser --cdp 9222 state save ./auth.json
# Reuse without CDP (standalone session)
agent-browser state load ./auth.json
agent-browser open https://app.example.com/dashboard

Working with Iframes

Iframe content is automatically inlined in snapshots. Refs inside iframes work directly.

agent-browser --cdp 9222 batch "open https://example.com/checkout" "snapshot -i"
# @e2 [Iframe] "payment-frame"
#   @e3 [input] "Card number"
#   @e4 [button] "Pay"
agent-browser --cdp 9222 batch "fill @e3 \"4111111111111111\"" "click @e4"

Parallel Sessions

agent-browser --cdp 9222 --session site1 open https://site-a.com
agent-browser --cdp 9222 --session site2 open https://site-b.com

agent-browser --cdp 9222 --session site1 snapshot -i
agent-browser --cdp 9222 --session site2 snapshot -i

agent-browser --cdp 9222 session list

Annotated Screenshots (Vision Mode)

agent-browser --cdp 9222 screenshot --annotate
# Output: image path + legend:
#   [1] @e1 button "Submit"
#   [2] @e2 link "Home"
agent-browser --cdp 9222 click @e2

Semantic Locators (Alternative to Refs)

agent-browser --cdp 9222 find text "Sign In" click
agent-browser --cdp 9222 find label "Email" fill "[email protected]"
agent-browser --cdp 9222 find role button click --name "Submit"
agent-browser --cdp 9222 find placeholder "Search" type "query"
agent-browser --cdp 9222 find testid "submit-btn" click

JavaScript Evaluation

agent-browser --cdp 9222 eval 'document.title'

# Complex JS — use --stdin to avoid shell escaping issues
agent-browser --cdp 9222 eval --stdin \x3C\x3C'EVALEOF'
JSON.stringify(Array.from(document.querySelectorAll("a")).map(a => a.href))
EVALEOF

Streaming

agent-browser --cdp 9222 stream enable           # Start WebSocket stream
agent-browser --cdp 9222 stream enable --port 9223
agent-browser --cdp 9222 stream status
agent-browser --cdp 9222 stream disable

Timeouts and Slow Pages

Default timeout: 25 seconds. Override with AGENT_BROWSER_DEFAULT_TIMEOUT (milliseconds).

open already waits for the page load event. Only add explicit waits for async content.

agent-browser --cdp 9222 wait "#content"        # Wait for element (preferred)
agent-browser --cdp 9222 wait 2000              # Fixed duration for slow SPAs
agent-browser --cdp 9222 wait --url "**/dashboard"
agent-browser --cdp 9222 wait --text "Results loaded"

Avoid wait --load networkidle — ad-heavy or websocket sites will hang. Use wait 2000 or wait \x3Cselector> instead.

Session Cleanup

agent-browser --cdp 9222 close --all            # Close all sessions
AGENT_BROWSER_IDLE_TIMEOUT_MS=60000 agent-browser --cdp 9222 open example.com

Security

# Wrap page content to protect against prompt injection
export AGENT_BROWSER_CONTENT_BOUNDARIES=1
agent-browser --cdp 9222 snapshot

# Restrict navigation to trusted domains
export AGENT_BROWSER_ALLOWED_DOMAINS="example.com,*.example.com"

# Gate destructive actions via policy file
export AGENT_BROWSER_ACTION_POLICY=./policy.json

Observability Dashboard

agent-browser --cdp 9222 dashboard start   # Start live dashboard on port 4848
agent-browser --cdp 9222 dashboard stop

Ref Lifecycle (Important)

Refs (@e1, @e2, …) are invalidated when the page changes. Always re-snapshot after:

Clicking links or buttons that navigate
Form submissions
Dynamic content loading (dropdowns, modals)

agent-browser --cdp 9222 click @e5       # Navigates
agent-browser --cdp 9222 snapshot -i     # MUST re-snapshot
agent-browser --cdp 9222 click @e1       # Use new refs

Installation Summary

# Step 1: One-time system setup (Chrome + XRDP + XFCE, requires sudo)
bash \x3C(curl -fsSL https://raw.githubusercontent.com/joustonhuang/chrome_for_openclaw/main/chrome_for_openclaw.sh) --install
# → Log out and reconnect via RDP after this completes

# Step 2: Launch Chrome in CDP debug mode (only if not already running)
bash \x3C(curl -fsSL https://raw.githubusercontent.com/joustonhuang/chrome_for_openclaw/main/chrome_for_openclaw.sh)

# Step 3: Install agent-browser via npm ONLY
# WARNING: Do NOT clone/build from https://github.com/vercel-labs/agent-browser
#          — it breaks XRDP on Debian/Ubuntu. Use npm only:
npm install -g agent-browser

# Step 4: Connect and automate
export AGENT_BROWSER_CDP_URL=http://127.0.0.1:9222
agent-browser batch "open https://example.com" "snapshot -i"

Deep-Dive Reference Docs

Reference	When to Use
references/commands.md	Full command reference with all options
references/snapshot-refs.md	Ref lifecycle, invalidation, troubleshooting
references/session-management.md	Parallel sessions, state persistence
references/authentication.md	Login flows, OAuth, 2FA, state reuse
references/video-recording.md	Recording workflows for debugging
references/profiling.md	Chrome DevTools performance profiling
references/proxy-support.md	Proxy configuration, geo-testing

Credits

Chrome launcher script: joustonhuang/chrome_for_openclaw

agent-browser CLI: vercel-labs/agent-browser (Apache 2.0)

Usage Guidance

This skill appears to do what it says (control a real Chrome via the DevTools protocol), but it asks you to run a remote installer as root and to expose Chrome's remote debugging port, which gives any local process full access to your browser sessions (cookies, localStorage, ability to execute JS). Before installing or running anything: 1) Review the entire install script (https://raw.githubusercontent.com/joustonhuang/chrome_for_openclaw/main/chrome_for_openclaw.sh) line-by-line — do not run curl|bash without inspection. 2) Prefer running the installer in an isolated VM or throwaway container, not on your primary machine. 3) Understand that connecting to localhost:9222 allows reading and controlling logged-in sessions; only use on trusted hosts and close Chrome when done. 4) Avoid saving session state files in source control, and use AGENT_BROWSER_ENCRYPTION_KEY if you must persist state. 5) If you need least privilege, consider alternatives that use ephemeral profiles or cloud browser providers rather than reusing your personal Chrome profile. If you want help auditing the install script or extracting specific risky commands to review, provide the script and I can analyze it line-by-line.

Capability Analysis

Type: OpenClaw Skill Name: chrome-for-openclaw Version: 1.0.0 The skill bundle provides browser automation capabilities by interfacing with a local Chrome instance via CDP. The most significant indicator is the instruction in SKILL.md for the agent to execute a remote shell script with sudo privileges using a 'curl | bash' pattern (targeting chrome_for_openclaw.sh on GitHub). While intended for environment setup (XRDP/Chrome), this is a high-risk execution pattern. Additionally, the documentation in references/commands.md and references/authentication.md details capabilities for arbitrary JavaScript execution (eval) and the extraction of sensitive session data (cookies and localStorage). These features, while functional for the stated purpose, represent a substantial attack surface for data exfiltration and remote code execution.

Capability Tags

cryptocan-make-purchasesrequires-oauth-token

Capability Assessment

ℹ Purpose & Capability

The name/description match the instructions: the skill uses agent-browser to connect to a real Chrome instance via CDP to reuse existing login sessions. However the registry metadata lists no required binaries/envs while SKILL.md and its internal metadata explicitly require the agent-browser command and mention many runtime environment variables (AGENT_BROWSER_CDP_URL, AGENT_BROWSER_PROFILE, AGENT_BROWSER_ENCRYPTION_KEY, HTTP(S)_PROXY, etc.). That inconsistency is a minor coherence issue but not necessarily malicious.

⚠ Instruction Scope

The runtime instructions tell the agent (or user) to fetch and execute a remote script (bash <(curl -fsSL https://raw.githubusercontent.com/.../chrome_for_openclaw.sh)), to run it with sudo (--install), to kill existing Chrome processes, and to start Chrome with --remote-debugging-port exposing full browser control on localhost. Those steps are directly related to the skill's purpose but they also grant the skill (and any local process that can reach localhost:9222) access to all cookies, storage, and the ability to execute JS in pages — a high-privilege action. The docs also suggest saving/restoring session state in plaintext files (auth.json) and encourage using env vars for credentials, which is sensitive and must be carefully handled.

⚠ Install Mechanism

There is no formal install spec in the registry; instead SKILL.md instructs executing a remote install script via curl|bash from raw.githubusercontent.com and installing agent-browser globally via npm -g. Running an arbitrary script fetched from a GitHub raw URL as root is a meaningful risk (it modifies system XRDP/XFCE configuration and Chrome behavior). While GitHub raw content is a common host, executing it directly with sudo is high-risk and should be reviewed manually before running.

ℹ Credentials

The skill declares no required environment variables in the registry but the instructions reference and recommend many env vars (DEBUG_PORT, START_URL, AGENT_BROWSER_CDP_URL, AGENT_BROWSER_PROFILE, AGENT_BROWSER_ENCRYPTION_KEY, HTTP_PROXY/HTTPS_PROXY/NO_PROXY, etc.). These are relevant to the declared functionality (connecting to CDP, profiles, proxies), but they are sensitive (especially AGENT_BROWSER_ENCRYPTION_KEY and proxy credentials). The guidance to save state files (which contain session tokens in plaintext) increases the risk if handled carelessly.

⚠ Persistence & Privilege

The skill does not request 'always: true', but the one-time install step requires sudo and modifies system components (installs Chrome if missing, configures XRDP + XFCE, and changes how Chrome is launched). It also suggests global npm -g install. These actions create persistent system changes and elevated privilege usage which are more intrusive than a lightweight instruction-only skill and should be performed only after manual review or within an isolated VM.

Version History

v1.0.0

Initial publish for Chrome CDP bootstrap and browser automation workflow.

Metadata

Slug chrome-for-openclaw

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Chrome CDP for OpenClaw?

Browser automation CLI for AI agents using Google Chrome via CDP. Connects to a running Chrome instance started by chrome_for_openclaw.sh inside an XRDP sess... It is an AI Agent Skill for Claude Code / OpenClaw, with 95 downloads so far.

How do I install Chrome CDP for OpenClaw?

Run "/install chrome-for-openclaw" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Chrome CDP for OpenClaw free?

Yes, Chrome CDP for OpenClaw is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Chrome CDP for OpenClaw support?

Chrome CDP for OpenClaw is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Chrome CDP for OpenClaw?

It is built and maintained by joustonhuang (@joustonhuang); the current version is v1.0.0.

More Skills

Chrome CDP for OpenClaw