Description

A fast Rust-based headless browser automation CLI with Node.js fallback that enables AI agents to navigate, click, type, and snapshot pages via structured co...

README (SKILL.md)

# Browser Automation with agent-browser

Name: Agent Browser - 浏览器自动化 (Browser Automation)
Author: cp3d1455926-svg

## Installation

### npm (recommended)

```bash
npm install -g agent-browser
agent-browser install
agent-browser install --with-deps
```

### From Source

```bash
git clone https://github.com/vercel-labs/agent-browser
cd agent-browser
pnpm install
pnpm build
agent-browser install
```

## Quick start

```bash
agent-browser open <url>        # Navigate to page
agent-browser snapshot -i       # Get interactive elements with refs
agent-browser click @e1         # Click element by ref
agent-browser fill @e2 "text"   # Fill input by ref
agent-browser close             # Close browser
```

## Core workflow

1. Navigate: `agent-browser open <url>`
2. Snapshot: `agent-browser snapshot -i` (returns elements with refs like `@e1`, `@e2`)
3. Interact using refs from the snapshot
4. Re-snapshot after navigation or significant DOM changes
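
The first two steps above can be wrapped in a small shell helper. This is a minimal sketch, not part of agent-browser: the function name `ab_open_snapshot` is hypothetical, it only chains commands documented in this README, and it assumes the CLI exits with a non-zero status when navigation fails.

```bash
# Hypothetical helper (not part of agent-browser): navigate, then list
# interactive elements so their refs (@e1, @e2, ...) can be used next.
ab_open_snapshot() {
  local url="$1"
  if [ -z "$url" ]; then
    echo "usage: ab_open_snapshot <url>" >&2
    return 2
  fi
  # Steps 1 and 2 of the core workflow; stop if navigation fails
  # (assumes a non-zero exit status on failure).
  agent-browser open "$url" || return 1
  agent-browser snapshot -i
}

# Example: ab_open_snapshot https://example.com/form
```

Steps 3 and 4 then use the printed refs, for example `agent-browser click @e1` followed by another `agent-browser snapshot -i` once the page changes.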

## Commands

### Navigation

```bash
agent-browser open <url>      # Navigate to URL
agent-browser back            # Go back
agent-browser forward         # Go forward
agent-browser reload          # Reload page
agent-browser close           # Close browser
```

### Snapshot (page analysis)

```bash
agent-browser snapshot            # Full accessibility tree
agent-browser snapshot -i         # Interactive elements only (recommended)
agent-browser snapshot -c         # Compact output
agent-browser snapshot -d 3       # Limit depth to 3
agent-browser snapshot -s "#main" # Scope to CSS selector
```

### Interactions (use @refs from snapshot)

```bash
agent-browser click @e1           # Click
agent-browser dblclick @e1        # Double-click
agent-browser focus @e1           # Focus element
agent-browser fill @e2 "text"     # Clear and type
agent-browser type @e2 "text"     # Type without clearing
agent-browser press Enter         # Press key
agent-browser press Control+a     # Key combination
agent-browser keydown Shift       # Hold key down
agent-browser keyup Shift         # Release key
agent-browser hover @e1           # Hover
agent-browser check @e1           # Check checkbox
agent-browser uncheck @e1         # Uncheck checkbox
agent-browser select @e1 "value"  # Select dropdown
agent-browser scroll down 500     # Scroll page
agent-browser scrollintoview @e1  # Scroll element into view
agent-browser drag @e1 @e2        # Drag and drop
agent-browser upload @e1 file.pdf # Upload files
```

### Get information

```bash
agent-browser get text @e1        # Get element text
agent-browser get html @e1        # Get innerHTML
agent-browser get value @e1       # Get input value
agent-browser get attr @e1 href   # Get attribute
agent-browser get title           # Get page title
agent-browser get url             # Get current URL
agent-browser get count ".item"   # Count matching elements
agent-browser get box @e1         # Get bounding box
```

### Check state

```bash
agent-browser is visible @e1      # Check if visible
agent-browser is enabled @e1      # Check if enabled
agent-browser is checked @e1      # Check if checked
```

### Screenshots & PDF

```bash
agent-browser screenshot          # Screenshot to stdout
agent-browser screenshot path.png # Save to file
agent-browser screenshot --full   # Full page
agent-browser pdf output.pdf      # Save as PDF
```

### Video recording

```bash
agent-browser record start ./demo.webm    # Start recording (uses current URL + state)
agent-browser click @e1                   # Perform actions
agent-browser record stop                 # Stop and save video
agent-browser record restart ./take2.webm # Stop current + start new recording
```

Recording creates a fresh context but preserves cookies/storage from your session. If no URL is provided, it automatically returns to your current page. For smooth demos, explore first, then start recording.
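
The "explore first, then record" advice can be sketched as a helper that replays a list of clicks while recording. The function name `ab_record_clicks` and its arguments are hypothetical. Because recording creates a fresh context, this sketch clicks by visible text with `find text ... click` rather than by ref, on the assumption that refs from an earlier snapshot may not carry over.

```bash
# Hypothetical helper: record a clip while clicking elements by visible text.
# Usage: ab_record_clicks <output.webm> <text> [<text> ...]
ab_record_clicks() {
  if [ "$#" -lt 2 ]; then
    echo "usage: ab_record_clicks <output.webm> <text> [<text> ...]" >&2
    return 2
  fi
  local out="$1"
  shift
  agent-browser record start "$out" || return 1
  local label
  for label in "$@"; do
    # Semantic locator, so no ref from a previous snapshot is needed.
    agent-browser find text "$label" click
  done
  # Stop unconditionally so the recording is closed even if a click failed.
  agent-browser record stop
}

# Example: ab_record_clicks ./demo.webm "Sign In" "Submit"
```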

### Wait

```bash
agent-browser wait @e1                     # Wait for element
agent-browser wait 2000                    # Wait milliseconds
agent-browser wait --text "Success"        # Wait for text
agent-browser wait --url "/dashboard"      # Wait for URL pattern
agent-browser wait --load networkidle      # Wait for network idle
agent-browser wait --fn "window.ready"     # Wait for JS condition
```

### Mouse control

```bash
agent-browser mouse move 100 200      # Move mouse
agent-browser mouse down left         # Press button
agent-browser mouse up left           # Release button
agent-browser mouse wheel 100         # Scroll wheel
```

### Semantic locators (alternative to refs)

```bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@example.com"
agent-browser find first ".item" click
agent-browser find nth 2 "a" text
```

### Browser settings

```bash
agent-browser set viewport 1920 1080      # Set viewport size
agent-browser set device "iPhone 14"      # Emulate device
agent-browser set geo 37.7749 -122.4194   # Set geolocation
agent-browser set offline on              # Toggle offline mode
agent-browser set headers '{"X-Key":"v"}' # Extra HTTP headers
agent-browser set credentials user pass   # HTTP basic auth
agent-browser set media dark              # Emulate color scheme
```

### Cookies & Storage

```bash
agent-browser cookies                     # Get all cookies
agent-browser cookies set name value      # Set cookie
agent-browser cookies clear               # Clear cookies
agent-browser storage local               # Get all localStorage
agent-browser storage local key           # Get specific key
agent-browser storage local set k v       # Set value
agent-browser storage local clear         # Clear all
```

### Network

```bash
agent-browser network route <url>              # Intercept requests
agent-browser network route <url> --abort      # Block requests
agent-browser network route <url> --body '{}'  # Mock response
agent-browser network unroute [url]            # Remove routes
agent-browser network requests                 # View tracked requests
agent-browser network requests --filter api    # Filter requests
```

### Tabs & Windows

```bash
agent-browser tab                 # List tabs
agent-browser tab new [url]       # New tab
agent-browser tab 2               # Switch to tab
agent-browser tab close           # Close tab
agent-browser window new          # New window
```

### Frames

```bash
agent-browser frame "#iframe"     # Switch to iframe
agent-browser frame main          # Back to main frame
```

### Dialogs

```bash
agent-browser dialog accept [text]  # Accept dialog
agent-browser dialog dismiss        # Dismiss dialog
```

### JavaScript

```bash
agent-browser eval "document.title"   # Run JavaScript
```

### State management

```bash
agent-browser state save auth.json    # Save session state
agent-browser state load auth.json    # Load saved state
```

## Example: Form submission

```bash
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Submit" [ref=e3]

agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i  # Check result
```

## Example: Authentication with saved state

```bash
# Login once
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "/dashboard"
agent-browser state save auth.json

# Later sessions: load saved state
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
```
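
The "later sessions" half of the example can be guarded so it only runs when a state file actually exists. This is a sketch under stated assumptions: `ab_open_with_state` is a hypothetical name, `auth.json` is simply the default file name taken from the example above, and only the documented `state load` and `open` commands are used.

```bash
# Hypothetical helper: reuse saved session state when the file exists.
# Usage: ab_open_with_state <url> [state-file]
ab_open_with_state() {
  local url="$1"
  local state_file="${2:-auth.json}"
  if [ -z "$url" ]; then
    echo "usage: ab_open_with_state <url> [state-file]" >&2
    return 2
  fi
  if [ ! -f "$state_file" ]; then
    # No saved state yet: the one-time login flow has to run first.
    echo "no saved state at $state_file; log in once, then run: agent-browser state save $state_file" >&2
    return 1
  fi
  agent-browser state load "$state_file" || return 1
  agent-browser open "$url"
}

# Example: ab_open_with_state https://app.example.com/dashboard
```

The saved file presumably holds session cookies and storage, so it is reasonable to treat it like a credential and keep it out of version control.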

## Sessions (parallel browsers)

```bash
agent-browser --session test1 open site-a.com
agent-browser --session test2 open site-b.com
agent-browser session list
```
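
Since each named session is isolated, the two `open` commands above can in principle run at the same time. The sketch below backgrounds them with the shell's `&` and `wait`; the function name `ab_open_pair` is hypothetical, and whether the CLI tolerates concurrent invocations this way is an assumption to verify locally.

```bash
# Hypothetical helper: open two URLs in separate named sessions concurrently.
# Usage: ab_open_pair <url-a> <url-b>
ab_open_pair() {
  if [ "$#" -ne 2 ]; then
    echo "usage: ab_open_pair <url-a> <url-b>" >&2
    return 2
  fi
  agent-browser --session test1 open "$1" &
  local pid1=$!
  agent-browser --session test2 open "$2" &
  local pid2=$!
  # Wait for both background jobs and remember whether either failed.
  local status=0
  wait "$pid1" || status=1
  wait "$pid2" || status=1
  agent-browser session list
  return "$status"
}

# Example: ab_open_pair site-a.com site-b.com
```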

## JSON output (for parsing)

Add `--json` for machine-readable output:

```bash
agent-browser snapshot -i --json
agent-browser get text @e1 --json
```

## Debugging

```bash
agent-browser open example.com --headed              # Show browser window
agent-browser console                                # View console messages
agent-browser console --clear                        # Clear console
agent-browser errors                                 # View page errors
agent-browser errors --clear                         # Clear errors
agent-browser highlight @e1                          # Highlight element
agent-browser trace start                            # Start recording trace
agent-browser trace stop trace.zip                   # Stop and save trace
agent-browser record start ./debug.webm              # Record from current page
agent-browser record stop                            # Save recording
agent-browser --cdp 9222 snapshot                    # Connect via CDP
```

## Troubleshooting

- If the command is not found on Linux ARM64, use the full path in the bin folder.
- If an element is not found, run `agent-browser snapshot` to find the correct ref.
- If the page has not finished loading, add a `wait` command after navigation.
- Use `--headed` to see the browser window for debugging.
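
The "element not found" tip can be automated with a thin wrapper that prints a fresh snapshot whenever a command fails. This is only a sketch: `ab_or_snapshot` is a hypothetical name, and it assumes agent-browser signals failure through a non-zero exit status, which this README does not state.

```bash
# Hypothetical wrapper: run any agent-browser command; if it fails, print a
# fresh interactive snapshot to stderr so the correct ref can be picked.
# Assumes agent-browser exits non-zero when a command fails.
ab_or_snapshot() {
  agent-browser "$@" && return 0
  local status=$?
  echo "command failed (exit $status); current interactive elements:" >&2
  agent-browser snapshot -i >&2
  return "$status"
}

# Example: ab_or_snapshot click @e9
```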

## Options

- `--session <name>` uses an isolated session.
- `--json` provides JSON output.
- `--full` takes a full page screenshot.
- `--headed` shows the browser window.
- `--timeout` sets the command timeout in milliseconds.
- `--cdp <port>` connects via Chrome DevTools Protocol.

## Notes

- Refs are stable per page load but change on navigation.
- Always snapshot after navigation to get new refs.
- Use `fill` instead of `type` for input fields to ensure existing text is cleared.
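
Because refs change on navigation, a click that loads a new page is usually followed by a wait and a new snapshot. A minimal sketch of that sequence, using only commands listed above (the function name `ab_click_and_resnapshot` is hypothetical, and `networkidle` is just one of the documented wait conditions, not necessarily the right one for every site):

```bash
# Hypothetical helper: click a ref that triggers navigation, wait for the
# page to settle, then print the new refs.
# Usage: ab_click_and_resnapshot <ref>
ab_click_and_resnapshot() {
  local ref="$1"
  if [ -z "$ref" ]; then
    echo "usage: ab_click_and_resnapshot <ref>" >&2
    return 2
  fi
  agent-browser click "$ref" || return 1
  agent-browser wait --load networkidle || return 1
  # Refs from before the click are stale now; take a fresh snapshot.
  agent-browser snapshot -i
}

# Example: ab_click_and_resnapshot @e3
```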

## Reporting Issues

- Skill issues: Open an issue at https://github.com/TheSethRose/Agent-Browser-CLI
- agent-browser CLI issues: Open an issue at https://github.com/vercel-labs/agent-browser

Usage Guidance

This skill appears to be a normal browser-automation wrapper, but do not blindly run the recommended global npm install or git clone without verifying the upstream package/repository. Before installing or running it:

1) Confirm the exact npm package owner and inspect the package on npm (who published it, version, and files).
2) Verify the authoritative GitHub repo (the SKILL.md and README reference different orgs).
3) Prefer running in an isolated environment (container/VM) until you trust the package.
4) Be cautious about allowing the skill to access private sites, cookies, or local files (commands like upload or screenshot can expose sensitive data).
5) Ask the skill author for a canonical homepage/repo and signed/verified releases. If they cannot provide a clear source, treat the npm global install recommendation as risky.
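
The first check in the guidance above (who published the package, which version, which files) can be done from the command line before anything is installed. The sketch below relies on the standard `npm view` and `npm pack --dry-run` commands; the function name `inspect_npm_package` is hypothetical, and the exact output depends on the npm version in use.

```bash
# Hypothetical helper: show who publishes a package and what it ships,
# without installing it.
# Usage: inspect_npm_package [package-name]
inspect_npm_package() {
  local pkg="${1:-agent-browser}"
  # Registry metadata: name, latest version, maintainers, linked repository.
  npm view "$pkg" name version maintainers repository.url || return 1
  # Report the tarball contents without writing the tarball to disk.
  npm pack "$pkg" --dry-run
}

# Example: inspect_npm_package agent-browser
```

Comparing the reported repository URL with the GitHub organizations named in this page is a quick way to spot a mismatch.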

Capability Assessment

✓ Purpose & Capability

Name/description (headless browser CLI) align with the SKILL.md commands and required binaries (node, npm). Requiring node/npm is reasonable for an npm-published CLI fallback; the Rust-based source path is optional and reasonable as an alternative build path.

ℹ Instruction Scope

SKILL.md stays within browser automation scope (open, snapshot, click, fill, upload, screenshot, record). It does not instruct reading unrelated host config or secrets, but it does include commands that interact with local files (upload <file>, screenshot output to file) and preserves cookies/storage — expected for a browser tool but a potential data-exfil/exposure vector if misused. The skill allows navigating arbitrary URLs, which can access internal resources if the agent runs in a privileged environment.

⚠ Install Mechanism

This is instruction-only (no install spec), which reduces automatic install risk, but SKILL.md recommends 'npm install -g agent-browser' and also gives two differing source repos (git clone https://github.com/vercel-labs/agent-browser in SKILL.md vs README suggesting https://github.com/openclaw/agent-browser and elsewhere 'agent-browser' npm). The lack of a single authoritative source and the recommendation to perform a global npm install are inconsistent and increase risk — you should verify the exact npm package and repository before installing.

ℹ Credentials

The skill declares no environment variables or credentials (appropriate). However, runtime commands can read/write local files (upload, screenshot, record), preserve cookies/storage, and set HTTP basic auth via commands — all legitimate for a browser tool but potentially sensitive if the agent is given access to private files, cookies, or internal sites.

✓ Persistence & Privilege

Skill does not request always:true and has no install-time hooks or claimed persistent system changes. It's user-invocable and allows autonomous model invocation (platform default) — not a unique escalation of privilege.

Version History

v1.0.0

Initial release

Metadata

Slug: agent-browser-tool
Version: 1.0.0
License: MIT-0
All-time Installs: 4
Active Installs: 3
Total Versions: 1

Frequently Asked Questions

What is Agent Browser - 浏览器自动化?

A fast Rust-based headless browser automation CLI with Node.js fallback that enables AI agents to navigate, click, type, and snapshot pages via structured co... It is an AI Agent Skill for Claude Code / OpenClaw, with 871 downloads so far.

How do I install Agent Browser - 浏览器自动化?

Run "/install agent-browser-tool" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Agent Browser - 浏览器自动化 free?

Yes, Agent Browser - 浏览器自动化 is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Agent Browser - 浏览器自动化 support?

Agent Browser - 浏览器自动化 is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Agent Browser - 浏览器自动化?

It is built and maintained by cp3d1455926-svg (@cp3d1455926-svg); the current version is v1.0.0.
