Description

Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking...

Usage instructions (SKILL.md)

Browse — Browser Automation for Agents

Name: Browse
Author: danjdewhurst

How it works

browse is a CLI that wraps Playwright behind a persistent daemon on a Unix socket. The daemon cold-starts in about 3 s on first use; after that, each command completes in under 200 ms. Session state (cookies, localStorage, auth tokens) persists across commands within a session.

All output is plain text; objects are JSON-stringified. On failure, a command exits with a non-zero status and prints an error message.
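Output is plain text and failures exit non-zero, so a calling script can treat each command as a checked step. Below is a minimal POSIX sh sketch. It assumes `browse` is on PATH, and `browse_step` is a hypothetical helper name, not a browse command:

```sh
# Hypothetical helper (not part of browse): run one command, print its
# plain-text output, and propagate the exit status on failure.
browse_step() {
  if out=$(browse "$@" 2>&1); then
    printf '%s\n' "$out"
  else
    rc=$?
    # The error message was captured along with stdout; surface it on stderr.
    printf 'browse %s failed (exit %s): %s\n' "$*" "$rc" "$out" >&2
    return "$rc"
  fi
}

# Usage: stop at the first failing step.
# browse_step goto https://example.com || exit 1
# browse_step snapshot || exit 1
```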

Important constraints:

Commands are sequential — do not run multiple browse commands in parallel. The daemon handles one command at a time.
Run browse help for the full command list, or browse help <command> for detailed usage and flags.
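If more than one agent process might talk to the same daemon, one way to keep commands strictly sequential is to serialize them behind a lock. The sketch below uses an atomic `mkdir` lock. The helper `browse_locked` and the `BROWSE_LOCK_DIR` variable are hypothetical, not browse features:

```sh
# Hypothetical helper (not part of browse): serialize browse calls across
# processes. mkdir is atomic, so only one caller holds the lock at a time.
browse_locked() {
  lock=${BROWSE_LOCK_DIR:-/tmp/browse-agent.lock}
  until mkdir "$lock" 2>/dev/null; do
    sleep 1              # another process is mid-command; wait and retry
  done
  browse "$@"
  rc=$?
  rmdir "$lock"          # release so the next caller can proceed
  return "$rc"
}
```

A process killed mid-command leaves the lock directory behind, so it has to be removed by hand before other callers can continue.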

The ref system — read this first

Refs (@e1, @e2, ...) are how you target elements. They replace CSS selectors for most interactions.

Rules:

Always browse snapshot before interacting. Refs only exist after a snapshot.
Refs are ephemeral. Every snapshot call regenerates them. Old refs are invalid.
Refs go stale after navigation. Any goto or click that changes the page invalidates refs. You'll get a clear error — just browse snapshot again.

Core interaction loop:

browse snapshot              # see what's on the page — get refs
browse fill @e3 "test"       # fill the search field
browse click @e4             # click a button
browse snapshot              # re-snapshot after the page changes

Workflow

The standard pattern for any browser task:

Navigate: browse goto <url>
Observe: browse snapshot for page structure (interactive elements with refs). Use browse snapshot -i to include structural elements (headings, text), or -f for the full accessibility tree.
Check for errors: browse console --level error after navigation.
Interact: browse fill @eN "value", browse click @eN, browse hover @eN, browse press Tab, browse select @eN "option", browse scroll @eN (scroll into view).
- Use browse press <key> for keyboard navigation (Tab, Escape, Enter, ArrowDown, Shift+Tab, etc.). Multiple keys: browse press Tab Tab Tab.
- Use browse scroll down/up to page through content, browse scroll top/bottom to jump to extremes.
- After clicks that trigger SPA navigation, use browse wait url /path, browse wait text "Expected", or browse wait visible .selector before snapshotting.
Verify: browse snapshot or browse screenshot after each interaction to confirm the result.
Repeat: Move through pages and flows.
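The navigate, check-errors, and observe steps can be bundled into one helper. This is a minimal sketch built only from the commands listed above. The name `open_page` is hypothetical, and the sketch assumes `browse console --level error` simply prints any captured errors:

```sh
# Hypothetical helper (not part of browse): navigate, report console errors,
# then snapshot so fresh refs are available for the next interaction.
open_page() {
  browse goto "$1" || return
  echo "--- console errors ---"
  browse console --level error || return
  echo "--- snapshot ---"
  browse snapshot
}

# Usage:
# open_page https://app.example.com/settings
# browse fill @e3 "new name"    # refs come from the snapshot just printed
```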

For configured applications, browse healthcheck gives a quick pass/fail across key pages.

Key commands by category

Category	Commands
Navigate	`goto <url>`, `url`, `back`, `forward`, `reload [--hard]`, `text`, `version`, `quit`, `wipe`
Observe	`snapshot`, `screenshot` (`--diff`, `--threshold`), `console`, `network`
Interact	`click @eN`, `hover @eN [--duration ms]`, `press <key> [key ...]`, `fill @eN "value"`, `select @eN "option"`, `upload @eN <file> [file ...]`, `attr @eN [attribute]`, `scroll down/up/top/bottom/@eN/x y`, `form --data '{"field":"value"}'`
Wait	`wait url <str>`, `wait text <str>`, `wait visible <sel>`, `wait hidden <sel>`, `wait network-idle`, `wait <ms>`
Viewport	`viewport`, `goto --viewport/--device/--preset`
Evaluate	`eval <expr>` (in-page JS), `page-eval <expr>` (Playwright page API)
Auth	`login --env <name>`, `auth-state save/load <path>`
Tabs	`tab list/new/switch/close`
Assert	`assert visible/text-contains/url-contains/...`, `assert-ai "<visual assertion>"`
Accessibility	`a11y` (full page), `a11y @eN` (element), `a11y --standard wcag2aa`, `a11y --json`, `a11y coverage`, `a11y tree`, `a11y tab-order`, `a11y headings`
Performance	`perf` (Core Web Vitals), `perf --budget lcp=2500,cls=0.1`, `perf --json`
Security	`security` (headers, cookies, mixed content), `security --json`
Responsive	`responsive` (multi-viewport screenshots), `responsive --breakpoints 320x568,1920x1080`, `responsive --url <url>`
Extract	`extract table <sel>` (`--csv`, `--json`), `extract links` (`--filter`), `extract meta`, `extract select <sel>` (`--attr`)
Flows	`flow list`, `flow <name> --var key=value` (`--reporter junit|json|markdown`, `--dry-run`, `--stream`, `--webhook <url>`), `healthcheck` (`--reporter junit|json|markdown`, `--parallel`, `--concurrency`, `--webhook <url>`), `test-matrix --roles r1,r2 --flow <name>`, `diff --baseline <url> --current <url>`
Sessions	`session list/create/close`, `--session <name>` on any command
Tracing	`trace start` (`--screenshots`, `--snapshots`), `trace stop --out <path>`, `trace view [<path>] --latest --port <n>`, `trace list`, `trace status`
Video	`video start [--size WxH]`, `video stop [--out <path>]`, `video status`, `video list`
Crawl	`crawl <url>` (`--depth`, `--extract table|links|meta|text`, `--paginate`, `--rate-limit`, `--output`, `--dry-run`)
Record	`record start` (`--output`, `--name`), `record stop`, `record pause/resume`
Network Sim	`throttle <preset|off|status>` (slow-3g, 3g, 4g, wifi, cable), `offline on/off`
NL Commands	`do "<instruction>"` (`--dry-run`, `--provider`, `--model`)
VRT	`vrt init`, `vrt baseline`, `vrt check` (`--threshold`), `vrt update` (`--all`), `vrt list`
SEO	`seo [url]` (`--check`, `--score`, `--json`)
Compliance	`compliance [url]` (`--standard gdpr|ccpa|eprivacy`, `--json`)
Security Scan	`security-scan` (`--checks xss,csp,clickjack,forms`, `--verbose`, `--json`)
i18n	`i18n --locales en,fr,de --url <url>`, `i18n check-keys`, `i18n rtl-check`
API Assert	`api-assert <url-pattern>` (`--status`, `--timing`, `--schema`, `--body-contains`, `--header`)
Design	`design-audit --tokens <file>`, `design-audit --extract`
Doc Capture	`doc-capture --flow <file> --output <dir>` (`--markdown`, `--update`)
Gestures	`gesture swipe <dir>`, `gesture long-press @eN`, `gesture double-tap @eN`, `gesture drag @eN --to @eN`
Devices	`devices list`, `devices search <query>`, `devices info <name>`
Monitor	`monitor check --config <file>`, `monitor history`, `monitor status`
Dev Server	`dev start`, `dev stop`, `dev status`
CI/CD	`ci-init` (`--ci github|gitlab|circleci`)
Events	`subscribe` (`--events navigation,console,network`, `--level`, `--idle-timeout`)
Watch/REPL	`watch <flow-file>`, `repl`
Tooling	`init`, `report --out <path>`, `replay --out <path>`, `flow-share export/import/list/install/publish`, `screenshots list/clean/count`, `completions bash/zsh/fish`, `status [--json] [--watch] [--exit-code]`

Run browse help \x3Ccommand> for flags and detailed usage — don't guess at flags.

Named sessions

Use named sessions to run multiple independent page groups:

browse session create worker-1               # shared context (same cookies/storage)
browse session create worker-2 --isolated    # isolated context (separate cookies/storage)
browse --session worker-1 goto https://a.com
browse --session worker-2 goto https://b.com
browse session list
browse session close worker-1

By default, sessions share the browser context. Use --isolated for fully separate cookies, storage, and permissions.
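For one-off work that must not share cookies with other sessions, the create, use, and close steps can be wrapped together. This sketch assumes `browse session close` takes the same name passed to `session create`, as in the example above. `with_isolated_session` is a hypothetical helper, and `checkout` in the usage line is a hypothetical flow name:

```sh
# Hypothetical helper (not part of browse): run one command inside a fresh
# isolated session (separate cookies/storage), then close the session.
with_isolated_session() {
  name=$1; shift
  browse session create "$name" --isolated || return
  browse --session "$name" "$@"
  rc=$?
  browse session close "$name"   # close even if the command failed
  return "$rc"
}

# Usage:
# with_isolated_session guest flow checkout
```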

Authentication

Configured login (preferred — uses browse.config.json):

browse login --env staging

Manual login:

browse goto https://app.example.com/login
browse snapshot
browse fill @e1 "user@example.com"
browse fill @e2 "password123"
browse click @e3
browse snapshot        # verify redirect / dashboard loaded

Session reuse — save after login, load in future sessions:

browse auth-state save /tmp/auth.json
browse auth-state load /tmp/auth.json

Use browse wipe to clear all session data before switching accounts or at the end of a session.
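Configured login and saved auth state combine naturally: reuse a saved state when one exists, and otherwise log in and save a new one. This sketch assumes a `staging` entry exists in browse.config.json. The helper `ensure_auth` and the state path are hypothetical. Because the file holds cookies and tokens, the sketch makes it readable by the owner only:

```sh
# Hypothetical helper (not part of browse): load saved auth state if present,
# otherwise log in with the configured environment and save the new state.
ensure_auth() {
  state=$1 login_env=$2
  if [ -f "$state" ]; then
    browse auth-state load "$state"
  else
    browse login --env "$login_env" &&
      browse auth-state save "$state" &&
      chmod 600 "$state"          # the file contains session cookies/tokens
  fi
}

# Usage:
# ensure_auth "$HOME/.cache/browse-staging-auth.json" staging
```

An expired state file still loads without error, so if the app rejects the session, delete the file and run the helper again.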

Visual diff

Compare screenshots against a baseline to detect visual regressions:

browse screenshot current.png --diff baseline.png
browse screenshot current.png --diff baseline.png --threshold 5

Output includes similarity percentage, diff pixel count, and a path to the diff image (changed pixels highlighted in red).

Multi-browser

Browse defaults to Chromium. Use --browser to switch:

browse --browser firefox goto https://example.com
browse --browser webkit goto https://example.com
BROWSE_BROWSER=firefox browse goto https://example.com

Stealth features and CDP console capture are Chromium-only; Firefox/WebKit use standard Playwright.

Proxy

Route browser traffic through a proxy:

browse --proxy http://proxy:8080 goto https://example.com
BROWSE_PROXY=socks5://proxy:1080 browse goto https://example.com

Or configure in browse.config.json with "proxy": { "server": "http://proxy:8080", "bypass": "localhost", "username": "u", "password": "p" }.

Playwright passthrough

Pass any Playwright launch or context option via browse.config.json without waiting for explicit browse support:

{
  "playwright": {
    "launchOptions": { "locale": "fr-FR", "timezoneId": "Europe/Paris" },
    "contextOptions": { "colorScheme": "dark", "geolocation": { "latitude": 48.86, "longitude": 2.35 } }
  }
}

launchOptions are applied at browser startup; contextOptions are applied to isolated sessions and video recording contexts. Browse's own options (headless, viewport, stealth) take precedence on conflict.

Headed mode

Launch the browser visibly for debugging (set before the daemon starts):

BROWSE_HEADED=1 browse goto https://example.com
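Because BROWSE_HEADED must be set before the daemon starts, an already-running headless daemon has to be stopped first. This sketch assumes `browse quit` shuts the daemon down, as the error-recovery table implies, and that the next command cold-starts a new one. `browse_headed` is a hypothetical helper:

```sh
# Hypothetical helper (not part of browse): restart the daemon in headed mode.
browse_headed() {
  browse quit || true           # stop any running daemon; it may already be stopped
  BROWSE_HEADED=1 browse "$@"   # this command cold-starts a visible browser
}

# Usage:
# browse_headed goto https://example.com
```

If the current session holds a login you still need, consider running `browse auth-state save` before switching.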

Timeout control

Any command accepts --timeout <ms> (default: 30000 ms). Use it for slow pages:

browse goto https://slow-page.example.com --timeout 60000

Error recovery

Error	Fix
`"element is outside of the viewport"`	Run `browse scroll @eN` to scroll it into view, then retry
`"Refs are stale"` / `"Unknown ref"`	Run `browse snapshot` to refresh refs
`"Daemon connection lost"`	Re-run the command — CLI auto-restarts the daemon
`"Command timed out after Nms"`	Use `--timeout 60000`, or check the URL
`"Daemon crashed and recovery failed"`	Run `browse quit`, then retry
`"Unknown command"` for a valid command	Stale daemon — run `browse quit`, then retry
`"Unknown flag"`	Check `browse help <cmd>` for valid flags
Login fails	Check env vars, verify login URL, `browse screenshot` to see the page
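The table above can be turned into a single-retry wrapper. This is a POSIX sh sketch that assumes the error output contains the exact strings listed in the table. `browse_recover` and its exit code 3 are hypothetical conventions, not browse behavior. A stale ref is not retried, because old refs are invalid after a new snapshot:

```sh
# Hypothetical wrapper (not part of browse): run a command once; on failure,
# apply the matching fix from the error-recovery table and retry once.
# Returns 3 after a stale-ref refresh: the caller must read the new snapshot
# and pick a new ref instead of retrying the old one.
browse_recover() {
  if out=$(browse "$@" 2>&1); then
    printf '%s\n' "$out"
    return 0
  fi
  case $out in
    *"outside of the viewport"*)
      ref=
      for arg in "$@"; do
        case $arg in @e*) ref=$arg ;; esac
      done
      [ -n "$ref" ] && browse scroll "$ref" && browse "$@" ;;
    *"Refs are stale"*|*"Unknown ref"*)
      browse snapshot
      return 3 ;;
    *"Daemon connection lost"*)
      browse "$@" ;;                    # the CLI auto-restarts the daemon
    *"Command timed out"*)
      browse "$@" --timeout 60000 ;;
    *"Daemon crashed and recovery failed"*|*"Unknown command"*)
      browse quit
      browse "$@" ;;
    *)
      printf '%s\n' "$out" >&2           # no automatic fix; report and fail
      return 1 ;;
  esac
}
```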

Security recommendations

This skill's documentation describes a powerful browser automation CLI that can read and write files, persist session cookies, run arbitrary in-page JS, and send data to webhooks. However, the package declares no binary, no install instructions, and no required environment variables. Before installing or enabling it:

- Ask the publisher where the 'browse' binary comes from. Request a verified homepage or install instructions and a trusted release URL (GitHub release or official project domain).
- Do not enable this skill in environments with sensitive credentials until you confirm its provenance. The tool can persist and access cookies/tokens and can upload files or post to arbitrary webhooks.
- If you must test it, run the agent in an isolated sandbox or dedicated VM with restricted network access and no sensitive files or credentials mounted.
- Require the skill to document exactly what 'login --env <name>' expects and to list any environment variable names it will read. If possible, it should also limit or whitelist webhook targets.
- Prefer a skill that includes an explicit, auditable install spec (signed release or known package) and minimal declared environment access. If those are not provided, treat the skill as untrusted.

Analysis

Type: OpenClaw Skill
Name: forjd-browse
Version: 1.0.0

The 'forjd-browse' skill bundle provides a powerful browser automation interface with high-risk capabilities, including arbitrary JavaScript execution ('eval', 'page-eval'), session state manipulation ('auth-state save/load'), and broad shell access via 'Bash(browse:*)'. While these features are aligned with the stated purpose of web automation and testing in 'SKILL.md', they represent a significant security risk as they allow an agent to handle sensitive authentication data and execute arbitrary code within a browser context. No explicit evidence of malicious intent or data exfiltration was found in the provided documentation.

Capability assessment

⚠ Purpose & Capability

The SKILL.md documents a full-featured 'browse' CLI (daemon, Playwright wrapper, session state, uploads, traces, webhooks, auth-state handling). Yet the skill declares no required binaries, no install steps, and no primary credential. Either the agent environment must already have a compatible 'browse' binary (not documented), or the skill is missing an install declaration — this mismatch is unexpected for a tool of this complexity.

⚠ Instruction Scope

The command list in the instructions includes operations that access local files (upload, auth-state save/load, trace stop --out <path>), run arbitrary JS in page context (eval/page-eval), and send data to external endpoints (flow/healthcheck --webhook <url>). The SKILL.md also suggests using 'login --env <name>' and persisting cookies/auth tokens; these are broad actions beyond a simple read-only browser. The doc gives the agent freedom to read/write files and to POST results to arbitrary webhooks, which is high scope for an instruction-only skill without declared constraints.

ℹ Install Mechanism

There is no install specification (instruction-only). That is lowest install risk, but unusual here: the skill assumes a specific CLI 'browse' and a persistent daemon on a Unix socket. The absence of an install step or source URL means there's no guarantee the binary is present or trustworthy; if present, its provenance is unknown.

⚠ Credentials

The skill declares no required environment variables, yet the runtime docs explicitly reference env-based login (login --env <name>) and persistent auth-state (cookies/localStorage/auth tokens). Commands like upload <file> and webhooks allow exfiltration of local files and session data. Declaring zero env/config access is not proportional to the documented features and obscures what credentials might be used or exposed.

ℹ Persistence & Privilege

always:false and normal autonomous invocation are fine. The SKILL.md describes a persistent daemon and session state (cookies, localStorage, tokens), which could allow long-lived authenticated sessions. While the skill does not request platform-wide privileges, autonomous invocation combined with the described file/network capabilities increases blast radius if enabled — consider this when granting the skill use.

Version history

v1.0.0

Initial release of the "browse" skill, a browser automation CLI for AI agents.

- Enables navigation, form filling, clicking, screenshots, data extraction, web app testing, and automation via command line.
- Exposes a wide set of browser control and query commands, organized by category (navigate, interact, observe, assert, etc.).
- Employs a reference ("ref") system for targeting elements, with clear workflow guidance.
- Session management supports named and isolated contexts for parallel or independent interactions.
- Designed for use cases like UI testing, health checks, accessibility/a11y, performance, compliance, and more.
- Comprehensive documentation included for all available commands and structured usage patterns.

Metadata

Slug forjd-browse

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Published versions 1

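FAQ

What is Browse?

Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 106 times so far.

How do I install Browse?

Run the command "/install forjd-browse" in the OpenClaw or Claude Code chat to install it in one step; no extra configuration is required.

Is Browse free?

Yes. Browse is completely free and licensed under MIT-0, so you can download, install, and use it freely.

Which platforms does Browse support?

Browse is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Browse?

Developed and maintained by danjdewhurst (@danjdewhurst); current version v1.0.0.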
