Description

Lightweight CDP browser control for AI agents. Token-efficient alternative to the built-in browser tool — 3-10x fewer tokens per interaction. Use when browsi...

Usage (SKILL.md)

browser-js

Name: Browser Js
Author: shaihazher

Lightweight CLI that talks to Chrome via CDP (Chrome DevTools Protocol). Returns minimal, indexed output that agents can act on immediately — no accessibility tree parsing, no ref hunting.

Setup

# Install dependency (one-time, in the skill scripts/ dir)
cd scripts && npm install

# Ensure browser is running with CDP enabled.
# With OpenClaw:
#   browser start profile=openclaw
# Or manually:
#   google-chrome --remote-debugging-port=18800 --user-data-dir="$HOME/.browser-data"

The tool connects to http://127.0.0.1:18800 by default. Override with CDP_URL env var.
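The endpoint selection described above can be sketched in a few lines of Node. This is an illustration only: the source does not show how browser.js reads CDP_URL, so `resolveCdpUrl` and `listTargets` are hypothetical helper names. `/json/list` is Chrome's standard HTTP discovery endpoint for open targets.

```javascript
// Hypothetical helpers; not taken from scripts/browser.js.
const DEFAULT_CDP_URL = "http://127.0.0.1:18800";

// CDP_URL wins when set; otherwise fall back to the documented default.
function resolveCdpUrl(env = process.env) {
  const raw = (env.CDP_URL || "").trim();
  // Strip trailing slashes so appending a path stays predictable.
  return (raw || DEFAULT_CDP_URL).replace(/\/+$/, "");
}

// List open targets via Chrome's HTTP discovery endpoint.
// Needs Node 18+ for the global fetch; defined here but not invoked.
async function listTargets(env = process.env) {
  const res = await fetch(`${resolveCdpUrl(env)}/json/list`);
  if (!res.ok) throw new Error(`CDP endpoint returned HTTP ${res.status}`);
  return res.json();
}
```

Calling `listTargets()` before running any bjs command is a quick way to confirm that the browser is reachable on the expected port.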

Alias setup (optional)

mkdir -p ~/.local/bin
cat > ~/.local/bin/bjs << 'WRAPPER'
#!/bin/bash
exec node /path/to/scripts/browser.js "$@"
WRAPPER
chmod +x ~/.local/bin/bjs

Commands

bjs tabs                        List open tabs
bjs open <url>                  Navigate to URL
bjs tab <index>                 Switch to tab
bjs newtab [url]                Open new tab
bjs close [index]               Close tab
bjs elements [selector]         List interactive elements (indexed)
bjs click <index>               Click element by index
bjs type <index> <text>         Type into element
bjs upload <path> [selector]    Upload file to input (bypasses OS dialog)
bjs text [selector]             Extract visible page text
bjs html <selector>             Get element HTML
bjs eval <js>                   Run JavaScript in page
bjs screenshot [path]           Save screenshot
bjs scroll <up|down|top|bottom> [px]
bjs url                         Current URL
bjs back / forward / refresh
bjs wait <ms>

Coordinate commands (cross-origin iframes, captchas, overlays):
bjs click-xy <x> <y>                Click at page coordinates via CDP Input
bjs click-xy <x> <y> --double       Double-click at coordinates
bjs click-xy <x> <y> --right        Right-click at coordinates
bjs hover-xy <x> <y>                Hover at page coordinates
bjs drag-xy <x1> <y1> <x2> <y2>     Drag between coordinates
bjs iframe-rect <selector>          Get iframe bounding box (for click-xy targeting)

How it works

elements scans the page for all interactive elements (links, buttons, inputs, selects, etc.) — including those inside shadow DOM (web components). This means sites like Reddit, GitHub, and other modern SPAs that use shadow DOM are fully supported. The scan recursively pierces all shadow roots.

Returns a compact numbered list:

[0] (link) Hacker News → https://news.ycombinator.com/news
[1] (link) new → https://news.ycombinator.com/newest
[2] (input:text) q
[3] (button) Submit

Then run bjs click 3 or bjs type 2 "search query". The output is immediately actionable, with no interpretation needed.
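If a caller wants structured data rather than raw lines, the numbered list can be parsed with one regular expression. The sketch below assumes the line format is exactly what the sample shows (`[index] (kind) label → href`, with the arrow part optional). `parseElements` is a hypothetical helper, and other line variants the tool may emit are not covered.

```javascript
// Hypothetical parser for the sample output format shown above.
function parseElements(output) {
  // [index] (kind) label, optionally followed by " → href"
  const pattern = /^\[(\d+)\]\s+\(([^)]+)\)\s+(.*?)(?:\s+→\s+(\S+))?$/;
  const items = [];
  for (const line of output.split("\n")) {
    const m = pattern.exec(line.trim());
    if (!m) continue; // skip anything that is not an element line
    items.push({
      index: Number(m[1]),
      kind: m[2],
      label: m[3],
      href: m[4] ?? null,
    });
  }
  return items;
}

const sample = [
  "[0] (link) Hacker News → https://news.ycombinator.com/news",
  "[1] (link) new → https://news.ycombinator.com/newest",
  "[2] (input:text) q",
  "[3] (button) Submit",
].join("\n");

// Find the index to pass to `bjs click`.
const submit = parseElements(sample).find((el) => el.label === "Submit");
console.log(submit.index);
```

Matching on `kind` and `label` keeps the selection logic in code, so an agent only needs to emit the final index.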

Auto-indexing: click and type auto-index elements if not already indexed. You can skip calling elements first and go straight to click/type after open. Call elements explicitly when you need to see what's on the page.

After navigation or AJAX changes: Elements get re-indexed automatically on next click/type if stamps are stale. For manual re-index, call elements again.

Real mouse events: click uses CDP Input.dispatchMouseEvent (mousePressed + mouseReleased) instead of JS .click(). This triggers React/Vue/Angular synthetic event handlers that ignore plain .click() calls. Works reliably on SPAs like Instagram, GitHub, LinkedIn.
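For readers unfamiliar with CDP, the press/release pair mentioned above corresponds to two `Input.dispatchMouseEvent` messages. The following is a minimal sketch of what such messages look like, not the code in scripts/browser.js. `clickMessages` and its `startId` option are hypothetical; the method name and the `type`, `x`, `y`, `button`, and `clickCount` parameters come from the CDP Input domain.

```javascript
// Build the two CDP messages that make up one click at (x, y).
// Hypothetical helper; browser.js may structure this differently.
function clickMessages(x, y, { button = "left", clickCount = 1, startId = 1 } = {}) {
  const base = { x, y, button, clickCount };
  return ["mousePressed", "mouseReleased"].map((type, i) => ({
    id: startId + i, // CDP messages carry a caller-chosen numeric id
    method: "Input.dispatchMouseEvent",
    params: { type, ...base },
  }));
}

// Each message would be JSON-serialized and sent over the page's
// CDP WebSocket connection, in order.
for (const msg of clickMessages(120, 340)) {
  console.log(JSON.stringify(msg));
}
```

A double-click uses the same pair with `clickCount: 2`, and a right-click uses `button: "right"`.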

File uploads

upload uses CDP's DOM.setFileInputFiles to inject files directly into hidden <input type="file"> elements, so no OS file picker dialog appears. Works with Instagram, Twitter, any site with file uploads.

bjs upload ~/photos/image.jpg                    # auto-finds input[type=file]
bjs upload ~/docs/resume.pdf "input.file-drop"   # specific selector
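As a rough illustration of the underlying call, the sketch below builds a `DOM.setFileInputFiles` message. It assumes the input's `nodeId` has already been obtained (for example through `DOM.querySelector`), and the helper names `expandHome` and `setFilesMessage` are hypothetical. Expanding `~` in code matters when the path does not pass through a shell first, since CDP expects real file paths.

```javascript
const os = require("node:os");
const path = require("node:path");

// Expand a leading "~" the way a shell would; other paths pass through.
function expandHome(p) {
  return p === "~" || p.startsWith("~/") ? path.join(os.homedir(), p.slice(1)) : p;
}

// Build the CDP message that attaches files to a file input node.
// `nodeId` is assumed to come from an earlier DOM.querySelector call.
function setFilesMessage(id, nodeId, filePaths) {
  return {
    id,
    method: "DOM.setFileInputFiles",
    params: {
      nodeId,
      files: filePaths.map((p) => path.resolve(expandHome(p))),
    },
  };
}

console.log(JSON.stringify(setFilesMessage(1, 42, ["~/photos/image.jpg"])));
```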

Token efficiency

Approach                   Tokens per interaction   Notes
bjs                        ~50-200                  Indexed list, 1-line responses
browser tool (snapshot)    ~2,000-5,000             Full accessibility tree
browser tool + thinking    ~3,000-8,000             Plus reasoning to find refs

Over a 10-step flow: ~1,500 tokens (bjs) vs ~30,000-80,000 (browser tool + thinking, i.e. 10 × 3,000-8,000).
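The 10-step totals follow directly from the per-interaction ranges in the table. A small worked calculation, using only the figures quoted above (`flowCost` is a hypothetical helper name):

```javascript
// Per-interaction token ranges, copied from the table above.
const perInteraction = {
  "bjs": [50, 200],
  "browser tool (snapshot)": [2000, 5000],
  "browser tool + thinking": [3000, 8000],
};

// Scale a [low, high] range by the number of steps in a flow.
function flowCost([low, high], steps) {
  return [low * steps, high * steps];
}

for (const [approach, range] of Object.entries(perInteraction)) {
  const [low, high] = flowCost(range, 10);
  console.log(`${approach}: ${low}-${high} tokens over 10 steps`);
}
```

For bjs this gives 500-2,000 tokens, which brackets the ~1,500 estimate. The 30,000-80,000 figure matches the "browser tool + thinking" row.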

Typical flow

bjs open https://example.com       # Navigate
bjs elements                        # See what's clickable
bjs click 5                         # Click element [5]
bjs type 12 "hello world"          # Type into element [12]
bjs text                            # Read page content
bjs screenshot /tmp/result.png      # Verify visually
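The same flow can be driven from a Node script. The sketch below assumes the optional `bjs` alias is on PATH. `runFlow` is a hypothetical helper, and the executor is injectable so the sequencing can be checked without a browser.

```javascript
const { execFileSync } = require("node:child_process");

// The typical flow above, one argument array per bjs invocation.
const flow = [
  ["open", "https://example.com"],
  ["elements"],
  ["click", "5"],
  ["type", "12", "hello world"],
  ["text"],
  ["screenshot", "/tmp/result.png"],
];

// Run each step in order and collect its output. The default executor
// shells out to `bjs`; execFileSync throws on a non-zero exit code,
// which stops the flow at the first failing step.
function runFlow(steps, exec = (args) => execFileSync("bjs", args, { encoding: "utf8" })) {
  return steps.map((args) => ({ args, output: exec(args) }));
}

// Dry run with a fake executor that only echoes the command line.
const dryRun = runFlow(flow, (args) => `bjs ${args.join(" ")}`);
console.log(dryRun.map((step) => step.output).join("\n"));
```

Passing arguments as an array avoids shell quoting problems for text such as "hello world".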

Shadow DOM support

bjs automatically pierces shadow DOM boundaries. Sites built with web components (Reddit, GitHub, etc.) work out of the box — elements, click, type, and text all recurse into shadow roots. No special flags needed.

Coordinate commands (iframes, captchas, overlays)

When you can't use click by index, e.g. because the target is inside a cross-origin iframe (captcha checkbox, payment form, OAuth widget), use the coordinate-based commands. They dispatch real CDP Input events at page coordinates through the browser's input pipeline rather than through the DOM, so they bypass all DOM boundaries.

Workflow for clicking inside an iframe:

bjs iframe-rect 'iframe[title*="hCaptcha"]'    # Get bounding box
# Output: x=95 y=440 w=302 h=76 center=(246, 478)

bjs click-xy 125 458                            # Click checkbox position

iframe-rect returns the iframe's position on the page. Add offsets to target specific elements inside it (e.g. a checkbox is typically near the left side).
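The offset arithmetic can be made explicit. The sketch below parses the sample `iframe-rect` output shown above and derives click coordinates from it. The output format is assumed from that one sample, and `parseIframeRect`, `pointInside`, and `centerOf` are hypothetical helpers.

```javascript
// Parse "x=95 y=440 w=302 h=76 ..." into numbers.
function parseIframeRect(output) {
  const m = /x=(-?\d+)\s+y=(-?\d+)\s+w=(\d+)\s+h=(\d+)/.exec(output);
  if (!m) throw new Error(`Unrecognized iframe-rect output: ${output}`);
  const [x, y, w, h] = m.slice(1).map(Number);
  return { x, y, w, h };
}

// Convert an offset measured from the iframe's top-left corner
// into page coordinates suitable for click-xy.
function pointInside(rect, dx, dy) {
  if (dx < 0 || dy < 0 || dx > rect.w || dy > rect.h) {
    throw new RangeError(`Offset (${dx}, ${dy}) is outside the ${rect.w}x${rect.h} iframe`);
  }
  return { x: rect.x + dx, y: rect.y + dy };
}

// Center of the iframe in page coordinates.
function centerOf(rect) {
  return { x: rect.x + Math.round(rect.w / 2), y: rect.y + Math.round(rect.h / 2) };
}

const rect = parseIframeRect("x=95 y=440 w=302 h=76 center=(246, 478)");
const target = pointInside(rect, 30, 18); // 30px right, 18px down from the corner
console.log(`bjs click-xy ${target.x} ${target.y}`);
```

With the sample rectangle, an offset of (30, 18) lands on the (125, 458) position used in the workflow above. The range check guards against offsets that would click outside the iframe.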

Other uses:

hover-xy — trigger hover menus, tooltips that need mouse position
drag-xy — slider controls, drag-and-drop, canvas interactions
click-xy --double — double-click to select text, expand items
click-xy --right — context menus

When to use coordinate commands vs click:

click <index> — always preferred when the element shows up in elements
click-xy — only when the target is inside a cross-origin iframe or otherwise unreachable by DOM indexing

Tips

elements with a CSS selector narrows scope: bjs elements ".modal"
eval runs arbitrary JS and returns the result — use for custom extraction
text caps at 8KB — enough for most pages, won't blow up context
html <selector> caps at 10KB — for inspecting specific elements
Pipe through grep to filter: bjs elements | grep -i "submit\|login"

Security recommendations

This skill is coherent with its stated purpose (local CDP-based browser automation), but review these points before installing:

- You will need Node/npm to run it; SKILL metadata doesn't list Node as a required binary. Run 'npm install' in the scripts/ directory as instructed.
- The tool connects to a Chrome/Chromium instance via the CDP endpoint (default http://127.0.0.1:18800 or set CDP_URL). Ensure the debug port is bound only to localhost and not exposed to the network.
- It can use your browser profile (signed-in sessions) and can run arbitrary JS in pages, inject files into file inputs, and dispatch real mouse events, meaning it can read private page content and perform actions as you. Only use it with a browser/profile you trust and not on shared or public machines.
- The SKILL.md references CDP_URL but the registry metadata does not declare any env vars; be aware of this mismatch. When creating a local alias, double-check the path used so you don't accidentally point to a different script.
- If you plan to allow autonomous agent invocation, consider limiting autonomy or reviewing the code thoroughly; the code uses only local CDP and the npm 'ws' package, but arbitrary page.eval still allows data access. If you want higher assurance, inspect the full scripts/browser.js file locally and run it in a controlled environment first (with a disposable browser profile).
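One cheap guard for the localhost recommendation is to refuse any CDP_URL that does not point at a loopback host before running commands. The sketch below is a client-side check only, and `isLoopbackCdpUrl` is a hypothetical helper. It does not prove that Chrome's debug port is unreachable from the network, since that depends on how the browser was launched.

```javascript
// Hostnames treated as loopback. The bracketed form is how the
// WHATWG URL parser reports IPv6 literals.
const LOOPBACK_HOSTS = new Set(["127.0.0.1", "localhost", "[::1]", "::1"]);

// Return true only when the URL parses and targets a loopback host.
function isLoopbackCdpUrl(url) {
  let host;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch {
    return false; // unparseable values are rejected rather than guessed at
  }
  return LOOPBACK_HOSTS.has(host);
}

const cdpUrl = process.env.CDP_URL || "http://127.0.0.1:18800";
if (!isLoopbackCdpUrl(cdpUrl)) {
  console.error(`Refusing non-local CDP endpoint: ${cdpUrl}`);
}
```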

Functional analysis

Type: OpenClaw Skill
Name: browser-js
Version: 1.5.0

This skill bundle is classified as suspicious due to its powerful capabilities that, if misused via prompt injection against the AI agent, could lead to significant security risks. The `scripts/browser.js` file implements commands like `eval` which allows arbitrary JavaScript execution within the browser context, and `upload` which can read any local file (that the Node.js process has access to) and upload it to a web form. While these are intended features for a browser automation tool, they present a high-risk attack surface for data exfiltration or unauthorized actions if an AI agent is compromised. There is no evidence of intentional malicious code or prompt injection within the provided files themselves, but the inherent power of these commands warrants a 'suspicious' classification.

Capability assessment

ℹ Purpose & Capability

The skill's name/description match the included code and SKILL.md: it is a CDP-based browser automation CLI. Small mismatches: metadata lists no required binaries/env vars but the tool expects Node (to run node browser.js and npm install) and optionally CDP_URL — these are mentioned in SKILL.md but not declared in the registry 'requires' fields.

ℹ Instruction Scope

Runtime instructions stay within the browsing/automation scope (npm install, run node script, connect to local CDP). However the tool exposes powerful actions: page.eval (run arbitrary JS in page), DOM.setFileInputFiles (inject files into file inputs), and coordinate-based Input events (can interact with cross-origin iframes/captchas). These are expected for a browser automation tool but are sensitive because they can act using the browser's signed-in sessions and read/act on page content.

✓ Install Mechanism

There is no packaged install spec — the SKILL.md requires running 'npm install' in the scripts/ dir (uses the well-known npm registry dependency 'ws'). No remote download URLs or archives are present in the install flow. This is a moderate-risk but expected approach for a JS CLI script.

ℹ Credentials

The skill declares no required credentials or config paths, which is consistent with a local-only tool. SKILL.md allows overriding the CDP endpoint via CDP_URL (not declared in metadata). Important capability: connecting to a browser profile with --user-data-dir (signed-in sessions) means the tool can act as the signed-in user and access cookies/session data — this is expected for a browser controller but is sensitive and worth explicit consideration.

✓ Persistence & Privilege

Skill is not 'always: true' and does not request elevated or persistent platform privileges. It suggests an optional local alias but does not modify other skills or agent-wide configuration.

Version history

v1.5.0

- Adds coordinate-based commands (`click-xy`, `hover-xy`, `drag-xy`, `iframe-rect`) for interacting with elements inside cross-origin iframes, captchas, overlays, and other non-indexable UI.
- Updates command list and documentation to explain when and how to use coordinate-based input vs. standard indexed element commands.
- Clarifies workflow and usage details for cross-origin iframe interaction, including coordinate calculation and tips.

v1.4.0

Shadow DOM support: elements/click/type/text now pierce shadow roots. Reddit, GitHub, and other web-component-heavy sites fully supported.

v1.3.0

fix: robust type command — click-to-focus, Input.insertText, reject non-typeable elements

v1.2.0

CDP mouse events for click (fixes React/Vue/Angular SPAs). Modal-aware element indexing: elements inside dialogs get priority and won't be de-duped against page elements. Fixes Instagram/GitHub modal button issues.

v1.1.0

Show disabled elements with :disabled tag instead of hiding them. Enables workarounds for delayed-enable buttons (e.g. GitHub OAuth).

v1.0.0

Initial release: Lightweight CDP browser control for AI agents. 3-10x fewer tokens than built-in browser tools. Auto-indexing, file uploads, signed-in session reuse.

Metadata

Slug browser-js

Version 1.5.0

License: not specified

Total installs 7

Current installs 7

Number of versions 6

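FAQ

What is Browser Js?

Lightweight CDP browser control for AI agents. Token-efficient alternative to the built-in browser tool — 3-10x fewer tokens per interaction. Use when browsi... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 1124 cumulative downloads to date.

How do I install Browser Js?

Run the command "/install browser-js" in the OpenClaw or Claude Code chat box to install it in one step, with no additional configuration.

Is Browser Js free?

Yes. Browser Js is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does Browser Js support?

Browser Js is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Browser Js?

It is developed and maintained by shaihazher (@shaihazher); the current version is v1.5.0.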
