Description

Browser automation via Chrome AI Action (CAA) bridge. Control Chrome programmatically — navigate, click, type, screenshot, extract content, and more. Uses Pu...

README (SKILL.md)

Chrome AI Action — Browser Automation Skill

Name: chrome_skill
Author: jami-lin

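A browser automation skill for AI agents. Through the Chrome AI Action (CAA) bridge service, it controls Chrome programmatically in Puppeteer (CDP) mode and supports 60+ operations, including navigation, clicking, typing, screenshots, content extraction, network interception, cookie management, and PDF export.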

When to Use

Scenario	Invoke
User asks to browse a web page, search, fill forms, extract data	Yes
User needs screenshots of a web page	Yes
User wants to automate browser interactions	Yes
User asks about writing code / debugging (no browser involved)	No

⚠️ CRITICAL: Chinese URL Encoding

IMPORTANT: When constructing URLs that contain Chinese characters for the navigate action, the agent MUST encode the query string values with encodeURIComponent. The bridge automatically encodes non-ASCII characters in the URL path, but query string values must be pre-encoded by the caller. For example, wd=妻子的浪漫旅行 must be written as wd=%E5%A6%BB%E5%AD%90%E7%9A%84%E6%B5%AA%E6%BC%AB%E6%97%85%E8%A1%8C.

Correct

{"action": "navigate", "params": {"url": "https://www.baidu.com/s?wd=%E5%A6%BB%E5%AD%90%E7%9A%84%E6%B5%AA%E6%BC%AB%E6%97%85%E8%A1%8C"}}

Wrong

{"action": "navigate", "params": {"url": "https://www.baidu.com/s?wd=妻子的浪漫旅行"}}

How to encode in Node.js

const encoded = encodeURIComponent('妻子的浪漫旅行');
// Result: %E5%A6%BB%E5%AD%90%E7%9A%84%E6%B5%AA%E6%BC%AB%E6%97%85%E8%A1%8C
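The encoding rule above can be wrapped in a small helper so that raw Chinese text never reaches the navigate action. This is a minimal sketch: `buildNavigateAction` is a hypothetical name, not part of the bridge, and it only assumes the `{"action": "navigate", "params": {"url": ...}}` shape shown above.

```javascript
// Hypothetical helper (not part of chrome-ai-action): builds a navigate
// action whose query keys and values are percent-encoded by the caller.
function buildNavigateAction(baseUrl, query = {}) {
  const pairs = Object.entries(query).map(
    ([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`
  );
  const url = pairs.length > 0 ? `${baseUrl}?${pairs.join('&')}` : baseUrl;
  return { action: 'navigate', params: { url } };
}

// Usage: the query value is encoded before it is placed in the URL.
const action = buildNavigateAction('https://www.baidu.com/s', { wd: '妻子的浪漫旅行' });
console.log(JSON.stringify(action));
```

encodeURIComponent is used here rather than URLSearchParams because URLSearchParams serializes spaces as `+`, which would not match the encoding the section asks for.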

Prerequisites

Requirement	Check	Auto-resolve
Chrome / Chromium installed	Detected automatically (common install paths)	No (user must install)
Chrome running with CDP	Detected on startup	Yes (auto-launched)
Node.js 18+	`node --version`	No

Startup Protocol

When the skill is loaded for the first time, the agent MUST run the startup script. The script runs the bridge as a background child process, so the agent does NOT need to manage the process separately.

node <skill_dir>/scripts/startup.js

What it does

Check if the bridge is already running: GET /health on port 9876 → skip if OK
Ensure the npm package is installed: npm list -g chrome-ai-action → runs npm install -g chrome-ai-action if missing
Start the bridge: chrome-ai-action --port 9876, then wait for the health check to pass
Auto-launch Chrome: if Chrome is not running with CDP, the bridge starts it automatically (cross-platform)
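The "wait for the health check" step can be pictured as a polling loop. The sketch below is not the contents of scripts/startup.js, which this document does not show. `waitForBridge` and `probeHealth` are hypothetical names, and treating any 2xx reply from GET /health as "ready" is an assumption.

```javascript
// Sketch only: polls a health probe until it succeeds or the timeout expires.
// `checkHealth` is any async function that resolves to true when the bridge is up.
async function waitForBridge(checkHealth, { timeoutMs = 30000, intervalMs = 500 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      if (await checkHealth()) return true;
    } catch {
      // Bridge is not listening yet; fall through and retry.
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}

// Probe against the documented endpoint (Node.js 18+ provides a global fetch).
// Assumption: an HTTP 2xx status means the bridge is ready.
async function probeHealth(port = 9876) {
  const res = await fetch(`http://127.0.0.1:${port}/health`);
  return res.ok;
}
```

A caller would run `await waitForBridge(() => probeHealth())` after spawning the bridge and treat a `false` result as a startup failure.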

Environment Variables

Variable	Default	Description
`CAA_BRIDGE_PORT`	`9876`	Bridge HTTP server port
`CAA_STARTUP_TIMEOUT`	`30000`	Max wait for bridge ready (ms)
`CHROME_PATH`	auto-detect	Custom Chrome executable path
`CHROME_USER_DATA_DIR`	platform-dependent	Chrome profile directory
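A client that talks to the bridge may want to honor the same variables. The following sketch reads them with the defaults from the table. `loadBridgeConfig` is a hypothetical helper, and how the bridge itself parses these variables is not documented here.

```javascript
// Hypothetical helper: resolves the documented environment variables to a
// plain config object, falling back to the defaults listed in the table.
function loadBridgeConfig(env = process.env) {
  const toInt = (value, fallback) => {
    const parsed = Number.parseInt(value ?? '', 10);
    return Number.isNaN(parsed) ? fallback : parsed;
  };
  return {
    port: toInt(env.CAA_BRIDGE_PORT, 9876),
    startupTimeoutMs: toInt(env.CAA_STARTUP_TIMEOUT, 30000),
    chromePath: env.CHROME_PATH ?? null, // null stands for "auto-detect"
    userDataDir: env.CHROME_USER_DATA_DIR ?? null, // null stands for the platform default
  };
}

const config = loadBridgeConfig();
console.log(`Bridge expected at http://127.0.0.1:${config.port}/`);
```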

API Protocol

Endpoint: http://127.0.0.1:9876/

Endpoints

Method	Path	Description
`GET`	`/health`	Health check — returns bridge & CDP status
`GET`	`/schema`	Full action schema (64+ actions)
`POST`	`/`	Execute action(s)

Request Format

{"type": "action", "action": "<ACTION>", "params": {...}, "requestId": "optional-id"}

Batch Request

{"type": "batch", "actions": [
  {"action": "navigate", "params": {"url": "https://example.com"}},
  {"action": "getTitle"}
]}

Response Format

{"success": true, "data": {...}, "requestId": "req-1", "timestamp": 1712345678901}

Error Response

{"success": false, "error": {"code": "ACTION_ERROR", "message": "..."}, "requestId": "req-1", "timestamp": 1712345678901}
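The request and response envelopes above map naturally onto a thin client. This is a sketch under stated assumptions: all four function names are hypothetical, the `Content-Type: application/json` header is assumed rather than documented, and the response shape of a batch request is not specified in this document, so `unwrapResponse` only handles the single-envelope form shown above.

```javascript
// Hypothetical helpers built only from the envelope shapes shown above.
function buildActionRequest(action, params = {}, requestId) {
  const request = { type: 'action', action, params };
  if (requestId !== undefined) request.requestId = requestId;
  return request;
}

function buildBatchRequest(actions) {
  return { type: 'batch', actions };
}

// Returns `data` on success; throws an Error carrying the bridge error code otherwise.
function unwrapResponse(response) {
  if (response.success) return response.data;
  const { code = 'UNKNOWN', message = '' } = response.error ?? {};
  const err = new Error(`${code}: ${message}`);
  err.code = code;
  throw err;
}

// POSTs a request to the bridge (Node.js 18+ global fetch).
// Assumption: the bridge accepts a JSON body with this Content-Type header.
async function sendToBridge(request, endpoint = 'http://127.0.0.1:9876/') {
  const res = await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(request),
  });
  return unwrapResponse(await res.json());
}
```

For example, `await sendToBridge(buildActionRequest('getTitle', {}, 'req-1'))` would resolve to the `data` field of a successful response, or reject with an error whose `code` matches the table in the Error Handling section.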

Available Actions (64+)

Navigation

navigate, goBack, goForward, reload, getUrl, getTitle

Page Content

getText, getHtml, getLinks, getImages, getHeadings, getMetaTags, getFormFields, getFocusableElements

Element Interaction

click, type, pressKey, scroll, scrollIntoView, findElement, focus, hover, select

Data Extraction

getValue, getAttribute, getAttributeAll, getBoundingBox, getCookies, getPerformanceMetrics, getSelectedValue, getSelectOptions

JavaScript

evaluate, injectScript, injectCSS

Screenshot & Export

screenshot (PNG/JPEG), getPdf (A4/Letter)

Tab Management

listTabs, newTab, closeTab, switchTab, getCurrentTab

Waiting

waitForElement, waitForTimeout, waitForNavigation

Cookie Management

setCookie, deleteCookie

Network Interception

blockUrls, unblockUrls, mockResponse, getNetworkRequests, clearNetworkRequests

Storage

getLocalStorage, setLocalStorage, removeLocalStorage, clearLocalStorage

File Operations

uploadFile, setInputFiles, downloadFile

Viewport

getViewport, setViewport

Console

getConsoleLogs, clearConsoleLogs

Accessibility

getAccessibilityTree

Utility

ping, connect, disconnect, getBrowserInfo, highlight, dispatchEvent

Typical Workflow

Navigate: navigate → go to target URL (encode Chinese in query params)
Wait: waitForElement → wait for key content
Read: getText / getHtml / getLinks → understand page
Interact: click / type / pressKey → perform actions
Extract: getText / screenshot / evaluate → get results
Confirm: screenshot → visually verify

Example: Search Baidu with Chinese

{"type": "batch", "actions": [
  {"action": "navigate", "params": {"url": "https://www.baidu.com/s?wd=%E5%A6%BB%E5%AD%90%E7%9A%84%E6%B5%AA%E6%BC%AB%E6%97%85%E8%A1%8C"}},
  {"action": "waitForTimeout", "params": {"ms": 2000}},
  {"action": "getText"}
]}

Example: Full Login Flow

{"type": "batch", "actions": [
  {"action": "navigate", "params": {"url": "https://example.com/login"}},
  {"action": "waitForElement", "params": {"selector": "input[name=username]", "timeout": 10000}},
  {"action": "type", "params": {"selector": "input[name=username]", "value": "myuser"}},
  {"action": "type", "params": {"selector": "input[name=password]", "value": "mypassword"}},
  {"action": "click", "params": {"selector": "button[type=submit]"}},
  {"action": "waitForTimeout", "params": {"ms": 3000}},
  {"action": "getCurrentTab"}
]}

Error Handling

Error Code	Meaning	Resolution
`CDP_NOT_CONNECTED`	Chrome not running with debug port	Bridge auto-launches Chrome, retries every 3s
`ACTION_ERROR`	Action execution failed	Check params, use `getFocusableElements` to find elements first
`INVALID_REQUEST`	Malformed request	Check request format
`PARSE_ERROR`	JSON parse failure	Send valid JSON
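Since the bridge retries its Chrome connection every 3 seconds, a caller can reasonably retry a request that fails with `CDP_NOT_CONNECTED` and give up immediately on the other codes. The sketch below illustrates that policy. `withCdpRetry` is a hypothetical helper, `send` is any function that resolves to the raw response envelope, and the attempt limit of 5 is an arbitrary choice.

```javascript
// Sketch: retry only while the bridge reports CDP_NOT_CONNECTED.
// `send(request)` must resolve to the raw envelope ({ success, data | error, ... }).
async function withCdpRetry(send, request, { attempts = 5, delayMs = 3000 } = {}) {
  for (let attempt = 1; ; attempt += 1) {
    const response = await send(request);
    const retryable = !response.success && response.error?.code === 'CDP_NOT_CONNECTED';
    // Return on success, on a non-retryable error, or when attempts run out.
    if (!retryable || attempt >= attempts) return response;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```

The default delay matches the 3-second retry interval in the table, so each client attempt gives the bridge one more chance to reconnect.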

Discovery Tips

When you don't know what elements are on a page:

getFocusableElements → all interactive elements (with positions)
getFormFields → all form inputs with metadata
getLinks → all links on page
getHeadings → understand page structure
getText → all visible text
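The discovery actions above can be collected into one batch request so that a single round trip describes an unfamiliar page. This is a sketch that assumes each action can be called without params. The document shows that only for getText, so check the others against GET /schema. `buildDiscoveryBatch` is a hypothetical name.

```javascript
// Discovery actions in the order listed above.
const DISCOVERY_ACTIONS = [
  'getFocusableElements',
  'getFormFields',
  'getLinks',
  'getHeadings',
  'getText',
];

// Hypothetical helper: wraps the chosen actions in the documented batch envelope.
// Assumption: each action works with no params.
function buildDiscoveryBatch(actions = DISCOVERY_ACTIONS) {
  return { type: 'batch', actions: actions.map((action) => ({ action })) };
}

console.log(JSON.stringify(buildDiscoveryBatch(['getLinks', 'getHeadings'])));
```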

References

references/bridge-api.md — Complete API reference with all 64+ actions
references/setup-guide.md — Detailed setup and troubleshooting
scripts/startup.js — Startup automation script

Usage Guidance

Install only if you are comfortable giving the agent broad Chrome control. Prefer a dedicated Chrome profile with no sensitive logins, manually verify and pin the npm package, run without admin privileges, and stop the bridge after use.

Capability Analysis

Type: OpenClaw Skill
Name: chromeskill
Version: 1.0.0

The skill provides extensive browser automation capabilities via a bridge service, including cookie extraction, local storage access, arbitrary JavaScript execution (`evaluate`), and file upload/download. The `scripts/startup.js` file automatically installs a global npm package (`chrome-ai-action`) and launches a background process on port 9876. While these features are aligned with the stated goal of browser automation, the broad permissions and the automated global installation of external code represent a significant security risk and potential for data exfiltration or unauthorized system access if the agent is misused or the external package is compromised.

Capability Tags

crypto

Capability Assessment

ℹ Purpose & Capability

The browser automation purpose is coherent with navigation, clicking, screenshots, extraction, and CDP control, but the exposed capabilities are high-impact because they include cookies, storage, JavaScript injection, network interception, and file upload/download.

⚠ Instruction Scope

The artifacts describe broad raw actions such as evaluate/injectScript, batch requests, cookie/storage mutation, and file operations without clear per-domain scoping or user-confirmation requirements for sensitive actions.

⚠ Install Mechanism

There is no registry install spec, yet first use runs a startup script that globally installs and executes the external npm package `chrome-ai-action` without a pinned version or reviewed package contents.

⚠ Credentials

Registry requirements declare no binaries, env vars, or credentials, while the docs require Node.js/Chrome/npm and create localhost bridge/CDP access to a Chrome profile; this under-declares the real environment authority.

⚠ Persistence & Privilege

The skill starts a background bridge and can auto-launch Chrome with remote debugging, but the artifacts do not provide clear shutdown, isolation, or cleanup guidance.

Version History

v1.0.0

chrome-ai-action-skill 1.0.0 initial release:

- Enables full browser automation via the Chrome AI Action (CAA) bridge using Puppeteer (CDP) mode.
- Supports 60+ browser actions: navigation, clicking, typing, screenshots, data extraction, network interception, cookie and storage management, PDF export, and more.
- Automatically installs required npm package and launches the bridge; Chrome is auto-started if not running.
- Startup, API usage, error handling, and discovery tips clearly documented in English and Chinese.
- Special guidance for correct URL encoding with Chinese characters in navigation actions.

Metadata

Slug chromeskill

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is chrome_skill?

Browser automation via Chrome AI Action (CAA) bridge. Control Chrome programmatically — navigate, click, type, screenshot, extract content, and more. Uses Pu... It is an AI Agent Skill for Claude Code / OpenClaw, with 31 downloads so far.

How do I install chrome_skill?

Run "/install chromeskill" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is chrome_skill free?

Yes, chrome_skill is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does chrome_skill support?

chrome_skill is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created chrome_skill?

It is built and maintained by Jami-Lin (@jami-lin); the current version is v1.0.0.
