jami-lin

chrome_skill

by Jami-Lin · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
31 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install chromeskill
Description
Browser automation via Chrome AI Action (CAA) bridge. Control Chrome programmatically — navigate, click, type, screenshot, extract content, and more. Uses Pu...
README (SKILL.md)

Chrome AI Action — Browser Automation Skill

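A browser automation skill for AI agents. Through the Chrome AI Action (CAA) bridge service, it controls the Chrome browser programmatically in Puppeteer (CDP) mode and supports 60+ operations, including navigation, clicking, typing, screenshots, content extraction, network interception, cookie management, and PDF export.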


When to Use

| Scenario | Invoke |
| --- | --- |
| User asks to browse a web page, search, fill forms, extract data | Yes |
| User needs screenshots of a web page | Yes |
| User wants to automate browser interactions | Yes |
| User asks about writing code / debugging (no browser involved) | No |

⚠️ CRITICAL: Chinese URL Encoding

IMPORTANT: When constructing URLs with Chinese characters for the navigate action, the agent MUST encode the query string values using encodeURIComponent. The bridge automatically encodes non-ASCII characters in the URL path, but query string values must be pre-encoded by the caller.

For example, wd=妻子的浪漫旅行 must be written as wd=%E5%A6%BB%E5%AD%90%E7%9A%84%E6%B5%AA%E6%BC%AB%E6%97%85%E8%A1%8C

Correct

{"action": "navigate", "params": {"url": "https://www.baidu.com/s?wd=%E5%A6%BB%E5%AD%90%E7%9A%84%E6%B5%AA%E6%BC%AB%E6%97%85%E8%A1%8C"}}

Wrong

{"action": "navigate", "params": {"url": "https://www.baidu.com/s?wd=妻子的浪漫旅行"}}

How to encode in Node.js

const encoded = encodeURIComponent('妻子的浪漫旅行');
// Result: %E5%A6%BB%E5%AD%90%E7%9A%84%E6%B5%AA%E6%BC%AB%E6%97%85%E8%A1%8C
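
The same rule can be applied to a whole query string at once. The following is a minimal sketch, not part of the skill: `buildNavigateAction` is a hypothetical helper name, and it assumes the base URL carries no query string of its own.

```javascript
// Hypothetical helper (not part of the skill): builds a navigate request
// with every query key and value percent-encoded via encodeURIComponent.
// Assumption: baseUrl has no existing query string.
function buildNavigateAction(baseUrl, query) {
  const pairs = Object.entries(query).map(
    ([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`
  );
  const url = pairs.length > 0 ? `${baseUrl}?${pairs.join('&')}` : baseUrl;
  return { action: 'navigate', params: { url } };
}

const request = buildNavigateAction('https://www.baidu.com/s', { wd: '妻子的浪漫旅行' });
console.log(JSON.stringify(request));
```

Encoding each value separately keeps characters such as `&` or `=` inside a value from being read as query delimiters.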

Prerequisites

| Requirement | Check | Auto-resolve |
| --- | --- | --- |
| Chrome / Chromium installed | Detected automatically (common install paths are checked) | No (user must install) |
| Chrome running with CDP | Detected on startup | Yes (auto-launched) |
| Node.js 18+ | node --version | No |

Startup Protocol

When loaded for the first time, the agent MUST run the startup script. The script starts the bridge as a background child process, so the agent does NOT need to manage the process separately.

node <skill_dir>/scripts/startup.js

What it does

  1. Check if bridge is already running: GET /health on port 9876 → skip if OK
  2. Ensure npm package installed: npm list -g chrome-ai-action → installs via npm install -g chrome-ai-action if missing
  3. Start the bridge: chrome-ai-action --port 9876, waits for health check
  4. Auto-launch Chrome: If Chrome not running with CDP, the bridge starts it automatically (cross-platform)
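
The health check in step 1 and the wait in step 3 could look roughly like the sketch below. It is not the code in scripts/startup.js: `waitForBridge`, the polling interval, and the rule that any 2xx response means "ready" are all assumptions made for illustration.

```javascript
// Sketch only: poll GET /health until the bridge answers or the timeout expires.
// Assumptions: any 2xx response counts as ready; the 500 ms interval is arbitrary.
// fetchImpl is injectable so the function can be exercised without a live bridge.
async function waitForBridge({
  port = 9876,
  timeoutMs = 30000,
  intervalMs = 500,
  fetchImpl = globalThis.fetch, // global fetch is available in Node.js 18+
} = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      const res = await fetchImpl(`http://127.0.0.1:${port}/health`);
      if (res.ok) return true;
    } catch {
      // Connection refused: the bridge is not listening yet, so keep polling.
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}
```

A caller would skip the install and launch steps when `await waitForBridge({ timeoutMs: 1000 })` already returns `true`.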

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| CAA_BRIDGE_PORT | 9876 | Bridge HTTP server port |
| CAA_STARTUP_TIMEOUT | 30000 | Max wait for bridge ready (ms) |
| CHROME_PATH | auto-detect | Custom Chrome executable path |
| CHROME_USER_DATA_DIR | platform-dependent | Chrome profile directory |
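
A client that wants to honor the same variables might resolve them as in this sketch. `loadBridgeConfig` and the shape of the returned object are hypothetical; only the variable names and defaults come from the table above.

```javascript
// Hypothetical helper: resolve the documented variables with their defaults.
// Unset or non-numeric values fall back to the documented default.
function loadBridgeConfig(env = process.env) {
  const toInt = (value, fallback) => {
    const parsed = Number.parseInt(value ?? '', 10);
    return Number.isNaN(parsed) ? fallback : parsed;
  };
  return {
    port: toInt(env.CAA_BRIDGE_PORT, 9876),
    startupTimeoutMs: toInt(env.CAA_STARTUP_TIMEOUT, 30000),
    chromePath: env.CHROME_PATH ?? null, // null: leave auto-detection to the bridge
    userDataDir: env.CHROME_USER_DATA_DIR ?? null, // null: platform-dependent default
  };
}

console.log(loadBridgeConfig());
```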

API Protocol

Endpoint: http://127.0.0.1:9876/

Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /health | Health check; returns bridge and CDP status |
| GET | /schema | Full action schema (64+ actions) |
| POST | / | Execute action(s) |

Request Format

{"type": "action", "action": "<ACTION>", "params": {...}, "requestId": "optional-id"}

Batch Request

{"type": "batch", "actions": [
  {"action": "navigate", "params": {"url": "https://example.com"}},
  {"action": "getTitle"}
]}
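
Batch payloads of this shape can be assembled programmatically. The sketch below uses a hypothetical `buildBatch` helper; the up-front validation is an illustration and says nothing about how the bridge itself validates input.

```javascript
// Hypothetical helper: wrap a list of steps in the documented batch envelope.
// Each step needs an "action" name; "params" is omitted when a step has none,
// as in the getTitle entry above.
function buildBatch(steps) {
  const actions = steps.map((step, index) => {
    if (typeof step.action !== 'string' || step.action.length === 0) {
      throw new TypeError(`step ${index} is missing an "action" name`);
    }
    return step.params === undefined
      ? { action: step.action }
      : { action: step.action, params: step.params };
  });
  return { type: 'batch', actions };
}

const batch = buildBatch([
  { action: 'navigate', params: { url: 'https://example.com' } },
  { action: 'getTitle' },
]);
console.log(JSON.stringify(batch, null, 2));
```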

Response Format

{"success": true, "data": {...}, "requestId": "req-1", "timestamp": 1712345678901}

Error Response

{"success": false, "error": {"code": "ACTION_ERROR", "message": "..."}, "requestId": "req-1", "timestamp": 1712345678901}
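
Putting the request and response envelopes together, a thin client could look like the sketch below. `callBridge` and `BridgeError` are hypothetical names, and the `Content-Type: application/json` header is an assumption; the source does not say which headers the bridge expects.

```javascript
// Sketch of a client for POST /. Names and the Content-Type header are assumptions.
class BridgeError extends Error {
  constructor(code, message, requestId) {
    super(message);
    this.name = 'BridgeError';
    this.code = code; // e.g. ACTION_ERROR, as in the error envelope above
    this.requestId = requestId;
  }
}

async function callBridge(action, params = {}, { port = 9876, fetchImpl = globalThis.fetch } = {}) {
  const requestId = `req-${Date.now()}`;
  const res = await fetchImpl(`http://127.0.0.1:${port}/`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ type: 'action', action, params, requestId }),
  });
  const reply = await res.json();
  if (!reply.success) {
    // Turn the documented error envelope into a thrown error.
    const { code = 'UNKNOWN', message = 'no message' } = reply.error ?? {};
    throw new BridgeError(code, message, reply.requestId);
  }
  return reply.data;
}
```

With the bridge running, `await callBridge('getTitle')` would resolve to the `data` field of a successful reply and reject with a `BridgeError` otherwise.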

Available Actions (64+)

Navigation

navigate, goBack, goForward, reload, getUrl, getTitle

Page Content

getText, getHtml, getLinks, getImages, getHeadings, getMetaTags, getFormFields, getFocusableElements

Element Interaction

click, type, pressKey, scroll, scrollIntoView, findElement, focus, hover, select

Data Extraction

getValue, getAttribute, getAttributeAll, getBoundingBox, getCookies, getPerformanceMetrics, getSelectedValue, getSelectOptions

JavaScript

evaluate, injectScript, injectCSS

Screenshot & Export

screenshot (PNG/JPEG), getPdf (A4/Letter)

Tab Management

listTabs, newTab, closeTab, switchTab, getCurrentTab

Waiting

waitForElement, waitForTimeout, waitForNavigation

Cookie Management

setCookie, deleteCookie

Network Interception

blockUrls, unblockUrls, mockResponse, getNetworkRequests, clearNetworkRequests

Storage

getLocalStorage, setLocalStorage, removeLocalStorage, clearLocalStorage

File Operations

uploadFile, setInputFiles, downloadFile

Viewport

getViewport, setViewport

Console

getConsoleLogs, clearConsoleLogs

Accessibility

getAccessibilityTree

Utility

ping, connect, disconnect, getBrowserInfo, highlight, dispatchEvent


Typical Workflow

  1. Navigate: navigate → go to target URL (encode Chinese in query params)
  2. Wait: waitForElement → wait for key content
  3. Read: getText / getHtml / getLinks → understand page
  4. Interact: click / type / pressKey → perform actions
  5. Extract: getText / screenshot / evaluate → get results
  6. Confirm: screenshot → visually verify

Example: Search Baidu with Chinese

{"type": "batch", "actions": [
  {"action": "navigate", "params": {"url": "https://www.baidu.com/s?wd=%E5%A6%BB%E5%AD%90%E7%9A%84%E6%B5%AA%E6%BC%AB%E6%97%85%E8%A1%8C"}},
  {"action": "waitForTimeout", "params": {"ms": 2000}},
  {"action": "getText"}
]}

Example: Full Login Flow

{"type": "batch", "actions": [
  {"action": "navigate", "params": {"url": "https://example.com/login"}},
  {"action": "waitForElement", "params": {"selector": "input[name=username]", "timeout": 10000}},
  {"action": "type", "params": {"selector": "input[name=username]", "value": "myuser"}},
  {"action": "type", "params": {"selector": "input[name=password]", "value": "mypassword"}},
  {"action": "click", "params": {"selector": "button[type=submit]"}},
  {"action": "waitForTimeout", "params": {"ms": 3000}},
  {"action": "getCurrentTab"}
]}

Error Handling

| Error Code | Meaning | Resolution |
| --- | --- | --- |
| CDP_NOT_CONNECTED | Chrome not running with debug port | Bridge auto-launches Chrome, retries every 3s |
| ACTION_ERROR | Action execution failed | Check params, use getFocusableElements to find elements first |
| INVALID_REQUEST | Malformed request | Check request format |
| PARSE_ERROR | JSON parse failure | Send valid JSON |
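
On the caller's side, the table suggests that only CDP_NOT_CONNECTED is worth retrying, since the bridge is relaunching Chrome in the background. The sketch below illustrates that idea; `withCdpRetry` is a hypothetical name, the retry count is arbitrary, and it assumes the thrown error carries the bridge's error code in a `code` property.

```javascript
// Sketch only: retry a bridge call while Chrome is still coming up.
// Assumptions: errors expose the bridge error code as err.code; five retries
// is arbitrary. The 3000 ms default mirrors the bridge's own 3 s retry cadence.
async function withCdpRetry(call, { retries = 5, delayMs = 3000 } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await call();
    } catch (err) {
      const retryable = Boolean(err) && err.code === 'CDP_NOT_CONNECTED';
      // ACTION_ERROR, INVALID_REQUEST and PARSE_ERROR will not fix themselves,
      // so they are rethrown immediately.
      if (!retryable || attempt >= retries) throw err;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```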

Discovery Tips

When you don't know what elements are on a page:

  1. getFocusableElements → all interactive elements (with positions)
  2. getFormFields → all form inputs with metadata
  3. getLinks → all links on page
  4. getHeadings → understand page structure
  5. getText → all visible text
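
The five probes above can be sent as a single batch. This is an illustrative sketch: `buildDiscoveryBatch` is a hypothetical helper, and it assumes each probe works without params, which the source implies but does not state for every action.

```javascript
// Hypothetical helper: one batch request that runs all five discovery probes.
// Assumption: each probe can be called without params.
const DISCOVERY_ACTIONS = [
  'getFocusableElements',
  'getFormFields',
  'getLinks',
  'getHeadings',
  'getText',
];

function buildDiscoveryBatch(url) {
  const actions = DISCOVERY_ACTIONS.map((action) => ({ action }));
  // Optionally navigate first; the URL must already be percent-encoded.
  if (url) actions.unshift({ action: 'navigate', params: { url } });
  return { type: 'batch', actions };
}

console.log(JSON.stringify(buildDiscoveryBatch('https://example.com')));
```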

References

  • references/bridge-api.md — Complete API reference with all 64+ actions
  • references/setup-guide.md — Detailed setup and troubleshooting
  • scripts/startup.js — Startup automation script
Usage Guidance
Install only if you are comfortable giving the agent broad Chrome control. Prefer a dedicated Chrome profile with no sensitive logins, manually verify and pin the npm package, run without admin privileges, and stop the bridge after use.
Capability Analysis
Type: OpenClaw Skill · Name: chromeskill · Version: 1.0.0
The skill provides extensive browser automation capabilities via a bridge service, including cookie extraction, local storage access, arbitrary JavaScript execution (`evaluate`), and file upload/download. The `scripts/startup.js` file automatically installs a global npm package (`chrome-ai-action`) and launches a background process on port 9876. While these features are aligned with the stated goal of browser automation, the broad permissions and the automated global installation of external code represent a significant security risk and potential for data exfiltration or unauthorized system access if the agent is misused or the external package is compromised.
Capability Tags
crypto
Capability Assessment
Purpose & Capability
The browser automation purpose is coherent with navigation, clicking, screenshots, extraction, and CDP control, but the exposed capabilities are high-impact because they include cookies, storage, JavaScript injection, network interception, and file upload/download.
Instruction Scope
The artifacts describe broad raw actions such as evaluate/injectScript, batch requests, cookie/storage mutation, and file operations without clear per-domain scoping or user-confirmation requirements for sensitive actions.
Install Mechanism
There is no registry install spec, yet first use runs a startup script that globally installs and executes the external npm package `chrome-ai-action` without a pinned version or reviewed package contents.
Credentials
Registry requirements declare no binaries, env vars, or credentials, while the docs require Node.js/Chrome/npm and create localhost bridge/CDP access to a Chrome profile; this under-declares the real environment authority.
Persistence & Privilege
The skill starts a background bridge and can auto-launch Chrome with remote debugging, but the artifacts do not provide clear shutdown, isolation, or cleanup guidance.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install chromeskill
  3. After installation, invoke the skill by name or use /chromeskill
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
chrome-ai-action-skill 1.0.0 initial release:
  • Enables full browser automation via the Chrome AI Action (CAA) bridge using Puppeteer (CDP) mode.
  • Supports 60+ browser actions: navigation, clicking, typing, screenshots, data extraction, network interception, cookie and storage management, PDF export, and more.
  • Automatically installs required npm package and launches the bridge; Chrome is auto-started if not running.
  • Startup, API usage, error handling, and discovery tips clearly documented in English and Chinese.
  • Special guidance for correct URL encoding with Chinese characters in navigation actions.
Metadata
Slug chromeskill
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is chrome_skill?

Browser automation via Chrome AI Action (CAA) bridge. Control Chrome programmatically — navigate, click, type, screenshot, extract content, and more. Uses Pu... It is an AI Agent Skill for Claude Code / OpenClaw, with 31 downloads so far.

How do I install chrome_skill?

Run "/install chromeskill" in the OpenClaw or Claude Code chat to install it in one step. The skill itself still needs Node.js 18+ and an installed Chrome or Chromium (see Prerequisites).

Is chrome_skill free?

Yes, chrome_skill is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does chrome_skill support?

chrome_skill is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created chrome_skill?

It is built and maintained by Jami-Lin (@jami-lin); the current version is v1.0.0.
