← Back to Skills Marketplace
160
Downloads
0
Stars
0
Active Installs
1
Versions
Install in OpenClaw
/install ai-browser
Description
通过 WebSocket 控制真实浏览器,实现导航、点击、输入、截图、DOM 获取等完整自动化操作。特点:真正的浏览器内核 (Chromium)、WebSocket 实时控制、支持无头/有头模式、自动重连机制。
README (SKILL.md)
AI Browser Skill 🌐
通过 WebSocket 控制真实浏览器,实现导航、点击、输入、截图、DOM 获取等自动化操作。
特点
- ✅ 真正的浏览器内核 (Chromium)
- ✅ WebSocket 实时控制
- ✅ 支持无头/有头模式
- ✅ 简单的标签页管理
- ✅ 自动重连机制
启动方法
# 1. 安装依赖
npm install
# 2. 启动服务
npm start
# 服务将运行在 ws://localhost:18790
WebSocket 协议
连接
连接到 ws://localhost:18790
消息格式
发送 JSON:
{
"id": "请求 ID (可选)",
"action": "动作名称",
"params": { ... }
}
支持的动作
| 动作 | 参数 | 说明 |
|---|---|---|
navigate |
{ url: "https://..." } |
导航到指定 URL |
snapshot |
{} |
获取当前页面简化 DOM 结构 |
screenshot |
{ fullPage: false } |
截图 (返回 base64) |
click |
{ selector: "button" } |
点击元素 |
type |
{ selector: "input", text: "hello", delay: 50 } |
输入文本 |
evaluate |
{ script: "document.title" } |
执行 JS 脚本 |
status |
{} |
获取浏览器状态 |
响应格式
{
"id": "请求 ID",
"success": true,
"result": { ... }
}
使用示例 (Python)
import websocket
import json
ws = websocket.create_connection("ws://localhost:18790")
# 导航
ws.send(json.dumps({"action": "navigate", "params": {"url": "https://fanqie.baidu.com"}}))
print(ws.recv())
# 截图
ws.send(json.dumps({"action": "screenshot", "params": {}}))
resp = json.loads(ws.recv())
with open("screen.png", "wb") as f:
f.write(base64.b64decode(resp["result"]["image"]))
ws.close()
注意事项
- 首次启动会自动下载 Chromium (约 100MB)
- 默认端口 18790,可通过
AI_BROWSER_PORT环境变量修改 - 无头模式设为
false,可以看到浏览器界面(方便调试)
使用场景
- 网页自动化测试
- 数据抓取
- 截图采集
- 表单自动填写
- 网站监控
Usage Guidance
This skill implements a full Puppeteer-controlled browser over a WebSocket API — powerful but risky if run without protections. Before installing/running: (1) review quick-control.js: it connects to a local Chrome and targets a specific site (fanqie.baidu.com) — remove or audit this if you don't trust that usage. (2) Run the service in an isolated environment (VM, container) to avoid exposing local browser cookies or credentials. (3) Restrict access: bind the WebSocket server to 127.0.0.1 explicitly or use firewall rules, and ensure port 18790 (and 9222 remote-debugging) are not reachable from untrusted networks. (4) Add authentication/authorization to the WebSocket API or a reverse-proxy that enforces it. (5) If you need only connect to an existing Chrome, prefer puppeteer-core usage but be aware connecting to remote debugging port can expose the whole browser. (6) Verify the package origin and consider pinning dependency versions; expect Puppeteer to download a Chromium binary during install. If you cannot take these precautions, treat this skill as unsafe on hosts with sensitive sessions or open network interfaces.
Capability Analysis
Type: OpenClaw Skill
Name: ai-browser
Version: 1.0.0
The skill provides a WebSocket-based remote control interface for a Chromium browser using Puppeteer, which includes high-risk capabilities such as arbitrary JavaScript execution (`evaluate` action) and automated interaction (typing, clicking) without any authentication or authorization. The server (`server.js`) also disables the Chromium sandbox (`--no-sandbox`) and opens a remote debugging port (9222), creating a significant attack surface. While these features align with the stated goal of browser automation, the lack of security controls and the inclusion of a script (`quick-control.js`) specifically targeting a Chinese writing platform (fanqie.baidu.com) warrant a suspicious classification.
Capability Assessment
Purpose & Capability
Overall the code and instructions match the stated purpose: a Puppeteer-based WebSocket service to control Chromium (navigation, click, screenshot, evaluate). However quick-control.js specifically connects to a local Chrome and targets a specific site (https://fanqie.baidu.com/writer) and contains logic about login/publishing — that is not aligned with a generic 'AI Browser' skill and looks like a leftover utility for a particular workflow.
Instruction Scope
SKILL.md instructs npm install and npm start and to connect to ws://localhost:18790, which matches server behavior in basic form. But the runtime code accepts arbitrary JSON actions including 'evaluate' (executes arbitrary JS in page context) and returns DOM/screenshot data — which is expected for this feature set but is a powerful capability. The server exposes a WebSocket server without any authentication or origin checks; SKILL.md emphasizes localhost but the Node WebSocket server binds the port with no explicit host and will listen on all interfaces by default, so it may be reachable beyond loopback. quick-control.js also connects to a remote-debugging port (9222) and navigates a specific site — instructions do not call out these risks or recommend isolation.
Install Mechanism
No install spec in registry; SKILL.md instructs npm install and package.json depends on puppeteer and ws. Installing puppeteer will download a Chromium binary (~100MB) at install-time/runtime — this is expected but significant. The install path uses npm (a well-known registry) so download-origin risk is moderate and expected for this skill.
Credentials
The skill requests no secret env vars (only optionally AI_BROWSER_PORT). Still, it launches a browser and exposes a debugging port (9222) and a WebSocket control port; these allow access to pages and any authenticated sessions loaded in the browser, which is a privilege that can expose sensitive local data (cookies, logged-in sites). quick-control.js specifically targets a third-party site (fanqie.baidu.com), which implies this bundle was prepared for a specific account/workflow and could interact with user sessions on that site — that raises proportionality questions given the generic description.
Persistence & Privilege
The skill is not always-enabled and does not request special platform privileges. It does not modify other skills or system configs. Autonomous invocation is allowed by default (normal for skills) but is not combined here with an 'always' flag or other elevation.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat:
/install ai-browser - After installation, invoke the skill by name or use
/ai-browser - Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
AI Browser v1.0.0 – Initial release
- Provides real browser automation via WebSocket, supporting navigation, clicks, typing, screenshots, and DOM retrieval.
- Built on a real Chromium engine.
- Supports both headless and headed modes.
- Features live control, simple tab management, and automatic reconnection.
- Easy setup with a default port and optional configuration.
Metadata
Frequently Asked Questions
What is AI Browser?
通过 WebSocket 控制真实浏览器,实现导航、点击、输入、截图、DOM 获取等完整自动化操作。特点:真正的浏览器内核 (Chromium)、WebSocket 实时控制、支持无头/有头模式、自动重连机制。 It is an AI Agent Skill for Claude Code / OpenClaw, with 160 downloads so far.
How do I install AI Browser?
Run "/install ai-browser" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is AI Browser free?
Yes, AI Browser is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does AI Browser support?
AI Browser is cross-platform and runs anywhere OpenClaw / Claude Code is available (linux, darwin, win32).
Who created AI Browser?
It is built and maintained by linbo405 (@linbo405); the current version is v1.0.0.
More Skills