GUI Automation

Author: sarinali · GitHub · v1.0.1 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 363 · Favorites: 0 · Current installs: 2 · Versions: 1
Install in OpenClaw
/install gui-automation
Description
Control the desktop via the CUA computer server API running on port 8000
Usage (SKILL.md)

Desktop Control via CUA Server

This skill allows OpenClaw to control the desktop using the CUA computer server API.

⚠️ Security Notice

This skill requires installing and running a third-party server (cua-computer-sdk) that has full control over your desktop.

Before using this skill:

  • The server can simulate keyboard and mouse input and take screenshots
  • Only run on systems where you trust all users and processes
  • The server runs with your user privileges (no sudo/admin required)
  • By default, only accessible from localhost (safe for local use)

Prerequisites

  • Python 3.12+ installed on your system
  • CUA computer server running on port 8000 (see installation below)
  • Access to localhost:8000 only (network exposure not recommended)

Installation

Recommended: Temporary Session (Safest)

Run the server only when needed, in a terminal you can monitor:

# Install the Computer SDK (official CUA package)
pip install cua-computer-sdk

# Verify package (optional but recommended)
pip show cua-computer-sdk  # Check publisher and version

# Run temporarily (Ctrl+C to stop)
cua-server start --port 8000 --bind 127.0.0.1

# In another terminal, verify it's running locally only
curl http://localhost:8000/status
netstat -an | grep 8000  # Should show 127.0.0.1:8000

This is the safest approach - the server only runs when you explicitly start it and stops when you close the terminal.
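
The netstat check can also be scripted. The sketch below is a hypothetical helper (`only_loopback` is not part of the CUA tooling) that reads `netstat -an`-style lines and succeeds only when every listener on the given port is bound to a loopback address. It assumes the local address appears as a whitespace-separated field ending in `:PORT` or `.PORT`; output formats differ between platforms, so treat it as a starting point.

```shell
#!/usr/bin/env bash
# only_loopback PORT: read `netstat -an`-style lines on stdin and succeed
# only if at least one LISTEN socket on PORT exists and all of them are
# bound to a loopback address. Hypothetical helper, not part of CUA.
only_loopback() {
  local port=$1 found=1 line addr
  local -a fields
  while IFS= read -r line; do
    case $line in *LISTEN*) ;; *) continue ;; esac
    # Split the line into fields without glob expansion.
    read -r -a fields <<<"$line"
    for addr in "${fields[@]}"; do
      case $addr in
        127.0.0.1[.:]"$port"|localhost[.:]"$port"|::1[.:]"$port"|\[::1\]:"$port")
          found=0 ;;
        *[.:]"$port")
          echo "non-loopback listener: $addr" >&2
          return 1 ;;
      esac
    done
  done
  return "$found"
}

# Usage:
#   netstat -an | only_loopback 8000 && echo "port 8000 is loopback-only"
```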

Alternative: Install from Source

For transparency, you can review and run from source:

# Clone and review the code first
git clone https://github.com/trycua/cua-computer-server
cd cua-computer-server

# Review the code before running
ls -la
cat requirements.txt  # Check dependencies

# Install and run
pip install -r requirements.txt
python -m cua_server --port 8000 --bind 127.0.0.1

Running the Server

Option 1: Manual Start (Recommended)

# Start in foreground - you can see what it's doing
cua-server start --port 8000

# Stop with Ctrl+C when done

Option 2: Background Process (Temporary)

# Run in background for current session only
cua-server start --port 8000 &

# Note the process ID
echo "Server PID: $!"

# Stop when done
kill <PID>

Note: This skill does NOT require persistent/system service installation. Running the server temporarily when needed is the recommended approach.
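
One way to keep a background server tied to the current shell session is an EXIT trap. The following is a minimal sketch, not part of the skill: `start_session` and `stop_session` are hypothetical names, and the server command is passed in as arguments so that the `cua-server` invocation shown above is used unchanged.

```shell
#!/usr/bin/env bash
# PID of the background process started by start_session (empty if none).
SESSION_PID=""

# start_session CMD...: run CMD in the background and remember its PID.
start_session() {
  "$@" &
  SESSION_PID=$!
  # Stop the process when this shell exits.
  trap stop_session EXIT
}

# stop_session: terminate the remembered process, if any, and reap it.
stop_session() {
  [ -n "$SESSION_PID" ] || return 0
  kill "$SESSION_PID" 2>/dev/null || true
  wait "$SESSION_PID" 2>/dev/null || true
  SESSION_PID=""
}

# Usage (server command taken from the section above):
#   start_session cua-server start --port 8000
#   ... run automation commands ...
#   stop_session
```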

Scope & Limitations

This skill:

  • ✅ Controls YOUR desktop when the server is running
  • ✅ Runs with YOUR user privileges (no admin/sudo needed)
  • ✅ Only accessible from localhost by default

Security Best Practices

  1. Run Temporarily: Start the server only when needed, stop when done
  2. Localhost Only: Keep default binding to 127.0.0.1
  3. No Network Exposure: Avoid --bind 0.0.0.0 unless absolutely necessary
  4. Monitor Activity: Run in foreground to see what commands are executed
  5. Limited Scope: The server can only do what your user account can do

Quick Test

After starting the server, verify it works:

# Simple health check
curl http://localhost:8000/status
# Should return: {"status": "ok"}

# Take a screenshot (safe test)
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "screenshot"}' \
  -o screenshot.json

# If successful, you'll get a JSON response with base64 image data
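
Because the server may take a moment to come up, the health check can be wrapped in a small polling loop rather than run once. This is a generic sketch; `retry` is a hypothetical helper, and the attempt count and delay are arbitrary choices.

```shell
#!/usr/bin/env bash
# retry TRIES DELAY CMD...: run CMD until it succeeds, at most TRIES times,
# sleeping DELAY seconds between attempts. Returns 1 if every attempt fails.
retry() {
  local tries=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= tries; i++)); do
    "$@" && return 0
    if ((i < tries)); then sleep "$delay"; fi
  done
  return 1
}

# Usage: poll the status endpoint up to 10 times, one second apart.
#   retry 10 1 curl -fsS http://localhost:8000/status >/dev/null
```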

Troubleshooting

Port Already in Use:

# Check what's using port 8000
lsof -i :8000              # macOS/Linux
netstat -ano | findstr :8000  # Windows

# Solution: Use a different port
cua-server start --port 8001

Permission Denied (Linux):

# You may need to add your user to the input group for keyboard/mouse control
sudo usermod -a -G input $USER
# Log out and back in for changes to take effect

Display Not Found (Linux):

# Check your display variable
echo $DISPLAY

# Set it explicitly
DISPLAY=:0 cua-server start --port 8000

Server Not Responding:

# Check if the process is running
ps aux | grep cua-server       # Linux/macOS
tasklist | findstr cua-server  # Windows

# Try running in foreground to see errors
cua-server start --port 8000 --debug

Available Commands

Take Screenshot

Capture the current screen:

curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "screenshot"}' \
  | jq -r '.result.base64' \
  | base64 -d > screenshot.png
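
If the pipeline fails partway (for example, because the response lacks a `.result.base64` field), `screenshot.png` can end up empty or corrupt. A quick sanity check is to look for the 8-byte PNG signature. The sketch below assumes the server returns PNG data, as the `.png` filename in the example suggests; `is_png` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# is_png FILE: succeed if FILE starts with the 8-byte PNG signature
# (89 50 4E 47 0D 0A 1A 0A).
is_png() {
  local sig
  sig=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
  [ "$sig" = "89504e470d0a1a0a" ]
}

# Usage:
#   is_png screenshot.png || echo "screenshot.png is not a valid PNG" >&2
```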

Click at Coordinates

Click at specific x,y coordinates:

# Click at center of 1280x720 screen
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "left_click", "params": {"x": 640, "y": 360}}'

Right Click

curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "right_click", "params": {"x": 640, "y": 360}}'

Double Click

curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "double_click", "params": {"x": 640, "y": 360}}'

Type Text

Type text at the current cursor position:

curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "type_text", "params": {"text": "Hello, World!"}}'
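
Text containing double quotes, backslashes, or newlines must be escaped before it is embedded in the JSON body, otherwise the request is not valid JSON. The sketch below does this in plain bash; `json_escape` and `type_text_payload` are hypothetical helpers, and only the most common characters are handled (other control characters are passed through unchanged).

```shell
#!/usr/bin/env bash
# json_escape STRING: escape backslashes, double quotes, and common control
# characters so STRING can be placed inside a JSON string literal.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\r'/\\r}
  s=${s//$'\t'/\\t}
  printf '%s' "$s"
}

# type_text_payload STRING: print the request body for the type_text command.
type_text_payload() {
  printf '{"command": "type_text", "params": {"text": "%s"}}' "$(json_escape "$1")"
}

# Usage:
#   curl -X POST http://localhost:8000/cmd \
#     -H "Content-Type: application/json" \
#     -d "$(type_text_payload 'She said "hi"')"
```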

Press Hotkey

Press a key combination:

# Ctrl+C
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "hotkey", "params": {"keys": ["ctrl", "c"]}}'

# Ctrl+Alt+T (open terminal)
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "hotkey", "params": {"keys": ["ctrl", "alt", "t"]}}'
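
When key combinations come from variables, the `keys` array can be assembled programmatically. This is an illustrative sketch: `hotkey_payload` is a hypothetical helper, and restricting key names to lowercase letters, digits, and underscores is an assumption made here to avoid JSON escaping, not a rule documented by the server.

```shell
#!/usr/bin/env bash
# hotkey_payload KEY...: print the request body for the hotkey command.
# Key names are limited to simple tokens so no JSON escaping is needed.
hotkey_payload() {
  local key list=""
  for key in "$@"; do
    [[ $key =~ ^[a-z0-9_]+$ ]] || { echo "invalid key name: $key" >&2; return 1; }
    list+="${list:+, }\"$key\""
  done
  [ -n "$list" ] || { echo "no keys given" >&2; return 1; }
  printf '{"command": "hotkey", "params": {"keys": [%s]}}' "$list"
}

# Usage:
#   curl -X POST http://localhost:8000/cmd \
#     -H "Content-Type: application/json" \
#     -d "$(hotkey_payload ctrl alt t)"
```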

Press Single Key

Press a single key:

# Press Enter
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "press_key", "params": {"key": "enter"}}'

# Press Escape
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "press_key", "params": {"key": "escape"}}'

Move Cursor

Move cursor to specific position:

curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "move_cursor", "params": {"x": 100, "y": 200}}'

Scroll

Scroll up or down:

# Scroll down 3 units
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "scroll_direction", "params": {"direction": "down", "amount": 3}}'

# Scroll up 5 units
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "scroll_direction", "params": {"direction": "up", "amount": 5}}'

Launch Application

Launch an application by name:

# Launch Firefox
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "launch", "params": {"app": "firefox"}}'

# Launch Terminal
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "launch", "params": {"app": "xfce4-terminal"}}'

Open File or URL

Open a file or URL with default application:

# Open URL
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "open", "params": {"path": "https://example.com"}}'

# Open file
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "open", "params": {"path": "/home/cua/document.txt"}}'

Get Window Information

Get current window ID:

curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "get_current_window_id"}'

Window Control

Maximize window:

curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "maximize_window", "params": {"window_id": "0x1234567"}}'

Minimize window:

curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "minimize_window", "params": {"window_id": "0x1234567"}}'

Demo Workflows

Browser Navigation Demo

Open Firefox and navigate to a website:

# Take initial screenshot
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "screenshot"}' -o initial.json

# Launch Firefox
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "launch", "params": {"app": "firefox"}}'
sleep 3

# Focus address bar (Ctrl+L)
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "hotkey", "params": {"keys": ["ctrl", "l"]}}'
sleep 1

# Type URL
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "type_text", "params": {"text": "https://example.com"}}'

# Press Enter
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "press_key", "params": {"key": "enter"}}'
sleep 5

# Take final screenshot
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "screenshot"}' -o final.json

Text Editor Demo

Open text editor and type content:

# Open terminal
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "hotkey", "params": {"keys": ["ctrl", "alt", "t"]}}'
sleep 2

# Type command to open text editor
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "type_text", "params": {"text": "mousepad"}}'
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "press_key", "params": {"key": "enter"}}'
sleep 2

# Type some text
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "type_text", "params": {"text": "Hello from OpenClaw!\nThis is automated desktop control."}}'

# Save file (Ctrl+S)
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "hotkey", "params": {"keys": ["ctrl", "s"]}}'
sleep 1

# Type filename
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "type_text", "params": {"text": "openclaw-demo.txt"}}'
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "press_key", "params": {"key": "enter"}}'

Form Filling Demo

Fill out a web form:

# Assuming browser is open with form visible

# Click on first input field (adjust coordinates)
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "left_click", "params": {"x": 400, "y": 300}}'

# Type name
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "type_text", "params": {"text": "John Doe"}}'

# Tab to next field
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "press_key", "params": {"key": "tab"}}'

# Type email
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "type_text", "params": {"text": "[email protected]"}}'

# Tab to next field
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "press_key", "params": {"key": "tab"}}'

# Type message
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "type_text", "params": {"text": "This form was filled automatically by OpenClaw!"}}'

# Submit form (click submit button)
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "left_click", "params": {"x": 400, "y": 500}}'

Helper Functions

Check Server Status

curl http://localhost:8000/status

List All Available Commands

curl http://localhost:8000/commands | jq

Get Screen Size

curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "get_screen_size"}'

Get Cursor Position

curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "get_cursor_position"}'

Environment Variables

  • CUA_SERVER_URL: Base URL for CUA server (default: http://localhost:8000)
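
The curl examples in this document hard-code `http://localhost:8000`. A thin wrapper can read `CUA_SERVER_URL` instead and fall back to the documented default. This is a sketch under that assumption; `cua_url` and `cua_cmd` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# cua_url PATH: build an endpoint URL from CUA_SERVER_URL, falling back to
# the documented default and avoiding a doubled slash.
cua_url() {
  local base=${CUA_SERVER_URL:-http://localhost:8000}
  printf '%s/%s' "${base%/}" "${1#/}"
}

# cua_cmd JSON: POST a raw JSON body to the /cmd endpoint.
cua_cmd() {
  curl -fsS -X POST "$(cua_url cmd)" \
    -H "Content-Type: application/json" \
    -d "$1"
}

# Usage:
#   cua_cmd '{"command": "get_screen_size"}'
```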

Tips

  1. Wait Between Commands: Add sleep between commands to allow UI to update
  2. Check Coordinates: The examples assume a 1280x720 screen, whose center is at (640, 360); use get_screen_size to confirm the actual resolution
  3. Screenshot for Debugging: Take screenshots before and after actions to verify
  4. Use Variables: Store coordinates and text in variables for reusability
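
Tips 2 and 4 can be combined by deriving coordinates from the screen size instead of typing them by hand. The sketch below uses the 1280x720 values from the examples as placeholders; `center_of` and `clamp` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# center_of WIDTH HEIGHT: print "X Y" for the center of the screen.
center_of() {
  printf '%d %d' "$(($1 / 2))" "$(($2 / 2))"
}

# clamp VALUE MIN MAX: print VALUE limited to the range MIN..MAX.
clamp() {
  local v=$1
  if ((v < $2)); then v=$2; fi
  if ((v > $3)); then v=$3; fi
  printf '%d' "$v"
}

SCREEN_W=1280   # placeholder: use the width reported by get_screen_size
SCREEN_H=720    # placeholder: use the height reported by get_screen_size
read -r CENTER_X CENTER_Y <<<"$(center_of "$SCREEN_W" "$SCREEN_H")"

# Usage: keep a computed x coordinate on screen before clicking.
#   x=$(clamp "$x" 0 $((SCREEN_W - 1)))
```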

Example OpenClaw Usage

Once this skill is loaded, you can use it in OpenClaw conversations:

User: "Take a screenshot and open Firefox"
OpenClaw: *executes the screenshot and launch firefox commands*

User: "Type 'Hello World' in the current window"
OpenClaw: *executes the type_text command*

User: "Click at the center of the screen"
OpenClaw: *executes click command at 640,360*

Troubleshooting

  1. Connection Refused: Make sure the CUA server is running on port 8000
  2. No Response: Check whether you are inside the container or have an SSH tunnel set up
  3. Commands Not Working: Verify the server with curl http://localhost:8000/status
  4. Wrong Coordinates: The examples assume a 1280x720 screen; adjust coordinates to your actual resolution
Security Recommendations
This skill is coherent for GUI automation but requires you to install and run a third-party server that can fully control your desktop. Before using it:
  1. Verify the authorship of the pip package and GitHub repo, and inspect the source code if possible.
  2. Run the server only when needed, in the foreground, bound to 127.0.0.1.
  3. Use a dedicated VM or isolated account on high-risk machines.
  4. Review requirements.txt and the package metadata, and install inside a virtualenv.
  5. Avoid binding to 0.0.0.0 or exposing port 8000 to networks.
  6. If you cannot verify the publisher or code, treat the package as untrusted and do not install it on sensitive systems.
Functional Analysis
Type: OpenClaw Skill
Name: gui-automation
Version: 1.0.1
The skill provides full desktop control capabilities (mouse, keyboard, screenshots, and application launching) by instructing the user to install and run a local API server (cua-computer-sdk). While the documentation in SKILL.md is transparent and includes security warnings regarding localhost binding and temporary sessions, the capability itself is high-risk as it grants the AI agent RCE-equivalent access to the host's graphical environment. No evidence of intentional malice or exfiltration was found in the provided files, but the reliance on an external dependency for such broad permissions warrants a suspicious classification.
Capability Assessment
Purpose & Capability
The name/description (desktop/GUI automation) align with the SKILL.md: it instructs the user to run a local CUA server and shows curl commands to send mouse/keyboard/screenshot commands. Nothing requested by the skill (no credentials, no unrelated files) is inconsistent with desktop control.
Instruction Scope
The runtime instructions are narrowly scoped to installing and running a local server and calling its API (screenshot, clicks, key presses). They do not ask the agent to read unrelated files or exfiltrate data. However, the instructions explicitly enable full desktop control and include examples for executing arbitrary commands via the server API, which is powerful and potentially risky if misused.
Install Mechanism
This is an instruction-only skill (no install spec in registry). The SKILL.md recommends pip installing 'cua-computer-sdk' or cloning a GitHub repo. Installing third-party packages via pip or running cloned source is common for this functionality but carries supply-chain risk — the registry metadata contains no homepage and the package/repo are not verified here.
Credentials
The skill declares no environment variables, credentials, or config paths. The privileges requested (run as your user) are proportionate for a desktop-control tool; no unrelated secrets are requested.
Persistence & Privilege
always is false and the skill does not instruct persistent system-wide installation; it recommends running the server temporarily and binding to localhost. Autonomous model invocation is allowed by default (platform behavior) but the skill itself does not request force-inclusion or system-level changes.
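How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install gui-automation
  3. Once installed, call the skill by name or trigger it with /gui-automation
  4. Provide the inputs described in the skill's parameter documentation to get structured output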
Version History
v1.0.1
  • Made security guidance and best practices more prominent in the documentation.
  • Clarified that running the CUA server as a temporary foreground process is recommended.
  • Reduced instructions about persistent/system/background installation; the documentation now emphasizes temporary/manual use.
  • Simplified installation and troubleshooting steps for easier and safer onboarding.
  • Expanded and reorganized documentation warnings about the server's capabilities and risks.
Metadata
Slug: gui-automation
Version: 1.0.1
License: MIT-0
Total installs: 3
Current installs: 2
Version count: 1
FAQ

What is GUI Automation?

Control the desktop via the CUA computer server API running on port 8000. It is an AI agent skill plugin for Claude Code / OpenClaw, with 363 total downloads to date.

How do I install GUI Automation?

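Run the command "/install gui-automation" in the OpenClaw or Claude Code chat box to install the skill. The CUA server listed under Prerequisites must be installed and started separately.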

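Is GUI Automation free?

Yes. GUI Automation is completely free and released under the MIT-0 license; it can be downloaded, installed, and used freely.

Which platforms does GUI Automation support?

GUI Automation is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops GUI Automation?

It is developed and maintained by sarinali (@sarinali); the current version is v1.0.1.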
