Desktop Automation

Author: sarinali · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 671 · Favorites: 0 · Current installs: 10 · Versions: 1
Install in OpenClaw
/install desktop-automation
Description
Control the desktop via CUA computer server API running on port 8000
Usage Instructions (SKILL.md)

Desktop Control via CUA Server

This skill allows OpenClaw to control the desktop using the CUA computer server API.

Prerequisites

  • CUA computer server running on port 8000
  • Access to localhost:8000 (or configured CUA_SERVER_URL)

Installation

To control your host desktop with OpenClaw, you need to install and run the CUA computer server on your machine.

Quick Start (Python SDK)

The easiest way to install the CUA computer server on your host:

# Install the Computer SDK
pip install cua-computer-sdk

# Start the server (it will control your current desktop)
cua-server start --port 8000

# Or if you need to specify the display (Linux/Unix)
DISPLAY=:0 cua-server start --port 8000

# Verify it's running
curl http://localhost:8000/status

Alternative: Install from Source

If you prefer to install from source:

# Clone the repository
git clone https://github.com/trycua/cua-computer-server
cd cua-computer-server

# Install dependencies
pip install -r requirements.txt

# Run the server
python -m cua_server --port 8000

Running as a Background Service

For always-on desktop control, set up as a system service:

macOS (launchd):

# Create a plist file
cat > ~/Library/LaunchAgents/com.cua.server.plist <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.cua.server</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/cua-server</string>
        <string>start</string>
        <string>--port</string>
        <string>8000</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
EOF

# Load the service
launchctl load ~/Library/LaunchAgents/com.cua.server.plist

# Start the service
launchctl start com.cua.server

Linux (systemd):

# Create service file
sudo tee /etc/systemd/system/cua-server.service > /dev/null <<EOF
[Unit]
Description=CUA Computer Server
After=network.target

[Service]
Type=simple
User=$USER
Environment="DISPLAY=:0"
Environment="XAUTHORITY=/home/$USER/.Xauthority"
ExecStart=/usr/local/bin/cua-server start --port 8000
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

# Enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable cua-server
sudo systemctl start cua-server

# Check status
sudo systemctl status cua-server

Windows (Task Scheduler):

# Create a scheduled task that runs when the user logs on
# (an Interactive principal only runs inside a logged-on session)
$action = New-ScheduledTaskAction -Execute "cua-server.exe" -Argument "start --port 8000"
$trigger = New-ScheduledTaskTrigger -AtLogOn
$principal = New-ScheduledTaskPrincipal -UserId "$env:USERNAME" -LogonType Interactive
$settings = New-ScheduledTaskSettingsSet -AllowStartIfOnBatteries -DontStopIfGoingOnBatteries

Register-ScheduledTask -TaskName "CUA Server" -Action $action -Trigger $trigger -Principal $principal -Settings $settings

Configuration Options

Configure the server for your needs:

# Basic start with default settings
cua-server start

# Custom port
cua-server start --port 8001

# With authentication token (recommended if exposing to network)
cua-server start --port 8000 --auth-token your-secret-token

# Specify display (Linux/Unix)
DISPLAY=:1 cua-server start --port 8000

# Bind to all interfaces (careful - exposes to network!)
cua-server start --bind 0.0.0.0 --port 8000 --auth-token required-if-exposed

Security Considerations

⚠️ Important: By default, the server only listens on localhost (127.0.0.1) for security. This means only processes on your machine can connect to it.

  • Local only (default): Safe for personal use with OpenClaw
  • Network exposure: Only use --bind 0.0.0.0 with proper firewall rules AND authentication
  • Authentication: Always use --auth-token if the server is accessible from the network
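
The rules above can be turned into a small pre-flight check that runs before the server is started. This is only a sketch: `is_loopback_bind` and `check_bind_config` are hypothetical helper names, not part of cua-server, and the check merely mirrors the guidance in this section.

```bash
# Return 0 if the given bind address is loopback-only, 1 otherwise.
is_loopback_bind() {
  case "$1" in
    127.*|localhost|::1) return 0 ;;
    *) return 1 ;;
  esac
}

# Judge a planned configuration: a non-loopback bind is accepted
# only when a non-empty auth token is supplied.
# Usage: check_bind_config <bind-address> <auth-token-or-empty>
check_bind_config() {
  local bind="$1" token="$2"
  if is_loopback_bind "$bind"; then
    echo "ok: local only"
  elif [ -n "$token" ]; then
    echo "ok: exposed with token"
  else
    echo "refuse: exposed without token"
    return 1
  fi
}
```

For example, `check_bind_config 0.0.0.0 "" || exit 1` would stop a launch script before it starts a network-exposed server without a token. Firewall rules still have to be verified separately.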

Verification

After installation, verify the server is working:

# Check server status
curl http://localhost:8000/status

# List available commands
curl http://localhost:8000/commands | jq

# Take a test screenshot of your desktop
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "screenshot"}' \
  | jq -r '.result.base64' \
  | base64 -d > test-screenshot.png

# View the screenshot
open test-screenshot.png       # macOS
xdg-open test-screenshot.png   # Linux
start test-screenshot.png      # Windows

If you see a screenshot of your current desktop, the server is working correctly!
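
When the server is started from a script or a service, it may take a moment before /status answers. A minimal retry loop, assuming only that /status returns an HTTP success code once the server is ready, might look like this. `wait_until` and `cua_status_ok` are illustrative names, not part of the CUA tooling.

```bash
# Run a probe command until it succeeds or the attempts run out.
# The probe always runs at least once.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while :; do
    # Success: stop retrying immediately.
    "$@" >/dev/null 2>&1 && return 0
    # Give up once the attempt budget is used.
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# Probe for the CUA server: succeeds when /status answers with HTTP 2xx.
cua_status_ok() {
  curl --silent --fail --max-time 2 \
    "${CUA_SERVER_URL:-http://localhost:8000}/status"
}
```

A typical call would be `wait_until 10 1 cua_status_ok && echo "server ready"`, which polls once per second for up to ten attempts.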

Troubleshooting

Port Already in Use:

# Check what's using port 8000
lsof -i :8000              # macOS/Linux
netstat -ano | findstr :8000  # Windows

# Solution: Use a different port
cua-server start --port 8001

Permission Denied (Linux):

# You may need to add your user to the input group for keyboard/mouse control
sudo usermod -a -G input $USER
# Log out and back in for changes to take effect

Display Not Found (Linux):

# Check your display variable
echo $DISPLAY

# Set it explicitly
DISPLAY=:0 cua-server start --port 8000

Server Not Responding:

# Check if the process is running
ps aux | grep cua-server       # Linux/macOS
tasklist | findstr cua-server  # Windows

# Try running in foreground to see errors
cua-server start --port 8000 --debug

Available Commands

Take Screenshot

Capture the current screen:

curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "screenshot"}' \
  | jq -r '.result.base64' \
  | base64 -d > screenshot.png

Click at Coordinates

Click at specific x,y coordinates:

# Click at center of 1280x720 screen
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "left_click", "params": {"x": 640, "y": 360}}'

Right Click

curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "right_click", "params": {"x": 640, "y": 360}}'

Double Click

curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "double_click", "params": {"x": 640, "y": 360}}'

Type Text

Type text at the current cursor position:

curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "type_text", "params": {"text": "Hello, World!"}}'
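
Text that contains double quotes, backslashes, or line breaks has to be escaped before it is placed in the JSON body, otherwise the request is not valid JSON. The sketch below assumes bash (it relies on `${var//pattern/replacement}`) and handles only those characters plus tabs. `json_escape` and `type_text_payload` are hypothetical helpers.

```bash
# Escape a string for use inside a JSON string literal.
# Handles backslash, double quote, newline and tab only;
# other control characters are passed through unchanged.
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}       # backslash -> \\  (must run first)
  s=${s//\"/\\\"}       # "         -> \"
  s=${s//$'\n'/\\n}     # newline   -> \n
  s=${s//$'\t'/\\t}     # tab       -> \t
  printf '%s' "$s"
}

# Build the request body for the type_text command shown above.
type_text_payload() {
  printf '{"command": "type_text", "params": {"text": "%s"}}\n' \
    "$(json_escape "$1")"
}
```

The result can be passed to curl with `-d "$(type_text_payload 'He said "hi"')"`. Where `jq` is available, `jq -n --arg text "$text" '{command: "type_text", params: {text: $text}}'` is a more thorough alternative.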

Press Hotkey

Press a key combination:

# Ctrl+C
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "hotkey", "params": {"keys": ["ctrl", "c"]}}'

# Ctrl+Alt+T (open terminal)
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "hotkey", "params": {"keys": ["ctrl", "alt", "t"]}}'
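
Since the hotkey body is just a JSON array of key names, it can be generated from positional arguments instead of being typed by hand. A small sketch, assuming key names are plain words such as ctrl or alt that need no escaping (`hotkey_payload` is a hypothetical helper):

```bash
# Build the hotkey request body from a list of key names.
# Usage: hotkey_payload ctrl alt t
hotkey_payload() {
  local keys="" key
  if [ "$#" -eq 0 ]; then
    echo "hotkey_payload: at least one key is required" >&2
    return 1
  fi
  for key in "$@"; do
    # Prefix a ", " separator before every key except the first.
    keys="${keys:+$keys, }\"$key\""
  done
  printf '{"command": "hotkey", "params": {"keys": [%s]}}\n' "$keys"
}
```

Usage: `curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d "$(hotkey_payload ctrl alt t)"`.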

Press Single Key

Press a single key:

# Press Enter
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "press_key", "params": {"key": "enter"}}'

# Press Escape
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "press_key", "params": {"key": "escape"}}'

Move Cursor

Move cursor to specific position:

curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "move_cursor", "params": {"x": 100, "y": 200}}'
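
The click and move commands above all share the same body shape, so one helper can build any of them and reject malformed coordinates before a request is sent. This is a sketch only: `pointer_payload` is a hypothetical name, and the list of accepted commands is limited to the four shown in this document.

```bash
# Print the JSON body for a pointer command at integer coordinates.
# Usage: pointer_payload <left_click|right_click|double_click|move_cursor> <x> <y>
pointer_payload() {
  local cmd="$1" x="$2" y="$3" v
  case "$cmd" in
    left_click|right_click|double_click|move_cursor) ;;
    *) echo "unknown pointer command: $cmd" >&2; return 1 ;;
  esac
  for v in "$x" "$y"; do
    case "$v" in
      # Reject empty values, non-digits, and leading zeros (invalid in JSON).
      ''|*[!0-9]*|0?*)
        echo "coordinates must be non-negative integers" >&2
        return 1 ;;
    esac
  done
  printf '{"command": "%s", "params": {"x": %s, "y": %s}}\n' "$cmd" "$x" "$y"
}
```

Usage: `curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d "$(pointer_payload double_click 640 360)"`.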

Scroll

Scroll up or down:

# Scroll down 3 units
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "scroll_direction", "params": {"direction": "down", "amount": 3}}'

# Scroll up 5 units
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "scroll_direction", "params": {"direction": "up", "amount": 5}}'

Launch Application

Launch an application by name:

# Launch Firefox
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "launch", "params": {"app": "firefox"}}'

# Launch Terminal
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "launch", "params": {"app": "xfce4-terminal"}}'

Open File or URL

Open a file or URL with default application:

# Open URL
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "open", "params": {"path": "https://example.com"}}'

# Open file
curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "open", "params": {"path": "/home/cua/document.txt"}}'

Get Window Information

Get current window ID:

curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "get_current_window_id"}'

Window Control

Maximize window:

curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "maximize_window", "params": {"window_id": "0x1234567"}}'

Minimize window:

curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "minimize_window", "params": {"window_id": "0x1234567"}}'

Demo Workflows

Browser Navigation Demo

Open Firefox and navigate to a website:

# Take initial screenshot
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "screenshot"}' -o initial.json

# Launch Firefox
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "launch", "params": {"app": "firefox"}}'
sleep 3

# Focus address bar (Ctrl+L)
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "hotkey", "params": {"keys": ["ctrl", "l"]}}'
sleep 1

# Type URL
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "type_text", "params": {"text": "https://example.com"}}'

# Press Enter
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "press_key", "params": {"key": "enter"}}'
sleep 5

# Take final screenshot
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "screenshot"}' -o final.json
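
The demo repeats the same curl invocation for every step. One way to shorten such scripts is a thin wrapper around the POST call. The sketch below is an assumption-laden convenience, not part of the skill: `cua_post` is a hypothetical helper, and `CUA_DRY_RUN` is an invented variable that prints the request instead of sending it, which is handy for reviewing a workflow before it touches the desktop.

```bash
# POST a JSON body to the /cmd endpoint.
# With CUA_DRY_RUN set to a non-empty value, only print what would be sent.
# Usage: cua_post '<json-body>'
cua_post() {
  local url="${CUA_SERVER_URL:-http://localhost:8000}/cmd"
  if [ -n "${CUA_DRY_RUN:-}" ]; then
    printf 'POST %s %s\n' "$url" "$1"
    return 0
  fi
  # --fail makes curl return non-zero on HTTP errors so scripts can stop early.
  curl --silent --show-error --fail -X POST "$url" \
    -H "Content-Type: application/json" \
    -d "$1"
}
```

With this in place, a step of the demo becomes `cua_post '{"command": "hotkey", "params": {"keys": ["ctrl", "l"]}}' && sleep 1`.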

Text Editor Demo

Open text editor and type content:

# Open terminal
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "hotkey", "params": {"keys": ["ctrl", "alt", "t"]}}'
sleep 2

# Type command to open text editor
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "type_text", "params": {"text": "mousepad"}}'
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "press_key", "params": {"key": "enter"}}'
sleep 2

# Type some text
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "type_text", "params": {"text": "Hello from OpenClaw!\nThis is automated desktop control."}}'

# Save file (Ctrl+S)
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "hotkey", "params": {"keys": ["ctrl", "s"]}}'
sleep 1

# Type filename
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "type_text", "params": {"text": "openclaw-demo.txt"}}'
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "press_key", "params": {"key": "enter"}}'

Form Filling Demo

Fill out a web form:

# Assuming browser is open with form visible

# Click on first input field (adjust coordinates)
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "left_click", "params": {"x": 400, "y": 300}}'

# Type name
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "type_text", "params": {"text": "John Doe"}}'

# Tab to next field
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "press_key", "params": {"key": "tab"}}'

# Type email
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "type_text", "params": {"text": "john.doe@example.com"}}'

# Tab to next field
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "press_key", "params": {"key": "tab"}}'

# Type message
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "type_text", "params": {"text": "This form was filled automatically by OpenClaw!"}}'

# Submit form (click submit button)
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "left_click", "params": {"x": 400, "y": 500}}'

Helper Functions

Check Server Status

curl http://localhost:8000/status

List All Available Commands

curl http://localhost:8000/commands | jq

Get Screen Size

curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "get_screen_size"}'

Get Cursor Position

curl -X POST http://localhost:8000/cmd \
  -H "Content-Type: application/json" \
  -d '{"command": "get_cursor_position"}'

Environment Variables

  • CUA_SERVER_URL: Base URL for CUA server (default: http://localhost:8000)
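
In your own scripts, the same default can be applied with shell parameter expansion. How the skill itself reads the variable is not documented here, so treat this as a sketch for hand-written scripts; `cua_base_url` and `cua_url` are hypothetical helpers.

```bash
# Resolve the server base URL: use CUA_SERVER_URL when it is set and
# non-empty, otherwise fall back to the documented default.
cua_base_url() {
  printf '%s\n' "${CUA_SERVER_URL:-http://localhost:8000}"
}

# Build the full URL for an endpoint such as /status or /cmd,
# stripping one trailing slash from the base to avoid "//".
cua_url() {
  local base
  base=$(cua_base_url)
  printf '%s%s\n' "${base%/}" "$1"
}
```

Usage: `curl "$(cua_url /status)"` works unchanged whether the server runs on the default port or on one set through CUA_SERVER_URL.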

Tips

  1. Wait Between Commands: Add sleep between commands to allow UI to update
  2. Check Coordinates: The examples assume a 1280x720 screen, whose center is at (640, 360); use get_screen_size to confirm your actual resolution
  3. Screenshot for Debugging: Take screenshots before and after actions to verify
  4. Use Variables: Store coordinates and text in variables for reusability
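
Tips 2 and 4 can be combined by computing coordinates from the screen size instead of hard-coding them. The sketch below uses plain integer arithmetic. The width and height are passed in by hand because the response format of get_screen_size is not shown in this document, and both function names are hypothetical.

```bash
# Print the center of a screen as "x y" (integer division).
# Usage: screen_center <width> <height>
screen_center() {
  printf '%s %s\n' "$(( $1 / 2 ))" "$(( $2 / 2 ))"
}

# Scale a point written for the 1280x720 reference screen used in the
# examples to an actual screen size. Results are truncated to integers.
# Usage: scale_point <x> <y> <actual_width> <actual_height>
scale_point() {
  local x="$1" y="$2" w="$3" h="$4"
  printf '%s %s\n' "$(( x * w / 1280 ))" "$(( y * h / 720 ))"
}
```

For instance, `read -r cx cy <<< "$(screen_center 1920 1080)"` (bash) stores the center in two variables that later commands can reuse.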

Example OpenClaw Usage

Once this skill is loaded, you can use it in OpenClaw conversations:

User: "Take a screenshot and open Firefox"
OpenClaw: *executes the screenshot and launch firefox commands*

User: "Type 'Hello World' in the current window"
OpenClaw: *executes the type_text command*

User: "Click at the center of the screen"
OpenClaw: *executes click command at 640,360*

Troubleshooting

  1. Connection Refused: Make sure the CUA server is running on port 8000
  2. No Response: If the server runs in a container or on a remote host, check that you are inside that container or have an SSH tunnel set up
  3. Commands Not Working: Verify with curl http://localhost:8000/status
  4. Wrong Coordinates: The examples assume a 1280x720 screen; check your resolution with get_screen_size and adjust coordinates accordingly
Security Recommendations
This skill is coherent: it expects you to install a separate CUA server that will actually control your desktop. Before installing or enabling it: (1) inspect the cua-computer-sdk and cua-computer-server source (PyPI package and GitHub repo) for malicious code; (2) avoid binding the server to 0.0.0.0 or exposing it to networks unless you enforce authentication and firewall rules; (3) prefer running the server in an isolated VM or disposable environment, not on your primary machine; (4) only grant the minimum OS permissions required (be cautious about adding to input group or using sudo); and (5) if you must run as a persistent service, configure strong auth tokens and network restrictions. If you cannot audit the third‑party package, treat this as high risk and avoid installing it on sensitive systems.
Functional Analysis
Type: OpenClaw Skill
Name: desktop-automation
Version: 1.0.0

The skill bundle provides instructions and scripts to install a 'CUA computer server' that grants an AI agent full control over the host desktop, including screenshots, key injection, and application launching. It explicitly includes scripts for establishing persistence via systemd, launchd, and Windows Task Scheduler (SKILL.md). While these capabilities are aligned with the stated purpose of desktop automation, the inherent risk of granting an AI unrestricted UI and command execution access to the host machine, combined with the setup of background services, constitutes high-risk behavior.
Capability Assessment
Purpose & Capability
The name/description promise (control the desktop via a CUA server on port 8000) matches the SKILL.md: it documents installing/running a local cua-server and shows curl calls to /cmd and /status. No unrelated credentials, binaries, or config paths are requested in the registry metadata.
Instruction Scope
The SKILL.md instructs the user to install and run a local server, create systemd/launchd/Task Scheduler entries, and use curl to send commands (screenshot, click, etc.). These instructions are within scope for desktop control but they also require system/service changes (adding user to input group, setting XAUTHORITY, creating services) and enable remote command execution via the server API — which is expected for the stated purpose but is a sensitive capability.
Install Mechanism
The registry contains no install spec (instruction-only), but the document tells users to pip install 'cua-computer-sdk' or git clone 'github.com/trycua/cua-computer-server'. Installing third‑party PyPI packages or running code from GitHub is normal here but introduces supply‑chain risk; the skill itself does not supply vetted binaries.
Credentials
The skill requests no environment variables or credentials in the registry. The SKILL.md references DISPLAY, XAUTHORITY, and $USER which are necessary for desktop control. It recommends using an --auth-token if the server is exposed — sensible and proportional.
Persistence & Privilege
The guide explicitly shows how to install the server as a persistent system service (systemd/launchd/Task Scheduler) and to run it with restart policies. That grants long‑lived background access to the desktop. The skill metadata does not request this privilege, but the instructions enable persistent elevated capability on the host and thus raise security concerns if misused or exposed to a network.
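How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat: /install desktop-automation
  3. After installation, invoke the skill by name or trigger it with /desktop-automation
  4. Provide the required inputs according to the skill's parameter description to get structured output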
Version History
v1.0.0
Initial release of desktop-control skill.

  • Control your desktop via the Cua computer server API (default port 8000).
  • Includes guides for installation, setup as a background service, and security best practices on macOS, Linux, and Windows.
  • Supports commands such as screenshot, mouse (click, right click, double click, move, scroll), typing text, pressing hotkeys, and more.
  • Provides detailed troubleshooting and verification instructions.
  • Requires Cua server installation on your machine for operation.
  • View more docs on how to use at https://cua.ai/docs/cua/guide/get-started/what-is-cua
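Metadata
Slug: desktop-automation
Version: 1.0.0
License: MIT-0
Total installs: 10
Current installs: 10
Versions: 1
FAQ

What is Desktop Automation?

Control the desktop via CUA computer server API running on port 8000. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 671 total downloads to date.

How do I install Desktop Automation?

Run the command "/install desktop-automation" in an OpenClaw or Claude Code chat to install it in one step. The skill itself needs no further configuration, but the CUA computer server described under Installation must be running for it to work.

Is Desktop Automation free?

Yes. Desktop Automation is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Desktop Automation support?

Desktop Automation is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Desktop Automation?

It is developed and maintained by sarinali (@sarinali). The current version is v1.0.0.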
