
Gui Control

Name: Gui Control
Author: vibes-me

By Kunal Sharma · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 145

Install in OpenClaw

/install gui-control

Description

Control the GUI desktop on this machine using xdotool, scrot, and Firefox. Use when the user asks to open a browser, visit a website, take a screenshot, clic...

Usage (SKILL.md)

GUI Control

Control the Linux desktop with a GUI display using shell tools.

Environment

Display: DISPLAY=:1 — ALWAYS prefix all GUI commands with this
This machine has a display — never say "I'm on a headless server"
Tools available: xdotool (keyboard/mouse), scrot (screenshots), firefox
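The tools listed above are assumed to be installed, so a quick check before the first GUI command may save a confusing failure later. The following is a minimal sketch for a POSIX shell; `missing_tools` is a hypothetical helper name and is not part of the skill.

```shell
# Print the name of every listed tool that cannot be found on PATH.
# missing_tools is a hypothetical helper, not part of the skill.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage sketch:
#   missing=$(missing_tools xdotool scrot firefox)
#   [ -z "$missing" ] || echo "missing tools: $missing" >&2
```

An empty result means every tool was found; it does not prove that the display :1 is reachable.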

Quick Reference

Open Firefox with a URL

DISPLAY=:1 nohup firefox https://example.com > /dev/null 2>&1 &

Wait for page load before interacting:

sleep 5
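A fixed sleep can be too short on a slow machine and wasteful on a fast one. One alternative is to poll for a condition with a timeout. The sketch below assumes a POSIX shell; `wait_until` is a hypothetical helper, and the usage line assumes that `xdotool search` exits non-zero while no window matches.

```shell
# wait_until SECONDS COMMAND...: retry COMMAND about once per second until
# it succeeds; give up and return 1 after roughly SECONDS seconds.
# wait_until is a hypothetical helper, not part of the skill.
wait_until() {
  timeout_s=$1
  shift
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    [ "$elapsed" -lt "$timeout_s" ] || return 1
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Usage sketch: wait up to 15 s for a visible Firefox window.
#   wait_until 15 env DISPLAY=:1 xdotool search --onlyvisible --class firefox
```

A visible window only shows that Firefox has started, not that the page has finished loading, so a short extra pause or a verification screenshot is still sensible.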

Take a Screenshot

DISPLAY=:1 scrot /tmp/screenshot.png

Type Text into Active Window

DISPLAY=:1 xdotool type --delay 50 "Hello world"

Press a Key

DISPLAY=:1 xdotool key Return

Get Active Window Name

DISPLAY=:1 xdotool getactivewindow getwindowname
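The active window name can also serve as a guard so that keystrokes are not sent to the wrong application. A minimal sketch, assuming the Firefox window title contains the string "Firefox"; `active_window_matches` is a hypothetical helper name.

```shell
# Succeed only if the active window's title contains the given substring.
# active_window_matches is a hypothetical helper, not part of the skill.
active_window_matches() {
  title=$(DISPLAY=:1 xdotool getactivewindow getwindowname 2>/dev/null) || return 1
  case $title in
    *"$1"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage sketch: type only when Firefox has focus.
#   active_window_matches "Firefox" && DISPLAY=:1 xdotool type --delay 50 "Hello world"
```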

Close Firefox

DISPLAY=:1 pkill firefox

Workflow: Browse a Website and Interact

Open Firefox with URL: DISPLAY=:1 nohup firefox <url> > /dev/null 2>&1 &
Wait for load: sleep 5
Take screenshot to verify: DISPLAY=:1 scrot /tmp/step.png
Read screenshot to assess page state
Interact using keyboard (preferred over mouse):
- DISPLAY=:1 xdotool key Tab (move focus)
- DISPLAY=:1 xdotool key Return (submit/confirm)
- DISPLAY=:1 xdotool type --delay 50 "text" (type into focused field)
After each action, screenshot to verify result
Send screenshots to user with the message tool and media parameter
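The "act, then screenshot" rhythm of this workflow can be wrapped in one function. This is only a sketch under stated assumptions: `gui_step`, `GUI_SHOT_DIR` and `GUI_STEP_DELAY` are hypothetical names, and the existing file is removed first because scrot's behaviour when the target file already exists may differ between versions.

```shell
# gui_step N COMMAND...: run one GUI command on DISPLAY=:1, pause, then
# screenshot the result and print the screenshot path.
# gui_step, GUI_SHOT_DIR and GUI_STEP_DELAY are hypothetical names.
gui_step() {
  n=$1
  shift
  DISPLAY=:1 "$@" >&2 || return 1       # the action; its output goes to stderr
  sleep "${GUI_STEP_DELAY:-1}"          # let the UI settle before capturing
  shot="${GUI_SHOT_DIR:-/tmp}/step-$n.png"
  rm -f "$shot"                         # avoid clashing with an older file
  DISPLAY=:1 scrot "$shot" || return 1
  printf '%s\n' "$shot"
}

# Usage sketch:
#   gui_step 1 xdotool key Tab
#   gui_step 2 xdotool type --delay 50 "text"
```

Each call prints the path of the screenshot it took, which can then be read or sent to the user.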

Tips

Prefer keyboard over mouse coordinates — Tab, Enter, arrow keys are more reliable than xdotool mousemove + click
YouTube shortcut: press / to focus the search bar
Always wait after page loads or actions before taking screenshots
Use nohup ... & for launching Firefox so it doesn't block the shell
Send screenshots to user using message(content="...", media=["/tmp/screenshot.png"])

Lessons Learned

Don't Over-Engineer

Start simple — xdotool + keyboard shortcuts work great. Don't jump to Selenium/Marionette unless absolutely needed.
One clean attempt > five messy ones — think before executing, don't retry the same failing approach.
Don't open Firefox multiple times: check whether it is already running first with pgrep firefox (ps aux | grep firefox also matches the grep process itself)
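The "check before launching" advice can be sketched as a single function. It assumes that `pgrep` is available and that a running Firefox accepts `firefox --new-tab <url>` to open a tab in the existing instance; `open_url` is a hypothetical helper name.

```shell
# open_url URL: open URL in a new tab if Firefox is already running,
# otherwise start a fresh instance in the background.
# open_url is a hypothetical helper, not part of the skill.
open_url() {
  if pgrep firefox >/dev/null 2>&1; then
    # An instance exists: hand the URL to it instead of starting another.
    DISPLAY=:1 firefox --new-tab "$1" >/dev/null 2>&1
  else
    # No instance yet: launch detached so the shell is not blocked.
    DISPLAY=:1 nohup firefox "$1" >/dev/null 2>&1 &
  fi
}

# Usage sketch:
#   open_url https://example.com
```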

Keyboard Shortcuts by Website

YouTube: / focuses search bar, Tab navigates between elements, Return selects
General web: Ctrl+F opens find bar, Ctrl+L focuses address bar, Tab cycles focus
Don't use xdotool mousemove with hardcoded coordinates — they break on different resolutions and you might click the wrong element (e.g., address bar instead of YouTube search)
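The shortcuts above translate into short keyboard-only helpers. A minimal sketch, assuming Firefox has focus and, for the second function, that a YouTube page is already loaded; `goto_url` and `youtube_search` are hypothetical names, and `slash` is the X keysym name for the "/" key.

```shell
# Load a URL in the existing Firefox window via the address bar (Ctrl+L).
# goto_url and youtube_search are hypothetical helpers, not part of the skill.
goto_url() {
  DISPLAY=:1 xdotool key ctrl+l
  DISPLAY=:1 xdotool type --delay 50 "$1"
  DISPLAY=:1 xdotool key Return
}

# Search on an already loaded YouTube page: "/" focuses the site's own
# search box, not the browser address bar.
youtube_search() {
  DISPLAY=:1 xdotool key slash
  DISPLAY=:1 xdotool type --delay 50 "$1"
  DISPLAY=:1 xdotool key Return
}

# Usage sketch:
#   goto_url https://www.youtube.com
#   youtube_search "search terms"
```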

Common Mistakes to Avoid

❌ Don't guess coordinates: a click after xdotool mousemove 640 120 will hit different things on different screens
❌ Don't say "I'm on a headless server" — this machine HAS a display (DISPLAY=:1)
❌ Don't use DISPLAY=:0 — the correct display is :1
❌ Don't open multiple Firefox instances — reuse the existing one or close it first
❌ Don't confuse the browser address bar with website search bars — use keyboard shortcuts to target the right element

Screenshot Workflow

Take screenshot: DISPLAY=:1 scrot /tmp/screen.png
Read it yourself: read_file("/tmp/screen.png") — this lets YOU see the screen
Send to user: message(content="...", media=["/tmp/screen.png"])
Always screenshot AFTER actions to verify results
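Reusing one fixed path makes it easy to read a stale image by mistake. The sketch below writes each capture to a timestamped file and checks that the file is non-empty before reporting it; `capture` is a hypothetical helper name, and the naming scheme is only an illustration.

```shell
# capture [DIR]: take a screenshot into DIR (default /tmp) under a
# timestamped name, verify the file is non-empty, and print its path.
# capture is a hypothetical helper, not part of the skill.
capture() {
  shot="${1:-/tmp}/screen-$(date +%Y%m%d-%H%M%S)-$$.png"
  rm -f "$shot"                       # two captures in the same second share a name
  DISPLAY=:1 scrot "$shot" || return 1
  [ -s "$shot" ] || return 1          # an empty file means the capture failed
  printf '%s\n' "$shot"
}

# Usage sketch:
#   shot=$(capture) && echo "saved $shot"
```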

Gateway + GUI

When running nanobot gateway, always start with DISPLAY=:1 so Telegram/Discord agents can use GUI
The gateway agent has its own context — it won't know about the display unless MEMORY.md says so
Write important system info to MEMORY.md so all channels stay informed

Security Recommendations

This skill does what it says: it will open Firefox, simulate keyboard input, take screenshots, and send those screenshots back to the user. Before installing or using it, consider the following:
1) Desktop screenshots can contain sensitive data (passwords, chat windows, private documents). Only run this skill on machines where exposing the screen is acceptable.
2) The SKILL.md tells the agent to persist 'important system info' into MEMORY.md so other channels/agents can see it; remove or disable that behavior if you don't want cross-channel persistence.
3) The skill uses xdotool, scrot, and firefox but does not list them as required in the metadata; ensure those binaries are present and trusted on the host.
4) Test in a controlled environment first (no credentials or private windows visible) and monitor outgoing messages to verify only intended screenshots/data are transmitted.
5) If you need stricter limits, edit the SKILL.md/script to remove automatic read_file/send steps and the MEMORY.md guidance, and require explicit user confirmation before capturing or sending images.
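The explicit-confirmation idea in the last recommendation could look like the following in a shell helper. This is a sketch only: it assumes an interactive terminal on stdin, which may not match how an agent actually receives user approval, and `confirm` is a hypothetical helper name.

```shell
# confirm PROMPT: ask on stderr, read one line from stdin, and succeed
# only on an explicit yes. Anything else, including no input, is a refusal.
# confirm is a hypothetical helper, not part of the skill.
confirm() {
  printf '%s [y/N] ' "$1" >&2
  read -r answer || return 1
  case $answer in
    y|Y|yes|YES) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage sketch:
#   confirm "Capture and send a screenshot?" && DISPLAY=:1 scrot /tmp/screen.png
```

Defaulting to refusal means a missing or malformed answer never results in a capture.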

Capability Analysis

Type: OpenClaw Skill
Name: gui-control
Version: 1.0.0

The skill provides broad GUI automation and monitoring capabilities, including screen capture via 'scrot' and input simulation (typing/keystrokes) via 'xdotool'. While these tools are aligned with the stated purpose of desktop control, they constitute high-risk behaviors as they allow an agent to observe sensitive information on the screen and interact with any running application. The 'scripts/gui-helper.sh' and 'SKILL.md' instructions specifically enable these actions on 'DISPLAY=:1', which could be used for unauthorized data access or interaction if the agent is misdirected.

Capability Assessment

ℹ Purpose & Capability

Name/description (GUI control using xdotool, scrot, Firefox) match the provided script and runtime instructions. However, the SKILL.md asserts availability of xdotool/scrot/firefox but the skill metadata does not declare those binaries as requirements—this mismatch is unexpected but not necessarily malicious.

⚠ Instruction Scope

Instructions explicitly direct the agent to take screenshots, read them (read_file('/tmp/screen.png')), and send them to the user via the message tool. Those actions are within the declared GUI purpose but carry clear privacy/exfiltration risk because desktop screenshots can contain sensitive information. The SKILL.md also tells the agent to write 'important system info' to MEMORY.md so other channels will know the display—this persists system state into agent memory and can expose information across channels, which is beyond what's needed for simple ephemeral GUI control.

✓ Install Mechanism

No install spec (instruction-only + small helper script). No external downloads or archive extraction. The script is simple and its operations are transparent (firefox, scrot, xdotool, pkill, sleep).

ℹ Credentials

The skill declares no required environment variables or credentials, which is consistent with its function. It does, however, insist on using DISPLAY=:1 for all commands—this is reasonable for GUI control but the metadata does not declare this environment dependency explicitly. No other secrets are requested.

⚠ Persistence & Privilege

always:false and disable-model-invocation are normal. The concerning part is the explicit guidance to write system/display info to MEMORY.md so other agents/gateways will know about the display. That encourages persistent storage of system state (and possibly sensitive context) outside the ephemeral interaction, increasing cross-channel exposure risk.

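How to Use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install gui-control
After installation, invoke the skill by name or trigger it with /gui-control
Provide the inputs described in the skill's parameter notes to get structured output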

Version History

v1.0.0

Initial release - GUI control with xdotool, scrot, and Firefox on DISPLAY=:1

Metadata

Slug: gui-control

Version: 1.0.0

License: MIT-0

Total installs: 0

Current installs: 0

Versions: 1

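FAQ

What is Gui Control?

Control the GUI desktop on this machine using xdotool, scrot, and Firefox. Use when the user asks to open a browser, visit a website, take a screenshot, clic... It is an AI agent skill plugin for Claude Code / OpenClaw, with 145 total downloads to date.

How do I install Gui Control?

Run the command "/install gui-control" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is needed.

Is Gui Control free?

Yes. Gui Control is completely free under the MIT-0 license and can be downloaded, installed, and used freely.

Which platforms does Gui Control support?

Gui Control is listed as cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed. Note that the SKILL.md itself targets a Linux desktop on DISPLAY=:1 and relies on xdotool and scrot.

Who develops Gui Control?

It is developed and maintained by Kunal Sharma (@vibes-me); the current version is v1.0.0.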