Gui Control
by Kunal Sharma · v1.0.0 · MIT-0
145 downloads · 0 stars · 0 active installs · 1 version
Install in OpenClaw
/install gui-control
Description
Control the GUI desktop on this machine using xdotool, scrot, and Firefox. Use when the user asks to open a browser, visit a website, take a screenshot, clic...
README (SKILL.md)
GUI Control
Control the Linux desktop with a GUI display using shell tools.
Environment
- Display: DISPLAY=:1. ALWAYS prefix all GUI commands with this
- This machine has a display; never say "I'm on a headless server"
- Tools available: xdotool (keyboard/mouse), scrot (screenshots), firefox
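Since every GUI command needs the same prefix, a small wrapper can keep it from being forgotten. This is only a sketch: the function name gui is hypothetical and not part of the skill.

```shell
# Hypothetical helper (not part of the skill): run any command with
# DISPLAY=:1 set in that command's environment.
gui() {
  DISPLAY=:1 "$@"
}

# Example calls (left as comments so nothing runs when this file is sourced):
#   gui xdotool key Return
#   gui scrot /tmp/screenshot.png
```

The wrapper only sets the variable for the wrapped command; redirections and backgrounding (> /dev/null 2>&1 &) still go on the calling line.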
Quick Reference
Open Firefox with a URL
DISPLAY=:1 nohup firefox https://example.com > /dev/null 2>&1 &
Wait for page load before interacting:
sleep 5
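A fixed sleep 5 can be too short on a slow page and wasteful on a fast one. One possible alternative, sketched below, polls for a window whose title matches the expected page title. The function name wait_for_window and the 15-second default are assumptions, and a matching window title is only a rough proxy for the page being ready.

```shell
# Hypothetical helper: wait until a window whose title matches PATTERN exists.
# Usage: wait_for_window PATTERN [TIMEOUT_SECONDS]
# Relies on "xdotool search" exiting non-zero when nothing matches.
wait_for_window() {
  pattern=$1
  timeout=${2:-15}
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if DISPLAY=:1 xdotool search --name "$pattern" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1   # timed out without seeing a matching window
}

# Example (comment only): wait_for_window "Example Domain" 20 || echo "page did not appear"
```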
Take a Screenshot
DISPLAY=:1 scrot /tmp/screenshot.png
Type Text into Active Window
DISPLAY=:1 xdotool type --delay 50 "Hello world"
Press a Key
DISPLAY=:1 xdotool key Return
Get Active Window Name
DISPLAY=:1 xdotool getactivewindow getwindowname
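The active window name can also serve as a guard before typing, so that keystrokes do not land in the wrong application. A minimal sketch, assuming a hypothetical helper named type_if_active:

```shell
# Hypothetical helper: type TEXT only if the active window title contains EXPECTED.
# Usage: type_if_active EXPECTED TEXT
type_if_active() {
  expected=$1
  text=$2
  title=$(DISPLAY=:1 xdotool getactivewindow getwindowname 2>/dev/null) || return 1
  case "$title" in
    *"$expected"*)
      DISPLAY=:1 xdotool type --delay 50 "$text"
      ;;
    *)
      echo "active window is '$title'; not typing" >&2
      return 1
      ;;
  esac
}

# Example (comment only): type_if_active "Mozilla Firefox" "Hello world"
```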
Close Firefox
DISPLAY=:1 pkill firefox
Workflow: Browse a Website and Interact
- Open Firefox with URL: DISPLAY=:1 nohup firefox <url> > /dev/null 2>&1 &
- Wait for load: sleep 5
- Take screenshot to verify: DISPLAY=:1 scrot /tmp/step.png
- Read screenshot to assess page state
- Interact using keyboard (preferred over mouse):
  - xdotool key Tab: move focus
  - xdotool key Return: submit/confirm
  - xdotool type --delay 50 "text": type into focused field
- After each action, screenshot to verify result
- Send screenshots to user with the message tool and media parameter
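The "act, then screenshot to verify" step of this workflow could be bundled into one function. The sketch below is an illustration only: act_and_shot and the GUI_SETTLE_SECONDS variable are hypothetical names, and the 2-second default settle time is an assumption.

```shell
# Hypothetical helper: run one xdotool action, wait briefly, then screenshot.
# Usage: act_and_shot OUTPUT.png XDOTOOL_ARGS...
act_and_shot() {
  out=$1
  shift
  DISPLAY=:1 xdotool "$@" || return 1     # stop if the action itself failed
  sleep "${GUI_SETTLE_SECONDS:-2}"        # give the page time to react
  rm -f "$out"                            # avoid depending on scrot's overwrite behavior
  DISPLAY=:1 scrot "$out"
}

# Examples (comments only):
#   act_and_shot /tmp/step.png key Tab
#   act_and_shot /tmp/step.png type --delay 50 "text"
```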
Tips
- Prefer keyboard over mouse coordinates: Tab, Enter, arrow keys are more reliable than xdotool mousemove + click
- YouTube shortcut: press / to focus the search bar
- Always wait after page loads or actions before taking screenshots
- Use nohup ... & for launching Firefox so it doesn't block the shell
- Send screenshots to user using message(content="...", media=["/tmp/screenshot.png"])
Lessons Learned
Don't Over-Engineer
- Start simple: xdotool + keyboard shortcuts work great. Don't jump to Selenium/Marionette unless absolutely needed.
- One clean attempt > five messy ones: think before executing, don't retry the same failing approach.
- Don't open Firefox multiple times; check if it's already running first with ps aux | grep firefox
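Note that ps aux | grep firefox usually matches the grep process itself, so its output alone does not prove Firefox is running. A sketch of one way to combine the check with the launch, using pgrep instead. The function name open_url is hypothetical, and the process name firefox is an assumption (some distributions use firefox-esr or firefox-bin).

```shell
# Hypothetical helper: open URL in Firefox, reusing a running instance if any.
# Usage: open_url URL
open_url() {
  url=$1
  if pgrep -x firefox >/dev/null 2>&1; then
    # Already running: ask the existing instance to open a new tab.
    DISPLAY=:1 firefox --new-tab "$url" >/dev/null 2>&1 &
  else
    # Not running: start it detached so the shell is not blocked.
    DISPLAY=:1 nohup firefox "$url" >/dev/null 2>&1 &
  fi
}

# Example (comment only): open_url https://example.com
```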
Keyboard Shortcuts by Website
- YouTube: / focuses search bar, Tab navigates between elements, Return selects
- General web: Ctrl+F opens find bar, Ctrl+L focuses address bar, Tab cycles focus
- Don't use xdotool mousemove with hardcoded coordinates; they break on different resolutions and you might click the wrong element (e.g., address bar instead of YouTube search)
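The shortcuts above map onto xdotool key names roughly as follows. This is a sketch with hypothetical function names (youtube_search, goto_url); it assumes the browser window already has focus and, for the first function, that a YouTube page is loaded.

```shell
# Hypothetical helper: focus YouTube's search bar with "/", type a query, submit.
youtube_search() {
  DISPLAY=:1 xdotool key slash &&
    DISPLAY=:1 xdotool type --delay 50 "$1" &&
    DISPLAY=:1 xdotool key Return
}

# Hypothetical helper: focus the address bar with Ctrl+L, type a URL, submit.
goto_url() {
  DISPLAY=:1 xdotool key ctrl+l &&
    DISPLAY=:1 xdotool type --delay 50 "$1" &&
    DISPLAY=:1 xdotool key Return
}

# Examples (comments only):
#   goto_url "https://www.youtube.com"
#   youtube_search "lofi"
```

Keeping the two targets in separate functions makes it harder to confuse the browser address bar with a site's own search field.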
Common Mistakes to Avoid
- ❌ Don't guess coordinates: xdotool mousemove 640 120 will click different things on different screens
- ❌ Don't say "I'm on a headless server"; this machine HAS a display (DISPLAY=:1)
- ❌ Don't use DISPLAY=:0; the correct display is :1
- ❌ Don't open multiple Firefox instances; reuse the existing one or close it first
- ❌ Don't confuse the browser address bar with website search bars; use keyboard shortcuts to target the right element
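Before relying on these assumptions, it may help to confirm that the tools exist and that display :1 actually responds. A minimal preflight sketch; the function name gui_preflight is hypothetical, and the display check uses xdotool getdisplaygeometry simply because xdotool is already listed as available.

```shell
# Hypothetical helper: verify required tools and that DISPLAY=:1 is reachable.
gui_preflight() {
  for tool in xdotool scrot firefox; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      return 1
    fi
  done
  if ! DISPLAY=:1 xdotool getdisplaygeometry >/dev/null 2>&1; then
    echo "display :1 is not reachable" >&2
    return 1
  fi
}

# Example (comment only): gui_preflight && echo "GUI ready"
```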
Screenshot Workflow
- Take screenshot: DISPLAY=:1 scrot /tmp/screen.png
- Read it yourself: read_file("/tmp/screen.png"). This lets YOU see the screen
- Send to user: message(content="...", media=["/tmp/screen.png"])
- Always screenshot AFTER actions to verify results
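Reusing a single path such as /tmp/screen.png means earlier captures are lost, and depending on the scrot version, writing to an existing path may either overwrite it or produce a differently named file. A sketch that sidesteps both by using timestamped names; the function name shot and the screen- prefix are hypothetical.

```shell
# Hypothetical helper: take a screenshot under a timestamped name and print its path.
# Usage: shot [DIRECTORY]   (defaults to /tmp)
# Note: two calls within the same second produce the same name.
shot() {
  dir=${1:-/tmp}
  path="$dir/screen-$(date +%Y%m%d-%H%M%S).png"
  DISPLAY=:1 scrot "$path" || return 1
  [ -s "$path" ] || return 1    # fail if no non-empty file was written
  printf '%s\n' "$path"
}

# Example (comment only): latest=$(shot /tmp) && echo "saved $latest"
```

The printed path can then be passed to read_file(...) and to the media parameter of message(...).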
Gateway + GUI
- When running nanobot gateway, always start with DISPLAY=:1 so Telegram/Discord agents can use GUI
- The gateway agent has its own context; it won't know about the display unless MEMORY.md says so
- Write important system info to MEMORY.md so all channels stay informed
Usage Guidance
This skill does what it says: it will open Firefox, simulate keyboard input, take screenshots, and send those screenshots back to the user. Before installing or using it, consider the following:
1) Desktop screenshots can contain sensitive data (passwords, chat windows, private documents). Only run this skill on machines where exposing the screen is acceptable.
2) The SKILL.md tells the agent to persist 'important system info' into MEMORY.md so other channels/agents can see it; remove or disable that behavior if you don't want cross-channel persistence.
3) The skill uses xdotool, scrot, and firefox but does not list them as required in the metadata; ensure those binaries are present and trusted on the host.
4) Test in a controlled environment first (no credentials or private windows visible) and monitor outgoing messages to verify only intended screenshots/data are transmitted.
5) If you need stricter limits, edit the SKILL.md/script to remove automatic read_file/send steps and the MEMORY.md guidance, and require explicit user confirmation before capturing or sending images.
Capability Analysis
Type: OpenClaw Skill
Name: gui-control
Version: 1.0.0
The skill provides broad GUI automation and monitoring capabilities, including screen capture via 'scrot' and input simulation (typing/keystrokes) via 'xdotool'. While these tools are aligned with the stated purpose of desktop control, they constitute high-risk behaviors as they allow an agent to observe sensitive information on the screen and interact with any running application. The 'scripts/gui-helper.sh' and 'SKILL.md' instructions specifically enable these actions on 'DISPLAY=:1', which could be used for unauthorized data access or interaction if the agent is misdirected.
Capability Assessment
Purpose & Capability
Name/description (GUI control using xdotool, scrot, Firefox) match the provided script and runtime instructions. However, the SKILL.md asserts availability of xdotool/scrot/firefox but the skill metadata does not declare those binaries as requirements—this mismatch is unexpected but not necessarily malicious.
Instruction Scope
Instructions explicitly direct the agent to take screenshots, read them (read_file('/tmp/screen.png')), and send them to the user via the message tool. Those actions are within the declared GUI purpose but carry clear privacy/exfiltration risk because desktop screenshots can contain sensitive information. The SKILL.md also tells the agent to write 'important system info' to MEMORY.md so other channels will know the display—this persists system state into agent memory and can expose information across channels, which is beyond what's needed for simple ephemeral GUI control.
Install Mechanism
No install spec (instruction-only + small helper script). No external downloads or archive extraction. The script is simple and its operations are transparent (firefox, scrot, xdotool, pkill, sleep).
Credentials
The skill declares no required environment variables or credentials, which is consistent with its function. It does, however, insist on using DISPLAY=:1 for all commands—this is reasonable for GUI control but the metadata does not declare this environment dependency explicitly. No other secrets are requested.
Persistence & Privilege
always:false and disable-model-invocation are normal. The concerning part is the explicit guidance to write system/display info to MEMORY.md so other agents/gateways will know about the display. That encourages persistent storage of system state (and possibly sensitive context) outside the ephemeral interaction, increasing cross-channel exposure risk.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install gui-control
- After installation, invoke the skill by name or use /gui-control
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release - GUI control with xdotool, scrot, and Firefox on DISPLAY=:1
Frequently Asked Questions
What is Gui Control?
Control the GUI desktop on this machine using xdotool, scrot, and Firefox. Use when the user asks to open a browser, visit a website, take a screenshot, clic... It is an AI Agent Skill for Claude Code / OpenClaw, with 145 downloads so far.
How do I install Gui Control?
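Run "/install gui-control" in the OpenClaw or Claude Code chat to install it in one step. The skill has no install spec of its own and does not install xdotool, scrot, or Firefox, so those must already be present on the host.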
Is Gui Control free?
Yes, Gui Control is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Gui Control support?
Gui Control can be installed anywhere OpenClaw / Claude Code is available, but its commands target a Linux desktop with a GUI display (DISPLAY=:1) and the xdotool, scrot, and firefox tools.
Who created Gui Control?
It is built and maintained by Kunal Sharma (@vibes-me); the current version is v1.0.0.