pabloraka

dekstop-control-linux

by PabloRaka · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
281 downloads · 2 stars · 0 active installs · 1 version
Install in OpenClaw
/install dekstop-control-linux
Description
Safe Linux desktop automation (mouse/keyboard/screenshot) with approval mode and X11/Wayland checks.
README (SKILL.md)

Desktop Control (Linux)

Safe desktop automation for Linux using PyAutoGUI with explicit approvals and environment checks.

Requirements

  • Linux with GUI session (X11 recommended)
  • Python packages:
    • pyautogui
    • pillow
    • pygetwindow (window ops; not supported on Linux)
    • pyperclip (clipboard ops)
    • opencv-python (optional, image match)

System packages (common):

  • python3-tk, scrot, xclip or xsel
  • wmctrl (window list/activate)
  • xdotool (active window)

Quick Start

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=True)
print(dc.get_screen_size())
PY

Screenshot to file

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)
print(dc.screenshot_to('/tmp/screen.png'))
PY

Record screen (ffmpeg)

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)
print(dc.record_screen('/tmp/record.mp4', seconds=30))
PY
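
The README does not show how record_screen drives ffmpeg. As a rough sketch only, a fixed-length X11 capture could be assembled like this; the helper name build_record_command, the default display ":0", the capture size, and the frame rate are all assumptions, not the skill's actual behavior.

```python
from typing import List


def build_record_command(output_path: str, seconds: int,
                         display: str = ":0", size: str = "1920x1080",
                         framerate: int = 25) -> List[str]:
    """Build an ffmpeg argument list that grabs an X11 display for a fixed time.

    Hypothetical helper: the defaults are illustrative, not taken from the skill.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return [
        "ffmpeg",
        "-y",                          # overwrite the output file if it exists
        "-f", "x11grab",               # X11 screen-capture input device
        "-video_size", size,           # capture area as WIDTHxHEIGHT
        "-framerate", str(framerate),  # frames grabbed per second
        "-i", display,                 # X display to capture
        "-t", str(seconds),            # stop after this many seconds
        output_path,
    ]


cmd = build_record_command("/tmp/record.mp4", seconds=30)
print(" ".join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would start the capture. x11grab only works in an X11 session, which fits the README's X11 recommendation.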

Launch Chrome + open URL (default wait 15s; use 15–30s for heavy apps)

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)
dc.open_chrome('http://localhost:8000', wait_seconds=15)
PY

Preset examples

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)

def preset_open_site():
    dc.open_chrome('http://localhost:8000', wait_seconds=15)

def preset_login_site():
    dc.open_chrome('http://localhost:8000/login', wait_seconds=15)
    dc.login_form('[email protected]', 'password', wait_seconds=10)

dc.register_preset('open-site', preset_open_site)
dc.register_preset('login-site', preset_login_site)

# run presets
# dc.run_preset('open-site')
# dc.run_preset('login-site')
PY

Workflow (DSL) example

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)
steps = [
  {"action": "open_chrome", "url": "http://localhost:8000/login", "wait": 15},
  {"action": "login_form", "email": "[email protected]", "password": "secret", "wait": 10},
  {"action": "open_url", "url": "http://localhost:8000/target", "wait": 15},
  {"action": "screenshot", "path": "/tmp/target.png"}
]

dc.run_steps(steps)
PY
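
How run_steps interprets each step is not documented here. One plausible reading is a dispatcher that pops the "action" key and passes the remaining keys as keyword arguments. The sketch below illustrates that idea with stand-in handlers; the handlers table and this run_steps function are hypothetical and independent of the skill.

```python
from typing import Any, Callable, Dict, List

log: List[tuple] = []


def open_url(url: str, wait: int = 0) -> None:
    """Stand-in handler: record the call instead of opening a browser."""
    log.append(("open_url", url, wait))


def screenshot(path: str) -> str:
    """Stand-in handler: record the call and return the target path."""
    log.append(("screenshot", path))
    return path


def run_steps(steps: List[Dict[str, Any]],
              handlers: Dict[str, Callable[..., Any]]) -> List[Any]:
    """Run each step by looking up its 'action' and passing the other keys as kwargs."""
    results = []
    for index, step in enumerate(steps):
        params = dict(step)                  # copy so the caller's dict is untouched
        action = params.pop("action", None)
        if action not in handlers:
            raise ValueError(f"step {index}: unknown action {action!r}")
        results.append(handlers[action](**params))
    return results


results = run_steps(
    [
        {"action": "open_url", "url": "http://localhost:8000/target", "wait": 15},
        {"action": "screenshot", "path": "/tmp/target.png"},
    ],
    {"open_url": open_url, "screenshot": screenshot},
)
print(results)
```

Failing fast on an unknown action keeps a typo in a step from being silently skipped halfway through a workflow.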

OCR & State Detection example

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)

# Read text from screen
text = dc.read_text_on_screen()
print(text)

# Wait for text to appear (requires pytesseract)
if dc.wait_for_text("Success", timeout=30):
    print("Text detected!")
PY
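
The wait_for_text call above presumably polls the screen until the text shows up or the timeout expires. A minimal sketch of that polling pattern follows, with the OCR step replaced by an injected read_text callable; the interval parameter and this standalone function are assumptions, not the skill's code.

```python
import time
from typing import Callable


def wait_for_text(read_text: Callable[[], str], needle: str,
                  timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll read_text() until it contains needle or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if needle in read_text():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Simulated OCR results for three successive polls.
frames = iter(["Loading...", "Loading...", "Success: saved"])
found = wait_for_text(lambda: next(frames, ""), "Success", timeout=2.0, interval=0.01)
print(found)
```

Using time.monotonic() rather than time.time() keeps the deadline stable if the system clock is adjusted during the wait.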

Multi-monitor example

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)

# Get all monitors
monitors = dc.get_monitors()
print(monitors)  # [{'name': 'HDMI-1', 'x': 0, 'y': 0, 'width': 1920, 'height': 1080}, ...]

# Click on second monitor (relative 0.5, 0.5 = center)
dc.click_monitor(1, 0.5, 0.5)
PY
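
The relative coordinates passed to click_monitor have to be mapped onto the virtual desktop somehow. The sketch below shows one way that mapping could work, using the monitor dictionary shape printed above; the helper monitor_point and the second monitor's name "DP-1" are made up for illustration.

```python
from typing import Dict, Tuple


def monitor_point(monitor: Dict[str, int], rel_x: float, rel_y: float) -> Tuple[int, int]:
    """Convert relative (0..1) coordinates on one monitor to absolute desktop pixels."""
    if not (0.0 <= rel_x <= 1.0 and 0.0 <= rel_y <= 1.0):
        raise ValueError("relative coordinates must be within 0..1")
    # Clamp so that 1.0 lands on the last pixel instead of one past the edge.
    dx = min(int(rel_x * monitor["width"]), monitor["width"] - 1)
    dy = min(int(rel_y * monitor["height"]), monitor["height"] - 1)
    return monitor["x"] + dx, monitor["y"] + dy


monitors = [
    {"name": "HDMI-1", "x": 0, "y": 0, "width": 1920, "height": 1080},
    {"name": "DP-1", "x": 1920, "y": 0, "width": 1920, "height": 1080},  # hypothetical
]
center = monitor_point(monitors[1], 0.5, 0.5)
print(center)  # (2880, 540)
```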

Multi-browser example

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)

# Open different browsers
dc.open_firefox('https://google.com', wait_seconds=15)
dc.open_edge('https://github.com', wait_seconds=15)
PY

Window Manager example

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)

# Resize window to 800x600
dc.resize_window('Chrome', 800, 600)

# Minimize window
dc.minimize_window('Telegram')

# Maximize window
dc.maximize_window('VSCode')
PY

Flow Recorder example

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)

# Start recording
dc.start_recording()

# Do some actions (manual for now, or wrap them)
dc.click(x=100, y=200)
dc.type_text('hello')
dc.press('enter')

# Stop and replay
actions = dc.stop_recording()
print(f"Recorded {len(actions)} actions")

# Replay later
dc.replay_actions(actions, delay_multiplier=1.0)
PY
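
The recorder's internals are not shown, but the delay_multiplier argument suggests that each action is stored with the time elapsed since the previous one. The following is a self-contained sketch of that idea; the ActionRecorder class, the replay function, and the entry layout are assumptions for illustration only.

```python
import time
from typing import Any, Callable, Dict, List, Optional


class ActionRecorder:
    """Store actions together with the delay since the previous action."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._actions: List[Dict[str, Any]] = []
        self._last: Optional[float] = None   # None while not recording

    def start(self) -> None:
        self._actions = []
        self._last = self._clock()

    def record(self, name: str, **params: Any) -> None:
        if self._last is None:
            return                            # ignore actions outside a recording
        now = self._clock()
        self._actions.append({"action": name, "params": params, "delay": now - self._last})
        self._last = now

    def stop(self) -> List[Dict[str, Any]]:
        self._last = None
        return list(self._actions)


def replay(actions: List[Dict[str, Any]], handlers: Dict[str, Callable[..., Any]],
           delay_multiplier: float = 1.0,
           sleep: Callable[[float], None] = time.sleep) -> None:
    """Re-run recorded actions, scaling each recorded delay by delay_multiplier."""
    for entry in actions:
        sleep(entry["delay"] * delay_multiplier)
        handlers[entry["action"]](**entry["params"])


# Fake clock so the example is deterministic: start at 0.0, then 0.5 s and 2.0 s.
ticks = iter([0.0, 0.5, 2.0])
recorder = ActionRecorder(clock=lambda: next(ticks))
recorder.start()
recorder.record("click", x=100, y=200)
recorder.record("type_text", text="hello")
actions = recorder.stop()

slept: List[float] = []
performed: List[tuple] = []
handlers = {
    "click": lambda x, y: performed.append(("click", x, y)),
    "type_text": lambda text: performed.append(("type_text", text)),
}
replay(actions, handlers, delay_multiplier=2.0, sleep=slept.append)
print(slept)  # [1.0, 3.0]
```

A multiplier above 1.0 slows the replay down, which can help when the target application responds more slowly than it did during recording.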

AI Vision & Smart Wait example

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)

# Find element by color (RGB)
pos = dc.find_element_by_color((255, 0, 0), tolerance=20)  # red
if pos:
    dc.click(x=pos[0], y=pos[1])

# Smart wait - poll until condition is true
dc.smart_wait(lambda: dc.active_window_contains('Done'), timeout=30)
PY
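
The README does not define what tolerance means in find_element_by_color. A common interpretation is a per-channel bound on the RGB difference, and the sketch below scans a small in-memory pixel grid under that assumption. The function find_color is illustrative and does not read the real screen.

```python
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]


def find_color(pixels: Sequence[Sequence[Pixel]], target: Pixel,
               tolerance: int = 0) -> Optional[Tuple[int, int]]:
    """Return (x, y) of the first pixel whose channels are all within tolerance of target."""
    for y, row in enumerate(pixels):
        for x, pixel in enumerate(row):
            if all(abs(c - t) <= tolerance for c, t in zip(pixel, target)):
                return (x, y)
    return None


# A 2x2 "screen": only the bottom-left pixel is close to pure red.
grid: List[List[Pixel]] = [
    [(0, 0, 0), (10, 10, 10)],
    [(240, 5, 12), (255, 255, 255)],
]
print(find_color(grid, (255, 0, 0), tolerance=20))  # (0, 1)
print(find_color(grid, (255, 0, 0), tolerance=10))  # None
```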

Drag & Drop example

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)

# Drag from point A to B
dc.drag_drop(100, 200, 500, 600)

# Drag file to app
dc.drag_file_to_app('/path/to/file.txt', 400, 300)
PY

Robust retry example

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)

# Click with automatic retry
dc.robust_click(100, 200)

# Type with automatic retry
dc.robust_type("Hello world")
PY

API

Same interface as DesktopController:

  • mouse: move_mouse, click, drag, scroll, get_mouse_position
  • keyboard: type_text, press, hotkey, wait, launch_app, open_url, open_chrome, wait_retry_window, wait_retry_new_window, smart_retry
  • screen/ui: click_image, click_image_or, login_form
  • state: ensure_window, active_window_contains, wait_for_text, detect_state
  • recovery: recover_reload, recover_back, retry_with_recovery
  • workflows: run_steps
  • presets: register_preset, run_preset
  • ocr: read_text_on_screen
  • multi-monitor: get_monitors, click_monitor
  • robust: robust_click, robust_type
  • smart-wait: smart_wait, wait_for_window_stable
  • drag-drop: drag_drop, drag_file_to_app
  • window-manager: resize_window, minimize_window, maximize_window
  • multi-browser: open_firefox, open_edge
  • keyboard: detect_keyboard_layout
  • ai-vision: find_element_by_color, find_button_vision
  • recorder: start_recording, record_action, stop_recording, replay_actions

launch_app(app_name, wait_seconds=15, window_title=None, auto_detect_window=True)

  • If window_title is provided: waits 15s, retries once, then errors if not found.
  • If auto_detect_window=True: detects a new window title automatically, waits 15s, retries once.
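
The auto-detect behavior described above can be pictured as comparing the window titles before launch with the titles seen after each wait. This is a sketch under that assumption; new_window_titles, wait_for_new_window, and the injected list_titles callable are hypothetical names, and the real skill may work differently.

```python
import time
from typing import Callable, List, Sequence


def new_window_titles(before: Sequence[str], after: Sequence[str]) -> List[str]:
    """Return titles present in after but not in before, keeping their order."""
    seen = set(before)
    return [title for title in after if title not in seen]


def wait_for_new_window(list_titles: Callable[[], Sequence[str]],
                        before: Sequence[str], wait_seconds: float = 15.0,
                        retries: int = 1,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Wait, look for a new window, retry up to `retries` times, then raise."""
    for _ in range(retries + 1):
        sleep(wait_seconds)
        fresh = new_window_titles(before, list_titles())
        if fresh:
            return fresh[0]
    raise TimeoutError("no new window appeared")


# Simulated window lists: the browser window shows up on the second check.
snapshots = iter([["Terminal"], ["Terminal", "New Tab - Google Chrome"]])
title = wait_for_new_window(lambda: next(snapshots), before=["Terminal"],
                            sleep=lambda seconds: None)
print(title)  # New Tab - Google Chrome
```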

smart_retry(action_fn, check_fn, wait_seconds=15, retries=2)

  • Runs action → wait → check → retry (with wait) to avoid rapid loops.

Additional API groups:

  • screen: screenshot, screenshot_to, record_screen, get_pixel_color, find_on_screen
  • windows: get_all_windows, activate_window, focus_window_or_click, get_active_window
  • clipboard: copy_to_clipboard, get_from_clipboard
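
The smart_retry loop described above (action → wait → check → retry) can be sketched in a few lines. This version assumes that retries=2 means up to three attempts in total and that the function returns a boolean; both are guesses, and the injected sleep parameter exists only to keep the example fast.

```python
import time
from typing import Callable, List


def smart_retry(action_fn: Callable[[], None], check_fn: Callable[[], bool],
                wait_seconds: float = 15.0, retries: int = 2,
                sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run the action, wait, then check; repeat until the check passes or retries run out."""
    for _ in range(retries + 1):
        action_fn()
        sleep(wait_seconds)      # the wait between action and check prevents rapid loops
        if check_fn():
            return True
    return False


state = {"attempts": 0}
waits: List[float] = []


def action() -> None:
    state["attempts"] += 1


def check() -> bool:
    return state["attempts"] >= 2    # pretend the UI is ready after the second try


ok = smart_retry(action, check, wait_seconds=15, retries=2, sleep=waits.append)
print(ok, state["attempts"], waits)  # True 2 [15, 15]
```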

Safety

  • Approval mode enabled by default
  • Failsafe: move mouse to any corner to abort
  • Environment guard: warns on Wayland or headless sessions
  • Auto-detect DISPLAY: tries /tmp/.X11-unix when DISPLAY is missing
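
The environment guard and DISPLAY auto-detection listed above could be approximated as follows. The sketch relies on the convention that X11 sockets in /tmp/.X11-unix are named X0, X1, and so on, which map to displays :0, :1. The functions classify_session and guess_display are hypothetical and not part of the skill.

```python
from typing import Iterable, Mapping, Optional


def classify_session(env: Mapping[str, str]) -> str:
    """Classify the GUI session as 'wayland', 'x11', or 'headless' from environment variables."""
    session_type = env.get("XDG_SESSION_TYPE", "").lower()
    if env.get("WAYLAND_DISPLAY") or session_type == "wayland":
        return "wayland"          # checked first: XWayland also sets DISPLAY
    if env.get("DISPLAY"):
        return "x11"
    return "headless"


def guess_display(socket_names: Iterable[str]) -> Optional[str]:
    """Pick the lowest-numbered X display from socket names such as 'X0' or 'X1'."""
    numbers = sorted(
        int(name[1:]) for name in socket_names
        if name.startswith("X") and name[1:].isdigit()
    )
    return f":{numbers[0]}" if numbers else None


print(classify_session({"DISPLAY": ":0"}))                        # x11
print(classify_session({"WAYLAND_DISPLAY": "wayland-0", "DISPLAY": ":0"}))  # wayland
print(guess_display(["X1", "X0"]))                                # :0
```

On a real machine the socket names would come from os.listdir("/tmp/.X11-unix"), and the environment from os.environ.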
Usage Guidance
This skill does what it says: programmatic control of your Linux desktop. Before installing or enabling it, consider:

  1. Only run it on machines you trust. It can capture screenshots, record the screen, read on-screen text (OCR), and type arbitrary input.
  2. Keep require_approval=True unless you explicitly want automated or unattended control. Examples that set require_approval=False let the agent act without interactive confirmation.
  3. Avoid embedding real credentials in presets or workflow steps you register with the skill. It types whatever you give it and can replay recorded actions.
  4. Review and install only the Python and system packages you trust (pyautogui, ffmpeg, wmctrl/xdotool, etc.).
  5. To limit risk, disable autonomous model invocation for this skill or restrict its use to supervised sessions.
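
To make the effect of require_approval concrete, here is a minimal sketch of an approval gate that asks before every action unless approval is switched off. It is not the skill's implementation; make_approval_gate and its ask parameter are hypothetical, and ask would default to the built-in input in an interactive session.

```python
from typing import Any, Callable


def make_approval_gate(require_approval: bool = True,
                       ask: Callable[[str], str] = input) -> Callable[[str, Callable[[], Any]], Any]:
    """Return a runner that asks for confirmation before executing each action."""

    def run(description: str, action: Callable[[], Any]) -> Any:
        if require_approval:
            answer = ask(f"Allow '{description}'? [y/N] ")
            if answer.strip().lower() not in ("y", "yes"):
                raise PermissionError(f"action denied: {description}")
        return action()

    return run


# Simulated user who approves everything.
approve_all = make_approval_gate(require_approval=True, ask=lambda prompt: "y")
print(approve_all("take screenshot", lambda: "/tmp/screen.png"))

# Simulated user who declines.
deny_all = make_approval_gate(require_approval=True, ask=lambda prompt: "n")
try:
    deny_all("type password", lambda: "typed")
except PermissionError as error:
    print(error)
```

With require_approval=False the ask callable is never consulted, which is the unattended mode the guidance above warns about.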
Capability Analysis
Type: OpenClaw Skill
Name: dekstop-control-linux
Version: 1.0.0

The skill bundle provides extensive desktop automation capabilities for Linux, including full mouse/keyboard control, screen recording via ffmpeg, clipboard access, and OCR. Key indicators of risk include a 'login_form' helper in '__init__.py' designed to automate credential entry and the ability to bypass user approval ('require_approval=False'), which is frequently demonstrated in the 'SKILL.md' examples. While these features align with the stated goal of desktop control, they grant the AI agent high-privilege access to the user's graphical session and sensitive data without robust safeguards against misuse or data exfiltration.
Capability Assessment
Purpose & Capability
Name/description match the code and SKILL.md. The code implements mouse/keyboard/screenshot/recording/ocr/window ops and includes environment checks for X11/Wayland. No unrelated credentials, config paths, or unexpected binaries are demanded.
Instruction Scope
SKILL.md and the code focus on GUI automation and include reading screen contents (OCR), taking screenshots, recording, and reading /tmp/.X11-unix to detect DISPLAY. Examples show supplying credentials to login_form and running without approval (require_approval=False). The instructions do not direct data to external endpoints, but the skill can capture sensitive on-screen content and interact with apps, so its scope is broad by design.
Install Mechanism
No install spec is present (instruction-only skill with an included Python module). No downloads or external installers are embedded. Runtime does require common Python packages (pyautogui, pillow, etc.) and system utilities (scrot, xclip, wmctrl, xdotool, ffmpeg) which are reasonable for the declared functionality.
Credentials
The skill requires no environment variables or secrets. It does access environment state (DISPLAY, WAYLAND_DISPLAY, XDG_SESSION_TYPE) and filesystem paths such as /tmp/.X11-unix. It can type provided passwords and read screen/clipboard contents — appropriate for automation but sensitive in practice. The declared requirements align with the functionality.
Persistence & Privilege
always:false and no code modifies other skills. However, the skill supports running with require_approval=False; combined with the platform default that allows model invocation, an agent could autonomously execute GUI actions (open apps, type, take screenshots). This is not an incoherence but is an important operational risk to consider.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install dekstop-control-linux
  3. After installation, invoke the skill by name or use /dekstop-control-linux
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release of dekstop-control-linux: safe Linux desktop automation with approval mode and X11/Wayland checks.

  • Provides desktop automation (mouse, keyboard, screenshots) for Linux with PyAutoGUI.
  • Includes explicit approval mode for safety and environment/session checks.
  • Supports multi-monitor, multi-browser, window management, drag-and-drop, robust retry, OCR, AI vision, recorder, and workflows.
  • Extensive API examples for launching apps, screen recording, preset workflows, smart waiting, and more.
  • Safety features: approval mode enabled by default, failsafe abort, and environment compatibility checks.
Metadata
Slug: dekstop-control-linux
Version: 1.0.0
License: MIT-0
All-time Installs: 0
Active Installs: 0
Total Versions: 1
Frequently Asked Questions

What is dekstop-control-linux?

Safe Linux desktop automation (mouse/keyboard/screenshot) with approval mode and X11/Wayland checks. It is an AI Agent Skill for Claude Code / OpenClaw, with 281 downloads so far.

How do I install dekstop-control-linux?

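Run "/install dekstop-control-linux" in the OpenClaw or Claude Code chat to install the skill in one step. The skill has no install spec of its own, so the Python and system packages listed under Requirements must be set up separately.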

Is dekstop-control-linux free?

Yes, dekstop-control-linux is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does dekstop-control-linux support?

The listing is tagged cross-platform, and the skill can be installed anywhere OpenClaw / Claude Code is available. The automation itself requires a Linux GUI session (X11 recommended).

Who created dekstop-control-linux?

It is built and maintained by PabloRaka (@pabloraka); the current version is v1.0.0.
