Description

Safe Linux desktop automation (mouse/keyboard/screenshot) with approval mode and X11/Wayland checks.

Usage (SKILL.md)

Desktop Control (Linux)

Name: dekstop-control-linux
Author: pabloraka

Safe desktop automation for Linux using PyAutoGUI with explicit approvals and environment checks.

Requirements

Linux with GUI session (X11 recommended)
Python packages:
- pyautogui
- pillow
- pygetwindow (window ops; not supported on Linux)
- pyperclip (clipboard ops)
- opencv-python (optional, image match)

System packages (common):

python3-tk, scrot, xclip or xsel
wmctrl (window list/activate)
xdotool (active window)
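
Before running the examples, it can help to confirm which of the packages above are actually present. The following is a minimal sketch, not part of the skill: `missing_requirements` is a hypothetical helper, and the import names `PIL` (for pillow) and `cv2` (for opencv-python) are the usual module names for those packages.

```python
import importlib.util
import shutil

# Names taken from the Requirements list above. pillow imports as PIL and
# opencv-python imports as cv2; only one of xclip/xsel is needed in practice.
PYTHON_MODULES = ["pyautogui", "PIL", "pyperclip", "cv2"]
SYSTEM_TOOLS = ["scrot", "xclip", "xsel", "wmctrl", "xdotool"]


def missing_requirements(modules=PYTHON_MODULES, tools=SYSTEM_TOOLS):
    """Return (missing_modules, missing_tools) without importing anything."""
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    missing_tools = [t for t in tools if shutil.which(t) is None]
    return missing_modules, missing_tools


if __name__ == "__main__":
    mods, tools = missing_requirements()
    print("missing Python modules:", mods or "none")
    print("missing system tools:", tools or "none")
```

Checking with `find_spec` rather than a real import avoids triggering PyAutoGUI's display lookup on a headless machine.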

Quick Start

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=True)
print(dc.get_screen_size())
PY
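
The Quick Start above keeps `require_approval=True`. How the skill implements that gate is not shown here, so the following is only an illustrative sketch of the general idea: `approved`, `guarded`, and `ApprovalDenied` are hypothetical names, and the prompt function is injectable so the logic can be exercised without a terminal.

```python
class ApprovalDenied(Exception):
    """Raised when the operator declines an action."""


def approved(description, require_approval=True, ask=input):
    """Return True if the action may run; `ask` is injectable for testing."""
    if not require_approval:
        return True
    answer = ask(f"Allow action: {description}? [y/N] ")
    # Anything other than an explicit yes counts as a refusal.
    return answer.strip().lower() in ("y", "yes")


def guarded(action, description, require_approval=True, ask=input):
    """Run `action` only after approval; raise ApprovalDenied otherwise."""
    if not approved(description, require_approval, ask):
        raise ApprovalDenied(description)
    return action()
```

Defaulting to "no" on empty input means an unattended session cannot approve an action by accident.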

Screenshot to file

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)
print(dc.screenshot_to('/tmp/screen.png'))
PY

Record screen (ffmpeg)

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)
print(dc.record_screen('/tmp/record.mp4', seconds=30))
PY
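
The exact ffmpeg invocation behind `record_screen` is not documented here. As a rough sketch of what an X11 screen capture command can look like, the hypothetical helper below builds an argument list using ffmpeg's `x11grab` input device; the frame rate and the explicit screen size are assumptions.

```python
import os


def build_record_command(output_path, seconds, width, height, display=None, fps=25):
    """Build an ffmpeg x11grab command line as an argument list."""
    display = display or os.environ.get("DISPLAY", ":0")
    return [
        "ffmpeg",
        "-y",                               # overwrite the output file
        "-f", "x11grab",                    # capture from an X11 display
        "-video_size", f"{width}x{height}",
        "-framerate", str(fps),
        "-i", display,
        "-t", str(seconds),                 # stop after this many seconds
        output_path,
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)`. Because `x11grab` reads from an X display, this approach does not work on a pure Wayland session.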

Launch Chrome + open URL (default wait 15s; use 15–30s for heavy apps)

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)
dc.open_chrome('http://localhost:8000', wait_seconds=15)
PY

Preset examples

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)

def preset_open_site():
    dc.open_chrome('http://localhost:8000', wait_seconds=15)

def preset_login_site():
    dc.open_chrome('http://localhost:8000/login', wait_seconds=15)
    dc.login_form('[email protected]', 'password', wait_seconds=10)

dc.register_preset('open-site', preset_open_site)
dc.register_preset('login-site', preset_login_site)

# run presets
# dc.run_preset('open-site')
# dc.run_preset('login-site')
PY
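
Conceptually, `register_preset` and `run_preset` behave like a name-to-callable registry. The class below is a simplified stand-in written for illustration (`PresetRegistry` is a hypothetical name, not the skill's implementation).

```python
class PresetRegistry:
    """Map preset names to zero-argument callables."""

    def __init__(self):
        self._presets = {}

    def register_preset(self, name, fn):
        if not callable(fn):
            raise TypeError(f"preset {name!r} must be callable")
        self._presets[name] = fn

    def run_preset(self, name):
        try:
            fn = self._presets[name]
        except KeyError:
            raise KeyError(f"unknown preset: {name!r}") from None
        return fn()
```

Registering plain functions keeps presets easy to review before they run, which matters when a preset types credentials.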

Workflow (DSL) example

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)
steps = [
  {"action": "open_chrome", "url": "http://localhost:8000/login", "wait": 15},
  {"action": "login_form", "email": "[email protected]", "password": "secret", "wait": 10},
  {"action": "open_url", "url": "http://localhost:8000/target", "wait": 15},
  {"action": "screenshot", "path": "/tmp/target.png"}
]

dc.run_steps(steps)
PY
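
A step list like the one above can be interpreted by dispatching on each step's "action" key. The sketch below shows that pattern with stand-in handlers that only log; the handler table and the way parameters are passed are assumptions, not the skill's actual `run_steps`.

```python
def run_steps(steps, handlers):
    """Run each step dict by dispatching on its "action" key."""
    results = []
    for index, step in enumerate(steps):
        params = dict(step)                 # copy so the caller's dict is untouched
        action = params.pop("action", None)
        if action not in handlers:
            raise ValueError(f"step {index}: unknown action {action!r}")
        results.append(handlers[action](**params))
    return results


# Stand-in handlers that record what they were asked to do.
log = []
handlers = {
    "open_url": lambda url, wait=15: log.append(("open_url", url, wait)),
    "screenshot": lambda path: log.append(("screenshot", path)),
}

run_steps(
    [
        {"action": "open_url", "url": "http://localhost:8000/target", "wait": 15},
        {"action": "screenshot", "path": "/tmp/target.png"},
    ],
    handlers,
)
```

Rejecting unknown actions up front, rather than skipping them, makes a typo in a workflow fail loudly instead of silently doing less than intended.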

OCR & State Detection example

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)

# Read text from screen
text = dc.read_text_on_screen()
print(text)

# Wait for text to appear (requires pytesseract)
if dc.wait_for_text("Success", timeout=30):
    print("Text detected!")
PY
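
Waiting for text amounts to polling an OCR read until a deadline passes. The sketch below isolates that loop; `read_text` is a hypothetical callable supplied by the caller (for example, one that runs pytesseract over a screenshot), and the polling interval is an assumption.

```python
import time


def wait_for_text(read_text, needle, timeout=30.0, interval=1.0, sleep=time.sleep):
    """Poll `read_text()` until it contains `needle` or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if needle in read_text():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Using `time.monotonic()` for the deadline keeps the timeout correct even if the system clock is adjusted while the loop runs.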

Multi-monitor example

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)

# Get all monitors
monitors = dc.get_monitors()
print(monitors)  # [{'name': 'HDMI-1', 'x': 0, 'y': 0, 'width': 1920, 'height': 1080}, ...]

# Click on second monitor (relative 0.5, 0.5 = center)
dc.click_monitor(1, 0.5, 0.5)
PY
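
The relative coordinates passed to `click_monitor` have to be mapped onto the virtual desktop. Assuming monitor dicts shaped like the `get_monitors()` output shown above, the conversion could look like this (`monitor_point` is a hypothetical helper, and clamping to the last pixel is a design choice of this sketch).

```python
def monitor_point(monitor, rel_x, rel_y):
    """Convert relative (0..1) coordinates on a monitor to absolute pixels."""
    if not (0.0 <= rel_x <= 1.0 and 0.0 <= rel_y <= 1.0):
        raise ValueError("relative coordinates must be between 0 and 1")
    # Clamp so that 1.0 maps to the last pixel rather than one past the edge.
    dx = min(int(rel_x * monitor["width"]), monitor["width"] - 1)
    dy = min(int(rel_y * monitor["height"]), monitor["height"] - 1)
    return (monitor["x"] + dx, monitor["y"] + dy)


second = {"name": "DP-1", "x": 1920, "y": 0, "width": 1920, "height": 1080}
print(monitor_point(second, 0.5, 0.5))  # (2880, 540)
```

Adding the monitor's own `x`/`y` offset is what places the click on the second screen instead of the first.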

Multi-browser example

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)

# Open different browsers
dc.open_firefox('https://google.com', wait_seconds=15)
dc.open_edge('https://github.com', wait_seconds=15)
PY

Window Manager example

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)

# Resize window to 800x600
dc.resize_window('Chrome', 800, 600)

# Minimize window
dc.minimize_window('Telegram')

# Maximize window
dc.maximize_window('VSCode')
PY

Flow Recorder example

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)

# Start recording
dc.start_recording()

# Do some actions (manual for now, or wrap them)
dc.click(x=100, y=200)
dc.type_text('hello')
dc.press('enter')

# Stop and replay
actions = dc.stop_recording()
print(f"Recorded {len(actions)} actions")

# Replay later
dc.replay_actions(actions, delay_multiplier=1.0)
PY
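
One way to think about `delay_multiplier` is as a scale factor on the pauses between recorded actions. The sketch below records timestamped actions and replays them with scaled delays; the record format, the `ActionRecorder` class, and the handler table are all assumptions made for illustration.

```python
import time


class ActionRecorder:
    """Collect timestamped actions between start_recording and stop_recording."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._actions = None            # None means "not recording"

    def start_recording(self):
        self._actions = []

    def record_action(self, name, *args):
        if self._actions is not None:
            self._actions.append({"t": self._clock(), "name": name, "args": args})

    def stop_recording(self):
        actions, self._actions = self._actions or [], None
        return actions


def replay_actions(actions, handlers, delay_multiplier=1.0, sleep=time.sleep):
    """Replay actions in order, scaling the original gaps between them."""
    previous = None
    for action in actions:
        if previous is not None:
            sleep(max(0.0, action["t"] - previous) * delay_multiplier)
        previous = action["t"]
        handlers[action["name"]](*action["args"])
```

With `delay_multiplier=0.5` the replay runs at twice the recorded speed; a value above 1.0 slows it down for slow-loading applications.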

AI Vision & Smart Wait example

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)

# Find element by color (RGB)
pos = dc.find_element_by_color((255, 0, 0), tolerance=20)  # red
if pos:
    dc.click(x=pos[0], y=pos[1])

# Smart wait - poll until condition is true
dc.smart_wait(lambda: dc.active_window_contains('Done'), timeout=30)
PY
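
The `tolerance` argument suggests an approximate color match. The skill's matching rule is not documented here, so the sketch below assumes a per-channel tolerance and scans a plain grid of RGB tuples rather than a real screenshot; `find_color` is a hypothetical helper.

```python
def find_color(pixels, target, tolerance=20):
    """Return (x, y) of the first pixel within `tolerance` per channel, else None.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples.
    """
    for y, row in enumerate(pixels):
        for x, pixel in enumerate(row):
            if all(abs(c - t) <= tolerance for c, t in zip(pixel, target)):
                return (x, y)
    return None


grid = [
    [(0, 0, 0), (0, 0, 0)],
    [(0, 0, 0), (250, 10, 5)],   # close to pure red
]
print(find_color(grid, (255, 0, 0)))  # (1, 1)
```

A small tolerance absorbs anti-aliasing and theme differences, but a large one can match unrelated elements, so the returned position is worth verifying before clicking.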

Drag & Drop example

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)

# Drag from point A to B
dc.drag_drop(100, 200, 500, 600)

# Drag file to app
dc.drag_file_to_app('/path/to/file.txt', 400, 300)
PY

Robust retry example

python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux

dc = DesktopControllerLinux(require_approval=False)

# Click with automatic retry
dc.robust_click(100, 200)

# Type with automatic retry
dc.robust_type("Hello world")
PY

API

Same interface as DesktopController:

mouse: move_mouse, click, drag, scroll, get_mouse_position
keyboard: type_text, press, hotkey, wait, launch_app, open_url, open_chrome, wait_retry_window, wait_retry_new_window, smart_retry
screen/ui: click_image, click_image_or, login_form
state: ensure_window, active_window_contains, wait_for_text, detect_state
recovery: recover_reload, recover_back, retry_with_recovery
workflows: run_steps
presets: register_preset, run_preset
ocr: read_text_on_screen
multi-monitor: get_monitors, click_monitor
robust: robust_click, robust_type
smart-wait: smart_wait, wait_for_window_stable
drag-drop: drag_drop, drag_file_to_app
window-manager: resize_window, minimize_window, maximize_window
multi-browser: open_firefox, open_edge
keyboard: detect_keyboard_layout
ai-vision: find_element_by_color, find_button_vision
recorder: start_recording, record_action, stop_recording, replay_actions
screen: screenshot, screenshot_to, record_screen, get_pixel_color, find_on_screen
windows: get_all_windows, activate_window, focus_window_or_click, get_active_window
clipboard: copy_to_clipboard, get_from_clipboard

launch_app(app_name, wait_seconds=15, window_title=None, auto_detect_window=True)

If window_title is provided: waits 15s, retries once, then errors if not found.
If auto_detect_window=True: detects a new window title automatically, waits 15s, retries once.

smart_retry(action_fn, check_fn, wait_seconds=15, retries=2)

Runs action → wait → check → retry (with wait) to avoid rapid loops.
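
The action → wait → check → retry loop described for `smart_retry` can be sketched as follows. This is an illustration of the pattern rather than the skill's code; in particular, treating `retries=2` as "up to two additional attempts" and returning a boolean are assumptions.

```python
import time


def smart_retry(action_fn, check_fn, wait_seconds=15, retries=2, sleep=time.sleep):
    """Run the action, wait, then check; repeat up to `retries` extra times."""
    for _attempt in range(retries + 1):
        action_fn()
        sleep(wait_seconds)             # give the UI time to settle before checking
        if check_fn():
            return True
    return False
```

Waiting before every check is what prevents the rapid loop: a slow application gets the full `wait_seconds` on each attempt instead of being clicked repeatedly.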

Safety

Approval mode enabled by default
Failsafe: move mouse to any corner to abort
Environment guard: warns on Wayland or headless sessions
Auto-detect DISPLAY: tries /tmp/.X11-unix when DISPLAY is missing
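
The environment guard and DISPLAY auto-detection can be approximated with a few checks on environment variables and the X11 socket directory. The sketch below is an assumption about how such checks might work, not the skill's code: `session_type` and `guess_display` are hypothetical helpers, and picking the lowest-numbered socket is a choice made for this example.

```python
import os
import re


def session_type(env):
    """Classify the session as 'wayland', 'x11', or 'headless' from env vars."""
    if env.get("WAYLAND_DISPLAY") or env.get("XDG_SESSION_TYPE") == "wayland":
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "headless"


def guess_display(socket_dir="/tmp/.X11-unix"):
    """Return e.g. ':0' for the lowest-numbered X socket (X0, X1, ...), else None."""
    try:
        names = os.listdir(socket_dir)
    except OSError:
        return None
    matches = (re.fullmatch(r"X(\d+)", name) for name in names)
    numbers = sorted(int(m.group(1)) for m in matches if m)
    return f":{numbers[0]}" if numbers else None


if __name__ == "__main__":
    kind = session_type(os.environ)
    if kind != "x11":
        print(f"warning: {kind} session detected; GUI automation may not work")
    print("DISPLAY candidate:", os.environ.get("DISPLAY") or guess_display())
```

A socket in `/tmp/.X11-unix` only shows that an X server exists; access can still be refused if the process lacks authorization for that display.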

Security recommendations

This skill does what it says: programmatic control of your Linux desktop. Before installing or enabling it, consider:

(1) Only run it on machines you trust. It can capture screenshots, record the screen, read on-screen text (OCR), and type arbitrary input.
(2) Keep require_approval=True unless you explicitly want automated, unattended control. The examples that set require_approval=False let the agent act without interactive confirmation.
(3) Avoid embedding real credentials in presets or workflow steps you register with the skill. The skill types whatever you give it and can replay recorded actions.
(4) Review and install only the Python and system packages you trust (pyautogui, ffmpeg, wmctrl/xdotool, etc.).
(5) To limit risk, disable autonomous model invocation for this skill or restrict its use to supervised sessions.

Functional analysis

Type: OpenClaw Skill
Name: dekstop-control-linux
Version: 1.0.0

The skill bundle provides extensive desktop automation capabilities for Linux, including full mouse/keyboard control, screen recording via ffmpeg, clipboard access, and OCR. Key indicators of risk include a 'login_form' helper in '__init__.py' designed to automate credential entry and the ability to bypass user approval ('require_approval=False'), which is frequently demonstrated in the 'SKILL.md' examples. While these features align with the stated goal of desktop control, they grant the AI agent high-privilege access to the user's graphical session and sensitive data without robust safeguards against misuse or data exfiltration.

Capability assessment

✓ Purpose & Capability

Name/description match the code and SKILL.md. The code implements mouse/keyboard/screenshot/recording/ocr/window ops and includes environment checks for X11/Wayland. No unrelated credentials, config paths, or unexpected binaries are demanded.

ℹ Instruction Scope

SKILL.md and the code focus on GUI automation and include reading screen contents (OCR), taking screenshots, recording, and reading /tmp/.X11-unix to detect DISPLAY. Examples show supplying credentials to login_form and running without approval (require_approval=False). The instructions do not direct data to external endpoints, but the skill can capture sensitive on-screen content and interact with apps, so its scope is broad by design.

✓ Install Mechanism

No install spec is present (instruction-only skill with an included Python module). No downloads or external installers are embedded. Runtime does require common Python packages (pyautogui, pillow, etc.) and system utilities (scrot, xclip, wmctrl, xdotool, ffmpeg) which are reasonable for the declared functionality.

ℹ Credentials

The skill requires no environment variables or secrets. It does access environment state (DISPLAY, WAYLAND_DISPLAY, XDG_SESSION_TYPE) and filesystem paths such as /tmp/.X11-unix. It can type provided passwords and read screen/clipboard contents — appropriate for automation but sensitive in practice. The declared requirements align with the functionality.

ℹ Persistence & Privilege

always:false and no code modifies other skills. However, the skill supports running with require_approval=False; combined with the platform default that allows model invocation, an agent could autonomously execute GUI actions (open apps, type, take screenshots). This is not an incoherence but is an important operational risk to consider.

Version history

v1.0.0

Initial release of dekstop-control-linux: safe Linux desktop automation with approval mode and X11/Wayland checks.
- Provides desktop automation (mouse, keyboard, screenshots) for Linux with PyAutoGUI.
- Includes explicit approval mode for safety and environment/session checks.
- Supports multi-monitor, multi-browser, window management, drag-and-drop, robust retry, OCR, AI vision, recorder, and workflows.
- Extensive API examples for launching apps, screen recording, preset workflows, smart waiting, and more.
- Safety features: approval mode enabled by default, failsafe abort, and environment compatibility checks.

Metadata

Slug dekstop-control-linux

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Number of versions 1

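FAQ

What is dekstop-control-linux?

Safe Linux desktop automation (mouse/keyboard/screenshot) with approval mode and X11/Wayland checks. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 281 cumulative downloads to date.

How do I install dekstop-control-linux?

Run the command "/install dekstop-control-linux" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is dekstop-control-linux free?

Yes. dekstop-control-linux is completely free and released under the MIT-0 license; you can download, install, and use it freely.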

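Which platforms does dekstop-control-linux support?

dekstop-control-linux is listed as cross-platform and can be installed in any environment where OpenClaw / Claude Code is deployed, but the skill itself requires a Linux GUI session (X11 recommended).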

Who developed dekstop-control-linux?

It is developed and maintained by PabloRaka (@pabloraka); the current version is v1.0.0.
