Total downloads: 281
Favorites: 2
Current installs: 0
Versions: 1
Install in OpenClaw
/install dekstop-control-linux
Description
Safe Linux desktop automation (mouse/keyboard/screenshot) with approval mode and X11/Wayland checks.
Usage (SKILL.md)
Desktop Control (Linux)
Safe desktop automation for Linux using PyAutoGUI with explicit approvals and environment checks.
Requirements
- Linux with a GUI session (X11 recommended)
- Python packages:
  - pyautogui
  - pillow
  - pygetwindow (window ops; not supported on Linux)
  - pyperclip (clipboard ops)
  - opencv-python (optional, image matching)
  - pytesseract (optional, OCR; needs the tesseract binary)
- System packages (common):
  - python3-tk, scrot, xclip or xsel
  - wmctrl (window list/activate)
  - xdotool (active window)
  - ffmpeg (screen recording)
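Before running the examples, it helps to confirm which of these dependencies are actually present. The sketch below is a standalone helper, not part of the skill. It assumes the import names PIL (pillow), cv2 (opencv-python), and tkinter (python3-tk), and it only checks that modules can be located and that tools are on PATH; it does not import or run anything.

```python
import importlib.util
import shutil

# Import names, not pip names: pillow -> PIL, opencv-python -> cv2, python3-tk -> tkinter.
PYTHON_MODULES = ["pyautogui", "PIL", "pyperclip", "cv2", "pytesseract", "tkinter"]
# xclip and xsel are alternatives; only one of them is needed.
SYSTEM_TOOLS = ["scrot", "xclip", "xsel", "wmctrl", "xdotool", "ffmpeg", "tesseract"]


def check_dependencies(modules=PYTHON_MODULES, tools=SYSTEM_TOOLS):
    """Return {"python": {name: found}, "system": {name: found}} without importing modules."""
    python_found = {m: importlib.util.find_spec(m) is not None for m in modules}
    system_found = {t: shutil.which(t) is not None for t in tools}
    return {"python": python_found, "system": system_found}


if __name__ == "__main__":
    for group, results in check_dependencies().items():
        missing = [name for name, found in results.items() if not found]
        print(f"{group}: missing {', '.join(missing) or 'nothing'}")
```

Using find_spec rather than a plain import matters here: importing pyautogui on Linux without a usable DISPLAY can fail, which would hide the real question of whether the package is installed.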
Quick Start
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=True)
print(dc.get_screen_size())
PY
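Approval mode is on by default (require_approval=True). The skill's own approval code is not shown on this page, so the following is only a hypothetical sketch of how such a gate can work: each action is described to the user and runs only after an explicit "y". ApprovalGate and its ask parameter are illustrative names, not the skill's API.

```python
from typing import Callable, Optional


class ApprovalGate:
    """Hypothetical approval mode: every action must be confirmed before it runs."""

    def __init__(self, require_approval: bool = True,
                 ask: Callable[[str], str] = input):
        self.require_approval = require_approval
        self.ask = ask  # injectable so the gate can be used without a terminal

    def run(self, description: str, action: Callable[[], object]) -> Optional[object]:
        """Run action() only if approval is off or the user answers 'y' or 'yes'."""
        if self.require_approval:
            answer = self.ask(f"Allow: {description}? [y/N] ").strip().lower()
            if answer not in ("y", "yes"):
                print(f"Skipped: {description}")
                return None  # anything other than an explicit yes is a refusal
        return action()
```

With the default ask=input, gate.run("click at (100, 200)", lambda: ...) prompts on the terminal; a supervised UI could inject its own confirmation function instead. Defaulting to "no" on empty input keeps an accidental Enter from authorizing an action.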
Screenshot to file
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)
print(dc.screenshot_to('/tmp/screen.png'))
PY
Record screen (ffmpeg)
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)
print(dc.record_screen('/tmp/record.mp4', seconds=30))
PY
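record_screen relies on ffmpeg. A common way to capture an X11 session is ffmpeg's x11grab input device, and the sketch below builds such a command. The function names and the fixed 1920x1080 default are assumptions (a real implementation would use the actual screen size, e.g. from get_screen_size()), and x11grab does not work on a pure Wayland session.

```python
import subprocess
from typing import List


def build_x11grab_command(output_path: str, seconds: int, display: str = ":0.0",
                          size: str = "1920x1080", framerate: int = 25) -> List[str]:
    """Build an ffmpeg argv that records an X11 display for a fixed duration."""
    return [
        "ffmpeg",
        "-y",                          # overwrite output_path if it exists
        "-video_size", size,           # capture area; must precede -i
        "-framerate", str(framerate),  # input frame rate; must precede -i
        "-f", "x11grab",               # X11 screen-grab input device
        "-i", display,                 # display[.screen][+x,y], e.g. ":0.0+0,0"
        "-t", str(seconds),            # stop writing after this many seconds
        output_path,
    ]


def record_screen(output_path: str, seconds: int = 30, **kwargs) -> str:
    """Hypothetical wrapper: run ffmpeg and block until the recording finishes."""
    subprocess.run(build_x11grab_command(output_path, seconds, **kwargs), check=True)
    return output_path
```

Keeping the argv builder separate from the subprocess call makes the command easy to inspect or log before anything is recorded.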
Launch Chrome + open URL (default wait 15s; use 15–30s for heavy apps)
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)
dc.open_chrome('http://localhost:8000', wait_seconds=15)
PY
Preset examples
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)
def preset_open_site():
    dc.open_chrome('http://localhost:8000', wait_seconds=15)
def preset_login_site():
    dc.open_chrome('http://localhost:8000/login', wait_seconds=15)
    dc.login_form('[email protected]', 'password', wait_seconds=10)
dc.register_preset('open-site', preset_open_site)
dc.register_preset('login-site', preset_login_site)
# run presets
# dc.run_preset('open-site')
# dc.run_preset('login-site')
PY
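Presets are named, zero-argument callables that can be re-run later. A minimal, hypothetical registry with the same register_preset / run_preset shape might look like this; it is an illustration, not the skill's code. Unknown names raise a KeyError that lists what is registered.

```python
from typing import Callable, Dict


class PresetRegistry:
    """Hypothetical sketch: store named callables and run them on demand."""

    def __init__(self) -> None:
        self._presets: Dict[str, Callable[[], object]] = {}

    def register_preset(self, name: str, fn: Callable[[], object]) -> None:
        if not callable(fn):
            raise TypeError(f"preset {name!r} must be callable")
        self._presets[name] = fn  # re-registering a name replaces the old preset

    def run_preset(self, name: str) -> object:
        try:
            fn = self._presets[name]
        except KeyError:
            known = ", ".join(sorted(self._presets)) or "none"
            raise KeyError(f"unknown preset {name!r} (known: {known})") from None
        return fn()
```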
Workflow (DSL) example
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)
steps = [
    {"action": "open_chrome", "url": "http://localhost:8000/login", "wait": 15},
    {"action": "login_form", "email": "[email protected]", "password": "secret", "wait": 10},
    {"action": "open_url", "url": "http://localhost:8000/target", "wait": 15},
    {"action": "screenshot", "path": "/tmp/target.png"},
]
dc.run_steps(steps)
PY
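The workflow DSL maps each step's "action" to a controller method. One way to implement a run_steps-style dispatcher, sketched under assumptions: the remaining keys are passed as keyword arguments, "wait" is renamed to wait_seconds, and actions that start with an underscore or fall outside an optional allowlist are rejected. How the actual skill maps keys such as "path" may differ.

```python
from typing import Any, Dict, Iterable, List, Optional


def run_steps(controller: Any, steps: List[Dict[str, Any]],
              allowed: Optional[Iterable[str]] = None) -> List[Any]:
    """Execute DSL steps in order by dispatching "action" to controller methods."""
    allowed_set = set(allowed) if allowed is not None else None
    results = []
    for i, step in enumerate(steps):
        params = dict(step)  # copy so the caller's step dict is not mutated
        action = params.pop("action", None)
        if not isinstance(action, str) or action.startswith("_"):
            raise ValueError(f"step {i}: invalid action {action!r}")
        if allowed_set is not None and action not in allowed_set:
            raise ValueError(f"step {i}: action {action!r} is not allowed")
        method = getattr(controller, action, None)
        if not callable(method):
            raise ValueError(f"step {i}: unknown action {action!r}")
        if "wait" in params:
            params["wait_seconds"] = params.pop("wait")  # DSL key -> keyword argument
        results.append(method(**params))
    return results
```

Validating every step's name before calling getattr keeps a workflow definition from reaching private helpers on the controller.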
OCR & State Detection example
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)
# Read text from screen
text = dc.read_text_on_screen()
print(text)
# Wait for text to appear (requires pytesseract)
if dc.wait_for_text("Success", timeout=30):
    print("Text detected!")
PY
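wait_for_text is essentially a polling loop around OCR. In this hedged sketch, read_text is any callable that returns the current screen text, for example lambda: pytesseract.image_to_string(pyautogui.screenshot()), which needs pytesseract and the tesseract binary. The clock and sleep parameters are injectable so the loop can be tested without real waiting; the case-insensitive default is an assumption.

```python
import time
from typing import Callable


def wait_for_text(read_text: Callable[[], str], needle: str, timeout: float = 30.0,
                  interval: float = 1.0, case_sensitive: bool = False,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll read_text() until it contains needle or timeout seconds elapse."""
    if not case_sensitive:
        needle = needle.lower()
    deadline = clock() + timeout
    while True:
        text = read_text()
        if not case_sensitive:
            text = text.lower()
        if needle in text:
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        sleep(min(interval, remaining))  # never sleep past the deadline
```

Using time.monotonic rather than time.time keeps the timeout correct even if the system clock is adjusted while waiting.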
Multi-monitor example
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)
# Get all monitors
monitors = dc.get_monitors()
print(monitors)  # [{'name': 'HDMI-1', 'x': 0, 'y': 0, 'width': 1920, 'height': 1080}, ...]
# Click the center of the second monitor (index 1; relative 0.5, 0.5 = center)
dc.click_monitor(1, 0.5, 0.5)
PY
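click_monitor takes coordinates relative to one monitor. Assuming each monitor dict uses the x/y/width/height fields shown above (x/y being the monitor's offset within the combined desktop) and that the index is 0-based, converting to absolute pixels is a scale plus an offset. monitor_point is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def monitor_point(monitors: List[Dict[str, int]], index: int,
                  rel_x: float, rel_y: float) -> Tuple[int, int]:
    """Map relative (0..1) coordinates on monitors[index] to absolute screen pixels."""
    if not 0 <= index < len(monitors):
        raise IndexError(f"monitor {index} not found ({len(monitors)} available)")
    if not (0.0 <= rel_x <= 1.0 and 0.0 <= rel_y <= 1.0):
        raise ValueError("relative coordinates must be within [0, 1]")
    m = monitors[index]
    # Clamp to the last pixel so rel=1.0 stays on this monitor, not the next one.
    x = m["x"] + min(int(rel_x * m["width"]), m["width"] - 1)
    y = m["y"] + min(int(rel_y * m["height"]), m["height"] - 1)
    return x, y
```

For a second 2560x1440 monitor placed to the right of a 1920-pixel-wide first one (x=1920), the center (0.5, 0.5) maps to (3200, 720).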
Multi-browser example
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)
# Open different browsers
dc.open_firefox('https://google.com', wait_seconds=15)
dc.open_edge('https://github.com', wait_seconds=15)
PY
Window Manager example
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)
# Resize window to 800x600
dc.resize_window('Chrome', 800, 600)
# Minimize window
dc.minimize_window('Telegram')
# Maximize window
dc.maximize_window('VSCode')
PY
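On X11, window operations like these are commonly done with wmctrl and xdotool, both listed under Requirements. The command builders below sketch that approach and are not necessarily what the skill runs. wmctrl -r selects the first window whose title contains the given text (case-insensitive), -e sets gravity,x,y,width,height (-1 keeps the current value), and -b adds or removes window-state properties. wmctrl has no dedicated minimize action, so the sketch uses xdotool's windowminimize.

```python
import subprocess
from typing import List


def resize_cmd(title: str, width: int, height: int) -> List[str]:
    # "-e gravity,x,y,width,height": -1 for x and y keeps the current position.
    return ["wmctrl", "-r", title, "-e", f"0,-1,-1,{width},{height}"]


def maximize_cmd(title: str) -> List[str]:
    # Add both maximized states so the window fills the screen in each direction.
    return ["wmctrl", "-r", title, "-b", "add,maximized_vert,maximized_horz"]


def minimize_cmd(title: str) -> List[str]:
    # xdotool chains commands: windowminimize acts on the first window found by search.
    return ["xdotool", "search", "--name", title, "windowminimize"]


def run(cmd: List[str]) -> None:
    """Execute one of the commands above, raising if the tool reports an error."""
    subprocess.run(cmd, check=True)
```

Note that some window managers ignore a resize while a window is maximized; in that case the maximized states may need to be removed first.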
Flow Recorder example
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)
# Start recording
dc.start_recording()
# Do some actions (manual for now, or wrap them)
dc.click(x=100, y=200)
dc.type_text('hello')
dc.press('enter')
# Stop and replay
actions = dc.stop_recording()
print(f"Recorded {len(actions)} actions")
# Replay later
dc.replay_actions(actions, delay_multiplier=1.0)
PY
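The recorder stores actions with timestamps so that replay can reproduce the original pacing. This is a hypothetical sketch, assuming actions are recorded explicitly through record_action (as the "wrap them" comment above suggests) and replayed through a dispatch callback. delay_multiplier scales the gaps between actions, so 0.5 replays twice as fast.

```python
import time
from typing import Any, Callable, Dict, List, Optional


class ActionRecorder:
    """Hypothetical sketch: record (name, kwargs, time offset) for later replay."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self._start: Optional[float] = None
        self._actions: List[Dict[str, Any]] = []

    def start_recording(self) -> None:
        self._start = self.clock()
        self._actions = []

    def record_action(self, name: str, **kwargs: Any) -> None:
        if self._start is None:
            return  # not recording: ignore the action
        self._actions.append({"name": name, "kwargs": kwargs,
                              "t": self.clock() - self._start})

    def stop_recording(self) -> List[Dict[str, Any]]:
        self._start = None
        return list(self._actions)


def replay_actions(actions: List[Dict[str, Any]],
                   dispatch: Callable[[str, Dict[str, Any]], Any],
                   delay_multiplier: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """Re-run actions, waiting (gap between original timestamps) * delay_multiplier."""
    previous_t = 0.0
    for action in actions:
        gap = max(0.0, action["t"] - previous_t)
        sleep(gap * delay_multiplier)
        dispatch(action["name"], action["kwargs"])
        previous_t = action["t"]
```

A dispatch callback such as lambda name, kw: getattr(dc, name)(**kw) would route replayed actions back to a controller; because replay types exactly what was recorded, recordings that include credentials should be treated as secrets.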
AI Vision & Smart Wait example
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)
# Find element by color (RGB)
pos = dc.find_element_by_color((255, 0, 0), tolerance=20)  # red
if pos:
    dc.click(x=pos[0], y=pos[1])
# Smart wait - poll until condition is true
dc.smart_wait(lambda: dc.active_window_contains('Done'), timeout=30)
PY
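find_element_by_color matches pixels within a tolerance. The sketch below assumes a per-channel tolerance (every RGB channel within ±tolerance of the target) and a plain row-major scan over a 2D list of RGB tuples; the skill's actual distance metric and scan order are not documented here. With Pillow, rows could be built from a screenshot via img.load(), though for a full screen a NumPy or OpenCV approach would be much faster.

```python
from typing import Optional, Sequence, Tuple

RGB = Tuple[int, int, int]


def find_color(pixels: Sequence[Sequence[RGB]], target: RGB,
               tolerance: int = 20) -> Optional[Tuple[int, int]]:
    """Return (x, y) of the first pixel, scanning row by row, whose every channel
    is within `tolerance` of target; return None if no pixel matches."""
    tr, tg, tb = target
    for y, row in enumerate(pixels):
        for x, (r, g, b) in enumerate(row):
            if abs(r - tr) <= tolerance and abs(g - tg) <= tolerance and abs(b - tb) <= tolerance:
                return x, y
    return None
```

The returned (x, y) is in the same coordinate space as the pixel grid, so a screenshot of a single region would need that region's offset added before clicking.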
Drag & Drop example
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)
# Drag from point A to B
dc.drag_drop(100, 200, 500, 600)
# Drag file to app
dc.drag_file_to_app('/path/to/file.txt', 400, 300)
PY
Robust retry example
python - <<'PY'
from skills.desktop_control_linux import DesktopControllerLinux
dc = DesktopControllerLinux(require_approval=False)
# Click with automatic retry
dc.robust_click(100, 200)
# Type with automatic retry
dc.robust_type("Hello world")
PY
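robust_click and robust_type retry automatically. The generic retry-with-exponential-backoff helper below is an assumption about how such wrappers can work, not the skill's implementation: it retries only on the listed exception types and re-raises after the last attempt. A mis-targeted click usually does not raise, so retries are most useful when paired with an explicit check, as smart_retry does.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retry(fn: Callable[[], T], attempts: int = 3, initial_delay: float = 0.5,
               backoff: float = 2.0,
               exceptions: Tuple[Type[BaseException], ...] = (Exception,),
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); on one of `exceptions`, wait and retry, multiplying the delay each time."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= backoff
    raise AssertionError("unreachable")
```

For example, with_retry(lambda: pyautogui.click(100, 200)) would retry a click that raised, waiting 0.5 s and then 1.0 s between attempts.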
API
Same interface as DesktopController:
- mouse: move_mouse, click, drag, scroll, get_mouse_position
- keyboard: type_text, press, hotkey, wait, launch_app, open_url, open_chrome, wait_retry_window, wait_retry_new_window, smart_retry
- screen/ui: click_image, click_image_or, login_form
- state: ensure_window, active_window_contains, wait_for_text, detect_state
- recovery: recover_reload, recover_back, retry_with_recovery
- workflows: run_steps
- presets: register_preset, run_preset
- ocr: read_text_on_screen
- multi-monitor: get_monitors, click_monitor
- robust: robust_click, robust_type
- smart-wait: smart_wait, wait_for_window_stable
- drag-drop: drag_drop, drag_file_to_app
- window-manager: resize_window, minimize_window, maximize_window
- multi-browser: open_firefox, open_edge
- keyboard: detect_keyboard_layout
- ai-vision: find_element_by_color, find_button_vision
- recorder: start_recording, record_action, stop_recording, replay_actions
launch_app(app_name, wait_seconds=15, window_title=None, auto_detect_window=True)
- If window_title is provided: waits 15s, retries once, then errors if the window is not found.
- If auto_detect_window=True: detects the new window's title automatically, waits 15s, retries once.
smart_retry(action_fn, check_fn, wait_seconds=15, retries=2)
- Runs action → wait → check, then retries (with a wait each time) to avoid rapid loops.
- screen: screenshot, screenshot_to, record_screen, get_pixel_color, find_on_screen
- windows: get_all_windows, activate_window, focus_window_or_click, get_active_window
- clipboard: copy_to_clipboard, get_from_clipboard
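The launch_app and smart_retry behavior described above can be approximated with wmctrl: list window titles before launching, then poll for titles that were not there before. This is a sketch under assumptions, not the skill's code. parse_wmctrl_list, new_titles, and launch_and_wait_for_window are hypothetical names, and the sketch launches the app once and retries only the detection.

```python
import subprocess
import time
from typing import Callable, List, Set


def parse_wmctrl_list(output: str) -> List[str]:
    """Extract titles from `wmctrl -l` lines: "<id> <desktop> <host> <title...>"."""
    titles = []
    for line in output.splitlines():
        parts = line.split(None, 3)  # keep spaces inside the title
        if len(parts) == 4:
            titles.append(parts[3])
    return titles


def list_window_titles() -> List[str]:
    out = subprocess.run(["wmctrl", "-l"], capture_output=True, text=True, check=True)
    return parse_wmctrl_list(out.stdout)


def new_titles(before: Set[str], after: List[str]) -> List[str]:
    """Titles present after launching that were not present before."""
    return [t for t in after if t not in before]


def smart_retry(action_fn: Callable[[], object], check_fn: Callable[[], bool],
                wait_seconds: float = 15, retries: int = 2,
                sleep: Callable[[float], None] = time.sleep) -> bool:
    """action -> wait -> check, repeated up to 1 + retries times; the wait avoids tight loops."""
    for _ in range(retries + 1):
        action_fn()
        sleep(wait_seconds)
        if check_fn():
            return True
    return False


def launch_and_wait_for_window(cmd: List[str], wait_seconds: float = 15) -> List[str]:
    """Launch cmd once, then wait (retrying once) for a new window title to appear."""
    before = set(list_window_titles())
    found: List[str] = []

    def check() -> bool:
        found[:] = new_titles(before, list_window_titles())
        return bool(found)

    subprocess.Popen(cmd)  # launch once; only the detection is retried
    smart_retry(lambda: None, check, wait_seconds=wait_seconds, retries=1)
    return found
```

Comparing against a snapshot taken before launch avoids matching a window that was already open with a similar title.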
Safety
- Approval mode enabled by default
- Failsafe: move mouse to any corner to abort
- Environment guard: warns on Wayland or headless sessions
- Auto-detect DISPLAY: tries /tmp/.X11-unix when DISPLAY is missing
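The environment guard can be approximated from environment variables plus the X socket directory. In this sketch, session_type prefers Wayland when XDG_SESSION_TYPE or WAYLAND_DISPLAY says so (XWayland sessions also set DISPLAY), and guess_display maps a socket such as /tmp/.X11-unix/X0 to ":0". Both functions are hypothetical helpers, not the skill's API.

```python
import os
import re
from typing import Mapping, Optional


def session_type(env: Mapping[str, str]) -> str:
    """Classify the GUI session as "wayland", "x11", or "headless" from env vars."""
    declared = env.get("XDG_SESSION_TYPE", "").lower()
    if declared == "wayland" or env.get("WAYLAND_DISPLAY"):
        return "wayland"  # checked first: XWayland also sets DISPLAY
    if declared == "x11" or env.get("DISPLAY"):
        return "x11"
    return "headless"


def guess_display(socket_dir: str = "/tmp/.X11-unix") -> Optional[str]:
    """Return ":N" for the lowest-numbered X socket (X0, X1, ...) in socket_dir, if any."""
    try:
        names = os.listdir(socket_dir)
    except OSError:
        return None  # directory missing or unreadable: no X server to guess
    numbers = []
    for name in names:
        match = re.fullmatch(r"X(\d+)", name)
        if match:
            numbers.append(int(match.group(1)))
    return f":{min(numbers)}" if numbers else None
```

A caller could warn on "wayland" or "headless" and, when DISPLAY is unset, set os.environ["DISPLAY"] from guess_display() before importing pyautogui, since pyautogui connects to the X display at import time on Linux.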
Security recommendations
This skill does what it says: programmatic control of your Linux desktop. Before installing or enabling it, consider:
- Only run it on machines you trust. It can capture screenshots, record the screen, read text (OCR), and type arbitrary input.
- Keep require_approval=True unless you explicitly want automated, unattended control; examples that set require_approval=False let the agent act without interactive confirmation.
- Avoid embedding real credentials in presets or workflow steps you register with the skill; it will type whatever you give it and can replay recorded actions.
- Review and install only the Python and system packages you trust (pyautogui, ffmpeg, wmctrl/xdotool, etc.).
- To limit risk, disable autonomous model invocation for this skill or restrict its use to supervised sessions.
Feature analysis
Type: OpenClaw Skill
Name: dekstop-control-linux
Version: 1.0.0
The skill bundle provides extensive desktop automation capabilities for Linux, including full mouse/keyboard control, screen recording via ffmpeg, clipboard access, and OCR. Key indicators of risk include a 'login_form' helper in '__init__.py' designed to automate credential entry and the ability to bypass user approval ('require_approval=False'), which is frequently demonstrated in the 'SKILL.md' examples. While these features align with the stated goal of desktop control, they grant the AI agent high-privilege access to the user's graphical session and sensitive data without robust safeguards against misuse or data exfiltration.
Capability assessment
Purpose & Capability
Name/description match the code and SKILL.md. The code implements mouse/keyboard/screenshot/recording/ocr/window ops and includes environment checks for X11/Wayland. No unrelated credentials, config paths, or unexpected binaries are demanded.
Instruction Scope
SKILL.md and the code focus on GUI automation and include reading screen contents (OCR), taking screenshots, recording, and reading /tmp/.X11-unix to detect DISPLAY. Examples show supplying credentials to login_form and running without approval (require_approval=False). The instructions do not direct data to external endpoints, but the skill can capture sensitive on-screen content and interact with apps, so its scope is broad by design.
Install Mechanism
No install spec is present (instruction-only skill with an included Python module). No downloads or external installers are embedded. Runtime does require common Python packages (pyautogui, pillow, etc.) and system utilities (scrot, xclip, wmctrl, xdotool, ffmpeg) which are reasonable for the declared functionality.
Credentials
The skill requires no environment variables or secrets. It does access environment state (DISPLAY, WAYLAND_DISPLAY, XDG_SESSION_TYPE) and filesystem paths such as /tmp/.X11-unix. It can type provided passwords and read screen/clipboard contents — appropriate for automation but sensitive in practice. The declared requirements align with the functionality.
Persistence & Privilege
always:false and no code modifies other skills. However, the skill supports running with require_approval=False; combined with the platform default that allows model invocation, an agent could autonomously execute GUI actions (open apps, type, take screenshots). This is not an incoherence but is an important operational risk to consider.
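How to use
- Make sure OpenClaw is installed (local or Docker deployment).
- Enter the install command in the chat box:
/install dekstop-control-linux
- After installation, invoke the skill by name or trigger it with /dekstop-control-linux.
- Provide the inputs described in the skill's parameter documentation to get structured output.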
Version history
v1.0.0
Initial release of dekstop-control-linux — safe Linux desktop automation with approval mode and X11/Wayland checks.
- Provides desktop automation (mouse, keyboard, screenshots) for Linux with PyAutoGUI.
- Includes explicit approval mode for safety and environment/session checks.
- Supports multi-monitor, multi-browser, window management, drag-and-drop, robust retry, OCR, AI vision, recorder, and workflows.
- Extensive API examples for launching apps, screen recording, preset workflows, smart waiting, and more.
- Safety features: approval mode enabled by default, failsafe abort, and environment compatibility checks.
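FAQ
What is dekstop-control-linux?
Safe Linux desktop automation (mouse/keyboard/screenshot) with approval mode and X11/Wayland checks. It is an AI agent skill plugin for Claude Code / OpenClaw and has been downloaded 281 times so far.
How do I install dekstop-control-linux?
Run "/install dekstop-control-linux" in the OpenClaw or Claude Code chat box to install it in one step. The skill itself needs no extra configuration, but the Python and system packages listed under Requirements must be available.
Is dekstop-control-linux free?
Yes. dekstop-control-linux is completely free under the MIT-0 license and can be freely downloaded, installed, and used.
Which platforms does dekstop-control-linux support?
It can be installed wherever OpenClaw / Claude Code is deployed, but it automates a Linux desktop and needs a Linux GUI session (X11 recommended; Wayland and headless sessions trigger warnings).
Who develops dekstop-control-linux?
It is developed and maintained by PabloRaka (@pabloraka); the current version is v1.0.0.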