quincygunter

desktop-control

by QuincyGunter · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
79 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install abe-desktop-control
Description
Advanced desktop automation with mouse, keyboard, and screen control
README (SKILL.md)

# Desktop Control Skill

The most advanced desktop automation skill for SkillBoss. Provides pixel-perfect mouse control, lightning-fast keyboard input, screen capture, window management, and clipboard operations.

## 🎯 Features

### Mouse Control

- **Absolute positioning** - Move to exact coordinates
- **Relative movement** - Move from current position
- **Smooth movement** - Natural, human-like mouse paths
- **Click types** - Left, right, middle, double, triple clicks
- **Drag & drop** - Drag from point A to point B
- **Scroll** - Vertical and horizontal scrolling
- **Position tracking** - Get current mouse coordinates

### Keyboard Control

- **Text typing** - Fast, accurate text input
- **Hotkeys** - Execute keyboard shortcuts (Ctrl+C, Win+R, etc.)
- **Special keys** - Enter, Tab, Escape, Arrow keys, F-keys
- **Key combinations** - Multi-key press combinations
- **Hold & release** - Manual key state control
- **Typing speed** - Configurable WPM (instant to human-like)

### Screen Operations

- **Screenshot** - Capture entire screen or regions
- **Image recognition** - Find elements on screen (via OpenCV)
- **Color detection** - Get pixel colors at coordinates
- **Multi-monitor** - Support for multiple displays

### Window Management

- **Window list** - Get all open windows
- **Activate window** - Bring window to front
- **Window info** - Get position, size, title
- **Minimize/Maximize** - Control window states

### Safety Features

- **Failsafe** - Move mouse to corner to abort
- **Pause control** - Emergency stop mechanism
- **Approval mode** - Require confirmation for actions
- **Bounds checking** - Prevent out-of-screen operations
- **Logging** - Track all automation actions

---

## 🚀 Quick Start

### Installation

First, install required dependencies:

```bash
pip install pyautogui pillow opencv-python pygetwindow
```
\r
### Basic Usage\r
\r
```python\r
from skills.desktop_control import DesktopController\r
\r
# Initialize controller\r
dc = DesktopController(failsafe=True)\r
\r
# Mouse operations\r
dc.move_mouse(500, 300)  # Move to coordinates\r
dc.click()  # Left click at current position\r
dc.click(100, 200, button="right")  # Right click at position\r
\r
# Keyboard operations\r
dc.type_text("Hello from SkillBoss!")\r
dc.hotkey("ctrl", "c")  # Copy\r
dc.press("enter")\r
\r
# Screen operations\r
screenshot = dc.screenshot()\r
position = dc.get_mouse_position()\r
```\r
\r
---\r
\r
## 📋 Complete API Reference\r
\r
### Mouse Functions\r
\r
#### `move_mouse(x, y, duration=0, smooth=True)`\r
Move mouse to absolute screen coordinates.\r
\r
**Parameters:**\r
- `x` (int): X coordinate (pixels from left)\r
- `y` (int): Y coordinate (pixels from top)\r
- `duration` (float): Movement time in seconds (0 = instant, 0.5 = smooth)\r
- `smooth` (bool): Use bezier curve for natural movement\r
\r
**Example:**\r
```python\r
# Instant movement\r
dc.move_mouse(1000, 500)\r
\r
# Smooth 1-second movement\r
dc.move_mouse(1000, 500, duration=1.0)\r
```\r
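
The `smooth` option is described as using a bezier curve, but how the skill builds that curve is not shown here. As a rough illustration only, a quadratic Bezier path between two points could be generated as below. The function name `bezier_path`, the choice of control point, and the step count are assumptions made for this sketch, not the skill's code.

```python
def bezier_path(start, end, control, steps=20):
    """Return steps + 1 integer points along a quadratic Bezier curve."""
    (x0, y0), (x1, y1), (cx, cy) = start, end, control
    points = []
    for i in range(steps + 1):
        t = i / steps
        # B(t) = (1 - t)^2 * P0 + 2 * (1 - t) * t * C + t^2 * P1
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x1
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y1
        points.append((round(x), round(y)))
    return points

# Hypothetical control point that bows the path upward before it descends.
path = bezier_path((0, 0), (1000, 500), control=(500, 0), steps=10)
```

Feeding such points to successive instant moves would approximate a curved path; a control point on the straight line between start and end degenerates to a straight move.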
\r
#### `move_relative(x_offset, y_offset, duration=0)`\r
Move mouse relative to current position.\r
\r
**Parameters:**\r
- `x_offset` (int): Pixels to move horizontally (positive = right)\r
- `y_offset` (int): Pixels to move vertically (positive = down)\r
- `duration` (float): Movement time in seconds\r
\r
**Example:**\r
```python\r
# Move 100px right, 50px down\r
dc.move_relative(100, 50, duration=0.3)\r
```\r
\r
#### `click(x=None, y=None, button='left', clicks=1, interval=0.1)`\r
Perform mouse click.\r
\r
**Parameters:**\r
- `x, y` (int, optional): Coordinates to click (None = current position)\r
- `button` (str): 'left', 'right', 'middle'\r
- `clicks` (int): Number of clicks (1 = single, 2 = double, 3 = triple)
- `interval` (float): Delay between multiple clicks\r
\r
**Example:**\r
```python\r
# Simple left click\r
dc.click()\r
\r
# Double-click at specific position\r
dc.click(500, 300, clicks=2)\r
\r
# Right-click\r
dc.click(button='right')\r
```\r
\r
#### `drag(start_x, start_y, end_x, end_y, duration=0.5, button='left')`\r
Drag and drop operation.\r
\r
**Parameters:**\r
- `start_x, start_y` (int): Starting coordinates\r
- `end_x, end_y` (int): Ending coordinates\r
- `duration` (float): Drag duration\r
- `button` (str): Mouse button to use\r
\r
**Example:**\r
```python\r
# Drag file from desktop to folder\r
dc.drag(100, 100, 500, 500, duration=1.0)\r
```\r
\r
#### `scroll(clicks, direction='vertical', x=None, y=None)`\r
Scroll mouse wheel.\r
\r
**Parameters:**\r
- `clicks` (int): Scroll amount (positive = up/left, negative = down/right)\r
- `direction` (str): 'vertical' or 'horizontal'\r
- `x, y` (int, optional): Position to scroll at\r
\r
**Example:**\r
```python\r
# Scroll down 5 clicks\r
dc.scroll(-5)\r
\r
# Scroll up 10 clicks\r
dc.scroll(10)\r
\r
# Horizontal scroll\r
dc.scroll(5, direction='horizontal')\r
```\r
\r
#### `get_mouse_position()`\r
Get current mouse coordinates.\r
\r
**Returns:** `(x, y)` tuple\r
\r
**Example:**\r
```python\r
x, y = dc.get_mouse_position()\r
print(f"Mouse is at: {x}, {y}")\r
```\r
\r
---\r
\r
### Keyboard Functions\r
\r
#### `type_text(text, interval=0, wpm=None)`\r
Type text with configurable speed.\r
\r
**Parameters:**\r
- `text` (str): Text to type\r
- `interval` (float): Delay between keystrokes (0 = instant)\r
- `wpm` (int, optional): Words per minute (overrides interval)\r
\r
**Example:**\r
```python\r
# Instant typing\r
dc.type_text("Hello World")\r
\r
# Human-like typing at 60 WPM\r
dc.type_text("Hello World", wpm=60)\r
\r
# Slow typing with 0.1s between keys\r
dc.type_text("Hello World", interval=0.1)\r
```\r
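
The relationship between `wpm` and `interval` is not spelled out above. One plausible mapping, assuming the common convention of five characters per word, is sketched here. `wpm_to_interval` and `chars_per_word` are hypothetical names, and the skill may compute this differently.

```python
def wpm_to_interval(wpm, chars_per_word=5):
    """Seconds between keystrokes for a given words-per-minute rate."""
    if wpm <= 0:
        raise ValueError("wpm must be positive")
    # wpm * chars_per_word keystrokes are spread evenly over 60 seconds.
    return 60.0 / (wpm * chars_per_word)

interval = wpm_to_interval(60)  # 300 keystrokes per minute
```

Under this assumption, `wpm=60` corresponds to roughly `interval=0.2`.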
\r
#### `press(key, presses=1, interval=0.1)`\r
Press and release a key.\r
\r
**Parameters:**\r
- `key` (str): Key name (see Key Names section)\r
- `presses` (int): Number of times to press\r
- `interval` (float): Delay between presses\r
\r
**Example:**\r
```python\r
# Press Enter\r
dc.press('enter')\r
\r
# Press Space 3 times\r
dc.press('space', presses=3)\r
\r
# Press Down arrow\r
dc.press('down')\r
```\r
\r
#### `hotkey(*keys, interval=0.05)`\r
Execute keyboard shortcut.\r
\r
**Parameters:**\r
- `*keys` (str): Keys to press together\r
- `interval` (float): Delay between key presses\r
\r
**Example:**\r
```python\r
# Copy (Ctrl+C)\r
dc.hotkey('ctrl', 'c')\r
\r
# Paste (Ctrl+V)\r
dc.hotkey('ctrl', 'v')\r
\r
# Open Run dialog (Win+R)\r
dc.hotkey('win', 'r')\r
\r
# Save (Ctrl+S)\r
dc.hotkey('ctrl', 's')\r
\r
# Select All (Ctrl+A)\r
dc.hotkey('ctrl', 'a')\r
```\r
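
A keyboard shortcut is conventionally sent by pressing the keys in order and releasing them in reverse order. The sketch below only computes that event sequence; `hotkey_events` is a hypothetical helper for illustration, and the skill's own implementation is not shown in this document.

```python
def hotkey_events(*keys):
    """Return the (action, key) sequence for a shortcut such as Ctrl+C."""
    downs = [("down", key) for key in keys]
    # Release in reverse so modifiers stay held until the last key is up.
    ups = [("up", key) for key in reversed(keys)]
    return downs + ups

events = hotkey_events("ctrl", "shift", "s")
```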
\r
#### `key_down(key)` / `key_up(key)`\r
Manually control key state.\r
\r
**Example:**\r
```python\r
# Hold Shift\r
dc.key_down('shift')\r
dc.type_text("hello")  # Types "HELLO"\r
dc.key_up('shift')\r
\r
# Hold Ctrl and click (for multi-select)\r
dc.key_down('ctrl')\r
dc.click(100, 100)\r
dc.click(200, 100)\r
dc.key_up('ctrl')\r
```\r
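
If an exception is raised between `key_down` and `key_up`, the key can be left held. One way to guard against that is a small context manager around the two calls. This is a sketch: `held` is a hypothetical helper, and it assumes only that the controller exposes `key_down(key)` and `key_up(key)` as documented above.

```python
from contextlib import contextmanager

@contextmanager
def held(controller, key):
    """Hold `key` down for the duration of the with-block, then release it."""
    controller.key_down(key)
    try:
        yield
    finally:
        # Runs even if the body raises, so the key is not left stuck down.
        controller.key_up(key)

# Usage with a controller such as `dc` from the examples above:
# with held(dc, 'ctrl'):
#     dc.click(100, 100)
#     dc.click(200, 100)
```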
\r
---\r
\r
### Screen Functions\r
\r
#### `screenshot(region=None, filename=None)`\r
Capture screen or region.\r
\r
**Parameters:**\r
- `region` (tuple, optional): (left, top, width, height) for partial capture\r
- `filename` (str, optional): Path to save image\r
\r
**Returns:** PIL Image object\r
\r
**Example:**\r
```python\r
# Full screen\r
img = dc.screenshot()\r
\r
# Save to file\r
dc.screenshot(filename="screenshot.png")\r
\r
# Capture specific region\r
img = dc.screenshot(region=(100, 100, 500, 300))\r
```\r
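
Note that `region` is given as (left, top, width, height), whereas Pillow's `Image.crop` expects a box of (left, upper, right, lower). When cropping a full screenshot afterwards, a small conversion avoids mixing the two up. `region_to_box` is a hypothetical helper written for this sketch.

```python
def region_to_box(region):
    """Convert (left, top, width, height) to a (left, upper, right, lower) box."""
    left, top, width, height = region
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return (left, top, left + width, top + height)

box = region_to_box((100, 100, 500, 300))
```

The resulting tuple can be passed to `crop` on the PIL Image that `screenshot()` is documented to return.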
\r
#### `get_pixel_color(x, y)`\r
Get color of pixel at coordinates.\r
\r
**Returns:** RGB tuple `(r, g, b)`\r
\r
**Example:**\r
```python\r
r, g, b = dc.get_pixel_color(500, 300)\r
print(f"Color at (500, 300): RGB({r}, {g}, {b})")\r
```\r
\r
#### `find_on_screen(image_path, confidence=0.8)`\r
Find image on screen (requires OpenCV).\r
\r
**Parameters:**\r
- `image_path` (str): Path to template image\r
- `confidence` (float): Match threshold (0-1)\r
\r
**Returns:** `(x, y, width, height)` or None\r
\r
**Example:**\r
```python\r
# Find button on screen\r
location = dc.find_on_screen("button.png")\r
if location:\r
    x, y, w, h = location\r
    # Click center of found image\r
    dc.click(x + w//2, y + h//2)\r
```\r
\r
#### `get_screen_size()`\r
Get screen resolution.\r
\r
**Returns:** `(width, height)` tuple\r
\r
**Example:**\r
```python\r
width, height = dc.get_screen_size()\r
print(f"Screen: {width}x{height}")\r
```\r
\r
---\r
\r
### Window Functions\r
\r
#### `get_all_windows()`\r
List all open windows.\r
\r
**Returns:** List of window titles\r
\r
**Example:**\r
```python\r
windows = dc.get_all_windows()\r
for title in windows:\r
    print(f"Window: {title}")\r
```\r
\r
#### `activate_window(title_substring)`\r
Bring window to front by title.\r
\r
**Parameters:**\r
- `title_substring` (str): Part of window title to match\r
\r
**Example:**\r
```python\r
# Activate Chrome\r
dc.activate_window("Chrome")\r
\r
# Activate VS Code\r
dc.activate_window("Visual Studio Code")\r
```\r
\r
#### `get_active_window()`\r
Get currently focused window.\r
\r
**Returns:** Window title (str)\r
\r
**Example:**\r
```python\r
active = dc.get_active_window()\r
print(f"Active window: {active}")\r
```\r
\r
---\r
\r
### Clipboard Functions\r
\r
#### `copy_to_clipboard(text)`\r
Copy text to clipboard.\r
\r
**Example:**\r
```python\r
dc.copy_to_clipboard("Hello from SkillBoss!")\r
```\r
\r
#### `get_from_clipboard()`\r
Get text from clipboard.\r
\r
**Returns:** str\r
\r
**Example:**\r
```python\r
text = dc.get_from_clipboard()\r
print(f"Clipboard: {text}")\r
```\r
\r
---\r
\r
## ⌨️ Key Names Reference\r
\r
### Alphabet Keys\r
`'a'` through `'z'`\r
\r
### Number Keys\r
`'0'` through `'9'`\r
\r
### Function Keys\r
`'f1'` through `'f24'`\r
\r
### Special Keys\r
- `'enter'` / `'return'`\r
- `'esc'` / `'escape'`\r
- `'space'` / `'spacebar'`\r
- `'tab'`\r
- `'backspace'`\r
- `'delete'` / `'del'`\r
- `'insert'`\r
- `'home'`\r
- `'end'`\r
- `'pageup'` / `'pgup'`\r
- `'pagedown'` / `'pgdn'`\r
\r
### Arrow Keys\r
- `'up'` / `'down'` / `'left'` / `'right'`\r
\r
### Modifier Keys\r
- `'ctrl'` / `'control'`\r
- `'shift'`\r
- `'alt'`\r
- `'win'` / `'winleft'` / `'winright'`\r
- `'cmd'` / `'command'` (Mac)\r
\r
### Lock Keys\r
- `'capslock'`\r
- `'numlock'`\r
- `'scrolllock'`\r
\r
### Punctuation\r
- `'.'` / `','` / `'?'` / `'!'` / `';'` / `':'`\r
- `'['` / `']'` / `'{'` / `'}'`\r
- `'('` / `')'`\r
- `'+'` / `'-'` / `'*'` / `'/'` / `'='`\r
\r
---\r
\r
## 🛡️ Safety Features\r
\r
### Failsafe Mode\r
\r
Move mouse to **any corner** of the screen to abort all automation.\r
\r
```python\r
# Enable failsafe (enabled by default)\r
dc = DesktopController(failsafe=True)\r
```\r
\r
### Pause Control\r
\r
```python\r
# Pause all automation for 2 seconds\r
dc.pause(2.0)\r
\r
# Check if automation is safe to proceed\r
if dc.is_safe():\r
    dc.click(500, 500)\r
```\r
\r
### Approval Mode\r
\r
Require user confirmation before actions:\r
\r
```python\r
dc = DesktopController(require_approval=True)\r
\r
# This will ask for confirmation\r
dc.click(500, 500)  # Prompt: "Allow click at (500, 500)? [y/n]"\r
```\r
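
How approval mode is implemented is not shown here. As an illustration of the idea, an approval gate can wrap an action and run it only after an explicit "y". In this sketch `approved` and `guarded_click` are hypothetical names, and the `ask` parameter exists only so the prompt can be replaced in tests.

```python
def approved(action, ask=input):
    """Return True only if the user answers 'y' or 'yes' to the prompt."""
    answer = ask(f"Allow {action}? [y/n] ")
    return answer.strip().lower() in ("y", "yes")

def guarded_click(x, y, do_click, ask=input):
    """Call do_click(x, y) only after approval; report whether it ran."""
    if not approved(f"click at ({x}, {y})", ask):
        return False
    do_click(x, y)
    return True
```

Anything other than an explicit yes, including an empty answer, is treated as a refusal, which is the safer default for desktop control.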
\r
---\r
\r
## 🎨 Advanced Examples\r
\r
### Example 1: Automated Form Filling\r
\r
```python\r
dc = DesktopController()\r
\r
# Click name field\r
dc.click(300, 200)\r
dc.type_text("John Doe", wpm=80)\r
\r
# Tab to next field\r
dc.press('tab')\r
dc.type_text("[email protected]", wpm=80)\r
\r
# Tab to password\r
dc.press('tab')\r
dc.type_text("SecurePassword123", wpm=60)\r
\r
# Submit form\r
dc.press('enter')\r
```\r
\r
### Example 2: Screenshot Region and Save\r
\r
```python\r
# Capture specific area\r
region = (100, 100, 800, 600)  # left, top, width, height\r
img = dc.screenshot(region=region)\r
\r
# Save with timestamp\r
import datetime\r
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")\r
img.save(f"capture_{timestamp}.png")\r
```\r
\r
### Example 3: Multi-File Selection\r
\r
```python\r
# Hold Ctrl and click multiple files\r
dc.key_down('ctrl')\r
dc.click(100, 200)  # First file\r
dc.click(100, 250)  # Second file\r
dc.click(100, 300)  # Third file\r
dc.key_up('ctrl')\r
\r
# Copy selected files\r
dc.hotkey('ctrl', 'c')\r
```\r
\r
### Example 4: Window Automation\r
\r
```python\r
import time

# Activate Calculator
dc.activate_window("Calculator")\r
time.sleep(0.5)\r
\r
# Type calculation\r
dc.type_text("5+3=", interval=0.2)\r
time.sleep(0.5)\r
\r
# Take screenshot of result\r
dc.screenshot(filename="calculation_result.png")\r
```\r
\r
### Example 5: Drag & Drop File\r
\r
```python\r
# Drag file from source to destination\r
dc.drag(\r
    start_x=200, start_y=300,  # File location\r
    end_x=800, end_y=500,       # Folder location\r
    duration=1.0                 # Smooth 1-second drag\r
)\r
```\r
\r
---\r
\r
## ⚡ Performance Tips\r
\r
1. **Use instant movements** for speed: `duration=0`\r
2. **Batch operations** instead of individual calls\r
3. **Cache screen positions** instead of recalculating\r
4. **Disable failsafe** only as a last resort (use with caution): it removes the corner-abort safety net
5. **Use hotkeys** instead of menu navigation\r
\r
---\r
\r
## ⚠️ Important Notes\r
\r
- **Screen coordinates** start at (0, 0) in top-left corner\r
- **Multi-monitor setups** may have negative coordinates for secondary displays\r
- **Windows DPI scaling** may affect coordinate accuracy\r
- **Failsafe corners** are: (0,0), (width-1, 0), (0, height-1), (width-1, height-1)\r
- **Some applications** may block simulated input (games, secure apps)\r
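
The corner coordinates and the bounds rule above can be expressed as two small helpers. This is a sketch for a single display with its origin at (0, 0); `failsafe_corners` and `in_bounds` are hypothetical names, and multi-monitor layouts with negative coordinates would need the real virtual-desktop bounds instead.

```python
def failsafe_corners(width, height):
    """Return the four corner pixels of a width x height screen."""
    return {(0, 0), (width - 1, 0), (0, height - 1), (width - 1, height - 1)}

def in_bounds(x, y, width, height):
    """True if (x, y) is a valid pixel on a single screen starting at (0, 0)."""
    return 0 <= x < width and 0 <= y < height

corners = failsafe_corners(1920, 1080)
```

Checking a target against `failsafe_corners` before moving there avoids triggering the failsafe from a script's own action.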
\r
---\r
\r
## 🔧 Troubleshooting\r
\r
### Mouse not moving to correct position\r
- Check DPI scaling settings\r
- Verify screen resolution matches expectations\r
- Use `get_screen_size()` to confirm dimensions\r
\r
### Keyboard input not working\r
- Ensure target application has focus\r
- Some apps require admin privileges\r
- Try increasing `interval` for reliability\r
\r
### Failsafe triggering accidentally\r
- Increase screen border tolerance\r
- Move mouse away from corners during normal use\r
- Disable if needed: `DesktopController(failsafe=False)`\r
\r
### Permission errors\r
- Run Python with administrator privileges for some operations\r
- Some secure applications block automation\r
\r
---\r
\r
## 📦 Dependencies\r
\r
- **PyAutoGUI** - Core automation engine\r
- **Pillow** - Image processing\r
- **OpenCV** (optional) - Image recognition\r
- **PyGetWindow** - Window management\r
\r
Install all:\r
```bash\r
pip install pyautogui pillow opencv-python pygetwindow\r
```\r
\r
---\r
\r
**Built for SkillBoss** - The ultimate desktop automation companion\r
Usage Guidance
This skill appears to implement legitimate desktop automation (mouse, keyboard, screenshots, window/clipboard access). Before installing:
  1. Review the Python source yourself (ai_agent.py and __init__.py are included). These files run locally and can control your desktop and capture screenshots and clipboard contents.
  2. Do not provide any API keys (e.g., SKILLBOSS_API_KEY) unless you fully trust the skill author; the docs show optional network calls that would transmit screenshots and task text to an external LLM.
  3. Enable failsafe and consider using require_approval=True so actions need confirmation.
  4. Test in an isolated environment (VM) and close sensitive windows and apps while testing.
  5. If you plan to enable remote LLM integration, explicitly verify where data is sent, what gets uploaded, and which environment variables are required. Ask the author to add required env vars to the registry metadata and to document any telemetry or exfiltration behaviors.
These steps will reduce the chance of accidental data exposure.
Capability Analysis
Type: OpenClaw Skill
Name: abe-desktop-control
Version: 1.0.0
The skill bundle provides extensive desktop automation capabilities, including full mouse and keyboard control, screen capturing, window management, and clipboard access via libraries like pyautogui and pygetwindow. While these features are aligned with the stated purpose and include safety mechanisms such as a failsafe (moving the mouse to a corner) and an optional approval mode, the broad level of control over the host system represents a significant security risk in an agentic context. No evidence of intentional malice, such as hardcoded data exfiltration or backdoors, was found in the code (e.g., __init__.py, ai_agent.py).
Capability Tags
requires-sensitive-credentials
Capability Assessment
Purpose & Capability
Name/description match the shipped code: DesktopController implements mouse, keyboard, screen capture, window and clipboard operations and an AI agent layer for planning. These capabilities are coherent with a 'desktop-control' skill. Minor inconsistency: documentation (AI_AGENT_GUIDE.md and commented code in ai_agent.py) shows optional integration with a remote LLM via SKILLBOSS_API_KEY, but the registry 'requires.env' lists no credentials — the network/LLM integration is optional in code, but it's documented and could be enabled by a user, which is not reflected in declared requirements.
Instruction Scope
SKILL.md and demo files instruct the agent to capture screenshots, access clipboard contents, list windows, type text and control the mouse/keyboard — all expected for this skill. However the AI guide and commented code show examples that would send task data and screenshots to an external LLM endpoint (using a SKILLBOSS_API_KEY). Those network calls are not active in the main code path as shipped, but the presence of examples/comments that base64-encode or post screen content creates a potential exfiltration vector if an operator or future change enables remote reasoning. The runtime instructions do not explicitly restrict where captured data may be sent.
Install Mechanism
No install spec (no external downloads). The package is provided as Python source files and documents pip dependencies (pyautogui, pillow, opencv-python, pygetwindow, pyperclip). No high-risk install URLs or archive extract steps are present.
Credentials
The skill declares no required environment variables, which is reasonable for local desktop control. The docs and commented snippets, however, demonstrate optional use of SKILLBOSS_API_KEY to call an external LLM (api.heybossai.com). That environment variable is not listed in requires.env — this omission is a mismatch the user should be aware of before supplying any API keys. Also, the skill will read sensitive local data (screenshots, clipboard) as part of normal operation; that access is proportionate to its purpose but sensitive in nature.
Persistence & Privilege
The skill does not request 'always: true' and does not attempt to modify other skills or system-wide settings. It can be invoked autonomously by the agent (disable-model-invocation=false), which is the platform default. Autonomous invocation combined with full desktop control increases risk if malicious; however this is an expected capability for an AI-driven desktop automation skill and not in itself a sign of wrongdoing.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install abe-desktop-control
  3. After installation, invoke the skill by name or use /abe-desktop-control
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
abe-desktop-control v1.0.0
  - Initial release of advanced desktop automation for SkillBoss.
  - Provides precise mouse control (movement, clicks, dragging, scrolling).
  - Full keyboard automation, including shortcuts, special keys, and typing speed options.
  - Screen capture, image recognition (with OpenCV), pixel color detection, and multi-monitor support.
  - Window management: list/activate windows, manage size/state.
  - Safety features: failsafe aborts, emergency stop, action approval, bounds checking, and logging.
Metadata
Slug abe-desktop-control
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is desktop-control?

Advanced desktop automation with mouse, keyboard, and screen control. It is an AI Agent Skill for Claude Code / OpenClaw, with 79 downloads so far.

How do I install desktop-control?

Run "/install abe-desktop-control" in the OpenClaw or Claude Code chat to install the skill in one step. The README also lists Python dependencies (pyautogui, pillow, opencv-python, pygetwindow) to install with pip.

Is desktop-control free?

Yes, desktop-control is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does desktop-control support?

desktop-control is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created desktop-control?

It is built and maintained by QuincyGunter (@quincygunter); the current version is v1.0.0.
