Description

Automate comprehensive desktop tasks on Windows/macOS/Linux with safe, logged mouse, keyboard, OCR, image recognition, macro recording, and replay features.

Usage Instructions (SKILL.md)

Desktop Automation Skill v2.0

Name: Desktop automation ultra
Author: jordaneparis

Complete desktop automation for Windows/macOS/Linux. Zero-error edition.

⚠️ Privacy & Security

CRITICAL: This skill captures ALL keyboard and mouse events.

NEVER record while entering passwords, credit cards, or secrets
Recorded macros are stored as JSON in recorded_macro/ directory
Always use dry_run=true to test before actual execution
Store macros in secure locations only
Keep safe mode enabled (it is on by default)

🎯 What It Does

Automate desktop interactions without APIs:

✅ Click, type, drag, scroll
✅ Capture screenshots
✅ Recognize images (OpenCV template matching)
✅ Extract text (Tesseract OCR)
✅ Record and replay macros
✅ Find windows by title
✅ Clipboard operations
✅ Safe mode with dry_run for testing

🔐 Safety Features (Built-In)

1. Safe Mode (Default: ON)

Blocks dangerous actions when enabled:

type, press_key, click, drag are monitored
Parameters are scanned for dangerous patterns: rm , del , C:\Windows\, /etc/, sudo, etc.
Blocked actions are logged
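
The scan described above could look roughly like the following. This is a minimal sketch that assumes a case-insensitive substring check; the pattern list only copies the examples given above, and `find_dangerous_patterns` and `is_blocked` are hypothetical names, not the skill's actual API.

```python
from typing import Any

# Example patterns copied from the list above; the skill's real list may differ.
DANGEROUS_PATTERNS = ["rm ", "del ", "C:\\Windows\\", "/etc/", "sudo"]
# Actions the text above names as monitored.
MONITORED_ACTIONS = {"type", "press_key", "click", "drag"}


def find_dangerous_patterns(params: dict[str, Any]) -> list[str]:
    """Return every pattern found in any string-valued parameter."""
    hits: list[str] = []
    for value in params.values():
        if not isinstance(value, str):
            continue  # coordinates and flags cannot carry a command
        lowered = value.lower()
        for pattern in DANGEROUS_PATTERNS:
            if pattern.lower() in lowered and pattern not in hits:
                hits.append(pattern)
    return hits


def is_blocked(action: str, params: dict[str, Any], safe_mode: bool = True) -> bool:
    """Block a monitored action when safe mode is on and a pattern matches."""
    if not safe_mode or action not in MONITORED_ACTIONS:
        return False
    return bool(find_dangerous_patterns(params))
```

Plain substring matching is easy to audit but produces false positives: a sentence containing "model " would match `del `, for example. A stricter check would need word boundaries or command parsing.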

2. Dry-Run Mode

All actions support dry_run=true:

Action is logged but NOT executed
Use for testing before running real automation
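
One way to picture the dry-run behavior is a thin wrapper that logs the request and skips the real call. The sketch below is illustrative only: `run_action`, `fake_click`, and the `"dry_run"` status string are assumptions, not names taken from the skill.

```python
from typing import Any, Callable


def run_action(
    name: str,
    execute: Callable[..., dict[str, Any]],
    dry_run: bool = False,
    **params: Any,
) -> dict[str, Any]:
    """Log the action; call `execute` only when dry_run is False."""
    if dry_run:
        print(f"[DRY RUN] {name} {params}")
        return {"status": "dry_run", "action": name, **params}
    return execute(**params)


calls: list[tuple[int, int]] = []


def fake_click(x: int, y: int) -> dict[str, Any]:
    """Stand-in for a real click so the example has no side effects."""
    calls.append((x, y))
    return {"status": "success", "x": x, "y": y}


# The preview is logged but never reaches fake_click; the second call does.
preview = run_action("click", fake_click, dry_run=True, x=100, y=100)
result = run_action("click", fake_click, x=100, y=100)
```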

3. Audit Logging

Every action logged to ~/.openclaw/skills/desktop-automation-logs/automation_YYYY-MM-DD.log
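
The date-stamped file name can be derived as in the sketch below. The directory is the one stated above; `log_path_for` is a hypothetical helper written for illustration, not a function from the skill.

```python
from datetime import date
from pathlib import Path

# Directory stated above; expanduser() resolves "~" on every platform.
LOG_DIR = Path("~/.openclaw/skills/desktop-automation-logs").expanduser()


def log_path_for(day: date, log_dir: Path = LOG_DIR) -> Path:
    """Build the per-day log file path, one file per calendar date."""
    return log_dir / f"automation_{day.isoformat()}.log"


# Today's log file, wherever the home directory happens to be.
todays_log = log_path_for(date.today())
```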

4. Thread Safety

All modules use locks to prevent race conditions.
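
As an illustration of what lock-based protection means here, the sketch below guards shared state with `threading.Lock`. The `SafetyState` class and its fields are hypothetical and only show the general pattern; the skill's modules may be organized differently.

```python
import threading


class SafetyState:
    """Hypothetical shared state guarded by a single lock."""

    def __init__(self, safe_mode: bool = True) -> None:
        self._lock = threading.Lock()
        self._safe_mode = safe_mode
        self._blocked_count = 0

    def set_safe_mode(self, enabled: bool) -> None:
        with self._lock:
            self._safe_mode = enabled

    def record_blocked(self) -> None:
        with self._lock:
            # The read-modify-write cannot interleave with another thread.
            self._blocked_count += 1

    @property
    def blocked_count(self) -> int:
        with self._lock:
            return self._blocked_count


state = SafetyState()


def worker() -> None:
    for _ in range(1000):
        state.record_blocked()


threads = [threading.Thread(target=worker) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```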

📦 Installation

1. Extract Files

Place desktop-automation-ultra-local/ in:

Windows: C:\Users\<User>\.openclaw\workspace\skills\
Linux/macOS: ~/.openclaw/workspace/skills/

2. Install Dependencies

pip install -r requirements.txt

3. Optional: Tesseract for OCR

For find_text_on_screen functionality:

Windows: Download installer from https://github.com/UB-Mannheim/tesseract/wiki
Linux: sudo apt install tesseract-ocr
macOS: brew install tesseract

4. Restart OpenClaw

openclaw gateway restart

🚀 Quick Start

Basic Click

action: click
params:
  x: 100
  y: 100
  dry_run: true  # Test first!

Type Text

action: type
params:
  text: "Hello World"
  interval: 0.05  # Delay between keys
  dry_run: false

Find Image

action: find_image
params:
  template_path: "templates/button.png"
  confidence: 0.95

Extract Text (OCR)

action: read_text_ocr
params:
  lang: "fra"  # French

📖 Core Actions

Mouse & Keyboard

Action	Parameters	Returns
`click`	`x`, `y`, `button="left"`, `dry_run`	`{status, x, y}`
`type`	`text`, `interval=0.05`, `dry_run`	`{status, text}`
`press_key`	`key`, `dry_run`	`{status, key}`
`move_mouse`	`x`, `y`, `duration=0.5`, `dry_run`	`{status, x, y}`
`scroll`	`amount=5`, `dry_run`	`{status, amount}`
`drag`	`start_x`, `start_y`, `end_x`, `end_y`, `duration=0.5`, `dry_run`	`{status}`
`copy_to_clipboard`	`text`, `dry_run`	`{status}`
`paste_from_clipboard`	`dry_run`	`{status, length}`

Screenshots & Windows

Action	Parameters	Returns
`screenshot`	`path="~/Desktop/screenshot.png"`, `dry_run`	`{status, path}`
`get_active_window`	`dry_run`	`{status, title, x, y, width, height}`
`list_windows`	`dry_run`	`{status, windows[], count}`
`activate_window`	`title_substring`, `dry_run`	`{status, title}`

Image Recognition (requires OpenCV)

Action	Parameters	Returns
`find_image`	`template_path`, `confidence=0.9`, `dry_run`	`{status, x, y, confidence}`
`find_image_multiscale`	`template_path`, `confidence`, `scale_factors`, `dry_run`	`{status, x, y, confidence, scale}`
`wait_for_image`	`template_path`, `timeout=30.0`, `interval=0.5`, `confidence=0.9`, `dry_run`	`{status, x, y, confidence}`
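
The `timeout` and `interval` parameters of `wait_for_image` suggest a polling loop. The sketch below shows that pattern in isolation, with the image search replaced by an injectable `probe` callable; `wait_for` and the `"found"`/`"timeout"` status strings are assumptions made for illustration.

```python
import time
from typing import Callable, Optional


def wait_for(
    probe: Callable[[], Optional[dict]],
    timeout: float = 30.0,
    interval: float = 0.5,
) -> dict:
    """Call `probe` every `interval` seconds until it matches or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        match = probe()
        if match is not None:
            return {"status": "found", **match}
        if time.monotonic() >= deadline:
            return {"status": "timeout"}
        time.sleep(interval)


# Fake probe: the "image" appears on the third attempt.
attempts = iter([None, None, {"x": 320, "y": 240, "confidence": 0.93}])
found = wait_for(lambda: next(attempts), timeout=1.0, interval=0.01)

# A probe that never matches gives up once the timeout passes.
missing = wait_for(lambda: None, timeout=0.05, interval=0.01)
```

Using `time.monotonic()` rather than wall-clock time keeps the deadline stable if the system clock is adjusted during the wait.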

OCR / Text Recognition (requires Tesseract)

Action	Parameters	Returns
`find_text_on_screen`	`text`, `lang="fra"`, `dry_run`	`{status, locations[], count}`
`find_all_text_on_screen`	`text`, `lang="fra"`, `dry_run`	`{status, data[], count}`
`read_text_ocr`	`lang="fra"`, `dry_run`	`{status, text, length}`
`read_text_region`	`x`, `y`, `width`, `height`, `lang="fra"`, `dry_run`	`{status, text, length}`
`extract_screen_data`	`region={}`, `output_format="json"`, `lang="fra"`, `dry_run`	`{status, data[], count}`

Macros

Action	Parameters	Returns
`play_macro`	`macro_path`, `speed=1.0`, `dry_run`	`{status, executed, total, errors[]}`
`stop_macro`	—	`{status}`
`play_macro_with_subroutines`	`macro_path`, `speed=1.0`, `sub_macros_dir`, `dry_run`	`{status, executed, total, errors[]}`

Safety Management

Action	Parameters	Returns
`set_safe_mode`	`enabled=true`	`{status, safe_mode}`
`get_safety_status`	—	`{status, safe_mode_enabled, dangerous_patterns, dangerous_actions[]}`

📝 Macro Format

Recorded macros are JSON with this structure:

{
  "events": [
    {
      "action": "click",
      "params": {"x": 100, "y": 50},
      "wait": 500
    },
    {
      "action": "type",
      "params": {"text": "Hello"},
      "wait": 200
    },
    {
      "action": "press_key",
      "params": {"key": "return"},
      "wait": 100
    }
  ]
}

action — action name
params — action parameters
wait — milliseconds to wait before next action
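
A macro file in this shape can be checked before playback with a few lines of Python. The sketch below is an assumption-laden example: `load_macro` and `total_wait_seconds` are hypothetical helpers, and dividing waits by `speed` is a guess at how a playback speed factor would apply.

```python
import json
from typing import Any

# The three fields documented above.
REQUIRED_KEYS = {"action", "params", "wait"}


def load_macro(text: str) -> list[dict[str, Any]]:
    """Parse macro JSON and check that each event has the documented fields."""
    events = json.loads(text)["events"]
    for index, event in enumerate(events):
        missing = REQUIRED_KEYS - event.keys()
        if missing:
            raise ValueError(f"event {index} is missing {sorted(missing)}")
    return events


def total_wait_seconds(events: list[dict[str, Any]], speed: float = 1.0) -> float:
    """Sum the waits (milliseconds) and scale by an assumed speed factor."""
    return sum(event["wait"] for event in events) / 1000.0 / speed


sample = """
{"events": [
  {"action": "click", "params": {"x": 100, "y": 50}, "wait": 500},
  {"action": "type", "params": {"text": "Hello"}, "wait": 200},
  {"action": "press_key", "params": {"key": "return"}, "wait": 100}
]}
"""
events = load_macro(sample)
```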

🔧 Advanced: Mouse Move Debouncing

To avoid recording hundreds of move_mouse events during a smooth drag, the recorder uses debouncing:

When you move the mouse, events are suppressed during movement
After you stop moving for N seconds (default: 1 sec), the final position is recorded
This reduces macro size dramatically while preserving intended end positions
Configurable via GUI: set debounce time (0.1–10 seconds)

Example:

Fast horizontal line → 1 move_mouse event (end coordinates)
Slow, stop-and-go → multiple move_mouse events (one per "stop")
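
The debouncing rule can be sketched as a pass over timestamped pointer samples. This is not the recorder's code: `debounce_moves` is a hypothetical function that works offline on a list, whereas a live recorder would more likely use a timer. It does reproduce the two examples above.

```python
def debounce_moves(
    samples: list[tuple[float, int, int]],
    quiet: float = 1.0,
) -> list[tuple[int, int]]:
    """Keep a position only if the pointer then rests for `quiet` seconds.

    `samples` are (timestamp_seconds, x, y) tuples in chronological order.
    The last sample is always kept, since the movement ends there.
    """
    kept: list[tuple[int, int]] = []
    for current, following in zip(samples, samples[1:]):
        if following[0] - current[0] >= quiet:
            kept.append((current[1], current[2]))
    if samples:
        kept.append((samples[-1][1], samples[-1][2]))
    return kept


# Fast horizontal line: no pause between samples, so only the end survives.
fast = debounce_moves([(0.00, 0, 10), (0.05, 100, 10), (0.10, 200, 10), (0.15, 300, 10)])

# Stop-and-go: two pauses longer than one second, plus the final position.
slow = debounce_moves(
    [(0.0, 0, 0), (0.1, 50, 0), (1.5, 60, 0), (1.6, 100, 0), (3.0, 110, 0), (3.1, 150, 0)]
)
```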

🧪 Testing

Run the unit test suite:

python scripts/test_automation.py

Output:

test_dry_run_click ... ok
test_get_active_window ... ok
test_safe_mode_blocks_dangerous ... ok
...
Ran 13 tests
OK

📊 Logging

All actions logged to: ~/.openclaw/skills/desktop-automation-logs/automation_YYYY-MM-DD.log

Example:

[2026-03-15 10:23:45] [INFO] ActionManager: ActionManager initialized with safe_mode=True
[2026-03-15 10:23:46] [INFO] ActionManager: Clicked at (100, 50) with left button
[2026-03-15 10:23:47] [INFO] ActionManager: Typed: Hello World
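
Lines in this layout are straightforward to parse when auditing a run. The sketch below infers the format from the three sample lines above, so the regular expression is an assumption about the real log files, and `parse_log_line` is a hypothetical helper.

```python
import re
from datetime import datetime
from typing import Optional

# Layout inferred from the samples: [timestamp] [LEVEL] Component: message
LINE_RE = re.compile(
    r"^\[(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"\[(?P<level>[A-Z]+)\] (?P<component>[^:]+): (?P<message>.*)$"
)


def parse_log_line(line: str) -> Optional[dict]:
    """Split one log line into its fields, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%Y-%m-%d %H:%M:%S")
    return entry


entry = parse_log_line("[2026-03-15 10:23:47] [INFO] ActionManager: Typed: Hello World")
```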

⚙️ Configuration

Environment Variables

# Override log directory
export AUTOMATION_LOG_DIR=~/my_logs

# Disable safe mode globally (NOT recommended)
export AUTOMATION_SAFE_MODE=false
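
On the Python side, these variables might be read as in the sketch below. The variable names and the default log directory come from this document; `load_config` and the set of strings treated as "false" are assumptions.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_LOG_DIR = "~/.openclaw/skills/desktop-automation-logs"
# Assumed spellings of "off"; the skill may accept a different set.
FALSE_VALUES = {"false", "0", "no", "off"}


def load_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the two documented variables, falling back to safe defaults."""
    env = os.environ if env is None else env
    log_dir = Path(env.get("AUTOMATION_LOG_DIR", DEFAULT_LOG_DIR)).expanduser()
    raw = env.get("AUTOMATION_SAFE_MODE", "true").strip().lower()
    return {"log_dir": log_dir, "safe_mode": raw not in FALSE_VALUES}


# Passing a mapping instead of os.environ keeps the example side-effect free.
defaults = load_config({})
overridden = load_config({"AUTOMATION_LOG_DIR": "my_logs", "AUTOMATION_SAFE_MODE": "false"})
```

Treating any unrecognized value as "on" means a typo in the variable leaves safe mode enabled rather than silently disabling it.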

🐛 Troubleshooting

"pyautogui failsafe triggered"

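This error means PyAutoGUI's fail-safe fired because the mouse pointer reached a corner of the screen. Moving the mouse to a corner is also how you stop a running automation on purpose.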

OCR returns empty text

Ensure Tesseract is installed correctly
Check image quality (high contrast helps)
Try read_text_ocr instead of find_text_on_screen

Image recognition not finding template

Ensure template image exists and is correct format (PNG, JPG)
Try lower confidence threshold (e.g., 0.85 instead of 0.95)
Use find_image_multiscale to detect at different scales

Actions blocked by safe mode

This is intentional. To run dangerous actions:

action: set_safe_mode
params:
  enabled: false

Then execute your action. Re-enable safe mode immediately after:

action: set_safe_mode
params:
  enabled: true
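
If the skill is driven from Python, the disable-then-re-enable sequence is easy to get wrong when the action in between fails. The sketch below wraps it in a context manager so safe mode is restored even after an exception. `safe_mode_disabled` is a hypothetical helper, and the `set_safe_mode` callable passed to it stands in for whatever actually invokes the `set_safe_mode` action.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def safe_mode_disabled(set_safe_mode: Callable[[bool], None]) -> Iterator[None]:
    """Turn safe mode off for the body, then always turn it back on."""
    set_safe_mode(False)
    try:
        yield
    finally:
        set_safe_mode(True)


# Record the calls instead of touching a real skill.
history: list[bool] = []
try:
    with safe_mode_disabled(history.append):
        raise RuntimeError("action failed")
except RuntimeError:
    pass  # safe mode has already been re-enabled at this point
```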

📄 License

MIT License. See LICENSE file.

📚 File Structure

desktop-automation-ultra-local/
├── SKILL.md                          (This file)
├── requirements.txt                  (Python dependencies)
├── lib/
│   ├── actions.py                   (Core click/type/drag actions)
│   ├── image_recognition.py         (OpenCV template matching)
│   ├── ocr_engine.py                (Tesseract OCR)
│   ├── macro_player.py              (Record/playback macros)
│   ├── safety_manager.py            (Safe mode, blocking)
│   └── utils.py                     (Logging, helpers)
├── scripts/
│   └── test_automation.py           (Unit tests)
└── recorded_macro/                  (Output: saved macros)

✅ Validation Checklist

All modules have proper error handling
Thread safety implemented (locks)
Safe mode enabled by default
Dry-run mode on all actions
Comprehensive logging
Unit tests (13 tests)
UTF-8 encoding for all text
No hardcoded paths (uses expanduser)
Graceful fallbacks for missing dependencies
Documentation complete

Status: PRODUCTION READY ✅

Last updated: 2026-03-15
Version: 2.0.0

Safety Recommendations

This skill appears to do what it says: local desktop automation and macro recording. Important things to consider before installing:

- Privacy: the recorder captures ALL keyboard and mouse events (including passwords and sensitive text) and stores macros as JSON. Never record while entering secrets, and store macro files securely.
- Metadata note: the registry metadata did not declare required binaries, but the skill requires Python in PATH, optional system packages (Tesseract, xclip/xsel), and the Python packages from requirements.txt. Ensure you have the appropriate runtime and review dependencies before installing.
- Autonomy: the agent can invoke the skill autonomously by default. If you do not want automated UI actions to run without manual approval, restrict the skill's permissions or require manual invocation.
- Local-only: the code and docs claim no network access. Still inspect the shipped files yourself (they are included) for unexpected network calls before trusting them on a sensitive machine.
- Recommended: run the included tests (scripts/test_automation.py) in a safe environment, use dry_run=true for initial testing, and review/rotate any stored macros (or encrypt them) if they may contain sensitive data.

Functional Analysis

Type: OpenClaw Skill
Name: desktop-automation-ultra
Version: 2.0.0

The 'desktop-automation-ultra' skill is a comprehensive toolkit for mouse, keyboard, and window control, incorporating OCR (Tesseract) and image recognition (OpenCV). It features a robust safety architecture including a 'Safe Mode' that blacklists dangerous command patterns (e.g., 'rm', 'sudo', 'del') and a 'Dry-Run' mode for non-destructive testing. While the skill possesses high-risk capabilities like full keystroke recording and screen scraping, these are clearly documented with explicit privacy warnings in SKILL.md and README.md, and the code lacks any evidence of obfuscation, data exfiltration, or unauthorized persistence.

Capability Assessment

ℹ Purpose & Capability

The name/description (desktop automation, macro recording, OCR, image recognition) matches the shipped code and docs. One inconsistency: the registry metadata lists no required binaries, but the skill clearly expects a Python runtime (calls 'python' from skill.js, includes Python modules and a requirements.txt) and the README mentions system dependencies (Tesseract, xclip/xsel on Linux). This is a metadata omission but does not indicate hidden behavior.

✓ Instruction Scope

SKILL.md and the included scripts instruct only local actions (mouse/keyboard, screenshots, OCR, image matching, macro files, logs). The files and docs explicitly warn that the recorder captures ALL keyboard/mouse events and that macros are stored locally. There are no instructions to read unrelated system secrets or to send data to external endpoints.

ℹ Install Mechanism

No automated install spec is present (user must place the folder in the skills directory and run pip install -r requirements.txt). That is lower-risk than remote installers, but users should be aware the skill expects pip/OS packages and optional system binaries (Tesseract). The package list is standard for this functionality and all code is included locally — no suspicious external download URLs were provided.

✓ Credentials

The skill does not request environment variables or external credentials. Its use of cryptography (for password-protected macros) is reasonable for the documented feature. No unrelated secrets or cloud credentials are required.

✓ Persistence & Privilege

always:false and normal autonomous invocation are used. The skill writes logs and macro files under user/home paths (e.g., recorded_macro/, ~/.openclaw/...), which is expected for this kind of tool. It does not attempt to modify other skills or system-wide agent settings in the provided code.

Version History

v2.0.0

**Big release: Desktop Automation Ultra v2.0.0 adds advanced safety controls, new macro handling, and expanded capabilities.**

- Added built-in Safe Mode (default ON) to block dangerous actions, with pattern detection for system-critical commands.
- Introduced dry_run and audit logging for all actions, enhancing testing and traceability.
- Expanded core actions: improved image recognition, flexible window control, full-featured clipboard and OCR/text extraction.
- New macro system: records full action/event logs in JSON, with configurable mouse move debouncing.
- Improved installation/testing steps, troubleshooting tips, and advanced configuration via environment variables.

Metadata

Slug desktop-automation-ultra

Version 2.0.0

License MIT-0

Total installs 0

Current installs 0

Historical versions 1

FAQ

What is Desktop automation ultra?

Automate comprehensive desktop tasks on Windows/macOS/Linux with safe, logged mouse, keyboard, OCR, image recognition, macro recording, and replay features. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 513 cumulative downloads to date.

How do I install Desktop automation ultra?

Run the command "/install desktop-automation-ultra" in the OpenClaw or Claude Code chat box for one-click installation; no additional configuration is required.

Is Desktop automation ultra free?

Yes, Desktop automation ultra is completely free. It is released under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does Desktop automation ultra support?

Desktop automation ultra is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Desktop automation ultra?

It is developed and maintained by JordaneParis (@jordaneparis); the current version is v2.0.0.
