jordaneparis

Desktop automation ultra

Author: JordaneParis · GitHub · v2.0.0 · MIT-0
cross-platform ⚠ suspicious
513 total downloads · 0 favorites · 0 current installs · 1 version
Install in OpenClaw
/install desktop-automation-ultra
Description
Automate comprehensive desktop tasks on Windows/macOS/Linux with safe, logged mouse, keyboard, OCR, image recognition, macro recording, and replay features.
Usage (SKILL.md)

Desktop Automation Skill v2.0

License: MIT · OpenClaw

Complete desktop automation for Windows/macOS/Linux. Zero-error edition.


⚠️ Privacy & Security

CRITICAL: This skill captures ALL keyboard and mouse events.

  • NEVER record while entering passwords, credit cards, or secrets
  • Recorded macros are stored as JSON in recorded_macro/ directory
  • Always use dry_run=true to test before actual execution
  • Store macros in secure locations only
  • Keep safe mode enabled (it is on by default)

🎯 What It Does

Automate desktop interactions without APIs:

  • ✅ Click, type, drag, scroll
  • ✅ Capture screenshots
  • ✅ Recognize images (OpenCV template matching)
  • ✅ Extract text (Tesseract OCR)
  • ✅ Record and replay macros
  • ✅ Find windows by title
  • ✅ Clipboard operations
  • ✅ Safe mode with dry_run for testing

🔐 Safety Features (Built-In)

1. Safe Mode (Default: ON)

Blocks dangerous actions when enabled:

  • The type, press_key, click, and drag actions are monitored
  • Parameters are scanned for dangerous patterns such as "rm ", "del ", "C:\Windows\", "/etc/", and "sudo", among others
  • Blocked actions are logged
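The pattern scan described above can be sketched as follows. This is a minimal illustration, not the skill's safety_manager.py: the function name is_blocked, the case-insensitive comparison, and the exact pattern list (taken from the examples above) are assumptions.

```python
# Hypothetical safe-mode check; pattern list mirrors the examples in this document.
DANGEROUS_PATTERNS = ["rm ", "del ", "C:\\Windows\\", "/etc/", "sudo"]
MONITORED_ACTIONS = {"type", "press_key", "click", "drag"}


def is_blocked(action: str, params: dict, safe_mode: bool = True) -> bool:
    """Return True if safe mode should refuse this action."""
    if not safe_mode or action not in MONITORED_ACTIONS:
        return False
    # Scan every string parameter for a dangerous substring (case-insensitive).
    for value in params.values():
        if not isinstance(value, str):
            continue
        lowered = value.lower()
        if any(pattern.lower() in lowered for pattern in DANGEROUS_PATTERNS):
            return True
    return False


print(is_blocked("type", {"text": "sudo reboot"}))                   # True
print(is_blocked("type", {"text": "Hello World"}))                   # False
print(is_blocked("type", {"text": "sudo reboot"}, safe_mode=False))  # False
```

Plain substring matching is cheap but blunt: "rm " also matches inside ordinary text such as "inform me", so a real blocklist would need word boundaries or per-action rules to avoid false positives.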

2. Dry-Run Mode

All actions support dry_run=true:

  • Action is logged but NOT executed
  • Use for testing before running real automation
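One common way to give every action a dry_run switch is a decorator that logs the call and returns early. The sketch below assumes a hypothetical supports_dry_run decorator and a "dry_run" status value; the skill's actual return values may differ, and a real click would call the GUI backend instead of returning a stub result.

```python
import functools
import logging

logger = logging.getLogger("ActionManager")


def supports_dry_run(func):
    """Hypothetical decorator: log the call and skip execution when dry_run=True."""
    @functools.wraps(func)
    def wrapper(*args, dry_run: bool = False, **kwargs):
        if dry_run:
            logger.info("[DRY RUN] %s args=%s kwargs=%s", func.__name__, args, kwargs)
            return {"status": "dry_run", "action": func.__name__}
        return func(*args, **kwargs)
    return wrapper


@supports_dry_run
def click(x: int, y: int, button: str = "left") -> dict:
    # Stub: a real implementation would send the click to the GUI backend here.
    return {"status": "success", "x": x, "y": y}


print(click(100, 100, dry_run=True))  # {'status': 'dry_run', 'action': 'click'}
print(click(100, 100))                # {'status': 'success', 'x': 100, 'y': 100}
```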

3. Audit Logging

Every action is logged to ~/.openclaw/skills/desktop-automation-logs/automation_YYYY-MM-DD.log

4. Thread Safety

All modules use locks to prevent race conditions.
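A lock matters most for multi-step actions such as drag, where press, move, and release must not interleave with another thread's input. This hypothetical SerializedExecutor records steps instead of moving the mouse, so it runs without a display; it illustrates the locking pattern only, not the skill's actual classes.

```python
import threading


class SerializedExecutor:
    """Hypothetical sketch: run each multi-step GUI action while holding one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.steps = []  # list of (caller, step) tuples, in execution order

    def drag(self, caller: str) -> None:
        # Press, move and release belong together; holding the lock keeps another
        # thread from injecting its own steps in the middle of this drag.
        with self._lock:
            for step in ("mouse_down", "move", "mouse_up"):
                self.steps.append((caller, step))


executor = SerializedExecutor()
workers = [threading.Thread(target=executor.drag, args=(f"thread-{i}",)) for i in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(len(executor.steps))  # 12
```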


📦 Installation

1. Extract Files

Place desktop-automation-ultra-local/ in:

  • Windows: C:\Users\<User>\.openclaw\workspace\skills\
  • Linux/macOS: ~/.openclaw/workspace/skills/

2. Install Dependencies

pip install -r requirements.txt

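3. Optional: Tesseract for OCR

The OCR actions (find_text_on_screen, read_text_ocr, and the others listed under OCR / Text Recognition) need the Tesseract OCR engine installed as a system package. Because these actions default to lang="fra", the French language data is needed as well unless you pass a different lang.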
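Before relying on the OCR actions, you can check from Python whether Tesseract is reachable. This sketch uses shutil.which and the tesseract --list-langs command; the helper names are hypothetical, and the header-line parsing assumes the usual --list-langs output layout.

```python
import shutil
import subprocess


def tesseract_path():
    """Return the path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


def installed_languages():
    """List Tesseract language codes (e.g. 'eng', 'fra'); empty if Tesseract is missing."""
    exe = tesseract_path()
    if exe is None:
        return []
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Some older releases write the list to stderr, so fall back to it.
    # The first line is a header such as "List of available languages (3):".
    output = result.stdout or result.stderr
    return [line.strip() for line in output.splitlines()[1:] if line.strip()]


if tesseract_path() is None:
    print("Tesseract not found: OCR actions will be unavailable.")
elif "fra" not in installed_languages():
    print("Tesseract found, but French data ('fra') is missing; OCR actions default to lang='fra'.")
```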

4. Restart OpenClaw

openclaw gateway restart

🚀 Quick Start

Basic Click

action: click
params:
  x: 100
  y: 100
  dry_run: true  # Test first!

Type Text

action: type
params:
  text: "Hello World"
  interval: 0.05  # Delay between keys
  dry_run: false

Find Image

action: find_image
params:
  template_path: "templates/button.png"
  confidence: 0.95

Extract Text (OCR)

action: read_text_ocr
params:
  lang: "fra"  # French

📖 Core Actions

Mouse & Keyboard

Action | Parameters | Returns
--- | --- | ---
click | x, y, button="left", dry_run | {status, x, y}
type | text, interval=0.05, dry_run | {status, text}
press_key | key, dry_run | {status, key}
move_mouse | x, y, duration=0.5, dry_run | {status, x, y}
scroll | amount=5, dry_run | {status, amount}
drag | start_x, start_y, end_x, end_y, duration=0.5, dry_run | {status}
copy_to_clipboard | text, dry_run | {status}
paste_from_clipboard | dry_run | {status, length}

Screenshots & Windows

Action | Parameters | Returns
--- | --- | ---
screenshot | path="~/Desktop/screenshot.png", dry_run | {status, path}
get_active_window | dry_run | {status, title, x, y, width, height}
list_windows | dry_run | {status, windows[], count}
activate_window | title_substring, dry_run | {status, title}
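activate_window selects a window by title_substring. The matching step might look like the sketch below; case-insensitive matching and "first match wins" are assumptions, the helper name match_window is hypothetical, and the list of titles would come from whichever window backend the skill uses on your platform.

```python
def match_window(titles, title_substring):
    """Return the first title containing title_substring (case-insensitive), else None."""
    needle = title_substring.casefold()
    return next((title for title in titles if needle in title.casefold()), None)


windows = ["Untitled - Notepad", "Inbox - Mozilla Thunderbird", "Terminal"]
print(match_window(windows, "notepad"))  # Untitled - Notepad
print(match_window(windows, "chrome"))   # None
```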

Image Recognition (requires OpenCV)

Action | Parameters | Returns
--- | --- | ---
find_image | template_path, confidence=0.9, dry_run | {status, x, y, confidence}
find_image_multiscale | template_path, confidence, scale_factors, dry_run | {status, x, y, confidence, scale}
wait_for_image | template_path, timeout=30.0, interval=0.5, confidence=0.9, dry_run | {status, x, y, confidence}
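wait_for_image is essentially a polling loop around a single template search. In this sketch the hypothetical wait_for_match takes any zero-argument finder (for example a wrapper around find_image) that returns (x, y, confidence) or None; the "timeout" status string is an assumption.

```python
import time
from typing import Callable, Optional, Tuple

Match = Tuple[int, int, float]


def wait_for_match(find: Callable[[], Optional[Match]],
                   timeout: float = 30.0, interval: float = 0.5) -> dict:
    """Call find() every `interval` seconds until it matches or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        match = find()
        if match is not None:
            x, y, confidence = match
            return {"status": "success", "x": x, "y": y, "confidence": confidence}
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return {"status": "timeout"}
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))


attempts = iter([None, None, (640, 360, 0.93)])
result = wait_for_match(lambda: next(attempts), timeout=5.0, interval=0.01)
print(result)  # {'status': 'success', 'x': 640, 'y': 360, 'confidence': 0.93}

missing = wait_for_match(lambda: None, timeout=0.05, interval=0.01)
print(missing)  # {'status': 'timeout'}
```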

OCR / Text Recognition (requires Tesseract)

Action | Parameters | Returns
--- | --- | ---
find_text_on_screen | text, lang="fra", dry_run | {status, locations[], count}
find_all_text_on_screen | text, lang="fra", dry_run | {status, data[], count}
read_text_ocr | lang="fra", dry_run | {status, text, length}
read_text_region | x, y, width, height, lang="fra", dry_run | {status, text, length}
extract_screen_data | region={}, output_format="json", lang="fra", dry_run | {status, data[], count}
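find_text_on_screen has to turn OCR output into screen coordinates. Assuming word boxes in the dictionary layout that pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT) returns (parallel lists under "text", "left", "top", "width", "height", "conf"), a hypothetical single-word search could compute each match's center like this:

```python
def find_word_locations(data: dict, word: str) -> list:
    """Return the center and confidence of every OCR box whose text equals `word`."""
    target = word.casefold()
    locations = []
    for i, text in enumerate(data["text"]):
        if text.strip().casefold() == target:
            locations.append({
                "x": data["left"][i] + data["width"][i] // 2,
                "y": data["top"][i] + data["height"][i] // 2,
                "confidence": float(data["conf"][i]),
            })
    return locations


# Made-up sample in the image_to_data DICT layout (French UI words).
sample = {
    "text": ["", "Fichier", "Édition"],
    "left": [0, 10, 80],
    "top": [0, 5, 5],
    "width": [0, 50, 60],
    "height": [0, 20, 20],
    "conf": ["-1", "96.5", "91.0"],
}
print(find_word_locations(sample, "édition"))  # [{'x': 110, 'y': 15, 'confidence': 91.0}]
```

Multi-word phrases would need to join consecutive boxes on the same line; this sketch only matches single words.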

Macros

Action | Parameters | Returns
--- | --- | ---
play_macro | macro_path, speed=1.0, dry_run | {status, executed, total, errors[]}
stop_macro | (none) | {status}
play_macro_with_subroutines | macro_path, speed=1.0, sub_macros_dir, dry_run | {status, executed, total, errors[]}

Safety Management

Action | Parameters | Returns
--- | --- | ---
set_safe_mode | enabled=true | {status, safe_mode}
get_safety_status | (none) | {status, safe_mode_enabled, dangerous_patterns, dangerous_actions[]}

📝 Macro Format

Recorded macros are JSON with this structure:

{
  "events": [
    {
      "action": "click",
      "params": {"x": 100, "y": 50},
      "wait": 500
    },
    {
      "action": "type",
      "params": {"text": "Hello"},
      "wait": 200
    },
    {
      "action": "press_key",
      "params": {"key": "return"},
      "wait": 100
    }
  ]
}
  • action — action name
  • params — action parameters
  • wait — milliseconds to wait before next action
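A replay loop for this format only needs the three fields. This sketch of a hypothetical play_macro injects the executor and sleep function so it can run without touching the real mouse; dividing each wait by speed and the "completed_with_errors" status are assumptions modeled on the play_macro parameters and return fields listed under Core Actions.

```python
import json
import time


def play_macro(macro_json: str, execute, speed: float = 1.0, sleep=time.sleep) -> dict:
    """Replay events in order; `execute(action, params)` performs one action."""
    events = json.loads(macro_json)["events"]
    executed, errors = 0, []
    for index, event in enumerate(events):
        try:
            execute(event["action"], event.get("params", {}))
            executed += 1
        except Exception as exc:  # keep going and report the failure in errors[]
            errors.append(f"event {index}: {exc!r}")
        # "wait" is in milliseconds; a higher speed shortens the pause.
        sleep(event.get("wait", 0) / 1000 / speed)
    status = "success" if not errors else "completed_with_errors"
    return {"status": status, "executed": executed, "total": len(events), "errors": errors}


MACRO = json.dumps({"events": [
    {"action": "click", "params": {"x": 100, "y": 50}, "wait": 500},
    {"action": "type", "params": {"text": "Hello"}, "wait": 200},
    {"action": "press_key", "params": {"key": "return"}, "wait": 100},
]})

calls, pauses = [], []
report = play_macro(MACRO, lambda action, params: calls.append(action),
                    speed=2.0, sleep=pauses.append)
print(report)  # {'status': 'success', 'executed': 3, 'total': 3, 'errors': []}
print(pauses)  # [0.25, 0.1, 0.05]
```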

🔧 Advanced: Mouse Move Debouncing

To avoid recording hundreds of move_mouse events during a smooth drag, the recorder uses debouncing:

  • While the mouse is moving, move events are suppressed
  • Once the mouse has been still for N seconds (default: 1 s), the final position is recorded
  • This reduces macro size dramatically while preserving intended end positions
  • Configurable via GUI: set debounce time (0.1–10 seconds)

Example:

  • Fast horizontal line → 1 move_mouse event (end coordinates)
  • Slow, stop-and-go → multiple move_mouse events (one per "stop")
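The debouncing rule can be illustrated offline on a list of timestamped cursor samples: keep a position only when the cursor then stays still for at least the debounce time, or when it is the last sample. The live recorder presumably does this with a timer while capturing; debounce_moves and the (timestamp, x, y) sample format are hypothetical.

```python
def debounce_moves(samples, debounce=1.0):
    """samples: (timestamp_seconds, x, y) tuples sorted by time."""
    events = []
    for i, (t, x, y) in enumerate(samples):
        is_last = i == len(samples) - 1
        # Keep this position if the next sample arrives at least `debounce` later.
        if is_last or samples[i + 1][0] - t >= debounce:
            events.append({"action": "move_mouse", "params": {"x": x, "y": y}})
    return events


# Fast horizontal line: 50 samples, 10 ms apart.
fast = [(i * 0.01, i * 10, 300) for i in range(50)]
print(debounce_moves(fast))  # [{'action': 'move_mouse', 'params': {'x': 490, 'y': 300}}]

# Stop-and-go: a 1.8 s pause after the third sample.
stop_and_go = [(0.0, 0, 0), (0.1, 50, 0), (0.2, 100, 0),
               (2.0, 110, 0), (2.1, 200, 50), (2.2, 300, 100)]
print(len(debounce_moves(stop_and_go)))  # 2
```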

🧪 Testing

Run the unit test suite:

python scripts/test_automation.py

Output:

test_dry_run_click ... ok
test_get_active_window ... ok
test_safe_mode_blocks_dangerous ... ok
...
Ran 13 tests
OK

📊 Logging

All actions are logged to: ~/.openclaw/skills/desktop-automation-logs/automation_YYYY-MM-DD.log

Example:

[2026-03-15 10:23:45] [INFO] ActionManager: ActionManager initialized with safe_mode=True
[2026-03-15 10:23:46] [INFO] ActionManager: Clicked at (100, 50) with left button
[2026-03-15 10:23:47] [INFO] ActionManager: Typed: Hello World
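The log lines above match a standard logging format string. The sketch below reproduces that layout and the automation_YYYY-MM-DD.log file name; the helper names are hypothetical, and it writes to an in-memory stream so it can run anywhere. A file-backed setup would attach logging.FileHandler(daily_log_path(...), encoding="utf-8") instead.

```python
import datetime
import io
import logging
from pathlib import Path

LOG_FORMAT = "[%(asctime)s] [%(levelname)s] %(name)s: %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def daily_log_path(log_dir: str, day: datetime.date) -> Path:
    """Build <log_dir>/automation_YYYY-MM-DD.log, expanding a leading ~."""
    return Path(log_dir).expanduser() / f"automation_{day:%Y-%m-%d}.log"


def make_logger(stream, name: str = "ActionManager") -> logging.Logger:
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT))
    logger = logging.getLogger(name)
    logger.handlers = [handler]  # replace, so repeated calls don't duplicate lines
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


buffer = io.StringIO()
log = make_logger(buffer)
log.info("Clicked at (%d, %d) with %s button", 100, 50, "left")
print(buffer.getvalue(), end="")
print(daily_log_path("~/.openclaw/skills/desktop-automation-logs",
                     datetime.date(2026, 3, 15)).name)  # automation_2026-03-15.log
```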

⚙️ Configuration

Environment Variables

# Override log directory
export AUTOMATION_LOG_DIR=~/my_logs

# Disable safe mode globally (NOT recommended)
export AUTOMATION_SAFE_MODE=false
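Reading these variables in Python could look like this. The set of strings treated as "false" and the load_config helper are assumptions; the default directory matches the log path documented under Logging.

```python
import os
from pathlib import Path

DEFAULT_LOG_DIR = "~/.openclaw/skills/desktop-automation-logs"
FALSE_VALUES = {"false", "0", "no", "off"}  # assumed spellings that disable safe mode


def load_config(env=os.environ) -> dict:
    """Read AUTOMATION_LOG_DIR and AUTOMATION_SAFE_MODE, falling back to safe defaults."""
    log_dir = Path(env.get("AUTOMATION_LOG_DIR", DEFAULT_LOG_DIR)).expanduser()
    # Safe mode stays on unless the variable explicitly says otherwise.
    safe_mode = env.get("AUTOMATION_SAFE_MODE", "true").strip().lower() not in FALSE_VALUES
    return {"log_dir": log_dir, "safe_mode": safe_mode}


print(load_config({"AUTOMATION_SAFE_MODE": "false"})["safe_mode"])  # False
print(load_config({})["safe_mode"])                                  # True
```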

🐛 Troubleshooting

"pyautogui failsafe triggered"

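PyAutoGUI's fail-safe stops automation when the mouse cursor reaches a corner of the screen. This is the built-in emergency stop: move the mouse into a screen corner to abort a running action. If it fires unexpectedly, check that your coordinates do not target a screen corner.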

OCR returns empty text

  • Ensure Tesseract is installed correctly
  • Check image quality (high contrast helps)
  • Try read_text_ocr instead of find_text_on_screen

Image recognition not finding template

  • Ensure the template image exists and is in a supported format (PNG, JPG)
  • Try a lower confidence threshold (e.g., 0.85 instead of 0.95)
  • Use find_image_multiscale to detect at different scales

Actions blocked by safe mode

This is intentional. To run dangerous actions:

action: set_safe_mode
params:
  enabled: false

Then execute your action. Re-enable safe mode immediately after:

action: set_safe_mode
params:
  enabled: true
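Because forgetting the second set_safe_mode call leaves safe mode off, a try/finally wrapper is a safer pattern when scripting this from Python. The SafetyManager class here is a hypothetical stand-in with a single safe_mode flag, not the skill's safety_manager.py.

```python
from contextlib import contextmanager


class SafetyManager:
    """Hypothetical stand-in for the skill's safety manager."""

    def __init__(self) -> None:
        self.safe_mode = True


@contextmanager
def safe_mode_disabled(manager: SafetyManager):
    """Turn safe mode off for the with-block and always restore the previous value."""
    previous = manager.safe_mode
    manager.safe_mode = False
    try:
        yield manager
    finally:
        manager.safe_mode = previous


manager = SafetyManager()
with safe_mode_disabled(manager):
    print(manager.safe_mode)  # False
print(manager.safe_mode)      # True
```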

📄 License

MIT License. See the LICENSE file.


📚 File Structure

desktop-automation-ultra-local/
├── SKILL.md                          (This file)
├── requirements.txt                  (Python dependencies)
├── lib/
│   ├── actions.py                   (Core click/type/drag actions)
│   ├── image_recognition.py         (OpenCV template matching)
│   ├── ocr_engine.py                (Tesseract OCR)
│   ├── macro_player.py              (Record/playback macros)
│   ├── safety_manager.py            (Safe mode, blocking)
│   └── utils.py                     (Logging, helpers)
├── scripts/
│   └── test_automation.py           (Unit tests)
└── recorded_macro/                  (Output: saved macros)

Validation Checklist

  • All modules have proper error handling
  • Thread safety implemented (locks)
  • Safe mode enabled by default
  • Dry-run mode on all actions
  • Comprehensive logging
  • Unit tests (13 tests)
  • UTF-8 encoding for all text
  • No hardcoded paths (uses expanduser)
  • Graceful fallbacks for missing dependencies
  • Documentation complete
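The "graceful fallbacks" item usually means probing optional packages instead of failing at import time. This sketch assumes cv2 (OpenCV) and pytesseract are the optional modules behind image recognition and OCR; check requirements.txt for the actual package list.

```python
import importlib.util

# Assumed mapping of optional modules to the features that need them.
OPTIONAL_MODULES = {"cv2": "image recognition", "pytesseract": "OCR"}


def available_features() -> dict:
    """Map each feature to True if its module can be imported, without importing it."""
    return {feature: importlib.util.find_spec(module) is not None
            for module, feature in OPTIONAL_MODULES.items()}


features = available_features()
for feature, ok in features.items():
    if not ok:
        print(f"{feature} disabled: optional dependency not installed")
```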

Status: PRODUCTION READY


Last updated: 2026-03-15 · Version: 2.0.0

Security recommendations
This skill appears to do what it says: local desktop automation and macro recording. Important things to consider before installing:
  • Privacy: the recorder captures ALL keyboard and mouse events (including passwords and sensitive text) and stores macros as JSON. Never record while entering secrets, and store macro files securely.
  • Metadata note: the registry metadata did not declare required binaries, but the skill requires Python in PATH, optional system packages (Tesseract, xclip/xsel), and the Python packages from requirements.txt. Ensure you have the appropriate runtime and review the dependencies before installing.
  • Autonomy: the agent can invoke the skill autonomously by default. If you do not want automated UI actions to run without manual approval, restrict the skill's permissions or require manual invocation.
  • Local-only: the code and docs claim no network access. Still, inspect the shipped files yourself (they are included) for unexpected network calls before trusting them on a sensitive machine.
  • Recommended: run the included tests (scripts/test_automation.py) in a safe environment, use dry_run=true for initial testing, and review, rotate, or encrypt any stored macros that may contain sensitive data.
Functional analysis
Type: OpenClaw Skill · Name: desktop-automation-ultra · Version: 2.0.0
The 'desktop-automation-ultra' skill is a comprehensive toolkit for mouse, keyboard, and window control, incorporating OCR (Tesseract) and image recognition (OpenCV). It features a robust safety architecture including a 'Safe Mode' that blacklists dangerous command patterns (e.g., 'rm', 'sudo', 'del') and a 'Dry-Run' mode for non-destructive testing. While the skill possesses high-risk capabilities like full keystroke recording and screen scraping, these are clearly documented with explicit privacy warnings in SKILL.md and README.md, and the code lacks any evidence of obfuscation, data exfiltration, or unauthorized persistence.
Capability assessment
Purpose & Capability
The name/description (desktop automation, macro recording, OCR, image recognition) matches the shipped code and docs. One inconsistency: the registry metadata lists no required binaries, but the skill clearly expects a Python runtime (calls 'python' from skill.js, includes Python modules and a requirements.txt) and the README mentions system dependencies (Tesseract, xclip/xsel on Linux). This is a metadata omission but does not indicate hidden behavior.
Instruction Scope
SKILL.md and the included scripts instruct only local actions (mouse/keyboard, screenshots, OCR, image matching, macro files, logs). The files and docs explicitly warn that the recorder captures ALL keyboard/mouse events and that macros are stored locally. There are no instructions to read unrelated system secrets or to send data to external endpoints.
Install Mechanism
No automated install spec is present (user must place the folder in the skills directory and run pip install -r requirements.txt). That is lower-risk than remote installers, but users should be aware the skill expects pip/OS packages and optional system binaries (Tesseract). The package list is standard for this functionality and all code is included locally — no suspicious external download URLs were provided.
Credentials
The skill does not request environment variables or external credentials. Its use of cryptography (for password-protected macros) is reasonable for the documented feature. No unrelated secrets or cloud credentials are required.
Persistence & Privilege
The skill uses always:false with normal autonomous invocation. It writes logs and macro files under user/home paths (e.g., recorded_macro/, ~/.openclaw/...), which is expected for this kind of tool. It does not attempt to modify other skills or system-wide agent settings in the provided code.
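How to use
  1. Make sure OpenClaw is installed (local or Docker deployment).
  2. In the chat box, enter the install command: /install desktop-automation-ultra
  3. After installation, invoke the skill by name or trigger it with /desktop-automation-ultra.
  4. Provide the inputs described in the skill's parameter documentation to get structured output.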
Version history
v2.0.0
Big release: Desktop Automation Ultra v2.0.0 adds advanced safety controls, new macro handling, and expanded capabilities.
  • Added built-in Safe Mode (default ON) to block dangerous actions, with pattern detection for system-critical commands.
  • Introduced dry_run and audit logging for all actions, improving testing and traceability.
  • Expanded core actions: improved image recognition, flexible window control, full-featured clipboard, and OCR/text extraction.
  • New macro system: records full action/event logs in JSON, with configurable mouse-move debouncing.
  • Improved installation and testing steps, troubleshooting tips, and advanced configuration via environment variables.
Metadata
Slug desktop-automation-ultra
Version 2.0.0
License MIT-0
Total installs 0
Current installs 0
Versions 1
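FAQ

What is Desktop automation ultra?

Automate comprehensive desktop tasks on Windows/macOS/Linux with safe, logged mouse, keyboard, OCR, image recognition, macro recording, and replay features. It is an AI agent skill plugin for Claude Code / OpenClaw and has been downloaded 513 times so far.

How do I install Desktop automation ultra?

Run the command "/install desktop-automation-ultra" in the OpenClaw or Claude Code chat box to install it in one step. The Python dependencies (pip install -r requirements.txt) and, for OCR, Tesseract must still be installed as described under Installation.

Is Desktop automation ultra free?

Yes. Desktop automation ultra is completely free under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does Desktop automation ultra support?

Desktop automation ultra is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Desktop automation ultra?

It is developed and maintained by JordaneParis (@jordaneparis); the current version is v2.0.0.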
