Description

Desktop automation via native OS accessibility trees using the agent-desktop CLI. Use when an AI agent needs to observe, interact with, or automate desktop a...

Usage (SKILL.md)

agent-desktop

Name: my skill
Author: darryek

CLI tool enabling AI agents to observe and control desktop applications via native OS accessibility trees.

Core principle: agent-desktop is NOT an AI agent. It is a tool that AI agents invoke. It outputs structured JSON with ref-based element identifiers. The observation-action loop lives in the calling agent.

Installation

npm install -g agent-desktop
# or
bun install -g --trust agent-desktop

Requires macOS 12+ with Accessibility permission granted to your terminal.

Reference Files

Detailed documentation is split into focused reference files. Read them as needed:

Reference	Contents
`references/commands-observation.md`	snapshot, find, get, is, screenshot, list-surfaces — all flags, output examples
`references/commands-interaction.md`	click, type, set-value, select, toggle, scroll, drag, keyboard, mouse — choosing the right command
`references/commands-system.md`	launch, close, windows, clipboard, wait, batch, status, permissions, version
`references/workflows.md`	12 common patterns: forms, menus, dialogs, scroll-find, drag-drop, async wait, anti-patterns
`references/macos.md`	macOS permissions/TCC, AX API internals, smart activation chain, surfaces, Notification Center, troubleshooting

The Observe-Act Loop

Every automation follows this pattern:

1. OBSERVE  → agent-desktop snapshot --app "App Name" -i
2. REASON   → Parse JSON, find target element by ref (@e1, @e2...)
3. ACT      → agent-desktop click @e5  (or type, select, toggle...)
4. VERIFY   → agent-desktop snapshot again to confirm state change
5. REPEAT   → Continue until task is complete

Always snapshot before acting. Refs are snapshot-scoped and become stale after UI changes.
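
The loop above can be sketched on the calling-agent side roughly as follows. This is a minimal illustration, not part of agent-desktop: the names `run_cli`, `observe_act`, and `choose` are hypothetical, and the only CLI arguments used are the ones shown in this document. The shape of the snapshot `data` is left opaque and handed to the caller's `choose` function.

```python
import json
import subprocess
from typing import Callable, Optional

Runner = Callable[[list[str]], dict]


def run_cli(args: list[str]) -> dict:
    """Run agent-desktop with the given arguments and parse the JSON envelope from stdout."""
    proc = subprocess.run(["agent-desktop", *args], capture_output=True, text=True)
    return json.loads(proc.stdout)


def _require_ok(envelope: dict) -> dict:
    """Raise if the envelope reports an error, otherwise return it unchanged."""
    if not envelope.get("ok"):
        code = envelope.get("error", {}).get("code", "UNKNOWN")
        raise RuntimeError(f"{envelope.get('command')} failed: {code}")
    return envelope


def observe_act(
    app: str,
    choose: Callable[[dict], Optional[list[str]]],
    run: Runner = run_cli,
    max_steps: int = 10,
) -> int:
    """Snapshot, let `choose` pick the next command, act, and repeat.

    `choose` receives the snapshot's `data` and returns CLI arguments such as
    ["click", "@e5"], or None when the task is complete.
    Returns the number of actions performed.
    """
    actions = 0
    for _ in range(max_steps):
        snap = _require_ok(run(["snapshot", "--app", app, "-i"]))  # OBSERVE
        command = choose(snap["data"])                              # REASON
        if command is None:
            return actions
        _require_ok(run(command))                                   # ACT
        actions += 1
        # VERIFY: the next iteration takes a fresh snapshot, so stale refs are never reused.
    return actions
```

Passing the runner as a parameter keeps the loop testable without a desktop session: a stub that returns canned envelopes can stand in for `run_cli`.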

Ref System

Refs are assigned depth-first: @e1, @e2, @e3...
Only interactive elements get refs: button, textfield, checkbox, link, menuitem, tab, slider, combobox, treeitem, cell
Static text, groups, and containers remain in the tree for context but have no ref
Refs are deterministic within a snapshot but NOT stable across snapshots if the UI changed
After any action that changes UI, run snapshot again for fresh refs
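
To make the numbering rule concrete, here is a small sketch of depth-first ref assignment over a toy tree. It is an illustration only, not the tool's implementation: the node shape (`role`, `children`), the non-interactive role names, and the choice of pre-order traversal are all assumptions. Only the list of interactive roles comes from the text above.

```python
INTERACTIVE_ROLES = {
    "button", "textfield", "checkbox", "link", "menuitem",
    "tab", "slider", "combobox", "treeitem", "cell",
}


def assign_refs(root: dict) -> dict:
    """Return a copy of the tree in which interactive nodes carry a ref.

    Nodes are visited depth-first (pre-order, an assumption); non-interactive
    nodes stay in the tree for context but receive no ref.
    """
    next_id = 1

    def visit(node: dict) -> dict:
        nonlocal next_id
        out = {"role": node["role"]}
        if node["role"] in INTERACTIVE_ROLES:
            out["ref"] = f"@e{next_id}"
            next_id += 1
        out["children"] = [visit(child) for child in node.get("children", [])]
        return out

    return visit(root)
```

Because the counter restarts with every call, inserting or removing one interactive element shifts every later ref, which is why refs from an earlier snapshot cannot be trusted after the UI changes.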

JSON Output Contract

Every command returns a JSON envelope on stdout:

Success: { "version": "1.0", "ok": true, "command": "snapshot", "data": { ... } }
Error: { "version": "1.0", "ok": false, "command": "click", "error": { "code": "STALE_REF", "message": "...", "suggestion": "..." } }

Exit codes: 0 success, 1 structured error, 2 argument error.
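
A calling agent might unwrap this envelope as in the sketch below. The helper `parse_envelope` and the exception `AgentDesktopError` are hypothetical names introduced for illustration. The field names (`ok`, `data`, `error.code`, `error.message`, `error.suggestion`) are taken from the contract above.

```python
import json
from typing import Any


class AgentDesktopError(Exception):
    """Structured error carried by an envelope with "ok": false."""

    def __init__(self, code: str, message: str, suggestion: str, exit_code: int):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.suggestion = suggestion
        self.exit_code = exit_code


def parse_envelope(stdout: str, exit_code: int) -> Any:
    """Return `data` on success, or raise AgentDesktopError on a structured error."""
    envelope = json.loads(stdout)
    if envelope["ok"]:
        return envelope["data"]
    err = envelope["error"]
    raise AgentDesktopError(
        err["code"],
        err.get("message", ""),
        err.get("suggestion", ""),
        exit_code,
    )
```

Keeping the exit code on the exception lets the caller tell a structured error (1) from an argument error (2) without re-running the command.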

Error Codes

Code	Meaning	Recovery
`PERM_DENIED`	Accessibility permission not granted	Grant in System Settings > Privacy > Accessibility
`ELEMENT_NOT_FOUND`	Ref not in current refmap	Re-run snapshot, use fresh ref
`APP_NOT_FOUND`	App not running	Launch it first
`ACTION_FAILED`	AX action rejected	Try alternative approach or coordinate-based click
`ACTION_NOT_SUPPORTED`	Element can't do this	Use different command
`STALE_REF`	Ref from old snapshot	Re-run snapshot
`WINDOW_NOT_FOUND`	No matching window	Check app name, use list-windows
`TIMEOUT`	Wait condition not met	Increase --timeout
`INVALID_ARGS`	Bad arguments	Check command syntax
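
One way an agent could act on this table is a simple lookup from error code to a recovery strategy. The strategy labels below (`resnapshot`, `launch_app`, and so on) and the `abort` fallback are hypothetical names chosen for this sketch. The codes and their groupings follow the Recovery column.

```python
RECOVERY = {
    "PERM_DENIED": "ask_user",             # user must grant Accessibility permission
    "ELEMENT_NOT_FOUND": "resnapshot",     # re-run snapshot, use a fresh ref
    "APP_NOT_FOUND": "launch_app",         # launch the app first
    "ACTION_FAILED": "try_alternative",    # other approach or coordinate-based click
    "ACTION_NOT_SUPPORTED": "try_alternative",
    "STALE_REF": "resnapshot",             # re-run snapshot
    "WINDOW_NOT_FOUND": "list_windows",    # check app name, use list-windows
    "TIMEOUT": "increase_timeout",         # raise --timeout
    "INVALID_ARGS": "fix_arguments",       # check command syntax
}


def next_step(code: str) -> str:
    """Map an error code to a recovery strategy, aborting on unknown codes."""
    return RECOVERY.get(code, "abort")
```

Falling back to `abort` for unrecognized codes avoids retrying blindly if a later version of the CLI introduces codes that this table does not cover.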

Command Quick Reference (54 commands)

Observation

agent-desktop snapshot --app "App" -i           # Accessibility tree with refs
agent-desktop screenshot --app "App" out.png    # PNG screenshot
agent-desktop find --app "App" --role button    # Search elements
agent-desktop get @e1 --property text           # Read element property
agent-desktop is @e1 --property enabled         # Check element state
agent-desktop list-surfaces --app "App"         # Available surfaces

Interaction

agent-desktop click @e5                         # Click element
agent-desktop double-click @e3                  # Double-click
agent-desktop triple-click @e2                  # Triple-click (select line)
agent-desktop right-click @e5                   # Right-click (context menu)
agent-desktop type @e2 "hello"                  # Type text into element
agent-desktop set-value @e2 "new value"         # Set value directly
agent-desktop clear @e2                         # Clear element value
agent-desktop focus @e2                         # Set keyboard focus
agent-desktop select @e4 "Option B"             # Select dropdown option
agent-desktop toggle @e6                        # Toggle checkbox/switch
agent-desktop check @e6                         # Idempotent check
agent-desktop uncheck @e6                       # Idempotent uncheck
agent-desktop expand @e7                        # Expand disclosure
agent-desktop collapse @e7                      # Collapse disclosure
agent-desktop scroll @e1 --direction down       # Scroll element
agent-desktop scroll-to @e8                     # Scroll into view

Keyboard & Mouse

agent-desktop press cmd+c                       # Key combo
agent-desktop press return --app "App"          # Targeted key press
agent-desktop key-down shift                    # Hold key
agent-desktop key-up shift                      # Release key
agent-desktop hover @e5                         # Cursor to element
agent-desktop hover --xy 500,300                # Cursor to coordinates
agent-desktop drag --from @e1 --to @e5          # Drag between elements
agent-desktop mouse-click --xy 500,300          # Click at coordinates
agent-desktop mouse-move --xy 100,200           # Move cursor
agent-desktop mouse-down --xy 100,200           # Press mouse button
agent-desktop mouse-up --xy 300,400             # Release mouse button

App & Window

agent-desktop launch "System Settings"          # Launch and wait
agent-desktop close-app "TextEdit"              # Quit gracefully
agent-desktop close-app "TextEdit" --force      # Force kill
agent-desktop list-windows --app "Finder"       # List windows
agent-desktop list-apps                         # List running GUI apps
agent-desktop focus-window --app "Finder"       # Bring to front
agent-desktop resize-window --app "App" --width 800 --height 600
agent-desktop move-window --app "App" --x 0 --y 0
agent-desktop minimize --app "App"
agent-desktop maximize --app "App"
agent-desktop restore --app "App"

Notifications

agent-desktop list-notifications                # List all notifications
agent-desktop list-notifications --app "Slack"  # Filter by app
agent-desktop list-notifications --text "deploy" --limit 5  # Filter by text
agent-desktop dismiss-notification 1            # Dismiss by index
agent-desktop dismiss-all-notifications         # Dismiss all
agent-desktop dismiss-all-notifications --app "Slack"  # Dismiss all from app
agent-desktop notification-action 1 --action "Reply"   # Click action button

Clipboard

agent-desktop clipboard-get                     # Read clipboard
agent-desktop clipboard-set "text"              # Write to clipboard
agent-desktop clipboard-clear                   # Clear clipboard

Wait

agent-desktop wait 1000                         # Pause 1 second
agent-desktop wait --element @e5 --timeout 5000 # Wait for element
agent-desktop wait --window "Title"             # Wait for window
agent-desktop wait --text "Done" --app "App"    # Wait for text
agent-desktop wait --menu --app "App"           # Wait for context menu
agent-desktop wait --menu-closed --app "App"    # Wait for menu dismissal
agent-desktop wait --notification --app "App"   # Wait for new notification

System

agent-desktop status                            # Health check
agent-desktop permissions                       # Check permission
agent-desktop permissions --request             # Trigger permission dialog
agent-desktop version --json                    # Version info
agent-desktop batch '[...]' --stop-on-error     # Batch commands

Key Principles for Agents

Always snapshot first. Never assume UI state.
Use -i flag. Filters to interactive elements only, reducing tokens.
Refs are ephemeral. Snapshot again after any UI-changing action.
Prefer refs over coordinates. click @e5 is preferable to mouse-click --xy 500,300.
Use wait for async UI. After launch/dialog triggers, wait for expected state.
Check permissions first. Run permissions on first use.
Handle errors. Parse error.code and follow error.suggestion.
Use find for targeted searches. Faster than full snapshot when you know role/name.
Use surfaces for menus. snapshot --surface menu captures open menus.
Batch for performance. Multiple commands in one invocation.
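
Several of these principles combine into a typical start-up sequence: check permissions, launch, wait for the window, then take an interactive-only snapshot. The sketch below only builds the argument lists and runs nothing. `startup_commands` is a hypothetical helper, and every subcommand and flag in it appears in the quick reference above.

```python
import shlex


def startup_commands(app: str, window_title: str) -> list[list[str]]:
    """Build the argv lists for a permissions -> launch -> wait -> snapshot sequence."""
    return [
        ["agent-desktop", "permissions"],                     # check permissions first
        ["agent-desktop", "launch", app],                     # launch the app
        ["agent-desktop", "wait", "--window", window_title],  # wait for async UI
        ["agent-desktop", "snapshot", "--app", app, "-i"],    # snapshot before acting
    ]


def as_shell(argv: list[str]) -> str:
    """Render an argv list as a safely quoted shell command line."""
    return shlex.join(argv)
```

Building argv lists rather than command strings sidesteps quoting problems with app names that contain spaces, such as "System Settings". `as_shell` is only for logging or display.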

Security recommendations

This skill appears to be documentation for a desktop automation CLI (agent-desktop) and can fully observe and control UI elements on macOS, a very powerful capability. Before installing or granting Accessibility permission:

1. Verify the exact npm package name and publisher on the npm registry (inspect its source code and maintainers).
2. Confirm that the skill bundle's metadata (slug/owner) matches the published package or author; the bundle shows mismatched IDs and names, which could indicate repackaging.
3. Only grant Accessibility permission to a terminal you trust (do not add unknown terminal apps).
4. Consider testing in an isolated machine or VM, since the tool can read the clipboard, notifications, and application UIs.
5. If you are uncomfortable with autonomous agents controlling your desktop, disable autonomous invocation or restrict the skill until you can audit the installed CLI.

If you want, provide the npm package URL or package.json from the CLI so I can help check the publisher and code surface for you.

Functional analysis

Type: OpenClaw Skill
Name: aoto
Version: 1.0.0

The bundle provides a comprehensive suite of 54 commands for full desktop automation on macOS via the `agent-desktop` CLI. It includes high-risk capabilities such as capturing screenshots (`screenshot` in `references/commands-observation.md`), reading and writing to the system clipboard (`clipboard-get/set` in `references/commands-system.md`), and full UI inspection (`snapshot` in `references/commands-observation.md`), which could be used to access sensitive data. While these features are consistent with the stated goal of GUI automation and the documentation is thorough, the broad permissions and potential for abuse by an AI agent, without built-in safeguards or restricted scopes, warrant a suspicious classification under the provided criteria. No evidence of intentional malice or exfiltration was found.

Capability assessment

ℹ Purpose & Capability

SKILL.md clearly documents a desktop automation CLI (agent-desktop) that reads and manipulates macOS accessibility trees — this aligns with the described purpose. However the registry metadata (skill name 'my skill', slug 'aoto', owner IDs) does not match the tool identity in SKILL.md ('agent-desktop'), indicating packaging/branding inconsistency that should be resolved.

⚠ Instruction Scope

The runtime instructions tell the agent to snapshot UI trees, read element properties and clipboard, list/dismiss notifications, synthesize keyboard/mouse events, and perform coordinate clicks. Those actions are coherent for a desktop automation tool but are high-privilege: they let the agent read arbitrary on-screen content and control apps. The SKILL.md also instructs the user/agent to install the CLI and to grant Accessibility permission to the terminal — both expected but sensitive operations.

ℹ Install Mechanism

There is no formal install spec in the skill bundle, but SKILL.md instructs installing via 'npm install -g agent-desktop' or 'bun install -g --trust agent-desktop'. That is a reasonable, common install path, but the registry package metadata does not provide a homepage/source or verify the npm package name, so you should verify the npm package and its publisher before running the global install.

✓ Credentials

The skill declares no environment variables or credentials (appropriate). It does require granting macOS Accessibility permission to the terminal, which is necessary for the claimed functionality but also grants broad read/control over the desktop; this privilege is proportionate to the feature set but sensitive.

ℹ Persistence & Privilege

The skill is not set to always:true. It can be invoked autonomously (platform default), which combined with desktop-control capabilities increases risk. Autonomous invocation alone is normal, but you should be aware that an agent using this skill could autonomously perform UI actions and read screen/clipboard data.

Version history

v1.0.0

Initial release of agent-desktop for desktop automation via native OS accessibility trees.
- Provides a CLI to observe and control desktop apps (click, type, fill forms, automate, etc.) through accessibility APIs.
- Outputs structured JSON with ref-based element identifiers for robust automation workflows.
- Supports 54 commands across observation, interaction, keyboard/mouse, app/window management, notifications, and clipboard.
- Includes detailed reference documentation and error handling guidance.
- Supported on macOS (Phase 1); Windows and Linux support planned.

Metadata

Slug aoto

Version 1.0.0

License -

Total installs 1

Current installs 1

Number of versions 1

FAQ

What is my skill?

Desktop automation via native OS accessibility trees using the agent-desktop CLI. Use when an AI agent needs to observe, interact with, or automate desktop a... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 351 cumulative downloads to date.

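How do I install my skill?

Run the command "/install aoto" in the OpenClaw or Claude Code chat box to install the skill in one step. The skill itself needs no additional configuration, but the agent-desktop CLI and the Accessibility permission described under Installation are still required.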

Is my skill free?

Yes. my skill is completely free (free and open source) and can be freely downloaded, installed, and used.

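Which platforms does my skill support?

my skill is listed as cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed. Note, however, that the agent-desktop CLI it drives currently requires macOS 12+; Windows and Linux support is planned.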

Who developed my skill?

Developed and maintained by 迩康 (@darryek); the current version is v1.0.0.
