
Desktop Agent

by Fuzzyb33s · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
95 Downloads · 0 Stars · 1 Active Install · 1 Version
Install in OpenClaw
/install desktop-agent
Description
Control desktop apps via mouse and keyboard, capture screenshots, teach AI tasks by demonstration, and automate workflows with saved reusable tasks.
README (SKILL.md)

Desktop AI Agent Skill

Use this skill when the user wants to:

  • Control desktop applications
  • Teach the AI to perform tasks
  • Have the AI learn from demonstrations
  • Automate desktop workflows

Commands

Take Screenshot

from desktop_agent import get_agent
agent = get_agent()
agent.capture_to_file("screenshot.png")

Get Mouse Position

from desktop_agent import get_agent

agent = get_agent()
pos = agent.get_mouse_position()  # Returns (x, y)

Mouse Control

from desktop_agent import get_agent

agent = get_agent()
x, y = 100, 200               # Example screen coordinates
agent.move_to(x, y)           # Move mouse
agent.click(x, y)             # Click
agent.double_click(x, y)      # Double click
agent.right_click(x, y)       # Right click
agent.drag_to(x, y)           # Drag

Keyboard Control

from desktop_agent import get_agent

agent = get_agent()
agent.type("Hello world")     # Type text
agent.press("enter")          # Press key
agent.hotkey("ctrl", "c")     # Ctrl+C

Teaching Mode

from desktop_agent import get_agent
from desktop_agent.teacher import TaskTeacher

agent = get_agent()
teacher = TaskTeacher(agent)

# Start teaching
teacher.start_teaching("my_task")

# Record actions
teacher.record_click()        # Records current mouse position
teacher.record_click(100, 200) # Records specific position
teacher.record_type("text")
teacher.record_press("enter")
teacher.record_hotkey("ctrl", "s")
teacher.record_wait(2)        # Wait 2 seconds

# Show steps
teacher.show_steps()

# Save the recorded task, or cancel to discard it (call one, not both)
teacher.finish_teaching()
# teacher.cancel_teaching()

Running Learned Tasks

from desktop_agent import get_agent

agent = get_agent()
tasks = agent.list_tasks()      # List all tasks
agent.execute_task("task_name") # Run a task

Files

  • desktop_agent/__init__.py - Core agent
  • desktop_agent/teacher.py - Teaching system
  • learned_tasks/ - Saved task definitions

Notes

  • The AI can see the screen and control the mouse and keyboard
  • The user can teach tasks by demonstration
  • Tasks are saved as JSON in learned_tasks/ and can be reused
  • Use with caution: the skill can control any application
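
To make the note about JSON task files concrete, here is a minimal sketch of saving, loading, and reviewing a task. The schema (a `name` plus a list of `steps`, each with an `action` field) and the helpers `save_task`, `load_task`, and `describe_steps` are assumptions for illustration only; the format the skill actually writes to learned_tasks/ is not documented on this page.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical task layout; the real schema used by desktop_agent may differ.
EXAMPLE_TASK = {
    "name": "my_task",
    "steps": [
        {"action": "click", "x": 100, "y": 200},
        {"action": "type", "text": "text"},
        {"action": "press", "key": "enter"},
        {"action": "hotkey", "keys": ["ctrl", "s"]},
        {"action": "wait", "seconds": 2},
    ],
}


def save_task(task: dict, directory: Path) -> Path:
    """Write a task to <directory>/<name>.json and return the file path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{task['name']}.json"
    path.write_text(json.dumps(task, indent=2), encoding="utf-8")
    return path


def load_task(path: Path) -> dict:
    """Read a task back from disk."""
    return json.loads(path.read_text(encoding="utf-8"))


def describe_steps(task: dict) -> list[str]:
    """Return one human-readable line per step, for review before replay."""
    lines = []
    for i, step in enumerate(task["steps"], start=1):
        details = ", ".join(f"{k}={v!r}" for k, v in step.items() if k != "action")
        lines.append(f"{i}. {step['action']}({details})")
    return lines


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_task(EXAMPLE_TASK, Path(tmp) / "learned_tasks")
        for line in describe_steps(load_task(saved)):
            print(line)
```

Listing the steps of a saved task before replaying it is a cheap way to check what a recorded workflow will actually do.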
Usage Guidance
This skill's functionality (screen capture, OCR, mouse/keyboard automation, saving and replaying tasks) matches its description, but there are a few red flags to address before installing:
  1. The package depends on many native Python libraries (pyautogui, mss, numpy, pillow, opencv, easyocr) but provides no install instructions. Install these in a controlled environment (virtualenv or container) and verify versions.
  2. The core initializer in __init__.py appears truncated or buggy (the shipped file contains a reference like 'workspace = works'), so the skill may fail or behave unexpectedly. Ask the author for the complete file and a clear get_agent(workspace) behavior.
  3. The skill can see your screen, control your mouse and keyboard, and save tasks to disk. Only use it if you trust the source, and run it with caution (isolated environment, limited agent autonomy).
  4. No network exfiltration code is visible, but review the complete files and confirm there are no hidden network calls in the missing or truncated section.
Recommended actions: run the code locally in an isolated VM or container, inspect the full __init__.py (and any truncated content), require explicit user invocation (disable autonomous invocation if possible), and get a dependency/install manifest from the publisher before use.
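
Since the skill ships without an install manifest, a quick check of which listed dependencies are present in the current environment can help. The sketch below uses only the standard library. The import names are the usual ones for these packages (pillow imports as `PIL`, opencv as `cv2`), while `REQUIRED_MODULES` and `find_missing` are illustrative names, not part of the skill.

```python
import importlib.util

# Import names for the packages the analysis lists
# (pillow is imported as PIL, opencv as cv2).
REQUIRED_MODULES = ["pyautogui", "mss", "numpy", "PIL", "cv2", "easyocr"]


def find_missing(modules: list[str]) -> list[str]:
    """Return the modules that cannot be located in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

Running this inside the virtualenv or container intended for the skill shows what still needs installing without importing, and therefore executing, any of the packages.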
Capability Analysis
Type: OpenClaw Skill · Name: desktop-agent · Version: 1.0.0
The skill provides extensive desktop control capabilities, including screen capture, OCR (via easyocr), and full mouse/keyboard manipulation (via pyautogui). While these features are aligned with the stated purpose of a desktop automation agent, they represent high-risk behaviors that allow an AI to interact with any application or sensitive data on the host system. Furthermore, desktop_agent/__init__.py contains a hardcoded Windows path to a specific user's directory (C:\Users\funky\.openclaw\workspace). Although no evidence of intentional malice or data exfiltration was found, the broad permissions and potential for misuse via prompt injection justify a suspicious classification.
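
The hardcoded user directory noted above is the kind of problem a portable path lookup avoids. The following is a sketch of one possible approach, not the author's code: `resolve_workspace` and the environment variable name `OPENCLAW_WORKSPACE` are hypothetical, and only the `.openclaw/workspace` directory layout is taken from the path quoted in the analysis.

```python
import os
from pathlib import Path

# Hypothetical environment variable name, used here only for illustration.
WORKSPACE_ENV_VAR = "OPENCLAW_WORKSPACE"


def resolve_workspace(workspace=None, env=None) -> Path:
    """Pick a workspace directory without hardcoding a user-specific path.

    Order of precedence: explicit argument, environment variable,
    then .openclaw/workspace under the current user's home directory.
    """
    env = os.environ if env is None else env
    if workspace:
        return Path(workspace).expanduser()
    if env.get(WORKSPACE_ENV_VAR):
        return Path(env[WORKSPACE_ENV_VAR]).expanduser()
    return Path.home() / ".openclaw" / "workspace"


if __name__ == "__main__":
    print(resolve_workspace())
```

Because `Path.home()` resolves per user and per operating system, the same code works for any account on Windows, macOS, or Linux.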
Capability Assessment
Purpose & Capability
Name, description, SKILL.md, and included Python modules align: the code provides screenshot capture, OCR, image/template matching, mouse/keyboard control, and a teach/save task system — all expected for a 'Desktop Agent'.
Instruction Scope
SKILL.md explicitly instructs using functions that capture the screen and control mouse/keyboard; these are within the declared purpose. The instructions cause the agent to create and write task JSON files to a local learned_tasks directory — this is expected but is powerful (it can record and replay arbitrary interactions).
Install Mechanism
There is no install spec but the code imports many heavy native Python packages (pyautogui, mss, numpy, pillow/PIL, opencv/cv2, easyocr). The skill provides no guidance for installing these dependencies; that mismatch (no install instructions + heavy deps) is a packaging/operational gap and increases friction/risk for users.
Credentials
The skill requests no environment variables, credentials, or external config paths. It operates locally (screen, mouse/keyboard, local filesystem) which is proportional to its stated function.
Persistence & Privilege
always:false and no system-level modifications are declared. However, because the skill enables direct desktop control and the platform allows autonomous invocation by default, granting it to an agent gives it the ability to perform potentially dangerous UI actions if invoked autonomously — consider limiting autonomous use or granting only when explicitly invoked.
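
One way to limit autonomous use is to route every agent call through an approval step. The sketch below is an illustration under stated assumptions: `ConfirmingAgent` and `ask_on_console` are hypothetical names, not part of the skill or of OpenClaw, and the wrapper assumes only that the wrapped object exposes ordinary callable methods.

```python
from typing import Any, Callable


class ConfirmingAgent:
    """Proxy that asks for approval before forwarding any method call.

    `agent` is any object with callable methods (for example, whatever
    get_agent() returns); `approve` receives a description of the call
    and returns True to allow it.
    """

    def __init__(self, agent: Any, approve: Callable[[str], bool]):
        self._agent = agent
        self._approve = approve

    def __getattr__(self, name: str) -> Any:
        # Only reached for attributes not found on the proxy itself.
        target = getattr(self._agent, name)
        if not callable(target):
            return target

        def guarded(*args: Any, **kwargs: Any) -> Any:
            parts = [repr(a) for a in args]
            parts += [f"{k}={v!r}" for k, v in kwargs.items()]
            description = f"{name}({', '.join(parts)})"
            if not self._approve(description):
                raise PermissionError(f"Action not approved: {description}")
            return target(*args, **kwargs)

        return guarded


def ask_on_console(description: str) -> bool:
    """Prompt the user in the terminal; only an explicit 'y' approves."""
    return input(f"Allow {description}? [y/N] ").strip().lower() == "y"
```

Usage would look like `agent = ConfirmingAgent(get_agent(), ask_on_console)`, after which a call such as `agent.click(100, 200)` runs only once the user answers `y`.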
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install desktop-agent
  3. After installation, invoke the skill by name or use /desktop-agent
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release of Desktop AI Agent Skill:
  • Enables controlling desktop applications via Python commands (mouse, keyboard, screenshots).
  • Supports "Teaching Mode" for creating reusable automation tasks by demonstration.
  • Allows running and managing learned tasks.
  • Includes core agent, teaching system, and task storage.
  • Use with caution: full control over screen, mouse, and keyboard.
Metadata
Slug: desktop-agent
Version: 1.0.0
License: MIT-0
All-time Installs: 1
Active Installs: 1
Total Versions: 1
Frequently Asked Questions

What is Desktop Agent?

Desktop Agent is an AI agent skill for Claude Code / OpenClaw. It controls desktop apps via mouse and keyboard, captures screenshots, learns tasks by demonstration, and automates workflows with saved reusable tasks. It has 95 downloads so far.

How do I install Desktop Agent?

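Run "/install desktop-agent" in the OpenClaw or Claude Code chat to install the skill. The skill has no install spec for its Python dependencies (pyautogui, mss, numpy, pillow, opencv, easyocr), so these need to be installed separately.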

Is Desktop Agent free?

Yes, Desktop Agent is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Desktop Agent support?

Desktop Agent is listed as cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Desktop Agent?

It is built and maintained by Fuzzyb33s (@fuzzyb33s); the current version is v1.0.0.
