Description

Computer use for GUI automation tasks via VLA models. Use when the user describes a task in natural language that requires visual screen interaction and no A...

README (SKILL.md)

mano-cua

Name: mano-cua
Author: hanningwang

Desktop GUI automation driven by natural language. Captures screenshots, sends them to a cloud-based hybrid vision model, and executes the returned actions on the local machine — click, type, scroll, drag, and more.

Requirements

A system with a graphical desktop (macOS / Windows / Linux)
mano-cua binary installed

Installation

macOS / Linux (Homebrew):

brew install Mininglamp-AI/tap/mano-cua

Windows:

Download the latest mano-cua-windows.zip from GitHub Releases, extract it, and add the folder to your PATH.
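After either install path, it can help to confirm that the binary actually resolves on PATH before running a task. The helper below is a generic shell sketch, not part of mano-cua; the function name `require_cmd` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with mano-cua).
# Prints the resolved path of a command, or a hint on stderr plus a non-zero status.
require_cmd() {
  local name="$1"
  if command -v "$name" >/dev/null 2>&1; then
    command -v "$name"
  else
    echo "$name not found on PATH" >&2
    return 1
  fi
}
```

Calling `require_cmd mano-cua` prints the install location when the binary is found and returns a non-zero status otherwise.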

Usage

# Run a task (cloud mode, default)
mano-cua run "your task description"

# Run with options
mano-cua run "task" --minimize --max-steps 10

# Run in local mode (on-device inference, macOS Apple Silicon only)
mano-cua run "task" --local

# Stop the current running task
mano-cua stop

Run mano-cua --help or mano-cua <command> --help for full flags and options.

Note: Only one task can run at a time per device. If you need to start a new task, first stop the current one with mano-cua stop.
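Because only one task runs at a time, a small wrapper that stops any current task before starting the next can save a step. This is a minimal sketch that uses only the documented `stop` and `run` subcommands. The function name `run_fresh` and the `MANO_CUA` override variable are assumptions added here so the binary can be swapped out for a dry run. How `mano-cua stop` behaves when nothing is running is not documented, so its exit status is ignored.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not shipped with mano-cua): stop any running task, then start a new one.
# MANO_CUA is an assumed override so the binary can be replaced for testing.
run_fresh() {
  local bin="${MANO_CUA:-mano-cua}"
  # Ignore the result of stop: its behavior with no active task is not documented.
  "$bin" stop || true
  # Pass the task description and any flags through unchanged.
  "$bin" run "$@"
}
```

For example, `run_fresh "your task description" --minimize --max-steps 10` issues `mano-cua stop` followed by the matching `mano-cua run` command.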

Local Mode

Runs Mano-P entirely on-device via MLX. No data leaves the machine. Requires macOS with Apple Silicon (M1+).

Setup:

mano-cua check
mano-cua install-sdk
mano-cua install-model

Run:

mano-cua run "Open Safari and search for Python" --local
mano-cua run "Type hello in the search box" --local --url "https://www.baidu.com" --minimize --max-steps 15
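Since local mode requires macOS on Apple Silicon, a script that chooses between cloud and local mode may want to check the platform first. The sketch below relies on `uname` reporting `Darwin` and `arm64` on such machines; the function name `supports_local_mode` is an assumption, and `mano-cua check` remains the documented way to verify the setup. Note that a shell running under Rosetta reports `x86_64`, so this check can give a false negative there.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with mano-cua).
# Returns 0 when the OS/architecture pair matches the documented local-mode requirement.
# Arguments default to what uname reports; passing them explicitly is meant for testing.
supports_local_mode() {
  local os="${1:-$(uname -s)}"
  local arch="${2:-$(uname -m)}"
  [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]
}
```

A script could then add `--local` to the `mano-cua run` command only when `supports_local_mode` succeeds, and fall back to cloud mode otherwise.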

Examples

# Cloud mode (default — no setup needed)
mano-cua run "Open WeChat and tell FTY that the meeting is postponed"
mano-cua run "Search for AI news in Xiaohongshu and show the first post" --minimize --max-steps 20

# Local mode
mano-cua run "Compare the flight price tiers" --local --url "https://www.flightaware.com/"

# Stop the current task (use before starting a new one)
mano-cua stop

How It Works

The current screenshot is captured and sent to the cloud at each step. A hybrid vision solution decides the next action:

Mano model — handles straightforward, lightweight tasks with rapid output.
Claude CUA model — handles complex tasks requiring deeper reasoning.

The system automatically selects the appropriate model based on task complexity.

In local mode (--local), a local Mano-P model runs on-device via MLX. No network calls for inference.
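The step cycle described above (capture, decide, act) can be pictured as a bounded loop. The sketch below is purely conceptual: none of the function names come from mano-cua, the three stand-ins do no real work, and the "done" signal and the way a step limit ends the loop are assumptions made for illustration, not documented behavior of `--max-steps`.

```shell
#!/usr/bin/env bash
# Conceptual sketch only; every name here is hypothetical.

# Stand-in: a real client would capture the primary display.
capture_screen() { echo "/tmp/step.png"; }

# Stand-in: a real client would ask the cloud or local model for the next action.
# Here it reports "done" from the third step on so the loop can be exercised offline.
next_action() {
  local step="$3"
  if [ "$step" -ge 3 ]; then echo "done"; else echo "click"; fi
}

# Stand-in: a real client would perform the click, keystroke, scroll, and so on.
perform() { :; }

# One screenshot and one model decision per step, bounded by max_steps.
# Prints the number of steps taken.
run_loop() {
  local task="$1" max_steps="$2" step=0 shot action
  while [ "$step" -lt "$max_steps" ]; do
    step=$((step + 1))
    shot="$(capture_screen)"
    action="$(next_action "$task" "$shot" "$step")"
    if [ "$action" = "done" ]; then break; fi
    perform "$action"
  done
  echo "$step"
}
```

With these stand-ins, `run_loop "demo" 10` stops after three steps because the model stub signals completion, while `run_loop "demo" 2` stops at the step limit.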

Supported Interactions

click · type · hotkey · scroll · drag · mouse move · screenshot · wait · app launch · url direction

Status Panel

A small UI panel is displayed in the top-right corner of the screen to track and manage the current session.

Data, Privacy & Safety

What is sent: Screenshots of the primary display and the task description are sent to mano.mininglamp.com — these are the minimal inputs required for the vision model to determine the next action.
What is NOT sent: No local files, clipboard content, or system credentials are read or transmitted. All network calls are in a single module (task_model.py) for easy review.
Local mode: All inference runs on-device using Mano-P (model weights). No data leaves the machine.
Authentication: No API key or credentials are required. The client identifies itself with a locally generated device ID (~/.myapp_device_id) — no secrets are embedded in the binary.
Supply chain: The full client is open source. The Homebrew formula builds directly from this public source, ensuring the installed binary is fully auditable.
User control: Users can stop any session at any time via the UI panel or mano-cua stop.

Important Notes

Do not use the mouse or keyboard during the task. Manual input while mano-cua is running may cause unexpected behavior.
Multiple displays: only the primary display is used. All mouse movements, clicks, and screenshots are restricted to that display.

Platform Support

macOS is the preferred and most tested platform. Windows and Linux support is not yet complete, so minor issues are expected.

Usage Guidance

This skill is coherent for GUI automation, but exercise caution before installing and running it on machines that display sensitive information. Actions to consider:

- Review the Homebrew formula (Mininglamp-AI/tap/mano-cua) and the upstream repo (GitHub) before installing to confirm what the built binary does and which endpoints it contacts.
- If you are privacy-sensitive, prefer the local mode (on-device Mano-P) when available (macOS Apple Silicon) to avoid sending screenshots off-device.
- Treat the device ID file (~/.myapp_device_id) as a persistent artifact; inspect its contents and permissions after first run.
- Test in a controlled environment (VM or spare account) and avoid running while passwords, 2FA codes, or other secrets are visible on-screen.
- Verify TLS and hostnames (mano.mininglamp.com) and consider network monitoring to confirm traffic matches the documentation.

If you want a stronger assurance that only the stated data is transmitted, request the exact Homebrew formula and the task_model.py network logic from the maintainer or inspect the built binary's network calls locally before using it on sensitive systems.
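The advice to inspect the device ID file's permissions can be scripted. The sketch below reads the mode string from `ls -ld` rather than `stat`, because `stat` flags differ between macOS and Linux. The function names `file_mode` and `is_private` are made up for illustration, and nothing here assumes anything about the file's contents or the permissions mano-cua actually sets.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with mano-cua).

# Prints the 10-character mode string for a path, for example -rw-------.
file_mode() {
  ls -ld "$1" | cut -c1-10
}

# Returns 0 when neither group nor others have any access to the path.
is_private() {
  case "$(file_mode "$1")" in
    ????------) return 0 ;;
    *) return 1 ;;
  esac
}
```

After a first run, `file_mode ~/.myapp_device_id` shows the current permissions, and `is_private ~/.myapp_device_id` succeeds only when the file is accessible to its owner alone.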

Capability Analysis

Type: OpenClaw Skill
Name: mano-cua
Version: 1.0.4

The 'mano-cua' skill performs high-risk GUI automation by capturing screenshots and executing local system actions (clicking, typing, dragging) based on responses from a remote cloud endpoint (mano.mininglamp.com). While the SKILL.md documentation is transparent about this behavior and provides a 'Local Mode' for privacy, the architecture inherently allows for remote-driven execution and data transmission to a third-party server. These capabilities, while necessary for the stated purpose, represent a significant attack surface for potential misuse or compromise of the local environment.

Capability Tags

requires-sensitive-credentials

Capability Assessment

✓ Purpose & Capability

Name and description match the declared behavior: a GUI automation client that captures screenshots and uses cloud (or local) vision models to decide actions. Requesting no credentials and having both cloud and on-device modes is consistent with the stated purpose.

ℹ Instruction Scope

The SKILL.md explicitly directs the client to capture screenshots of the primary display and send them (plus the task description) to mano.mininglamp.com — this is within scope for GUI automation but intrinsically privacy-sensitive. The doc also mentions a local device-id file (~/.myapp_device_id) even though the registry metadata lists no required config paths; the skill claims it does not read clipboard or system credentials, but there is no code bundled here to independently verify that claim.

ℹ Install Mechanism

Install via a Homebrew formula Mininglamp-AI/tap/mano-cua. That's a third‑party tap (not the core brew repo), which is reasonable but worth reviewing. Windows instructions point to GitHub Releases (manual download). No install archives inside the skill bundle to inspect.

✓ Credentials

No environment variables, API keys, or credentials are requested, which aligns with the description. The only persistent artifact mentioned is a locally generated device ID file (~/.myapp_device_id); that path was not declared in the registry metadata and should be noted by users.

✓ Persistence & Privilege

always:false and normal autonomous invocation. The skill displays a small status UI and may write a device-id file, but it does not require system-wide privileges or declare forced persistence. Nothing in the manifest indicates it modifies other skills or global agent settings.

Version History

v1.0.4

Version 1.0.4 of mano-cua introduces local, on-device inference and updates project links:

- Added "local mode" for on-device MLX-based Mano-P model runs (macOS Apple Silicon only), ensuring no data leaves the machine.
- Added setup guidance and CLI options for local mode, including new commands: `mano-cua check`, `install-sdk`, `install-model`, and the `--local` flag.
- Expanded usage examples and full CLI command details.
- Updated project homepage and package source from `HanningWang` to `Mininglamp-AI`.
- Enhanced privacy section to detail local mode with no network inference.
- Minor fixes and improvements to documentation structure and clarity.

v1.0.3

- Added command-line usage instructions with argument details for `mano-cua run` and `mano-cua stop`.
- Updated Homebrew formula description for improved clarity about installation source.
- Minor improvements and clarifications in documentation, especially about supply chain and platform support.

v1.0.2

- Clarified privacy and network behavior: all network calls are now documented as being in a single, easily reviewable module.
- Added details on authentication: no API key or credentials are required; identification uses a locally generated device ID.
- Small refinements and clarifications in the Data, Privacy & Safety section for greater transparency.

v1.0.1

- Clarified the open source and supply chain policy, including a link to the Homebrew formula and specifying builds from source.
- Added information about required local permissions for screen capture and keyboard/mouse control, including macOS prompts.
- Expanded privacy & safety section to reflect these updates.
- No functional or usage changes.

v1.0.0

Initial release of mano-cua: desktop GUI automation via natural language.

- Enables automation of desktop apps using natural language commands and visual screen interaction.
- Captures screenshots and sends them, alongside user tasks, to a cloud-based vision model for action inference.
- Supports macOS (preferred), with ongoing adaptation for Windows and Linux.
- Provides actions like click, type, scroll, drag, hotkey, app launch, and more.
- Features a status panel for task tracking and easy session management.
- Emphasizes user privacy: only screenshots and task descriptions are sent; no sensitive local data is transmitted.

Metadata

Slug mano-cua

Version 1.0.4

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 5

Frequently Asked Questions

What is mano-cua?

Computer use for GUI automation tasks via VLA models. Use when the user describes a task in natural language that requires visual screen interaction and no A... It is an AI Agent Skill for Claude Code / OpenClaw, with 1113 downloads so far.

How do I install mano-cua?

Run "/install mano-cua" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is mano-cua free?

Yes, mano-cua is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does mano-cua support?

mano-cua targets desktops running macOS, Windows, or Linux. macOS is the preferred and most tested platform, and local mode requires macOS on Apple Silicon.

Who created mano-cua?

It is built and maintained by HanningWang (@hanningwang); the current version is v1.0.4.
