GUI Agent

Name: GUI Agent
Author: alfredjamesli

By AlfredJamesLi · GitHub · v1.0.1 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 189

Install in OpenClaw

/install gui-claw

Description

GUI automation via visual detection. Clicking, typing, reading content, navigating menus, filling forms — all through screenshot → detect → act workflow. Sup...

Usage (SKILL.md)

GUI Agent

STEP 0: Activate Platform (MANDATORY FIRST STEP)

Before any GUI operation, run:

python3 {baseDir}/scripts/activate.py

This detects your OS, sets up the correct action commands, and outputs platform context. After running, {baseDir}/actions/_actions.yaml contains your platform's commands.
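As a rough illustration of the kind of OS detection described above, the following sketch maps the running platform to an action-definition file. It is not the contents of activate.py; the file names `macos.yaml` and `linux.yaml` and the function `detect_platform` are assumptions made for this example.

```python
import platform

# Hypothetical mapping from platform.system() values to action-definition files.
# The real file names used by the skill are not shown in this listing.
ACTION_FILES = {
    "Darwin": "macos.yaml",
    "Linux": "linux.yaml",
}


def detect_platform(system_name=None):
    """Return (system name, action file) for the current or a given OS name."""
    name = system_name or platform.system()
    if name not in ACTION_FILES:
        # Fail loudly rather than guessing commands for an unknown OS.
        raise RuntimeError(f"unsupported platform: {name}")
    return name, ACTION_FILES[name]
```

Calling `detect_platform()` with no argument inspects the current machine; passing a name makes the lookup easy to exercise without switching operating systems.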

Workflow

OBSERVE → LEARN → ACT → VERIFY → SAVE

OBSERVE — Take screenshot → run OCR + detector → understand current state → read {baseDir}/skills/gui-observe/SKILL.md
LEARN — First time with an app? Save components to memory → read {baseDir}/skills/gui-learn/SKILL.md → learn_from_screenshot() auto-outputs app tips if available
ACT — Pick target → execute using _actions.yaml commands → verify → read {baseDir}/skills/gui-act/SKILL.md → read {baseDir}/actions/_actions.yaml for available commands
VERIFY — Screenshot again → confirm action succeeded
SAVE — Record state transitions to memory → read {baseDir}/skills/gui-memory/SKILL.md for memory structure
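The five phases above can be sketched as a single step function. This is a minimal outline under stated assumptions: the callables `observe`, `learn`, `act`, `verify`, and `save` are hypothetical stand-ins for whatever the sub-skills actually provide, and states are treated as opaque values.

```python
def run_step(observe, learn, act, verify, save, app_known):
    """Run one OBSERVE -> LEARN -> ACT -> VERIFY -> SAVE cycle.

    All five callables are hypothetical placeholders supplied by the caller.
    Returns (succeeded, list of phases that ran).
    """
    phases = []

    before = observe()              # screenshot + OCR/detector
    phases.append("OBSERVE")

    if not app_known:               # LEARN only the first time an app is seen
        learn(before)
        phases.append("LEARN")

    act(before)                     # execute the chosen action
    phases.append("ACT")

    after = observe()               # screenshot again
    ok = verify(before, after)      # confirm the action had the expected effect
    phases.append("VERIFY")

    if ok:                          # record the transition only when it worked
        save(before, after)
        phases.append("SAVE")
    return ok, phases
```

Keeping SAVE conditional on VERIFY is a design choice of this sketch, so that failed actions are not recorded as valid state transitions.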

Core Rules

Coordinates from detection only — OCR or GPA-GUI-Detector, NEVER from guessing
Look before you act — every action must be justified by what you observed
image tool = understanding only — use it to decide WHAT to click, get WHERE from OCR/detector
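One way to enforce the first rule in code is to derive click coordinates only from detection results and to refuse to proceed when a target is missing or ambiguous. The sketch below assumes a hypothetical detection format, a list of dicts with `text` and `bbox` as `(x, y, width, height)`; the skill's real OCR/detector output may be structured differently.

```python
def center(bbox):
    """Return the integer center point of an (x, y, width, height) box."""
    x, y, w, h = bbox
    return (x + w // 2, y + h // 2)


def locate(label, detections):
    """Return click coordinates for `label`, taken only from detections.

    `detections` is a hypothetical list of {"text": str, "bbox": (x, y, w, h)}.
    Raises LookupError instead of guessing when the target is absent or ambiguous.
    """
    wanted = label.strip().lower()
    matches = [d for d in detections if d["text"].strip().lower() == wanted]
    if not matches:
        raise LookupError(f"{label!r} not found on screen; observe again")
    if len(matches) > 1:
        raise LookupError(f"{label!r} matched {len(matches)} elements; disambiguate first")
    return center(matches[0]["bbox"])
```

For example, a detection `{"text": "Submit", "bbox": (100, 200, 80, 30)}` resolves to the click point `(140, 215)`.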

Sub-Skills Reference

Sub-Skill	When to read
`skills/gui-observe/SKILL.md`	Before screenshots or detection
`skills/gui-learn/SKILL.md`	Before learning a new app
`skills/gui-act/SKILL.md`	Before any click/type action
`skills/gui-memory/SKILL.md`	For memory structure details
`skills/gui-workflow/SKILL.md`	For multi-step navigation
`skills/gui-setup/SKILL.md`	For first-time machine setup
`skills/gui-report/SKILL.md`	For task performance reporting

Security Recommendations

This package is broadly coherent for GUI automation but has several items you should verify before installing or running:

1) Inspect scripts/setup.sh, scripts/gui_action.py, scripts/backends/http_remote.py and scripts/backends/ssh_remote (if present) to understand what is sent to remote hosts and whether screenshots or inputs could be exfiltrated.
2) Review skills/gui-report/scripts/tracker.py: it reads ~/.openclaw/.../sessions/sessions.json to collect token/session info and writes logs and a .tracker_state.json file. Decide whether that access is acceptable.
3) Run any installation or the setup script in an isolated environment (a throwaway VM or container) first. The setup creates ~/gui-agent-env and downloads large models into your home directory.
4) If you use remote control (--remote), restrict the endpoints to trusted hosts and audit the remote server implementation. Remote endpoints can execute clicks and typing and receive screenshots.
5) Do not grant accessibility or elevated permissions until you have confirmed the exact commands the skill will run. After testing, remove any permissions you do not trust.
6) If unsure, ask the author for a minimal install/run checklist or a signed release, and consider a code review by a trusted party before enabling this in a production agent.
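The advice in item 4 to restrict remote endpoints to trusted hosts could be applied with a small allow-list check before any URL is passed to --remote. This is a sketch only: `TRUSTED_HOSTS` and `is_trusted_remote` are hypothetical names, and the skill itself is not known to ship such a check.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; replace with the hosts you actually control.
TRUSTED_HOSTS = {"127.0.0.1", "localhost"}


def is_trusted_remote(url, trusted=TRUSTED_HOSTS):
    """Return True only if `url` uses an expected scheme and an allow-listed host."""
    parts = urlsplit(url)
    if parts.scheme not in {"http", "https", "ssh"}:
        return False
    host = parts.hostname  # lowercased by urlsplit; None if no host was parsed
    return host is not None and host in trusted
```

A wrapper script could call `is_trusted_remote(url)` and abort before launching the agent when it returns False.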

Functional Analysis

Type: OpenClaw Skill
Name: gui-claw
Version: 1.0.1

The gui-claw skill bundle is a legitimate and highly sophisticated GUI automation framework designed for local and remote desktop interaction. It utilizes YOLO-based object detection (GPA-GUI-Detector), OCR (Apple Vision/EasyOCR), and template matching to perceive screen states, which are then managed through a structured memory system in app_memory.py. While the bundle includes powerful capabilities such as remote command execution via http_remote.py and clipboard manipulation in platform_input.py, these features are strictly aligned with its stated purpose of GUI automation and benchmarking (e.g., OSWorld). No evidence of malicious intent, data exfiltration, or unauthorized persistence was found.

Capability Assessment

✓ Purpose & Capability

Name/description align with the included code: screenshot → detect → act workflow, OCR, visual memory, local and remote backends (HTTP/SSH). Heavy ML deps and a setup script are proportionate to the stated detection/OCR features.

⚠ Instruction Scope

Runtime instructions ask the operator to run scripts (activate.py, setup.sh) that detect platform, create venvs, download models, and produce actions/_actions.yaml. The code (gui_action.py + backends) supports --remote <URL> (HTTP/SSH) which will send/receive commands/screenshots to arbitrary hosts. Tracker and memory code read/write files under the user's home and OpenClaw workspace (e.g., ~/.openclaw sessions, memory/apps), so the skill accesses data outside its own directory without declaring that scope.

ℹ Install Mechanism

No registry install spec is declared (instruction-only), but scripts/setup.sh and README instruct the user to create a home venv, install heavy packages (PyTorch/YOLO/etc.) and clone models from HuggingFace — these are expected but are intrusive (large downloads, system package installs). The install flow relies on network downloads from public sources (GitHub/HuggingFace).

⚠ Credentials

The skill declares no required env vars/credentials, yet code reads OpenClaw session files (~/.openclaw/.../sessions.json) to extract token/session info, and reads/writes memory under user/home (~/GPA-GUI-Detector, ~/gui-agent-env, skill memory, logs). Accessing session/token data and user memory is sensitive; these accesses aren't documented as required credentials in the metadata.

ℹ Persistence & Privilege

The skill does not set always:true and does not request platform-wide privileges explicitly. However, setup.sh and other scripts create persistent artifacts in the user's home (venv, downloaded models, memory directories, logs, actions/_actions.yaml), and tracker auto-saves/rotates session state — these are persistent changes that the user should review before running.
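To review the persistent artifacts mentioned above, a short script can report which of the named home-directory folders exist and how much space they take. The directory names `gui-agent-env` and `GPA-GUI-Detector` come from this review; the helper names `dir_size` and `audit` are assumptions of this sketch, and it does not cover logs or memory directories whose exact paths are not given here.

```python
from pathlib import Path

# Home-directory folders named in the review above.
ARTIFACTS = ["gui-agent-env", "GPA-GUI-Detector"]


def dir_size(path):
    """Total size in bytes of all regular files under `path`."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def audit(home=None, names=ARTIFACTS):
    """Map each artifact name to its size in bytes, or None if it is absent."""
    base = Path(home) if home else Path.home()
    report = {}
    for name in names:
        target = base / name
        report[name] = dir_size(target) if target.is_dir() else None
    return report
```

Running `audit()` before and after setup gives a simple before/after view of what was written to the home directory.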

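How to Use

Make sure OpenClaw is installed (locally or via Docker).
Enter the install command in the chat box: /install gui-claw
After installation, invoke the Skill by name or trigger it with /gui-claw
Provide the required inputs according to the Skill's parameter documentation to get structured output.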

Version History

v1.0.1

gui-claw 1.0.1
- Major expansion of documentation and benchmarks, including detailed design principles, workflow descriptions, and visual method guidance.
- Added OS-specific action definitions for Linux and macOS.
- Introduced platform detection and setup scripts.
- Expanded memory/app metadata coverage for multiple desktop apps.
- Initial support for both macOS and Linux automated GUI actions.

v1.0.0

Initial release (v1.0.0)
- Vision-based GUI automation skill for macOS using GPA-GUI-Detector + OCR
- Detection-first design: all click coordinates from detectors, never from LLM estimation
- Visual memory system: component templates, activity-based forgetting, state identification
- State graph navigation: automatic transition recording, BFS path planning
- Hierarchical verification: template matching → full detection → VLM fallback
- OSWorld Chrome domain benchmark: 97.8% success rate (45/46 tasks)
- Sub-skills: gui-observe, gui-act, gui-learn, gui-memory, gui-workflow, gui-report, gui-setup
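The "BFS path planning" over a state graph mentioned in the release notes can be illustrated with a standard breadth-first search. This is a generic sketch, not the skill's implementation: the graph shape `{state: {action: next_state}}` and the name `plan_path` are assumptions for the example.

```python
from collections import deque


def plan_path(graph, start, goal):
    """Return the shortest list of actions from `start` to `goal`, or None.

    `graph` is a hypothetical {state: {action: next_state}} mapping built
    from recorded transitions.
    """
    if start == goal:
        return []
    parent = {start: None}          # state -> (previous state, action taken)
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for action, nxt in graph.get(state, {}).items():
            if nxt in parent:       # already reached by a path at least as short
                continue
            parent[nxt] = (state, action)
            if nxt == goal:
                # Walk back through parents to rebuild the action sequence.
                actions = []
                node = goal
                while parent[node] is not None:
                    node, taken = parent[node]
                    actions.append(taken)
                return actions[::-1]
            queue.append(nxt)
    return None                     # goal unreachable from recorded transitions
```

Because BFS explores states in order of distance, the first path that reaches the goal uses the fewest recorded actions.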

Metadata

Slug: gui-claw

Version: 1.0.1

License: MIT-0

Total installs: 0

Current installs: 0

Versions: 2

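FAQ

What is GUI Agent?

GUI automation via visual detection. Clicking, typing, reading content, navigating menus, filling forms — all through screenshot → detect → act workflow. Sup... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 189 total downloads to date.

How do I install GUI Agent?

Run the command "/install gui-claw" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is required.

Is GUI Agent free?

Yes. GUI Agent is completely free, released under the MIT-0 license, and can be freely downloaded, installed, and used.

Which platforms does GUI Agent support?

GUI Agent is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops GUI Agent?

It is developed and maintained by AlfredJamesLi (@alfredjamesli); the current version is v1.0.1.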
