alfredjamesli

GUI Agent

Author: AlfredJamesLi · GitHub · v1.0.1 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 189
Favorites: 2
Current installs: 0
Versions: 2
Install in OpenClaw
/install gui-claw
Description
GUI automation via visual detection. Clicking, typing, reading content, navigating menus, filling forms — all through screenshot → detect → act workflow. Sup...
Usage Instructions (SKILL.md)

GUI Agent

STEP 0: Activate Platform (MANDATORY FIRST STEP)

Before any GUI operation, run:

python3 {baseDir}/scripts/activate.py

This detects your OS, sets up the correct action commands, and outputs platform context. After running, {baseDir}/actions/_actions.yaml contains your platform's commands.
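
As a rough illustration of the detection step, the sketch below maps the running OS to an action-definition file. It is not the contents of activate.py: the function name `pick_action_file` and the file names in `ACTION_FILES` are hypothetical, chosen only to show the idea.

```python
import platform

# Hypothetical mapping from platform.system() values to action files.
# The real skill's file layout may differ.
ACTION_FILES = {
    "Darwin": "actions/macos.yaml",
    "Linux": "actions/linux.yaml",
}


def pick_action_file(system_name=None):
    """Return the action-definition file for the given (or current) OS."""
    name = system_name or platform.system()
    try:
        return ACTION_FILES[name]
    except KeyError:
        # Fail loudly instead of falling back to another platform's commands.
        raise RuntimeError(f"Unsupported platform: {name!r}")
```

Raising on an unknown platform, rather than defaulting, keeps the agent from running commands written for a different OS.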

Workflow

OBSERVE → LEARN → ACT → VERIFY → SAVE
  1. OBSERVE: Take screenshot → run OCR + detector → understand current state
     → read {baseDir}/skills/gui-observe/SKILL.md

  2. LEARN: First time with an app? Save components to memory
     → read {baseDir}/skills/gui-learn/SKILL.md
     → learn_from_screenshot() auto-outputs app tips if available

  3. ACT: Pick target → execute using _actions.yaml commands → verify
     → read {baseDir}/skills/gui-act/SKILL.md
     → read {baseDir}/actions/_actions.yaml for available commands

  4. VERIFY: Screenshot again → confirm action succeeded

  5. SAVE: Record state transitions to memory
     → read {baseDir}/skills/gui-memory/SKILL.md for memory structure
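
The five steps above can be sketched as a single loop iteration. This is a minimal illustration, not the skill's code: `Memory`, `run_step`, and the `observe`/`act` callables are hypothetical stand-ins for the screenshot, detection, and input layers.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    known_apps: set = field(default_factory=set)
    transitions: list = field(default_factory=list)


def run_step(app, target, observe, act, memory):
    """One OBSERVE -> LEARN -> ACT -> VERIFY -> SAVE pass.

    observe() returns a dict mapping element labels to (x, y) centers;
    act(x, y) performs the click. Both are caller-supplied stand-ins.
    Returns True if the screen changed after acting.
    """
    before = observe()                    # OBSERVE
    if app not in memory.known_apps:      # LEARN (first visit only)
        memory.known_apps.add(app)
    if target not in before:
        return False                      # target not detected: do not guess
    act(*before[target])                  # ACT
    after = observe()                     # VERIFY
    changed = after != before
    if changed:                           # SAVE the observed transition
        memory.transitions.append(
            (frozenset(before), target, frozenset(after))
        )
    return changed
```

Here a transition is recorded only after verification shows the screen changed, so memory never stores an action that had no visible effect.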

Core Rules

  • Coordinates from detection only — OCR or GPA-GUI-Detector, NEVER from guessing
  • Look before you act — every action must be justified by what you observed
  • image tool = understanding only — use it to decide WHAT to click, get WHERE from OCR/detector
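
One way to enforce the first rule is to derive every click point from a detected bounding box and refuse to act when no unique match exists. The sketch below assumes a hypothetical detection format, a list of `(text, (x, y, width, height))` tuples; the skill's actual OCR/detector output may be shaped differently.

```python
def click_point(detections, label):
    """Return the center of the single detected box whose text matches label.

    detections: list of (text, (x, y, width, height)) tuples, a
    hypothetical shape for OCR/detector output.
    """
    wanted = label.strip().lower()
    matches = [box for text, box in detections if text.strip().lower() == wanted]
    if len(matches) != 1:
        # Zero or ambiguous matches: refuse rather than guess coordinates.
        raise LookupError(
            f"expected exactly one match for {label!r}, found {len(matches)}"
        )
    x, y, w, h = matches[0]
    return (x + w // 2, y + h // 2)
```

For example, `click_point([("Save", (100, 200, 60, 20))], "save")` returns `(130, 210)`, the center of the detected box.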

Sub-Skills Reference

Sub-Skill                       When to read
skills/gui-observe/SKILL.md     Before screenshots or detection
skills/gui-learn/SKILL.md       Before learning a new app
skills/gui-act/SKILL.md         Before any click/type action
skills/gui-memory/SKILL.md      For memory structure details
skills/gui-workflow/SKILL.md    For multi-step navigation
skills/gui-setup/SKILL.md       For first-time machine setup
skills/gui-report/SKILL.md      For task performance reporting
Security Recommendations
This package is broadly coherent for GUI automation but has several items you should verify before installing or running:

  1. Inspect scripts/setup.sh, scripts/gui_action.py, scripts/backends/http_remote.py and scripts/backends/ssh_remote (if present) to understand what is sent to remote hosts and whether screenshots/inputs could be exfiltrated.
  2. Review skills/gui-report/scripts/tracker.py: it reads ~/.openclaw/.../sessions/sessions.json to collect token/session info and will write logs and a .tracker_state.json file; decide whether that access is acceptable.
  3. Run any installation or the setup script in an isolated environment (throwaway VM or container) first: the setup will create ~/gui-agent-env and download large models into your home.
  4. If you will use remote control (--remote), restrict the endpoints to trusted hosts and audit the remote server implementation; remote endpoints can execute clicks/typing and receive screenshots.
  5. Do not grant accessibility or elevated permissions until you confirm the exact commands the skill will run; after testing, remove permissions you do not trust.
  6. If unsure, ask the author for a minimal install/run checklist or a signed release; consider code review by a trusted party before enabling this in a production agent.
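
The advice to restrict remote endpoints to trusted hosts could be enforced with a small allowlist check before any value is passed to --remote. This is a sketch under assumptions: `TRUSTED_HOSTS` and `is_trusted_remote` are hypothetical names, and the URL forms the skill actually accepts (in particular for SSH) are not documented here.

```python
from urllib.parse import urlparse

# Assumption: replace with the hosts you actually control.
TRUSTED_HOSTS = frozenset({"127.0.0.1", "localhost"})


def is_trusted_remote(url, trusted=TRUSTED_HOSTS):
    """Return True only for http/https/ssh URLs whose host is allowlisted."""
    parsed = urlparse(url)
    return parsed.scheme in {"http", "https", "ssh"} and parsed.hostname in trusted
```

A wrapper script might call `is_trusted_remote(url)` and exit before launching the agent if it returns False.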
Functional Analysis
Type: OpenClaw Skill
Name: gui-claw
Version: 1.0.1

The gui-claw skill bundle is a legitimate and highly sophisticated GUI automation framework designed for local and remote desktop interaction. It utilizes YOLO-based object detection (GPA-GUI-Detector), OCR (Apple Vision/EasyOCR), and template matching to perceive screen states, which are then managed through a structured memory system in app_memory.py. While the bundle includes powerful capabilities such as remote command execution via http_remote.py and clipboard manipulation in platform_input.py, these features are strictly aligned with its stated purpose of GUI automation and benchmarking (e.g., OSWorld). No evidence of malicious intent, data exfiltration, or unauthorized persistence was found.
Capability Assessment
Purpose & Capability
Name/description align with the included code: screenshot → detect → act workflow, OCR, visual memory, local and remote backends (HTTP/SSH). Heavy ML deps and a setup script are proportionate to the stated detection/OCR features.
Instruction Scope
Runtime instructions ask the operator to run scripts (activate.py, setup.sh) that detect platform, create venvs, download models, and produce actions/_actions.yaml. The code (gui_action.py + backends) supports --remote <URL> (HTTP/SSH) which will send/receive commands/screenshots to arbitrary hosts. Tracker and memory code read/write files under the user's home and OpenClaw workspace (e.g., ~/.openclaw sessions, memory/apps), so the skill accesses data outside its own directory without declaring that scope.
Install Mechanism
No registry install spec is declared (instruction-only), but scripts/setup.sh and README instruct the user to create a home venv, install heavy packages (PyTorch/YOLO/etc.) and clone models from HuggingFace — these are expected but are intrusive (large downloads, system package installs). The install flow relies on network downloads from public sources (GitHub/HuggingFace).
Credentials
The skill declares no required env vars/credentials, yet code reads OpenClaw session files (~/.openclaw/.../sessions.json) to extract token/session info, and reads/writes memory under user/home (~/GPA-GUI-Detector, ~/gui-agent-env, skill memory, logs). Accessing session/token data and user memory is sensitive; these accesses aren't documented as required credentials in the metadata.
Persistence & Privilege
The skill does not set always:true and does not request platform-wide privileges explicitly. However, setup.sh and other scripts create persistent artifacts in the user's home (venv, downloaded models, memory directories, logs, actions/_actions.yaml), and tracker auto-saves/rotates session state — these are persistent changes that the user should review before running.
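
To review what a test run left behind, a short check can list which of the home-directory artifacts named above exist. This is a sketch only: `find_artifacts` is a hypothetical helper, and it covers just the two home-level paths the assessment names (~/gui-agent-env and ~/GPA-GUI-Detector), not the memory or log directories.

```python
from pathlib import Path

# Home-level paths named in the assessment above.
ARTIFACTS = ("gui-agent-env", "GPA-GUI-Detector")


def find_artifacts(home=None):
    """Return the names from ARTIFACTS that exist under the home directory."""
    root = Path(home) if home is not None else Path.home()
    return [name for name in ARTIFACTS if (root / name).exists()]
```

Calling `find_artifacts()` with no argument checks the current user's home directory.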
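How to Use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install gui-claw
  3. Once installed, call the Skill by name or trigger it with /gui-claw.
  4. Provide the inputs described in the Skill's parameter notes to get structured output.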
Version History
v1.0.1
gui-claw 1.0.1
  • Major expansion of documentation and benchmarks, including detailed design principles, workflow descriptions, and visual method guidance.
  • Added OS-specific action definitions for Linux and macOS.
  • Introduced platform detection and setup scripts.
  • Expanded memory/app metadata coverage for multiple desktop apps.
  • Initial support for both macOS and Linux automated GUI actions.
v1.0.0
Initial release (v1.0.0)
  • Vision-based GUI automation skill for macOS using GPA-GUI-Detector + OCR
  • Detection-first design: all click coordinates from detectors, never from LLM estimation
  • Visual memory system: component templates, activity-based forgetting, state identification
  • State graph navigation: automatic transition recording, BFS path planning
  • Hierarchical verification: template matching → full detection → VLM fallback
  • OSWorld Chrome domain benchmark: 97.8% success rate (45/46 tasks)
  • Sub-skills: gui-observe, gui-act, gui-learn, gui-memory, gui-workflow, gui-report, gui-setup
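
The "BFS path planning" item can be illustrated with a generic breadth-first search over recorded transitions. This sketch assumes a hypothetical graph shape, a dict mapping each state to a dict of `{action: next_state}`; it is not taken from the skill's source.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return the shortest list of actions from start to goal, or None.

    graph: dict mapping state -> {action: next_state} (hypothetical shape).
    """
    if start == goal:
        return []
    queue = deque([start])
    came_from = {start: None}  # state -> (previous state, action taken)
    while queue:
        state = queue.popleft()
        for action, nxt in graph.get(state, {}).items():
            if nxt in came_from:
                continue  # already reached by a path at least as short
            came_from[nxt] = (state, action)
            if nxt == goal:
                # Walk back to the start, collecting actions in reverse.
                actions = []
                node = nxt
                while came_from[node] is not None:
                    node, act = came_from[node]
                    actions.append(act)
                return actions[::-1]
            queue.append(nxt)
    return None
```

Because BFS explores states in order of distance, the first time the goal is reached the recovered action list has the fewest steps among recorded routes.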
Metadata
Slug: gui-claw
Version: 1.0.1
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 2
FAQ

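What is GUI Agent?

GUI Agent is an AI Agent Skill plugin for Claude Code / OpenClaw that performs GUI automation via visual detection. It has 189 total downloads so far.

How do I install GUI Agent?

Run the command "/install gui-claw" in the OpenClaw or Claude Code chat box for one-step installation; no additional configuration is needed.

Is GUI Agent free?

Yes. GUI Agent is completely free and released under the MIT-0 license; it can be downloaded, installed, and used freely.

Which platforms does GUI Agent support?

GUI Agent is tagged cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed. The version history lists action definitions for macOS and Linux.

Who develops GUI Agent?

GUI Agent is developed and maintained by AlfredJamesLi (@alfredjamesli). The current version is v1.0.1.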
