GUI Agent
STEP 0: Activate Platform (MANDATORY FIRST STEP)
Before any GUI operation, run:
python3 {baseDir}/scripts/activate.py
This detects your OS, sets up the correct action commands, and outputs platform context.
After running, {baseDir}/actions/_actions.yaml contains your platform's commands.
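As a rough illustration, the activation step could be wrapped so that later steps refuse to run until the actions file exists. This is a minimal sketch, not the skill's own code: `activation_command`, `actions_file`, `ensure_activated`, and the `base_dir` argument are hypothetical names, and `sys.executable` stands in for `python3`. The only details taken from the text above are the script path `scripts/activate.py` and the output file `actions/_actions.yaml`.

```python
import subprocess
import sys
from pathlib import Path


def activation_command(base_dir):
    """Build the command line for the activation script (path taken from the text above)."""
    return [sys.executable, str(Path(base_dir) / "scripts" / "activate.py")]


def actions_file(base_dir):
    """Location where activation is said to write the platform's commands."""
    return Path(base_dir) / "actions" / "_actions.yaml"


def ensure_activated(base_dir, runner=subprocess.run):
    """Run activation, then confirm the actions file exists before any GUI work.

    `runner` is injectable (an assumption of this sketch) so the check can be
    exercised without the real script.
    """
    runner(activation_command(base_dir), check=True)
    path = actions_file(base_dir)
    if not path.is_file():
        raise FileNotFoundError(f"activation did not produce {path}")
    return path
```

Calling `ensure_activated(base_dir)` once at the start of a session would fail early if activation did not complete, rather than letting a later click or type action run without platform commands.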
Workflow
OBSERVE → LEARN → ACT → VERIFY → SAVE
- OBSERVE: Take screenshot → run OCR + detector → understand current state → read {baseDir}/skills/gui-observe/SKILL.md
- LEARN: First time with an app? Save components to memory → read {baseDir}/skills/gui-learn/SKILL.md → learn_from_screenshot() auto-outputs app tips if available
- ACT: Pick target → execute using _actions.yaml commands → verify → read {baseDir}/skills/gui-act/SKILL.md → read {baseDir}/actions/_actions.yaml for available commands
- VERIFY: Screenshot again → confirm action succeeded
- SAVE: Record state transitions to memory → read {baseDir}/skills/gui-memory/SKILL.md for memory structure
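The five phases above can be sketched as a single control loop. This is an illustration under stated assumptions, not the skill's implementation: `Memory`, `run_step`, and the signatures of the `observe`, `learn`, and `act` callables are all hypothetical, and verification is reduced to "the observed state changed", which a real check would likely refine.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Hypothetical in-memory stand-in for the skill's saved app knowledge."""
    known_apps: set = field(default_factory=set)
    transitions: list = field(default_factory=list)


def run_step(app, target, observe, learn, act, memory):
    """One pass through OBSERVE -> LEARN -> ACT -> VERIFY -> SAVE.

    `observe`, `learn`, and `act` are caller-supplied callables; their
    signatures here are illustrative assumptions, not the skill's API.
    """
    before = observe()                    # OBSERVE: current state from screenshot + detection
    if app not in memory.known_apps:      # LEARN: only the first time an app is seen
        learn(app, before)
        memory.known_apps.add(app)
    act(target, before)                   # ACT: execute against the observed state
    after = observe()                     # VERIFY: observe again
    succeeded = after != before           # simplistic success test (assumption)
    if succeeded:                         # SAVE: record the state transition
        memory.transitions.append((app, before, target, after))
    return succeeded
```

Passing the phases in as callables keeps the ordering explicit and lets the loop be exercised with stubs before any real screenshot or click code is involved.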
Core Rules
- Coordinates from detection only — OCR or GPA-GUI-Detector, NEVER from guessing
- Look before you act — every action must be justified by what you observed
- image tool = understanding only — use it to decide WHAT to click, get WHERE from OCR/detector
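The "coordinates from detection only" rule can be made concrete with a small lookup that fails instead of guessing. This is a minimal sketch under assumptions: the `Detection` shape and the `locate` helper are hypothetical and do not reflect the actual output format of the OCR step or GPA-GUI-Detector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One OCR/detector hit: a label and its bounding box in pixels (hypothetical shape)."""
    label: str
    x: int
    y: int
    width: int
    height: int

    def center(self):
        """Center point of the bounding box, using integer pixel arithmetic."""
        return (self.x + self.width // 2, self.y + self.height // 2)


def locate(target, detections):
    """Return click coordinates for `target`, taken only from detections.

    Raises LookupError rather than guessing when the target is missing or ambiguous.
    """
    wanted = target.strip().lower()
    matches = [d for d in detections if d.label.strip().lower() == wanted]
    if not matches:
        raise LookupError(f"{target!r} not detected; observe again instead of guessing")
    if len(matches) > 1:
        raise LookupError(f"{target!r} matched {len(matches)} elements; disambiguate first")
    return matches[0].center()
```

In this arrangement the image tool decides which label to pass as `target`, while the coordinates themselves only ever come from the detection list.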
Sub-Skills Reference
| Sub-Skill | When to read |
|---|---|
| skills/gui-observe/SKILL.md | Before screenshots or detection |
| skills/gui-learn/SKILL.md | Before learning a new app |
| skills/gui-act/SKILL.md | Before any click/type action |
| skills/gui-memory/SKILL.md | For memory structure details |
| skills/gui-workflow/SKILL.md | For multi-step navigation |
| skills/gui-setup/SKILL.md | For first-time machine setup |
| skills/gui-report/SKILL.md | For task performance reporting |
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install gui-claw
- After installation, invoke the skill by name or use /gui-claw
- Provide required inputs per the skill's parameter spec and get structured output
What is GUI Agent?
GUI Agent provides GUI automation via visual detection: clicking, typing, reading content, navigating menus, and filling forms, all through a screenshot → detect → act workflow. Sup... It is an AI Agent Skill for Claude Code / OpenClaw, with 189 downloads so far.
How do I install GUI Agent?
Run "/install gui-claw" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is GUI Agent free?
Yes, GUI Agent is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does GUI Agent support?
GUI Agent is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created GUI Agent?
It is built and maintained by AlfredJamesLi (@alfredjamesli); the current version is v1.0.1.