
UI-Agent

Name: UI-Agent
Author: nimaansari

by Nima Ansari · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Downloads: 100

Install in OpenClaw

/install ui-agent

Description

Universal UI automation for browsers and desktops. Chrome DevTools Protocol + native APIs. 15/15 verified tests.

Usage Guidance

This skill contains real code for browser (CDP) and desktop automation and will try to launch Chrome, Xvfb/GUI tools, and run system utilities (xdotool, pgrep, scrot/gnome-screenshot, etc.). Before installing or enabling it:

1) Only install from a trusted source: the manifest’s source/homepage are absent and the repo links in docs are placeholders.
2) Expect to need system packages (Chrome/Chromium, display server/Xvfb or Wayland tools, xdotool/ydotool, screenshot tools); validate and control those dependencies in an isolated environment (VM/container).
3) Review the src/ files (chrome_session_vbox_fixed.py, desktop_helpers.py, cdp_typer.py) to ensure you’re comfortable with the skill’s ability to run shell commands, kill/relaunch processes, and read/write files (it writes cookies and screenshots to /tmp).
4) Don’t grant this skill access to machines with sensitive active sessions unless you trust it: its cookie read/restore behavior can persist web sessions.
5) Run the skill initially in a sandboxed VM with minimal privileges and inspect test outputs before letting it run autonomously.

Capability Analysis

Type: OpenClaw Skill
Name: ui-agent
Version: 1.0.0

The bundle provides a comprehensive suite for browser and desktop automation using the Chrome DevTools Protocol (CDP) and X11 utilities like xdotool and wmctrl. It includes high-risk capabilities such as executing shell commands via subprocess and os.system (src/cdp_typer.py, src/desktop_helpers.py), bypassing browser security features (e.g., --no-sandbox and suppress_origin=True), and programmatically extracting and restoring browser cookies for session persistence (tests/test_sp1_official.py). While these features are aligned with the stated goal of a universal UI automation framework, the broad system access and potential for abuse (such as session hijacking or unauthorized remote control) meet the threshold for a suspicious classification despite the lack of clear evidence of malicious intent.
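As a starting point for reviewing the source yourself, the sketch below flags lines that match the capabilities named in this analysis (os.system, subprocess, --no-sandbox, suppress_origin=True, document.cookie). It is not part of the skill; the function names and the "ui-agent" checkout directory are hypothetical, and a text match only tells you where to look, not whether the code is harmful.

```python
import re
from pathlib import Path

# Patterns mirror the capabilities named in the analysis; the labels are ours.
RISKY_PATTERNS = {
    "os.system call": re.compile(r"\bos\.system\s*\("),
    "subprocess use": re.compile(r"\bsubprocess\.\w+\s*\("),
    "Chrome sandbox disabled": re.compile(r"--no-sandbox"),
    "origin check suppressed": re.compile(r"suppress_origin\s*=\s*True"),
    "cookie access": re.compile(r"document\.cookie"),
}


def scan_text(text):
    """Return (line_number, label) pairs for every pattern found in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in RISKY_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label))
    return hits


def scan_tree(root):
    """Scan every .py file under root; returns {path: [(line, label), ...]}."""
    report = {}
    for path in sorted(Path(root).rglob("*.py")):
        hits = scan_text(path.read_text(encoding="utf-8", errors="replace"))
        if hits:
            report[str(path)] = hits
    return report


if __name__ == "__main__":
    # "ui-agent" is a hypothetical checkout directory; adjust to your copy.
    for path, hits in scan_tree("ui-agent").items():
        for number, label in hits:
            print(f"{path}:{number}: {label}")
```

Each reported line is a place to read the surrounding code by hand, for example to see what command string is passed to the shell.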

Capability Assessment

⚠ Purpose & Capability

The name/description (UI automation via CDP + native APIs) matches the code and tests: the repo contains a CDP wrapper, VirtualBox-safe Chrome launcher, AT-SPI2/X11 helpers, and verification utilities. However, registry metadata claims no required binaries or environment setup while the code and docs clearly expect Chrome/Chromium, Xvfb (or a display), xdotool/ydotool, screenshot tools, etc. That mismatch (metadata says 'none' while the implementation requires many system-level tools) is unexpected and should be resolved before trusting the skill.

ℹ Instruction Scope

The SKILL.md and docs explicitly instruct the agent to launch and reuse Chrome, send CDP commands, take screenshots, run shell commands, kill/relaunch Chrome, write/read files under /tmp, and call subprocess tools (pgrep, xdotool, xclip). Those actions are within the stated purpose (UI automation), but they are powerful: the skill can execute arbitrary shell commands via its shell() helper and manipulate other processes. The SKILL.md documents this capability (e.g., agent.shell, desktop_helpers.shell), so it is not hidden, but it does grant broad system access, which users should treat as high-impact.

⚠ Install Mechanism

There is no install spec (instruction-only), which reduces supply-chain risk, but the repo and docs require system-level packages (Chrome, Xvfb, xdotool, scrot/gnome-screenshot, ydotool, etc.). The skill does not declare these required system binaries in the registry metadata. If you install/run the skill as-is it may fail or behave unpredictably on systems that lack those tools; conversely, an environment with those tools gives the skill extensive ability to control the host. No remote download or archive installs were observed in the manifest.
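Because the registry metadata does not declare these binaries, it can help to check which of them a host actually exposes before running the skill. The sketch below is one way to do that with the standard library. The tool names come from this review, but the Chrome executable names and the grouping (any one tool per group counts as present) are assumptions, since the listing does not say exactly which binaries the skill invokes.

```python
import shutil

# Tool names are taken from the review; Chrome executable names are an
# assumption, because the listing does not name the binary the skill launches.
REQUIRED_GROUPS = {
    "browser": ["google-chrome", "chromium", "chromium-browser"],
    "display": ["Xvfb"],
    "input": ["xdotool", "ydotool"],
    "screenshot": ["scrot", "gnome-screenshot"],
    "process lookup": ["pgrep"],
}


def check_tools(groups, which=shutil.which):
    """Map each group to the tools from it found on PATH (empty list = none)."""
    return {
        group: [tool for tool in tools if which(tool) is not None]
        for group, tools in groups.items()
    }


def missing_groups(found):
    """Return the groups for which no candidate tool was found."""
    return sorted(group for group, tools in found.items() if not tools)


if __name__ == "__main__":
    found = check_tools(REQUIRED_GROUPS)
    for group, tools in found.items():
        print(f"{group}: {', '.join(tools) if tools else 'MISSING'}")
```

Running this inside the sandbox VM or container shows both what the skill would be missing and how much host control it would gain where the tools are present.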

ℹ Credentials

The skill does not request any environment variables, credentials, or secret tokens in its metadata (primaryEnv: none), which is appropriate. It does, however, manipulate browser cookies (saves/restores document.cookie), write files under /tmp, and communicate with external sites (e.g., httpbin.org) as part of tests. Those behaviors are coherent for a UI automation tool but mean the skill can capture and restore session material (cookies) — a legitimate feature for automation but also something to be mindful of if you have sensitive sessions on the host.
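Since the skill writes cookies and screenshots under /tmp, one way to see what a run left behind is to snapshot the directory before and after and compare. This is a minimal sketch with hypothetical helper names, not something the skill provides; it reports new or modified files only and does not inspect their contents.

```python
import os
import tempfile


def snapshot(directory=None):
    """Record size and modification time for every file under directory.

    Defaults to the system temp directory (usually /tmp on Linux).
    """
    directory = directory or tempfile.gettempdir()
    state = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            state[path] = (info.st_size, info.st_mtime_ns)
    return state


def changed_files(before, after):
    """Return paths that are new or modified in `after` relative to `before`."""
    return sorted(p for p, meta in after.items() if before.get(p) != meta)
```

Typical use would be to call snapshot() once, run the skill's tests in the sandbox, call snapshot() again, and review every path returned by changed_files(before, after) for saved cookies or screenshots.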

✓ Persistence & Privilege

The skill is not always-enabled (always:false) and does not request elevated platform privileges in the manifest. Autonomy (disable-model-invocation:false) is the platform default. The skill's code can launch and kill browsers and run shell commands, but it does not appear to modify other skills or system-wide OpenClaw configuration. Still, combining autonomous invocation with the ability to execute shell commands and manage processes increases the potential blast radius if the skill is given unsupervised authority.

How to Use

1) Make sure OpenClaw is installed (local or Docker).
2) Run the install command in chat: /install ui-agent
3) After installation, invoke the skill by name or use /ui-agent.
4) Provide required inputs per the skill's parameter spec and get structured output.

Version History

v1.0.0

UIAgent v1.0.0: Initial public release
- Universal UI automation framework for browsers (CDP) and desktops (OS-native APIs)
- 100% test coverage: 15/15 real-world verified tests passing
- Supports automation for web workflows, dynamic UIs, and desktop applications
- Evidence-based verification: screenshot hashing, DOM checks, file checks
- Reliable cross-browser session management and persistence, including headless Chrome
- Includes robust API: JavaScript execution, mouse/keyboard simulation, screenshots, and more

Metadata

Slug ui-agent

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is UI-Agent?

Universal UI automation for browsers and desktops. Chrome DevTools Protocol + native APIs. 15/15 verified tests. It is an AI Agent Skill for Claude Code / OpenClaw, with 100 downloads so far.

How do I install UI-Agent?

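Run "/install ui-agent" in the OpenClaw or Claude Code chat to install it in one step. The install command itself needs no extra setup, but the skill expects system packages that the registry metadata does not declare: Chrome/Chromium, Xvfb or another display, xdotool/ydotool, and a screenshot tool such as scrot or gnome-screenshot.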

Is UI-Agent free?

Yes, UI-Agent is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does UI-Agent support?

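The listing tags UI-Agent as cross-platform, and it runs wherever OpenClaw / Claude Code is available. Note that the desktop helpers described in the capability analysis rely on X11 utilities such as xdotool and wmctrl.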

Who created UI-Agent?

It is built and maintained by Nima Ansari (@nimaansari); the current version is v1.0.0.
