murongg

UI Element Ops

by MuRong · GitHub ↗ · v1.0.2
cross-platform ⚠ suspicious
Downloads 513
Stars 0
Active Installs 4
Versions 3
Install in OpenClaw
/install ui-element-ops
Description
Parse UI screenshots into structured element JSON (type, OCR text, bbox) and operate desktop UI from parsed elements. Use when a user asks to detect/locate U...
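As a rough illustration of what working with parsed elements might look like, the sketch below searches a list of element records by OCR text and computes a click point from the bounding box. The field names (`type`, `text`, `bbox`) and the `[x1, y1, x2, y2]` pixel layout are assumptions made for this example, not the skill's documented schema, and `find_element` and `bbox_center` are hypothetical helpers.

```python
from typing import Optional

# Hypothetical element records. The keys and the [x1, y1, x2, y2] pixel
# bbox layout are assumptions, not the skill's documented output schema.
ELEMENTS = [
    {"type": "text", "text": "Username", "bbox": [40, 100, 160, 124]},
    {"type": "button", "text": "Sign in", "bbox": [40, 200, 140, 240]},
]


def find_element(elements, query: str) -> Optional[dict]:
    """Return the first element whose OCR text contains query (case-insensitive)."""
    needle = query.casefold()
    for element in elements:
        if needle in element.get("text", "").casefold():
            return element
    return None


def bbox_center(bbox):
    """Return the integer center point of an [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) // 2, (y1 + y2) // 2)


target = find_element(ELEMENTS, "sign in")
if target is not None:
    print(target["type"], bbox_center(target["bbox"]))  # button (90, 220)
```

If the real output uses normalized coordinates or a different box layout, the center calculation would need to be adapted accordingly.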
Usage Guidance
This skill appears to do what it says: it installs ML dependencies, downloads OmniParser code/weights, parses screenshots, and can automate your desktop using pyautogui. Before installing:
  1. Review and run the bootstrap script in an isolated environment or VM (it installs many packages and downloads models).
  2. Verify that you trust the OmniParser GitHub repo and the HF model being downloaded.
  3. Be aware that operate_ui.py can click, type, and press keys. Test in dry-run mode first, and do not allow unattended/autonomous runs unless you trust the skill and its inputs.
  4. Note that the capture script calls the system python3 (not the venv). Prefer running commands with the venv python to avoid unexpected behavior.
  5. If you are concerned about privacy, inspect what screenshots/elements are stored and where (defaults are /tmp and cwd).
Capability Analysis
Type: OpenClaw Skill
Name: ui-element-ops
Version: 1.0.2
The skill is classified as suspicious due to a critical shell injection vulnerability in `scripts/operate_ui.py`. The `cmd_wait` function executes an optional `--refresh-cmd` using `subprocess.run(cmd, shell=True)`. If an AI agent (or an attacker via prompt injection) can control the value of `--refresh-cmd`, the result could be arbitrary code execution. Additionally, the script allows disabling `pyautogui.FAILSAFE`, which removes a safety mechanism during UI automation and increases risk. The skill also relies on cloning external repositories (GitHub) and downloading models (Hugging Face), which introduces supply chain risk, though these sources are generally reputable.
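To make the `shell=True` concern concrete, here is a minimal sketch of one common mitigation: splitting the command into an argument list and running it without a shell. This is not the skill's code; `run_refresh_cmd` is a hypothetical helper, and the approach only stops shell metacharacters (`;`, `|`, `$(...)`) from being interpreted. A caller who controls the command can still choose which program runs, so an allowlist of permitted commands would likely be needed on top.

```python
import shlex
import subprocess
import sys


def run_refresh_cmd(cmd: str, timeout: float = 30.0) -> int:
    """Run cmd without a shell and return its exit code.

    Hypothetical helper: the string is tokenized with shlex.split, so
    characters like ';' are passed as literal argument text instead of
    being interpreted by a shell.
    """
    argv = shlex.split(cmd)
    if not argv:
        raise ValueError("empty refresh command")
    completed = subprocess.run(argv, shell=False, timeout=timeout, check=False)
    return completed.returncode


# With a shell, "echo hi; touch pwned" would run two commands.
# Tokenized, the ';' is just part of an argument to the first program.
print(shlex.split("echo hi; touch pwned"))

# Harmless demo: run the current Python interpreter with a no-op program.
print(run_refresh_cmd(shlex.join([sys.executable, "-c", "pass"])))
```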
Capability Assessment
Purpose & Capability
The name/description (parse UI screenshots and operate desktop UI) matches the code and scripts: parse_ui.py uses OmniParser models for detection/captioning, bootstrap installs ML libraries and downloads weights, and operate_ui.py uses pyautogui to click/type/screenshot. The one small mismatch is that bootstrap installs the 'openai' package (and some general-purpose libs) which are not used by the included scripts — likely unnecessary but not evidence of malicious intent.
Instruction Scope
SKILL.md stays on-topic (bootstrapping, parsing screenshots, listing/finding elements, and performing UI actions). The runtime instructions explicitly enable desktop control (click/type/hotkey) via pyautogui — expected for the stated purpose but high-privilege. A minor inconsistency: capture_and_parse.sh invokes operate_ui.py via the system 'python3' (not the venv python created by bootstrap), which can cause environment/runtime mismatch and unexpected behavior if system Python lacks the required packages.
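One way to avoid the system-versus-venv mismatch described above is to resolve the venv's interpreter explicitly and fail loudly if it is missing. The sketch below assumes a standard `venv` directory layout. The venv location is not stated in this listing, so any path passed in is the caller's assumption, and `venv_python` and `pick_interpreter` are hypothetical names.

```python
import os
import sys
from pathlib import Path


def venv_python(venv_dir) -> Path:
    """Return the expected interpreter path inside a standard venv."""
    venv = Path(venv_dir)
    if os.name == "nt":
        return venv / "Scripts" / "python.exe"
    return venv / "bin" / "python"


def pick_interpreter(venv_dir) -> str:
    """Return the venv interpreter path, or raise if it does not exist.

    Raising (rather than silently falling back to sys.executable) makes
    an environment mismatch visible instead of failing later on imports.
    """
    candidate = venv_python(venv_dir)
    if not candidate.is_file():
        raise FileNotFoundError(f"no venv interpreter at {candidate}")
    return str(candidate)
```

A caller could then pass the returned path as the first element of a `subprocess.run` argument list instead of relying on whatever `python3` is on `PATH`.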
Install Mechanism
There is no registry install spec, but the included bootstrap script creates a venv, pip-installs many ML packages, clones the OmniParser GitHub repo, and uses the Hugging Face CLI to download model weights. The sources used (GitHub and HF) are common release hosts; however, downloading/extracting model weights and installing many packages is high-impact and should be done deliberately (prefer isolated environment).
Credentials
The skill does not declare or require any sensitive environment variables or credentials. It optionally respects OMNIPARSER_DIR and TYPE_RULES. Note: the bootstrap uses the HF CLI to download weights — if a requested model version were private the CLI could prompt for/require a Hugging Face token, but no HF token is declared as required here. No other unrelated credentials are requested.
Persistence & Privilege
always:false (normal). The skill can autonomously perform desktop actions via pyautogui; that capability is coherent with its purpose but grants broad control over the user's desktop. Autonomous invocation combined with desktop-control is a meaningful risk vector — exercise caution when allowing the agent to call this skill without user confirmation.
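The caution about unconfirmed desktop actions can be sketched as a small gate that defaults to dry-run and asks for explicit approval before anything runs. This is an illustration only: `guarded_action` is a hypothetical wrapper, not part of the skill, and the action passed in stands for whatever click, type, or key-press call would actually be made.

```python
from typing import Callable


def guarded_action(
    description: str,
    action: Callable[[], None],
    *,
    dry_run: bool = True,
    confirm: Callable[[str], str] = input,
) -> str:
    """Run action only when not in dry-run mode and the user answers 'y'."""
    if dry_run:
        # Report what would happen without touching the desktop.
        return f"dry-run: {description}"
    answer = confirm(f"Allow '{description}'? [y/N] ").strip().lower()
    if answer != "y":
        return f"skipped: {description}"
    action()
    return f"done: {description}"


# Default is dry-run, so nothing is executed here.
print(guarded_action("click 'Sign in' at (90, 220)", lambda: None))
```

Defaulting to dry-run means a forgotten flag results in no action rather than an unintended one.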
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install ui-element-ops
  3. After installation, invoke the skill by name or use /ui-element-ops
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.2
- Add performance note advising not to use parse/capture-and-parse commands in tight loops and to reuse recent elements.json outputs when possible.
- No code changes; documentation update only.
v1.0.1
- Added capture_and_parse.sh script for one-step screenshot capture and parsing with randomized output names.
- Updated documentation to include new capture + parse workflow.
- Minor updates to scripts/operate_ui.py and SKILL.md for clarity and workflow expansion.
v1.0.0
- Initial release of the ui-element-ops skill.
- Parses UI screenshots into structured JSON with element type, OCR text, bounding boxes, and clickable flags.
- Supports overlay image output with labeled detection boxes.
- Provides scripts to operate desktop UI: locate/find/wait for elements, click/type/press keys, take screenshots, and calibrate coordinates.
- Includes coordinate calibration for multi-display, DPI, and window offsets.
- Handles missing dependencies and supports both GUI-required and headless workflows.
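The v1.0.2 advice to reuse recent elements.json outputs rather than re-parsing in a tight loop could be approached with a simple freshness check on the file's modification time. The sketch below is an assumption about how a caller might do this. `load_recent_elements` and the five-second default are hypothetical, and nothing here reflects how the skill itself handles reuse.

```python
import json
import time
from pathlib import Path
from typing import Any, Optional


def load_recent_elements(path, max_age_s: float = 5.0) -> Optional[Any]:
    """Return parsed JSON from path if it was modified within max_age_s.

    Returns None when the file is missing or too old, signalling that a
    fresh (and expensive) parse is needed.
    """
    p = Path(path)
    if not p.is_file():
        return None
    age = time.time() - p.stat().st_mtime
    if age > max_age_s:
        return None
    with p.open(encoding="utf-8") as fh:
        return json.load(fh)
```

A caller would try `load_recent_elements("elements.json")` first and only trigger a new parse when it returns `None`. The right maximum age depends on how quickly the UI on screen changes.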
Metadata
Slug ui-element-ops
Version 1.0.2
License
All-time Installs 4
Active Installs 4
Total Versions 3
Frequently Asked Questions

What is UI Element Ops?

Parse UI screenshots into structured element JSON (type, OCR text, bbox) and operate desktop UI from parsed elements. Use when a user asks to detect/locate U... It is an AI Agent Skill for Claude Code / OpenClaw, with 513 downloads so far.

How do I install UI Element Ops?

Run "/install ui-element-ops" in the OpenClaw or Claude Code chat to install it. The included bootstrap script additionally creates a venv, installs ML packages, and downloads the OmniParser code and model weights.

Is UI Element Ops free?

Yes, UI Element Ops is completely free (open-source). You can download, install and use it at no cost.

Which platforms does UI Element Ops support?

UI Element Ops is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created UI Element Ops?

It is built and maintained by MuRong (@murongg); the current version is v1.0.2.
