murongg

UI Element Ops

Author: MuRong · GitHub · v1.0.2
cross-platform ⚠ suspicious
Total downloads 513
Favorites 0
Current installs 4
Versions 3
Install in OpenClaw
/install ui-element-ops
Description
Parse UI screenshots into structured element JSON (type, OCR text, bbox) and operate desktop UI from parsed elements. Use when a user asks to detect/locate U...
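To make "structured element JSON" concrete, here is a minimal sketch of what parsed elements might look like and how a click point could be derived from one. The field names (`id`, `type`, `text`, `bbox`, `clickable`), the `[x1, y1, x2, y2]` pixel layout, and both helper functions are assumptions for illustration; the skill's actual schema may differ.

```python
import json

# Hypothetical parsed output; field names and bbox layout are assumed.
SAMPLE = """
[
  {"id": 0, "type": "button", "text": "Submit",
   "bbox": [100, 200, 180, 240], "clickable": true},
  {"id": 1, "type": "text", "text": "Username",
   "bbox": [100, 120, 220, 140], "clickable": false}
]
"""


def bbox_center(bbox):
    """Return the integer center of an [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) // 2, (y1 + y2) // 2)


def find_by_text(elements, query):
    """Case-insensitive substring match on each element's OCR text."""
    q = query.lower()
    return [e for e in elements if q in e.get("text", "").lower()]


elements = json.loads(SAMPLE)
matches = find_by_text(elements, "submit")
target = bbox_center(matches[0]["bbox"])
print(target)
```

The center of the matched box is what a click action would typically aim at, after any coordinate calibration.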
Security recommendations
This skill appears to do what it says: it installs ML dependencies, downloads OmniParser code/weights, parses screenshots, and can automate your desktop using pyautogui. Before installing: (1) review and run the bootstrap script in an isolated environment or VM (it installs many packages and downloads models); (2) verify you trust the OmniParser GitHub repo and the HF model being downloaded; (3) be aware that operate_ui.py can click/type/press keys — test in dry-run mode first and do not allow unattended/autonomous runs unless you trust the skill and its inputs; (4) note the capture script calls system python3 (not the venv) — prefer running commands using the venv python to avoid unexpected behavior; (5) if you are concerned about privacy, inspect what screenshots/elements are stored and where (defaults are /tmp and cwd).
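The "dry-run first" advice can be sketched as a small executor that always logs the planned actions and only calls a backend when dry-run is switched off. This is an illustration of the idea, not the skill's own implementation: `run_actions`, the `(name, args)` action shape, and the backend callable are all hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Action = Tuple[str, tuple]  # (action name, positional arguments)


def run_actions(actions: Iterable[Action],
                backend: Callable[[str, tuple], None],
                dry_run: bool = True) -> List[str]:
    """Log every action; only call the backend when dry_run is False."""
    log = []
    for name, args in actions:
        mode = "DRY-RUN" if dry_run else "EXEC"
        log.append(f"{mode} {name}{args}")
        if not dry_run:
            backend(name, args)
    return log


# Stand-in backend that records calls instead of touching the desktop.
performed = []
plan = [("click", (140, 220)), ("write", ("hello",))]
preview = run_actions(plan, lambda n, a: performed.append((n, a)))
print("\n".join(preview))
```

In a real run the backend would presumably dispatch to pyautogui calls such as `pyautogui.click(x, y)`; that wiring is left out so the sketch runs without a display. Defaulting `dry_run` to `True` means a caller has to opt in explicitly before anything touches the desktop.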
Analysis
Type: OpenClaw Skill
Name: ui-element-ops
Version: 1.0.2
The skill is classified as suspicious due to a critical shell injection vulnerability in `scripts/operate_ui.py`. The `cmd_wait` function executes an optional `--refresh-cmd` using `subprocess.run(cmd, shell=True)`. If an AI agent (or an attacker via prompt injection) can control the value of `--refresh-cmd`, it could lead to arbitrary code execution. Additionally, the script allows disabling `pyautogui.FAILSAFE`, which removes a safety mechanism during UI automation, increasing risk. The skill also relies on cloning external repositories (GitHub) and downloading models (HuggingFace), introducing supply chain risks, though these sources are generally reputable.
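One common way to reduce the `shell=True` risk described above is to split the command into an argument list and run it without a shell, so metacharacters such as `;` or `|` are passed through as literal text. The sketch below illustrates that general technique; `run_refresh_cmd` is a hypothetical helper, not the code in `operate_ui.py`, and it assumes POSIX-style quoting in the command string.

```python
import shlex
import subprocess
import sys


def run_refresh_cmd(cmd: str, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run a refresh command without a shell, so ';', '|', '$(...)' stay literal."""
    argv = shlex.split(cmd)
    if not argv:
        raise ValueError("empty refresh command")
    return subprocess.run(argv, shell=False, capture_output=True,
                          text=True, timeout=timeout)


# Demo: the trailing "; echo injected" arrives as a plain argument,
# not as a second command.
demo = (f'{shlex.quote(sys.executable)} '
        '-c "import sys; print(sys.argv[1:])" "; echo injected"')
result = run_refresh_cmd(demo)
print(result.stdout.strip())
```

This removes shell interpretation only. The named program still runs with the user's privileges, so an allowlist of permitted commands or an explicit user confirmation would still be worth considering when the value can come from an agent.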
Capability assessment
Purpose & Capability
The name/description (parse UI screenshots and operate desktop UI) matches the code and scripts: parse_ui.py uses OmniParser models for detection/captioning, bootstrap installs ML libraries and downloads weights, and operate_ui.py uses pyautogui to click/type/screenshot. The one small mismatch is that bootstrap installs the 'openai' package (and some general-purpose libs) which are not used by the included scripts — likely unnecessary but not evidence of malicious intent.
Instruction Scope
SKILL.md stays on-topic (bootstrapping, parsing screenshots, listing/finding elements, and performing UI actions). The runtime instructions explicitly enable desktop control (click/type/hotkey) via pyautogui — expected for the stated purpose but high-privilege. A minor inconsistency: capture_and_parse.sh invokes operate_ui.py via the system 'python3' (not the venv python created by bootstrap), which can cause environment/runtime mismatch and unexpected behavior if system Python lacks the required packages.
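The interpreter mismatch noted above can be avoided by resolving the venv's own Python before launching a script. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the venv directory is whatever the bootstrap script created (its path is not given here), and only the standard venv layouts (`bin/python` on POSIX, `Scripts/python.exe` on Windows) are checked.

```python
from pathlib import Path
from typing import List, Optional


def venv_python(venv_dir: str) -> Optional[str]:
    """Return the venv's interpreter path, or None if it is not found."""
    venv = Path(venv_dir)
    candidates = [venv / "bin" / "python", venv / "Scripts" / "python.exe"]
    for candidate in candidates:
        if candidate.is_file():
            return str(candidate)
    return None


def build_command(venv_dir: str, script: str, *args: str) -> List[str]:
    """Build an argv list that runs `script` with the venv interpreter."""
    interpreter = venv_python(venv_dir)
    if interpreter is None:
        raise FileNotFoundError(f"no Python interpreter found under {venv_dir}")
    return [interpreter, script, *args]
```

Failing loudly when the venv is missing is a deliberate choice here: silently falling back to the system `python3` would reintroduce the mismatch the assessment describes.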
Install Mechanism
There is no registry install spec, but the included bootstrap script creates a venv, pip-installs many ML packages, clones the OmniParser GitHub repo, and uses the Hugging Face CLI to download model weights. The sources used (GitHub and HF) are common release hosts; however, downloading/extracting model weights and installing many packages is high-impact and should be done deliberately (prefer isolated environment).
Credentials
The skill does not declare or require any sensitive environment variables or credentials. It optionally respects OMNIPARSER_DIR and TYPE_RULES. Note: the bootstrap uses the HF CLI to download weights — if a requested model version were private the CLI could prompt for/require a Hugging Face token, but no HF token is declared as required here. No other unrelated credentials are requested.
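As a rough illustration of "optionally respects", the sketch below reads the two variables and falls back when they are unset. Only the variable names `OMNIPARSER_DIR` and `TYPE_RULES` come from the assessment; the fallback directory, the treatment of both values as opaque strings, and the `resolve_settings` helper are assumptions.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read optional settings; unset or empty variables fall back to defaults."""
    env = os.environ if env is None else env
    # Hypothetical default location; the skill's real default is not stated.
    default_dir = str(Path.home() / "OmniParser")
    return {
        "omniparser_dir": env.get("OMNIPARSER_DIR") or default_dir,
        # Treated as an opaque string here; its format is not documented above.
        "type_rules": env.get("TYPE_RULES") or None,
    }


print(resolve_settings({"OMNIPARSER_DIR": "/opt/omniparser"}))
```

Passing the environment in as a mapping keeps the lookup easy to test without modifying the real process environment.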
Persistence & Privilege
always:false (normal). The skill can autonomously perform desktop actions via pyautogui; that capability is coherent with its purpose but grants broad control over the user's desktop. Autonomous invocation combined with desktop-control is a meaningful risk vector — exercise caution when allowing the agent to call this skill without user confirmation.
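How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install ui-element-ops
  3. After installation, invoke the Skill by name or trigger it with /ui-element-ops
  4. Provide the inputs described in the Skill's parameter documentation to get structured output
Version history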
v1.0.2
- Add performance note advising not to use parse/capture-and-parse commands in tight loops and to reuse recent elements.json outputs when possible.
- No code changes; documentation update only.
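The "reuse recent elements.json outputs" advice could be applied with a simple freshness check on the file's modification time. This is a sketch only: `is_fresh`, `ensure_elements`, the `reparse` callable, and the 5-second threshold are all hypothetical and would need tuning to how quickly the UI changes.

```python
import os
import time
from typing import Callable, Optional


def is_fresh(path: str, max_age_s: float = 5.0, now: Optional[float] = None) -> bool:
    """True if the file exists and was modified within max_age_s seconds."""
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return False
    now = time.time() if now is None else now
    return (now - mtime) <= max_age_s


def ensure_elements(path: str, reparse: Callable[[str], None],
                    max_age_s: float = 5.0) -> bool:
    """Reparse only when the cached output is missing or stale.

    Returns True if a reparse was triggered, False if the cache was reused.
    """
    if is_fresh(path, max_age_s):
        return False
    reparse(path)
    return True
```

Here `reparse` stands in for whatever actually runs the parse step and writes the file; a polling loop would call `ensure_elements` on each iteration instead of invoking the parser directly.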
v1.0.1
- Added capture_and_parse.sh script for one-step screenshot capture and parsing with randomized output names.
- Updated documentation to include new capture + parse workflow.
- Minor updates to scripts/operate_ui.py and SKILL.md for clarity and workflow expansion.
v1.0.0
- Initial release of the ui-element-ops skill.
- Parses UI screenshots into structured JSON with element type, OCR text, bounding boxes, and clickable flags.
- Supports overlay image output with labeled detection boxes.
- Provides scripts to operate desktop UI: locate/find/wait for elements, click/type/press keys, take screenshots, and calibrate coordinates.
- Includes coordinate calibration for multi-display, DPI, and window offsets.
- Handles missing dependencies and supports both GUI-required and headless workflows.
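Coordinate calibration for multi-display, DPI, and window offsets generally comes down to a scale plus an offset per axis. The sketch below shows that mapping from screenshot pixels to screen coordinates; the `Calibration` class and `from_sizes` helper are hypothetical and are not taken from the skill's scripts.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Calibration:
    scale_x: float = 1.0
    scale_y: float = 1.0
    offset_x: float = 0.0
    offset_y: float = 0.0

    def to_screen(self, x: float, y: float) -> Tuple[int, int]:
        """Map a screenshot pixel to a screen coordinate."""
        return (round(self.offset_x + x * self.scale_x),
                round(self.offset_y + y * self.scale_y))


def from_sizes(image_size: Tuple[int, int], screen_size: Tuple[int, int],
               origin: Tuple[int, int] = (0, 0)) -> Calibration:
    """Derive scale from screenshot vs. logical screen size; origin is the
    top-left of the captured display or window in screen coordinates."""
    (iw, ih), (sw, sh) = image_size, screen_size
    return Calibration(sw / iw, sh / ih, origin[0], origin[1])


# A 2x HiDPI capture of a 1440x900 display placed to the right of another one.
cal = from_sizes((2880, 1800), (1440, 900), origin=(1440, 0))
print(cal.to_screen(1000, 500))  # (1940, 250)
```

The example halves the screenshot coordinates for the 2x capture and then shifts them by the display's origin, which is the usual source of misplaced clicks on multi-monitor setups.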
Metadata
Slug ui-element-ops
Version 1.0.2
License
Total installs 4
Current installs 4
Versions 3
FAQ

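What is UI Element Ops?

Parse UI screenshots into structured element JSON (type, OCR text, bbox) and operate desktop UI from parsed elements. Use when a user asks to detect/locate U... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 513 total downloads to date.

How do I install UI Element Ops?

Run the command "/install ui-element-ops" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is UI Element Ops free?

Yes, UI Element Ops is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does UI Element Ops support?

UI Element Ops is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops UI Element Ops?

It is developed and maintained by MuRong (@murongg); the current version is v1.0.2.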
