Claw Vision

by Puma1981 · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Downloads: 561

Install in OpenClaw

/install claw-vision

Description

Analyze local images including screenshots, receipts, and documents to extract structured text, UI elements, and provide content summaries with confidence le...

README (SKILL.md)

Capability

Local image path → structured text understanding. Calls Gemini 3.1 Pro Preview (NUWA Flux) through vision-tool.py.

Triggers

The user sends a screenshot, photo, or image file
Keywords: 截图 (screenshot), 图片里有什么 (what is in the image), 识别 (recognize), screenshot, describe image

Invocation

python3 ~/Documents/OpenClaw/workspace/scripts/vision-tool.py <absolute image path> "<prompt>"

Parameters

Parameter	Required	Default
Image path	✅	—
Prompt	✅	"图片里有什么？" ("What is in the image?")

Supported Formats

PNG / JPG / JPEG / GIF / WEBP (local files only; URLs are not supported)
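The format restriction above could be enforced before the script is ever called. The following is a minimal sketch, not part of the skill: the function name `validate_image_path` and the set of rejected URL schemes are assumptions made for illustration, and only the extension list comes from the README.

```python
from pathlib import Path
from urllib.parse import urlparse

# Extensions listed in the README's supported-formats line.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}

# Assumed set of schemes to treat as "a URL, not a local file".
URL_SCHEMES = {"http", "https", "ftp", "data"}


def validate_image_path(raw: str) -> Path:
    """Return an absolute Path if `raw` looks like a supported local image.

    Hypothetical helper: the skill itself does not ship any validation code.
    """
    # The skill accepts local files only, so reject anything that parses as a URL.
    if urlparse(raw).scheme in URL_SCHEMES:
        raise ValueError(f"URLs are not supported: {raw}")
    path = Path(raw).expanduser()
    # The invocation line asks for an absolute path.
    if not path.is_absolute():
        raise ValueError(f"expected an absolute path: {raw}")
    # Compare case-insensitively so "shot.PNG" is accepted.
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported format: {path.suffix or '(none)'}")
    if not path.is_file():
        raise FileNotFoundError(path)
    return path
```

Checking the extension is only a first filter; it does not prove the file actually contains image data.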

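Output Format

[summary]     Overview of the image content
[fields]      Key fields extracted (when the image contains text or tables)
[ui_elements] List of interface elements (for UI screenshots)
[confidence]  Confidence (置信度): high (高) / medium (中) / low (低)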
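If the tool's text output follows the four bracketed labels above, it could be split into a dictionary roughly as follows. This is a sketch under assumptions: the exact layout that vision-tool.py prints is not documented, so the parser simply treats a line starting with a known label as the start of a section and attaches any following non-empty lines to it. `parse_sections` is a hypothetical name.

```python
import re

# Labels taken from the output format section; the line layout is an assumption.
SECTION_RE = re.compile(r"^\[(summary|fields|ui_elements|confidence)\]\s*(.*)$")


def parse_sections(output: str) -> dict[str, str]:
    """Split labelled tool output into {label: text}. Hypothetical helper."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in output.splitlines():
        stripped = line.strip()
        match = SECTION_RE.match(stripped)
        if match:
            # A new label starts a new section; keep any text on the same line.
            current = match.group(1)
            sections[current] = [match.group(2)]
        elif current is not None and stripped:
            # Continuation line: belongs to the most recent label.
            sections[current].append(stripped)
    return {name: "\n".join(parts).strip() for name, parts in sections.items()}
```

Sections that the tool omits (for example `ui_elements` for a receipt photo) are simply absent from the result, so callers should use `.get()` rather than indexing.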

Dependencies

vision-tool.py: ~/Documents/OpenClaw/workspace/scripts/vision-tool.py
API: NUWA Flux gemini-3.1-pro-preview

Usage Guidance

Do not run or give this skill access until you verify the helper script. Ask the author to provide the vision-tool.py source or bundle it with the skill so it can be reviewed. Confirm how NUWA/Gemini credentials are stored and ensure they are not hardcoded in an opaque script. If you must test, inspect the script manually or run it inside a restricted sandbox (container) to prevent unintended file access or network exfiltration. Prefer skills that declare required env vars and include or link to verifiable code or an install step from a trusted release URL.

Capability Analysis

Type: OpenClaw Skill
Name: claw-vision
Version: 1.0.0

The skill bundle defines an execution pattern in SKILL.md that is vulnerable to shell injection by instructing the agent to pass user-controlled strings directly into a shell command (python3 ... "<提示语>"). It also references a fictional model version (Gemini 3.1 Pro) and relies on an external script (vision-tool.py) located in the user's home directory rather than including it in the bundle. While no clear malicious intent or exfiltration logic is present, the insecure command construction and external dependencies are high-risk indicators.
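One common way to avoid the shell-injection pattern described above is to pass the arguments as a list so that no shell ever parses the prompt. The sketch below assumes the script accepts the image path and prompt as its two positional arguments, as the README's invocation line suggests; `build_command`, `run_vision`, and the 120-second timeout are illustrative choices, not part of the skill.

```python
import subprocess
from pathlib import Path

# Script location as given in the README; the file is not bundled with the skill.
SCRIPT = Path("~/Documents/OpenClaw/workspace/scripts/vision-tool.py").expanduser()


def build_command(image_path: str, prompt: str) -> list[str]:
    """Build an argv list; each element reaches the script as one argument."""
    return ["python3", str(SCRIPT), image_path, prompt]


def run_vision(image_path: str, prompt: str, timeout: float = 120.0) -> str:
    """Run the helper script without a shell and return its stdout.

    The timeout value is an assumption, chosen only so a hung call ends.
    """
    # With an argv list and the default shell=False, quotes, `;`, `$()` and
    # backticks in the prompt are passed through as plain text.
    result = subprocess.run(
        build_command(image_path, prompt),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return result.stdout
```

This only removes the shell from the picture. It does not make the unreviewed script itself any safer to run.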

Capability Assessment

ℹ Purpose & Capability

The declared purpose (analyze local images) matches the instruction to run a local vision tool. However, the skill depends on a hardcoded user-local script path (~/Documents/OpenClaw/workspace/scripts/vision-tool.py) and references the NUWA/Gemini API without declaring how authentication should be provided. Requiring an arbitrary local script at that path is unusual and not justified in the SKILL.md.

⚠ Instruction Scope

SKILL.md instructs the agent to run a user-local Python script with an arbitrary image path and prompt. Because the script is not included, its behavior is unknown — it could read any files under the user's home, access network endpoints, or perform other actions. The instructions do not place limits on what the script may do or where credentials come from.

✓ Install Mechanism

There is no install spec and no code files in the skill bundle (instruction-only), which minimizes risk from untrusted downloads. However, the lack of an included script means the agent will rely on an external file that cannot be analyzed.

⚠ Credentials

The SKILL.md references NUWA Flux / gemini-3.1-pro-preview but declares no required environment variables or primary credential. It's unclear how the vision-tool.py authenticates to the external API (missing API key/env guidance). This mismatch is a red flag: either credentials are expected to exist elsewhere on the system or the script will prompt/handle them — both are security-relevant behaviors that should be declared.
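A declared credential could be as simple as one documented environment variable that the helper reads and fails loudly without. The sketch below is purely illustrative: `NUWA_API_KEY` is a hypothetical variable name, since the skill declares none, and nothing here reflects how vision-tool.py actually authenticates.

```python
import os

# Hypothetical variable name; the skill does not declare any credential.
API_KEY_ENV = "NUWA_API_KEY"


def load_api_key(env=None) -> str:
    """Read the API key from the environment, or raise with a clear message."""
    # Accept an explicit mapping so the function can be exercised in tests.
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV, "").strip()
    if not key:
        raise RuntimeError(f"set {API_KEY_ENV} before running the vision tool")
    return key
```

Reading the key at call time, rather than embedding it in the script, keeps the secret out of the reviewed source and lets the skill manifest name the variable it requires.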

ℹ Persistence & Privilege

The skill does not request persistent or always-on privileges (always:false). Nevertheless, it instructs execution of a local script which can run arbitrary code when invoked; that is a runtime privilege but not a declared persistent capability. No modifications to other skills or system-wide settings are specified.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install claw-vision
After installation, invoke the skill by name or use /claw-vision
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

claw-vision 1.0.0
- Initial release.
- Analyze images using Gemini 3.1 Pro Preview (NUWA Flux) via vision-tool.py.
- Supports local PNG, JPG, JPEG, GIF, and WEBP files.
- Outputs structured results: summary, key fields, UI elements, and confidence level.
- Triggered when users send image files or relevant keywords/commands.

Metadata

Slug claw-vision

Version 1.0.0

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 1

Frequently Asked Questions

What is Claw Vision?

Analyze local images including screenshots, receipts, and documents to extract structured text, UI elements, and provide content summaries with confidence le... It is an AI Agent Skill for Claude Code / OpenClaw, with 561 downloads so far.

How do I install Claw Vision?

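Run "/install claw-vision" in the OpenClaw or Claude Code chat to install it. The skill also expects vision-tool.py to exist at ~/Documents/OpenClaw/workspace/scripts/vision-tool.py, and that script is not bundled, so the install command alone does not make the skill usable.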

Is Claw Vision free?

Yes, Claw Vision is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Claw Vision support?

Claw Vision is listed as cross-platform and runs wherever OpenClaw / Claude Code is available.

Who created Claw Vision?

It is built and maintained by Puma1981 (@puma1981); the current version is v1.0.0.
