puma1981

Claw Vision

by Puma1981 · GitHub · v1.0.0 · MIT-0
cross-platform · ⚠ suspicious
561 Downloads · 0 Stars · 1 Active Installs · 1 Versions
Install in OpenClaw
/install claw-vision
Description
Analyze local images including screenshots, receipts, and documents to extract structured text, UI elements, and provide content summaries with confidence le...
README (SKILL.md)

Capability

Local image path → structured text understanding. Calls Gemini 3.1 Pro Preview (NUWA Flux) through vision-tool.py.

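Trigger scenarios

  • The user sends a screenshot, photo, or image file
  • Keywords: 截图 (screenshot), 图片里有什么 (what's in the image), 识别 (recognize), screenshot, describe image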

Invocation

python3 ~/Documents/OpenClaw/workspace/scripts/vision-tool.py <absolute image path> "<prompt>"
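The command above is a shell string, so a prompt containing quotes or $(...) can be interpreted by the shell. A minimal sketch of a safer wrapper, assuming vision-tool.py sits at the documented path and takes the image path and prompt as positional arguments: each argument is passed as a separate list element to Python's subprocess.run, so no shell ever parses the prompt. The helper names build_argv and run_vision_tool are hypothetical, not part of the skill.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# Location documented in SKILL.md. subprocess does not expand "~", so expand it here.
SCRIPT = Path("~/Documents/OpenClaw/workspace/scripts/vision-tool.py").expanduser()


def build_argv(image_path: str, prompt: Optional[str] = None) -> List[str]:
    """Build an argument list; each element reaches the script verbatim, with no shell parsing."""
    image = Path(image_path)
    if not image.is_absolute():
        # SKILL.md asks for an absolute image path.
        raise ValueError(f"image path must be absolute: {image_path!r}")
    argv = ["python3", str(SCRIPT), str(image)]
    if prompt is not None:
        # Omitting the prompt presumably lets the script use the default listed in SKILL.md.
        argv.append(prompt)
    return argv


def run_vision_tool(image_path: str, prompt: Optional[str] = None, timeout: float = 120.0) -> str:
    """Run the helper without a shell and return its stdout (raises on non-zero exit or timeout)."""
    result = subprocess.run(
        build_argv(image_path, prompt),
        capture_output=True,  # collect stdout/stderr instead of inheriting the terminal
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

For example, run_vision_tool("/home/me/Pictures/receipt.png", "List the line items and total") returns whatever the script prints; the path and prompt are illustrative.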

Parameters

Parameter   Required  Default
Image path  yes       none
Prompt      no        "图片里有什么?" ("What's in the image?")
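The parameter table describes a CLI with one required and one optional positional argument. Since vision-tool.py is not bundled, the argparse sketch below is a hypothetical reconstruction of that documented interface, not the script's actual code; DEFAULT_PROMPT is copied from the table.

```python
import argparse
from typing import List, Optional

# Default prompt as listed in SKILL.md ("What's in the image?").
DEFAULT_PROMPT = "图片里有什么?"


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical CLI mirroring the documented parameters of vision-tool.py."""
    parser = argparse.ArgumentParser(prog="vision-tool.py")
    # Required positional argument: absolute path to a local image.
    parser.add_argument("image_path", help="absolute path to a local image file")
    # Optional positional argument: nargs="?" falls back to DEFAULT_PROMPT when omitted.
    parser.add_argument("prompt", nargs="?", default=DEFAULT_PROMPT, help="question to ask about the image")
    return parser.parse_args(argv)
```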

Supported formats

PNG / JPG / JPEG / GIF / WEBP (local files only; URLs are not supported)
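A caller can enforce these format and local-only restrictions before invoking the tool. This is a sketch under the assumption that file extensions are a good enough proxy for format (it does not inspect file contents); validate_image_arg is a hypothetical name.

```python
from pathlib import Path
from urllib.parse import urlparse

# Formats listed in SKILL.md; compared case-insensitively.
SUPPORTED_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}
URL_SCHEMES = {"http", "https", "ftp", "file", "data"}


def validate_image_arg(value: str) -> Path:
    """Return a Path for a supported local image, or raise ValueError / FileNotFoundError."""
    if urlparse(value).scheme.lower() in URL_SCHEMES:
        raise ValueError(f"URLs are not supported; pass a local file instead: {value!r}")
    path = Path(value)
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported format {path.suffix!r}; expected one of {sorted(SUPPORTED_SUFFIXES)}")
    if not path.is_file():
        raise FileNotFoundError(f"no such local file: {value}")
    return path
```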

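Output format

[summary]      Overview of the image content
[fields]       Key fields extracted (when the image contains text or tables)
[ui_elements]  List of interface elements (for UI screenshots)
[confidence]   Confidence: 高/中/低 (high/medium/low)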
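If the script prints these four tags at the start of a line, a consumer can split its stdout into sections. The exact layout of vision-tool.py output is not documented, so this parser assumes one tag per line, with the section body either on the same line or on the following lines until the next tag.

```python
import re
from typing import Dict, List, Optional

# The four section tags from the output spec, each expected at the start of a line.
SECTION_RE = re.compile(r"^\[(summary|fields|ui_elements|confidence)\]\s*(.*)$")


def parse_vision_output(text: str) -> Dict[str, str]:
    """Group lines under the most recent [tag]; text before the first tag is ignored."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        match = SECTION_RE.match(line.strip())
        if match:
            current = match.group(1)
            # Keep any text that follows the tag on the same line.
            sections[current] = [match.group(2)] if match.group(2) else []
        elif current is not None:
            sections[current].append(line.rstrip())
    return {tag: "\n".join(lines).strip() for tag, lines in sections.items()}
```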

Dependencies

  • vision-tool.py: ~/Documents/OpenClaw/workspace/scripts/vision-tool.py
  • API: NUWA Flux gemini-3.1-pro-preview
Usage Guidance
Do not run or give this skill access until you verify the helper script. Ask the author to provide the vision-tool.py source or bundle it with the skill so it can be reviewed. Confirm how NUWA/Gemini credentials are stored and ensure they are not hardcoded in an opaque script. If you must test, inspect the script manually or run it inside a restricted sandbox (container) to prevent unintended file access or network exfiltration. Prefer skills that declare required env vars and include or link to verifiable code or an install step from a trusted release URL.
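One concrete way to follow the advice to verify the helper script is to record a SHA-256 digest of vision-tool.py once you have reviewed it and refuse to run the script if the file later changes. The helper names and the idea of storing a pinned digest are assumptions for illustration; the skill itself provides no such check.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned(path: Path, expected_hex: str) -> None:
    """Raise if the file no longer matches the digest recorded at review time."""
    actual = sha256_of(path)
    if actual != expected_hex.strip().lower():
        raise RuntimeError(f"{path} changed since review: expected {expected_hex}, got {actual}")
```

Before each run you might call verify_pinned(Path("~/Documents/OpenClaw/workspace/scripts/vision-tool.py").expanduser(), PINNED_DIGEST), where PINNED_DIGEST is a hypothetical name for the value you recorded after review.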
Capability Analysis
Type: OpenClaw Skill · Name: claw-vision · Version: 1.0.0

The skill bundle defines an execution pattern in SKILL.md that is vulnerable to shell injection: it instructs the agent to pass user-controlled strings directly into a shell command (python3 ... "<prompt>"). It also references a fictional model version (Gemini 3.1 Pro) and relies on an external script (vision-tool.py) located in the user's home directory instead of including it in the bundle. While no clear malicious intent or exfiltration logic is present, the insecure command construction and external dependencies are high-risk indicators.
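The injection risk is easy to demonstrate by filling the SKILL.md template with a prompt that contains a double quote and a semicolon, then approximating shell word splitting with shlex.split. With plain string substitution the prompt breaks out of its quotes into extra shell words; with shlex.quote it stays a single argument. Inside double quotes a real shell would also expand $(...) and backticks even without a stray quote. The function names are hypothetical, and the payload is only a string here, never executed.

```python
import shlex

# Command template from SKILL.md with the two placeholders filled in.
TEMPLATE = 'python3 ~/Documents/OpenClaw/workspace/scripts/vision-tool.py {image} "{prompt}"'
SCRIPT = "~/Documents/OpenClaw/workspace/scripts/vision-tool.py"


def naive_command(image: str, prompt: str) -> str:
    """Plain string substitution, as a literal reading of the template would do (unsafe)."""
    return TEMPLATE.format(image=image, prompt=prompt)


def quoted_command(image: str, prompt: str) -> str:
    """Quote each user-controlled value with shlex.quote for a POSIX shell."""
    # The script path is left unquoted so the shell still expands "~".
    return f"python3 {SCRIPT} {shlex.quote(image)} {shlex.quote(prompt)}"


evil = 'x"; touch /tmp/pwned; echo "y'
# shlex.split approximates POSIX word splitting (it does not act on ";"),
# which is enough to show where the prompt's quoting ends.
print(shlex.split(naive_command("/tmp/a.png", evil)))
print(shlex.split(quoted_command("/tmp/a.png", evil)))
```

Quoting is the fallback when a shell string is unavoidable; passing an argument list to subprocess without a shell sidesteps the problem entirely.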
Capability Assessment
Purpose & Capability
The declared purpose (analyze local images) matches the instruction to run a local vision tool. However, the skill depends on a hardcoded user-local script path (~/Documents/OpenClaw/workspace/scripts/vision-tool.py) and references the NUWA/Gemini API without declaring how authentication should be provided. Requiring an arbitrary local script at that path is unusual, and SKILL.md does not justify it.
Instruction Scope
SKILL.md instructs the agent to run a user-local Python script with an arbitrary image path and prompt. Because the script is not included, its behavior is unknown — it could read any files under the user's home, access network endpoints, or perform other actions. The instructions do not place limits on what the script may do or where credentials come from.
Install Mechanism
There is no install spec and no code files in the skill bundle (instruction-only), which minimizes risk from untrusted downloads. However, the lack of an included script means the agent will rely on an external file that cannot be analyzed.
Credentials
The SKILL.md references NUWA Flux / gemini-3.1-pro-preview but declares no required environment variables or primary credential. It is unclear how vision-tool.py authenticates to the external API (there is no API key or environment variable guidance). This mismatch is a red flag: either credentials are expected to exist elsewhere on the system, or the script prompts for or otherwise handles them. Both are security-relevant behaviors that should be declared.
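A common way to make the credential explicit is to read it from a declared environment variable, fail fast when it is missing, and only ever log a redacted form. This sketch assumes a hypothetical NUWA_API_KEY variable; SKILL.md names none, so the real variable (if any) is unknown.

```python
import os
from typing import Mapping

# Hypothetical variable name; SKILL.md declares no credential, so the real name is unknown.
API_KEY_ENV = "NUWA_API_KEY"


def load_api_key(environ: Mapping[str, str] = os.environ) -> str:
    """Read the API key from the environment and fail fast if it is missing or blank."""
    key = environ.get(API_KEY_ENV, "").strip()
    if not key:
        raise RuntimeError(f"{API_KEY_ENV} is not set; export it instead of hardcoding the key in a script")
    return key


def redact(key: str) -> str:
    """Mask a key for logging; only keys longer than 8 characters keep their last 4 visible."""
    visible = key[-4:] if len(key) > 8 else ""
    return "*" * (len(key) - len(visible)) + visible
```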
Persistence & Privilege
The skill does not request persistent or always-on privileges (always:false). Nevertheless, it instructs execution of a local script which can run arbitrary code when invoked; that is a runtime privilege but not a declared persistent capability. No modifications to other skills or system-wide settings are specified.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install claw-vision
  3. After installation, invoke the skill by name or use /claw-vision
  4. Provide the required inputs according to the skill's parameter spec to get structured output
Version History
v1.0.0
claw-vision 1.0.0: Initial release.
  • Analyze images using Gemini 3.1 Pro Preview (NUWA Flux) via vision-tool.py.
  • Supports local PNG, JPG, JPEG, GIF, and WEBP files.
  • Outputs structured results: summary, key fields, UI elements, and confidence level.
  • Triggered when users send image files or relevant keywords/commands.
Metadata
Slug: claw-vision
Version: 1.0.0
License: MIT-0
All-time Installs: 1
Active Installs: 1
Total Versions: 1
Frequently Asked Questions

What is Claw Vision?

Analyze local images including screenshots, receipts, and documents to extract structured text, UI elements, and provide content summaries with confidence le... It is an AI Agent Skill for Claude Code / OpenClaw, with 561 downloads so far.

How do I install Claw Vision?

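Run "/install claw-vision" in the OpenClaw or Claude Code chat to install it in one step. Note that the skill also expects vision-tool.py at ~/Documents/OpenClaw/workspace/scripts/vision-tool.py, which is not included in the bundle, and access to the NUWA Flux API.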

Is Claw Vision free?

Yes, Claw Vision is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Claw Vision support?

Claw Vision is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Claw Vision?

It is built and maintained by Puma1981 (@puma1981); the current version is v1.0.0.
