← 返回 Skills 市场

Vision Helper — AI Image Analysis

Name: Vision Helper — AI Image Analysis
Author: ravenquasar

作者 U3UT7 · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ✓ 安全检测通过

总下载

当前安装

版本数

在 OpenClaw 中安装

/install vision-helper

功能描述

Analyze images using local or cloud vision models via Ollama to identify content, UI elements, screenshots, or extract text with OCR support.

安全使用建议

This skill appears to be what it claims: a helper that reads an image file and sends it to an Ollama instance for analysis. Before installing or using it, consider the following: - Privacy: The script will read any readable file with an allowed extension and base64-encode it. If you take desktop/browser screenshots you may capture passwords, private chats, or other sensitive data. - Endpoint trust: By default the script posts to http://localhost:11434/api/chat. If you change OLLAMA_API_URL to a remote URL, those images (and any textual prompt) will be transmitted to that remote service. Only point it to endpoints you trust. - File validation is extension-based and the path-traversal check is simplistic ('..' substring). Don't feed files you don't trust; avoid symlink/renamed files containing sensitive content. - Automation caution: The README suggests using model output to drive clicks or inputs; make sure any automation steps are safe and tested before running with real privileges or on critical systems. Practical steps: run a local Ollama instance and keep OLLAMA_API_URL at its default if you want privacy; inspect or run the included script in a sandbox first; avoid passing images containing secrets; and do not set OLLAMA_API_URL to an external service unless you control or trust it.

功能分析

Type: OpenClaw Skill Name: vision-helper Version: 1.0.0 The vision-helper skill is a utility designed to analyze images via Ollama, specifically addressing timeout limitations in built-in tools. The core script (scripts/analyze_image.py) is well-structured, using standard Python libraries and implementing security best practices such as path traversal checks and file extension validation. No evidence of data exfiltration, malicious execution, or obfuscation was found; the skill functions transparently as a wrapper for vision model APIs.

能力评估

✓ Purpose & Capability

Name/description match the implementation: the included Python script encodes an image and calls an Ollama chat API with a vision model. The script supports model selection and extended timeout as advertised.

ℹ Instruction Scope

SKILL.md explicitly instructs using exec to take and analyze screenshots (browser, desktop tools) and to 'act' on analysis results (clicks/input). That is within the skill's stated automation use-cases, but it carries privacy and automation-safety implications (desktop screenshots may contain sensitive data; automated actions driven by model output can have undesired effects).

✓ Install Mechanism

Instruction-only skill with no install spec; included script is plain Python and there are no downloads or external installers. This is a low-risk install surface.

ℹ Credentials

The registry metadata lists no required env vars, but SKILL.md and the script use optional env vars (OLLAMA_API_URL, VISION_MODEL, VISION_TIMEOUT). Defaults point to localhost, which is reasonable, but changing OLLAMA_API_URL to a remote endpoint would send base64-encoded images off-host. The env usage is proportionate to functionality but carries obvious exfiltration/privacy risks if pointed at an untrusted service. Also, the script enforces allowed extensions by filename only (and a simple '..' check), which could be abused if non-image data is disguised with an allowed extension.

✓ Persistence & Privilege

always is false and the skill does not request ongoing system presence or modify other skills. It runs on-demand via exec and does not request elevated privileges.

如何使用

确保已安装 OpenClaw（本地或 Docker 部署）
在对话框中输入安装命令：/install vision-helper
安装完成后，直接呼叫该 Skill 的名称或使用 /vision-helper 触发
根据 Skill 的参数说明提供必要输入，即可获得结构化输出

版本历史

v1.0.0

- Initial release of Vision Helper, an image analysis skill using local or cloud vision models via Ollama. - Supports analyzing images, UI elements, screenshots, and performing OCR with extended timeout for cloud models (up to 180 seconds). - Bypasses built-in image tool limitations, including path restrictions and short timeouts. - Provides CLI and conversational usage examples, including workflows for browser, desktop, and game UI screenshots. - Allows easy switching between multiple supported local and cloud vision models via environment variables. - Supports various image formats and directory paths for flexible screenshot handling.

元数据

Slug vision-helper

版本 1.0.0

许可证 MIT-0

累计安装 0

当前安装数 0

历史版本数 1

常见问题

Vision Helper — AI Image Analysis 是什么？

Analyze images using local or cloud vision models via Ollama to identify content, UI elements, screenshots, or extract text with OCR support. 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件，目前累计下载 77 次。

如何安装 Vision Helper — AI Image Analysis？

在 OpenClaw 或 Claude Code 对话框中运行命令「/install vision-helper」即可一键安装，无需额外配置。

Vision Helper — AI Image Analysis 是免费的吗？

是的，Vision Helper — AI Image Analysis 完全免费，采用 MIT-0 许可证，可自由下载、安装和使用。

Vision Helper — AI Image Analysis 支持哪些平台？

Vision Helper — AI Image Analysis 跨平台运行，可在任意部署了 OpenClaw / Claude Code 的环境中使用（cross-platform）。

谁开发了 Vision Helper — AI Image Analysis？

由 U3UT7（@ravenquasar）开发并维护，当前版本 v1.0.0。