Description

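Gives models without native vision capability the ability to understand images. This skill must be used whenever the user sends an image, shares an image path, or asks to analyze, describe, or recognize image content. Trigger scenarios (must use): the user says "look at this image", "help me recognize this picture", "describe this image", "analyze this screenshot", or "compare these images"; sends an image file path; attaches an image to a message; or asks to recognize, within an image, ...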

README (SKILL.md)

Vision Skill

Name: Vision
Author: guorui999

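Lets models without native image recognition (such as DeepSeek) "see" images by calling an external vision API to obtain a text description of the image.

Quick Setup

node scripts/vision.js --setup

Follow the prompts to enter the API key, API base URL, and model name.

View the Current Configuration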

node scripts/vision.js --config

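Usage

Automatic Trigger (Recommended)

When the user sends an image or asks for an image to be analyzed, the skill is invoked automatically:

node scripts/vision.js "<image path>" "Describe this image in Chinese"

Single Image

# Local image
node scripts/vision.js /path/to/image.jpg "Describe the image content"

# Remote image
node scripts/vision.js --url https://example.com/image.png "What is this?"

Multiple Images

# Several local images
node scripts/vision.js image1.jpg image2.jpg image3.jpg "Compare the similarities and differences between these images"

# Mixing local and remote images
node scripts/vision.js local.jpg --url https://example.com/online.png "How are these two images related?"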
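
The commands above suggest a simple argument convention: positional arguments are local paths, `--url` marks the next argument as a remote image, and the final argument is the prompt. The following is a minimal sketch of how such arguments could be split apart. `parseArgs` is a hypothetical helper written for illustration; the actual parsing in scripts/vision.js is not shown in this document and may differ.

```javascript
// Hypothetical sketch, not the real vision.js parser.
// Assumptions: the last argument is the prompt, and `--url` is followed by a URL.
function parseArgs(argv) {
  const items = [];
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] === '--url') {
      i += 1;
      if (i >= argv.length) throw new Error('--url requires a value');
      items.push({ type: 'url', value: argv[i] });
    } else {
      items.push({ type: 'local', value: argv[i] });
    }
  }
  // The trailing positional argument is treated as the prompt.
  const last = items.pop();
  if (!last || last.type !== 'local' || items.length === 0) {
    throw new Error('expected at least one image followed by a prompt');
  }
  return { images: items, prompt: last.value };
}
```

Keeping images in their original order matters for prompts such as "compare the first and second image", which is why the sketch uses a single ordered list rather than separate lists for local and remote sources.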

Supported Image Formats

jpg, jpeg, png, gif, webp, bmp
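
Because local images are sent as base64, each file presumably needs a MIME type. Below is a minimal sketch that maps the extensions listed above to their standard MIME types and rejects anything else. `MIME_TYPES` and `mimeTypeFor` are hypothetical names used for illustration, not identifiers from the skill.

```javascript
const path = require('node:path');

// Standard MIME types for the extensions the README lists as supported.
const MIME_TYPES = {
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.png': 'image/png',
  '.gif': 'image/gif',
  '.webp': 'image/webp',
  '.bmp': 'image/bmp',
};

// Hypothetical helper: return the MIME type for a path, or throw if unsupported.
function mimeTypeFor(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  const mime = MIME_TYPES[ext];
  if (!mime) {
    throw new Error(`Unsupported image format: ${ext || '(no extension)'}`);
  }
  return mime;
}
```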

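Supported Vision Services

Service	Model	Notes
Alibaba Cloud Bailian (recommended)	`qwen3.5-omni-plus`	1 million free tokens for new users
Alibaba Cloud Bailian	`qwen-vl-max`	Same as above
OpenAI	`gpt-4o-mini`	Requires an international payment method
Other	Any OpenAI-compatible API	Just change `BASE_URL` and the model name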

Configuration File

Config file: ~/.claude/skills/vision/config.json

{
  "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
  "api_key": "your API key",
  "model": "qwen3.5-omni-plus"
}
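
As an illustration of how a script might consume this file, here is a minimal sketch that reads the JSON and checks that the three keys shown above are present. `loadConfig` and `DEFAULT_CONFIG_PATH` are hypothetical names; the validation and error message are assumptions, not the skill's documented behavior.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Path taken from the README; resolved against the current user's home directory.
const DEFAULT_CONFIG_PATH = path.join(
  os.homedir(), '.claude', 'skills', 'vision', 'config.json'
);

// Hypothetical helper: load the config and fail early if a key is missing or empty.
function loadConfig(configPath = DEFAULT_CONFIG_PATH) {
  const config = JSON.parse(fs.readFileSync(configPath, 'utf8'));
  for (const key of ['base_url', 'api_key', 'model']) {
    if (typeof config[key] !== 'string' || config[key].trim() === '') {
      throw new Error(
        `Missing "${key}" in ${configPath}; try: node scripts/vision.js --setup`
      );
    }
  }
  return config;
}
```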

How It Works

Read the image file and convert it to base64
Call the vision API (OpenAI-compatible format)
Return the text description
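
The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in scripts/vision.js: the function names are hypothetical, the request uses the OpenAI-compatible chat completions shape with `image_url` content parts, the endpoint is assumed to be `base_url` plus `/chat/completions`, and the global `fetch` assumes Node.js 18 or later.

```javascript
const fs = require('node:fs');

// Step 1: read a local image and encode it as a base64 data URL.
function toDataUrl(filePath, mimeType) {
  const base64 = fs.readFileSync(filePath).toString('base64');
  return `data:${mimeType};base64,${base64}`;
}

// Build an OpenAI-compatible chat request: one text part plus one part per image.
// Each entry in imageUrls may be a data URL (local file) or an https URL (remote).
function buildRequestBody(model, prompt, imageUrls) {
  return {
    model,
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: prompt },
          ...imageUrls.map((url) => ({ type: 'image_url', image_url: { url } })),
        ],
      },
    ],
  };
}

// Steps 2 and 3: call the API and return the text description.
// `config` is assumed to carry base_url, api_key, and model as in config.json.
async function describeImages(config, prompt, imageUrls) {
  const response = await fetch(`${config.base_url}/chat/completions`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${config.api_key}`,
    },
    body: JSON.stringify(buildRequestBody(config.model, prompt, imageUrls)),
  });
  if (!response.ok) {
    throw new Error(`Vision API returned HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].message.content;
}
```

Note that in this sketch both the image bytes and the prompt leave the machine in the request body, which is why the choice of provider in `base_url` matters for sensitive images.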

Notes

Requires a Node.js environment
An API key must be configured before first use
Remote images must be reachable at the given URL

Usage Guidance

Only install this after confirming you are comfortable with images, URLs, and prompts being sent to the configured third-party vision provider. Avoid using it on IDs, screenshots with secrets, medical or financial documents, or internal company materials unless you have explicit approval. Review the install source carefully and prefer a pinned trusted URL or checksum-verified package; protect or rotate the API key if the skill directory may be shared, backed up, or committed.

Capability Tags

requires-sensitive-credentials

Capability Assessment

⚠ Purpose & Capability

Image recognition through a remote vision API is coherent with the stated purpose, but the artifacts described by the scan include processing highly sensitive identity-document images without clear privacy safeguards, minimization, or consent handling.

⚠ Instruction Scope

The trigger examples appear broad enough to activate on general image-help requests, and the runtime flow does not clearly require confirmation before sending local images, URLs, and prompts to an external provider.

⚠ Install Mechanism

The installer reportedly accepts a user-supplied source URL, downloads skill files, and marks JavaScript executable without origin validation, integrity checking, or a confirmation step; that is overbroad for a vision helper.

⚠ Credentials

Networked image analysis and an API key are proportionate for this kind of skill, but the lack of prominent disclosure around third-party transmission and sensitive-image handling makes the environment access under-scoped.

⚠ Persistence & Privilege

The setup flow stores an API key in a plaintext local config file. No background persistence or privilege escalation is indicated, but credential persistence should be clearly disclosed and protected.
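
One common way to reduce exposure of a plaintext credential file is to restrict it to the owning user. The sketch below is an illustration of that idea only; `saveConfig` is a hypothetical helper, nothing here is confirmed to be part of the skill's setup flow, and POSIX permission bits have limited effect on Windows.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: write the config so only the owner can read or write it.
function saveConfig(configPath, config) {
  fs.mkdirSync(path.dirname(configPath), { recursive: true, mode: 0o700 });
  fs.writeFileSync(configPath, JSON.stringify(config, null, 2) + '\n', { mode: 0o600 });
  // The `mode` option only applies to newly created files,
  // so tighten permissions explicitly in case the file already existed.
  fs.chmodSync(configPath, 0o600);
}
```

File permissions do not protect the key from backups, shared directories, or accidental commits, so the guidance about rotating the key still applies.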

Version History

v0.1.0

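vision-2 0.1.0 initial release
- Provides automatic image recognition for models without native vision capability (such as DeepSeek), supporting image content analysis, description, and comparison.
- Automatically detects the image source (local or remote) and supports multi-image mode.
- Integrates multiple vision services (including Alibaba Cloud Bailian, OpenAI, and compatible models) and supports common formats such as jpg, png, gif, webp, and bmp.
- Simple command-line configuration, with guided setup on first use.
- Detailed documentation covering trigger scenarios, non-applicable scenarios, and API configuration.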

Metadata

Slug vision-2

Version 0.1.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Vision?

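Gives models without native vision capability the ability to understand images. This skill must be used whenever the user sends an image, shares an image path, or asks to analyze, describe, or recognize image content. Trigger scenarios (must use): the user says "look at this image", "help me recognize this picture", "describe this image", "analyze this screenshot", or "compare these images"; sends an image file path; attaches an image to a message; or asks to recognize, within an image, ... It is an AI Agent Skill for Claude Code / OpenClaw, with 41 downloads so far.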

How do I install Vision?

Run "/install vision-2" in the OpenClaw or Claude Code chat to install it in one step. Before first use, configure an API key by running node scripts/vision.js --setup.

Is Vision free?

Yes, the skill itself is free, licensed under MIT-0. You can download, install and use it at no cost; the third-party vision API you configure may bill for usage separately.

Which platforms does Vision support?

Vision is cross-platform and runs anywhere OpenClaw / Claude Code is available, provided a Node.js environment is installed.

Who created Vision?

It is built and maintained by guorui999 (@guorui999); the current version is v0.1.0.
