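Description

Gives models without native vision capability the ability to recognize images. This skill must be used whenever the user sends an image, shares an image path, or asks to analyze, describe, or identify image content. Trigger scenarios (must use): the user says "look at this image", "help me identify this picture", "describe this image", "analyze this screenshot", or "compare these images"; sends an image file path; includes an image attachment in the message; or asks to identify ... in an image...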

Usage (SKILL.md)

Vision Skill

Name: Vision
Author: guorui999

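Lets models without native image recognition (such as DeepSeek) "see" images by calling an external vision API and getting back a text description of each image.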

Quick setup

node scripts/vision.js --setup

Follow the prompts to enter the API key, the API base URL, and the model name.

View the current configuration

node scripts/vision.js --config

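Usage

Automatic trigger (recommended)

When the user sends an image or asks for an image to be analyzed, the skill is invoked automatically:

node scripts/vision.js "<image path>" "Describe this image in Chinese"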

Single image

# Local image
node scripts/vision.js /path/to/image.jpg "Describe the image content"

# Remote image
node scripts/vision.js --url https://example.com/image.png "What is this?"

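Multiple images

# Multiple local images
node scripts/vision.js image1.jpg image2.jpg image3.jpg "Compare the similarities and differences between these images"

# Mixing local and remote images
node scripts/vision.js local.jpg --url https://example.com/online.png "How are these two images related?"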
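
The command lines above follow a common shape: any number of image sources (bare paths, or `--url` followed by a URL), then the prompt as the last argument. The sketch below shows one way such arguments could be split apart. The real parsing inside `scripts/vision.js` is not shown in this document, so `parseVisionArgs` and its return shape are hypothetical and only mirror the examples.

```javascript
// Hypothetical sketch: not the actual argument parser of scripts/vision.js.
// Assumption (taken from the examples above): the prompt is always the last argument.
function parseVisionArgs(argv) {
  if (argv.length < 2) {
    throw new Error('usage: vision.js <image | --url URL>... "<prompt>"');
  }
  const prompt = argv[argv.length - 1];
  const sources = argv.slice(0, -1);
  const images = [];
  for (let i = 0; i < sources.length; i++) {
    if (sources[i] === '--url') {
      i += 1; // the URL is the token right after --url
      if (i >= sources.length) throw new Error('--url requires a value');
      images.push({ kind: 'url', value: sources[i] });
    } else {
      images.push({ kind: 'file', value: sources[i] });
    }
  }
  return { images, prompt };
}
```

Called with `process.argv.slice(2)`, this keeps local and remote images in the order they were given, which matters for prompts such as "compare the first and second image".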

Supported image formats

jpg, jpeg, png, gif, webp, bmp
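
As an illustration of how this format list might be enforced before anything is uploaded, the sketch below maps each listed extension to its standard MIME type and rejects everything else. `mimeTypeFor` is a hypothetical helper name; whether `scripts/vision.js` validates extensions this way is an assumption.

```javascript
const path = require('node:path');

// Extensions from the list above, mapped to their standard MIME types.
const MIME_BY_EXT = {
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.png': 'image/png',
  '.gif': 'image/gif',
  '.webp': 'image/webp',
  '.bmp': 'image/bmp',
};

// Hypothetical helper: returns the MIME type for a supported image path,
// or throws for any extension that is not in the list.
function mimeTypeFor(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  const mime = MIME_BY_EXT[ext];
  if (!mime) {
    throw new Error(`Unsupported image format: ${ext || '(no extension)'}`);
  }
  return mime;
}
```

The MIME type is also what a base64 data URL needs as its prefix, so a check like this can double as the lookup for the encoding step.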

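Supported vision services

Service	Model	Notes
Alibaba Cloud Bailian (recommended)	`qwen3.5-omni-plus`	1 million free tokens for new users
Alibaba Cloud Bailian	`qwen-vl-max`	Same as above
OpenAI	`gpt-4o-mini`	Requires an overseas payment method
Other	Any OpenAI-compatible format	Just change `BASE_URL` and the model name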

Configuration file

Configuration file: ~/.claude/skills/vision/config.json

{
  "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
  "api_key": "your API key",
  "model": "qwen3.5-omni-plus"
}
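
A minimal sketch of reading and sanity-checking this file, assuming only the three keys shown above. `loadConfig` and `validateConfig` are hypothetical names, and the actual script may handle configuration differently (for example with defaults or extra fields).

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Path from the section above, with "~" expanded to the current user's home directory.
const CONFIG_PATH = path.join(os.homedir(), '.claude', 'skills', 'vision', 'config.json');

// Hypothetical helper: checks that the three documented keys are non-empty strings.
function validateConfig(config) {
  for (const key of ['base_url', 'api_key', 'model']) {
    if (typeof config[key] !== 'string' || config[key].trim() === '') {
      throw new Error(`config.json: missing or empty "${key}"`);
    }
  }
  return config;
}

// Hypothetical helper: reads and validates the config file.
function loadConfig(configPath = CONFIG_PATH) {
  return validateConfig(JSON.parse(fs.readFileSync(configPath, 'utf8')));
}
```

Failing early with the name of the missing key is usually easier to debug than an authentication error returned by the remote API.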

How it works

Read the image file → convert it to base64
Call the vision API (OpenAI-compatible format)
Return the text description
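
The three steps above can be sketched roughly as follows, assuming the provider accepts the OpenAI-compatible chat completions format with `image_url` content parts and that Node.js 18+ is available for the global `fetch`. All function names here are hypothetical, and this is not the code of `scripts/vision.js`; provider-specific options (for example, models that only support streaming output) are not covered.

```javascript
const fs = require('node:fs');

// Step 1: encode image bytes as a base64 data URL.
function toDataUrl(bytes, mimeType) {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString('base64')}`;
}

// Convenience wrapper for local files (hypothetical helper).
function fileToDataUrl(filePath, mimeType) {
  return toDataUrl(fs.readFileSync(filePath), mimeType);
}

// Step 2a: build an OpenAI-compatible chat completions body with
// one text part followed by one image_url part per image.
function buildVisionRequest(model, prompt, imageUrls) {
  return {
    model,
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: prompt },
          ...imageUrls.map((url) => ({ type: 'image_url', image_url: { url } })),
        ],
      },
    ],
  };
}

// Steps 2b and 3: send the request and return the text description.
// `config` is assumed to have the base_url, api_key, and model keys from config.json.
async function describeImages(config, prompt, imageUrls) {
  const endpoint = `${config.base_url.replace(/\/+$/, '')}/chat/completions`;
  const res = await fetch(endpoint, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${config.api_key}`,
    },
    body: JSON.stringify(buildVisionRequest(config.model, prompt, imageUrls)),
  });
  if (!res.ok) throw new Error(`Vision API returned HTTP ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

Remote images can be passed through as plain `https://` URLs in `imageUrls`, while local files go through `fileToDataUrl` first, which is why remote images must be reachable by the provider.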

Notes

Requires a Node.js environment
An API key must be configured before first use
Remote images require that the corresponding URL is reachable

Security recommendations

Only install this after confirming you are comfortable with images, URLs, and prompts being sent to the configured third-party vision provider. Avoid using it on IDs, screenshots with secrets, medical or financial documents, or internal company materials unless you have explicit approval. Review the install source carefully and prefer a pinned trusted URL or checksum-verified package; protect or rotate the API key if the skill directory may be shared, backed up, or committed.
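
Two of these recommendations lend themselves to a small sketch: checking a downloaded file against a published SHA-256 checksum, and restricting the plaintext config file to its owner. The helper names are hypothetical, the skill is not known to ship anything like them, and `chmod`-style permissions only take full effect on POSIX systems.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');

// SHA-256 of a string or Buffer as lowercase hex.
function sha256Hex(bytes) {
  return crypto.createHash('sha256').update(bytes).digest('hex');
}

// Hypothetical helper: throws unless the bytes match the expected checksum.
function verifyChecksum(bytes, expectedHex) {
  const actual = sha256Hex(bytes);
  if (actual !== expectedHex.trim().toLowerCase()) {
    throw new Error(`Checksum mismatch: expected ${expectedHex}, got ${actual}`);
  }
}

// Hypothetical helper: make a file readable and writable by its owner only
// (mode 0600), e.g. for the config.json that stores the API key.
function restrictToOwner(filePath) {
  fs.chmodSync(filePath, 0o600);
}
```

A file mode does not protect the key from backups or accidental commits, so keeping the skill directory out of shared or version-controlled locations still matters.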

Capability tags

requires-sensitive-credentials

Capability assessment

⚠ Purpose & Capability

Image recognition through a remote vision API is coherent with the stated purpose, but the artifacts described by the scan include processing highly sensitive identity-document images without clear privacy safeguards, minimization, or consent handling.

⚠ Instruction Scope

The trigger examples appear broad enough to activate on general image-help requests, and the runtime flow does not clearly require confirmation before sending local images, URLs, and prompts to an external provider.

⚠ Install Mechanism

The installer reportedly accepts a user-supplied source URL, downloads skill files, and marks JavaScript executable without origin validation, integrity checking, or a confirmation step; that is overbroad for a vision helper.

⚠ Credentials

Networked image analysis and an API key are proportionate for this kind of skill, but the lack of prominent disclosure around third-party transmission and sensitive-image handling makes the environment access under-scoped.

⚠ Persistence & Privilege

The setup flow stores an API key in a plaintext local config file. No background persistence or privilege escalation is indicated, but credential persistence should be clearly disclosed and protected.

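Version history

v0.1.0

vision-2 0.1.0 initial release
- Provides automatic image recognition for models without native vision capability (such as DeepSeek), supporting image content analysis, description, and comparison.
- Automatically detects the image source (local or remote) and supports multi-image mode.
- Integrates several vision services (including Alibaba Cloud Bailian, OpenAI, and compatible models) and supports common formats such as jpg/png/gif/webp/bmp.
- Simple command-line configuration, with guided setup on first use.
- Detailed documentation covering trigger scenarios, non-applicable scenarios, and API configuration.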

Metadata

Slug vision-2

Version 0.1.0

License MIT-0

Total installs 0

Current installs 0

Number of versions 1

FAQ

What is Vision?

Vision gives models without native vision capability the ability to recognize images. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 41 times so far.

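How do I install Vision?

Run the command "/install vision-2" in the OpenClaw or Claude Code chat box to install it in one step. Installation itself needs no extra configuration, but an API key must be configured before first use (node scripts/vision.js --setup).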

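Is Vision free?

Yes. Vision itself is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely. Calls to the configured third-party vision service are subject to that provider's own terms (see the supported vision services table).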

Which platforms does Vision support?

Vision is cross-platform: it can be used in any environment where OpenClaw / Claude Code is deployed and Node.js is available.

Who developed Vision?

Vision is developed and maintained by guorui999 (@guorui999). The current version is v0.1.0.
