Total downloads: 584, Favorites: 1, Current installs: 1, Versions: 2
Install in OpenClaw
/install glm-v-model
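Description
A skill for calling the Zhipu GLM-4V/4.6V vision models. It covers image and video understanding, multimodal conversation, chart analysis, and similar tasks. Use this skill when the user mentions: image understanding, image recognition, vision models, GLM-4V, GLM-4.6V, multimodal analysis, describing a picture, chart analysis, or video understanding.
Usage (SKILL.md)
Calling the GLM vision models
This skill lets you call Zhipu AI's GLM-4V and GLM-4.6V vision models. It supports image understanding, video analysis, chart interpretation, and related tasks.
Supported models
| Model | Description | Notes |
|---|---|---|
| glm-4v | GLM-4 vision model | Basic visual understanding |
| glm-4.6v | GLM-4.6V vision model | Stronger visual understanding, supports longer context |
Quick start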
Basic image understanding
    import base64

    from zai import ZhipuAiClient

    client = ZhipuAiClient(api_key="YOUR_API_KEY")

    # Read a local image and encode it as base64
    with open("image.jpg", "rb") as f:
        img_base = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="glm-4.6v",
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img_base}"}},
                # A text part carries its payload under the "text" key
                {"type": "text", "text": "Describe this image"},
            ],
        }],
        thinking={"type": "enabled"},  # optional: enable deep-thinking mode
    )
    print(response.choices[0].message.content)
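The example above hard-codes `image/jpeg` in the data URL, which is only right for JPEG files. Below is a minimal sketch of a helper that derives the MIME type from the file extension. The name `image_to_data_url` is hypothetical and is not part of the skill or the SDK. The sketch assumes the API accepts a base64 data URL, as the example above does.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL (hypothetical helper)."""
    # Guess the MIME type from the file extension, e.g. ".png" -> "image/png"
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognised image file: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

The returned string can be passed as the `url` value of an `image_url` content part in place of the hand-built f-string.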
Using an image URL
    # Reuses the `client` created in the basic example
    response = client.chat.completions.create(
        model="glm-4.6v",
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
                {"type": "text", "text": "What is in this image?"},
            ],
        }],
    )
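Multi-image understanding
    # Pass several image parts in one message, followed by the text prompt
    response = client.chat.completions.create(
        model="glm-4.6v",
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "image 1 base64 or URL"}},
                {"type": "image_url", "image_url": {"url": "image 2 base64 or URL"}},
                {"type": "text", "text": "Compare the similarities and differences between these two images"},
            ],
        }],
    )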
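When the number of images varies, the content list can be assembled programmatically instead of by hand. The following is a minimal sketch. `build_content` is a hypothetical helper, not something the skill ships. It assumes the OpenAI-style content-part format, in which the text part carries a `text` field.

```python
from typing import Dict, List


def build_content(image_refs: List[str], prompt: str) -> List[Dict]:
    """Build the `content` list for one user message: images first, then the prompt."""
    parts: List[Dict] = [
        {"type": "image_url", "image_url": {"url": ref}} for ref in image_refs
    ]
    parts.append({"type": "text", "text": prompt})
    return parts


# Example message with two placeholder image URLs
messages = [{
    "role": "user",
    "content": build_content(
        ["https://example.com/a.jpg", "https://example.com/b.jpg"],
        "Compare these two images",
    ),
}]
```

The resulting `messages` list can be passed directly as the `messages` argument of `client.chat.completions.create`.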
Video understanding (GLM-4.6V)
    # GLM-4.6V can interpret video content
    response = client.chat.completions.create(
        model="glm-4.6v",
        messages=[{
            "role": "user",
            "content": [
                {"type": "video_url", "video_url": {"url": "video URL"}},
                {"type": "text", "text": "Describe the content of this video"},
            ],
        }],
    )
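Using the script
The project includes the script script/infer_glmv.py, which can be called directly:
    import sys

    # Note: this absolute path is specific to the author's machine;
    # replace it with the location of your own installation.
    sys.path.append('/Users/guobaokui/.openclaw/workspace_multmodal/skills/glm-v-model/script')
    from infer_glmv import glm_v

    # Usage
    # glm_v(['image.jpg'], 'Describe the image', 'glm-4.6v')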
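Common scenarios
| Scenario | Example prompt |
|---|---|
| Image description | "Describe the content of this image in detail" |
| Chart analysis | "Analyze the data in this chart" |
| Text recognition (OCR) | "Extract the text in the image" |
| Object recognition | "What objects are in the image?" |
| Scene understanding | "What place is this?" |
| Multi-image comparison | "Compare the similarities and differences between these two images" |
| Video understanding | "Summarize the content of this video" |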
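Notes
- API Key: a Zhipu AI API key is required; obtain one from https://open.bigmodel.cn
- Image formats: common formats such as JPEG, PNG, and WebP are supported
- Image size: keeping each image under 10MB is recommended
- thinking: deep-thinking mode can be enabled with thinking={"type": "enabled"}
- Billing: charged per token; images are converted into tokens and count toward usage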
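The format and size notes above can be checked locally before an image is uploaded. The following is a minimal sketch. `check_image` is a hypothetical helper. The extension set covers only the formats named above, although the note says other common formats are supported too. It also treats 10MB as 10 × 1024 × 1024 bytes, which is an assumption.

```python
import os

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumed reading of the 10MB recommendation
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}  # only the formats named in the notes


def check_image(path: str) -> None:
    """Raise ValueError if the file looks unsuitable for upload (hypothetical helper)."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported image extension: {ext or '(none)'}")
    size = os.path.getsize(path)
    if size > MAX_IMAGE_BYTES:
        raise ValueError(
            f"image is {size} bytes, above the {MAX_IMAGE_BYTES}-byte guideline"
        )
```

Calling `check_image("image.jpg")` before encoding catches oversized or unexpected files without spending any tokens.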
Security recommendations
This skill appears to do what it claims (call Zhipu GLM visual models), but there are several things to check before installing or using it:
- Expect that the skill needs an API key (ZHIPU_API_KEY) even though the registry metadata doesn't list it. Provide the key only if you trust the Zhipu/bigmodel.cn service and understand their data handling.
- Images (and possibly video) will be transmitted to a third-party API. Do not send sensitive or private images unless you are comfortable with that provider's privacy/retention policy.
- The included helper script and examples contain issues: a hardcoded user path in an example, and a likely bug where the script calls img.read() but SKILL.md suggests passing filenames. Treat the script as untrusted code and inspect/modify it before running.
- The SDK (zai-sdk) is installed via pip per the comments. Review the package source/version (e.g., on PyPI or the vendor site) before installing to ensure it's legitimate.
Recommended actions: ask the publisher to update the registry metadata to list ZHIPU_API_KEY (and any other required env vars), remove or fix hardcoded paths/examples, and correct the script's file-handling behavior. If you cannot verify the publisher/SDK, avoid sending private images or run the code in an isolated environment.
Functional analysis
Type: OpenClaw Skill
Name: glm-v-model
Version: 1.0.1
The skill bundle is a legitimate integration for Zhipu AI's GLM-4V/4.6V vision models. The Python script `script/infer_glmv.py` correctly uses environment variables for API keys and performs standard base64 encoding for image processing. While `SKILL.md` contains a hardcoded local developer path (/Users/guobaokui/...), this is a common non-malicious artifact of development, and no evidence of data exfiltration or prompt injection was found.
Capability assessment
Purpose & Capability
The name, description, SKILL.md examples, and the Python helper all target calling Zhipu/GLM-4V/4.6V visual models (image/video understanding). Requiring an API key to call an external model provider is expected. However, the registry metadata declares no required environment variables while both the SKILL.md and the script state an API key is needed (ZHIPU_API_KEY). This mismatch between declared requirements and actual use is a discrepancy to resolve.
Instruction Scope
Instructions direct the agent to read local image files or URLs and send them to the GLM service, which matches the stated functionality. Concerns: (1) SKILL.md contains an example that appends an absolute, user-specific path (/Users/guobaokui/...) to sys.path — this is an unsafe, non-portable example and unnecessary. (2) The provided script's expected input is ambiguous/buggy: it expects objects with .read() for local images but the SKILL.md example calls glm_v(['image.jpg'], ...) (a filename string), which will break. (3) The skill will transmit image data to a third-party API (Zhipu); that is expected but privacy-sensitive.
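One way to resolve the filename versus `.read()` mismatch described in point (2) is to accept both kinds of input. The following is a minimal sketch of that idea. `to_base64` is a hypothetical function. It is not the actual code of script/infer_glmv.py, which is not reproduced here.

```python
import base64
import io
from typing import BinaryIO, Union


def to_base64(image: Union[str, bytes, BinaryIO]) -> str:
    """Return base64 text for a file path, raw bytes, or an object with .read()."""
    if isinstance(image, str):
        # A string is treated as a path to a local file
        with open(image, "rb") as f:
            data = f.read()
    elif isinstance(image, (bytes, bytearray)):
        data = bytes(image)
    elif hasattr(image, "read"):
        # File-like objects must be opened in binary mode
        data = image.read()
    else:
        raise TypeError(f"unsupported image input: {type(image).__name__}")
    return base64.b64encode(data).decode("utf-8")


# File-like input works the same way as a path would
encoded = to_base64(io.BytesIO(b"abc"))
```

With a normalization step like this, a call such as `glm_v(['image.jpg'], ...)` and a call passing open file objects could both be supported.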
Install Mechanism
No install spec is included (instruction-only plus a helper script). Comments suggest installing the 'zai-sdk' via pip — a normal, low-risk package manager step. No downloads from arbitrary URLs or extract steps are present.
Credentials
The code reads ZHIPU_API_KEY from the environment to authenticate to the external service, which is proportionate to the skill's purpose. The concern is that the skill's registry metadata does not list this required environment variable (or any primary credential). The missing declaration is misleading and could cause users to overlook the need to provide credentials and to recognize that data will be sent to a third party.
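Because the registry metadata does not declare the variable, failing early with a clear message helps users notice the missing credential. The following is a minimal sketch. `get_api_key` is a hypothetical helper. The variable name `ZHIPU_API_KEY` is taken from the review above.

```python
import os


def get_api_key(env_var: str = "ZHIPU_API_KEY") -> str:
    """Return the API key from the environment, failing early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before using the skill")
    return key
```

The result can be passed to the client constructor, for example `ZhipuAiClient(api_key=get_api_key())`, instead of embedding the key in source code.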
Persistence & Privilege
The skill is not marked always:true, does not request system-wide config paths, and does not modify other skills. It runs as an invoked skill and requires an external API key — no excessive persistence or elevated privileges are requested.
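How to use
- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the dialog box: /install glm-v-model
- Once installed, invoke the skill by name or trigger it with /glm-v-model
- Provide the required inputs as described in the skill's parameter documentation to get structured output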
Version history
v1.0.1
No user-visible changes detected in this version.
v1.0.0
Initial release with GLM-4V/GLM-4.6V vision model integration.
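- Supports the Zhipu GLM-4V/4.6V vision models for image and video understanding, multimodal conversation, chart analysis, and similar tasks
- Provides usage examples for local images (base64), image URLs, multiple images, and video
- Includes a script (infer_glmv.py) for easy integration into projects
- Lists common scenarios with example prompts
- Documents how to obtain an API key, supported formats, file size, and billing notes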
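FAQ
What is glm-v-model?
A skill for calling the Zhipu GLM-4V/4.6V vision models. It covers image and video understanding, multimodal conversation, chart analysis, and similar tasks. Use this skill when the user mentions: image understanding, image recognition, vision models, GLM-4V, GLM-4.6V, multimodal analysis, describing a picture, chart analysis, or video understanding. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 584 total downloads so far.
How do I install glm-v-model?
Run the command "/install glm-v-model" in the OpenClaw or Claude Code dialog box to install it in one step. The installation itself needs no extra configuration, but calling the models requires a Zhipu API key (ZHIPU_API_KEY).
Is glm-v-model free?
Yes. The skill itself is free under the MIT-0 license and can be downloaded, installed, and used freely. Calls to the Zhipu API are billed separately, per token.
Which platforms does glm-v-model support?
glm-v-model is cross-platform: it runs in any environment where OpenClaw / Claude Code is deployed.
Who developed glm-v-model?
It is developed and maintained by baokui (@baokui); the current version is v1.0.1.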