thincher

glm-understand-image

by 要啥自行车 · GitHub ↗ · v1.0.4
cross-platform ⚠ suspicious
Downloads: 1041
Stars: 0
Active Installs: 2
Versions: 5
Install in OpenClaw
/install glm-understand-image
Description
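Image understanding and analysis with the GLM Vision MCP. Trigger conditions: (1) the user asks to analyze an image, understand an image, or describe its contents; (2) objects, text, or scenes in an image need to be recognized; (3) GLM's visual understanding capability is to be used.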
README (SKILL.md)

glm-understand-image

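Image understanding and analysis with the GLM Vision MCP server.

Workflow (setup is needed only the first time; afterwards, go straight to step 6)

Step 1: Check and install dependencies

1.1 Check whether mcporter is available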

npx -y mcporter --version

If the command succeeds, mcporter is available; continue to step 2.

mcporter can be run directly through npx, so no separate installation is required.
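The check in step 1.1 can be wrapped in a small helper so that later steps can branch on its result. This is a minimal sketch; the function name `mcporter_available` is hypothetical and not part of mcporter.

```shell
# Hypothetical helper: succeeds when `npx -y mcporter --version` exits with status 0.
mcporter_available() {
  npx -y mcporter --version >/dev/null 2>&1
}

# Example use (left as a comment because it may download mcporter from npm):
# if mcporter_available; then
#   echo "mcporter is available"
# else
#   echo "mcporter check failed" >&2
# fi
```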

Step 2: Check the API key configuration

cat ~/.openclaw/config/glm.json 2>/dev/null | python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('api_key', ''))"

If this prints a non-empty API key, skip to step 4.
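When the config file does not exist, the command above prints a Python traceback to stderr instead of an empty string. The sketch below is one way to make the check quiet and scriptable. It assumes the flat, one-key JSON layout written in step 3.3, and it uses sed, which is not a real JSON parser. The function name `read_glm_key` is hypothetical.

```shell
# Hypothetical helper: prints the api_key value and returns 0,
# or returns 1 when the file is missing or the key is empty.
read_glm_key() {
  glm_config=${1:-$HOME/.openclaw/config/glm.json}
  [ -r "$glm_config" ] || return 1
  # Capture the text between the quotes that follow "api_key":
  glm_key=$(sed -n 's/.*"api_key"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$glm_config" | head -n 1)
  [ -n "$glm_key" ] || return 1
  printf '%s\n' "$glm_key"
}

# Example use:
# if read_glm_key >/dev/null; then echo "key configured"; else echo "key missing"; fi
```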

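Step 3: Configure the API key (if not yet configured)

3.2 If no key was found, ask the user for one

Ask the user to provide a Zhipu (智谱) API key.

If the user does not have a Zhipu API key, they can purchase one at https://www.bigmodel.cn/glm-coding?ic=OOKF4KGGTW

3.3 Save the API key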

mkdir -p ~/.openclaw/config
# Write the key file; replace your-api-key with the real key.
cat > ~/.openclaw/config/glm.json << 'EOF'
{
  "api_key": "your-api-key"
}
EOF
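Step 3.3 stores the key in a plain file. The sketch below performs the same write but restricts the file to its owner and refuses keys that would need JSON escaping. The allowed character set (letters, digits, `.`, `_`, `-`) is an assumption made for this example, not a documented Zhipu key format, and `save_glm_key` is a hypothetical name.

```shell
# Hypothetical helper: save_glm_key KEY [CONFIG_DIR]
save_glm_key() {
  glm_key=$1
  glm_dir=${2:-$HOME/.openclaw/config}
  # Assumed key alphabet: reject empty keys and anything outside it.
  case $glm_key in
    ''|*[!A-Za-z0-9._-]*)
      echo "refusing to save: empty key or unexpected characters" >&2
      return 1 ;;
  esac
  mkdir -p "$glm_dir" || return 1
  # Create the file with owner-only permissions, then enforce them
  # in case the file already existed with a looser mode.
  ( umask 077; printf '{\n  "api_key": "%s"\n}\n' "$glm_key" > "$glm_dir/glm.json" ) &&
    chmod 600 "$glm_dir/glm.json"
}

# Example use:
# save_glm_key "your-api-key"
```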

Step 4: Add the MCP server

Use mcporter to add the GLM Vision MCP server:

mcporter config add glm-vision \
  --command "npx -y @z_ai/mcp-server" \
  --env Z_AI_API_KEY="your-key" \
  --env Z_AI_MODE="ZHIPU" \
  --env HOME="$PWD"

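Note: replace your-key with your actual Zhipu API key. The HOME environment variable is set to the current working directory to avoid log-file permission problems.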
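To avoid pasting the key into the command by hand, the registration can be wrapped in a function that takes the key as an argument. This is a sketch only: it reuses exactly the flags shown above, and the function name `register_glm_vision` is hypothetical.

```shell
# Hypothetical helper: register_glm_vision KEY
# Runs the same `mcporter config add` command as above, with the key
# passed as a quoted variable instead of being typed inline.
register_glm_vision() {
  glm_key=$1
  [ -n "$glm_key" ] || { echo "API key is empty" >&2; return 1; }
  mcporter config add glm-vision \
    --command "npx -y @z_ai/mcp-server" \
    --env "Z_AI_API_KEY=$glm_key" \
    --env "Z_AI_MODE=ZHIPU" \
    --env "HOME=$PWD"
}

# Example use:
# register_glm_vision "your-key"
```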

Step 5: Test the connection

mcporter list

Confirm that the glm-vision server was added successfully.
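The confirmation can also be scripted. The sketch below assumes that the output of `mcporter list` contains the server name as plain text; the exact output format is not specified here, and `server_registered` is a hypothetical name.

```shell
# Hypothetical helper: reads a server listing on stdin and succeeds
# when the given name appears in it as a fixed string.
server_registered() {
  grep -Fq -- "$1"
}

# Example use:
# mcporter list | server_registered glm-vision && echo "glm-vision is registered"
```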

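Step 6: Process images through the MCP

6.1 Prepare the image

Place the image at an accessible path, for example:

  • ~/.openclaw/workspace/images/image-name.jpg
  • or use a URL

6.2 Call an MCP tool with mcporter

The general form of a call is:

mcporter call glm-vision.analyze_image prompt="<question about the image>" image_source="<image path or URL>"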

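Examples:

# "$HOME" is used below because the shell does not expand ~ inside quotes.

# Describe the image
mcporter call glm-vision.analyze_image prompt="Describe the contents of this image in detail" image_source="$HOME/image.jpg"

# Use a URL
mcporter call glm-vision.analyze_image prompt="What does this image show?" image_source="https://example.com/image.jpg"

# Extract text from a screenshot
mcporter call glm-vision.extract_text_from_screenshot image_source="$HOME/screenshot.png"

# Diagnose an error screenshot
mcporter call glm-vision.diagnose_error_screenshot prompt="Analyze this error" image_source="$HOME/error.png"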

6.3 API parameters

Parameter | Description | Type
image_source | Image path or URL | string (required)
prompt | Question about the image | string (required)
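Because image_source accepts either a local path or a URL, it can help to normalize the value before calling a tool. The sketch below passes URLs through unchanged, expands a leading `~/` by hand, and checks that a local file exists. Whether the MCP server expands `~` itself is not stated here, so the sketch assumes it does not. `resolve_image_source` is a hypothetical name.

```shell
# Hypothetical helper: prints a usable image_source value,
# or returns 1 when a local file does not exist.
resolve_image_source() {
  img_src=$1
  case $img_src in
    http://*|https://*)
      printf '%s\n' "$img_src"
      return 0 ;;
    "~/"*)
      # Drop the first two characters (~/) and prepend $HOME.
      img_src="$HOME/${img_src#??}" ;;
  esac
  [ -f "$img_src" ] || { echo "image not found: $img_src" >&2; return 1; }
  printf '%s\n' "$img_src"
}

# Example use:
# src=$(resolve_image_source '~/image.jpg') &&
#   mcporter call glm-vision.analyze_image prompt="Describe this image" image_source="$src"
```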

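Supported tools

Important: if you run into problems, the official documentation takes precedence: https://docs.bigmodel.cn/cn/coding-plan/mcp/vision-mcp-server

The GLM Vision MCP server provides the following tools:

  • ui_to_artifact - Converts a UI screenshot into code, a prompt, a design spec, or a natural-language description
  • extract_text_from_screenshot - Extracts and recognizes text in screenshots using OCR
  • diagnose_error_screenshot - Parses screenshots of error dialogs, stack traces, and logs, and suggests where the problem is and how to fix it
  • understand_technical_diagram - Produces a structured reading of technical diagrams such as architecture diagrams, flowcharts, UML, and ER diagrams
  • analyze_data_visualization - Reads dashboards and statistical charts and extracts trends, anomalies, and key business points
  • ui_diff_check - Compares two UI screenshots and identifies visual differences and implementation deviations
  • analyze_image - General-purpose image understanding for visual content not covered by the specialized tools
  • video_analysis - Analyzes video scenes in formats such as MP4, MOV, and M4V, capturing key frames, events, and highlights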

MCP configuration

MCP server name: glm-vision

MCP server configuration: @z_ai/mcp-server

Environment variables:

  • Z_AI_API_KEY - Zhipu API key (required)
  • Z_AI_MODE - Service platform selection; defaults to ZHIPU
Usage Guidance
This skill appears to do what it says (configure an MCP server and call analyze_image), but exercise caution before installing:
  1. npx -y @z_ai/mcp-server will fetch and execute code from npm at runtime; verify that you trust that package and its publisher.
  2. The SKILL.md reads and writes ~/.openclaw/config/glm.json and advises storing your Zhipu API key there in plaintext; consider storing credentials securely (e.g., a secrets manager) instead of an unprotected file.
  3. The metadata does not declare the config path or env var usage; ask the author to declare the required config and env vars and to explain why HOME is set to the current directory when adding the MCP server.
  4. If you cannot verify the upstream package or the author, avoid running the npx commands and prefer an explicit, vetted installation path.
Capability Analysis
Type: OpenClaw Skill
Name: glm-understand-image
Version: 1.0.4

The skill is highly suspicious due to critical shell injection vulnerabilities present in `SKILL.md`. The instructions for the AI agent involve embedding user-provided input (API keys, image paths/URLs, prompts) directly into shell commands without any explicit sanitization or quoting. This creates a severe Remote Code Execution (RCE) risk, as a malicious user could craft inputs containing shell metacharacters to execute arbitrary commands on the host system, particularly in the `cat > ~/.openclaw/config/glm.json`, `mcporter config add`, and `mcporter call` commands.
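The injection risk arises when user input is spliced into a command string that the shell then re-parses. One common mitigation, sketched here under the assumption that the agent can hand values over as shell variables, is to pass each value as a single double-quoted argument so that metacharacters such as `;` or `$(...)` stay inert. The wrapper name `analyze_image` is hypothetical; the `mcporter call` form is the one used in the README.

```shell
# Hypothetical wrapper: analyze_image PROMPT IMAGE_SOURCE
# Each user-supplied value is expanded inside double quotes, so it reaches
# mcporter as exactly one argument and is never parsed as shell syntax.
analyze_image() {
  user_prompt=$1
  user_image=$2
  mcporter call glm-vision.analyze_image \
    "prompt=$user_prompt" \
    "image_source=$user_image"
}

# Example use: the semicolon below is plain text inside the prompt argument.
# analyze_image 'Describe this image; list any visible text' "$HOME/image.jpg"
```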
Capability Assessment
Purpose & Capability
The skill claims to provide GLM-based image understanding and its runtime steps (using mcporter to run a GLM MCP server and calling analyze_image) align with that purpose. It requests a Zhipu/智谱 API key which is appropriate for using that service. However, the skill metadata declared no required config paths or env vars while the instructions explicitly read and write ~/.openclaw/config/glm.json and expect a Z_AI_API_KEY — this mismatch is a design inconsistency.
Instruction Scope
The SKILL.md tells the agent to read ~/.openclaw/config/glm.json and to store the API key there in plaintext; that file path is not declared in the registry metadata. It also instructs running npx to fetch and run @z_ai/mcp-server at runtime. Writing credentials to a plaintext file and running remotely-fetched code are scope-relevant but risk-bearing behaviors that should be explicitly declared and justified.
Install Mechanism
No install spec is included (instruction-only), but the instructions rely on npx -y to fetch and run @z_ai/mcp-server. Using npx means arbitrary package code from the npm registry will be executed at runtime; this is expected for an instruction-only skill that delegates to an MCP server but increases runtime risk compared with a pre-vetted binary or an official release URL.
Credentials
The skill needs a Zhipu API key (Z_AI_API_KEY) to function, which is proportionate. However, the registry lists no required env vars while the SKILL.md both reads/writes ~/.openclaw/config/glm.json and sets Z_AI_API_KEY when configuring mcporter. Storing the API key as plaintext in ~/.openclaw/config/glm.json is insecure and the discrepancy between declared requirements and actual instructions is inconsistent.
Persistence & Privilege
always is false and the skill is user-invocable only; it does not request elevated or persistent platform privileges. Autonomous invocation is allowed by default (disable-model-invocation is false) but that is the platform default and is not, by itself, an extra privilege concern here.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install glm-understand-image
  3. After installation, invoke the skill by name or use /glm-understand-image
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.4
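- Removed automatic detection and reading of an existing API key file (auth-profiles.json), simplifying API key configuration.
- Step 3 now only explains how to ask the user directly for a Zhipu API key, and keeps the manual entry and save method.
- The rest of the workflow and functionality is unchanged.
v1.0.3
- Updated the API key auto-discovery logic to read from ~/.openclaw/agents/main/agent/auth-profiles.json in preference to the previous path.
- Streamlined the installation notes: after the first installation the preliminary steps can be skipped, and only step 6 needs to run.
- Clarified that the key's source is no longer inferred from its format; entries are filtered only by whether the name contains "zhipu" or "zai".
- Other functionality and the supported tools are unchanged; the usage flow stays the same.
v1.0.2
- Improved the MCP server setup flow by setting the HOME environment variable to avoid log permission problems.
- Updated the image analysis tool name and parameter: `image_analysis` is now `analyze_image`, and the parameter `image` is now `image_source`.
- Updated all related call examples and parameter descriptions to the new interface and parameter names.
- Adjusted the corresponding names in the tool list so that the instructions match actual behavior.
v1.0.1
- Minor documentation style tweaks; removed the "Complete workflow example" section to simplify the structure.
- Added an "Important" note before the "Supported tools" section advising that the official documentation takes precedence when problems arise.
- Everything else (workflow, commands, interface parameters, etc.) is unchanged.
v1.0.0
- Migrated from MiniMax to the GLM Vision MCP, with corresponding toolkit and configuration changes.
- Replaced uvx and Python scripts with npx/mcporter usage for setup and image analysis.
- Updated setup workflow: now utilizes mcporter for MCP server management and API calls.
- All references to MiniMax and its API keys have been swapped for Zhipu (GLM) and its API keys.
- Expanded and clarified supported visual analysis tools available via the GLM Vision MCP.
- Removed the original understand_image.py script; all usage now relies on mcporter commands.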
Metadata
Slug glm-understand-image
Version 1.0.4
License
All-time Installs 2
Active Installs 2
Total Versions 5
Frequently Asked Questions

What is glm-understand-image?

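glm-understand-image provides image understanding and analysis with the GLM Vision MCP. Trigger conditions: (1) the user asks to analyze an image, understand an image, or describe its contents; (2) objects, text, or scenes in an image need to be recognized; (3) GLM's visual understanding capability is to be used. It is an AI Agent Skill for Claude Code / OpenClaw, with 1041 downloads so far.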

How do I install glm-understand-image?

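Run "/install glm-understand-image" in the OpenClaw or Claude Code chat to install it in one step. First use still requires the one-time setup described in the README: a Zhipu API key and registering the glm-vision MCP server.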

Is glm-understand-image free?

Yes, glm-understand-image is completely free (open-source). You can download, install and use it at no cost.

Which platforms does glm-understand-image support?

glm-understand-image is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created glm-understand-image?

It is built and maintained by 要啥自行车 (@thincher); the current version is v1.0.4.
