← 返回 Skills 市场

GLM Multimodal Analyzer

Name: GLM Multimodal Analyzer
Author: tridefender

作者 TriDefender · GitHub ↗ · v1.0.0

cross-platform ⚠ suspicious

576

总下载

当前安装

版本数

在 OpenClaw 中安装

/install multimodal

功能描述

使用GLM-4.6V模型进行多模态内容理解（图片、视频、文档）

安全使用建议

This skill will read local files (images, videos, PDFs), base64-encode them, and send their contents to https://open.bigmodel.cn using a ZHIPU_API_KEY. Before installing: (1) Confirm the skill metadata is corrected to declare ZHIPU_API_KEY; (2) Verify you trust the remote endpoint and the publisher — the Homepage and source are unknown; (3) Do not feed sensitive or private files (passwords, keys, proprietary docs) to the skill; (4) Consider using an ephemeral or scoped API key and audit API usage; (5) If you need higher assurance, request the publisher provide provenance (source repo, signatures) or review the code yourself — the relevant behavior is visible in scripts/analyze.py. If you accept these privacy risks and trust the endpoint, the functionality is coherent; if not, do not install or run with sensitive inputs.

功能分析

Type: OpenClaw Skill Name: multimodal Version: 1.0.0 The skill bundle contains a command injection vulnerability in agent.json within the toolHandlers section, where user-provided parameters (input and prompt) are wrapped in single quotes and passed directly to a shell command. This allows an attacker to escape the quotes and execute arbitrary commands on the host system. While the Python script scripts/analyze.py appears to be a legitimate tool for interacting with the Zhipu AI API (open.bigmodel.cn), the insecure handling of shell execution makes the bundle high-risk.

能力评估

⚠ Purpose & Capability

The skill's purpose (multimodal analysis via GLM-4.6V) matches the code and agent configuration. However the registry metadata lists no required env vars while SKILL.md and scripts/analyze.py require ZHIPU_API_KEY — an inconsistency in declared requirements. Minor model naming/context inconsistencies (SKILL.md: GLM-4.6V 128K, agent.json/model: 'zai/glm-4.6v-flash', script MODEL='glm-4.6v', MAX_TOKENS=4096) are also present.

⚠ Instruction Scope

SKILL.md and analyze.py allow local file paths and will base64-encode entire local files and include them in requests to https://open.bigmodel.cn/api/paas/v4/chat/completions. That behavior is coherent with a multimodal uploader, but it means arbitrary local files (including sensitive documents) may be exfiltrated to the remote API without additional safeguards or filtering.

ℹ Install Mechanism

This is an instruction-only skill with no install spec (lowest install risk). README mentions requests will be auto-installed but there is no formal install step; the script exits if requests is missing. No external downloads or packaged installers are used.

⚠ Credentials

The runtime requires a single secret ZHIPU_API_KEY (used as a Bearer token) which is proportionate to calling a third-party API. The problem is that the registry metadata did not declare this requirement — the skill should have listed ZHIPU_API_KEY as required.env. Requiring an API key for the claimed purpose is expected, but the omission in metadata and the ability to send arbitrary local files increases risk.

✓ Persistence & Privilege

The skill does not request always:true, does not declare system config paths, and does not modify other skills. It is user-invocable and can be invoked autonomously per platform default (not flagged here).

如何使用

确保已安装 OpenClaw（本地或 Docker 部署）
在对话框中输入安装命令：/install multimodal
安装完成后，直接呼叫该 Skill 的名称或使用 /multimodal 触发
根据 Skill 的参数说明提供必要输入，即可获得结构化输出

版本历史

v1.0.0

Multimodal Analyzer 1.0.0 - Initial release with GLM-4.6V-powered multimodal content understanding. - Supports image OCR, scene and object analysis, video summarization & keyframe extraction, and document (PDF/table) parsing. - Includes a deep thinking mode for advanced reasoning. - Command-line interface for content analysis via script. - Currently processes one modality at a time; requires publicly accessible URLs for videos.

元数据

Slug multimodal

版本 1.0.0

许可证 —

累计安装 3

当前安装数 3

历史版本数 1

常见问题

GLM Multimodal Analyzer 是什么？

使用GLM-4.6V模型进行多模态内容理解（图片、视频、文档）. 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件，目前累计下载 576 次。

如何安装 GLM Multimodal Analyzer？

在 OpenClaw 或 Claude Code 对话框中运行命令「/install multimodal」即可一键安装，无需额外配置。

GLM Multimodal Analyzer 是免费的吗？

是的，GLM Multimodal Analyzer 完全免费（开源免费），可自由下载、安装和使用。

GLM Multimodal Analyzer 支持哪些平台？

GLM Multimodal Analyzer 跨平台运行，可在任意部署了 OpenClaw / Claude Code 的环境中使用（cross-platform）。

谁开发了 GLM Multimodal Analyzer？

由 TriDefender（@tridefender）开发并维护，当前版本 v1.0.0。