GLM Multimodal Analyzer
by TriDefender · GitHub · v1.0.0
576 Downloads · 0 Stars · 3 Active Installs · 1 Version
Install in OpenClaw
/install multimodal
Description
Multimodal content understanding (images, videos, documents) using the GLM-4.6V model
Usage Guidance
This skill will read local files (images, videos, PDFs), base64-encode them, and send their contents to https://open.bigmodel.cn using a ZHIPU_API_KEY. Before installing:
- Confirm the skill metadata is corrected to declare ZHIPU_API_KEY.
- Verify you trust the remote endpoint and the publisher; the Homepage and source are unknown.
- Do not feed sensitive or private files (passwords, keys, proprietary docs) to the skill.
- Consider using an ephemeral or scoped API key and audit API usage.
- If you need higher assurance, request that the publisher provide provenance (source repo, signatures) or review the code yourself; the relevant behavior is visible in scripts/analyze.py.
If you accept these privacy risks and trust the endpoint, the functionality is coherent; if not, do not install or run with sensitive inputs.
Capability Analysis
Type: OpenClaw Skill
Name: multimodal
Version: 1.0.0
The skill bundle contains a command injection vulnerability in agent.json within the toolHandlers section, where user-provided parameters (input and prompt) are wrapped in single quotes and passed directly to a shell command. This allows an attacker to escape the quotes and execute arbitrary commands on the host system. While the Python script scripts/analyze.py appears to be a legitimate tool for interacting with the Zhipu AI API (open.bigmodel.cn), the insecure handling of shell execution makes the bundle high-risk.
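The quoting problem described above can be illustrated with a minimal sketch. It is not the skill's actual handler: the `--input` and `--prompt` flag names and every function name here are assumptions made for illustration. It contrasts splicing values into a single-quoted shell string with passing them as separate argv elements, which never go through a shell.

```python
import shlex
import subprocess


def build_unsafe_command(user_input: str, prompt: str) -> str:
    # Mirrors the pattern described in the analysis: values wrapped in single
    # quotes and spliced into one shell string. Shown for contrast only.
    return f"python scripts/analyze.py --input '{user_input}' --prompt '{prompt}'"


def build_safe_argv(user_input: str, prompt: str) -> list:
    # Each value is its own argv element, so no shell ever parses it.
    # The flag names are hypothetical.
    return ["python", "scripts/analyze.py", "--input", user_input, "--prompt", prompt]


def run_analyzer(user_input: str, prompt: str) -> subprocess.CompletedProcess:
    # Passing a list and leaving shell=False (the default) bypasses the shell.
    return subprocess.run(
        build_safe_argv(user_input, prompt),
        capture_output=True,
        text=True,
        check=False,
    )


# A value containing a single quote closes the quoting early.
payload = "x'; touch /tmp/pwned; echo '"

# shlex.split approximates how a POSIX shell would tokenize the string:
# the payload is broken apart and "touch" becomes a word of its own.
unsafe_tokens = shlex.split(build_unsafe_command(payload, "hi"))

# With an argv list the payload stays one opaque argument.
safe_argv = build_safe_argv(payload, "hi")
```

If a shell string is unavoidable, `shlex.quote` escapes each value so that it survives as a single word, but an argv list is usually the simpler choice.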
Capability Assessment
Purpose & Capability
The skill's purpose (multimodal analysis via GLM-4.6V) matches the code and agent configuration. However, the registry metadata lists no required env vars, while SKILL.md and scripts/analyze.py require ZHIPU_API_KEY; the declared requirements are inconsistent. There are also minor inconsistencies in model naming and context size: SKILL.md states GLM-4.6V 128K, agent.json/model is 'zai/glm-4.6v-flash', and the script sets MODEL='glm-4.6v' with MAX_TOKENS=4096.
Instruction Scope
SKILL.md and analyze.py allow local file paths and will base64-encode entire local files and include them in requests to https://open.bigmodel.cn/api/paas/v4/chat/completions. That behavior is coherent with a multimodal uploader, but it means arbitrary local files (including sensitive documents) may be exfiltrated to the remote API without additional safeguards or filtering.
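One kind of safeguard this paragraph finds missing could look like the sketch below. It is an assumption-laden illustration, not code from scripts/analyze.py: the function name, the allowed-suffix set, the size cap, and the idea of an allowed root directory are all hypothetical. A file is base64-encoded only if it sits under an approved directory, has an expected extension, and is below a size limit.

```python
import base64
from pathlib import Path

# Hypothetical policy values; adjust to the file types actually needed.
ALLOWED_SUFFIXES = {".png", ".jpg", ".jpeg", ".pdf", ".mp4"}
MAX_BYTES = 10 * 1024 * 1024  # 10 MiB cap, chosen arbitrarily for the sketch


def encode_if_allowed(path_str: str, allowed_root: str) -> str:
    """Return the file's base64 text, or raise if it fails the policy."""
    path = Path(path_str).resolve()
    root = Path(allowed_root).resolve()

    # Reject anything outside the approved directory (also blocks ../ tricks,
    # because both paths are resolved first).
    if root not in path.parents:
        raise PermissionError(f"{path} is outside {root}")
    if not path.is_file():
        raise FileNotFoundError(str(path))
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported file type: {path.suffix}")
    if path.stat().st_size > MAX_BYTES:
        raise ValueError("file exceeds the size cap")

    return base64.b64encode(path.read_bytes()).decode("ascii")
```

A check like this narrows what can leave the machine, but it does not replace reviewing which files are handed to the skill in the first place.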
Install Mechanism
This is an instruction-only skill with no install spec (lowest install risk). README mentions requests will be auto-installed but there is no formal install step; the script exits if requests is missing. No external downloads or packaged installers are used.
Credentials
The runtime requires a single secret ZHIPU_API_KEY (used as a Bearer token) which is proportionate to calling a third-party API. The problem is that the registry metadata did not declare this requirement — the skill should have listed ZHIPU_API_KEY as required.env. Requiring an API key for the claimed purpose is expected, but the omission in metadata and the ability to send arbitrary local files increases risk.
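As a rough sketch of the credential handling described here, the snippet below reads ZHIPU_API_KEY from the environment and builds a Bearer header, failing early when the key is absent. The function name and the `env` parameter are hypothetical, and the actual script may structure this differently.

```python
import os


def auth_headers(env=None) -> dict:
    # `env` is injectable so the function can be exercised without touching
    # the real process environment; it defaults to os.environ.
    env = os.environ if env is None else env
    key = env.get("ZHIPU_API_KEY", "").strip()
    if not key:
        # Fail before any file is read or any request is built.
        raise RuntimeError("ZHIPU_API_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Declaring the same variable in the registry metadata would let the platform surface this requirement before installation rather than at run time.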
Persistence & Privilege
The skill does not request always:true, does not declare system config paths, and does not modify other skills. It is user-invocable and can be invoked autonomously per platform default (not flagged here).
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install multimodal
- After installation, invoke the skill by name or use /multimodal
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Multimodal Analyzer 1.0.0
- Initial release with GLM-4.6V-powered multimodal content understanding.
- Supports image OCR, scene and object analysis, video summarization & keyframe extraction, and document (PDF/table) parsing.
- Includes a deep thinking mode for advanced reasoning.
- Command-line interface for content analysis via script.
- Currently processes one modality at a time; requires publicly accessible URLs for videos.
Frequently Asked Questions
What is GLM Multimodal Analyzer?
GLM Multimodal Analyzer provides multimodal content understanding (images, videos, documents) using the GLM-4.6V model. It is an AI Agent Skill for Claude Code / OpenClaw, with 576 downloads so far.
How do I install GLM Multimodal Analyzer?
Run "/install multimodal" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is GLM Multimodal Analyzer free?
Yes, GLM Multimodal Analyzer is completely free (open-source). You can download, install and use it at no cost.
Which platforms does GLM Multimodal Analyzer support?
GLM Multimodal Analyzer is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created GLM Multimodal Analyzer?
It is built and maintained by TriDefender (@tridefender); the current version is v1.0.0.