tridefender

GLM Multimodal Analyzer

by TriDefender · GitHub ↗ · v1.0.0
cross-platform ⚠ suspicious
Downloads: 576
Stars: 0
Active Installs: 3
Versions: 1
Install in OpenClaw
/install multimodal
Description
Multimodal content understanding (images, videos, documents) using the GLM-4.6V model
Usage Guidance
This skill will read local files (images, videos, PDFs), base64-encode them, and send their contents to https://open.bigmodel.cn using a ZHIPU_API_KEY. Before installing:
  1. Confirm the skill metadata is corrected to declare ZHIPU_API_KEY.
  2. Verify you trust the remote endpoint and the publisher; the Homepage and source are unknown.
  3. Do not feed sensitive or private files (passwords, keys, proprietary docs) to the skill.
  4. Consider using an ephemeral or scoped API key and audit API usage.
  5. If you need higher assurance, ask the publisher to provide provenance (source repo, signatures) or review the code yourself; the relevant behavior is visible in scripts/analyze.py.
If you accept these privacy risks and trust the endpoint, the functionality is coherent; if not, do not install or run with sensitive inputs.
Capability Analysis
Type: OpenClaw Skill
Name: multimodal
Version: 1.0.0
The skill bundle contains a command injection vulnerability in the toolHandlers section of agent.json, where user-provided parameters (input and prompt) are wrapped in single quotes and passed directly to a shell command. This allows an attacker to escape the quotes and execute arbitrary commands on the host system. While the Python script scripts/analyze.py appears to be a legitimate tool for interacting with the Zhipu AI API (open.bigmodel.cn), the insecure handling of shell execution makes the bundle high-risk.
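To illustrate the quoting problem described above, here is a minimal Python sketch. It does not reproduce the actual agent.json handler. The command layout and the `--input` / `--prompt` flags are assumptions made for illustration, and nothing is executed. It contrasts pasting user text between single quotes with two safer alternatives: `shlex.quote` and an argument list.

```python
import shlex


def build_unsafe_command(input_path: str, prompt: str) -> str:
    # Risky pattern: user text is wrapped in single quotes and pasted
    # into a shell string. The flags here are hypothetical.
    return f"python scripts/analyze.py --input '{input_path}' --prompt '{prompt}'"


def build_quoted_command(input_path: str, prompt: str) -> str:
    # If a shell string is unavoidable, shlex.quote escapes embedded quotes.
    return (
        "python scripts/analyze.py "
        f"--input {shlex.quote(input_path)} --prompt {shlex.quote(prompt)}"
    )


def build_argv(input_path: str, prompt: str) -> list[str]:
    # Preferred: an argument list that can be handed to subprocess.run
    # without shell=True, so no shell ever parses the user text.
    return ["python", "scripts/analyze.py", "--input", input_path, "--prompt", prompt]


# A prompt that closes the opening quote and smuggles in another command.
malicious_prompt = "x'; touch /tmp/pwned; echo '"

# shlex.split approximates how a POSIX shell would tokenize each string.
unsafe_tokens = shlex.split(build_unsafe_command("cat.png", malicious_prompt))
quoted_tokens = shlex.split(build_quoted_command("cat.png", malicious_prompt))
argv = build_argv("cat.png", malicious_prompt)

# In the unsafe string, "touch" shows up as a token of its own, which means
# it escaped the quoted argument. In the other two forms the whole prompt
# stays a single argument.
print("touch" in unsafe_tokens)
print(quoted_tokens[-1] == malicious_prompt)
print(argv[-1] == malicious_prompt)
```

The argument-list form is usually the simplest fix because it removes the shell from the picture entirely, rather than relying on escaping being applied correctly everywhere.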
Capability Assessment
Purpose & Capability
The skill's purpose (multimodal analysis via GLM-4.6V) matches the code and agent configuration. However the registry metadata lists no required env vars while SKILL.md and scripts/analyze.py require ZHIPU_API_KEY — an inconsistency in declared requirements. Minor model naming/context inconsistencies (SKILL.md: GLM-4.6V 128K, agent.json/model: 'zai/glm-4.6v-flash', script MODEL='glm-4.6v', MAX_TOKENS=4096) are also present.
Instruction Scope
SKILL.md and analyze.py allow local file paths and will base64-encode entire local files and include them in requests to https://open.bigmodel.cn/api/paas/v4/chat/completions. That behavior is coherent with a multimodal uploader, but it means arbitrary local files (including sensitive documents) may be exfiltrated to the remote API without additional safeguards or filtering.
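As a sketch of the kind of safeguard the assessment finds missing, the following Python snippet checks a file's extension and size before base64-encoding it. This is not code from scripts/analyze.py. The function name `encode_if_allowed`, the allowlist, and the size cap are all hypothetical choices for illustration.

```python
import base64
import tempfile
from pathlib import Path

# Hypothetical policy values; tune them to the file types you actually intend to send.
ALLOWED_SUFFIXES = {".png", ".jpg", ".jpeg", ".mp4", ".pdf"}
MAX_BYTES = 10 * 1024 * 1024


def encode_if_allowed(path_str: str) -> str:
    """Return the file's contents as base64 text, or raise if the policy rejects it."""
    path = Path(path_str).resolve()
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"refusing to upload file type: {path.name}")
    if not path.is_file():
        raise FileNotFoundError(path_str)
    if path.stat().st_size > MAX_BYTES:
        raise ValueError(f"file exceeds {MAX_BYTES} bytes: {path.name}")
    return base64.b64encode(path.read_bytes()).decode("ascii")


# Small demonstration with throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    image = Path(tmp) / "sample.png"
    image.write_bytes(b"fake image bytes")
    secret = Path(tmp) / "id_rsa"
    secret.write_text("not a real key")

    encoded = encode_if_allowed(str(image))
    try:
        encode_if_allowed(str(secret))
        blocked = False
    except ValueError:
        blocked = True
```

An extension allowlist is a coarse filter: it will not catch a sensitive document saved as a PDF, so it complements, rather than replaces, the advice to keep private files away from the skill.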
Install Mechanism
This is an instruction-only skill with no install spec (lowest install risk). The README mentions that the requests package will be auto-installed, but there is no formal install step; the script exits if requests is missing. No external downloads or packaged installers are used.
Credentials
The runtime requires a single secret ZHIPU_API_KEY (used as a Bearer token) which is proportionate to calling a third-party API. The problem is that the registry metadata did not declare this requirement — the skill should have listed ZHIPU_API_KEY as required.env. Requiring an API key for the claimed purpose is expected, but the omission in metadata and the ability to send arbitrary local files increases risk.
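The credential handling described above can be sketched as follows. This is an illustrative helper, not the code in scripts/analyze.py: the function name `build_auth_headers` and the `Content-Type` header are assumptions. Only the ZHIPU_API_KEY variable and its use as a Bearer token come from the assessment.

```python
import os
from collections.abc import Mapping


def build_auth_headers(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build request headers from ZHIPU_API_KEY, failing fast if it is absent."""
    key = env.get("ZHIPU_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ZHIPU_API_KEY is not set; refusing to call the remote API")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",  # assumed; typical for JSON APIs
    }


# Passing a plain dict instead of os.environ makes the helper easy to try out.
headers = build_auth_headers({"ZHIPU_API_KEY": "example-key"})
```

Failing fast when the variable is missing makes the undeclared requirement visible at startup instead of surfacing later as an authentication error from the remote API.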
Persistence & Privilege
The skill does not request always:true, does not declare system config paths, and does not modify other skills. It is user-invocable and can be invoked autonomously per platform default (not flagged here).
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install multimodal
  3. After installation, invoke the skill by name or use /multimodal
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Multimodal Analyzer 1.0.0
- Initial release with GLM-4.6V-powered multimodal content understanding.
- Supports image OCR, scene and object analysis, video summarization & keyframe extraction, and document (PDF/table) parsing.
- Includes a deep thinking mode for advanced reasoning.
- Command-line interface for content analysis via script.
- Currently processes one modality at a time; requires publicly accessible URLs for videos.
Metadata
Slug multimodal
Version 1.0.0
License
All-time Installs 3
Active Installs 3
Total Versions 1
Frequently Asked Questions

What is GLM Multimodal Analyzer?

GLM Multimodal Analyzer provides multimodal content understanding (images, videos, documents) using the GLM-4.6V model. It is an AI Agent Skill for Claude Code / OpenClaw, with 576 downloads so far.

How do I install GLM Multimodal Analyzer?

Run "/install multimodal" in the OpenClaw or Claude Code chat to install it in one step. Note that the skill also needs a ZHIPU_API_KEY at runtime, which the registry metadata does not declare.

Is GLM Multimodal Analyzer free?

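GLM Multimodal Analyzer is listed as free and open-source, and you can download and install it at no cost. Note, however, that the License field in the metadata is empty, and the skill calls the Zhipu AI API with your own ZHIPU_API_KEY.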

Which platforms does GLM Multimodal Analyzer support?

GLM Multimodal Analyzer is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created GLM Multimodal Analyzer?

It is built and maintained by TriDefender (@tridefender); the current version is v1.0.0.
