← Back to Skills Marketplace

GLM Multimodal Analyzer

Name: GLM Multimodal Analyzer
Author: tridefender

by TriDefender · GitHub ↗ · v1.0.0

cross-platform ⚠ suspicious

576

Downloads

Stars

Active Installs

Versions

Install in OpenClaw

/install multimodal

Description

使用GLM-4.6V模型进行多模态内容理解（图片、视频、文档）

Usage Guidance

This skill will read local files (images, videos, PDFs), base64-encode them, and send their contents to https://open.bigmodel.cn using a ZHIPU_API_KEY. Before installing: (1) Confirm the skill metadata is corrected to declare ZHIPU_API_KEY; (2) Verify you trust the remote endpoint and the publisher — the Homepage and source are unknown; (3) Do not feed sensitive or private files (passwords, keys, proprietary docs) to the skill; (4) Consider using an ephemeral or scoped API key and audit API usage; (5) If you need higher assurance, request the publisher provide provenance (source repo, signatures) or review the code yourself — the relevant behavior is visible in scripts/analyze.py. If you accept these privacy risks and trust the endpoint, the functionality is coherent; if not, do not install or run with sensitive inputs.

Capability Analysis

Type: OpenClaw Skill Name: multimodal Version: 1.0.0 The skill bundle contains a command injection vulnerability in agent.json within the toolHandlers section, where user-provided parameters (input and prompt) are wrapped in single quotes and passed directly to a shell command. This allows an attacker to escape the quotes and execute arbitrary commands on the host system. While the Python script scripts/analyze.py appears to be a legitimate tool for interacting with the Zhipu AI API (open.bigmodel.cn), the insecure handling of shell execution makes the bundle high-risk.

Capability Assessment

⚠ Purpose & Capability

The skill's purpose (multimodal analysis via GLM-4.6V) matches the code and agent configuration. However the registry metadata lists no required env vars while SKILL.md and scripts/analyze.py require ZHIPU_API_KEY — an inconsistency in declared requirements. Minor model naming/context inconsistencies (SKILL.md: GLM-4.6V 128K, agent.json/model: 'zai/glm-4.6v-flash', script MODEL='glm-4.6v', MAX_TOKENS=4096) are also present.

⚠ Instruction Scope

SKILL.md and analyze.py allow local file paths and will base64-encode entire local files and include them in requests to https://open.bigmodel.cn/api/paas/v4/chat/completions. That behavior is coherent with a multimodal uploader, but it means arbitrary local files (including sensitive documents) may be exfiltrated to the remote API without additional safeguards or filtering.

ℹ Install Mechanism

This is an instruction-only skill with no install spec (lowest install risk). README mentions requests will be auto-installed but there is no formal install step; the script exits if requests is missing. No external downloads or packaged installers are used.

⚠ Credentials

The runtime requires a single secret ZHIPU_API_KEY (used as a Bearer token) which is proportionate to calling a third-party API. The problem is that the registry metadata did not declare this requirement — the skill should have listed ZHIPU_API_KEY as required.env. Requiring an API key for the claimed purpose is expected, but the omission in metadata and the ability to send arbitrary local files increases risk.

✓ Persistence & Privilege

The skill does not request always:true, does not declare system config paths, and does not modify other skills. It is user-invocable and can be invoked autonomously per platform default (not flagged here).

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install multimodal
After installation, invoke the skill by name or use /multimodal
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Multimodal Analyzer 1.0.0 - Initial release with GLM-4.6V-powered multimodal content understanding. - Supports image OCR, scene and object analysis, video summarization & keyframe extraction, and document (PDF/table) parsing. - Includes a deep thinking mode for advanced reasoning. - Command-line interface for content analysis via script. - Currently processes one modality at a time; requires publicly accessible URLs for videos.

Metadata

Slug multimodal

Version 1.0.0

License —

All-time Installs 3

Active Installs 3

Total Versions 1

Frequently Asked Questions

What is GLM Multimodal Analyzer?

使用GLM-4.6V模型进行多模态内容理解（图片、视频、文档）. It is an AI Agent Skill for Claude Code / OpenClaw, with 576 downloads so far.

How do I install GLM Multimodal Analyzer?

Run "/install multimodal" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is GLM Multimodal Analyzer free?

Yes, GLM Multimodal Analyzer is completely free (open-source). You can download, install and use it at no cost.

Which platforms does GLM Multimodal Analyzer support?

GLM Multimodal Analyzer is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created GLM Multimodal Analyzer?

It is built and maintained by TriDefender (@tridefender); the current version is v1.0.0.

More Skills