
Minimax Image Understanding

Name: Minimax Image Understanding
Author: aidescend

by aidescend · v1.0.0

cross-platform ⚠ suspicious

Downloads: 844

Install in OpenClaw

/install minimax-image-understanding

Description

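Uses multimodal large models to understand image content and generate a description of its business meaning. Supports multiple models: (1) MiniMax VLM, (2) OpenAI GPT-4V, (3) Claude Vision. Intended for understanding screenshots, charts, document photos, and similar images, and for producing accurate text descriptions of them.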

Usage Guidance

This skill appears to do what it says (send a local image to a selected multimodal model and return a description), but before installing or using it you should:

- Confirm dependencies: ensure the runtime has the 'curl' binary (used by the MiniMax path) and the Python 'requests' package (used for OpenAI/Anthropic). The skill's metadata incorrectly states "no required binaries".
- Consider privacy: the script base64-encodes and sends the entire image to remote APIs. Do not use it on images containing sensitive or private data unless you trust the target service and understand its retention policy.
- Verify provider endpoints and keys: validate MINIMAX_API_HOST if you set it (the default is https://api.minimaxi.com), and never hard-code API keys; supply them via environment variables as instructed.
- Review model choice and costs: using OpenAI/Anthropic may incur usage charges, and each provider has different input formats and limits. Test with non-sensitive images first.

If you want stronger assurance, request an updated skill package that explicitly documents its runtime dependencies (curl, requests) and includes checks that fail with clear messages when dependencies are missing.
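The advice to validate MINIMAX_API_HOST can be made concrete with a small check that runs before any image leaves the machine. The following is a minimal sketch, not code from the skill: the function name `resolve_minimax_host` and the `TRUSTED_HOSTS` allow-list are hypothetical, and only the variable name and the default host come from this page. It rejects non-HTTPS URLs and hosts outside the allow-list.

```python
import os
from urllib.parse import urlparse

# Default taken from this page; the allow-list below is a hypothetical example.
DEFAULT_MINIMAX_HOST = "https://api.minimaxi.com"
TRUSTED_HOSTS = {"api.minimaxi.com"}  # add any other MiniMax hosts you trust


def resolve_minimax_host(env=None):
    """Return the MiniMax base URL, refusing non-HTTPS or untrusted hosts."""
    env = os.environ if env is None else env
    # Fall back to the documented default when the variable is unset or blank.
    raw = env.get("MINIMAX_API_HOST", "").strip() or DEFAULT_MINIMAX_HOST
    parsed = urlparse(raw)
    if parsed.scheme != "https":
        raise ValueError(f"MINIMAX_API_HOST must use https, got {raw!r}")
    if parsed.hostname not in TRUSTED_HOSTS:
        raise ValueError(f"MINIMAX_API_HOST points at untrusted host {parsed.hostname!r}")
    # Normalise away a trailing slash so paths can be appended consistently.
    return raw.rstrip("/")
```

Passing an explicit `env` dict makes the check easy to test without touching the real environment; in normal use it reads `os.environ`.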

Capability Analysis

Type: OpenClaw Skill
Name: minimax-image-understanding
Version: 1.0.0

The skill bundle provides a utility for image understanding using MiniMax, OpenAI, or Anthropic APIs. The script `scripts/understand_image.py` correctly handles API keys via environment variables and transmits image data to the respective service providers as described. No evidence of malicious intent, data exfiltration to unauthorized endpoints, or command injection vulnerabilities was found.

Capability Assessment

✓ Purpose & Capability

The name and description (image understanding via MiniMax/OpenAI/Anthropic) align with the included script and SKILL.md: the code reads a local image, base64-encodes it, and sends it to the selected model provider for analysis. The required environment variables listed in SKILL.md correspond to the providers used.
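The read, encode, and send flow described above can be illustrated in Python. This is a sketch under stated assumptions, not the skill's `scripts/understand_image.py`: the helper names are hypothetical, the media type is guessed from the file extension, and the two payload shapes follow the public OpenAI Chat Completions format (a data URL in an `image_url` part) and the Anthropic Messages format (a base64 `image` source block). The MiniMax request format is not documented on this page, so it is omitted, and nothing here performs a network call.

```python
import base64
import mimetypes
from pathlib import Path


def encode_image(path):
    """Read a local image and return (media_type, base64_string)."""
    media_type, _ = mimetypes.guess_type(str(path))
    if media_type is None or not media_type.startswith("image/"):
        raise ValueError(f"unrecognised image type for {path}")
    data = Path(path).read_bytes()
    # The whole file is encoded: everything in the image is sent to the provider.
    return media_type, base64.b64encode(data).decode("ascii")


def openai_content(prompt, media_type, b64):
    # OpenAI Chat Completions style: image passed inline as a data URL.
    return [
        {"type": "text", "text": prompt},
        {"type": "image_url", "image_url": {"url": f"data:{media_type};base64,{b64}"}},
    ]


def anthropic_content(prompt, media_type, b64):
    # Anthropic Messages style: base64 image source block, then the text prompt.
    return [
        {"type": "image", "source": {"type": "base64", "media_type": media_type, "data": b64}},
        {"type": "text", "text": prompt},
    ]
```

Base64 output is about a third larger than the original file, which is worth remembering when checking a provider's image size limits.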

ℹ Instruction Scope

Runtime instructions and the script are scoped to reading a local image file and sending it to a model provider; they do not access unrelated system files or secrets. However, the skill transmits the entire image (base64-encoded) to remote APIs, so image confidentiality and trust in the provider are security considerations the user should evaluate.

⚠ Install Mechanism

No install spec is provided, but the script relies on external tools and libraries: it calls the 'curl' binary for the MiniMax path and imports the Python 'requests' module for OpenAI/Anthropic. The registry metadata claims 'required binaries: none', which contradicts the script's actual requirements. This omission can cause runtime failures and indicates incomplete packaging and documentation.
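This gap can be caught up front with a preflight check that fails with a clear message, as the usage guidance recommends. A minimal sketch, assuming hypothetical provider labels `minimax`, `openai`, and `anthropic`; it relies only on `shutil.which` and `importlib.util.find_spec` from the standard library and is not part of the published skill.

```python
import importlib.util
import shutil


def missing_dependencies(provider):
    """Return human-readable descriptions of missing runtime dependencies."""
    missing = []
    if provider == "minimax":
        # The MiniMax path shells out to the curl binary.
        if shutil.which("curl") is None:
            missing.append("curl binary not found on PATH")
    elif provider in ("openai", "anthropic"):
        # The OpenAI/Anthropic paths import the requests package.
        if importlib.util.find_spec("requests") is None:
            missing.append("Python package 'requests' is not installed (pip install requests)")
    else:
        raise ValueError(f"unknown provider: {provider!r}")
    return missing


def require_dependencies(provider):
    """Exit with a readable message instead of failing mid-request."""
    problems = missing_dependencies(provider)
    if problems:
        raise SystemExit("Missing dependencies: " + "; ".join(problems))
```

Calling `require_dependencies(...)` at startup turns an obscure `FileNotFoundError` or `ModuleNotFoundError` partway through a request into a single explicit message.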

✓ Credentials

The env vars mentioned (MINIMAX_API_KEY, MINIMAX_API_HOST, OPENAI_API_KEY, ANTHROPIC_API_KEY) match the services the skill integrates with and are proportionate to its purpose. No unrelated credentials or additional config paths are requested.
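One way to look up these keys so that a missing one produces a clear error without echoing any secret is sketched below. The provider labels and the `api_key_for` helper are hypothetical; only the variable names come from the list above. MINIMAX_API_HOST is an endpoint rather than a credential, so it is not part of this mapping.

```python
import os

# Hypothetical provider labels mapped to the key variables named in SKILL.md.
PROVIDER_KEY_VARS = {
    "minimax": "MINIMAX_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def api_key_for(provider, env=None):
    """Return the API key for a provider, read only from the environment."""
    env = os.environ if env is None else env
    try:
        var = PROVIDER_KEY_VARS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    key = env.get(var, "").strip()
    if not key:
        # Name the missing variable, but never print any key value.
        raise RuntimeError(f"{var} is not set; export it before running the skill")
    return key
```

Each provider reads exactly one variable, which keeps the credential footprint as narrow as the assessment describes.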

✓ Persistence & Privilege

The skill does not request permanent presence (always:false) and does not modify other skills or system-wide settings. It runs on demand and does not persist credentials or change agent configuration.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install minimax-image-understanding
After installation, invoke the skill by name or use /minimax-image-understanding
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

minimax-image-understanding v1.0.0
- Initial release supporting multimodal image understanding using large models.
- Compatible with MiniMax VLM (default, recommended for Chinese), OpenAI GPT-4V, and Claude Vision (Anthropic).
- Simple CLI tool for generating business-centric descriptions of images, charts, and document photos.
- Environment-variable-based configuration for easy model selection.
- Output focuses on key content and business logic, omitting positional element listings.

Metadata

Slug minimax-image-understanding

Version 1.0.0

License —

All-time Installs 10

Active Installs 10

Total Versions 1

Frequently Asked Questions

What is Minimax Image Understanding?

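It uses multimodal large models to understand image content and generate a description of its business meaning, with support for MiniMax VLM, OpenAI GPT-4V, and Claude Vision. It is intended for screenshots, charts, document photos, and similar images. It is an AI Agent Skill for Claude Code / OpenClaw, with 844 downloads so far.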

How do I install Minimax Image Understanding?

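Run "/install minimax-image-understanding" in the OpenClaw or Claude Code chat to install it. Before first use, set the API key environment variable for your chosen provider and make sure the runtime has the 'curl' binary and the Python 'requests' package.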

Is Minimax Image Understanding free?

Yes, the skill itself is free to download, install, and use. Note that calls to OpenAI or Anthropic models may incur usage charges from those providers.

Which platforms does Minimax Image Understanding support?

Minimax Image Understanding is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Minimax Image Understanding?

It is built and maintained by aidescend (@aidescend); the current version is v1.0.0.
