Description

MiniMax multimodal generation via API. Use when user wants voice, music, image, image-to-image, or video generation with MiniMax. Supports TTS, music, image...

README (SKILL.md)

MiniMax Multimodal Toolkit

Name: Ali Minimax Toolkit
Author: yuan-huicheng

Generate voice, music, image, and video content via MiniMax APIs. Pure Python — works on Windows, Mac, and Linux without any third-party dependencies.

Prerequisites

MINIMAX_API_KEY environment variable (starts with sk-)
MINIMAX_API_HOST environment variable (optional, default: https://api.minimaxi.com)
Python 3.6+
For video duration detection: ffprobe (optional)
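
The prerequisites above can be checked up front before any API call. The following is a minimal sketch, assuming a hypothetical helper named load_minimax_config that is not part of minimax_api; only the two environment variable names, the sk- prefix, and the default host come from the list above.

```python
import os

# Default host as stated in the prerequisites.
DEFAULT_HOST = "https://api.minimaxi.com"


def load_minimax_config(env=None):
    """Return (api_key, host) read from the environment.

    Hypothetical helper for illustration; not part of minimax_api.
    """
    env = os.environ if env is None else env
    api_key = env.get("MINIMAX_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("MINIMAX_API_KEY is not set")
    if not api_key.startswith("sk-"):
        raise RuntimeError("MINIMAX_API_KEY should start with 'sk-'")
    # Fall back to the documented default when no host override is given.
    host = env.get("MINIMAX_API_HOST", "").strip() or DEFAULT_HOST
    return api_key, host.rstrip("/")
```

Passing a plain dict instead of os.environ makes the check easy to try without touching real credentials.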

Quick Start

# Load the Python module
import sys; sys.path.insert(0, "{skillDir}/scripts"); import minimax_api

Or use CLI directly:

python "{skillDir}/scripts/minimax_api.py" tts "Hello world" -o minimax-output/hello.mp3
python "{skillDir}/scripts/minimax_api.py" image "A cute cat" -o minimax-output/cat.png

Output Convention

All generated files MUST be saved to minimax-output/ under the agent's working directory.
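
One way to follow this convention is to route every file name through a small helper. This is a sketch only: output_path is a hypothetical name, and the real scripts may resolve paths differently.

```python
import os

OUTPUT_DIR = "minimax-output"  # folder name taken from the convention above


def output_path(filename, base_dir="."):
    """Return a path for `filename` inside minimax-output/, creating the folder.

    Hypothetical helper for illustration; not part of minimax_api.
    """
    folder = os.path.join(base_dir, OUTPUT_DIR)
    os.makedirs(folder, exist_ok=True)
    # Keep only the final path component so the file stays inside the folder.
    return os.path.join(folder, os.path.basename(filename))
```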

TTS (Text-to-Speech)

Endpoint: POST /v1/t2a_v2 — returns hex audio, decoded and saved as file.

Models: speech-2.8-hd (recommended, best quality), speech-2.8-turbo (faster), speech-02-hd, speech-02-turbo

# Basic TTS
minimax_api.generate_tts("Hello world", output="minimax-output/hello.mp3")

# Chinese with specific voice
minimax_api.generate_tts("红叶最多情，一舞寄相思", voice_id="female-shaonv", output="minimax-output/greeting.mp3")

# With emotion
minimax_api.generate_tts("I'm so happy today!", voice_id="male-qn-qingse", emotion="happy", output="minimax-output/happy.mp3")

Common voice IDs: female-shaonv, male-qn-qingse, male-qn-jingying, presenter_male, presenter_female

Emotions: happy, sad, angry, fearful, disgusted, surprised, calm, fluent, whisper (empty = auto)
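
The TTS endpoint is described as returning hex audio that is decoded and saved as a file. The decoding step can be sketched as follows; save_hex_audio is a hypothetical name, and where the hex string sits in the JSON response is not shown in this document, so the sketch takes the string directly.

```python
def save_hex_audio(hex_audio, path):
    """Decode a hex-encoded audio string and write the raw bytes to `path`.

    Hypothetical helper for illustration; not part of minimax_api.
    Returns the number of bytes written.
    """
    audio_bytes = bytes.fromhex(hex_audio)  # raises ValueError on malformed hex
    with open(path, "wb") as fh:
        fh.write(audio_bytes)
    return len(audio_bytes)
```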

Music Generation

Endpoint: POST /v1/music_generation — lyrics required, returns audio URL. Takes 30-300 seconds.

# Instrumental (BGM)
minimax_api.generate_music("soft piano, ambient, peaceful", instrumental=True, output="minimax-output/bgm.mp3")

# Song with lyrics
minimax_api.generate_music(
    "indie folk, melancholic",
    lyrics="[verse]\nWalking alone\n[chorus]\nFeeling free",
    output="minimax-output/song.mp3"
)
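
The lyrics argument uses bracketed section tags such as [verse] and [chorus], with each tag and each lyric line on its own line. A small builder can assemble that string from structured input. This is a sketch under that assumption; build_lyrics is a hypothetical name, and tags other than the two shown above are not confirmed by this document.

```python
def build_lyrics(sections):
    """Join (tag, text) pairs into a newline-separated lyrics string.

    Hypothetical helper for illustration; not part of minimax_api.
    """
    lines = []
    for tag, text in sections:
        lines.append("[{}]".format(tag))
        # Drop blank lines and surrounding whitespace from each lyric line.
        lines.extend(line.strip() for line in text.splitlines() if line.strip())
    return "\n".join(lines)
```

The result can be passed as the lyrics argument of generate_music.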

Image Generation (Text-to-Image)

Endpoint: POST /v1/image_generation — returns image URLs (immediate).

# Basic
minimax_api.generate_image("A cute cat on a windowsill, photorealistic", output="minimax-output/cat.png")

# With aspect ratio
minimax_api.generate_image("Mountain landscape, golden hour", aspect_ratio="16:9", output="minimax-output/landscape.png")

# Multiple images
minimax_api.generate_image("Abstract geometric art, vibrant", count=3, output="minimax-output/art.png")

# With prompt optimizer
minimax_api.generate_image("A man on Venice Beach, 90s documentary", prompt_optimizer=True, output="minimax-output/beach.png")

Aspect ratios: 1:1 (default), 16:9, 4:3, 3:2, 2:3, 3:4, 9:16, 21:9
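
Validating the ratio locally avoids spending a request on a value the API would reject. A minimal sketch, assuming the list above is complete; check_aspect_ratio is a hypothetical name.

```python
# Ratios copied from the list above; 1:1 is the documented default.
ASPECT_RATIOS = ("1:1", "16:9", "4:3", "3:2", "2:3", "3:4", "9:16", "21:9")


def check_aspect_ratio(value="1:1"):
    """Return the ratio with spaces removed, or raise ValueError if unsupported.

    Hypothetical helper for illustration; not part of minimax_api.
    """
    normalized = value.replace(" ", "")
    if normalized not in ASPECT_RATIOS:
        raise ValueError(
            "unsupported aspect ratio {!r}; expected one of {}".format(
                value, ", ".join(ASPECT_RATIOS)
            )
        )
    return normalized
```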

Image-to-Image Generation

Endpoint: POST /v1/image_generation with image_file — generate new images from a reference.

# From local file
minimax_api.image_to_image("A girl in a library", "minimax-output/face.jpg", output="minimax-output/library.png")

# From URL
minimax_api.image_to_image("Oil painting style", "https://example.com/photo.jpg", output="minimax-output/painting.png")
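
The two examples pass either a local path or a URL as the reference image. How image_to_image tells them apart is not shown in this document; one plausible approach, sketched here with a hypothetical function name, is to look at the URL scheme.

```python
from urllib.parse import urlparse


def classify_image_source(source):
    """Return "url" for http(s) references and "file" for everything else.

    Hypothetical helper for illustration; not part of minimax_api.
    """
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        return "url"
    # Relative paths, absolute paths, and Windows drive letters land here.
    return "file"
```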

Video Generation

Endpoint: POST /v1/video_generation (async) + GET /v1/query/video_generation — polling required.

# Text-to-video
minimax_api.generate_video(
    "A golden retriever puppy runs toward camera, tracking shot, golden hour",
    output="minimax-output/puppy.mp4"
)

# Image-to-video (prompt focuses on MOTION only)
minimax_api.generate_video(
    "Petals sway in breeze, soft light shifts",
    mode="i2v", first_frame="minimax-output/flower.png",
    output="minimax-output/flower_video.mp4"
)

# Subject reference (face consistency)
minimax_api.generate_video(
    "A woman walks through a garden, tracking shot",
    mode="ref", subject_image="minimax-output/face.jpg",
    output="minimax-output/garden.mp4"
)

Models: MiniMax-Hailuo-2.3 (default), MiniMax-Hailuo-2.3-Fast (i2v), MiniMax-Hailuo-02 (1080P, 10s)

Modes: t2v, i2v, sef (start-end frame), ref (subject reference)
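
Because video generation is asynchronous, the caller submits a task and then polls the query endpoint until the task finishes. The loop below is a generic sketch of that pattern, not the code in minimax_api.py. The fetch_status callable is hypothetical, and the status strings "Success" and "Fail" are assumptions that should be checked against the API docs in references/.

```python
import time


def poll_until_done(fetch_status, interval=10.0, timeout=600.0,
                    done="Success", failed="Fail",
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports completion, failure, or timeout.

    fetch_status is a hypothetical callable returning a dict with a
    "status" key; the status strings are assumptions, not confirmed here.
    """
    deadline = clock() + timeout
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == done:
            return result
        if status == failed:
            raise RuntimeError("video generation failed: {!r}".format(result))
        if clock() >= deadline:
            raise TimeoutError("video generation did not finish in time")
        # Wait before asking again to avoid hammering the query endpoint.
        sleep(interval)
```

Injecting sleep and clock keeps the loop testable without real waiting.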

Video Prompt Tips

Main subject + Scene + Movement + Camera motion + Aesthetic. For i2v: describe motion only, don't repeat what's in the image.
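
The formula above can be applied mechanically. The sketch below uses a hypothetical build_video_prompt helper; treating camera motion as part of "motion" for i2v is an interpretation of the tip, not something the document states.

```python
def build_video_prompt(subject="", scene="", movement="", camera="",
                       aesthetic="", mode="t2v"):
    """Compose a prompt as subject + scene + movement + camera + aesthetic.

    Hypothetical helper for illustration; not part of minimax_api.
    For mode "i2v" only the motion-related parts are kept, since the
    first frame already shows the subject and scene.
    """
    if mode == "i2v":
        parts = [movement, camera]
    else:
        parts = [subject, scene, movement, camera, aesthetic]
    # Skip empty parts so the prompt has no stray separators.
    return ", ".join(p.strip() for p in parts if p and p.strip())
```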

Generate & Send to Feishu

Use generate_and_send.py to generate content and prepare for Feishu delivery via the feishu-media skill:

# Generate TTS and send
python "{skillDir}/scripts/generate_and_send.py" tts "Hello" --voice female-shaonv --feishu-chat <chat_id>

# Generate image and send
python "{skillDir}/scripts/generate_and_send.py" image "A sunset" --ratio 16:9 --feishu-chat <chat_id>

# Set FEISHU_CHAT_ID env var to avoid passing --feishu-chat every time
export FEISHU_CHAT_ID=oc_xxxxx

After generation, the script outputs file paths and feishu-media send instructions. Use the feishu-media skill to actually deliver the content.

Legacy PowerShell Script

The original scripts/minimax-api.ps1 is preserved for backward compatibility but is deprecated. Use the Python scripts instead.

Error Handling

Error Code	Meaning	Solution
2061	Plan doesn't support model	Try `speech-02-turbo` for TTS
1008	Insufficient balance	Top up MiniMax account
2013	Invalid params	Check required fields
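
The table above can be turned into a lookup so that a failed call prints an actionable hint. This is a sketch with a hypothetical describe_error helper; only the three codes listed here are covered, and how the scripts actually surface error codes is not shown in this document.

```python
# Meanings and solutions copied from the error table above.
ERROR_HINTS = {
    2061: ("Plan doesn't support model", "Try speech-02-turbo for TTS"),
    1008: ("Insufficient balance", "Top up MiniMax account"),
    2013: ("Invalid params", "Check required fields"),
}


def describe_error(code):
    """Return a one-line description for a MiniMax error code.

    Hypothetical helper for illustration; not part of minimax_api.
    """
    meaning, solution = ERROR_HINTS.get(
        int(code), ("Unknown error", "See the references/ folder")
    )
    return "Error {}: {}. {}.".format(code, meaning, solution)
```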

References

See references/ folder for detailed API docs, voice catalogs, and prompt guides.

Usage Guidance

What to check before installing:

- Do not provide your MINIMAX_API_KEY until you verify the publisher and origin. The code requires MINIMAX_API_KEY (it will fail without it), but the registry metadata omits this; that mismatch is suspicious.
- Confirm the skill's owner/provenance: _meta.json in the bundle shows a different ownerId/slug than the registry listing. Ask the publisher for a canonical source (GitHub repo or homepage) and for corrected metadata.
- Note missing files referenced in docs: SKILL.md/refs mention requirements.txt and a PowerShell script that are not present. Ask the author to include or remove these references.
- The scripts call only api.minimaxi.com (no obfuscated endpoints) and use only the Python stdlib. Still, treat the API key as a sensitive secret: if you must test, use a scoped/test key, run in an isolated environment, and rotate the key afterwards.
- If you plan to use Feishu delivery, verify the separate feishu-media skill and avoid exposing other credentials.
- If uncertain, ask the publisher to fix the metadata (declare MINIMAX_API_KEY as required), provide a trusted homepage/repo, and explain the missing files before trusting the skill with live credentials.

Capability Analysis

Type: OpenClaw Skill
Name: ali-minimax-toolkit
Version: 1.0.0

The ali-minimax-toolkit is a well-structured set of Python scripts designed to interface with MiniMax multimodal APIs for generating TTS, music, images, and video. The core logic in `minimax_api.py` uses the Python standard library for HTTP requests and follows safe practices, such as using list-based arguments for `subprocess` calls to `ffprobe` to prevent shell injection. The `generate_and_send.py` script facilitates a legitimate workflow for preparing generated content for delivery via Feishu. The documentation in `SKILL.md` and the `references/` directory is comprehensive and lacks any signs of prompt injection or malicious instructions. No evidence of data exfiltration, obfuscation, or unauthorized access was found.

Capability Assessment

ℹ Purpose & Capability

The skill claims to provide MiniMax multimodal generation (TTS, music, image, video), and the code does implement API calls to api.minimaxi.com. Requiring a MINIMAX_API_KEY is consistent with the stated purpose. However, the registry metadata declares no required environment variables or primary credential, while SKILL.md and the Python code require MINIMAX_API_KEY (and optionally MINIMAX_API_HOST and MINIMAX_OUTPUT_DIR). This metadata omission is an inconsistency.

⚠ Instruction Scope

SKILL.md and scripts instruct the agent to read MINIMAX_API_KEY, optionally MINIMAX_API_HOST, and FEISHU_CHAT_ID and to write generated files to minimax-output/. The instructions also reference a PowerShell script and a requirements.txt/ffmpeg installation for some workflows, but those files are not present in the manifest (scripts/minimax-api.ps1 is referenced as 'preserved' but not included; no requirements.txt present). The generate_and_send script prints Feishu send instructions but does not itself perform Feishu network calls (it expects a separate feishu-media skill). The agent will therefore access environment variables and network endpoints not declared in the registry metadata — this is scope creep and a transparency issue.

ℹ Install Mechanism

There is no install spec (instruction-only install), which is lower risk from an installer perspective. The skill includes Python scripts that run with only the standard library; network calls are made via urllib. However SKILL.md and references mention pip install -r requirements.txt and ffmpeg, despite no requirements.txt in the bundle — this inconsistency could confuse users and lead to accidental installs from other sources.

⚠ Credentials

The code requires MINIMAX_API_KEY (expected for calling the MiniMax API). But the registry declared no required env vars/credentials; SKILL.md explicitly expects MINIMAX_API_KEY (format 'sk-...') and optionally FEISHU_CHAT_ID and MINIMAX_API_HOST. The undeclared required secret (API key) is a material omission. FEISHU_CHAT_ID is not a secret but the skill's documentation ties into a separate feishu-media skill — supplying a chat ID may cause the agent to prepare data for external delivery. Also _meta.json contains a different ownerId/slug than the registry metadata, which raises provenance concerns.

✓ Persistence & Privilege

The skill does not request 'always: true' and does not modify other skills or system-wide configuration. It writes output files into a local minimax-output/ directory (normal for generated media). The skill can be invoked autonomously (disable-model-invocation is false), which is the platform default. That is not flagged on its own, but combined with the undeclared API key requirement it increases risk.

Version History

v1.0.0

- Initial release of ali-minimax-toolkit for MiniMax multimodal generation.
- Supports TTS, music, image (text-to-image & image-to-image), and video (text/image-to-video, subject/sequence reference) generation via MiniMax APIs.
- Pure Python implementation: cross-platform, no third-party dependencies required.
- Provides both Python module and CLI usage; all outputs saved to minimax-output/ directory.
- Includes quick-start guides, error handling info, API references, and Feishu integration instructions.

Metadata

Slug ali-minimax-toolkit

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Ali Minimax Toolkit?

MiniMax multimodal generation via API. Use when user wants voice, music, image, image-to-image, or video generation with MiniMax. Supports TTS, music, image... It is an AI Agent Skill for Claude Code / OpenClaw, with 103 downloads so far.

How do I install Ali Minimax Toolkit?

Run "/install ali-minimax-toolkit" in the OpenClaw or Claude Code chat to install it in one step. Before first use, set the MINIMAX_API_KEY environment variable listed under Prerequisites.

Is Ali Minimax Toolkit free?

Yes, Ali Minimax Toolkit is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Ali Minimax Toolkit support?

Ali Minimax Toolkit is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Ali Minimax Toolkit?

It is built and maintained by yuan-huicheng (@yuan-huicheng); the current version is v1.0.0.
