Ali Minimax Toolkit

by yuan-huicheng · GitHub ↗ · v1.0.0 · MIT-0
cross-platform · ⚠ suspicious
Downloads: 103 · Stars: 0 · Active Installs: 0 · Versions: 1
Install in OpenClaw
/install ali-minimax-toolkit
Description
MiniMax multimodal generation via API. Use when user wants voice, music, image, image-to-image, or video generation with MiniMax. Supports TTS, music, image...
README (SKILL.md)

MiniMax Multimodal Toolkit

Generate voice, music, image, and video content via MiniMax APIs. Pure Python — works on Windows, Mac, and Linux without any third-party dependencies.

Prerequisites

  • MINIMAX_API_KEY environment variable (starts with sk-)
  • MINIMAX_API_HOST environment variable (optional, default: https://api.minimaxi.com)
  • Python 3.6+
  • For video duration detection: ffprobe (optional)
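
The prerequisites above can be checked up front before any request is made. The following is a minimal sketch, not the skill's own code: `load_minimax_config` and `DEFAULT_HOST` are hypothetical names, and the `sk-` prefix check simply mirrors the format stated in the list.

```python
import os

DEFAULT_HOST = "https://api.minimaxi.com"  # default stated in the prerequisites


def load_minimax_config(env=None):
    """Return (api_key, api_host) from the environment, or raise ValueError."""
    env = os.environ if env is None else env
    api_key = env.get("MINIMAX_API_KEY", "").strip()
    if not api_key:
        raise ValueError("MINIMAX_API_KEY is not set")
    if not api_key.startswith("sk-"):
        raise ValueError("MINIMAX_API_KEY should start with 'sk-'")
    # Fall back to the documented default when the optional host is unset or empty.
    api_host = env.get("MINIMAX_API_HOST", "").strip() or DEFAULT_HOST
    return api_key, api_host.rstrip("/")
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.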

Quick Start

# Load the Python module
import sys
sys.path.insert(0, "{skillDir}/scripts")
import minimax_api

Or use CLI directly:

python "{skillDir}/scripts/minimax_api.py" tts "Hello world" -o minimax-output/hello.mp3
python "{skillDir}/scripts/minimax_api.py" image "A cute cat" -o minimax-output/cat.png

Output Convention

All generated files MUST be saved to minimax-output/ under the agent's working directory.
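
One way to follow this convention is to route every filename through a small helper. This is a sketch under assumptions: `output_path` is a hypothetical helper, not part of `minimax_api`, and it treats `workdir` as the agent's working directory.

```python
from pathlib import Path

OUTPUT_DIR_NAME = "minimax-output"


def output_path(filename, workdir="."):
    """Return a path for `filename` inside <workdir>/minimax-output/, creating the directory."""
    out_dir = Path(workdir) / OUTPUT_DIR_NAME
    out_dir.mkdir(parents=True, exist_ok=True)
    # Keep only the final path component so "../x.mp3" cannot escape the directory.
    return out_dir / Path(filename).name
```

The result can be converted with `str(...)` and passed wherever an `output=` argument is expected.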

TTS (Text-to-Speech)

Endpoint: POST /v1/t2a_v2 — returns hex audio, decoded and saved as file.

Models: speech-2.8-hd (recommended, best quality), speech-2.8-turbo (faster), speech-02-hd, speech-02-turbo

# Basic TTS
minimax_api.generate_tts("Hello world", output="minimax-output/hello.mp3")

# Chinese with specific voice
minimax_api.generate_tts("红叶最多情,一舞寄相思", voice_id="female-shaonv", output="minimax-output/greeting.mp3")

# With emotion
minimax_api.generate_tts("I'm so happy today!", voice_id="male-qn-qingse", emotion="happy", output="minimax-output/happy.mp3")

Common voice IDs: female-shaonv, male-qn-qingse, male-qn-jingying, presenter_male, presenter_female

Emotions: happy, sad, angry, fearful, disgusted, surprised, calm, fluent, whisper (empty = auto)
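
The TTS endpoint is described as returning hex audio that is decoded and saved as a file. A minimal sketch of that decoding step follows, assuming the hex string has already been pulled out of the response; `save_hex_audio` is a hypothetical helper, and the response field that carries the hex string is not specified here.

```python
from pathlib import Path


def save_hex_audio(hex_audio, path):
    """Decode a hex-encoded audio string and write the raw bytes to `path`."""
    audio_bytes = bytes.fromhex(hex_audio)  # raises ValueError on malformed hex
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(audio_bytes)
    return len(audio_bytes)  # number of bytes written
```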

Music Generation

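Endpoint: POST /v1/music_generation. Lyrics are required for songs with vocals; the instrumental example below passes instrumental=True and no lyrics. Returns an audio URL. Generation takes 30-300 seconds.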

# Instrumental (BGM)
minimax_api.generate_music("soft piano, ambient, peaceful", instrumental=True, output="minimax-output/bgm.mp3")

# Song with lyrics
minimax_api.generate_music(
    "indie folk, melancholic",
    lyrics="[verse]\nWalking alone\n[chorus]\nFeeling free",
    output="minimax-output/song.mp3"
)
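
The song example above separates bracketed section tags such as [verse] and [chorus] from the lyric lines. A small sketch for assembling such a string with real newline characters is shown here; `format_lyrics` is a hypothetical helper, and the set of tags the API accepts is not covered by this document.

```python
def format_lyrics(sections):
    """Join (tag, lines) pairs into one string, one tag or lyric line per row."""
    rows = []
    for tag, lines in sections:
        rows.append("[{}]".format(tag))
        rows.extend(lines)
    return "\n".join(rows)


lyrics = format_lyrics([
    ("verse", ["Walking alone"]),
    ("chorus", ["Feeling free"]),
])
```

The resulting `lyrics` value can then be passed as the `lyrics=` argument.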

Image Generation (Text-to-Image)

Endpoint: POST /v1/image_generation — returns image URLs (immediate).

# Basic
minimax_api.generate_image("A cute cat on a windowsill, photorealistic", output="minimax-output/cat.png")

# With aspect ratio
minimax_api.generate_image("Mountain landscape, golden hour", aspect_ratio="16:9", output="minimax-output/landscape.png")

# Multiple images
minimax_api.generate_image("Abstract geometric art, vibrant", count=3, output="minimax-output/art.png")

# With prompt optimizer
minimax_api.generate_image("A man on Venice Beach, 90s documentary", prompt_optimizer=True, output="minimax-output/beach.png")

Aspect ratios: 1:1 (default), 16:9, 4:3, 3:2, 2:3, 3:4, 9:16, 21:9
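
Checking the ratio locally avoids a round trip for a value the list above does not include. This is a sketch only: `check_aspect_ratio` is a hypothetical helper, and whether `minimax_api` performs a similar check itself is not stated.

```python
ASPECT_RATIOS = ("1:1", "16:9", "4:3", "3:2", "2:3", "3:4", "9:16", "21:9")


def check_aspect_ratio(value=None):
    """Return a listed aspect ratio, defaulting to 1:1; raise ValueError otherwise."""
    if value is None:
        return "1:1"
    ratio = value.replace(" ", "")
    if ratio not in ASPECT_RATIOS:
        raise ValueError(
            "unsupported aspect ratio {!r}; expected one of {}".format(
                value, ", ".join(ASPECT_RATIOS)
            )
        )
    return ratio
```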

Image-to-Image Generation

Endpoint: POST /v1/image_generation with image_file — generate new images from a reference.

# From local file
minimax_api.image_to_image("A girl in a library", "minimax-output/face.jpg", output="minimax-output/library.png")

# From URL
minimax_api.image_to_image("Oil painting style", "https://example.com/photo.jpg", output="minimax-output/painting.png")

Video Generation

Endpoint: POST /v1/video_generation (async) + GET /v1/query/video_generation — polling required.

# Text-to-video
minimax_api.generate_video(
    "A golden retriever puppy runs toward camera, tracking shot, golden hour",
    output="minimax-output/puppy.mp4"
)

# Image-to-video (prompt focuses on MOTION only)
minimax_api.generate_video(
    "Petals sway in breeze, soft light shifts",
    mode="i2v", first_frame="minimax-output/flower.png",
    output="minimax-output/flower_video.mp4"
)

# Subject reference (face consistency)
minimax_api.generate_video(
    "A woman walks through a garden, tracking shot",
    mode="ref", subject_image="minimax-output/face.jpg",
    output="minimax-output/garden.mp4"
)

Models: MiniMax-Hailuo-2.3 (default), MiniMax-Hailuo-2.3-Fast (i2v), MiniMax-Hailuo-02 (1080P, 10s)

Modes: t2v, i2v, sef (start-end frame), ref (subject reference)
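
Because video generation is asynchronous, the caller submits a task and then polls the query endpoint until it finishes. The loop below is a generic sketch of that pattern, not the code in `minimax_api.py`: `poll_until_done` is hypothetical, `fetch_status` stands in for whatever function performs the GET request, and the status strings "Success" and "Fail" are assumptions that should be checked against the API reference.

```python
import time


def poll_until_done(fetch_status, interval=10.0, timeout=600.0, sleep=time.sleep):
    """Call fetch_status() until it reports a terminal state or the timeout elapses.

    fetch_status is a caller-supplied function returning a dict with a "status" key.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "Success":  # assumed terminal success value
            return result
        if status == "Fail":  # assumed terminal failure value
            raise RuntimeError("video task failed: {}".format(result))
        if time.monotonic() >= deadline:
            raise TimeoutError("video task still pending after {} s".format(timeout))
        sleep(interval)
```

Injecting `sleep` keeps the loop testable without real waiting.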

Video Prompt Tips

Main subject + Scene + Movement + Camera motion + Aesthetic. For i2v: describe motion only, don't repeat what's in the image.
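
The prompt formula above can be expressed as a small builder. This is an illustrative sketch: `build_video_prompt` is a hypothetical helper, and treating camera motion as part of "motion" for i2v is an interpretation of the tip rather than something the source states.

```python
def build_video_prompt(subject="", scene="", movement="", camera="", aesthetic="", mode="t2v"):
    """Join the prompt parts in the suggested order, skipping empty ones."""
    if mode == "i2v":
        # The image already shows subject and scene, so keep motion only.
        parts = [movement, camera]
    else:
        parts = [subject, scene, movement, camera, aesthetic]
    return ", ".join(p.strip() for p in parts if p and p.strip())
```

For example, `build_video_prompt(subject="A flower", movement="Petals sway in breeze", mode="i2v")` drops the subject and keeps only the movement.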

Generate & Send to Feishu

Use generate_and_send.py to generate content and prepare for Feishu delivery via the feishu-media skill:

# Generate TTS and send
python "{skillDir}/scripts/generate_and_send.py" tts "Hello" --voice female-shaonv --feishu-chat <chat_id>

# Generate image and send
python "{skillDir}/scripts/generate_and_send.py" image "A sunset" --ratio 16:9 --feishu-chat <chat_id>

# Set FEISHU_CHAT_ID env var to avoid passing --feishu-chat every time
export FEISHU_CHAT_ID=oc_xxxxx

After generation, the script outputs file paths and feishu-media send instructions. Use the feishu-media skill to actually deliver the content.
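
The fallback from the --feishu-chat flag to the FEISHU_CHAT_ID environment variable can be pictured as follows. This is a sketch of the general pattern; how `generate_and_send.py` actually parses its arguments is not shown in this document, and `resolve_chat_id` is a hypothetical helper.

```python
import argparse
import os


def resolve_chat_id(argv, env=None):
    """Prefer --feishu-chat from argv, then FEISHU_CHAT_ID, else None."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--feishu-chat", default=None)
    # parse_known_args ignores the other positional arguments and flags.
    args, _unknown = parser.parse_known_args(argv)
    return args.feishu_chat or env.get("FEISHU_CHAT_ID") or None
```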

Legacy PowerShell Script

The original scripts/minimax-api.ps1 is preserved for backward compatibility but is deprecated. Use the Python scripts instead.

Error Handling

Error Code | Meaning                    | Solution
2061       | Plan doesn't support model | Try speech-02-turbo for TTS
1008       | Insufficient balance       | Top up MiniMax account
2013       | Invalid params             | Check required fields
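
The same error codes can be kept in a lookup so that a failing call reports the suggested fix. This is a minimal sketch: `ERROR_HINTS` and `explain_error` are hypothetical names, and how the API response exposes the code is not covered here.

```python
# Codes, meanings, and solutions copied from the error list above.
ERROR_HINTS = {
    2061: ("Plan doesn't support model", "Try speech-02-turbo for TTS"),
    1008: ("Insufficient balance", "Top up MiniMax account"),
    2013: ("Invalid params", "Check required fields"),
}


def explain_error(code):
    """Return a one-line explanation for a numeric error code."""
    meaning, solution = ERROR_HINTS.get(
        int(code), ("Unknown error", "See the references/ folder")
    )
    return "{} ({}): {}".format(meaning, code, solution)
```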

References

See references/ folder for detailed API docs, voice catalogs, and prompt guides.

Usage Guidance
What to check before installing:

  • Do not provide your MINIMAX_API_KEY until you verify the publisher and origin. The code requires MINIMAX_API_KEY (it will fail without it), but the registry metadata omits this, and that mismatch is suspicious.
  • Confirm the skill's owner/provenance: _meta.json in the bundle shows a different ownerId/slug than the registry listing. Ask the publisher for a canonical source (GitHub repo or homepage) and for corrected metadata.
  • Note missing files referenced in docs: SKILL.md/refs mention requirements.txt and a PowerShell script that are not present. Ask the author to include or remove these references.
  • The scripts call only api.minimaxi.com (no obfuscated endpoints) and use only the Python stdlib. Still, treat the API key as a sensitive secret: if you must test, use a scoped/test key, run in an isolated environment, and rotate the key afterwards.
  • If you plan to use Feishu delivery, verify the separate feishu-media skill and avoid exposing other credentials.
  • If uncertain, ask the publisher to fix the metadata (declare MINIMAX_API_KEY as required), provide a trusted homepage/repo, and explain the missing files before trusting the skill with live credentials.
Capability Analysis
Type: OpenClaw Skill
Name: ali-minimax-toolkit
Version: 1.0.0

The ali-minimax-toolkit is a well-structured set of Python scripts designed to interface with MiniMax multimodal APIs for generating TTS, music, images, and video. The core logic in `minimax_api.py` uses the Python standard library for HTTP requests and follows safe practices, such as using list-based arguments for `subprocess` calls to `ffprobe` to prevent shell injection. The `generate_and_send.py` script facilitates a legitimate workflow for preparing generated content for delivery via Feishu. The documentation in `SKILL.md` and the `references/` directory is comprehensive and lacks any signs of prompt injection or malicious instructions. No evidence of data exfiltration, obfuscation, or unauthorized access was found.
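
The list-based `subprocess` pattern mentioned in this analysis looks roughly like the sketch below. It is an illustration rather than the code in `minimax_api.py`: `ffprobe_command` and `video_duration` are hypothetical names, and the ffprobe options shown are one common way to print a file's duration.

```python
import shutil
import subprocess


def ffprobe_command(path):
    """Build the ffprobe argument list; the path stays a single argv entry."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def video_duration(path):
    """Return the duration in seconds, or None if ffprobe is missing or fails."""
    if shutil.which("ffprobe") is None:
        return None  # ffprobe is optional
    try:
        done = subprocess.run(
            ffprobe_command(path),
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            check=True,
            timeout=30,
        )
        return float(done.stdout.decode().strip())
    except (subprocess.SubprocessError, ValueError, OSError):
        return None
```

Because no shell is involved, characters such as `;` or spaces in a filename are passed to ffprobe literally instead of being interpreted.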
Capability Assessment
Purpose & Capability
Skill claims to provide MiniMax multimodal generation (TTS, music, image, video) and the code indeed implements API calls to api.minimaxi.com. Requiring a MINIMAX_API_KEY is coherent with the stated purpose. However the registry metadata declared no required environment variables or primary credential while SKILL.md and the Python code require MINIMAX_API_KEY (and optionally MINIMAX_API_HOST and MINIMAX_OUTPUT_DIR). This metadata omission is an incoherence.
Instruction Scope
SKILL.md and scripts instruct the agent to read MINIMAX_API_KEY, optionally MINIMAX_API_HOST, and FEISHU_CHAT_ID and to write generated files to minimax-output/. The instructions also reference a PowerShell script and a requirements.txt/ffmpeg installation for some workflows, but those files are not present in the manifest (scripts/minimax-api.ps1 is referenced as 'preserved' but not included; no requirements.txt present). The generate_and_send script prints Feishu send instructions but does not itself perform Feishu network calls (it expects a separate feishu-media skill). The agent will therefore access environment variables and network endpoints not declared in the registry metadata — this is scope creep and a transparency issue.
Install Mechanism
There is no install spec (instruction-only install), which is lower risk from an installer perspective. The skill includes Python scripts that run with only the standard library; network calls are made via urllib. However SKILL.md and references mention pip install -r requirements.txt and ffmpeg, despite no requirements.txt in the bundle — this inconsistency could confuse users and lead to accidental installs from other sources.
Credentials
The code requires MINIMAX_API_KEY (expected for calling the MiniMax API). But the registry declared no required env vars/credentials; SKILL.md explicitly expects MINIMAX_API_KEY (format 'sk-...') and optionally FEISHU_CHAT_ID and MINIMAX_API_HOST. The undeclared required secret (API key) is a material omission. FEISHU_CHAT_ID is not a secret but the skill's documentation ties into a separate feishu-media skill — supplying a chat ID may cause the agent to prepare data for external delivery. Also _meta.json contains a different ownerId/slug than the registry metadata, which raises provenance concerns.
Persistence & Privilege
The skill does not request 'always: true' and does not modify other skills or system-wide configuration. It writes output files into a local minimax-output/ directory (normal for generated media). The skill can be invoked autonomously (disable-model-invocation is false), which is the platform default; that is not itself flagged, but combined with the undeclared API key requirement increases risk.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install ali-minimax-toolkit
  3. After installation, invoke the skill by name or use /ali-minimax-toolkit
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
  • Initial release of ali-minimax-toolkit for MiniMax multimodal generation.
  • Supports TTS, music, image (text-to-image & image-to-image), and video (text/image-to-video, subject/sequence reference) generation via MiniMax APIs.
  • Pure Python implementation: cross-platform, no third-party dependencies required.
  • Provides both Python module and CLI usage; all outputs saved to minimax-output/ directory.
  • Includes quick-start guides, error handling info, API references, and Feishu integration instructions.
Metadata
Slug ali-minimax-toolkit
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Ali Minimax Toolkit?

MiniMax multimodal generation via API. Use when user wants voice, music, image, image-to-image, or video generation with MiniMax. Supports TTS, music, image... It is an AI Agent Skill for Claude Code / OpenClaw, with 103 downloads so far.

How do I install Ali Minimax Toolkit?

Run "/install ali-minimax-toolkit" in the OpenClaw or Claude Code chat to install it in one step. Before first use, set the MINIMAX_API_KEY environment variable listed under Prerequisites.

Is Ali Minimax Toolkit free?

Yes, Ali Minimax Toolkit is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Ali Minimax Toolkit support?

Ali Minimax Toolkit is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Ali Minimax Toolkit?

It is built and maintained by yuan-huicheng (@yuan-huicheng); the current version is v1.0.0.
