
MiniMax Multimodal (Speech + Image)

Name: MiniMax Multimodal (Speech + Image)
Author: percivalee

by percivalee · v1.0.1 · MIT-0

cross-platform ⚠ suspicious

Downloads 116

Install in OpenClaw

/install minimax-speech-image

Description

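MiniMax multimodal skill: connects to the MiniMax Token Plan API for speech synthesis (TTS / voice cloning / voice design) and image generation (text-to-image / image-to-image). Uses the speech-2.8-hd (speech) and image-01 (image) models and consumes Token Plan quota. When the user mentions speech synthesis, voice cloning, image generation, text-to-image, image-to-...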

Usage Guidance

Before installing, consider the following:
- Verify that the provider domains (https://api.minimaxi.com and https://api.minimax.io) are the legitimate MiniMax endpoints you expect. If unsure, contact the provider or check an authoritative homepage; the skill lists no homepage.
- Treat MINIMAX_API_KEY as sensitive: the client sends it with every request, and the service charges Token Plan credits for usage (TTS, cloning, image generation). Make sure you understand billing and rate limits.
- Voice cloning uploads local audio to the remote /files endpoint. Do not upload private or sensitive audio without explicit consent; this transmits user data to the provider.
- The skill bundles Python scripts that import the 'requests' library but does not declare dependencies or provide an install step. Install 'requests' in your environment, or run the skill in an isolated/sandboxed environment first.
- The registry metadata omits the environment variables the code reads (MINIMAX_API_KEY, and optionally MINIMAX_REGION). This is a packaging inconsistency; ask the publisher to update the metadata so automated policy/permission checks can surface the required credentials before installation.
- The image client downloads URLs returned by the API. While expected, this means the skill may fetch remote content; consider network restrictions if you run in a sensitive environment.
- If you plan to use this in production or with sensitive data, request additional provenance (publisher identity, homepage, or source repo) and run the code in a controlled test environment first.

If you cannot verify the provider or correct the metadata/dependency omissions, treat the skill as higher risk and avoid providing production credentials or sensitive data.
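As a minimal sketch of the domain check, assuming you inspect or override the base URL the bundled scripts use, the snippet below accepts only https URLs whose host exactly matches one of the two listed domains. The function name is_expected_base_url is hypothetical and is not part of the skill.

```python
from urllib.parse import urlparse

# The two provider hosts named in the Usage Guidance; anything else is rejected.
EXPECTED_HOSTS = frozenset({"api.minimaxi.com", "api.minimax.io"})


def is_expected_base_url(url: str) -> bool:
    """Hypothetical pre-flight check: True only for https URLs on a listed MiniMax host."""
    parsed = urlparse(url)
    # Exact host comparison; .hostname is lowercased and has any port stripped.
    return parsed.scheme == "https" and parsed.hostname in EXPECTED_HOSTS


if __name__ == "__main__":
    for candidate in (
        "https://api.minimaxi.com",
        "https://api.minimax.io/",
        "http://api.minimax.io",                # plain HTTP would expose the API key
        "https://api.minimax.io.evil.example",  # look-alike host
    ):
        print(f"{candidate}: {is_expected_base_url(candidate)}")
```

The exact host match is deliberate: a suffix or substring test would also accept look-alike hosts such as the last candidate above.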

Capability Assessment

ℹ Purpose & Capability

The scripts implement text-to-speech, voice cloning, voice design, image generation, and image editing, matching the SKILL.md description, and network calls go only to the stated MiniMax API base URLs. However, the registry metadata lists no required environment variables, while both SKILL.md and the code require MINIMAX_API_KEY (and optionally MINIMAX_REGION). This metadata omission is inconsistent and should be corrected.

ℹ Instruction Scope

Runtime instructions and code stay within the stated purpose (calling remote APIs and saving returned media). Two operational behaviors matter: voice cloning uploads local audio files to the remote /files endpoint, and the image client may download files from URLs returned by the API. The upload transmits user data to the provider, and both features consume Token Plan credits. The instructions do not ask the agent to read unrelated system files or other credentials.
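One possible network restriction for the download step, sketched under the assumption that you wrap the image client's download yourself: refuse non-https media URLs and stop reading once a size cap is exceeded. The helper names check_media_url and read_capped and the 20 MiB cap are hypothetical choices, not values taken from the skill.

```python
import io
from urllib.parse import urlparse

# Hypothetical cap; choose a limit that suits your environment.
MAX_MEDIA_BYTES = 20 * 1024 * 1024


def check_media_url(url: str) -> None:
    """Refuse to fetch anything that is not an https URL."""
    if urlparse(url).scheme != "https":
        raise ValueError(f"refusing non-https media URL: {url!r}")


def read_capped(stream, max_bytes: int = MAX_MEDIA_BYTES, chunk_size: int = 64 * 1024) -> bytes:
    """Read a binary stream in chunks, aborting as soon as more than max_bytes arrive."""
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # end of stream
            break
        buf.extend(chunk)
        if len(buf) > max_bytes:
            raise ValueError(f"media response exceeds {max_bytes} bytes")
    return bytes(buf)


if __name__ == "__main__":
    # Offline demo: an in-memory stream stands in for an HTTP response body.
    check_media_url("https://example.com/image.png")
    print(len(read_capped(io.BytesIO(b"\x89PNG" + b"\x00" * 100))))
```

In a real client, the stream argument could be the response object returned by urllib.request.urlopen(url, timeout=30), whose read(n) method returns at most n bytes per call.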

⚠ Install Mechanism

This is an instruction-only skill with bundled Python scripts and no install spec. The scripts import the 'requests' library, but the skill neither declares that dependency nor provides an install step; that mismatch may lead to runtime failures or undocumented extra setup. Because there is no installer, nothing is downloaded from external URLs at install time, which is lower risk, but the missing dependency declaration remains an omission.
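Until the publisher declares the dependency, a pre-flight check like the following can confirm the environment before the scripts run. This is a sketch: missing_modules is a hypothetical helper, and 'requests' is the only third-party import the assessment mentions.

```python
import importlib.util
import sys

# Third-party modules the bundled scripts import; the assessment names only "requests".
REQUIRED_MODULES = ("requests",)


def missing_modules(names: tuple[str, ...] = REQUIRED_MODULES) -> list[str]:
    """Return the top-level module names that cannot be located on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
        print(f"Install with: {sys.executable} -m pip install " + " ".join(missing))
        sys.exit(1)
    print("All required modules are importable.")
```

For top-level names like these, importlib.util.find_spec locates the module without executing it, so the check has no side effects beyond the lookup.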

⚠ Credentials

The code legitimately requires MINIMAX_API_KEY (and optionally MINIMAX_REGION), and SKILL.md documents both. However, the skill registry metadata lists no required environment variables or primary credential. The requested environment access (an API key that can consume billing credits) is proportionate to the feature set, but the metadata mismatch is a packaging and visibility problem: the need for a billable credential is not surfaced before installation, which could lead to accidental exposure or misuse of credentials.
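A minimal sketch of credential handling consistent with this assessment, assuming you write your own wrapper around the scripts: read MINIMAX_API_KEY (required) and MINIMAX_REGION (optional) from the environment, fail fast when the key is missing, and mask it before it reaches any log. The function names are hypothetical, and no particular MINIMAX_REGION values are assumed.

```python
import os
from typing import Mapping, Optional, Tuple


def load_minimax_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, Optional[str]]:
    """Return (api_key, region) from the environment; region may be None."""
    key = env.get("MINIMAX_API_KEY", "").strip()
    if not key:
        # Fail fast instead of sending unauthenticated requests.
        raise RuntimeError("MINIMAX_API_KEY is not set")
    region = env.get("MINIMAX_REGION") or None  # an empty string counts as unset
    return key, region


def mask_secret(secret: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with '*' for safe logging."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


if __name__ == "__main__":
    try:
        api_key, region = load_minimax_credentials()
    except RuntimeError as err:
        print(f"Configuration error: {err}")
    else:
        print(f"Using key {mask_secret(api_key)}, region {region or '(not set)'}")
```

Passing the environment as a mapping keeps the function testable without touching the real os.environ.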

✓ Persistence & Privilege

The skill does not request 'always: true' and does not modify other skills or system settings. It does not persist credentials itself; it reads MINIMAX_API_KEY from environment as expected. Autonomous invocation is allowed (platform default) but not combined with other high-risk indicators here.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install minimax-speech-image
After installation, invoke the skill by name or use /minimax-speech-image
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.1

Added MINIMAX_API_KEY environment variable documentation. Fixed voice cloning API flow. Improved image response parsing.

v1.0.0

- Initial release of the MiniMax multimodal skill using Token Plan.
- Supports speech synthesis (TTS), voice cloning, and voice design via the speech-2.8-hd model.
- Enables image generation (text-to-image and image-to-image) using the image-01 model.
- Includes command-line and Python API usage for all features.
- Requires MiniMax API key and region settings.
- Comprehensive documentation with voice and image module instructions.

Metadata

Slug minimax-speech-image

Version 1.0.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is MiniMax Multimodal (Speech + Image)?

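It is a MiniMax multimodal skill that connects to the MiniMax Token Plan API for speech synthesis (TTS, voice cloning, voice design) and image generation (text-to-image, image-to-image), using the speech-2.8-hd (speech) and image-01 (image) models and consuming Token Plan quota. It is an AI Agent Skill for Claude Code / OpenClaw, with 116 downloads so far.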

How do I install MiniMax Multimodal (Speech + Image)?

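Run "/install minimax-speech-image" in the OpenClaw or Claude Code chat to install it. Some extra setup is still needed: set MINIMAX_API_KEY (and optionally MINIMAX_REGION), and make sure the Python 'requests' library is installed, since the skill does not declare it.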

Is MiniMax Multimodal (Speech + Image) free?

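The skill itself is free: it is licensed under MIT-0, and you can download and install it at no cost. Using it is not free, though; speech synthesis, voice cloning, and image generation consume MiniMax Token Plan credits billed to your API key.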

Which platforms does MiniMax Multimodal (Speech + Image) support?

MiniMax Multimodal (Speech + Image) is listed as cross-platform and should run anywhere OpenClaw / Claude Code is available, provided Python and the 'requests' library are installed.

Who created MiniMax Multimodal (Speech + Image)?

It is built and maintained by percivalee (@percivalee); the current version is v1.0.1.
