
MiniMax Multimodal (Speech + Image)

by percivalee · GitHub ↗ · v1.0.1 · MIT-0
cross-platform ⚠ suspicious
Downloads 116
Stars 1
Active Installs 0
Versions 2
Install in OpenClaw
/install minimax-speech-image
Description
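MiniMax multimodal skill: connects to the MiniMax Token Plan API for speech synthesis (TTS / voice cloning / voice design) and image generation (text-to-image / image-to-image). Uses the speech-2.8-hd (speech) and image-01 (image) models and consumes Token Plan credits. When the user mentions speech synthesis, voice cloning, image generation, text-to-image, image-to...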
Usage Guidance
Before installing, consider the following:
- Verify the provider domains (https://api.minimaxi.com and https://api.minimax.io) are the legitimate MiniMax endpoints you expect. If unsure, contact the provider or check an authoritative homepage; the skill lists no homepage.
- Treat MINIMAX_API_KEY as sensitive: the client will send it with every request and the service will charge Token Plan credits for usage (TTS, cloning, image generation). Ensure you understand billing and rate limits.
- Voice cloning uploads local audio to the remote /files endpoint. Do not upload private or sensitive audio without explicit consent, because this transmits user data to the provider.
- The skill bundles Python scripts that import the 'requests' library but does not declare dependencies or provide an install step. Install 'requests' in your environment or run in an isolated/sandboxed environment first.
- The registry metadata omits the required env vars (MINIMAX_API_KEY, MINIMAX_REGION). This is a packaging inconsistency; ask the publisher to update metadata so automated policy/permission checks can surface required credentials before installation.
- The image client will download URLs returned by the API. While expected, this means the skill may fetch remote content; consider network restrictions if you run in a sensitive environment.
- If you plan to use this in production or with sensitive data, request additional provenance (publisher identity, homepage, or source repo) and run the code in a controlled test environment first.
If you cannot verify the provider or correct the metadata/dependency omissions, treat the skill as higher risk and avoid providing production credentials or sensitive data.
Capability Assessment
Purpose & Capability
The scripts implement text-to-speech, voice cloning, voice design, image generation and image editing matching the SKILL.md description — network calls go only to the stated MiniMax API base URLs. However, the registry metadata lists no required environment variables while the SKILL.md and the code both require MINIMAX_API_KEY (and optionally MINIMAX_REGION). That metadata omission is inconsistent and should be corrected.
Instruction Scope
Runtime instructions and code stay within the stated purpose (calling remote APIs and saving returned media). Important operational behavior: voice cloning will upload local audio files to the remote /files endpoint, and image generation may download URLs returned by the API. These actions transmit user data to the provider and can consume Token Plan credits. The instructions do not ask the agent to read unrelated system files or other credentials.
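For readers who want to restrict the download behavior described above, the following is a minimal sketch of a URL check that could run before fetching any URL returned by the API. It is not taken from the skill's code. The function name and the allowlist are hypothetical, and the hosts that actually serve generated images are not documented in this listing, so the set below is only a placeholder built from the two API domains named in the assessment.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: the hosts that really serve generated images are not
# documented in this listing, so replace these with hosts you have verified.
ALLOWED_DOWNLOAD_HOSTS = {"api.minimaxi.com", "api.minimax.io"}


def is_allowed_download(url, allowed_hosts=ALLOWED_DOWNLOAD_HOSTS):
    """Return True only for https URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    # urlparse lowercases both the scheme and the hostname attribute.
    host = parsed.hostname or ""
    return parsed.scheme == "https" and host in allowed_hosts
```

A caller would skip or log any returned URL for which `is_allowed_download` is false instead of fetching it. An allowlist is stricter than the skill's reported behavior, so expect to extend it once the real media hosts are known.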
Install Mechanism
This is an instruction-only skill with bundled Python scripts and no install spec. The scripts import the 'requests' library but the skill does not declare that dependency or provide an install step; that mismatch may lead to runtime failures or hidden additional setup. No external download URLs are used by the installer, which is lower risk, but the missing dependency declaration is an omission.
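Because the dependency is undeclared, a quick check before first use can show whether 'requests' is importable in the current environment. The sketch below is an illustration only; the helper name `missing_modules` is hypothetical and is not part of the skill.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["requests"])
    if missing:
        print("Missing dependencies:", ", ".join(missing))
        print("Install them first, for example: python -m pip install requests")
    else:
        print("All dependencies are importable.")
```

Running this inside the same interpreter or container that OpenClaw uses gives the most reliable answer, since a different Python installation may have a different set of packages.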
Credentials
The code legitimately requires MINIMAX_API_KEY (and optionally MINIMAX_REGION), and the SKILL.md documents these. However, the skill registry metadata lists no required env vars/primary credential. The requested environment access (an API key that can consume billing credits) is proportionate to the feature set, but the metadata mismatch is a packaging/visibility problem that could cause accidental exposures or misuse of credentials.
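One way a reviewer could detect this kind of mismatch is to scan the bundled scripts for environment variable reads and compare the result with what the metadata declares. The sketch below assumes the scripts use the common `os.environ[...]`, `os.environ.get(...)`, or `os.getenv(...)` forms with literal names; it is a rough heuristic and not the registry's actual check. All function names are hypothetical.

```python
import re

# Matches os.environ["NAME"], os.environ.get("NAME", ...) and os.getenv("NAME", ...)
# when NAME is a string literal made of uppercase letters, digits, and underscores.
ENV_PATTERN = re.compile(
    r"""os\.(?:environ\.get\(|environ\[|getenv\()\s*["']([A-Z0-9_]+)["']"""
)


def env_vars_in_source(source):
    """Collect environment variable names read by the given Python source text."""
    return set(ENV_PATTERN.findall(source))


def undeclared_env_vars(source, declared):
    """Return names the code reads that the metadata does not declare."""
    return env_vars_in_source(source) - set(declared)


# Illustrative input only; this is not the skill's real source code.
SAMPLE = (
    "import os\n"
    'key = os.environ["MINIMAX_API_KEY"]\n'
    'region = os.getenv("MINIMAX_REGION")\n'
)
```

With the sample text and an empty declared list, `undeclared_env_vars(SAMPLE, [])` reports both variables, which mirrors the situation described in this assessment.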
Persistence & Privilege
The skill does not request 'always: true' and does not modify other skills or system settings. It does not persist credentials itself; it reads MINIMAX_API_KEY from environment as expected. Autonomous invocation is allowed (platform default) but not combined with other high-risk indicators here.
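Reading the key from the environment without persisting it can be sketched as follows. This is not the skill's actual code: the class and function names are hypothetical, and the listing does not say which MINIMAX_REGION values exist or how they map to the two base URLs, so the "cn" and "global" labels and the default region are assumptions.

```python
import os
from dataclasses import dataclass, field

# Assumption: the listing names both base URLs but not the region values that
# select them; "cn" and "global" are hypothetical labels.
BASE_URLS = {
    "cn": "https://api.minimaxi.com",
    "global": "https://api.minimax.io",
}


@dataclass(frozen=True)
class MiniMaxConfig:
    # repr=False keeps the key out of logs and tracebacks that print the object.
    api_key: str = field(repr=False)
    base_url: str = ""


def load_config(env=None, default_region="global"):
    """Build a config from environment variables without writing anything to disk."""
    env = os.environ if env is None else env
    api_key = env.get("MINIMAX_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("MINIMAX_API_KEY is not set")
    region = env.get("MINIMAX_REGION", default_region).strip().lower()
    if region not in BASE_URLS:
        raise ValueError(f"Unsupported MINIMAX_REGION: {region!r}")
    return MiniMaxConfig(api_key=api_key, base_url=BASE_URLS[region])
```

Passing a plain dictionary as `env` makes the function easy to exercise in a test environment without touching real credentials.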
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install minimax-speech-image
  3. After installation, invoke the skill by name or use /minimax-speech-image
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.1
Added MINIMAX_API_KEY environment variable documentation. Fixed voice cloning API flow. Improved image response parsing.
v1.0.0
- Initial release of MiniMax multimodal skill using Token Plan.
- Supports speech synthesis (TTS), voice cloning, and voice design via speech-2.8-hd model.
- Enables image generation (text-to-image and image-to-image) using image-01 model.
- Includes command-line and Python API usage for all features.
- Requires MiniMax API key and region settings.
- Comprehensive documentation with voice and image module instructions.
Metadata
Slug minimax-speech-image
Version 1.0.1
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 2
Frequently Asked Questions

What is MiniMax Multimodal (Speech + Image)?

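MiniMax multimodal skill: connects to the MiniMax Token Plan API for speech synthesis (TTS / voice cloning / voice design) and image generation (text-to-image / image-to-image). Uses the speech-2.8-hd (speech) and image-01 (image) models and consumes Token Plan credits. When the user mentions speech synthesis, voice cloning, image generation, text-to-image, image-to... It is an AI Agent Skill for Claude Code / OpenClaw, with 116 downloads so far.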

How do I install MiniMax Multimodal (Speech + Image)?

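Run "/install minimax-speech-image" in the OpenClaw or Claude Code chat to install it in one step. The skill itself has no install spec, so before first use set MINIMAX_API_KEY (and optionally MINIMAX_REGION) and make sure the Python 'requests' library is available.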

Is MiniMax Multimodal (Speech + Image) free?

The skill itself is free and licensed under MIT-0, so you can download, install and use it at no cost. Calls to the MiniMax API (TTS, voice cloning, image generation) consume Token Plan credits, which are billed by the provider.

Which platforms does MiniMax Multimodal (Speech + Image) support?

MiniMax Multimodal (Speech + Image) is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created MiniMax Multimodal (Speech + Image)?

It is built and maintained by percivalee (@percivalee); the current version is v1.0.1.
