
MiniMax Multimodal (Speech + Image)

Name: MiniMax Multimodal (Speech + Image)
Author: percivalee

By percivalee · v1.0.1 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 116 · Current installs: 0 · Versions: 2

Install in OpenClaw

/install minimax-speech-image

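Description

MiniMax multimodal skill: connects to the MiniMax Token Plan API for speech synthesis (TTS, voice cloning, voice design) and image generation (text-to-image, image-to-image). It uses the speech-2.8-hd (speech) and image-01 (image) models and consumes Token Plan credits. When the user mentions speech synthesis, voice cloning, image generation, text-to-image, image-to-...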

Security recommendations

Before installing, consider the following:

- Verify that the provider domains (https://api.minimaxi.com and https://api.minimax.io) are the legitimate MiniMax endpoints you expect. If unsure, contact the provider or check an authoritative homepage; the skill lists no homepage.
- Treat MINIMAX_API_KEY as sensitive: the client sends it with every request, and the service charges Token Plan credits for usage (TTS, cloning, image generation). Make sure you understand the billing and rate limits.
- Voice cloning uploads local audio to the remote /files endpoint. Do not upload private or sensitive audio without explicit consent; doing so transmits user data to the provider.
- The skill bundles Python scripts that import the 'requests' library but does not declare dependencies or provide an install step. Install 'requests' in your environment, or run the skill in an isolated/sandboxed environment first.
- The registry metadata omits the required env vars (MINIMAX_API_KEY, MINIMAX_REGION). This is a packaging inconsistency; ask the publisher to update the metadata so automated policy/permission checks can surface the required credentials before installation.
- The image client downloads URLs returned by the API. While expected, this means the skill may fetch remote content; consider network restrictions if you run it in a sensitive environment.
- If you plan to use this in production or with sensitive data, request additional provenance (publisher identity, homepage, or source repo) and run the code in a controlled test environment first.

If you cannot verify the provider or correct the metadata and dependency omissions, treat the skill as higher risk and avoid providing production credentials or sensitive data.

Capability assessment

ℹ Purpose & Capability

The scripts implement text-to-speech, voice cloning, voice design, image generation, and image editing, matching the SKILL.md description; network calls go only to the stated MiniMax API base URLs. However, the registry metadata lists no required environment variables, while both SKILL.md and the code require MINIMAX_API_KEY (and optionally MINIMAX_REGION). This metadata omission is an inconsistency and should be corrected.

ℹ Instruction Scope

Runtime instructions and code stay within the stated purpose (calling remote APIs and saving returned media). Notable operational behavior: voice cloning uploads local audio files to the remote /files endpoint, and image generation may download URLs returned by the API. These actions transmit user data to the provider and can consume Token Plan credits. The instructions do not ask the agent to read unrelated system files or other credentials.
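To act on the network-restriction advice, a caller could check each URL returned by the API before downloading it. The following is a minimal sketch, not part of the skill's bundled scripts; `is_allowed_download` and `ALLOWED_HOSTS` are hypothetical names, and the allowlist contains only the two API hosts named on this page. Returned image URLs may be served from a different (CDN) host, so you would need to add whatever hosts your responses actually use.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: only the two MiniMax API hosts named in this listing.
# Add the CDN host(s) that your image responses actually point to.
ALLOWED_HOSTS = frozenset({"api.minimaxi.com", "api.minimax.io"})


def is_allowed_download(url, allowed_hosts=ALLOWED_HOSTS):
    """Return True only for https URLs whose host is allowlisted (or a subdomain of one)."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False  # refuse plain http and non-web schemes such as file://
    host = parsed.hostname or ""  # urlparse already lowercases the hostname
    return host in allowed_hosts or any(host.endswith("." + h) for h in allowed_hosts)


if __name__ == "__main__":
    for candidate in (
        "https://api.minimax.io/output/image.png",
        "http://api.minimax.io/output/image.png",
        "https://api.minimax.io.attacker.example/image.png",
    ):
        print(candidate, "->", is_allowed_download(candidate))
```

Matching on the parsed hostname, rather than checking whether the URL string starts with an allowed prefix, rejects look-alike hosts such as the third example.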

⚠ Install Mechanism

This is an instruction-only skill with bundled Python scripts and no install spec. The scripts import the 'requests' library, but the skill neither declares that dependency nor provides an install step; this mismatch may cause runtime failures or require undocumented setup. Because there is no installer, no external download URLs are fetched at install time, which lowers risk, but the missing dependency declaration is still an omission.
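Because the skill does not declare its dependency, one way to avoid an ImportError partway through a run is to check for `requests` before invoking the bundled scripts. This is a minimal sketch using only the standard library. `missing_dependencies` is a hypothetical helper, and the module list assumes `requests` is the only third-party import, which is what this assessment reports.

```python
import importlib.util


def missing_dependencies(modules=("requests",)):
    """Return the names of top-level modules that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
        print("Install them first, e.g.: python -m pip install " + " ".join(missing))
    else:
        print("All dependencies are available.")
```

`find_spec` only locates the module without importing it, so the check has no side effects.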

⚠ Credentials

The code legitimately requires MINIMAX_API_KEY (and optionally MINIMAX_REGION), and the SKILL.md documents these. However, the skill registry metadata lists no required env vars/primary credential. The requested environment access (an API key that can consume billing credits) is proportionate to the feature set, but the metadata mismatch is a packaging/visibility problem that could cause accidental exposures or misuse of credentials.
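Because the registry metadata does not surface these variables, a caller could verify them before running the scripts. The following sketch is hypothetical (`check_env` is not part of the skill). It treats MINIMAX_API_KEY as required and MINIMAX_REGION as optional, as SKILL.md describes, and it reports only variable names, so the key itself is never printed.

```python
import os

REQUIRED_VARS = ("MINIMAX_API_KEY",)  # the scripts cannot authenticate without it
OPTIONAL_VARS = ("MINIMAX_REGION",)   # optional per SKILL.md


def check_env(env=None):
    """Return (missing_required, missing_optional) variable names; never the values."""
    env = os.environ if env is None else env
    missing_required = [name for name in REQUIRED_VARS if not env.get(name)]
    missing_optional = [name for name in OPTIONAL_VARS if not env.get(name)]
    return missing_required, missing_optional


if __name__ == "__main__":
    required, optional = check_env()
    if required:
        print("Missing required variables:", ", ".join(required))
    else:
        print("MINIMAX_API_KEY is set.")
    if optional:
        print("Optional variables not set:", ", ".join(optional))
```

Passing a plain dict as `env` makes the check easy to test without touching the real environment.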

✓ Persistence & Privilege

The skill does not request 'always: true' and does not modify other skills or system settings. It does not persist credentials itself; it reads MINIMAX_API_KEY from environment as expected. Autonomous invocation is allowed (platform default) but not combined with other high-risk indicators here.

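How to use

Make sure OpenClaw is installed (local or Docker deployment)
Enter the install command in the chat box: /install minimax-speech-image
After installation, call the skill by name or trigger it with /minimax-speech-image
Provide the required inputs described in the skill's parameter documentation to get structured output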

Version history

v1.0.1

Added MINIMAX_API_KEY environment variable documentation. Fixed voice cloning API flow. Improved image response parsing.

v1.0.0

- Initial release of the MiniMax multimodal skill using Token Plan.
- Supports speech synthesis (TTS), voice cloning, and voice design via the speech-2.8-hd model.
- Enables image generation (text-to-image and image-to-image) using the image-01 model.
- Includes command-line and Python API usage for all features.
- Requires MiniMax API key and region settings.
- Comprehensive documentation with voice and image module instructions.

Metadata

Slug: minimax-speech-image

Version: 1.0.1

License: MIT-0

Total installs: 0

Current installs: 0

Version count: 2

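FAQ

What is MiniMax Multimodal (Speech + Image)?

MiniMax multimodal skill: connects to the MiniMax Token Plan API for speech synthesis (TTS, voice cloning, voice design) and image generation (text-to-image, image-to-image). It uses the speech-2.8-hd (speech) and image-01 (image) models and consumes Token Plan credits. When the user mentions speech synthesis, voice cloning, image generation, text-to-image, image-to-... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 116 times so far.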

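How do I install MiniMax Multimodal (Speech + Image)?

Run the command "/install minimax-speech-image" in the OpenClaw or Claude Code chat box to install it in one step. Before use, you still need to set MINIMAX_API_KEY (and optionally MINIMAX_REGION) and make sure the 'requests' Python library is installed.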

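Is MiniMax Multimodal (Speech + Image) free?

The skill itself is free: it is released under the MIT-0 license and can be freely downloaded, installed, and used. API calls made through it (TTS, voice cloning, image generation) consume MiniMax Token Plan credits.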

Which platforms does MiniMax Multimodal (Speech + Image) support?

MiniMax Multimodal (Speech + Image) is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed MiniMax Multimodal (Speech + Image)?

It is developed and maintained by percivalee (@percivalee). The current version is v1.0.1.