
feishu-minimax-t2a-voice

Name: feishu-minimax-t2a-voice
Author: michelangelo-in-sistine

Author: habitum · GitHub · v1.0.1 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 123 · Current installs: 0 · Versions: 2

Install in OpenClaw

/install feishu-minimax-t2a-voice

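Description

Feishu voice messaging: incoming voice messages are transcribed to text automatically (Feishu's native Transcript first, Whisper as fallback), and replies are synthesized with MiniMax T2A and sent back as voice.

Usage (SKILL.md)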

feishu-voice

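Receiving: voice → text

Feishu automatically generates a transcript for every voice message. The message body already carries a Transcript field, so read it directly; no API call is needed.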
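A minimal sketch of that read, assuming the inbound message reaches the skill as a mapping with a top-level `Transcript` key (the real payload shape should be checked against an actual OpenClaw/Feishu event). `extract_transcript` is a hypothetical helper; returning `None` marks the case where the Whisper fallback from the description would take over.

```python
from typing import Any, Mapping, Optional


def extract_transcript(message: Mapping[str, Any]) -> Optional[str]:
    """Return the Feishu-generated transcript of a voice message, if any.

    Assumption: the transcript sits under a top-level "Transcript" key.
    Adjust the lookup if your payload nests it differently.
    """
    text = message.get("Transcript")
    if isinstance(text, str) and text.strip():
        return text.strip()
    # No usable transcript: the caller should fall back to e.g. Whisper.
    return None


if __name__ == "__main__":
    msg = {"message_type": "audio", "Transcript": " 今天天气怎么样 "}
    print(extract_transcript(msg))
```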

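Sending: text → voice

Workflow

Step 1. Run the script to generate the audio file:

python scripts/reply.py "<text content>"

It prints the path of the output file (.opus or .ogg).

Step 2. Send the voice message through Feishu:

message(action=send, channel=feishu, media=<filepath>, contentType="audio/opus")

Note: the .ogg file produced by Edge TTS is also sent with contentType audio/opus.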
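The two steps can be driven from Python roughly as follows. This is a sketch, not the skill's own code: it assumes reply.py prints the file path as the last line of stdout, and the helper names (`generate_voice`, `parse_output_path`, `build_send_args`) and the 120 s timeout are hypothetical. The `message` call itself is an OpenClaw tool, so the sketch only assembles its arguments.

```python
import subprocess
import sys
from pathlib import Path

AUDIO_SUFFIXES = {".opus", ".ogg"}


def parse_output_path(stdout: str) -> Path:
    """Take the last non-empty stdout line as the generated file path.

    Assumption: reply.py prints the path as its final line.
    """
    lines = [ln.strip() for ln in stdout.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("reply.py printed nothing")
    path = Path(lines[-1])
    if path.suffix.lower() not in AUDIO_SUFFIXES:
        raise ValueError(f"unexpected output file: {path}")
    return path


def build_send_args(path: Path) -> dict:
    """Arguments for the Feishu `message` tool (Step 2).

    Both .opus (MiniMax path) and .ogg (Edge TTS) go out as audio/opus.
    """
    return {
        "action": "send",
        "channel": "feishu",
        "media": str(path),
        "contentType": "audio/opus",
    }


def generate_voice(text: str) -> Path:
    """Step 1: run reply.py and return the audio file path."""
    result = subprocess.run(
        [sys.executable, "scripts/reply.py", text],
        capture_output=True, text=True, check=True, timeout=120,
    )
    return parse_output_path(result.stdout)
```

Keeping the content type fixed at `audio/opus` for both suffixes follows the note above: .opus and Edge TTS's .ogg are both Ogg/Opus audio.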

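MiniMax interjection tags (when MINIMAX_API_KEY is set)

When writing the reply text, proactively embed the following tags to make the synthesized speech sound more natural:

Tag	Meaning	When to use
`<#0.3#>`	0.3 s pause	After a comma, mid-sentence
`(breath)`	Natural breath	Middle of a long sentence, end of a sentence
`(sighs)`	Sigh	Exclamations, resignation
`(emm)`	Thinking filler	End of a question, resuming after a pause
`(clear-throat)`	Clearing the throat	Transitions, starting to speak
`(laughs)`	Laughter	Happy or humorous content
`(chuckle)`	Soft chuckle	Light teasing
`(sniffs)`	Sniff	Mild emotion
`(humming)`	Humming	Cheerful mood, talking to oneself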

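Rules:

Place each tag between two stretches of spoken text; never stack tags back to back.
End a question with (emm).
Insert (laughs) or (sighs) in an exclamation.
Add (breath) before a full stop when there is no natural pause there.
In long narration, insert (breath) or <#0.3#> every 20-30 characters.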
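The first rule is mechanical enough to check automatically. A minimal sketch, assuming the tag vocabulary is exactly the table above and that "stacked" means two tags separated by nothing but whitespace; `find_stacked_tags` is a hypothetical helper, not part of the skill.

```python
import re

TAG_NAMES = ("breath", "sighs", "emm", "clear-throat",
             "laughs", "chuckle", "sniffs", "humming")
# A pause tag such as <#0.3#>, or one of the parenthesized interjections.
TAG_RE = re.compile(
    r"<#\d+(?:\.\d+)?#>|\((?:" + "|".join(re.escape(n) for n in TAG_NAMES) + r")\)"
)


def find_stacked_tags(text: str) -> list[tuple[str, str]]:
    """Return pairs of tags that sit directly next to each other.

    Only whitespace between two tags counts as stacked; punctuation or
    spoken text in between is allowed.
    """
    stacked = []
    matches = list(TAG_RE.finditer(text))
    for prev, cur in zip(matches, matches[1:]):
        if text[prev.end():cur.start()].strip() == "":
            stacked.append((prev.group(), cur.group()))
    return stacked
```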

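Examples:

Model output: 好的，那我们出发吧。 ("OK, let's head out then.")
Should become: 好的<#0.3#>，那我们出发吧(laughs)。

Model output: 等等，让我想想，这个怎么做来着？ ("Wait, let me think, how does this go again?")
Should become: 等等<#0.3#>，让我想想(emm)<#0.4#>，这个怎么做来着？

Model output: 唉，今天真是太累了。 ("Sigh, today was exhausting.")
Should become: 唉(sighs)，今天真是太累了(breath)。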
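Some of the rules can also be applied mechanically before the text goes to reply.py. The sketch below is hypothetical and covers only rules that need no judgment about tone: a 0.3 s pause before each comma that follows plain text, (emm) before a closing question mark, and (breath) before a closing full stop that has no tag directly in front of it. Emotional tags such as (laughs) or (sighs) still have to be chosen by the model, so the output will not always match the examples above.

```python
import re

PAUSE = "<#0.3#>"


def add_basic_tags(sentence: str) -> str:
    """Mechanically apply a subset of the tagging rules to one sentence.

    - a 0.3 s pause before each comma (， or ,) that follows plain text
    - (emm) before a closing question mark (？ or ?)
    - otherwise (breath) before a closing full stop (。) with no tag before it
    """
    s = sentence.strip()
    # Skip commas already preceded by a tag (ending in ">" or ")").
    s = re.sub(r"(?<=[^\s,，>)])([，,])", PAUSE + r"\1", s)
    if s.endswith(("？", "?")):
        s = s[:-1] + "(emm)" + s[-1]
    elif s.endswith("。") and not s[:-1].endswith((")", ">")):
        s = s[:-1] + "(breath)" + s[-1]
    return s
```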

Fallback chain

MiniMax T2A (mp3) → ffmpeg → opus  [preferred]
    ↓ timeout / no key
Edge TTS (ogg directly)            [fallback]
    ↓ failure
Return plain text (no voice)
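The chain can be sketched in plain Python with the two synthesizers passed in as callables, so the control flow is visible without network calls. Assumptions: `minimax` is `None` when no key is configured, a missing ffmpeg binary is treated like a MiniMax failure, any exception counts as "timeout or failure", and the exact ffmpeg flags reply.py uses are unknown; `-c:a libopus` is one standard way to produce Opus. All function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Optional

Synth = Callable[[str], Path]


def mp3_to_opus_cmd(src: Path, dst: Path) -> list[str]:
    """ffmpeg argv that transcodes MiniMax mp3 into Opus for Feishu."""
    return ["ffmpeg", "-y", "-i", str(src), "-c:a", "libopus", str(dst)]


def transcode(src: Path) -> Path:
    dst = src.with_suffix(".opus")
    subprocess.run(mp3_to_opus_cmd(src, dst), check=True, timeout=60)
    return dst


def synthesize_with_fallback(
    text: str,
    minimax: Optional[Synth],
    edge: Synth,
) -> Optional[Path]:
    """MiniMax -> ffmpeg -> opus first, Edge TTS second.

    Returning None means "send the reply as plain text".
    """
    # minimax is None when MINIMAX_API_KEY is not set (assumption).
    if minimax is not None and shutil.which("ffmpeg"):
        try:
            return transcode(minimax(text))
        except Exception:
            pass  # timeout, HTTP error or ffmpeg failure: try Edge TTS
    try:
        return edge(text)  # Edge TTS produces .ogg directly
    except Exception:
        return None
```

Injecting the synthesizers keeps the fallback order testable on its own; a real version would pass functions that call the MiniMax API and `edge_tts`.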

Environment variables

Variable	Required	Description
`MINIMAX_API_KEY`	No	If set, MiniMax is preferred; otherwise Edge TTS is used
`EDGE_TTS_VOICE`	No	Edge TTS voice; defaults to `zh-CN-XiaoxiaoNeural`
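A minimal sketch of how the two variables could select a backend. `load_tts_config` is a hypothetical helper, not a function from the skill's scripts; the voice default mirrors the table above, and the Edge voice is resolved even when a MiniMax key is present because Edge TTS remains the fallback.

```python
import os
from typing import Mapping, Optional

DEFAULT_EDGE_VOICE = "zh-CN-XiaoxiaoNeural"


def load_tts_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve the TTS backend from the two documented variables."""
    env = os.environ if env is None else env
    api_key = env.get("MINIMAX_API_KEY", "").strip()
    return {
        "backend": "minimax" if api_key else "edge",
        "minimax_api_key": api_key or None,
        # Needed in both cases: Edge TTS is also the fallback path.
        "edge_voice": env.get("EDGE_TTS_VOICE") or DEFAULT_EDGE_VOICE,
    }
```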

Quick reference

# Generate the voice file and send it
python scripts/reply.py "<text>"  →  prints the file path  →  message(media=<path>, contentType="audio/opus")

Safe-use recommendations

This skill appears to do what it claims (generate Feishu voice replies via MiniMax with an Edge TTS fallback), but there are practical and transparency problems you should address before installing or enabling it:

- Missing dependency/install info: the code requires Python packages (requests, edge_tts) and optionally ffmpeg, but no install spec (pip requirements or instructions) is provided. Install these dependencies in a controlled environment, or ask the author for a requirements.txt or installation instructions.
- Optional API key: MINIMAX_API_KEY is optional and used to call https://api.minimaxi.com/v1/t2a_v2. Only set it if you trust that service and the skill's origin, and avoid reusing sensitive credentials.
- Hard-coded filesystem path: reply.py copies outputs to e:\Profile\Mac\.openclaw\media\out (Windows-style). This is odd and may create files in unexpected places or fail on non-Windows systems. Consider changing the scripts to use a configurable path or a platform-agnostic location (e.g., a tempdir or the agent's media directory).
- Network behavior: the skill makes outbound HTTP requests to the MiniMax API and uses edge_tts (which opens network connections). If you need strict outbound controls, run it in a sandboxed environment or inspect the traffic.
- Audit or sandbox before use: if you do not fully trust the source, run the scripts in an isolated/container environment, review or rewrite the filesystem paths, and validate the external endpoints and payloads.

Ask the publisher for a clear requirements/install section (pip packages, ffmpeg requirement) and for clarification on the hard-coded path; resolving these would move this assessment toward 'benign'.
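One way to address the hard-coded path, sketched under assumptions: `FEISHU_VOICE_OUT_DIR` is a hypothetical variable invented here (the skill does not read it), and without it the sketch falls back to a subdirectory of the system temp directory returned by `tempfile.gettempdir()`, which works on Windows, macOS and Linux alike.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def output_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Directory for generated audio, replacing the hard-coded e:\\ path.

    FEISHU_VOICE_OUT_DIR is a hypothetical override; without it, use a
    subdirectory of the system temp directory.
    """
    env = os.environ if env is None else env
    override = env.get("FEISHU_VOICE_OUT_DIR")
    if override:
        base = Path(override).expanduser()
    else:
        base = Path(tempfile.gettempdir()) / "feishu-voice"
    base.mkdir(parents=True, exist_ok=True)
    return base
```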

Functional analysis

Type: OpenClaw Skill
Name: feishu-minimax-t2a-voice
Version: 1.0.1

The skill bundle provides Text-to-Speech (TTS) capabilities for Feishu using the MiniMax and Edge TTS engines. The code in `scripts/send_voice.py` and `scripts/reply.py` correctly implements API calls to legitimate endpoints (api.minimaxi.com) and uses the `edge_tts` library for fallback. While a hard-coded Windows-style path (`e:\Profile\Mac\.openclaw\media\out`) is used for output, it appears to be a development artifact rather than a malicious indicator. The instructions in `SKILL.md` focus on formatting output for the TTS engine and contain no prompt injection attacks or unauthorized commands.

Capability assessment

⚠ Purpose & Capability

The code and SKILL.md implement Feishu text→voice and voice→text behavior as described and call an external MiniMax API and Edge TTS. However, the package's registry metadata declares no required binaries or environment variables, while SKILL.md and the code expect ffmpeg (for the MiniMax path), the requests and edge_tts Python packages, and an optional MINIMAX_API_KEY. The scripts also write output to a hard-coded path (e:\Profile\Mac\.openclaw\media\out), which is unexpected and platform-specific.

✓ Instruction Scope

Runtime instructions are narrow: run reply.py to produce an audio file, then send that file via the Feishu message tool. The scripts do not attempt to read arbitrary user files or other credentials; they only use the environment variables documented in SKILL.md (MINIMAX_API_KEY, EDGE_TTS_VOICE). They do, however, copy generated audio into a hard-coded filesystem location, which is unusual and may be surprising.

⚠ Install Mechanism

There is no install spec yet the code imports third-party Python libraries (requests, edge_tts) and expects ffmpeg to be present for the preferred MiniMax path. The absence of declared dependencies or an install step is an inconsistency: the runtime will fail or behave differently depending on the environment. No external download URLs are present, but the missing dependency declarations are a practical installation risk.
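Until the author ships a requirements file, a quick preflight check can report what is missing. This sketch only inspects the environment and installs nothing; it assumes the PyPI packages are `requests` and `edge-tts` (the latter imported as `edge_tts`), installable with `pip install requests edge-tts`. `missing_dependencies` is a hypothetical helper.

```python
import importlib.util
import shutil


def missing_dependencies(need_ffmpeg: bool = True) -> list[str]:
    """List what the scripts need but the current environment lacks."""
    missing = [
        name for name in ("requests", "edge_tts")
        if importlib.util.find_spec(name) is None
    ]
    # ffmpeg is only needed for the preferred MiniMax (mp3 -> opus) path.
    if need_ffmpeg and shutil.which("ffmpeg") is None:
        missing.append("ffmpeg")
    return missing


if __name__ == "__main__":
    gaps = missing_dependencies()
    print("missing:", ", ".join(gaps) if gaps else "none")
```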

ℹ Credentials

The only credential-like item used is an optional MINIMAX_API_KEY (documented in SKILL.md) and an EDGE_TTS_VOICE setting. The registry metadata listed 'no required env vars', which is misleading but not dangerous. The key is optional and reasonable for calling MiniMax; no unrelated tokens or broad privileges are requested.

✓ Persistence & Privilege

The skill is not always-included and does not request elevated platform privileges. It writes generated media to disk (tempdir and additionally a hard-coded 'media/out' path). Writing files is within the scope of a TTS skill but the hard-coded, Windows-style destination is unexpected and could create side effects or fail silently on other OSes.

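How to use

Make sure OpenClaw is installed (local or Docker deployment).
Enter the install command in the chat box: /install feishu-minimax-t2a-voice
Once installed, call the skill by name or trigger it with /feishu-minimax-t2a-voice.
Provide the inputs described in the skill's parameter documentation to get structured output.

Version history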

v1.0.1

Version 1.0.1 changelog:
- Added scripts/reply.py to support generating voice message replies.
- Updated documentation to reflect simplified usage: reply generation is now `python scripts/reply.py "<text content>"`.
- Documented support for both MiniMax T2A (mp3 → opus) and Edge TTS (ogg) as audio generation backends, with automatic fallback.
- Explained how to insert MiniMax prosody tags and fillers to make synthesized speech more natural.
- Revised environment variable requirements: MINIMAX_API_KEY is now optional; Edge TTS is used if it is unavailable.
- Clarified the sending method: the produced `.opus` or `.ogg` file should be sent as `audio/opus` via Feishu.

v1.0.0

- Initial release of the feishu-voice skill.
- Enables Feishu voice message send/receive: receives audio and auto-transcribes it (native Transcript first, Whisper fallback), and replies with MiniMax T2A voice synthesis.
- Supports automatic conversion to the required opus format using ffmpeg.
- Requires a MiniMax API key and binaries (whisper, ffmpeg).
- Provides detailed usage, environment variable setup, and a command reference in the documentation.
- Includes a script for the synthesis and conversion workflow.

Metadata

Slug feishu-minimax-t2a-voice

Version 1.0.1

License MIT-0

Total installs 0

Current installs 0

Version count 2

FAQ