Xiaomi-any2speech

Name: Xiaomi-any2speech
Author: whiteshirt0429

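Description

Speech World Model: more than TTS, this is a large speech model that understands scene, role, and emotion and plans its own delivery. It natively supports long-form text with multiple speakers and Chinese-English bilingual synthesis, accepts uploaded reference audio for voice cloning (Voice Prompt), and ships with built-in creative templates that turn any content into a podcast, audiobook, crosstalk (xiangsheng), rap, radio drama, and so on...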

Security recommendations

This skill appears to implement the claimed TTS and voice-clone flows, but before installing:

1) Verify the endpoint and vendor: there is no source code repo or homepage and the skill uses 'Xiaomi' naming, so confirm this is an official or trusted mirror.
2) Be cautious with audio uploads: do not upload recordings you don't own or that contain other people's voices without explicit consent.
3) Prefer to set your own API_KEY (via environment) rather than relying on the embedded public key, to retain control and traceability.
4) Follow the SKILL.md rule: only allow the agent to access file paths you explicitly provide; never permit directory scanning or reading of sensitive files.
5) Test the skill with non-sensitive, synthetic audio first and review any privacy/terms-of-service information from the endpoint operator.

If you cannot verify the endpoint's authenticity, treat the skill as untrusted and avoid uploading sensitive audio.

Functional analysis

Type: OpenClaw Skill
Name: xiaomi-any2speech-beyondtts
Version: 1.0.7

The skill provides a comprehensive interface for Xiaomi's Any2Speech TTS and voice cloning API. It uses standard system utilities (curl, python3, ffmpeg) to interact with a legitimate Xiaomi domain (miplus-tts-public.ai.xiaomi.com) and includes explicit security instructions in SKILL.md that forbid the agent from accessing sensitive local files like ~/.ssh or ~/.env. The logic is transparent, aligned with the stated purpose, and incorporates safety guardrails for file handling and API interactions.

Capability assessment

ℹ Purpose & Capability

The SKILL.md describes a TTS / voice-cloning service and all runtime instructions (curl + python3, optional ffmpeg) are consistent with that purpose. However, the registry entry provides no source or homepage while using Xiaomi naming and a public 'free' API key; this creates an authenticity/branding mismatch that should be verified before trusting the skill.

✓ Instruction Scope

Instructions are narrowly scoped to constructing HTTP requests to the declared BASE endpoint, parsing JSON, and saving WAV outputs. The doc explicitly instructs the agent not to scan or read arbitrary system paths and to only use file paths explicitly provided by the user. The main remaining runtime risk is the expected behavior of uploading user-supplied audio to an external service (privacy/consent concern), which the skill inherently requires.
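The path rule described above can be sketched as a small guard. This is only an illustration, not the skill's own code: the function name `is_allowed` and the deny-list are hypothetical, and the two sensitive locations are the ones this listing names as examples (~/.ssh and ~/.env).

```python
from pathlib import Path
from typing import Iterable

# Hypothetical deny-list; the listing only names ~/.ssh and ~/.env as examples.
SENSITIVE = (Path.home() / ".ssh", Path.home() / ".env")


def is_allowed(candidate: str, user_provided: Iterable[str]) -> bool:
    """Return True only for a path the user named that is not sensitive."""
    if candidate not in user_provided:
        return False  # never touch paths the user did not explicitly give
    resolved = Path(candidate).expanduser().resolve()
    for blocked in SENSITIVE:
        blocked = blocked.resolve()
        # Reject the sensitive location itself and anything beneath it.
        if resolved == blocked or blocked in resolved.parents:
            return False
    return True
```

Resolving the path before comparing means that relative segments and symlinks cannot be used to slip past the deny-list; a real deployment would likely need a longer list than the two entries assumed here.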

✓ Install Mechanism

This is an instruction-only skill with no install spec and no code files—no binaries or archives are downloaded or written to disk by the skill itself, which minimizes install-time risk.

ℹ Credentials

The skill declares no required environment variables or credentials and defaults to a built-in public API_KEY (sk-anytospeech-pub-free). While this matches the described public/free usage, relying on an embedded public key and an undocumented external endpoint can have rate/abuse/traceability issues; consider overriding with a vetted credential if available. No unrelated secrets are requested.
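A minimal sketch of the override suggested here, assuming the credential is read from an environment variable named API_KEY (the name used in the security recommendations on this page). The helper `resolve_api_key` is hypothetical; it only shows how a caller could tell which key is in effect.

```python
import os

# Built-in public key as quoted in this listing.
PUBLIC_KEY = "sk-anytospeech-pub-free"


def resolve_api_key(env=None):
    """Return (key, source), preferring a user-set API_KEY over the public key."""
    env = os.environ if env is None else env
    key = env.get("API_KEY", "").strip()
    if key:
        return key, "environment"
    return PUBLIC_KEY, "built-in public key"
```

Returning the source alongside the key makes it easy to warn the user when a request is about to go out under the shared public key.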

✓ Persistence & Privilege

The skill does not request always:true or any elevated/always-present privilege. It is user-invocable and allows autonomous invocation (platform default) but does not declare persistent modifications to other skills or agent configs.

Version history

v1.0.7

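**Expanded input options and Voice Prompt/voice cloning improvements**
- Added support for web page link (URL) input and template-based synthesis, with no additional text required.
- Reference voice (Voice Prompt) now supports "use a template" and multi-role mapping for more controllable behavior (a two-step flow is recommended for single- and multi-speaker scenarios).
- Reference voice denoising is off by default (it can be enabled explicitly); each parameter and the API usage are documented in detail.
- Richer trigger intents: template features can be activated with keywords such as "用模板" ("use a template") and "选模板" ("choose a template").
- Refined the input safety policy and its documentation, improving flexibility and ease of use.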

v1.0.6

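**Summary: Adds support for voice cloning (Voice Prompt, VP) and Python-based JSON parsing.**
- Added reference-audio cloning (Voice Prompt, VP): synthesizes a specified voice from user-supplied voice files, with multiple roles and multiple files, plus detailed one-step/two-step VP flows and parameter descriptions.
- Supports dedicated VP endpoints (/v1/audio/vp/generate, /think, /synthesize, /jobs, etc.) and denoising control.
- Added automatic detection and handling of requests such as "用我的声音" ("use my voice"), "克隆音色" ("clone the voice"), and "参考这段录音" ("reference this recording").
- JSON parsing of curl output switched from jq to python3 for better compatibility.
- The skill description is more concise; endpoint and template information gains more examples and details.
- The Feishu voice-message sending section is updated to use python3 JSON parsing as well.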
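To illustrate the jq-to-python3 switch mentioned in this release, here is a small sketch of pulling one field out of a JSON response body with the standard library only. The helper `extract_field` and the field names in the usage note are hypothetical; the listing does not document the actual response schema.

```python
import json


def extract_field(raw: str, dotted_path: str, default=None):
    """Walk a dotted path such as 'data.job_id' through a decoded JSON body."""
    try:
        node = json.loads(raw)
    except json.JSONDecodeError:
        return default  # body was not valid JSON
    for key in dotted_path.split("."):
        if isinstance(node, dict) and key in node:
            node = node[key]
        else:
            return default  # path does not exist in this response
    return node
```

For example, `extract_field('{"data": {"job_id": "abc"}}', "data.job_id")` returns `"abc"`, which is roughly what a `jq -r '.data.job_id'` call would have done without requiring jq to be installed.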

v1.0.5

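- Thoroughly revised the documentation for better usability and clarity.
- Text over 1,000 characters and large file inputs automatically switch to the asynchronous API; asynchronous polling lasts at most 10 minutes.
- Added a note on API rate limits and handling advice for 429 errors.
- Strengthened the notes on input safety, error handling, and environment variable configuration.
- Clarified output files, format compatibility, typical scenario templates, and customization guidance.
- Significantly expanded the English documentation; Chinese and English templates are listed separately for easier lookup.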
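The length-based switch and the bounded polling window in this release could look roughly like the sketch below. Only the 1,000-character threshold and the 10-minute limit come from the notes; the function names, the `status` key, its values, and the 5-second interval are assumptions, and the large-file case is left out because no size threshold is given.

```python
import time

ASYNC_THRESHOLD_CHARS = 1000   # from the v1.0.5 note
MAX_POLL_SECONDS = 10 * 60     # from the v1.0.5 note


def choose_mode(text: str) -> str:
    """Pick the async API for text longer than the threshold."""
    return "async" if len(text) > ASYNC_THRESHOLD_CHARS else "sync"


def poll_until_done(fetch_status, interval=5.0, max_seconds=MAX_POLL_SECONDS,
                    clock=time.monotonic, sleep=time.sleep):
    """Call fetch_status() until it reports a final state or the deadline passes.

    fetch_status is a caller-supplied function (hypothetical) returning a dict
    with a 'status' key; the 'done'/'failed' values are assumptions.
    """
    deadline = clock() + max_seconds
    while True:
        result = fetch_status()
        if result.get("status") in ("done", "failed"):
            return result
        if clock() + interval > deadline:
            raise TimeoutError("job did not finish within the polling window")
        sleep(interval)
```

Passing `clock` and `sleep` as parameters keeps the loop testable without waiting in real time.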

v1.0.4

- Adds native support for both Chinese and English speech synthesis, including mixed-language audio in the same instruction.
- Updates all usage instructions, examples, and templates to cover English and bilingual (mixed Chinese-English) scenarios.
- Now recognizes English intent phrases such as "make a podcast," "read aloud," or "text to speech" for triggering the skill.
- Documentation now includes clear guidance for using English or mixed-language input, plus English template examples for various scenarios.
- No changes to code or logic; this version is a major documentation and intent/usage expansion.

v1.0.3

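- Updated the introduction to state that the skill uses a large speech model with native long-text and multi-speaker synthesis (long sequences and multiple speakers are generated directly).
- Notes that no stitching or cascading is needed; a single inference pass supports up to about 10 minutes of coherent audio.
- Added typical application examples such as ListenHub.
- Strengthened the notes on the security of environment variables and Feishu credentials.
- Content structure and explanations focus more on model principles and usage notes; the functional flow is unchanged.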

v1.0.2

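- Added file path safety restrictions: only paths explicitly provided by the user are accepted, avoiding the risk of reading sensitive files.
- Clarified the Feishu credential flow: when credentials are missing, the user is prompted to set environment variables; the skill no longer tries to read them from local files.
- Runtime dependencies and environment variable requirements are stated in the description, so users can tell required from optional dependencies.
- Refined the pre-send file name confirmation and sensitive-path exclusion flow for safer interaction.
- Refined other notes and prerequisites, improving user guidance and safety.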

v1.0.1

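**Expanded and clarified Any2Speech capabilities; stricter error and Feishu handling.**
- Clearly describes the TTS, long-form speech, podcast, and other advanced capabilities the skill supports, with a unified entry point for better usability.
- Breaks down the capability types and templates of the instruction field in detail, emphasizing controllable synthesis such as Instruct TTS.
- Feishu sending requires the user to provide credentials explicitly; they are no longer read from local files, improving security and transparency.
- Automatic error retry tightened from "unlimited retries" to "at most 1"; if all attempts fail, the user is told directly.
- More complete documentation of environment variables and dependency tools, with refined input conditions and response logic for each step.
- Code and documentation structure optimized to be clearer and easier to use.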
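The tightened retry policy in this release (at most one retry, then report the failure) can be sketched as a small wrapper. The name `call_with_single_retry` is hypothetical, and catching every `Exception` is a simplification; the skill itself may only retry on specific errors.

```python
def call_with_single_retry(operation, max_retries=1):
    """Run operation(); on failure retry up to max_retries times, then give up."""
    last_error = None
    for _attempt in range(max_retries + 1):
        try:
            return operation()
        except Exception as exc:  # simplification: any error triggers a retry
            last_error = exc
    # All attempts failed: surface the failure instead of looping forever.
    raise RuntimeError(
        f"failed after {max_retries + 1} attempts: {last_error}"
    ) from last_error
```

Bounding the loop keeps a persistently failing endpoint from being hit indefinitely, and the final exception gives the agent something concrete to relay to the user.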

v1.0.0

Xiaomi-any2speech v1.0.0
- Initial release.
- Converts arbitrary text, files, audio, or video into up to 10-minute single-speaker or styled speech (WAV output).
- Supports multiple content types, including txt, md, pdf, docx, csv, json, html, and common audio/video formats.
- Offers programmatic style selection (e.g. podcast, debate, news, rap, talk show) with customizable instructions.
- Handles both synchronous and asynchronous API modes with automatic error retry and fallback logic.
- Optional Feishu (Lark) voice message sending when requested by the user.

Metadata

Slug xiaomi-any2speech-beyondtts

Version 1.0.7

License MIT-0

Total installs 0

Current installs 0

Number of versions 8

FAQ

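What is Xiaomi-any2speech?

Speech World Model: more than TTS, this is a large speech model that understands scene, role, and emotion and plans its own delivery. It natively supports long-form text with multiple speakers and Chinese-English bilingual synthesis, accepts uploaded reference audio for voice cloning (Voice Prompt), and ships with built-in creative templates that turn any content into a podcast, audiobook, crosstalk (xiangsheng), rap, radio drama, and so on... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 566 cumulative downloads to date.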

How do I install Xiaomi-any2speech?

Run the command "/install xiaomi-any2speech-beyondtts" in an OpenClaw or Claude Code conversation to install it in one step, with no additional configuration.

Is Xiaomi-any2speech free?

Yes. Xiaomi-any2speech is completely free and released under the MIT-0 license; it can be freely downloaded, installed, and used.

Which platforms does Xiaomi-any2speech support?

Xiaomi-any2speech is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Xiaomi-any2speech?

Developed and maintained by Di Wu (@whiteshirt0429); the current version is v1.0.7.
