Description

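AI voice synthesis expert. Converts copy and scripts into highly realistic speech audio, with support for multiple voices, emotion control, SSML markup, and post-processing. Trigger scenarios: the user says "voice-over", "speech synthesis", "TTS", "narration", "podcast audio", "audiobook", "AI voice-over", "read aloud", or "audio generation", or asks to "read this copy in an XX voice", "generate podcast audio", "turn the article into an audio version", and so on. ...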

README (SKILL.md)

智能配音合成虾 (ai-voice-synthesis-claw)

Name: 智能配音合成虾
Author: tujinsama

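Turn text into a voice with warmth.

Workflow

Step 1: Understand the requirements

Collect the following information (use the defaults when something is not provided):

Text content: the copy or script to be voiced
Voice style: choose a suitable voice from references/voice-style-guide.md
Speed: slow / normal (default) / fast
Emotion: calm / warm / professional / energetic
Output format: mp3 (default) / wav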

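Step 2: Preprocess the text

Process the text before calling TTS:

Split into sentences (on punctuation)
Convert digits to Chinese numerals (100 → 一百)
Annotate polyphonic characters (for example, 重 in 重要, read zhòng rather than chóng)
Add pause markers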
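
The first two preprocessing steps can be sketched as follows. This is a minimal illustration, not the logic of scripts/synthesize-voice.py: the function names are hypothetical, the numeral conversion only handles integers from 0 to 9999, and decimals, dates, phone numbers, polyphonic characters, and pause markers are left out.

```python
import re

DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]  # ones, tens, hundreds, thousands


def int_to_chinese(n: int) -> str:
    """Convert an integer in 0..9999 to Chinese numerals (sketch only)."""
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only handles 0..9999")
    if n == 0:
        return DIGITS[0]
    digits = [int(c) for c in str(n)]
    parts = []
    pending_zero = False
    for i, d in enumerate(digits):
        if d == 0:
            # Remember the gap; emit a single 零 only if a non-zero digit follows.
            pending_zero = True
            continue
        if pending_zero:
            parts.append(DIGITS[0])
            pending_zero = False
        parts.append(DIGITS[d] + UNITS[len(digits) - 1 - i])
    result = "".join(parts)
    # 10..19 are read 十, 十一, ... rather than 一十, 一十一, ...
    return result[1:] if result.startswith("一十") else result


def numbers_to_chinese(text: str) -> str:
    """Replace each run of ASCII digits with Chinese numerals where supported."""
    def repl(match):
        value = int(match.group())
        # Leave numbers outside the supported range untouched.
        return int_to_chinese(value) if value <= 9999 else match.group()
    return re.sub(r"[0-9]+", repl, text)


def split_sentences(text: str) -> list[str]:
    """Split after sentence-ending punctuation, keeping the punctuation."""
    pieces = re.split(r"(?<=[。！？!?；;])", text)
    return [p.strip() for p in pieces if p.strip()]
```

For example, `split_sentences(numbers_to_chinese(text))` yields a list of sentences with digits already spelled out, which can then be passed to the engine one at a time.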

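Step 3: Choose a TTS engine

Pick the first available engine in priority order:

ElevenLabs (recommended): the most natural, supports emotion control, requires ELEVENLABS_API_KEY
OpenAI TTS: stable quality, requires OPENAI_API_KEY
Azure TTS: multilingual support, requires AZURE_SPEECH_KEY + AZURE_SPEECH_REGION
System TTS (fallback): synthesizes directly with the tts tool (no API key required, lower quality)

Check the environment variables to confirm which engines are available: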

echo "ElevenLabs: $ELEVENLABS_API_KEY" && echo "OpenAI: $OPENAI_API_KEY"
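
Because the command above prints the key values themselves, a check that reports only whether each variable is set may be preferable in shared or logged environments. The following is a minimal sketch under that assumption; the engine labels and function names are hypothetical and are not taken from scripts/synthesize-voice.py.

```python
import os

# (engine label, required environment variables), in the priority order listed above.
ENGINES = [
    ("elevenlabs", ["ELEVENLABS_API_KEY"]),
    ("openai", ["OPENAI_API_KEY"]),
    ("azure", ["AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION"]),
]


def availability(env=None) -> dict:
    """Map each engine to True/False without ever exposing a key value."""
    env = os.environ if env is None else env
    return {name: all(env.get(var) for var in required) for name, required in ENGINES}


def pick_engine(env=None) -> str:
    """Return the first engine whose variables are all set, else the system fallback."""
    for name, available in availability(env).items():
        if available:
            return name
    return "system"
```

Calling `availability()` with no argument reads the real environment and returns only booleans, so its output is safe to log; `pick_engine()` then mirrors the priority order described in this step.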

Step 4: Generate SSML (optional, for fine-grained control)

Use references/ssml-guide.md to add SSML markup to the text. Simple cases can skip this step and pass plain text directly.
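
As a rough illustration of what such markup can look like, the sketch below wraps a list of sentences in standard SSML `<speak>`, `<prosody>`, and `<break>` elements. The function name, the 300 ms default pause, and the mapping from this skill's speed names to SSML rate values are assumptions; references/ssml-guide.md may prescribe something different, and each engine supports its own subset of SSML (some also require extra attributes on `<speak>`).

```python
from xml.sax.saxutils import escape

# Assumed mapping from the skill's speed names to SSML prosody rate keywords.
RATE = {"slow": "slow", "normal": "medium", "fast": "fast"}


def to_ssml(sentences, speed="normal", pause_ms=300):
    """Join sentences with a fixed pause and wrap them in a prosody element."""
    pause = f'<break time="{pause_ms}ms"/>'
    # Escape &, < and > so the text cannot break the XML structure.
    body = pause.join(escape(s) for s in sentences)
    return f'<speak><prosody rate="{RATE[speed]}">{body}</prosody></speak>'
```

Escaping the text first matters because scripts often contain characters such as `&` that would otherwise make the SSML invalid.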

Step 5: Call the synthesis script

# Synthesize a single piece of text
python3 scripts/synthesize-voice.py \
  --text "你好，欢迎收听本期节目" \
  --voice warm-female \
  --speed normal \
  --output ./output.mp3

# Synthesize from a file
python3 scripts/synthesize-voice.py \
  --script ./script.txt \
  --voice professional-male \
  --speed fast \
  --output ./output.mp3

# Add background music
python3 scripts/synthesize-voice.py \
  --script ./script.txt \
  --bgm ./bgm/light-jazz.mp3 \
  --bgm-volume 0.1 \
  --output ./output.mp3

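Step 6: Post-processing

Following references/audio-processing-guide.md, the script automatically performs:

Noise reduction
Loudness normalization (-14 LUFS)
Background music mixing (optional)
Format conversion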
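
For readers who want to reproduce these steps by hand, the sketch below builds an ffmpeg command line that chains them. It is an assumption about one possible approach, not a description of what the script does internally: the function name is hypothetical, and the filter choice (`afftdn` for denoising, `amix` for mixing, single-pass `loudnorm` for the -14 LUFS target) is only one of several options. Single-pass `loudnorm` is approximate; a two-pass measurement is generally more accurate.

```python
def build_ffmpeg_cmd(voice, output, bgm=None, bgm_volume=0.1, lufs=-14):
    """Return an ffmpeg argument list: denoise, optionally mix BGM, normalize."""
    loudnorm = f"loudnorm=I={lufs}:TP=-1.5:LRA=11"
    if bgm is None:
        # Voice only: a simple linear filter chain is enough.
        return ["ffmpeg", "-y", "-i", voice, "-af", f"afftdn,{loudnorm}", output]
    # With background music: denoise the voice, attenuate the music,
    # mix the two (ending when the voice ends), then normalize the result.
    graph = (
        f"[0:a]afftdn[voice];"
        f"[1:a]volume={bgm_volume}[bgm];"
        f"[voice][bgm]amix=inputs=2:duration=first,{loudnorm}[out]"
    )
    return ["ffmpeg", "-y", "-i", voice, "-i", bgm,
            "-filter_complex", graph, "-map", "[out]", output]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Format conversion needs no extra step here, since ffmpeg infers the output format from the file extension.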

Step 7: Deliver

Send the generated audio file to the user:

Synthesis complete! Here is your voice-over file.
MEDIA:./output.mp3

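Voice quick reference

Scenario	Recommended voice
Educational explainers	professional-male / professional-female
Emotional stories	warm-female
Commercial ads	magnetic-male
Light entertainment	young-energetic

See references/voice-style-guide.md for the full voice library.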

Environment dependencies

pip install elevenlabs openai pydub requests
brew install ffmpeg  # macOS

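Notes

Keep each synthesis run to no more than 10 minutes of audio
Voice cloning requires at least 1 minute of clean sample audio
Cloning someone else's voice requires their authorization
Without an API key, the skill falls back to the system tts tool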

Usage Guidance

This skill appears to implement TTS via ElevenLabs and OpenAI, but there are a few red flags you should consider before installing or supplying API keys:
- Metadata vs. reality: The registry metadata lists no required environment variables, but the included script requires ELEVENLABS_API_KEY and OPENAI_API_KEY. Confirm with the author or expect to provide those keys.
- Azure mismatch: SKILL.md mentions Azure credentials, but the script does not implement Azure TTS. Ask the maintainer for clarification if you need Azure support.
- Secret exposure: SKILL.md shows examples that echo environment variables (e.g., echo "ElevenLabs: $ELEVENLABS_API_KEY"). Avoid executing such commands in shared or logged environments since they may expose your API keys in logs. Instead, verify keys privately or use secure tooling to manage secrets.
- Dependency installation: The instructions tell you to pip install packages and brew install ffmpeg. Only install these in a trusted/isolated environment (virtualenv/container) to limit risk.
- Voice cloning / copyright: The skill notes voice cloning requires authorization. Do not pass audio samples or use someone else's voice without consent.
Suggested actions before use: inspect the code (you already have synthesize-voice.py), run it in an isolated environment, provide API keys with least-privilege credentials or test keys, and request the publisher update the skill metadata to list required env vars and clarify Azure support.

Capability Analysis

Type: OpenClaw Skill
Name: ai-voice-synthesis-claw
Version: 1.0.0

The skill bundle contains an instruction in SKILL.md (Step 3) that directs the AI agent to execute a shell command to 'echo' sensitive environment variables (ELEVENLABS_API_KEY and OPENAI_API_KEY). This behavior risks exposing private API credentials in the agent's output logs or to the end-user. While the Python script 'scripts/synthesize-voice.py' appears to be a legitimate implementation of voice synthesis using ElevenLabs and OpenAI APIs, the explicit instruction to print secrets is a high-risk vulnerability often used for credential harvesting.

Capability Tags

requires-sensitive-credentials

Capability Assessment

⚠ Purpose & Capability

The skill's stated purpose is text→TTS using ElevenLabs/OpenAI/Azure/system TTS, which matches the included synthesize-voice.py for ElevenLabs and OpenAI; however the registry metadata declares no required env vars or credentials while both SKILL.md and the script expect ELEVENLABS_API_KEY and OPENAI_API_KEY (SKILL.md also lists AZURE_SPEECH_KEY and region but the script does not implement Azure). This mismatch between claimed requirements and actual code is incoherent.

ℹ Instruction Scope

SKILL.md provides a clear TTS workflow and example commands that invoke scripts/synthesize-voice.py and post-processing. However the docs demonstrate running echo "ElevenLabs: $ELEVENLABS_API_KEY" which would print API keys to stdout/logs (a potential secret-leak risk). The instructions ask the agent to read script files and write output audio files (expected), and there are no instructions to exfiltrate data to unexpected endpoints. The guide suggests installing dependencies via pip/brew but there is no install spec in the metadata.

✓ Install Mechanism

There is no automated install spec (instruction-only plus a Python script). That is the lower-risk model because nothing is automatically downloaded or executed during install. The SKILL.md suggests pip/brew commands for dependencies, which is expected for a Python-based TTS script but will run arbitrary package installs if followed by a user.

⚠ Credentials

The package metadata declares no required environment variables, but the script reads ELEVENLABS_API_KEY and OPENAI_API_KEY from the environment and SKILL.md also references AZURE_SPEECH_* keys. Requiring API keys for the listed TTS services is reasonable, but the omission from metadata is inconsistent and the SKILL.md example of echoing env vars risks exposing secrets. There are no other unnecessary credentials requested.

✓ Persistence & Privilege

The skill does not request privileged persistence (always:false) and does not modify other skills or system-wide configs. It only runs as a normal user CLI script and writes generated audio files to the working directory.

Version History

v1.0.0

Initial release: multi-engine voice synthesis with ElevenLabs/OpenAI TTS, including a voice library, SSML conventions, and a post-processing guide

Metadata

Slug ai-voice-synthesis-claw

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is 智能配音合成虾?

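AI voice synthesis expert. Converts copy and scripts into highly realistic speech audio, with support for multiple voices, emotion control, SSML markup, and post-processing. Trigger scenarios: the user says "voice-over", "speech synthesis", "TTS", "narration", "podcast audio", "audiobook", "AI voice-over", "read aloud", or "audio generation", or asks to "read this copy in an XX voice", "generate podcast audio", "turn the article into an audio version", and so on. ... It is an AI Agent Skill for Claude Code / OpenClaw, with 113 downloads so far.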

How do I install 智能配音合成虾?

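Run "/install ai-voice-synthesis-claw" in the OpenClaw or Claude Code chat to install it in one step. To use the ElevenLabs or OpenAI engines, you still need to install the dependencies and set the API keys listed in the README; without an API key the skill falls back to the system tts tool.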

Is 智能配音合成虾 free?

Yes, 智能配音合成虾 is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does 智能配音合成虾 support?

智能配音合成虾 is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created 智能配音合成虾?

It is built and maintained by Ricky (@tujinsama); the current version is v1.0.0.
