/install multilingual-video-dubbing-text-to-speech
SKILL: TTS Audio Mastering
This skill focuses on producing clean, consistent, and delivery-ready TTS audio for video tasks. It covers speech cleanup, loudness normalization, segment boundaries, and export specs.
1. TTS Engine & Output Basics
Choose a TTS engine based on deployment constraints and quality needs:
- Neural offline (e.g., Kokoro): stable, high quality, no network dependency.
- Cloud TTS (e.g., Edge-TTS / OpenAI TTS): convenient and often more natural-sounding, but network-dependent.
- Formant TTS (e.g., espeak-ng): for prototyping only; often less natural.
Key rule: Always confirm the native sample rate of the generated audio before resampling for video delivery.
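One way to check the native rate before resampling is to read it from the file header. The sketch below assumes the engine wrote a PCM WAV file; the function names and the 48000 Hz delivery rate are illustrative assumptions, not part of this skill.

```python
import wave


def native_sample_rate(path: str) -> int:
    """Return the sample rate (Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def needs_resample(path: str, delivery_rate: int = 48000) -> bool:
    """True when the file's native rate differs from the delivery rate.

    delivery_rate is an assumed default; use whatever the task spec states.
    """
    return native_sample_rate(path) != delivery_rate
```

Many neural TTS engines emit 22050 Hz or 24000 Hz audio, so a check like this usually reports that resampling is required for video delivery.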
2. Speech Cleanup (Per Segment)
Apply lightweight processing to avoid common artifacts:
- Rumble/DC removal: high-pass filter around 20 Hz
- Harshness control: optional low-pass around 16 kHz (helps remove digital fizz)
- Click/pop prevention: short fades at boundaries (e.g., 50 ms fade-in and fade-out)
Recommended FFmpeg pattern: add the filters in a single chain, and keep that chain consistent across segments.
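The cleanup steps above can be sketched as a single FFmpeg audio filter chain built from the highpass, lowpass, and afade filters. The helper names are hypothetical and the defaults simply mirror the values listed in this section. The fade-out start needs the segment duration, which the caller is assumed to know already.

```python
from typing import List, Optional


def cleanup_filter(duration_s: float, fade_s: float = 0.05,
                   highpass_hz: int = 20,
                   lowpass_hz: Optional[int] = 16000) -> str:
    """Build one -af chain: high-pass, optional low-pass, boundary fades."""
    # Never let the two fades overlap on very short segments.
    fade_s = min(fade_s, duration_s / 2)
    parts = [f"highpass=f={highpass_hz}"]
    if lowpass_hz is not None:
        parts.append(f"lowpass=f={lowpass_hz}")
    fade_out_start = max(duration_s - fade_s, 0.0)
    parts.append(f"afade=t=in:st=0:d={fade_s:.3f}")
    parts.append(f"afade=t=out:st={fade_out_start:.3f}:d={fade_s:.3f}")
    return ",".join(parts)


def cleanup_command(src: str, dst: str, duration_s: float) -> List[str]:
    """Return an argv list suitable for subprocess.run (not executed here)."""
    return ["ffmpeg", "-y", "-i", src, "-af", cleanup_filter(duration_s), dst]
```

For a 2-second segment, cleanup_filter(2.0) yields highpass=f=20,lowpass=f=16000,afade=t=in:st=0:d=0.050,afade=t=out:st=1.950:d=0.050. Generating the string in one place makes it easier to apply the same chain to every segment.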
3. Loudness Normalization
Target loudness depends on the benchmark/task spec. A common target, measured per ITU-R BS.1770, is:
- Integrated loudness: -23 LUFS
- True peak: around -1.5 dBTP
- LRA: around 11 LU (optional)
Recommended workflow:
- Measure loudness using FFmpeg's ebur128 filter (or an equivalent meter).
- Apply normalization (e.g., loudnorm) as the final step after cleanup and timing edits.
- If you adjust tempo/duration after normalization, re-normalize.
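A minimal sketch of the measure-then-normalize workflow, expressed as FFmpeg argument lists. The function names and the 48000 Hz output rate are assumptions; the targets are the ones listed above. This is single-pass loudnorm; a two-pass run that feeds measured values back in is generally more accurate but is not shown here.

```python
from typing import List


def measure_command(src: str) -> List[str]:
    """Run the ebur128 meter; the loudness summary is printed to stderr."""
    return ["ffmpeg", "-hide_banner", "-nostats", "-i", src,
            "-af", "ebur128=peak=true", "-f", "null", "-"]


def loudnorm_filter(target_i: float = -23.0, true_peak: float = -1.5,
                    lra: float = 11.0) -> str:
    """Build the loudnorm filter string for the given targets."""
    return f"loudnorm=I={target_i}:TP={true_peak}:LRA={lra}"


def normalize_command(src: str, dst: str, sample_rate: int = 48000) -> List[str]:
    """Normalize as the last step; -ar pins the output rate explicitly,
    because loudnorm upsamples internally for true-peak detection."""
    return ["ffmpeg", "-y", "-i", src, "-af", loudnorm_filter(),
            "-ar", str(sample_rate), dst]
```

Each list can be passed to subprocess.run. Running measure_command again on the normalized file is a cheap way to confirm that the integrated loudness landed near the target.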
4. Timing & Segment Boundary Handling
When stitching segment-level TTS into a full track:
- Match each segment to its target window as closely as possible.
- If a segment is shorter than its window, pad with silence.
- If a segment is longer, use gentle duration control (small speed change) or truncate carefully.
- Always apply boundary fades after padding/trimming to avoid clicks.
Sync guideline: keep end-to-end drift small (e.g., <= 0.2 s) unless the task states otherwise.
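The fitting rules and the drift check above can be sketched as a small planner. Everything named here is hypothetical, and the 1.15x speed-up cap is an assumed reading of "gentle"; adjust it to the task. The filter strings use FFmpeg's apad, atempo, and atrim filters, and boundary fades would still be applied afterwards.

```python
from dataclasses import dataclass
from typing import Sequence

MAX_SPEEDUP = 1.15  # assumption: largest tempo change still considered gentle


@dataclass
class FitPlan:
    action: str    # "pad", "speed", or "speed+trim"
    tempo: float   # atempo factor (1.0 means unchanged)
    pad_s: float   # silence to append, in seconds
    trim_s: float  # audio cut after the speed change, in seconds


def plan_fit(segment_s: float, window_s: float,
             max_speedup: float = MAX_SPEEDUP) -> FitPlan:
    """Decide how to fit one TTS segment into its target window."""
    if segment_s <= window_s:
        return FitPlan("pad", 1.0, window_s - segment_s, 0.0)
    needed = segment_s / window_s
    if needed <= max_speedup:
        return FitPlan("speed", needed, 0.0, 0.0)
    # Speed up as far as allowed, then truncate the remainder.
    return FitPlan("speed+trim", max_speedup, 0.0,
                   segment_s / max_speedup - window_s)


def fit_filter(plan: FitPlan, window_s: float) -> str:
    """Translate a plan into an FFmpeg -af string (fades not included)."""
    if plan.action == "pad":
        return f"apad=whole_dur={window_s:.3f}"
    if plan.action == "speed":
        return f"atempo={plan.tempo:.4f}"
    return f"atempo={plan.tempo:.4f},atrim=duration={window_s:.3f}"


def end_drift(actual_s: Sequence[float], windows_s: Sequence[float]) -> float:
    """Absolute end-to-end drift between stitched audio and target windows."""
    return abs(sum(actual_s) - sum(windows_s))
```

A caller might log any segment whose plan is "speed+trim", since truncation can cut off speech, and fail the run when end_drift exceeds the 0.2 s guideline.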
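- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install multilingual-video-dubbing-text-to-speech
- Once installation finishes, invoke the Skill by name or trigger it with /multilingual-video-dubbing-text-to-speech
- Provide the required inputs described in the Skill's parameter notes to get structured output.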
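What is text-to-speech?
Practical mastering steps for TTS audio: cleanup, loudness normalization, alignment, and delivery specs. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 70 downloads to date.
How do I install text-to-speech?
Run the command "/install multilingual-video-dubbing-text-to-speech" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.
Is text-to-speech free?
Yes. text-to-speech is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.
Which platforms does text-to-speech support?
text-to-speech is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who develops text-to-speech?
It is developed and maintained by lnj22 (@lnj22); the current version is v0.1.0.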