/install multilingual-video-dubbing-text-to-speech
SKILL: TTS Audio Mastering
This skill focuses on producing clean, consistent, and delivery-ready TTS audio for video tasks. It covers speech cleanup, loudness normalization, segment boundaries, and export specs.
1. TTS Engine & Output Basics
Choose a TTS engine based on deployment constraints and quality needs:
- Neural offline (e.g., Kokoro): stable, high quality, no network dependency.
- Cloud TTS (e.g., Edge-TTS / OpenAI TTS): convenient and often more natural, but network-dependent.
- Formant TTS (e.g., espeak-ng): for prototyping only; often less natural.
Key rule: Always confirm the native sample rate of the generated audio before resampling for video delivery.
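A minimal sketch of the "confirm the native sample rate" rule, assuming the TTS engine writes PCM WAV files. Python's standard wave module only reads uncompressed WAV, so compressed output such as MP3 would need ffprobe instead. The 48 kHz DELIVERY_RATE_HZ constant and both helper names are hypothetical choices for illustration, not part of the skill.

```python
import wave

# Assumed delivery rate for illustration; use whatever the task spec requires.
DELIVERY_RATE_HZ = 48_000


def native_sample_rate(path: str) -> int:
    """Read the sample rate from a PCM WAV header (no decoding or resampling)."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def needs_resample(path: str, delivery_rate_hz: int = DELIVERY_RATE_HZ) -> bool:
    """True when the engine's native rate differs from the delivery rate."""
    return native_sample_rate(path) != delivery_rate_hz
```

Checking the header first means you resample exactly once, from the true native rate, instead of stacking conversions.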
2. Speech Cleanup (Per Segment)
Apply lightweight processing to avoid common artifacts:
- Rumble/DC removal: high-pass filter around 20 Hz
- Harshness control: optional low-pass around 16 kHz (helps remove digital fizz)
- Click/pop prevention: short fades at boundaries (e.g., 50 ms fade-in and fade-out)
Recommended FFmpeg pattern:
- Apply all filters in a single chain, and keep the chain identical across segments.
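The cleanup steps above can be expressed as one FFmpeg -af filter string. This is a sketch, not the skill's own command: the helper names are hypothetical, the defaults mirror the listed values (20 Hz high-pass, optional 16 kHz low-pass, 50 ms fades), and it assumes each segment's duration is already known (for example from ffprobe), because afade needs an explicit start time for the fade-out.

```python
from typing import List, Optional


def cleanup_filter_chain(duration_s: float,
                         fade_s: float = 0.05,
                         highpass_hz: int = 20,
                         lowpass_hz: Optional[int] = 16_000) -> str:
    """One -af chain: high-pass, optional low-pass, then boundary fades."""
    if duration_s <= 2 * fade_s:
        raise ValueError("segment is too short for the requested fades")
    filters = [f"highpass=f={highpass_hz}"]          # rumble / DC removal
    if lowpass_hz is not None:
        filters.append(f"lowpass=f={lowpass_hz}")    # optional fizz control
    # Fade-in starts at 0; fade-out must start fade_s before the known end.
    filters.append(f"afade=t=in:st=0:d={fade_s}")
    filters.append(f"afade=t=out:st={duration_s - fade_s:.3f}:d={fade_s}")
    return ",".join(filters)


def cleanup_command(src: str, dst: str, duration_s: float) -> List[str]:
    """ffmpeg argument list (for subprocess.run) using the shared chain."""
    return ["ffmpeg", "-y", "-i", src,
            "-af", cleanup_filter_chain(duration_s), dst]
```

Passing the list to subprocess.run(cmd, check=True) applies it. Building the string in one function is what keeps the chain identical for every segment.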
3. Loudness Normalization
Target loudness depends on the benchmark/task spec. A common target, measured with the ITU-R BS.1770 algorithm (as in EBU R128), is:
- Integrated loudness: -23 LUFS
- True peak: around -1.5 dBTP
- Loudness range (LRA): around 11 LU (optional)
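Before normalizing, a quick calculation shows whether plain gain can reach the target without exceeding the true-peak limit: a constant gain of G dB moves the true peak by exactly G dB and, to a close approximation, moves integrated loudness by G dB as well. The function below is a hypothetical helper using the targets listed above; if peak_ok is False, the audio needs limiting or dynamic normalization rather than gain alone.

```python
def linear_gain_plan(measured_lufs: float,
                     measured_tp_dbtp: float,
                     target_lufs: float = -23.0,
                     tp_ceiling_dbtp: float = -1.5) -> dict:
    """Gain needed to hit the loudness target, and whether peaks stay legal."""
    gain_db = target_lufs - measured_lufs
    # A constant gain shifts the true peak by exactly gain_db.
    predicted_tp = measured_tp_dbtp + gain_db
    return {
        "gain_db": gain_db,
        "predicted_tp_dbtp": predicted_tp,
        "peak_ok": predicted_tp <= tp_ceiling_dbtp,
    }
```

For example, linear_gain_plan(-30.0, -4.0) asks for +7.0 dB of gain, which would put peaks at +3.0 dBTP, so peak_ok is False.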
Recommended workflow:
- Measure loudness using the FFmpeg ebur128 filter (or an equivalent meter).
- Apply normalization (e.g., loudnorm) as the final step, after cleanup and timing edits.
- If you adjust tempo/duration after normalization, re-normalize.
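One common way to implement this measure-then-normalize workflow with FFmpeg is two-pass loudnorm: a first pass with print_format=json reports the input statistics on stderr, and a second pass feeds them back through the measured_* options so loudnorm can apply a more accurate (and, when possible, linear) correction. The sketch assumes the -23 LUFS / -1.5 dBTP / 11 LU targets from above and a 48 kHz delivery rate; the helper names are hypothetical, while the loudnorm option and JSON key names are FFmpeg's.

```python
import json
import re
from typing import Dict, List

# Targets from the loudness section above; adjust to the task spec.
TARGET_I, TARGET_TP, TARGET_LRA = -23.0, -1.5, 11.0


def first_pass_command(src: str) -> List[str]:
    """Measure only: loudnorm prints JSON stats to stderr, audio is discarded."""
    af = f"loudnorm=I={TARGET_I}:TP={TARGET_TP}:LRA={TARGET_LRA}:print_format=json"
    return ["ffmpeg", "-hide_banner", "-i", src, "-af", af, "-f", "null", "-"]


def parse_loudnorm_stats(stderr_text: str) -> Dict[str, str]:
    """Return the last {...} block in stderr, which is loudnorm's JSON report."""
    blocks = re.findall(r"\{[^{}]*\}", stderr_text)
    if not blocks:
        raise ValueError("no loudnorm JSON found; was print_format=json set?")
    return json.loads(blocks[-1])


def second_pass_command(src: str, dst: str, stats: Dict[str, str],
                        out_rate_hz: int = 48_000) -> List[str]:
    """Normalize using the first-pass measurements (linear when possible)."""
    af = (f"loudnorm=I={TARGET_I}:TP={TARGET_TP}:LRA={TARGET_LRA}"
          f":measured_I={stats['input_i']}:measured_TP={stats['input_tp']}"
          f":measured_LRA={stats['input_lra']}"
          f":measured_thresh={stats['input_thresh']}"
          f":offset={stats['target_offset']}:linear=true")
    # loudnorm can output at 192 kHz, so pin the delivery rate explicitly.
    return ["ffmpeg", "-y", "-i", src, "-af", af, "-ar", str(out_rate_hz), dst]
```

Run the first command with subprocess.run(..., capture_output=True, text=True), pass its stderr to parse_loudnorm_stats, then run the second command. Setting -ar on the second pass matters because loudnorm works at a high internal sample rate.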
4. Timing & Segment Boundary Handling
When stitching segment-level TTS into a full track:
- Match each segment to its target window as closely as possible.
- If a segment is shorter than its window, pad with silence.
- If a segment is longer, use gentle duration control (small speed change) or truncate carefully.
- Always apply boundary fades after padding/trimming to avoid clicks.
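The pad / speed-up / truncate rules above can be sketched as a small planning function. The 10% max_speedup cap is an assumed value for "gentle" (the skill does not specify one), and the returned filter strings use FFmpeg's apad, atempo and atrim filters. Boundary fades from the cleanup step would then be applied after this stage, as the text requires.

```python
def plan_segment_fit(segment_s: float, window_s: float,
                     max_speedup: float = 1.10) -> dict:
    """Pad, gently speed up, or truncate one segment to fit its window.

    max_speedup is an assumed cap on how much faster speech may play.
    """
    if segment_s <= window_s:
        pad_s = window_s - segment_s
        return {"action": "pad", "pad_s": pad_s,
                "filter": f"apad=pad_dur={pad_s:.3f}"}
    ratio = segment_s / window_s  # >1 means playback must be faster
    if ratio <= max_speedup:
        # atempo changes speed without changing pitch.
        return {"action": "tempo", "atempo": ratio,
                "filter": f"atempo={ratio:.4f}"}
    # Too long for a gentle change: cut at the window end.
    return {"action": "truncate", "trim_to_s": window_s,
            "filter": f"atrim=end={window_s:.3f}"}
```

For a 2.6 s segment in a 2.5 s window this returns a 1.04x tempo change; a 3.5 s segment would need 1.4x, beyond the assumed cap, so it is truncated instead.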
Sync guideline: keep end-to-end drift small (e.g., <= 0.2 s) unless the task states otherwise.
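To check the drift guideline after stitching, compare where each segment actually starts in the assembled track with its target start. A minimal sketch, assuming both lists of start times (in seconds) are already available; the function name is hypothetical and the 0.2 s default simply mirrors the example tolerance above.

```python
from typing import Dict, List


def sync_report(target_starts: List[float], actual_starts: List[float],
                tolerance_s: float = 0.2) -> Dict[str, object]:
    """Per-segment drift (actual - target) and whether the worst case is in spec."""
    if len(target_starts) != len(actual_starts):
        raise ValueError("need one actual start per target start")
    drifts = [a - t for t, a in zip(target_starts, actual_starts)]
    # Largest drift by magnitude, keeping its sign (early vs late).
    worst = max(drifts, key=abs, default=0.0)
    return {"drifts": drifts, "worst_s": worst,
            "within_tolerance": abs(worst) <= tolerance_s}
```

Keeping the sign of the worst drift shows whether the track runs late (positive) or early (negative), which tells you whether to tighten tempo or add padding.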
Installation and usage:
- Make sure OpenClaw is installed (local or Docker).
- Run the install command in chat: /install multilingual-video-dubbing-text-to-speech
- After installation, invoke the skill by name or with /multilingual-video-dubbing-text-to-speech
- Provide the required inputs as defined in the skill's parameter spec to get structured output.
What is text-to-speech?
Practical mastering steps for TTS audio: cleanup, loudness normalization, alignment, and delivery specs. It is an AI Agent Skill for Claude Code / OpenClaw, with 70 downloads so far.
How do I install text-to-speech?
Run "/install multilingual-video-dubbing-text-to-speech" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is text-to-speech free?
Yes, text-to-speech is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does text-to-speech support?
text-to-speech is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created text-to-speech?
It is built and maintained by lnj22 (@lnj22); the current version is v0.1.0.