lnj22

text-to-speech

by lnj22 · GitHub ↗ · v0.1.0 · MIT-0
cross-platform ✓ Security Clean
Downloads: 70 · Stars: 0 · Active Installs: 0 · Versions: 1
Install in OpenClaw
/install multilingual-video-dubbing-text-to-speech
Description
Practical mastering steps for TTS audio: cleanup, loudness normalization, alignment, and delivery specs.
README (SKILL.md)

SKILL: TTS Audio Mastering

This skill focuses on producing clean, consistent, and delivery-ready TTS audio for video tasks. It covers speech cleanup, loudness normalization, segment boundaries, and export specs.

1. TTS Engine & Output Basics

Choose a TTS engine based on deployment constraints and quality needs:

  • Neural offline (e.g., Kokoro): stable, high quality, no network dependency.
  • Cloud TTS (e.g., Edge-TTS / OpenAI TTS): convenient and often more natural-sounding, but network-dependent.
  • Formant TTS (e.g., espeak-ng): for prototyping only; often less natural.

Key rule: Always confirm the native sample rate of the generated audio before resampling for video delivery.
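
A minimal sketch of the sample-rate check, assuming the TTS engine writes PCM WAV files. The helper names and the 48000 Hz delivery rate are illustrative assumptions, not part of the skill; use the rate your task spec requires.

```python
import wave


def native_sample_rate(path: str) -> int:
    """Return the sample rate (Hz) recorded in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def needs_resample(path: str, delivery_rate: int = 48000) -> bool:
    """True when the file's native rate differs from the delivery rate.

    delivery_rate defaults to 48000 Hz as an assumption; it is not a
    value stated by the skill.
    """
    return native_sample_rate(path) != delivery_rate
```

Checking the header first avoids resampling audio that is already at the delivery rate, and makes an unexpected native rate visible before it is silently converted.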


2. Speech Cleanup (Per Segment)

Apply lightweight processing to avoid common artifacts:

  • Rumble/DC removal: high-pass filter around 20 Hz
  • Harshness control: optional low-pass around 16 kHz (helps remove digital fizz)
  • Click/pop prevention: short fades at boundaries (e.g., 50 ms fade-in and fade-out)

Recommended FFmpeg pattern:

  • Add the filters in a single chain (one -af argument), and keep them consistent across segments.
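
One way to build such a chain is sketched below, using FFmpeg's highpass, lowpass, and afade filters with the values from the list above. The function names and the file paths passed to them are hypothetical. The fade-out start is derived from the segment duration, which the caller is assumed to know already.

```python
from typing import List, Optional


def cleanup_filter(duration_s: float,
                   fade_s: float = 0.05,
                   highpass_hz: int = 20,
                   lowpass_hz: Optional[int] = 16000) -> str:
    """Build one -af chain: high-pass, optional low-pass, boundary fades."""
    # Keep each fade no longer than half the segment so they cannot overlap.
    fade_s = min(fade_s, duration_s / 2)
    parts = [f"highpass=f={highpass_hz}"]
    if lowpass_hz is not None:
        parts.append(f"lowpass=f={lowpass_hz}")
    fade_out_start = max(duration_s - fade_s, 0.0)
    parts.append(f"afade=t=in:st=0:d={fade_s:.3f}")
    parts.append(f"afade=t=out:st={fade_out_start:.3f}:d={fade_s:.3f}")
    return ",".join(parts)


def cleanup_command(src: str, dst: str, duration_s: float) -> List[str]:
    """Return an FFmpeg argument list; run it with subprocess.run(cmd, check=True)."""
    return ["ffmpeg", "-y", "-i", src, "-af", cleanup_filter(duration_s), dst]
```

For a 2-second segment, `cleanup_filter(2.0)` yields `highpass=f=20,lowpass=f=16000,afade=t=in:st=0:d=0.050,afade=t=out:st=1.950:d=0.050`. Generating the string from one function is a simple way to keep the chain identical across segments.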

3. Loudness Normalization

Target loudness depends on the benchmark/task spec. Loudness is measured per ITU-R BS.1770; a common set of targets (the -23 LUFS figure is the EBU R 128 broadcast level) is:

  • Integrated loudness: -23 LUFS
  • True peak: around -1.5 dBTP
  • LRA: around 11 LU (optional)

Recommended workflow:

  1. Measure loudness using FFmpeg's ebur128 filter (or an equivalent meter).
  2. Apply normalization (e.g., loudnorm) as the final step after cleanup and timing edits.
  3. If you adjust tempo/duration after normalization, re-normalize.
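
The workflow above could be sketched as two FFmpeg argument lists: one that only measures, and one that applies single-pass loudnorm with the targets listed earlier. Function names and file paths are hypothetical, and the explicit `-ar 48000` output rate is an assumption. It is set because loudnorm may change the output sample rate if none is given.

```python
from typing import List

# Targets taken from the list above; adjust to the task spec.
TARGET_I = -23.0    # integrated loudness, LUFS
TARGET_TP = -1.5    # true peak, dBTP
TARGET_LRA = 11.0   # loudness range, LU


def measure_command(src: str) -> List[str]:
    """Measure only: ebur128 prints its summary to stderr, no file is written."""
    return ["ffmpeg", "-hide_banner", "-nostats", "-i", src,
            "-af", "ebur128=peak=true", "-f", "null", "-"]


def loudnorm_filter(i: float = TARGET_I, tp: float = TARGET_TP,
                    lra: float = TARGET_LRA) -> str:
    """Build the loudnorm filter string for the given targets."""
    return f"loudnorm=I={i}:TP={tp}:LRA={lra}"


def normalize_command(src: str, dst: str, rate_hz: int = 48000) -> List[str]:
    """Single-pass normalization; rate_hz is an assumed delivery rate."""
    return ["ffmpeg", "-y", "-i", src, "-af", loudnorm_filter(),
            "-ar", str(rate_hz), dst]
```

Running `measure_command` on the normalized output is a cheap way to confirm the result landed near the target before delivery.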

4. Timing & Segment Boundary Handling

When stitching segment-level TTS into a full track:

  • Match each segment to its target window as closely as possible.
  • If a segment is shorter than its window, pad with silence.
  • If a segment is longer, use gentle duration control (small speed change) or truncate carefully.
  • Always apply boundary fades after padding/trimming to avoid clicks.
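
The decision logic in the list above can be sketched as a small planner that returns an FFmpeg filter for each case, using the apad, atempo, and atrim filters. The `FitPlan` type, the 1.15 speed-up cap, and the 10 ms tolerance are illustrative assumptions rather than values from the skill.

```python
from dataclasses import dataclass


@dataclass
class FitPlan:
    action: str        # "keep", "pad", "speed", or "trim"
    audio_filter: str  # FFmpeg audio filter string to apply


def plan_fit(duration_s: float, window_s: float,
             max_speedup: float = 1.15, tol_s: float = 0.01) -> FitPlan:
    """Choose how to fit one TTS segment into its target window."""
    if abs(duration_s - window_s) <= tol_s:
        return FitPlan("keep", "anull")  # close enough, pass through
    if duration_s < window_s:
        pad = window_s - duration_s
        return FitPlan("pad", f"apad=pad_dur={pad:.3f}")
    tempo = duration_s / window_s
    if tempo <= max_speedup:
        return FitPlan("speed", f"atempo={tempo:.4f}")
    # Too long even at the capped speed-up: speed up, then cut at the window end.
    return FitPlan("trim", f"atempo={max_speedup:.4f},atrim=end={window_s:.3f}")
```

For example, `plan_fit(1.5, 2.0)` pads by 0.5 s, while `plan_fit(3.0, 2.0)` falls through to the trim case. Boundary fades still need to be applied after whichever filter is chosen.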

Sync guideline: keep end-to-end drift small (e.g., <= 0.2 s) unless the task states otherwise.
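
A rough way to check this guideline is sketched below, assuming segments are concatenated back to back so that per-segment timing errors accumulate. The function names are hypothetical, and the 0.2 s limit is the example value from the guideline.

```python
from typing import List, Tuple


def end_to_end_drift(target_windows: List[Tuple[float, float]],
                     fitted_durations: List[float]) -> float:
    """Return stitched length minus target length, in seconds.

    target_windows holds (start_s, end_s) pairs; fitted_durations holds the
    actual duration of each segment after padding, speed change, or trimming.
    """
    target_total = sum(end - start for start, end in target_windows)
    actual_total = sum(fitted_durations)
    return actual_total - target_total


def within_sync(drift_s: float, limit_s: float = 0.2) -> bool:
    """True when the absolute drift is inside the allowed limit."""
    return abs(drift_s) <= limit_s
```

A positive drift means the stitched track runs long. If segments are instead placed at absolute timestamps, drift does not accumulate this way and a per-segment check is more appropriate.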

Usage Guidance
This is a documentation-only skill describing best practices for TTS audio mastering. Before installing or using it, verify that your environment has the tools the guide references (e.g., FFmpeg with ebur128/loudnorm support, any chosen TTS engine). If you plan to use cloud TTS, be prepared to supply API keys via your normal configuration — the skill does not request or store credentials. Also confirm any licensing or usage limits of the TTS engine you pick. If you need the skill to be executable end-to-end, ask the author to list required binaries and sample FFmpeg commands explicitly so the agent won't fail due to missing tools.
Capability Analysis
Type: OpenClaw Skill
Name: multilingual-video-dubbing-text-to-speech
Version: 0.1.0
The skill bundle contains purely instructional documentation (SKILL.md) regarding audio mastering workflows for Text-to-Speech (TTS) tasks. It provides technical guidelines for using FFmpeg filters, loudness normalization (LUFS), and timing adjustments without any executable code, network requests, or suspicious instructions.
Capability Assessment
Purpose & Capability
The name and description match the SKILL.md content: both focus on cleanup, loudness normalization, timing, and delivery for TTS audio. The only discrepancy is that the instructions assume use of external tools (FFmpeg, loudness meters, various TTS engines) but the skill's metadata declares no required binaries or environment variables — this is a documentation omission rather than functional mismatch.
Instruction Scope
The instructions stay on-topic: they describe audio filters, loudness targets, segment timing, and workflow steps. They do not direct the agent to read unrelated files, exfiltrate data, or perform system-wide configuration changes. They reference external TTS engines and FFmpeg usage but do not include open-ended or vague commands that would grant broad discretion.
Install Mechanism
This is an instruction-only skill with no install spec and no code files, so there is no installation footprint or network downloads to evaluate.
Credentials
The skill requests no environment variables or credentials, which is appropriate for a standalone guidance document. It mentions cloud TTS providers (e.g., OpenAI TTS, Edge-TTS) which, if used, would require credentials; the SKILL.md does not instruct how to supply or store those credentials. Users should expect to provide any needed API keys via their normal environment/config if they use cloud services.
Persistence & Privilege
The skill does not request persistent presence (always is false) and does not ask to modify system or other skills' configuration. Autonomous invocation is allowed by default on the platform but this skill's scope and lack of credentials mitigate that concern.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install multilingual-video-dubbing-text-to-speech
  3. After installation, invoke the skill by name or use /multilingual-video-dubbing-text-to-speech
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.1.0
Bulk publish from all-task-skills-dedup
Metadata
Slug multilingual-video-dubbing-text-to-speech
Version 0.1.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is text-to-speech?

text-to-speech is an AI Agent Skill for Claude Code / OpenClaw that documents practical mastering steps for TTS audio: cleanup, loudness normalization, alignment, and delivery specs. It has 70 downloads so far.

How do I install text-to-speech?

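Run "/install multilingual-video-dubbing-text-to-speech" in the OpenClaw or Claude Code chat to install it in one step. The skill itself needs no setup, but the tools it references (e.g., FFmpeg and your chosen TTS engine) must be available separately.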

Is text-to-speech free?

Yes, text-to-speech is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does text-to-speech support?

text-to-speech is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created text-to-speech?

It is built and maintained by lnj22 (@lnj22); the current version is v0.1.0.
