
text-to-speech

Name: text-to-speech
Author: lnj22

by lnj22 · GitHub ↗ · v0.1.0 · MIT-0

cross-platform ✓ Security Clean

Install in OpenClaw

/install multilingual-video-dubbing-text-to-speech

Description

Practical mastering steps for TTS audio: cleanup, loudness normalization, alignment, and delivery specs.

README (SKILL.md)

SKILL: TTS Audio Mastering

This skill focuses on producing clean, consistent, and delivery-ready TTS audio for video tasks. It covers speech cleanup, loudness normalization, segment boundaries, and export specs.

1. TTS Engine & Output Basics

Choose a TTS engine based on deployment constraints and quality needs:

Neural offline (e.g., Kokoro): stable, high quality, no network dependency.
Cloud TTS (e.g., Edge-TTS / OpenAI TTS): convenient, higher naturalness but network-dependent.
Formant TTS (e.g., espeak-ng): for prototyping only; often less natural.

Key rule: Always confirm the native sample rate of the generated audio before resampling for video delivery.
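A minimal sketch of that check, assuming the engine writes an uncompressed PCM WAV file (Python's standard-library `wave` module does not read compressed or float formats). The helper names and the 48000 Hz delivery rate are illustrative assumptions, not part of the skill:

```python
import wave


def native_sample_rate(path: str) -> int:
    """Return the sample rate (Hz) stored in a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def needs_resample(path: str, delivery_rate: int = 48000) -> bool:
    """True when the file's native rate differs from the delivery rate.

    48000 Hz is only an assumed video delivery rate; use the task's spec.
    """
    return native_sample_rate(path) != delivery_rate
```

Reading the header before resampling avoids guessing the engine's output rate, which can differ between engines and voices.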

2. Speech Cleanup (Per Segment)

Apply lightweight processing to avoid common artifacts:

Rumble/DC removal: high-pass filter around 20 Hz
Harshness control: optional low-pass around 16 kHz (helps remove digital fizz)
Click/pop prevention: short fades at boundaries (e.g., 50 ms fade-in and fade-out)

Recommended FFmpeg pattern: add the filters in a single chain (one -af filtergraph per segment), and keep the chain identical across segments.
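One way to assemble such a chain is sketched below. It builds a single `-af` string from FFmpeg's `highpass`, `lowpass`, and `afade` filters using the values listed above. The function names and default arguments are hypothetical, and the segment duration is assumed to be known in advance (the fade-out start time depends on it):

```python
from typing import List, Optional


def cleanup_filter_chain(duration_s: float,
                         fade_s: float = 0.05,
                         highpass_hz: int = 20,
                         lowpass_hz: Optional[int] = 16000) -> str:
    """Build one comma-separated FFmpeg audio filter chain for a segment."""
    if duration_s <= 2 * fade_s:
        raise ValueError("segment is too short for the requested fades")
    filters = [f"highpass=f={highpass_hz}"]          # rumble/DC removal
    if lowpass_hz is not None:                       # optional harshness control
        filters.append(f"lowpass=f={lowpass_hz}")
    fade_out_start = duration_s - fade_s
    filters.append(f"afade=t=in:st=0:d={fade_s:g}")  # click prevention at start
    filters.append(f"afade=t=out:st={fade_out_start:.3f}:d={fade_s:g}")
    return ",".join(filters)


def cleanup_command(src: str, dst: str, duration_s: float) -> List[str]:
    """Argument list for subprocess.run; nothing is executed here."""
    return ["ffmpeg", "-y", "-i", src,
            "-af", cleanup_filter_chain(duration_s), dst]
```

Generating the chain from one function is a simple way to keep the processing identical across segments; only the fade-out start changes with segment length.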

3. Loudness Normalization

Target loudness depends on the benchmark/task spec. A common target, measured according to ITU-R BS.1770, is:

Integrated loudness: -23 LUFS
True peak: around -1.5 dBTP
LRA: around 11 LU (optional)

Recommended workflow:

Measure loudness using FFmpeg ebur128 (or equivalent meter).
Apply normalization (e.g., loudnorm) as the final step after cleanup and timing edits.
If you adjust tempo/duration after normalization, normalize again.
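The workflow above can be sketched as a two-pass `loudnorm` run. Note the substitution: this sketch uses `loudnorm`'s own JSON measurement pass as the "equivalent meter" instead of `ebur128`. The function names are hypothetical, and the measurement JSON is assumed to have already been extracted from FFmpeg's log output by the caller:

```python
import json

# Targets taken from the list above; adjust to the task spec.
TARGET_I, TARGET_TP, TARGET_LRA = -23.0, -1.5, 11.0


def _targets() -> str:
    return f"I={TARGET_I}:TP={TARGET_TP}:LRA={TARGET_LRA}"


def measure_filter() -> str:
    """Pass 1: loudnorm analyses the audio and prints its stats as JSON."""
    return f"loudnorm={_targets()}:print_format=json"


def apply_filter(measured_json: str) -> str:
    """Pass 2: feed the pass-1 measurements back into loudnorm."""
    m = json.loads(measured_json)
    return (
        f"loudnorm={_targets()}"
        f":measured_I={m['input_i']}"
        f":measured_TP={m['input_tp']}"
        f":measured_LRA={m['input_lra']}"
        f":measured_thresh={m['input_thresh']}"
        f":offset={m['target_offset']}"
        ":linear=true"
    )
```

Each string is meant to be passed as the `-af` value of a separate FFmpeg run, with the second run placed last in the pipeline, after cleanup and timing edits.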

4. Timing & Segment Boundary Handling

When stitching segment-level TTS into a full track:

Match each segment to its target window as closely as possible.
If a segment is shorter than its window, pad with silence.
If a segment is longer, use gentle duration control (small speed change) or truncate carefully.
Always apply boundary fades after padding/trimming to avoid clicks.
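The decision logic in this list can be sketched as a small planner. The `max_speedup` cap of 1.15 is an assumed value for "small speed change", not something the skill specifies, and `FitPlan` / `fit_segment` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class FitPlan:
    action: str    # "pad", "speed", or "speed+trim"
    tempo: float   # playback speed factor (1.0 = unchanged)
    pad_s: float   # silence to append, in seconds
    trim_s: float  # audio to cut from the end after the speed change


def fit_segment(actual_s: float, window_s: float,
                max_speedup: float = 1.15) -> FitPlan:
    """Choose how to fit one TTS segment into its target window."""
    if actual_s <= 0 or window_s <= 0:
        raise ValueError("durations must be positive")
    if actual_s <= window_s:
        # Shorter than (or equal to) the window: pad with silence.
        return FitPlan("pad", 1.0, window_s - actual_s, 0.0)
    tempo = actual_s / window_s
    if tempo <= max_speedup:
        # Slightly long: a gentle speed-up makes it fit exactly.
        return FitPlan("speed", tempo, 0.0, 0.0)
    # Much too long: speed up to the cap, then truncate the remainder.
    sped_up_s = actual_s / max_speedup
    return FitPlan("speed+trim", max_speedup, 0.0, sped_up_s - window_s)
```

In FFmpeg terms the plan could map to `apad`, `atempo`, and `atrim` respectively, with the boundary fades applied afterwards as the last item in the list requires.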

Sync guideline: keep end-to-end drift small (e.g., <= 0.2 s) unless the task states otherwise.
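A small check for this guideline might look like the sketch below. It assumes "end-to-end drift" means the gap between the planned and actual end time of the last segment; the function names are hypothetical and the 0.2 s default mirrors the example tolerance above:

```python
from typing import Sequence


def end_to_end_drift(target_ends: Sequence[float],
                     actual_ends: Sequence[float]) -> float:
    """Absolute gap (seconds) between planned and actual end of the track."""
    if not target_ends or len(target_ends) != len(actual_ends):
        raise ValueError("need matching, non-empty lists of end times")
    return abs(actual_ends[-1] - target_ends[-1])


def within_sync(target_ends: Sequence[float],
                actual_ends: Sequence[float],
                tolerance_s: float = 0.2) -> bool:
    """True when the final drift stays inside the tolerance."""
    return end_to_end_drift(target_ends, actual_ends) <= tolerance_s
```

Running the check once after stitching, rather than per segment, matches the end-to-end wording; a per-segment variant would compare each pair of end times instead.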

Usage Guidance

This is a documentation-only skill describing best practices for TTS audio mastering. Before installing or using it, verify that your environment has the tools the guide references (e.g., FFmpeg with ebur128/loudnorm support, any chosen TTS engine). If you plan to use cloud TTS, be prepared to supply API keys via your normal configuration — the skill does not request or store credentials. Also confirm any licensing or usage limits of the TTS engine you pick. If you need the skill to be executable end-to-end, ask the author to list required binaries and sample FFmpeg commands explicitly so the agent won't fail due to missing tools.
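The environment check suggested above could be sketched as follows. It assumes `ffmpeg` is expected on `PATH` and that each filter row of `ffmpeg -filters` has the filter name as its second whitespace-separated field; the function names are hypothetical:

```python
import shutil
import subprocess
from typing import Iterable, List

REQUIRED_FILTERS = ("ebur128", "loudnorm")


def missing_filters(filters_listing: str,
                    required: Iterable[str] = REQUIRED_FILTERS) -> List[str]:
    """Return required filter names absent from `ffmpeg -filters` output."""
    available = set()
    for line in filters_listing.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            available.add(parts[1])  # second column holds the filter name
    return [name for name in required if name not in available]


def check_ffmpeg() -> List[str]:
    """List what is missing: the binary itself, or individual filters."""
    if shutil.which("ffmpeg") is None:
        return ["ffmpeg"]
    listing = subprocess.run(
        ["ffmpeg", "-hide_banner", "-filters"],
        capture_output=True, text=True, check=True,
    ).stdout
    return missing_filters(listing)
```

An empty list from `check_ffmpeg()` suggests the mastering steps can run; anything else names what to install first.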

Capability Analysis

Type: OpenClaw Skill
Name: multilingual-video-dubbing-text-to-speech
Version: 0.1.0

The skill bundle contains purely instructional documentation (SKILL.md) regarding audio mastering workflows for Text-to-Speech (TTS) tasks. It provides technical guidelines for using FFmpeg filters, loudness normalization (LUFS), and timing adjustments without any executable code, network requests, or suspicious instructions.

Capability Assessment

ℹ Purpose & Capability

The name and description match the SKILL.md content: both focus on cleanup, loudness normalization, timing, and delivery for TTS audio. The only discrepancy is that the instructions assume use of external tools (FFmpeg, loudness meters, various TTS engines) but the skill's metadata declares no required binaries or environment variables — this is a documentation omission rather than functional mismatch.

✓ Instruction Scope

The instructions stay on-topic: they describe audio filters, loudness targets, segment timing, and workflow steps. They do not direct the agent to read unrelated files, exfiltrate data, or perform system-wide configuration changes. They reference external TTS engines and FFmpeg usage but do not include open-ended or vague commands that would grant broad discretion.

✓ Install Mechanism

This is an instruction-only skill with no install spec and no code files, so there is no installation footprint or network downloads to evaluate.

ℹ Credentials

The skill requests no environment variables or credentials, which is appropriate for a standalone guidance document. It mentions cloud TTS providers (e.g., OpenAI TTS, Edge-TTS) which, if used, would require credentials; the SKILL.md does not instruct how to supply or store those credentials. Users should expect to provide any needed API keys via their normal environment/config if they use cloud services.

✓ Persistence & Privilege

The skill does not request persistent presence (always is false) and does not ask to modify system or other skills' configuration. Autonomous invocation is allowed by default on the platform but this skill's scope and lack of credentials mitigate that concern.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install multilingual-video-dubbing-text-to-speech
After installation, invoke the skill by name or use /multilingual-video-dubbing-text-to-speech
Provide required inputs per the skill's parameter spec and get structured output

Version History

v0.1.0

Bulk publish from all-task-skills-dedup

Metadata

Slug multilingual-video-dubbing-text-to-speech

Version 0.1.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is text-to-speech?

text-to-speech is a documentation-only skill that describes practical mastering steps for TTS audio: cleanup, loudness normalization, alignment, and delivery specs. It is an AI Agent Skill for Claude Code / OpenClaw, with 70 downloads so far.

How do I install text-to-speech?

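Run "/install multilingual-video-dubbing-text-to-speech" in the OpenClaw or Claude Code chat to install it in one step. The skill itself needs no extra setup, but the tools it references (e.g., FFmpeg and your chosen TTS engine) must already be available in your environment.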

Is text-to-speech free?

Yes, text-to-speech is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does text-to-speech support?

text-to-speech is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created text-to-speech?

It is built and maintained by lnj22 (@lnj22); the current version is v0.1.0.
