text-to-speech

Name: text-to-speech
Author: lnj22

By lnj22 · GitHub ↗ · v0.1.0 · MIT-0

cross-platform ✓ Security check passed

Install in OpenClaw

/install multilingual-video-dubbing-text-to-speech

Description

Practical mastering steps for TTS audio: cleanup, loudness normalization, alignment, and delivery specs.

Usage Instructions (SKILL.md)

SKILL: TTS Audio Mastering

This skill focuses on producing clean, consistent, and delivery-ready TTS audio for video tasks. It covers speech cleanup, loudness normalization, segment boundaries, and export specs.

1. TTS Engine & Output Basics

Choose a TTS engine based on deployment constraints and quality needs:

Neural offline (e.g., Kokoro): stable, high quality, no network dependency.
Cloud TTS (e.g., Edge-TTS / OpenAI TTS): convenient, higher naturalness but network-dependent.
Formant TTS (e.g., espeak-ng): for prototyping only; often less natural.

Key rule: Always confirm the native sample rate of the generated audio before resampling for video delivery.
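As a minimal sketch of the key rule above, the native rate can be read from the file header before any resampling is planned. This assumes the engine writes PCM WAV files (Python's standard `wave` module does not read other encodings); the function names and the 48000 Hz delivery rate are illustrative assumptions, not part of the skill.

```python
import wave


def native_sample_rate(path: str) -> int:
    """Return the sample rate (Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def needs_resample(path: str, delivery_rate: int = 48000) -> bool:
    """True if the file's native rate differs from the delivery rate.

    The 48000 Hz default is an assumed delivery rate; use the value
    from your task spec.
    """
    return native_sample_rate(path) != delivery_rate
```

Checking first avoids resampling a file that is already at the delivery rate, and makes it explicit when a low native rate limits the usable bandwidth.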

2. Speech Cleanup (Per Segment)

Apply lightweight processing to avoid common artifacts:

Rumble/DC removal: high-pass filter around 20 Hz
Harshness control: optional low-pass around 16 kHz (helps remove digital fizz)
Click/pop prevention: short fades at boundaries (e.g., 50 ms fade-in and fade-out)

Recommended FFmpeg pattern: add the filters in a single chain (one -af filtergraph), and keep them consistent across segments.
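One way to sketch the single-chain pattern is to build the filter string in code so every segment gets the same settings. The FFmpeg filters used here (`highpass`, `lowpass`, `afade`) are real; the helper names, file paths, and defaults are illustrative assumptions taken from the values listed above, not the author's exact command.

```python
from typing import List, Optional


def cleanup_filter_chain(duration_s: float,
                         fade_s: float = 0.05,
                         highpass_hz: int = 20,
                         lowpass_hz: Optional[int] = 16000) -> str:
    """Build one FFmpeg -af chain: high-pass, optional low-pass, boundary fades."""
    if duration_s <= 2 * fade_s:
        raise ValueError("segment is too short for the requested fades")
    parts = [f"highpass=f={highpass_hz}"]
    if lowpass_hz is not None:
        parts.append(f"lowpass=f={lowpass_hz}")
    # Fade in from the start, fade out so it ends exactly at the segment end.
    parts.append(f"afade=t=in:st=0:d={fade_s:g}")
    parts.append(f"afade=t=out:st={duration_s - fade_s:.3f}:d={fade_s:g}")
    return ",".join(parts)


def cleanup_command(src: str, dst: str, duration_s: float) -> List[str]:
    """Argument list for subprocess.run; src and dst are hypothetical paths."""
    return ["ffmpeg", "-y", "-i", src, "-af",
            cleanup_filter_chain(duration_s), dst]
```

Note that a 16 kHz low-pass only has an effect when the audio's sample rate is above 32 kHz; for lower native rates, pass `lowpass_hz=None`.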

3. Loudness Normalization

Target loudness depends on the benchmark/task spec. A common target, measured with the ITU-R BS.1770 algorithm (the -23 LUFS figure is the EBU R 128 programme level):

Integrated loudness: -23 LUFS
True peak: around -1.5 dBTP
LRA: around 11 LU (optional)

Recommended workflow:

Measure loudness using FFmpeg ebur128 (or equivalent meter).
Apply normalization (e.g., loudnorm) as the final step after cleanup and timing edits.
If you adjust tempo or duration after normalization, normalize again.
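The workflow above can be sketched as a two-pass `loudnorm` run: a first pass that prints measurements as JSON, then a second pass that feeds those measurements back in. The `loudnorm` options and JSON keys below are the filter's own; the helper names and the stderr-parsing approach (taking the last `{...}` block) are assumptions for illustration.

```python
import json
from typing import Dict

# Targets from the list above; adjust to the task spec.
TARGET_I = -23.0    # integrated loudness, LUFS
TARGET_TP = -1.5    # true peak, dBTP
TARGET_LRA = 11.0   # loudness range, LU

_TARGETS = f"I={TARGET_I:g}:TP={TARGET_TP:g}:LRA={TARGET_LRA:g}"


def loudnorm_measure_filter() -> str:
    """First pass: analyze only, print measurements as JSON on stderr."""
    return f"loudnorm={_TARGETS}:print_format=json"


def parse_loudnorm_json(stderr_text: str) -> Dict[str, str]:
    """Extract the last {...} block from FFmpeg's stderr (assumed flat JSON)."""
    start = stderr_text.rindex("{")
    end = stderr_text.rindex("}") + 1
    return json.loads(stderr_text[start:end])


def loudnorm_apply_filter(measured: Dict[str, str]) -> str:
    """Second pass: normalize using the first-pass measurements."""
    return (
        f"loudnorm={_TARGETS}"
        f":measured_I={measured['input_i']}"
        f":measured_TP={measured['input_tp']}"
        f":measured_LRA={measured['input_lra']}"
        f":measured_thresh={measured['input_thresh']}"
        f":offset={measured['target_offset']}"
        ":linear=true"
    )
```

A typical use would run `ffmpeg -i in.wav -af <measure filter> -f null -`, parse its stderr, then run FFmpeg again with the apply filter as the last step of the chain.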

4. Timing & Segment Boundary Handling

When stitching segment-level TTS into a full track:

Match each segment to its target window as closely as possible.
If a segment is shorter than its window, pad with silence.
If a segment is longer, use gentle duration control (small speed change) or truncate carefully.
Always apply boundary fades after padding/trimming to avoid clicks.
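The pad / speed-up / truncate decision above might be sketched as follows. The `atempo`, `atrim`, `apad`, and `anull` filters exist in FFmpeg; the `FitPlan` type, the function names, and the 1.10 speed-up cap are hypothetical choices for illustration, since the skill only says the speed change should be small.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FitPlan:
    tempo: float                # atempo factor; 1.0 means unchanged
    pad_s: float                # silence to append, in seconds
    trim_to_s: Optional[float]  # truncate to this length, or None


def plan_segment_fit(segment_s: float, window_s: float,
                     max_speedup: float = 1.10) -> FitPlan:
    """Decide how to fit a TTS segment into its target window."""
    if segment_s <= 0 or window_s <= 0:
        raise ValueError("durations must be positive")
    if segment_s <= window_s:
        # Shorter (or equal): keep speed, pad the remainder with silence.
        return FitPlan(1.0, window_s - segment_s, None)
    needed = segment_s / window_s
    if needed <= max_speedup:
        # Slightly long: a small tempo change is enough.
        return FitPlan(needed, 0.0, None)
    # Too long: speed up to the cap, then truncate to the window.
    return FitPlan(max_speedup, 0.0, window_s)


def fit_filter(plan: FitPlan, window_s: float) -> str:
    """Translate a plan into an FFmpeg filter chain (fades are applied afterwards)."""
    parts = []
    if plan.tempo != 1.0:
        parts.append(f"atempo={plan.tempo:.4f}")
    if plan.trim_to_s is not None:
        parts.append(f"atrim=end={plan.trim_to_s:.3f}")
    if plan.pad_s > 0:
        parts.append(f"apad=whole_dur={window_s:.3f}")
    return ",".join(parts) or "anull"
```

Boundary fades are deliberately left out of `fit_filter` so they can be applied after padding or trimming, as the last step above requires.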

Sync guideline: keep end-to-end drift small (e.g., <= 0.2 s) unless the task states otherwise.
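A simple way to check the sync guideline is to compare where each segment actually ends in the stitched track against where it was supposed to end. This sketch reads "drift" as the worst-case offset at any segment boundary, which is one possible interpretation; the function names are hypothetical and the 0.2 s default mirrors the example limit above.

```python
from typing import Sequence


def max_drift(target_ends_s: Sequence[float],
              actual_ends_s: Sequence[float]) -> float:
    """Largest absolute offset (seconds) between target and actual segment ends."""
    if len(target_ends_s) != len(actual_ends_s):
        raise ValueError("need one actual end time per target end time")
    return max(
        (abs(a - t) for t, a in zip(target_ends_s, actual_ends_s)),
        default=0.0,
    )


def within_drift(target_ends_s: Sequence[float],
                 actual_ends_s: Sequence[float],
                 limit_s: float = 0.2) -> bool:
    """True if no segment boundary is off by more than limit_s."""
    return max_drift(target_ends_s, actual_ends_s) <= limit_s
```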

Safe Usage Recommendations

This is a documentation-only skill describing best practices for TTS audio mastering. Before installing or using it, verify that your environment has the tools the guide references (e.g., FFmpeg with ebur128/loudnorm support, any chosen TTS engine). If you plan to use cloud TTS, be prepared to supply API keys via your normal configuration — the skill does not request or store credentials. Also confirm any licensing or usage limits of the TTS engine you pick. If you need the skill to be executable end-to-end, ask the author to list required binaries and sample FFmpeg commands explicitly so the agent won't fail due to missing tools.

Functional Analysis

Type: OpenClaw Skill
Name: multilingual-video-dubbing-text-to-speech
Version: 0.1.0

The skill bundle contains purely instructional documentation (SKILL.md) regarding audio mastering workflows for Text-to-Speech (TTS) tasks. It provides technical guidelines for using FFmpeg filters, loudness normalization (LUFS), and timing adjustments without any executable code, network requests, or suspicious instructions.

Capability Assessment

ℹ Purpose & Capability

The name and description match the SKILL.md content: both focus on cleanup, loudness normalization, timing, and delivery for TTS audio. The only discrepancy is that the instructions assume use of external tools (FFmpeg, loudness meters, various TTS engines) but the skill's metadata declares no required binaries or environment variables — this is a documentation omission rather than functional mismatch.

✓ Instruction Scope

The instructions stay on-topic: they describe audio filters, loudness targets, segment timing, and workflow steps. They do not direct the agent to read unrelated files, exfiltrate data, or perform system-wide configuration changes. They reference external TTS engines and FFmpeg usage but do not include open-ended or vague commands that would grant broad discretion.

✓ Install Mechanism

This is an instruction-only skill with no install spec and no code files, so there is no installation footprint or network downloads to evaluate.

ℹ Credentials

The skill requests no environment variables or credentials, which is appropriate for a standalone guidance document. It mentions cloud TTS providers (e.g., OpenAI TTS, Edge-TTS) which, if used, would require credentials; the SKILL.md does not instruct how to supply or store those credentials. Users should expect to provide any needed API keys via their normal environment/config if they use cloud services.

✓ Persistence & Privilege

The skill does not request persistent presence (always is false) and does not ask to modify system or other skills' configuration. Autonomous invocation is allowed by default on the platform but this skill's scope and lack of credentials mitigate that concern.

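How to Use

Make sure OpenClaw is installed (locally or via Docker).
Enter the install command in the chat box: /install multilingual-video-dubbing-text-to-speech
Once installed, call the Skill by name or trigger it with /multilingual-video-dubbing-text-to-speech
Provide the inputs described in the Skill's parameter notes to get structured output.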

Version History

v0.1.0

Bulk publish from all-task-skills-dedup

Metadata

Slug multilingual-video-dubbing-text-to-speech

Version 0.1.0

License MIT-0

Total installs 0

Current installs 0

Version count 1

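FAQ

What is text-to-speech?

Practical mastering steps for TTS audio: cleanup, loudness normalization, alignment, and delivery specs. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 70 cumulative downloads to date.

How do I install text-to-speech?

Run the command "/install multilingual-video-dubbing-text-to-speech" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.

Is text-to-speech free?

Yes. text-to-speech is completely free and released under the MIT-0 license; you may download, install, and use it freely.

Which platforms does text-to-speech support?

text-to-speech is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed text-to-speech?

It is developed and maintained by lnj22 (@lnj22). The current version is v0.1.0.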