Supertonic TTS

Author: Pratyush Chauhan (@pratyushchauhan) · v1.0.0 · MIT-0

Install in OpenClaw

/install supertonic-tts

Description

On-device multilingual text-to-speech using Supertonic (Supertone). Use when the user needs local/offline TTS, voice generation, speech synthesis, or convert...

Usage (SKILL.md)

Supertonic TTS Skill

Local, multilingual text-to-speech powered by Supertone's Supertonic ONNX model.

Core Features

100% offline — No API key, no cloud, no network. Runs on-device via ONNX.
Tiny footprint — 66M–99M parameters. Runs on Pi, browser, e-reader, phone.
Stupid fast — Up to 167× real-time on consumer hardware. 4s of audio in ~25ms.
Studio output — 44.1kHz 16-bit mono WAV, no upsampler needed.
31 languages — Full multilingual support with lang="na" auto-detect fallback.
Voice cloning — Clone any voice via Voice Builder, deploy permanently offline.
Expression tags — Only <laugh> is user-verified to produce audible expression. <breath> and <sigh> are weak/unconfirmed. All others fail silently.
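
The speed figure in the feature list is a real-time factor: seconds of audio produced per second of compute. The sketch below only reproduces that arithmetic from the numbers quoted in the list; `real_time_factor` is a hypothetical helper, not part of the Supertonic SDK.

```python
def real_time_factor(audio_seconds: float, synthesis_seconds: float) -> float:
    """Return how many seconds of audio are produced per second of compute."""
    if synthesis_seconds <= 0:
        raise ValueError("synthesis_seconds must be positive")
    return audio_seconds / synthesis_seconds


# Figures quoted in the feature list: 4 s of audio in roughly 25 ms.
rtf = real_time_factor(4.0, 0.025)
print(f"{rtf:.0f}x real-time")
```

With those inputs the factor comes out at about 160×, the same order as the quoted peak of 167×. To measure your own hardware, time a `synthesize` call with `time.perf_counter()` and pass in the returned duration.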

Prerequisites

Requires the Python SDK and model assets. Install once:

pip install supertonic

First run auto-downloads ~400MB of ONNX models from Hugging Face into ~/.cache/supertonic3/.
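
Because the first run needs network access to fetch the models, it can help to confirm they are cached before going offline. This is a minimal standard-library sketch that assumes the cache path quoted above; `cache_size_mb` is a hypothetical helper and does not call the Supertonic SDK.

```python
from pathlib import Path

# Cache location quoted above; treat it as an assumption for your install.
DEFAULT_CACHE = Path.home() / ".cache" / "supertonic3"


def cache_size_mb(cache_dir: Path = DEFAULT_CACHE) -> float:
    """Return the total size of files under cache_dir in megabytes (0.0 if missing)."""
    if not cache_dir.is_dir():
        return 0.0
    total_bytes = sum(p.stat().st_size for p in cache_dir.rglob("*") if p.is_file())
    return total_bytes / 1_000_000


if __name__ == "__main__":
    size = cache_size_mb()
    if size == 0.0:
        print(f"No models found in {DEFAULT_CACHE}; the first run will download them.")
    else:
        print(f"{DEFAULT_CACHE} holds {size:.0f} MB of model files.")
```

A result far below the roughly 400MB mentioned above may indicate an interrupted download.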

Quick Use

Python SDK

from supertonic import TTS

tts = TTS(auto_download=True)
style = tts.get_voice_style(voice_name="M1")

wav, duration = tts.synthesize(
    text="Your text here",
    lang="en",           # language code or "na" for auto-detect
    voice_style=style,
    total_steps=8,       # quality: 5 (low) to 12 (high)
    speed=1.0,           # 0.7 (slow) to 2.0 (fast)
)

tts.save_audio(wav, "output.wav")

CLI (via supertonic package)

# Basic synthesis
supertonic tts "Hello world" -o output.wav

# Pick voice and quality
supertonic tts "Use a different voice." -o output.wav --voice F1 --steps 10

# Custom cloned voice
supertonic tts "Hello in my voice." -o output.wav --custom-style-path voices/my_voice.json

# Multilingual
supertonic tts "こんにちは" -o japanese.wav --lang ja
supertonic tts "Bonjour" -o french.wav --lang fr
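
To drive the CLI from a Python program, the command line can be assembled as an argument list. The sketch below uses only the flags that appear in the examples above (`-o`, `--voice`, `--steps`, `--lang`, `--custom-style-path`); `build_tts_command` is a hypothetical helper, and whether `--voice` and `--custom-style-path` may be combined is not stated here, so that is left to the caller.

```python
import shlex
from typing import List, Optional


def build_tts_command(
    text: str,
    output: str,
    voice: Optional[str] = None,
    steps: Optional[int] = None,
    lang: Optional[str] = None,
    custom_style_path: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `supertonic tts` using only the flags shown above."""
    cmd = ["supertonic", "tts", text, "-o", output]
    if voice is not None:
        cmd += ["--voice", voice]
    if steps is not None:
        cmd += ["--steps", str(steps)]
    if lang is not None:
        cmd += ["--lang", lang]
    if custom_style_path is not None:
        cmd += ["--custom-style-path", custom_style_path]
    return cmd


cmd = build_tts_command("Bonjour", "french.wav", lang="fr")
print(shlex.join(cmd))  # shell-quoted form, handy for logging
# To execute it: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with text that contains spaces or non-ASCII characters.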

Skill Scripts

cd ~/.openclaw/workspace/skills/supertonic-tts/scripts
source ~/.openclaw/workspace/.browser-use-venv/bin/activate

# Quick synthesis
python3 synthesize.py "Hello world" --voice M1 --output ~/hello.wav

# With expression tags (only <laugh> is confirmed to work)
python3 synthesize.py "You did it <laugh> I am so proud." --voice M5 --output laugh.wav

# Custom voice
python3 synthesize.py "Hello" --custom-style my_voice.json --output cloned.wav

# Japanese
python3 synthesize.py "こんにちは" --voice F3 --lang ja

# List voices
python3 list_voices.py

Voices

10 built-in voices: F1–F5 (female), M1–M5 (male).

Voice cloning: Record a short clip → upload to Voice Builder → export JSON → load with get_voice_style_from_path().

See references/voices.md for voice descriptions and Voice Builder workflow.
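
A caller may want to validate a voice name before handing it to the SDK. The sketch below enumerates the ten built-in names listed above; `resolve_voice` is a hypothetical helper, and the assumption that names are uppercase is based only on the examples in this document.

```python
# Built-in voice names as listed above: F1-F5 (female) and M1-M5 (male).
BUILTIN_VOICES = tuple(f"{prefix}{n}" for prefix in ("F", "M") for n in range(1, 6))


def resolve_voice(name: str) -> str:
    """Normalize a voice name and confirm it is one of the ten built-ins."""
    candidate = name.strip().upper()
    if candidate not in BUILTIN_VOICES:
        raise ValueError(
            f"Unknown voice {name!r}; expected one of {', '.join(BUILTIN_VOICES)}"
        )
    return candidate


print(resolve_voice("m1"))
```

Cloned voices are loaded from a JSON file instead, so they would bypass this check.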

Expression Tags

⚠️ Mostly non-functional in practice

Supertonic accepts inline self-closing tags, but only <laugh> has been user-verified to produce a clearly audible expression (laughter burst). <breath> and <sigh> may insert minor pauses but are not confirmed as audible breathing/sighing sounds.

Do not rely on tags for expression. Tested tags that failed to produce audible effect include: <sarcastic>, <excited>, <whisper>, <shout>, <happy>, <sad>, <angry>, <chuckle>, <giggle>, <snort>, <gasp>, <grunt>, <cough>, <scream>, <sing>, <cry>, <yawn>, <hmm>, <aha>.

Correct syntax (self-closing, inline):

text = "You did it <laugh> I am so proud."
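
Since most tags are reported above to have no audible effect, one option is to strip everything except the laugh tag before synthesis. This is a sketch under that assumption; `strip_unverified_tags` and `SUPPORTED_TAGS` are hypothetical names, and how the model treats a leftover unsupported tag is not documented here.

```python
import re

# Only the laugh tag is reported above as producing an audible effect.
SUPPORTED_TAGS = {"laugh"}

# Matches simple inline tags such as <laugh> or <sigh>.
TAG_PATTERN = re.compile(r"<([A-Za-z]+)>")


def strip_unverified_tags(text: str) -> str:
    """Remove inline tags that are not in SUPPORTED_TAGS and tidy spacing."""
    def keep_or_drop(match: re.Match) -> str:
        return match.group(0) if match.group(1).lower() in SUPPORTED_TAGS else ""

    cleaned = TAG_PATTERN.sub(keep_or_drop, text)
    # Collapse the double spaces left behind by removed tags.
    return re.sub(r"\s{2,}", " ", cleaned).strip()


print(strip_unverified_tags("Well <sigh> you did it <laugh> I am so proud."))
```

Add `"breath"` or `"sigh"` to `SUPPORTED_TAGS` if the minor pauses they may insert are wanted.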

Reliable alternative for emotion: explicit language + speed modulation:

Emotion	Technique
Happy	Upbeat words + `speed=1.1`
Sad	Subdued words + `speed=0.85`
Excited	Exclamations + `speed=1.15`
Urgent	Short imperatives + `speed=1.2`
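
The table can be encoded as a small lookup so that callers pick a speed by emotion name. The speed values come from the table; `EMOTION_SPEED` and `speed_for` are illustrative names, not part of the SDK.

```python
from typing import Dict

# Speed values taken from the table above; the preset names are illustrative.
EMOTION_SPEED: Dict[str, float] = {
    "happy": 1.1,
    "sad": 0.85,
    "excited": 1.15,
    "urgent": 1.2,
}


def speed_for(emotion: str, default: float = 1.0) -> float:
    """Look up the speed multiplier for an emotion, falling back to neutral."""
    return EMOTION_SPEED.get(emotion.lower(), default)


# The result can be passed as the `speed` argument of tts.synthesize(...).
print(speed_for("Excited"))
```

The wording of the text still carries most of the emotion; the speed change only reinforces it.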

See references/expression-tags.md for full testing results.

Parameters

Param	Range	Default	What It Does
`total_steps`	5–12	8	Quality vs speed tradeoff
`speed`	0.7–2.0	1.0	Speech rate multiplier
`max_chunk_length`	any	300	Break long text into chunks (120 for Korean)
`silence_duration`	any	0.3	Pause between chunks (seconds)
`lang`	ISO 639-1 or `"na"`	`"en"`	`"na"` = language-agnostic auto-detect
`verbose`	True/False	`False`	Show detailed progress
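
One way to catch out-of-range values early is to validate them before calling the SDK. The sketch below mirrors the ranges and defaults in the table; `SynthesisParams` is a hypothetical container, and the checks on `max_chunk_length` and `silence_duration` (listed as "any" in the table) are assumptions added for safety.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesisParams:
    """Parameters from the table above, with its defaults."""
    total_steps: int = 8
    speed: float = 1.0
    max_chunk_length: int = 300
    silence_duration: float = 0.3
    lang: str = "en"

    def __post_init__(self) -> None:
        if not 5 <= self.total_steps <= 12:
            raise ValueError("total_steps must be between 5 and 12")
        if not 0.7 <= self.speed <= 2.0:
            raise ValueError("speed must be between 0.7 and 2.0")
        # Assumption: the table allows "any" value, but non-positive
        # chunk lengths and negative pauses are rejected here.
        if self.max_chunk_length <= 0:
            raise ValueError("max_chunk_length must be positive")
        if self.silence_duration < 0:
            raise ValueError("silence_duration must not be negative")


# Korean example using the shorter chunk length suggested in the table.
korean = SynthesisParams(lang="ko", max_chunk_length=120)
print(korean)
```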

Languages

31 languages + na (language-agnostic auto-detect). See references/languages.md for all codes.

Output

Format: 44.1kHz 16-bit mono WAV
Returns: (wav_array, duration_array)
wav.shape = (1, num_samples)
duration[0] = length in seconds
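
To confirm that a saved file matches the format stated above, the WAV header can be read with Python's standard `wave` module. This is a sketch independent of the Supertonic SDK; `describe_wav` and `matches_documented_format` are hypothetical helper names.

```python
import wave
from typing import Tuple


def describe_wav(path: str) -> Tuple[int, int, int, float]:
    """Return (sample_rate_hz, bits_per_sample, channels, duration_seconds)."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        bits = wav_file.getsampwidth() * 8
        channels = wav_file.getnchannels()
        duration = wav_file.getnframes() / rate
    return rate, bits, channels, duration


def matches_documented_format(path: str) -> bool:
    """Check a file against the 44.1 kHz, 16-bit, mono format stated above."""
    rate, bits, channels, _ = describe_wav(path)
    return (rate, bits, channels) == (44100, 16, 1)
```

For a file written by `tts.save_audio(wav, "output.wav")`, the duration reported here should be close to `duration[0]`.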

Multi-Runtime Deployment

Supertonic runs across: Python, Node.js, Browser (WebGPU), Java, C++, C#, Go, Swift, iOS, Rust, Flutter.

Scripts

scripts/synthesize.py — CLI for quick text-to-speech (supports custom voices)
scripts/list_voices.py — Available voices and metadata

References

references/voices.md — Voice descriptions, selection guide, Voice Builder workflow
references/expression-tags.md — All tags, examples, caveats
references/languages.md — Supported language codes
references/deployment.md — Multi-runtime deployment options

Capability tags

requires-sensitive-credentials

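How to use

Make sure OpenClaw is installed (locally or via Docker).
Enter the install command in the chat box: /install supertonic-tts
Once installed, invoke the Skill by name or trigger it with /supertonic-tts
Provide the inputs described in the Skill's parameter documentation to get structured output.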

Version history

v1.0.0

Initial release: local multilingual TTS using Supertonic ONNX models

Metadata

Slug supertonic-tts

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Published versions 1

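FAQ

What is Supertonic TTS?

On-device multilingual text-to-speech using Supertonic (Supertone). Use when the user needs local/offline TTS, voice generation, speech synthesis, or convert... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 92 total downloads to date.

How do I install Supertonic TTS?

Run the command "/install supertonic-tts" in the OpenClaw or Claude Code chat box for a one-step install; no extra configuration is required.

Is Supertonic TTS free?

Yes. Supertonic TTS is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used without restriction.

Which platforms does Supertonic TTS support?

Supertonic TTS is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Supertonic TTS?

It is developed and maintained by Pratyush Chauhan (@pratyushchauhan). The current version is v1.0.0.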