irachex

Local TTS

Author irachex · GitHub · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
359 total downloads · 0 favorites · 4 current installs · 1 version
Install in OpenClaw
/install local-tts
Description
Local text-to-speech using Qwen3-TTS with mlx_audio (macOS Apple Silicon) or qwen-tts (Linux/Windows). Privacy-first offline TTS with natural, realistic voic...
Usage (SKILL.md)

Local TTS with Qwen3-TTS

Privacy-First | Offline | High-Quality | Natural, Realistic Voices

Local text-to-speech synthesis using Qwen3-TTS models. Your text never leaves your machine.

Why Local TTS?

Unlike cloud TTS services (Google, AWS, Azure), local-tts offers:

  • Zero data transmission - text and audio are processed 100% on-device
  • Works offline - no network required once the model weights have been downloaded
  • No API keys - no TTS service account or credentials needed
  • GDPR/HIPAA friendly - keeping data on-device simplifies compliance

See references/privacy_security.md for privacy & security details.

Platform Overview

| Platform | Backend | Installation | Best For |
|---|---|---|---|
| macOS (Apple Silicon) | mlx_audio | pip install mlx-audio | M1/M2/M3/M4 Macs |
| Linux/Windows | qwen-tts | pip install qwen-tts | CUDA GPUs |
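The platform table maps onto a check of the host OS and CPU architecture. A minimal sketch, assuming Apple Silicon is detected through the standard platform module reporting "Darwin" and "arm64" (the pick_backend name is hypothetical, not part of the skill):

```python
import platform
from typing import Optional


def pick_backend(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Suggest a backend following the Platform Overview table (hypothetical helper)."""
    system = system or platform.system()     # e.g. "Darwin", "Linux", "Windows"
    machine = machine or platform.machine()  # e.g. "arm64", "x86_64", "AMD64"
    if system == "Darwin" and machine == "arm64":
        return "mlx_audio"  # Apple Silicon: pip install mlx-audio
    return "qwen-tts"       # everything else: pip install qwen-tts


if __name__ == "__main__":
    print(pick_backend())
```

Intel Macs are not covered by the table, so this sketch simply falls through to qwen-tts. Note also that a Python interpreter running under Rosetta reports x86_64 rather than arm64.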

Quick Start

macOS

pip install mlx-audio
brew install ffmpeg

# Natural female voice
python -m mlx_audio.tts.generate \
    --text "Hello world" \
    --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit \
    --voice Chelsie
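Before running the command, it can help to confirm that both prerequisites from the two install lines are present. A small hypothetical preflight helper (not shipped with the skill) that checks for the mlx_audio Python package and an ffmpeg binary on PATH, using only the standard library:

```python
import importlib.util
import shutil
from typing import Dict


def preflight_macos() -> Dict[str, bool]:
    """Report whether the macOS quick-start prerequisites are available."""
    return {
        # provided by `pip install mlx-audio`; find_spec returns None if absent
        "mlx_audio": importlib.util.find_spec("mlx_audio") is not None,
        # provided by `brew install ffmpeg`; looked up on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    missing = [name for name, ok in preflight_macos().items() if not ok]
    print("missing:", ", ".join(missing) if missing else "none")
```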

Linux/Windows

pip install qwen-tts

# With optimizations (FlashAttention, bfloat16, auto-device)
python scripts/tts_linux.py "Hello world" --female

Key Concepts

--voice vs --instruct (Important)

| Model | --voice | --instruct | Notes |
|---|---|---|---|
| CustomVoice | Select a preset voice | Add style/emotion | Can be combined: voice + style control |
| VoiceDesign | N/A | Create a voice from a description | --instruct only |
| Base | N/A | N/A | For voice cloning with --ref_audio |
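The table can be read as rules about which flags each model variant accepts. A sketch of those rules as a hypothetical check_flags helper; it infers the variant from a substring of the model ID and treats --instruct as expected for VoiceDesign and --ref_audio as expected for Base, which is our reading of the table rather than behavior confirmed in mlx_audio:

```python
from typing import List, Optional


def check_flags(model: str, voice: Optional[str] = None,
                instruct: Optional[str] = None,
                ref_audio: Optional[str] = None) -> List[str]:
    """Return problems with the flag combination (an empty list means OK)."""
    problems: List[str] = []
    if "CustomVoice" in model:
        pass  # --voice and --instruct may be used alone or together
    elif "VoiceDesign" in model:
        if voice:
            problems.append("VoiceDesign does not use --voice")
        if not instruct:
            problems.append("VoiceDesign expects --instruct (a voice description)")
    elif "Base" in model:
        if voice or instruct:
            problems.append("Base uses neither --voice nor --instruct")
        if not ref_audio:
            problems.append("Base is for cloning and expects --ref_audio")
    else:
        problems.append(f"cannot infer model variant from {model!r}")
    return problems
```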

CustomVoice with style control:

python -m mlx_audio.tts.generate \
    --text "Hello there!" \
    --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit \
    --voice Serena \
    --instruct "excited and enthusiastic"

9 Preset Voices (Open Source CustomVoice)

| Voice | Gender | Language | Character |
|---|---|---|---|
| Chelsie | Female | English (American) | Gentle, empathetic |
| Serena | Female | English | Warm, gentle |
| Ono Anna | Female | Japanese | Playful |
| Sohee | Female | Korean | Warm |
| Aiden | Male | English (American) | Sunny |
| Dylan | Male | English | Natural |
| Eric | Male | English | Real |
| Ryan | Male | English | Natural |
| Uncle Fu | Male | Chinese | Youthful Beijing |

Defaults: Female=Serena, Male=Aiden
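When scripting voice selection, the voice table and defaults can be kept as data. A sketch with hypothetical names (PRESET_VOICES, voices_for, default_voice); the exact spelling the CLI expects for two-word names such as Ono Anna and Uncle Fu is not given here, so verify it against your installed version:

```python
from typing import Dict, List, Optional, Tuple

# Preset CustomVoice voices from the table above: name -> (gender, language).
PRESET_VOICES: Dict[str, Tuple[str, str]] = {
    "Chelsie": ("Female", "English (American)"),
    "Serena": ("Female", "English"),
    "Ono Anna": ("Female", "Japanese"),
    "Sohee": ("Female", "Korean"),
    "Aiden": ("Male", "English (American)"),
    "Dylan": ("Male", "English"),
    "Eric": ("Male", "English"),
    "Ryan": ("Male", "English"),
    "Uncle Fu": ("Male", "Chinese"),
}

DEFAULTS = {"female": "Serena", "male": "Aiden"}


def voices_for(language: Optional[str] = None, gender: Optional[str] = None) -> List[str]:
    """Filter preset voices; "English" also matches "English (American)"."""
    return [
        name
        for name, (g, lang) in PRESET_VOICES.items()
        if (language is None or lang.startswith(language))
        and (gender is None or g.lower() == gender.lower())
    ]


def default_voice(gender: str) -> str:
    """Default preset for "female" or "male" (case-insensitive)."""
    return DEFAULTS[gender.lower()]
```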

Usage Examples

CustomVoice (Preset Voices)

# Natural female
python -m mlx_audio.tts.generate \
    --text "Your text" --voice Serena --lang_code en \
    --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit

# Male voice (Aiden)
python -m mlx_audio.tts.generate \
    --text "Your text" --voice Aiden --lang_code en \
    --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit

VoiceDesign (Text-Based)

python -m mlx_audio.tts.generate \
    --text "Hello" \
    --model mlx-community/Qwen3-TTS-12Hz-1.7B-VoiceDesign-8bit \
    --instruct "A warm female voice, professional and clear"

Long Text Generation

For long text, increase --max_tokens and enable --join_audio (macOS/MLX only):

python -m mlx_audio.tts.generate \
    --text "Your very long text here..." \
    --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit \
    --voice Serena \
    --max_tokens 4096 \
    --join_audio \
    --output long_audio.wav
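Long inputs are easiest to pass from Python as an argument list, which avoids shell-quoting problems when the text contains quotes or newlines. A sketch of a hypothetical build_generate_cmd helper; it only emits flags that appear in this document's examples and does not check them against the installed mlx_audio version:

```python
import shlex
import sys
from typing import List, Optional


def build_generate_cmd(text: str, model: str, *, voice: Optional[str] = None,
                       instruct: Optional[str] = None,
                       lang_code: Optional[str] = None,
                       max_tokens: Optional[int] = None,
                       join_audio: bool = False,
                       output: Optional[str] = None,
                       python: str = sys.executable) -> List[str]:
    """Assemble argv for `python -m mlx_audio.tts.generate` using this doc's flags."""
    cmd = [python, "-m", "mlx_audio.tts.generate", "--text", text, "--model", model]
    if voice:
        cmd += ["--voice", voice]
    if instruct:
        cmd += ["--instruct", instruct]
    if lang_code:
        cmd += ["--lang_code", lang_code]
    if max_tokens is not None:
        cmd += ["--max_tokens", str(max_tokens)]
    if join_audio:
        cmd.append("--join_audio")  # bare flag, as in the example above
    if output:
        cmd += ["--output", output]
    return cmd


if __name__ == "__main__":
    cmd = build_generate_cmd(
        "Your very long text here...",
        "mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit",
        voice="Serena", max_tokens=4096, join_audio=True, output="long_audio.wav",
    )
    print(shlex.join(cmd))  # shell-safe rendering of the same command
```

To synthesize, pass the list to subprocess.run(cmd, check=True) on a Mac with mlx-audio installed; no shell is involved, so the text is never re-parsed.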

Voice Cloning

python -m mlx_audio.tts.generate \
    --text "Cloned voice speaking" \
    --model mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit \
    --ref_audio sample.wav --ref_text "Sample transcript"
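The document does not state requirements for the reference clip. If you want to sanity-check sample.wav before cloning, the standard-library wave module can report its basic properties. The describe_reference name is hypothetical, and wave only reads uncompressed PCM WAV files:

```python
import wave
from typing import Dict, Union


def describe_reference(path: str) -> Dict[str, Union[int, float]]:
    """Report channels, sample rate, sample width and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate if rate else 0.0,
        }
```

For example, describe_reference("sample.wav") returns a small dict you can print or log before starting the clone.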

Parameters

| Parameter | Description | Values |
|---|---|---|
| --text | Text to speak | Required |
| --model | Model ID | See Models below |
| --voice | Preset voice (CustomVoice) | Chelsie, Serena, Aiden, Ryan... |
| --instruct | Voice description (VoiceDesign) or style/emotion (CustomVoice) | e.g., "excited", "calm", "professional" |
| --speed | Speaking rate | 0.5-2.0 (default: 1.0) |
| --pitch | Voice pitch | 0.5-2.0 (default: 1.0) |
| --lang_code | Language | en, cn, ja, ko, de, fr... |
| --ref_audio | Reference audio for cloning (Base) | File path |
| --ref_text | Transcript of the reference audio | Text |
| --output | Output file | Path (auto-generated if omitted) |
| --max_tokens | Maximum generation tokens | Integer (default: 2048); increase for long text |
| --join_audio | Merge audio segments into one file (macOS/MLX) | Flag, pass to enable; recommended for long text |
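The ranges and defaults in the parameter table can be checked before launching a generation. A minimal sketch; validate_params is a hypothetical helper, and its keys mirror the flag names without the leading dashes:

```python
from typing import Any, Dict, List


def validate_params(params: Dict[str, Any]) -> List[str]:
    """Check values against the ranges and defaults in the parameter table."""
    problems: List[str] = []
    if not params.get("text"):
        problems.append("--text is required")
    for name in ("speed", "pitch"):
        value = params.get(name, 1.0)  # table default: 1.0
        if not 0.5 <= float(value) <= 2.0:
            problems.append(f"--{name} must be within 0.5-2.0, got {value}")
    max_tokens = params.get("max_tokens", 2048)  # table default: 2048
    if not isinstance(max_tokens, int) or max_tokens <= 0:
        problems.append(f"--max_tokens must be a positive integer, got {max_tokens!r}")
    return problems
```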

Models

| Model | Size | Purpose |
|---|---|---|
| Qwen3-TTS-12Hz-1.7B-CustomVoice | 1.7B | 9 preset voices + style control |
| Qwen3-TTS-12Hz-1.7B-VoiceDesign | 1.7B | Text-based voice creation |
| Qwen3-TTS-12Hz-1.7B-Base | 1.7B | Voice cloning |
| Qwen3-TTS-12Hz-0.6B-* | 0.6B | Lightweight versions |

macOS: use the mlx-community/ prefix and a quantization suffix (e.g., mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit).
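The naming pattern in the Models table, plus the macOS prefix rule, can be composed mechanically. A sketch with a hypothetical model_id helper; it assumes the 8bit suffix used throughout this document's examples, returns only the bare model name for qwen-tts because the Linux/Windows repository prefix is not given here, and does not guarantee that every size/quantization combination is actually published:

```python
from typing import Optional

VARIANTS = ("CustomVoice", "VoiceDesign", "Base")
SIZES = ("1.7B", "0.6B")


def model_id(variant: str, size: str = "1.7B", backend: str = "mlx_audio",
             quant: Optional[str] = "8bit") -> str:
    """Compose a Qwen3-TTS model name following the Models table."""
    if variant not in VARIANTS or size not in SIZES:
        raise ValueError(f"unknown variant/size: {variant!r}, {size!r}")
    name = f"Qwen3-TTS-12Hz-{size}-{variant}"
    if backend != "mlx_audio":
        return name  # repository prefix for qwen-tts is not specified in this doc
    suffix = f"-{quant}" if quant else ""
    return f"mlx-community/{name}{suffix}"
```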

Scripts

  • scripts/tts_macos.py - macOS wrapper
  • scripts/tts_linux.py - Linux/Windows wrapper with optimizations

Optimizations (Linux/Windows)

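tts_linux.py automatically enables (where supported):

  • FlashAttention - faster attention with lower memory use (requires the optional flash-attn package)
  • bfloat16 - half the memory of float32 while keeping float32's dynamic range
  • Auto device - uses CUDA when available, otherwise falls back to CPU
  • Mixed precision - faster inference with little loss in quality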
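The list above describes what tts_linux.py turns on, not how. The following is a sketch of how such choices are commonly made with PyTorch, not the script's actual implementation. The decision logic is a pure function so it can be tested without a GPU; falling back to float16 on GPUs without bfloat16 support is our own choice, and flash_attention_2 and sdpa are attention-implementation names accepted by Hugging Face transformers, while whether qwen-tts passes them through the same way is an assumption:

```python
import importlib.util
from typing import Dict


def choose_runtime(cuda_available: bool, bf16_supported: bool,
                   flash_attn_installed: bool) -> Dict[str, str]:
    """Pick device, dtype and attention implementation from capability flags."""
    if not cuda_available:
        # CPU fallback: keep float32 and the default SDPA attention
        return {"device": "cpu", "dtype": "float32", "attn": "sdpa"}
    return {
        "device": "cuda",
        "dtype": "bfloat16" if bf16_supported else "float16",
        "attn": "flash_attention_2" if flash_attn_installed else "sdpa",
    }


def detect_runtime() -> Dict[str, str]:
    """Query torch (if installed) and the flash-attn package, then choose."""
    try:
        import torch
    except ImportError:
        return choose_runtime(False, False, False)
    cuda = torch.cuda.is_available()
    return choose_runtime(
        cuda_available=cuda,
        bf16_supported=cuda and torch.cuda.is_bf16_supported(),
        flash_attn_installed=importlib.util.find_spec("flash_attn") is not None,
    )
```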

Troubleshooting

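| Issue | Solution |
|---|---|
| macOS: Model not found | Use the mlx-community/ prefix |
| macOS: Audio format errors | brew install ffmpeg |
| Linux: CUDA out of memory (OOM) | Use the 0.6B models |
| Linux: Slow generation | Check that torch.cuda.is_available() returns True; if it returns False, inference is running on CPU |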

References

  • references/privacy_security.md - privacy and security details

Version

1.0.0 - See VERSION and package.json

Security Recommendations
This skill appears to do what it says: local, offline TTS wrappers for macOS (mlx_audio) and Linux/Windows (qwen-tts). Before installing or running it:

  • Expect large one-time downloads of model weights (from Hugging Face-style model IDs) and significant disk/GPU usage for the 1.7B models; the docs note smaller 0.6B alternatives.
  • If you require strict air-gapped operation, pre-download and verify the models and dependencies; the scripts call from_pretrained, which normally fetches files over the network.
  • Some model checkpoints may be gated and require a Hugging Face token (not declared by the skill); provide such credentials yourself if needed and verify the trustworthiness of the model source.
  • The registry metadata has minor mismatches (package.json lists a homepage but the registry lists none), and the tests reference a VERSION file that is not in the manifest. These are build/metadata inconsistencies, not direct security red flags, but you may want to confirm the repository and author (package.json points to https://github.com/irachex/local-tts).
  • The dependencies to install (mlx-audio, qwen-tts, torch, ffmpeg, optional flash-attn) are normal for TTS, but be prepared for native builds (flash-attn) and sizeable installs.

If you want to be extra cautious, review the upstream GitHub repo and the actual model sources before running the first model download.
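For air-gapped use, one common pattern is to download the model weights once while online and then disable Hub network access. A sketch, assuming the backend resolves model IDs through huggingface_hub and its local cache; the helper names prefetch and offline_env are hypothetical, while snapshot_download and the HF_HUB_OFFLINE environment variable are part of huggingface_hub:

```python
import os
from typing import Dict, Mapping, Optional


def prefetch(repo_id: str) -> str:
    """Download (or reuse) a model snapshot in the local Hugging Face cache."""
    from huggingface_hub import snapshot_download  # imported only when needed
    # No token is passed, so huggingface_hub falls back to HF_TOKEN or a saved
    # login; gated repositories need one of these.
    return snapshot_download(repo_id=repo_id)


def offline_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with Hugging Face Hub network access disabled."""
    env = dict(os.environ if base is None else base)
    env["HF_HUB_OFFLINE"] = "1"  # read by huggingface_hub in the child process
    return env
```

For example, run prefetch("mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit") once with network access, then launch generation with subprocess.run(cmd, env=offline_env(), check=True) so later runs read only from the cache.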
Functional Analysis
Type: OpenClaw Skill · Name: local-tts · Version: 1.0.0

The skill bundle provides a legitimate local text-to-speech utility using Qwen3-TTS models for macOS, Linux, and Windows. The Python wrappers (scripts/tts_linux.py and scripts/tts_macos.py) use standard machine learning libraries (torch, transformers, mlx_audio) and implement safe subprocess handling without shell injection risks. The documentation (SKILL.md and references/privacy_security.md) is consistent with the stated purpose, focusing on privacy and offline processing, and contains no evidence of malicious prompt injection or hidden instructions.
Capability Assessment
Purpose & Capability
Name/description (local Qwen3-TTS via mlx_audio or qwen-tts) match the included scripts and docs. The scripts call the expected libraries (mlx_audio on macOS, qwen-tts/torch on Linux/Windows) and expose the parameters described in SKILL.md. Minor metadata mismatch: registry metadata said homepage none but package.json includes a GitHub homepage, which is a non-security inconsistency in metadata.
Instruction Scope
SKILL.md and the scripts instruct only to run local TTS generation, reference local files (ref_audio) and standard parameters. They do not attempt to read arbitrary unrelated system files or exfiltrate data. The instructions do rely on downloading models (one-time) from model hosting (Hugging Face-style identifiers), which is documented in the README and references; that initial network activity is expected but should be acknowledged by users who require strict air-gapped operation.
Install Mechanism
No install spec in the registry (instruction-only), so nothing is auto-downloaded by the platform. The code relies on pip-installable packages (mlx-audio, qwen-tts, torch, flash-attn) plus the ffmpeg system binary, which are reasonable for this purpose. Model weights are loaded via from_pretrained calls, which fetch artifacts from model hosts; this is expected but can involve large downloads and possibly gated models that require credentials.
Credentials
The skill requests no environment variables, no credentials, and no config paths, which is consistent with a local, offline TTS tool. Caveat: some Hugging Face-hosted models can be gated and would require an HF_TOKEN (or equivalent) at download time; the skill does not declare such environment variables, so users should be prepared to provide tokens manually if needed. Disk, memory and GPU resource requirements (large model files, VRAM) are documented in the references.
Persistence & Privilege
Skill does not request always:true or any elevated/persistent privileges. It is a user-invocable wrapper that runs local Python code and runs subprocesses to installed libraries; this is proportional to its function.
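How to Use
  1. Make sure OpenClaw is installed (local or Docker deployment).
  2. In the chat box, enter the install command: /install local-tts
  3. After installation, invoke the skill by name or trigger it with /local-tts.
  4. Provide the inputs described in the skill's parameter documentation to get structured output.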
Version History
v1.0.0
Initial release: Local text-to-speech with Qwen3-TTS, supporting macOS (mlx_audio) and Linux/Windows (qwen-tts) with FlashAttention, bfloat16 optimizations. 9 natural preset voices, voice cloning, and voice design.
Metadata
Slug local-tts
Version 1.0.0
License MIT-0
Total installs 4
Current installs 4
Versions 1
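FAQ

What is Local TTS?

Local text-to-speech using Qwen3-TTS with mlx_audio (macOS Apple Silicon) or qwen-tts (Linux/Windows). Privacy-first offline TTS with natural, realistic voic... It is an AI agent skill plugin for Claude Code / OpenClaw, with 359 total downloads so far.

How do I install Local TTS?

Run the command "/install local-tts" in the OpenClaw or Claude Code chat box to install the skill in one step. The Python backend (mlx-audio or qwen-tts) and the model weights are installed and downloaded separately, as described in Quick Start.

Is Local TTS free?

Yes. Local TTS is free under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does Local TTS support?

Local TTS is cross-platform: it uses mlx_audio on macOS (Apple Silicon) and qwen-tts on Linux/Windows, and the skill can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Local TTS?

It is developed and maintained by irachex (@irachex); the current version is v1.0.0.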
