CosyVoice3 macOS
/install cosyvoice3-macos
CosyVoice3 TTS
Local text-to-speech using Alibaba's CosyVoice3 on macOS Apple Silicon.
Overview
CosyVoice3 is an advanced TTS system based on large language models, supporting:
- 9 languages: Chinese, English, Japanese, Korean, German, Spanish, French, Italian, Russian
- 18+ Chinese dialects: Cantonese, Sichuan, Dongbei, Shanghai, etc.
- Zero-shot voice cloning: Clone any voice from 3-10 seconds of audio
- Cross-lingual synthesis: Speak Chinese with an English voice, or vice versa
- Fine-grained control: Emotions, speed, volume via text tags
Prerequisites
- macOS with Apple Silicon (M1/M2/M3)
- Python 3.10
- Conda installed
- ~5GB disk space for models
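The prerequisites above can be checked before running the installer. The following is a minimal sketch, not part of the skill: `check_prerequisites` is a hypothetical helper, and it assumes a native (non-Rosetta) Python, where `platform.machine()` reports `arm64` on Apple Silicon.

```python
import platform
import shutil
import sys


def check_prerequisites(min_free_gb: float = 5.0, path: str = ".") -> dict:
    """Report which of the listed prerequisites appear to be satisfied."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return {
        "macos": platform.system() == "Darwin",
        # Assumption: native Python; under Rosetta this reports x86_64.
        "apple_silicon": platform.machine() == "arm64",
        "python_3_10": sys.version_info[:2] == (3, 10),
        "conda_on_path": shutil.which("conda") is not None,
        "disk_space": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```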
Installation
Run the installation script:
cd /Users/lhz/.openclaw/workspace/skills/cosyvoice3/scripts
bash install.sh
This will:
- Create conda environment cosyvoice
- Install PyTorch (CPU version for Apple Silicon)
- Install CosyVoice dependencies
- Download Fun-CosyVoice3-0.5B model (~2GB)
Usage
Quick Start - Basic TTS
Important: CosyVoice3 requires the <|endofprompt|> marker in the reference text.
cd /Users/lhz/.openclaw/workspace/cosyvoice3-repo
export PATH="$HOME/miniconda3/bin:$PATH"
conda activate cosyvoice
python -c "
import sys
sys.path.append('third_party/Matcha-TTS')
from cosyvoice.cli.cosyvoice import AutoModel
import torchaudio

cosyvoice = AutoModel(model_dir='pretrained_models/Fun-CosyVoice3-0.5B')
for i, j in enumerate(cosyvoice.inference_zero_shot(
    '你好,这是CosyVoice3语音合成测试。',
    '希望你以后能够做的比我还好呦。<|endofprompt|>',  # note this marker
    'asset/zero_shot_prompt.wav'
)):
    torchaudio.save('output.wav', j['tts_speech'], cosyvoice.sample_rate)
print('Generated: output.wav')
"
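Because the marker is easy to forget, a small helper can append it when it is absent. This is a minimal sketch: `with_end_of_prompt` is a hypothetical name, not part of CosyVoice, and it assumes the marker belongs at the end of the reference text, as in the example above.

```python
END_OF_PROMPT = "<|endofprompt|>"


def with_end_of_prompt(reference_text: str) -> str:
    """Append the marker unless the text already ends with it."""
    stripped = reference_text.rstrip()
    if stripped.endswith(END_OF_PROMPT):
        return stripped
    return stripped + END_OF_PROMPT
```

The result can then be passed as the second argument of inference_zero_shot.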
Using the TTS Script
Generate speech from text:
cd /Users/lhz/.openclaw/workspace/skills/cosyvoice3/scripts
conda activate cosyvoice
# Basic TTS with default voice
python tts.py "你好,这是一个测试。"
# With custom reference audio for voice cloning
python tts.py "你好,这是克隆的声音。" --reference /path/to/reference.wav
# Cross-lingual (English text with Chinese voice)
python tts.py "Hello, this is cross-lingual synthesis." --reference asset/zero_shot_prompt.wav --lang en
# With speed control
python tts.py "这是一段快速的语音。" --speed 1.5
# Save to specific path
python tts.py "你好。" --output ~/Desktop/greeting.wav
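The flags used above could be wired up with argparse roughly as follows. This is not the actual tts.py, whose source is not shown here; the defaults (asset/zero_shot_prompt.wav, speed 1.0, output.wav) are assumptions chosen for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(description="Sketch of a tts.py-style CLI")
    parser.add_argument("text", help="text to synthesize")
    # Assumed default: the bundled reference voice.
    parser.add_argument("--reference", default="asset/zero_shot_prompt.wav",
                        help="reference audio for voice cloning")
    parser.add_argument("--lang", default=None, help="language of the input text")
    # Assumed default: 1.0 means normal speed.
    parser.add_argument("--speed", type=float, default=1.0, help="speed multiplier")
    parser.add_argument("--output", default="output.wav", help="output WAV path")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```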
Available Assets
Reference audio files in cosyvoice3-repo/asset/:
- zero_shot_prompt.wav - Default Chinese female voice
- cross_lingual_prompt.wav - English prompt for cross-lingual synthesis
Advanced Features
Voice Cloning
Clone a voice from 3-10 seconds of reference audio:
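import sys
sys.path.append('third_party/Matcha-TTS')  # needed when running from the repository root, as in the Quick Start
from cosyvoice.cli.cosyvoice import AutoModel
import torchaudio

cosyvoice = AutoModel(model_dir='pretrained_models/Fun-CosyVoice3-0.5B')

# Clone voice and generate
for i, j in enumerate(cosyvoice.inference_zero_shot(
    '这是克隆后的声音在说话。',  # text to synthesize
    'Reference text transcription<|endofprompt|>',  # transcript of the reference audio, plus the required marker
    '/path/to/reference.wav'  # 3-10 seconds of reference audio
)):
    torchaudio.save('cloned.wav', j['tts_speech'], cosyvoice.sample_rate)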
Fine-Grained Control
Control prosody with special tags:
# Add laughter
"他突然[laughter]笑了起来[laughter]。"
# Add breathing
"他说完这句话[breath],深吸一口气。"
# Strong emphasis
"这是<strong>非常重要</strong>的。"
# Combined
"在面对挑战时,他展现了非凡的<strong>勇气</strong>与<strong>智慧</strong>[breath]。"
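When tagged text is built programmatically, it helps to insert the emphasis tags with a function and to verify that they are balanced before synthesis. The helpers below are a hypothetical sketch (the names `emphasize` and `strong_tags_balanced` are not part of CosyVoice) and operate on plain strings only.

```python
import re


def emphasize(text: str, phrase: str) -> str:
    """Wrap the first occurrence of phrase in <strong> tags."""
    if phrase not in text:
        raise ValueError(f"phrase not found: {phrase!r}")
    return text.replace(phrase, f"<strong>{phrase}</strong>", 1)


def strong_tags_balanced(text: str) -> bool:
    """Return True if every <strong> has a matching </strong> in order."""
    depth = 0
    for tag in re.findall(r"</?strong>", text):
        depth += 1 if tag == "<strong>" else -1
        if depth < 0:  # a closing tag appeared before its opening tag
            return False
    return depth == 0
```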
Dialect Support
Use instruct mode for dialects:
cosyvoice = AutoModel(model_dir='pretrained_models/CosyVoice-300M-Instruct')
for i, j in enumerate(cosyvoice.inference_instruct(
    '你好,这是测试语音。',  # text to synthesize
    '中文男',  # speaker ID
    '用四川话说这句话<|endofprompt|>'  # instruction: "say this sentence in Sichuan dialect"
)):
    torchaudio.save('sichuan.wav', j['tts_speech'], cosyvoice.sample_rate)
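The instruction string in the example follows a fixed pattern, so it can be generated for other dialects. This is a small sketch under the assumption that the same phrasing works for dialects other than Sichuan, which the example above does not confirm; `dialect_instruction` is a hypothetical helper.

```python
def dialect_instruction(dialect: str) -> str:
    """Build an instruct-mode prompt of the form used in the Sichuan example.

    dialect is the Chinese name of the dialect, e.g. "四川话".
    """
    return f"用{dialect}说这句话<|endofprompt|>"


print(dialect_instruction("四川话"))
```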
Troubleshooting
Model not found
If you get "model not found" errors, download models manually:
cd /Users/lhz/.openclaw/workspace/cosyvoice3-repo
export PATH="$HOME/miniconda3/bin:$PATH"
conda activate cosyvoice
python -c "
from modelscope import snapshot_download
snapshot_download('FunAudioLLM/Fun-CosyVoice3-0.5B-2512', local_dir='pretrained_models/Fun-CosyVoice3-0.5B')
"
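To avoid downloading the model again when it is already present, the download can be guarded by a directory check. This is a minimal sketch: `ensure_model` is a hypothetical helper, and it treats any non-empty directory as a complete model, which is an assumption (a partial download would not be detected).

```python
from pathlib import Path
from typing import Callable


def ensure_model(model_dir: str, download: Callable[[str], None]) -> bool:
    """Call download(model_dir) only if model_dir is missing or empty.

    Returns True if a download was triggered.
    """
    path = Path(model_dir)
    if path.is_dir() and any(path.iterdir()):
        return False
    download(model_dir)
    return True


# Example wiring with the download call shown above (requires modelscope):
# from modelscope import snapshot_download
# ensure_model(
#     'pretrained_models/Fun-CosyVoice3-0.5B',
#     lambda d: snapshot_download('FunAudioLLM/Fun-CosyVoice3-0.5B-2512', local_dir=d),
# )
```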
Memory issues
For long text, split into sentences:
text = "很长的文本..."
sentences = text.split('。')
for sent in sentences:
    if sent.strip():
        # Process each sentence, e.g. pass it to inference_zero_shot
        pass
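Splitting on '。' alone drops the punctuation and ignores question and exclamation marks. A slightly more careful splitter might look like the sketch below; `split_sentences` is a hypothetical helper, and the set of sentence-ending characters is an assumption that can be adjusted.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split after Chinese or ASCII sentence-ending marks, keeping the marks.

    The ASCII period is deliberately excluded so decimals such as 3.5 stay intact.
    """
    parts = re.split(r"(?<=[。!?!?])", text)
    return [part.strip() for part in parts if part.strip()]


for sentence in split_sentences("你好。这是测试!还有吗?"):
    print(sentence)
```

Each returned sentence can then be synthesized separately and the resulting audio segments joined afterwards.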
Audio format
Reference audio requirements:
- Format: WAV, MP3
- Sample rate: 16kHz+ (automatically resampled)
- Duration: 3-10 seconds optimal
- Content: Clear speech, minimal background noise
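The sample rate and duration requirements can be checked before cloning. The sketch below uses only the standard library `wave` module, so it handles uncompressed PCM WAV files only (not MP3); `check_reference_wav` is a hypothetical helper and the thresholds simply mirror the list above.

```python
import wave


def check_reference_wav(path: str, min_rate: int = 16000,
                        min_seconds: float = 3.0,
                        max_seconds: float = 10.0) -> list[str]:
    """Return a list of problems found in a PCM WAV file (empty if none)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    problems = []
    if rate < min_rate:
        problems.append(f"sample rate {rate} Hz is below {min_rate} Hz")
    if not min_seconds <= duration <= max_seconds:
        problems.append(
            f"duration {duration:.1f}s is outside {min_seconds}-{max_seconds}s"
        )
    return problems
```

Clarity and background noise cannot be judged this way and still need a listening check.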
Resources
Scripts
- install.sh - Installation script for macOS
- tts.py - Main TTS script with CLI interface
- download_models.py - Download pretrained models
References
Model Files
Located in cosyvoice3-repo/pretrained_models/:
- Fun-CosyVoice3-0.5B/ - Main model (recommended)
- CosyVoice2-0.5B/ - Previous version
- CosyVoice-300M/ - Lighter model
- CosyVoice-300M-SFT/ - SFT version
- CosyVoice-300M-Instruct/ - Instruct version
Notes
- First inference takes ~30 seconds (model warmup)
- Subsequent inferences are faster
- Apple Silicon uses CPU mode (no CUDA)
- RTF (real-time factor) ~0.3-0.5 on M-series chips
- Model files are cached locally after first download
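The real-time factor mentioned in the notes is the synthesis time divided by the duration of the generated audio, so values below 1 mean faster than real time. A minimal sketch of the calculation, with `real_time_factor` as a hypothetical helper name:

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by generated audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Example: 4 seconds of compute for 10 seconds of audio.
rtf = real_time_factor(4.0, 10.0)
print(f"RTF = {rtf:.2f}")
```

With CosyVoice, audio_seconds could be taken as j['tts_speech'].shape[1] / cosyvoice.sample_rate, assuming the tensor is laid out as (channels, samples), and synthesis_seconds measured with time.perf_counter() around the inference call.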
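- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: /install cosyvoice3-macos
- Once installed, invoke the Skill by name or trigger it with /cosyvoice3-macos
- Provide the inputs described in the Skill's parameter documentation to get structured output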
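What is CosyVoice3 macOS?
Local text-to-speech using Alibaba's CosyVoice3 on macOS Apple Silicon. Supports Chinese, English, Japanese, Korean, and 18+ Chinese dialects. Provides zero-... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 645 downloads to date.
How do I install CosyVoice3 macOS?
Run the command "/install cosyvoice3-macos" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is needed.
Is CosyVoice3 macOS free?
Yes. CosyVoice3 macOS is completely free (free and open source) and can be freely downloaded, installed, and used.
Which platforms does CosyVoice3 macOS support?
CosyVoice3 macOS is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who develops CosyVoice3 macOS?
It is developed and maintained by lhuaizhong (@lhuaizhong); the current version is v1.0.0.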