openlark

VoxCPM2 — Tokenizer-Free Multilingual TTS

Author: OpenLark · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 18 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw
/install voxcpm
Description
VoxCPM2 — Tokenizer-Free TTS model guide. Covers installation, Python/CLI API (TTS/Voice Design/Controllable Cloning/Ultimate Cloning/Streaming), vLLM-Omni d...
Usage (SKILL.md)

VoxCPM2 — Tokenizer-Free Multilingual TTS

A tokenizer-free TTS model from OpenBMB, based on a diffusion autoregressive architecture and built on MiniCPM-4. It has 2B parameters, was trained on 2M+ hours of audio, supports 30 languages, and produces 48kHz output.

Architecture: LocEnc → TSLM → RALM → LocDiT, AudioVAE V2 asymmetric 16kHz→48kHz.

Installation

pip install voxcpm  # requires Python ≥3.10, PyTorch ≥2.5, CUDA ≥12
from voxcpm import VoxCPM
model = VoxCPM.from_pretrained("openbmb/VoxCPM2", device="auto")  # tries cuda, then mps, then cpu
# If torch.compile causes problems, set optimize=False.
# To use a Hugging Face mirror: export HF_ENDPOINT=https://hf-mirror.com
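
The `cuda→mps→cpu` comment suggests that `device="auto"` tries backends in that order. A minimal sketch of that fallback, assuming that ordering is what the library does. `pick_device` is a hypothetical helper, not part of the voxcpm package, and it takes availability flags as arguments so the sketch does not depend on PyTorch:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the first available backend in the order cuda, mps, cpu.

    Hypothetical helper for illustration; the ordering is taken from the
    'cuda→mps→cpu' comment in the installation snippet.
    """
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# Example: an Apple Silicon laptop without CUDA.
print(pick_device(cuda_available=False, mps_available=True))
```

With PyTorch installed, the two flags could come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.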

Models

| | V2 (2B) | 1.5 (0.8B) | 0.5B |
|---|---|---|---|
| Sample Rate | 48kHz | 44.1kHz | 16kHz |
| Languages | 30 | 2 (zh/en) | 2 (zh/en) |
| Voice Design | | | |
| VRAM / RTF | ~8GB / ~0.30 | ~6GB / ~0.15 | ~5GB / ~0.17 |
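
RTF (real-time factor) is conventionally the compute time divided by the duration of the generated audio, so values below 1 mean faster-than-real-time synthesis. As a rough sketch, the approximate RTF figures from the table can be turned into an estimated synthesis time. `estimate_synthesis_seconds` is a hypothetical helper, and actual figures depend on hardware and settings:

```python
def estimate_synthesis_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimate compute time from audio duration and real-time factor (RTF)."""
    if audio_seconds < 0 or rtf <= 0:
        raise ValueError("audio_seconds must be >= 0 and rtf must be > 0")
    return audio_seconds * rtf


# Approximate RTF values copied from the model table (hardware dependent).
APPROX_RTF = {"V2 (2B)": 0.30, "1.5 (0.8B)": 0.15, "0.5B": 0.17}

for name, rtf in APPROX_RTF.items():
    seconds = estimate_synthesis_seconds(60, rtf)
    print(f"{name}: roughly {seconds:.1f} s of compute for 60 s of audio")
```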

30 languages: Chinese, English, Japanese, Korean, French, German, Spanish, Italian, Russian, Arabic, Hindi, Thai, Vietnamese, Turkish, Dutch, Finnish, Norwegian, Swedish, Danish, Polish, Portuguese, Greek, Hebrew, Indonesian, Malay, Burmese, Khmer, Lao, Swahili, Tagalog + 9 Chinese dialects (Sichuan, Cantonese, Wu, Northeastern, Henan, Shaanxi, Shandong, Tianjin, Minnan)

Python API

from voxcpm import VoxCPM
import soundfile as sf

model = VoxCPM.from_pretrained("openbmb/VoxCPM2", load_denoiser=False)

# Basic TTS
wav = model.generate("Hello!", cfg_value=2.0, inference_timesteps=10)
sf.write("out.wav", wav, model.tts_model.sample_rate)

# Voice Design (text description → voice, no reference audio needed)
wav = model.generate("(A young woman, gentle voice)Hello!")

# Controllable Cloning (reference audio + style control)
wav = model.generate("Hello.", reference_wav_path="voice.wav")
wav = model.generate("(faster, cheerful)Hi.", reference_wav_path="voice.wav")

# Ultimate Cloning (reference audio + transcript for full detail reproduction)
wav = model.generate("Text.", prompt_wav_path="ref.wav", prompt_text="transcript", reference_wav_path="ref.wav")

# Streaming
import numpy as np
wav = np.concatenate([c for c in model.generate_streaming("Streaming!")])
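
The streaming example above collects every chunk in memory before concatenating. Where chunks should be written out as they arrive, something like the following sketch could work, using only the standard-library `wave` module. It assumes each chunk is a sequence of mono float samples in [-1, 1] at 48kHz, which is not confirmed by the source. `fake_stream` is a hypothetical stand-in for `model.generate_streaming(...)`, and `write_stream_to_wav` is likewise illustrative:

```python
import math
import struct
import wave
from typing import Iterable, Iterator, List, Sequence


def fake_stream(n_chunks: int = 3, chunk_len: int = 480, rate: int = 48000) -> Iterator[List[float]]:
    """Hypothetical stand-in for model.generate_streaming(): yields a 440 Hz tone in chunks."""
    for i in range(n_chunks):
        start = i * chunk_len
        yield [0.2 * math.sin(2 * math.pi * 440 * (start + n) / rate) for n in range(chunk_len)]


def write_stream_to_wav(chunks: Iterable[Sequence[float]], path: str, rate: int = 48000) -> int:
    """Write float chunks to a 16-bit mono WAV file as they arrive; return the frame count."""
    frames = 0
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono (assumption)
        wf.setsampwidth(2)   # 16-bit PCM
        wf.setframerate(rate)
        for chunk in chunks:
            # Clip to [-1, 1] and scale to the signed 16-bit range.
            pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in chunk]
            wf.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
            frames += len(pcm)
    return frames


if __name__ == "__main__":
    total = write_stream_to_wav(fake_stream(), "stream_out.wav")
    print(f"wrote {total} frames")
```

In real use, `fake_stream()` would be replaced by the generator returned from `model.generate_streaming(...)`, and `rate` by the model's reported sample rate.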

generate() parameters: text (required), reference_wav_path, prompt_wav_path, prompt_text, cfg_value=2.0 (range 1-3), inference_timesteps=10 (range 4-30), normalize=False, denoise=False, retry_badcase=True
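
The parameter list and the `(description)text` prompt pattern from the examples can be combined into a small pre-flight check. The sketch below uses a hypothetical helper, `build_generate_kwargs`, that is not part of voxcpm. Treating the listed ranges as hard limits, and passing `text` as a keyword argument, are both assumptions:

```python
from typing import Any, Dict, Optional


def build_generate_kwargs(
    text: str,
    control: Optional[str] = None,
    cfg_value: float = 2.0,
    inference_timesteps: int = 10,
    **extra: Any,
) -> Dict[str, Any]:
    """Validate common generate() arguments and prepend an optional control description."""
    if not text:
        raise ValueError("text is required")
    if not 1.0 <= cfg_value <= 3.0:
        raise ValueError("cfg_value is documented with a range of 1-3")
    if not 4 <= inference_timesteps <= 30:
        raise ValueError("inference_timesteps is documented with a range of 4-30")
    # Voice Design / style control uses a parenthesised prefix, e.g. "(faster, cheerful)Hi."
    prompt = f"({control}){text}" if control else text
    return {
        "text": prompt,
        "cfg_value": cfg_value,
        "inference_timesteps": inference_timesteps,
        **extra,
    }


kwargs = build_generate_kwargs("Hi.", control="faster, cheerful", reference_wav_path="voice.wav")
print(kwargs["text"])
```

The resulting dictionary could then be passed along as `model.generate(**kwargs)`.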

CLI

voxcpm design --text "Hello" --control "Young female warm voice" --output out.wav --device auto
voxcpm clone --text "Hi" --reference-audio voice.wav --prompt-audio ref.wav --prompt-text "txt" --output out.wav
voxcpm batch --input examples/input.txt --output-dir outs

Web Demo

git clone https://github.com/OpenBMB/VoxCPM.git && cd VoxCPM && pip install -e .
python app.py --port 8808 --device auto

Deployment

vLLM-Omni (recommended, OpenAI-compatible)

uv pip install vllm==0.19.0 --torch-backend=auto
git clone https://github.com/vllm-project/vllm-omni.git && cd vllm-omni && uv pip install -e .
vllm serve openbmb/VoxCPM2 --omni --port 8000
curl http://localhost:8000/v1/audio/speech -H "Content-Type:application/json" -d '{"model":"openbmb/VoxCPM2","input":"Hello!","voice":"default"}' --output out.wav
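
The same request can be issued from Python. The sketch below mirrors the curl call using only the standard library. The endpoint, model name, and payload fields are taken from the command above, while the helper names and the assumption that the response body is raw audio bytes are illustrative:

```python
import json
import urllib.request


def build_speech_request(
    text: str,
    base_url: str = "http://localhost:8000",
    model: str = "openbmb/VoxCPM2",
    voice: str = "default",
) -> urllib.request.Request:
    """Build a POST request for the /v1/audio/speech endpoint shown in the curl example."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str = "out.wav") -> None:
    """Send the request and save the response body (assumed to be audio bytes)."""
    with urllib.request.urlopen(build_speech_request(text)) as resp, open(out_path, "wb") as f:
        f.write(resp.read())


if __name__ == "__main__":
    synthesize("Hello!")  # requires the vLLM-Omni server to be running on port 8000
```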

Nano-vLLM: pip install nano-vllm-voxcpm (RTF ~0.13 vs standard ~0.30)

Fine-tuning

# LoRA (recommended)
python scripts/train_voxcpm_finetune.py --config_path conf/voxcpm_v2/voxcpm_finetune_lora.yaml
# Full fine-tuning
python scripts/train_voxcpm_finetune.py --config_path conf/voxcpm_v2/voxcpm_finetune_all.yaml
# WebUI
python lora_ft_webui.py  # http://localhost:7860

Data format: JSONL, one record per line, e.g. {"audio":"path","text":"transcript","ref_audio":"path"}. It is recommended that 30-50% of samples include ref_audio. LoRA parameters: r=32, alpha=16. Adapters are hot-swappable (load_lora / unload_lora / set_lora_enabled). A speaker can be adapted with as little as 5-10 minutes of audio.
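
A training manifest in this format could be generated along the following lines. This is a sketch only: `build_manifest` is a hypothetical helper, the file paths are placeholders, and omitting the `ref_audio` key for samples without a reference is an assumption about what the training script accepts:

```python
import json
import random
from typing import List, Tuple


def build_manifest(
    samples: List[Tuple[str, str]],
    ref_audio: str,
    ref_fraction: float = 0.4,
    seed: int = 0,
) -> List[str]:
    """Return JSONL lines; roughly `ref_fraction` of the records carry a ref_audio field."""
    if not 0.0 <= ref_fraction <= 1.0:
        raise ValueError("ref_fraction must be between 0 and 1")
    rng = random.Random(seed)
    n_ref = round(len(samples) * ref_fraction)
    with_ref = set(rng.sample(range(len(samples)), n_ref))
    lines = []
    for i, (audio, text) in enumerate(samples):
        record = {"audio": audio, "text": text}
        if i in with_ref:
            record["ref_audio"] = ref_audio
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines


# Placeholder paths and transcripts for illustration.
samples = [(f"data/utt_{i:03d}.wav", f"transcript {i}") for i in range(10)]
manifest = build_manifest(samples, ref_audio="data/ref.wav", ref_fraction=0.4)
print("\n".join(manifest[:3]))
```

Writing `"\n".join(manifest)` to a file yields the JSONL file; the default `ref_fraction=0.4` sits inside the recommended 30-50% band.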

License

Apache 2.0 — free for commercial use

Safe-use guidance
Install only if you will use it with your own voice or voices you are explicitly authorized to reproduce. Do not use it to impersonate people, create deceptive audio, or clone a speaker without documented consent; handle any uploaded voice samples as sensitive personal data.
Capability assessment
Purpose & Capability
Voice cloning, controllable cloning, and speaker adaptation are coherent with a TTS skill, but they are high-impact identity-replication capabilities and need explicit consent and non-impersonation limits.
Instruction Scope
The documented runtime guidance appears to enable cloning-style use without clearly scoping it to the user's own voice or authorized speakers.
Install Mechanism
No malicious install behavior was evidenced in the supplied scan context; the concern is the capability and under-disclosed safety posture rather than installation.
Credentials
A TTS or voice-cloning workflow may reasonably need audio inputs, model/provider access, and generated audio outputs, but users should treat source voice samples as sensitive biometric data.
Persistence & Privilege
No artifact-backed evidence of hidden persistence, privilege escalation, destructive behavior, or exfiltration was provided.
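How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install voxcpm
  3. Once installation completes, invoke the skill by name or trigger it with /voxcpm.
  4. Provide the inputs described in the skill's parameter documentation to get structured output.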
Version history
v1.0.0
- Initial release of voxcpm: a tokenizer-free, multilingual TTS model guide based on VoxCPM2.
- Features installation steps, Python and CLI usage for TTS, voice design, voice cloning, and streaming.
- Includes instructions for vLLM-Omni OpenAI-compatible deployment and fine-tuning (SFT/LoRA).
- Supports 30 languages, high-quality 48kHz output, and advanced voice control features.
- Provides model comparisons, sample commands, and Web demo setup.
- Licensed under Apache 2.0 for free commercial use.
Metadata
Slug: voxcpm
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 1
FAQ

What is the VoxCPM2 Tokenizer-Free Multilingual TTS skill?

VoxCPM2 Tokenizer-Free TTS model guide. Covers installation, Python/CLI API (TTS/Voice Design/Controllable Cloning/Ultimate Cloning/Streaming), vLLM-Omni d... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 18 times so far.

How do I install it?

Run the command "/install voxcpm" in the OpenClaw or Claude Code chat box. Installation takes one step and needs no further configuration.

Is it free?

Yes. The skill is completely free and is released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does it support?

The skill is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops it?

It is developed and maintained by OpenLark (@openlark). The current version is v1.0.0.
