
VoxCPM2 — Tokenizer-Free Multilingual TTS

Author: openlark

by OpenLark · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Install in OpenClaw

/install voxcpm

Description

VoxCPM2 — Tokenizer-Free TTS model guide. Covers installation, Python/CLI API (TTS/Voice Design/Controllable Cloning/Ultimate Cloning/Streaming), vLLM-Omni d...

README (SKILL.md)

VoxCPM2 — Tokenizer-Free Multilingual TTS

A tokenizer-free TTS model from OpenBMB, built on MiniCPM-4 with a diffusion autoregressive architecture. 2B parameters, trained on 2M+ hours of data, 30 languages, 48kHz output.

Architecture: LocEnc → TSLM → RALM → LocDiT, AudioVAE V2 asymmetric 16kHz→48kHz.

Installation

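pip install voxcpm  # Python ≥3.10, PyTorch ≥2.5, CUDA ≥12

# Python: load the model; device="auto" tries cuda, then mps, then cpu
from voxcpm import VoxCPM
model = VoxCPM.from_pretrained("openbmb/VoxCPM2", device="auto")
# If torch.compile causes problems, set optimize=False
# To download through a Hugging Face mirror: export HF_ENDPOINT=https://hf-mirror.com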
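
The minimum versions stated in the install comment can be checked before loading the model. The following is a minimal preflight sketch that assumes only those stated minimums (Python 3.10, PyTorch 2.5); `version_tuple`, `meets_minimum`, and `preflight` are hypothetical helper names, not part of the voxcpm package.

```python
import sys
from importlib import metadata
from itertools import takewhile


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn a version string such as '2.5.1+cu121' into (2, 5, 1)."""
    core = version.split("+", 1)[0]  # drop local build tags like '+cu121'
    parts = []
    for piece in core.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break  # stop at non-numeric segments such as 'dev2024'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(version: str, minimum: tuple[int, ...]) -> bool:
    """True if the parsed version is at least the given minimum."""
    return version_tuple(version) >= minimum


def preflight() -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info[:2] < (3, 10):
        problems.append("Python 3.10 or newer is required")
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if not meets_minimum(torch_version, (2, 5)):
            problems.append(f"PyTorch {torch_version} is older than 2.5")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("preflight:", problem)
```

The check does not cover the CUDA requirement, which depends on the driver and the PyTorch build in use.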

Models

Model	V2 (2B)	1.5 (0.8B)	0.5B
Sample Rate	48kHz	44.1kHz	16kHz
Languages	30	2 (zh/en)	2 (zh/en)
Voice Design	✅	—	—
VRAM/RTF	~8GB/~0.30	~6GB/~0.15	~5GB/~0.17
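
RTF (real-time factor) is conventionally synthesis time divided by the duration of the audio produced, so values below 1 mean faster than real time. Assuming that definition applies to the figures in the table, a rough estimate of generation time could look like this; the dictionary keys and the function name are illustrative only, and actual speed depends on hardware.

```python
# Approximate RTF values copied from the table above.
RTF_BY_MODEL = {"V2 (2B)": 0.30, "1.5 (0.8B)": 0.15, "0.5B": 0.17}


def estimate_synthesis_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimated compute time, assuming RTF = synthesis time / audio duration."""
    if audio_seconds < 0 or rtf <= 0:
        raise ValueError("audio_seconds must be >= 0 and rtf must be > 0")
    return audio_seconds * rtf


if __name__ == "__main__":
    for name, rtf in RTF_BY_MODEL.items():
        seconds = estimate_synthesis_seconds(60.0, rtf)
        print(f"{name}: ~{seconds:.1f} s of compute per minute of audio")
```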

30 languages: Chinese, English, Japanese, Korean, French, German, Spanish, Italian, Russian, Arabic, Hindi, Thai, Vietnamese, Turkish, Dutch, Finnish, Norwegian, Swedish, Danish, Polish, Portuguese, Greek, Hebrew, Indonesian, Malay, Burmese, Khmer, Lao, Swahili, Tagalog + 9 Chinese dialects (Sichuan, Cantonese, Wu, Northeastern, Henan, Shaanxi, Shandong, Tianjin, Minnan)

Python API

from voxcpm import VoxCPM
import soundfile as sf

model = VoxCPM.from_pretrained("openbmb/VoxCPM2", load_denoiser=False)

# Basic TTS
sf.write("out.wav", model.generate("Hello!", cfg_value=2.0, inference_timesteps=10), model.tts_model.sample_rate)

# Voice Design (text description → voice, no reference audio needed)
wav = model.generate("(A young woman, gentle voice)Hello!")

# Controllable Cloning (reference audio + style control)
wav = model.generate("Hello.", reference_wav_path="voice.wav")
wav = model.generate("(faster, cheerful)Hi.", reference_wav_path="voice.wav")

# Ultimate Cloning (reference audio + transcript for full detail reproduction)
wav = model.generate("Text.", prompt_wav_path="ref.wav", prompt_text="transcript", reference_wav_path="ref.wav")

# Streaming
import numpy as np
wav = np.concatenate([c for c in model.generate_streaming("Streaming!")])
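
If the audio should go to disk as it arrives rather than being concatenated in memory, the chunks can be written incrementally. This sketch uses only the standard library and assumes each chunk yielded by `generate_streaming` is a one-dimensional sequence of mono float samples in [-1, 1]; the 48000 Hz default reflects the stated VoxCPM2 output rate, but in practice it is safer to pass `model.tts_model.sample_rate`. `write_stream_to_wav` is a hypothetical helper.

```python
import struct
import wave
from typing import Iterable, Sequence


def write_stream_to_wav(
    chunks: Iterable[Sequence[float]], path: str, sample_rate: int = 48000
) -> int:
    """Write float chunks to a 16-bit mono WAV file; return samples written."""
    total = 0
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        for chunk in chunks:
            # Clip each sample to [-1, 1], then scale to the int16 range.
            pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in chunk]
            wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
            total += len(pcm)
    return total


# Usage (requires a loaded model):
# write_stream_to_wav(model.generate_streaming("Streaming!"), "stream.wav",
#                     model.tts_model.sample_rate)
```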

generate() parameters: text (required), reference_wav_path, prompt_wav_path, prompt_text, cfg_value=2.0 (range 1-3), inference_timesteps=10 (range 4-30), normalize=False, denoise=False, retry_badcase=True
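
The parenthesised text prefix used for voice and style control, together with the ranges listed for `cfg_value` and `inference_timesteps`, lends itself to a small wrapper. This is a sketch, not part of the voxcpm package: `with_control` and `checked_params` are hypothetical helpers, and treating the listed ranges as hard limits is an assumption.

```python
def with_control(text: str, control: str = "") -> str:
    """Prefix text with a parenthesised description, e.g. '(faster, cheerful)Hi.'"""
    control = control.strip()
    return f"({control}){text}" if control else text


def checked_params(cfg_value: float = 2.0, inference_timesteps: int = 10) -> dict:
    """Return generate() keyword arguments after checking the listed ranges."""
    if not 1.0 <= cfg_value <= 3.0:
        raise ValueError(f"cfg_value {cfg_value} is outside the listed 1-3 range")
    if not 4 <= inference_timesteps <= 30:
        raise ValueError(
            f"inference_timesteps {inference_timesteps} is outside the listed 4-30 range"
        )
    return {"cfg_value": cfg_value, "inference_timesteps": inference_timesteps}


# Usage (requires a loaded model):
# wav = model.generate(with_control("Hello!", "A young woman, gentle voice"),
#                      **checked_params(2.0, 10))
```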

CLI

voxcpm design --text "Hello" --control "Young female warm voice" --output out.wav --device auto
voxcpm clone --text "Hi" --reference-audio voice.wav --prompt-audio ref.wav --prompt-text "txt" --output out.wav
voxcpm batch --input examples/input.txt --output-dir outs
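
When the CLI is driven from a script, passing the arguments as a list avoids shell-quoting problems with free-form text. A minimal sketch for the `design` subcommand, using only the flags shown above; `design_command` and `run_design` are hypothetical wrapper names.

```python
import subprocess


def design_command(text: str, control: str, output: str, device: str = "auto") -> list[str]:
    """Build the argument list for `voxcpm design` from the flags shown above."""
    return [
        "voxcpm", "design",
        "--text", text,
        "--control", control,
        "--output", output,
        "--device", device,
    ]


def run_design(text: str, control: str, output: str, device: str = "auto") -> None:
    """Run the CLI; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(design_command(text, control, output, device), check=True)


# Usage (requires the voxcpm CLI on PATH):
# run_design("Hello", "Young female warm voice", "out.wav")
```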

Web Demo

git clone https://github.com/OpenBMB/VoxCPM.git && cd VoxCPM && pip install -e .
python app.py --port 8808 --device auto

Deployment

vLLM-Omni (recommended, OpenAI-compatible)

uv pip install vllm==0.19.0 --torch-backend=auto
git clone https://github.com/vllm-project/vllm-omni.git && cd vllm-omni && uv pip install -e .
vllm serve openbmb/VoxCPM2 --omni --port 8000
curl http://localhost:8000/v1/audio/speech -H "Content-Type:application/json" -d '{"model":"openbmb/VoxCPM2","input":"Hello!","voice":"default"}' --output out.wav
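
The same request can be issued from Python without extra dependencies. This sketch mirrors the curl call above (same endpoint, headers, and JSON fields) and assumes the server responds with raw audio bytes, as the `--output out.wav` flag suggests; `build_speech_request` and `synthesize_to_file` are hypothetical helper names.

```python
import json
import urllib.request


def build_speech_request(
    text: str,
    base_url: str = "http://localhost:8000",
    model: str = "openbmb/VoxCPM2",
    voice: str = "default",
) -> urllib.request.Request:
    """Build the POST request for the /v1/audio/speech endpoint."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, path: str, **kwargs) -> None:
    """Send the request and save the response body (assumed to be audio)."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())


# Usage (requires the server started with `vllm serve` as above):
# synthesize_to_file("Hello!", "out.wav")
```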

Nano-vLLM: pip install nano-vllm-voxcpm (RTF ~0.13 vs standard ~0.30)

Fine-tuning

# LoRA (recommended)
python scripts/train_voxcpm_finetune.py --config_path conf/voxcpm_v2/voxcpm_finetune_lora.yaml
# Full fine-tuning
python scripts/train_voxcpm_finetune.py --config_path conf/voxcpm_v2/voxcpm_finetune_all.yaml
# WebUI
python lora_ft_webui.py  # http://localhost:7860

Data format (JSONL, one object per line): {"audio":"path","text":"transcript","ref_audio":"path"}. Including ref_audio in 30-50% of samples is recommended. LoRA parameters: r=32, alpha=16. Adapters are hot-swappable (load_lora/unload_lora/set_lora_enabled). Adapting to a speaker is possible with as little as 5-10 minutes of audio.
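
A training manifest in this format can be generated with a few lines of Python. The sketch below attaches `ref_audio` to roughly 40% of rows, within the recommended 30-50% band. It assumes that rows without a reference simply omit the `ref_audio` key and that any clip from the pool is an acceptable reference; neither assumption is confirmed by the text above. `build_manifest` is a hypothetical helper.

```python
import json
import random


def build_manifest(samples, ref_audio_pool, ref_fraction=0.4, seed=0):
    """Return JSONL lines for (audio_path, transcript) pairs.

    About `ref_fraction` of the rows get a `ref_audio` entry drawn from
    `ref_audio_pool`; the remaining rows omit the key (an assumption).
    """
    if not 0.0 <= ref_fraction <= 1.0:
        raise ValueError("ref_fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded for reproducible manifests
    n_with_ref = round(len(samples) * ref_fraction)
    if n_with_ref and not ref_audio_pool:
        raise ValueError("ref_audio_pool is empty")
    with_ref = set(rng.sample(range(len(samples)), n_with_ref))
    lines = []
    for index, (audio, text) in enumerate(samples):
        row = {"audio": audio, "text": text}
        if index in with_ref:
            row["ref_audio"] = rng.choice(ref_audio_pool)
        lines.append(json.dumps(row, ensure_ascii=False))
    return lines


# Usage:
# rows = build_manifest([("clips/001.wav", "Hello there.")], ["refs/a.wav"])
# with open("train.jsonl", "w", encoding="utf-8") as f:
#     f.write("\n".join(rows) + "\n")
```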

License

Apache 2.0 — free for commercial use

Usage Guidance

Install only if you will use it with your own voice or voices you are explicitly authorized to reproduce. Do not use it to impersonate people, create deceptive audio, or clone a speaker without documented consent; handle any uploaded voice samples as sensitive personal data.

Capability Assessment

⚠ Purpose & Capability

Voice cloning, controllable cloning, and speaker adaptation are coherent with a TTS skill, but they are high-impact identity-replication capabilities and need explicit consent and non-impersonation limits.

⚠ Instruction Scope

The documented runtime guidance appears to enable cloning-style use without clearly scoping it to the user's own voice or authorized speakers.

ℹ Install Mechanism

No malicious install behavior was evidenced in the supplied scan context; the concern is the capability and under-disclosed safety posture rather than installation.

ℹ Credentials

A TTS or voice-cloning workflow may reasonably need audio inputs, model/provider access, and generated audio outputs, but users should treat source voice samples as sensitive biometric data.

ℹ Persistence & Privilege

No artifact-backed evidence of hidden persistence, privilege escalation, destructive behavior, or exfiltration was provided.

How to Use

1. Make sure OpenClaw is installed (local or Docker).
2. Run the install command in chat: /install voxcpm
3. After installation, invoke the skill by name or use /voxcpm
4. Provide the required inputs per the skill's parameter spec and get structured output.

Version History

v1.0.0

- Initial release of voxcpm: a tokenizer-free, multilingual TTS model guide based on VoxCPM2.
- Features installation steps, Python and CLI usage for TTS, voice design, voice cloning, and streaming.
- Includes instructions for vLLM-Omni OpenAI-compatible deployment and fine-tuning (SFT/LoRA).
- Supports 30 languages, high-quality 48kHz output, and advanced voice control features.
- Provides model comparisons, sample commands, and Web demo setup.
- Licensed under Apache 2.0 for free commercial use.

Metadata

Slug voxcpm

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is VoxCPM2 — Tokenizer-Free Multilingual TTS?

VoxCPM2 — Tokenizer-Free TTS model guide. Covers installation, Python/CLI API (TTS/Voice Design/Controllable Cloning/Ultimate Cloning/Streaming), vLLM-Omni d... It is an AI Agent Skill for Claude Code / OpenClaw, with 18 downloads so far.

How do I install VoxCPM2 — Tokenizer-Free Multilingual TTS?

Run "/install voxcpm" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is VoxCPM2 — Tokenizer-Free Multilingual TTS free?

Yes, VoxCPM2 — Tokenizer-Free Multilingual TTS is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does VoxCPM2 — Tokenizer-Free Multilingual TTS support?

It is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created VoxCPM2 — Tokenizer-Free Multilingual TTS?

It is built and maintained by OpenLark (@openlark); the current version is v1.0.0.
