coze-voice-gen
/install coze-voice-gen
Coze Voice Generation
Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) using coze-coding-dev-sdk.
Text-to-Speech (TTS)
Single Audio
npx ts-node {baseDir}/scripts/tts.ts --text "Hello, welcome to our service!"
With Different Voice
npx ts-node {baseDir}/scripts/tts.ts \
--text "This is a male voice" \
--speaker zh_male_m191_uranus_bigtts
Batch Generation
npx ts-node {baseDir}/scripts/tts.ts \
--texts "Chapter 1: Introduction" "Chapter 2: Getting Started" "Chapter 3: Advanced Topics" \
--speaker zh_female_xueayi_saturn_bigtts
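When the batch command is launched from another program rather than typed by hand, it can help to build the arguments as an array so that texts containing spaces or quotes need no shell escaping. The sketch below is a minimal illustration: `buildTtsArgs` and `TtsRequest` are hypothetical names, not part of the skill, and only the `--text`, `--texts`, and `--speaker` flags documented here are emitted.

```typescript
// Hypothetical helper (not part of the skill): builds the argument list
// for scripts/tts.ts using only flags documented in this README.
interface TtsRequest {
  texts: string[];   // one or more texts to synthesize
  speaker?: string;  // voice ID, e.g. "zh_female_xueayi_saturn_bigtts"
}

function buildTtsArgs(scriptPath: string, req: TtsRequest): string[] {
  if (req.texts.length === 0) {
    throw new Error("at least one text is required");
  }
  // A single text uses --text; several texts use --texts (batch mode).
  const flag = req.texts.length === 1 ? "--text" : "--texts";
  const args: string[] = ["ts-node", scriptPath, flag, ...req.texts];
  if (req.speaker !== undefined) {
    args.push("--speaker", req.speaker);
  }
  return args;
}
```

The resulting array could be handed to a process launcher such as Node's `child_process.spawn("npx", args)`, which passes each element as one argument without shell interpretation.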
With Custom Parameters
npx ts-node {baseDir}/scripts/tts.ts \
--text "Fast and loud announcement!" \
--speech-rate 30 \
--loudness-rate 20 \
--format mp3 \
--sample-rate 48000
TTS Options
| Option | Description |
|---|---|
| `--text <text>` | Single text to synthesize |
| `--texts <texts...>` | Multiple texts for batch generation |
| `--speaker <id>` | Voice ID (default: zh_female_xiaohe_uranus_bigtts) |
| `--format <fmt>` | mp3, pcm, ogg_opus (default: mp3) |
| `--sample-rate <hz>` | 8000-48000 (default: 24000) |
| `--speech-rate <n>` | -50 to 100 (default: 0) |
| `--loudness-rate <n>` | -50 to 100 (default: 0) |
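The ranges above can be checked before the script is invoked. The following is a rough sketch, not the script's own validation: `TtsOptions` and `validateTtsOptions` are hypothetical names, and it assumes every value inside a documented range is accepted (the README does not say whether `--sample-rate` takes only specific discrete values).

```typescript
// Hypothetical camelCase mirror of the documented CLI flags.
interface TtsOptions {
  format?: string;
  sampleRate?: number;
  speechRate?: number;
  loudnessRate?: number;
}

// Formats listed in the TTS Options table.
const TTS_FORMATS: string[] = ["mp3", "pcm", "ogg_opus"];

function outOfRange(value: number | undefined, lo: number, hi: number): boolean {
  // Unset options fall back to the script defaults, so they are never flagged.
  return value !== undefined && !(isFinite(value) && value >= lo && value <= hi);
}

// Returns a list of problems; an empty list means every supplied option
// falls inside the documented range.
function validateTtsOptions(opts: TtsOptions): string[] {
  const problems: string[] = [];
  if (opts.format !== undefined && TTS_FORMATS.indexOf(opts.format) === -1) {
    problems.push("--format must be one of " + TTS_FORMATS.join(", "));
  }
  if (outOfRange(opts.sampleRate, 8000, 48000)) {
    problems.push("--sample-rate must be 8000-48000");
  }
  if (outOfRange(opts.speechRate, -50, 100)) {
    problems.push("--speech-rate must be -50 to 100");
  }
  if (outOfRange(opts.loudnessRate, -50, 100)) {
    problems.push("--loudness-rate must be -50 to 100");
  }
  return problems;
}
```

Returning a list rather than throwing on the first problem lets a caller report every invalid option at once.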
TTS Output
The script outputs audio URLs directly to stdout:
[1/1] Hello, welcome to our service!
https://example.com/generated-audio.mp3
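A calling program may want the URLs as structured data. The sketch below parses the format shown above under one assumption: each item is a `[i/n] text` line followed by a line holding its URL. `parseTtsOutput` and `TtsResult` are hypothetical names, and the real script may print extra lines, which this parser simply skips.

```typescript
interface TtsResult {
  index: number;  // position of this item within the batch (1-based)
  total: number;  // number of items in the batch
  text: string;   // the text that was synthesized
  url: string;    // URL of the generated audio
}

function parseTtsOutput(stdout: string): TtsResult[] {
  const results: TtsResult[] = [];
  const headerPattern = /^\[(\d+)\/(\d+)\]\s*(.*)$/;
  let pending: { index: number; total: number; text: string } | null = null;

  for (const rawLine of stdout.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue;
    const m = headerPattern.exec(line);
    if (m !== null) {
      // Remember the header until its URL line arrives.
      pending = { index: Number(m[1]), total: Number(m[2]), text: m[3] };
    } else if (pending !== null && /^https?:\/\//.test(line)) {
      results.push({ index: pending.index, total: pending.total, text: pending.text, url: line });
      pending = null;
    }
  }
  return results;
}
```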
Available Voices
General Purpose:
- zh_female_xiaohe_uranus_bigtts - Xiaohe (default)
- zh_female_vv_uranus_bigtts - Vivi (Chinese & English)
- zh_male_m191_uranus_bigtts - Yunzhou (male)
- zh_male_taocheng_uranus_bigtts - Xiaotian (male)
Audiobook:
- zh_female_xueayi_saturn_bigtts - Children's audiobook
Video Dubbing:
- zh_male_dayi_saturn_bigtts - Dayi (male)
- zh_female_mizai_saturn_bigtts - Mizai (female)
- zh_female_jitangnv_saturn_bigtts - Motivational female
Role Playing:
- saturn_zh_female_keainvsheng_tob - Cute girl
- saturn_zh_male_shuanglangshaonian_tob - Cheerful boy
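For callers that pick a voice programmatically, the catalog above can be kept as a small typed table. This is only an illustrative sketch: the IDs and labels are copied from this README, while `Voice`, `VoiceCategory`, `voicesIn`, and `resolveVoice` are hypothetical names that do not come from the skill or the SDK.

```typescript
type VoiceCategory = "general" | "audiobook" | "dubbing" | "roleplay";

interface Voice {
  id: string;
  label: string;
  category: VoiceCategory;
}

const DEFAULT_VOICE_ID = "zh_female_xiaohe_uranus_bigtts";

const VOICES: Voice[] = [
  { id: "zh_female_xiaohe_uranus_bigtts", label: "Xiaohe (default)", category: "general" },
  { id: "zh_female_vv_uranus_bigtts", label: "Vivi (Chinese & English)", category: "general" },
  { id: "zh_male_m191_uranus_bigtts", label: "Yunzhou (male)", category: "general" },
  { id: "zh_male_taocheng_uranus_bigtts", label: "Xiaotian (male)", category: "general" },
  { id: "zh_female_xueayi_saturn_bigtts", label: "Children's audiobook", category: "audiobook" },
  { id: "zh_male_dayi_saturn_bigtts", label: "Dayi (male)", category: "dubbing" },
  { id: "zh_female_mizai_saturn_bigtts", label: "Mizai (female)", category: "dubbing" },
  { id: "zh_female_jitangnv_saturn_bigtts", label: "Motivational female", category: "dubbing" },
  { id: "saturn_zh_female_keainvsheng_tob", label: "Cute girl", category: "roleplay" },
  { id: "saturn_zh_male_shuanglangshaonian_tob", label: "Cheerful boy", category: "roleplay" },
];

// All voices listed under one category heading.
function voicesIn(category: VoiceCategory): Voice[] {
  return VOICES.filter((v) => v.category === category);
}

// Looks up a voice by ID, falling back to the documented default when no ID is given.
function resolveVoice(id?: string): Voice {
  const wanted = id === undefined ? DEFAULT_VOICE_ID : id;
  const matches = VOICES.filter((v) => v.id === wanted);
  if (matches.length === 0) {
    throw new Error("unknown voice ID: " + wanted);
  }
  return matches[0];
}
```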
Speech-to-Text (ASR)
From URL
npx ts-node {baseDir}/scripts/asr.ts --url "https://example.com/audio.mp3"
From Local File
npx ts-node {baseDir}/scripts/asr.ts --file ./recording.mp3
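The choice between `--url` and `--file` can be made automatically from the input string. The sketch below is one way to do it, assuming that anything starting with `http://` or `https://` is a URL and everything else is a local path; `buildAsrArgs` is a hypothetical helper, not part of the skill.

```typescript
// Hypothetical helper: picks --url or --file for scripts/asr.ts
// based on whether the source looks like an HTTP(S) URL.
function buildAsrArgs(scriptPath: string, source: string): string[] {
  if (source.trim() === "") {
    throw new Error("an audio URL or file path is required");
  }
  const isUrl = /^https?:\/\//i.test(source);
  return ["ts-node", scriptPath, isUrl ? "--url" : "--file", source];
}
```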
ASR Options
| Option | Description |
|---|---|
| `--url <url>` | Audio file URL |
| `--file <path>` | Local audio file path |
ASR Output
Transcription is printed directly to stdout:
============================================================
TRANSCRIPTION
============================================================
Hello, this is the transcribed text from the audio file...
============================================================
Duration: 1m 30s
Segments: 5
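To consume this output from code, the transcript and the two summary fields can be pulled out of the text. The sketch below assumes the layout shown above (three rule lines of `=` characters, with the transcript between the second and third) and assumes durations are written with `h`, `m`, and `s` units; only the `1m 30s` form appears in this README. `parseAsrOutput`, `parseDuration`, and `AsrResult` are hypothetical names.

```typescript
interface AsrResult {
  text: string;
  durationSeconds: number | null;  // null when no Duration line was found
  segments: number | null;         // null when no Segments line was found
}

// Converts strings like "1m 30s" to seconds; returns null if nothing matches.
function parseDuration(value: string): number | null {
  const unitSeconds: { [unit: string]: number } = { h: 3600, m: 60, s: 1 };
  const part = /(\d+)\s*([hms])/g;
  let total = 0;
  let matched = false;
  let m: RegExpExecArray | null;
  while ((m = part.exec(value)) !== null) {
    total += Number(m[1]) * unitSeconds[m[2]];
    matched = true;
  }
  return matched ? total : null;
}

function parseAsrOutput(stdout: string): AsrResult {
  const lines = stdout.split(/\r?\n/);
  // Indexes of the "=====" rule lines.
  const rules: number[] = [];
  lines.forEach((l, i) => {
    if (/^={10,}$/.test(l.trim())) rules.push(i);
  });
  const hasBody = rules.length >= 3;
  const text = hasBody ? lines.slice(rules[1] + 1, rules[2]).join("\n").trim() : "";
  // Only look for summary fields after the transcript, so that transcript
  // text containing "Duration:" is not misread.
  const tail = hasBody ? lines.slice(rules[2] + 1) : lines;
  let durationSeconds: number | null = null;
  let segments: number | null = null;
  for (const l of tail) {
    const d = /^Duration:\s*(.+)$/.exec(l.trim());
    if (d !== null) durationSeconds = parseDuration(d[1]);
    const s = /^Segments:\s*(\d+)$/.exec(l.trim());
    if (s !== null) segments = Number(s[1]);
  }
  return { text: text, durationSeconds: durationSeconds, segments: segments };
}
```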
ASR Requirements
- Duration: ≤ 2 hours
- File size: ≤ 100MB
- Formats: WAV, MP3, OGG OPUS, M4A
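These limits can be checked locally before uploading or transcribing. The following sketch makes two assumptions that the README does not confirm: "100MB" is read as 100 × 1024 × 1024 bytes, and the listed formats map to the file extensions `wav`, `mp3`, `ogg`, `opus`, and `m4a`. `checkAsrInput` is a hypothetical helper; the caller supplies the file size and, if known, the duration.

```typescript
const ASR_MAX_BYTES = 100 * 1024 * 1024;  // assumption: "100MB" read as 100 MiB
const ASR_MAX_SECONDS = 2 * 60 * 60;      // 2 hours
// Assumption: extensions commonly used for WAV, MP3, OGG OPUS, and M4A.
const ASR_EXTENSIONS: string[] = ["wav", "mp3", "ogg", "opus", "m4a"];

// Returns a list of problems; an empty list means the input passes every check.
function checkAsrInput(fileName: string, sizeBytes: number, durationSeconds?: number): string[] {
  const problems: string[] = [];
  const dot = fileName.lastIndexOf(".");
  const ext = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
  if (ASR_EXTENSIONS.indexOf(ext) === -1) {
    problems.push("unsupported format: expected WAV, MP3, OGG OPUS, or M4A");
  }
  if (sizeBytes > ASR_MAX_BYTES) {
    problems.push("file is larger than 100MB");
  }
  // Duration is optional because it is often unknown without decoding the file.
  if (durationSeconds !== undefined && durationSeconds > ASR_MAX_SECONDS) {
    problems.push("audio is longer than 2 hours");
  }
  return problems;
}
```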
Notes
- Audio URLs expire after a limited time, so use them promptly rather than storing them for later
- Speech rate: negative = slower, positive = faster
- Loudness rate: negative = quieter, positive = louder
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install coze-voice-gen
- After installation, invoke the skill by name or use /coze-voice-gen
- Provide required inputs per the skill's parameter spec and get structured output
What is coze-voice-gen?
Text-to-Speech (TTS) and Speech-to-Text (ASR) using coze-coding-dev-sdk. Returns results directly to stdout. It is an AI Agent Skill for Claude Code / OpenClaw, with 311 downloads so far.
How do I install coze-voice-gen?
Run "/install coze-voice-gen" in the OpenClaw or Claude Code chat to install it in one step; no extra setup is required.
Is coze-voice-gen free?
Yes, coze-voice-gen is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does coze-voice-gen support?
coze-voice-gen is cross-platform and runs anywhere OpenClaw or Claude Code is available.
Who created coze-voice-gen?
It is built and maintained by hanxueyuan (@hanxueyuan); the current version is v0.1.0.