coze-voice-gen

by hanxueyuan · GitHub ↗ · v0.1.0 · MIT-0
cross-platform · ✓ Security Clean
Downloads: 311 · Stars: 0 · Active Installs: 3 · Versions: 2
Install in OpenClaw
/install coze-voice-gen
Description
Text-to-Speech (TTS) and Speech-to-Text (ASR) using coze-coding-dev-sdk. Returns results directly to stdout.
README (SKILL.md)

Coze Voice Generation

Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) using coze-coding-dev-sdk.

Text-to-Speech (TTS)

Single Audio

npx ts-node {baseDir}/scripts/tts.ts --text "Hello, welcome to our service!"

With Different Voice

npx ts-node {baseDir}/scripts/tts.ts \
  --text "This is a male voice" \
  --speaker zh_male_m191_uranus_bigtts

Batch Generation

npx ts-node {baseDir}/scripts/tts.ts \
  --texts "Chapter 1: Introduction" "Chapter 2: Getting Started" "Chapter 3: Advanced Topics" \
  --speaker zh_female_xueayi_saturn_bigtts

With Custom Parameters

npx ts-node {baseDir}/scripts/tts.ts \
  --text "Fast and loud announcement!" \
  --speech-rate 30 \
  --loudness-rate 20 \
  --format mp3 \
  --sample-rate 48000
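When the command is assembled from a program rather than typed by hand, building the arguments as an array avoids shell-quoting problems with spaces and punctuation in the text. The sketch below is illustrative only: `TtsRequest` and `buildTtsArgs` are hypothetical names and are not part of the skill; only the flag names come from the examples above.

```typescript
// Hypothetical helper: maps an options object onto the CLI flags
// documented for scripts/tts.ts. Field names are illustrative.
interface TtsRequest {
  texts: string[]; // one entry becomes --text, several become --texts
  speaker?: string;
  format?: "mp3" | "pcm" | "ogg_opus";
  sampleRate?: number;
  speechRate?: number;
  loudnessRate?: number;
}

function buildTtsArgs(req: TtsRequest): string[] {
  if (req.texts.length === 0) {
    throw new Error("at least one text is required");
  }
  // A single text uses --text; a batch uses --texts followed by every entry.
  const args: string[] =
    req.texts.length === 1 ? ["--text", ...req.texts] : ["--texts", ...req.texts];

  if (req.speaker !== undefined) args.push("--speaker", req.speaker);
  if (req.format !== undefined) args.push("--format", req.format);
  if (req.sampleRate !== undefined) args.push("--sample-rate", String(req.sampleRate));
  if (req.speechRate !== undefined) args.push("--speech-rate", String(req.speechRate));
  if (req.loudnessRate !== undefined) args.push("--loudness-rate", String(req.loudnessRate));
  return args;
}
```

The resulting array can be appended to `["ts-node", scriptPath]` and handed to Node's `child_process.spawnSync("npx", ...)`, so each text reaches the script as a single argument without extra quoting.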

TTS Options

Option                    Description
--text <text>             Single text to synthesize
--texts <texts...>        Multiple texts for batch generation
--speaker <id>            Voice ID (default: zh_female_xiaohe_uranus_bigtts)
--format <fmt>            mp3, pcm, ogg_opus (default: mp3)
--sample-rate <hz>        8000-48000 (default: 24000)
--speech-rate <n>         -50 to 100 (default: 0)
--loudness-rate <n>       -50 to 100 (default: 0)
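The ranges in the option table can be checked before a request is sent. The following is a minimal sketch under stated assumptions: `validateTtsTuning` is a hypothetical pre-flight check, not something scripts/tts.ts is known to do, and it treats the sample-rate range as continuous even though the service may accept only specific rates.

```typescript
// Ranges and defaults are copied from the option table. The function is a
// hypothetical pre-flight check and is not part of scripts/tts.ts.
const TTS_FORMATS = ["mp3", "pcm", "ogg_opus"];

interface TtsTuning {
  format?: string;
  sampleRate?: number;
  speechRate?: number;
  loudnessRate?: number;
}

function validateTtsTuning(opts: TtsTuning): string[] {
  const problems: string[] = [];
  const format = opts.format ?? "mp3";
  const sampleRate = opts.sampleRate ?? 24000;
  const speechRate = opts.speechRate ?? 0;
  const loudnessRate = opts.loudnessRate ?? 0;

  if (TTS_FORMATS.indexOf(format) === -1) {
    problems.push(`format must be one of ${TTS_FORMATS.join(", ")}`);
  }
  // The negated form also rejects NaN, which fails every comparison.
  if (!(sampleRate >= 8000 && sampleRate <= 48000)) {
    problems.push("sample-rate must be within 8000-48000");
  }
  if (!(speechRate >= -50 && speechRate <= 100)) {
    problems.push("speech-rate must be within -50 to 100");
  }
  if (!(loudnessRate >= -50 && loudnessRate <= 100)) {
    problems.push("loudness-rate must be within -50 to 100");
  }
  return problems;
}
```

An empty array means every supplied value falls inside the documented range; omitted fields fall back to the documented defaults.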

TTS Output

The script outputs audio URLs directly to stdout:

[1/1] Hello, welcome to our service!
  https://example.com/generated-audio.mp3
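A caller that captures this output may want the URLs as structured data. The sketch below assumes the layout is exactly what the sample shows (a `[i/n] text` line followed by an indented URL line); `parseTtsOutput` and `TtsResult` are hypothetical names, and the real script may print additional lines that this parser simply ignores.

```typescript
// Hypothetical parser for the stdout layout shown in the sample above.
interface TtsResult {
  index: number;
  total: number;
  text: string;
  url: string;
}

function parseTtsOutput(stdout: string): TtsResult[] {
  const header = /^\[(\d+)\/(\d+)\]\s+(.*)$/;
  const results: TtsResult[] = [];
  let pending: { index: number; total: number; text: string } | null = null;

  for (const rawLine of stdout.split(/\r?\n/)) {
    const line = rawLine.trim();
    const m = header.exec(line);
    if (m) {
      // Remember the "[i/n] text" line until its URL arrives.
      pending = { index: Number(m[1]), total: Number(m[2]), text: m[3] || "" };
    } else if (pending && /^https?:\/\//.test(line)) {
      results.push({ ...pending, url: line });
      pending = null;
    }
  }
  return results;
}
```

Entries whose header line is never followed by a URL are dropped, so a shorter result list than expected signals that some items failed or that the output format differs from the sample.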

Available Voices

General Purpose:

  • zh_female_xiaohe_uranus_bigtts - Xiaohe (default)
  • zh_female_vv_uranus_bigtts - Vivi (Chinese & English)
  • zh_male_m191_uranus_bigtts - Yunzhou (male)
  • zh_male_taocheng_uranus_bigtts - Xiaotian (male)

Audiobook:

  • zh_female_xueayi_saturn_bigtts - Children's audiobook

Video Dubbing:

  • zh_male_dayi_saturn_bigtts - Dayi (male)
  • zh_female_mizai_saturn_bigtts - Mizai (female)
  • zh_female_jitangnv_saturn_bigtts - Motivational female

Role Playing:

  • saturn_zh_female_keainvsheng_tob - Cute girl
  • saturn_zh_male_shuanglangshaonian_tob - Cheerful boy
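For scripts that choose a voice by use case, the list above can be kept as a small typed catalog. This is only a sketch: the category keys (`general`, `audiobook`, `dubbing`, `roleplay`) and the helper names are made up for illustration, while the voice IDs are copied from the list.

```typescript
// Voice IDs copied from the list above; category keys are illustrative.
type VoiceCategory = "general" | "audiobook" | "dubbing" | "roleplay";

const DEFAULT_VOICE = "zh_female_xiaohe_uranus_bigtts";

const VOICES: Record<VoiceCategory, string[]> = {
  general: [
    "zh_female_xiaohe_uranus_bigtts",
    "zh_female_vv_uranus_bigtts",
    "zh_male_m191_uranus_bigtts",
    "zh_male_taocheng_uranus_bigtts",
  ],
  audiobook: ["zh_female_xueayi_saturn_bigtts"],
  dubbing: [
    "zh_male_dayi_saturn_bigtts",
    "zh_female_mizai_saturn_bigtts",
    "zh_female_jitangnv_saturn_bigtts",
  ],
  roleplay: [
    "saturn_zh_female_keainvsheng_tob",
    "saturn_zh_male_shuanglangshaonian_tob",
  ],
};

// Returns a copy so callers cannot mutate the catalog.
function voicesFor(category: VoiceCategory): string[] {
  return VOICES[category].slice();
}

// True when the ID appears anywhere in the catalog.
function isKnownVoice(id: string): boolean {
  return (Object.keys(VOICES) as VoiceCategory[]).some(
    (category) => VOICES[category].indexOf(id) !== -1,
  );
}
```

A check such as `isKnownVoice(id)` catches typos in a `--speaker` value early, although the service may well support voices beyond those listed here.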

Speech-to-Text (ASR)

From URL

npx ts-node {baseDir}/scripts/asr.ts --url "https://example.com/audio.mp3"

From Local File

npx ts-node {baseDir}/scripts/asr.ts --file ./recording.mp3

ASR Options

Option            Description
--url <url>       Audio file URL
--file <path>     Local audio file path

ASR Output

Transcription is printed directly to stdout:

============================================================
TRANSCRIPTION
============================================================
Hello, this is the transcribed text from the audio file...
============================================================

Duration: 1m 30s
Segments: 5
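If the transcript needs to be consumed by another program, the report can be split into its parts. The sketch below assumes the layout matches the sample exactly: the transcript sits between the second and third rule lines, and the duration uses an `Xm Ys` form. The optional hours component and the names `parseAsrOutput` and `AsrReport` are assumptions, not documented behavior.

```typescript
// Hypothetical parser for the report layout shown in the sample above.
interface AsrReport {
  text: string;
  durationSeconds: number | null;
  segments: number | null;
}

function parseAsrOutput(stdout: string): AsrReport {
  const rule = /^={10,}\s*$/;
  // The "h" component is an assumption; the sample only shows minutes and seconds.
  const durationLine = /^Duration:\s*(?:(\d+)h)?\s*(?:(\d+)m)?\s*(?:(\d+)s)?\s*$/;
  const segmentsLine = /^Segments:\s*(\d+)\s*$/;

  let rulesSeen = 0;
  const body: string[] = [];
  let durationSeconds: number | null = null;
  let segments: number | null = null;

  for (const line of stdout.split(/\r?\n/)) {
    if (rule.test(line)) {
      rulesSeen += 1;
      continue;
    }
    // Transcript lines sit between the second and third rule lines.
    if (rulesSeen === 2) {
      body.push(line);
      continue;
    }
    const d = durationLine.exec(line);
    if (d && (d[1] || d[2] || d[3])) {
      durationSeconds =
        Number(d[1] || 0) * 3600 + Number(d[2] || 0) * 60 + Number(d[3] || 0);
    }
    const s = segmentsLine.exec(line);
    if (s) {
      segments = Number(s[1]);
    }
  }
  return { text: body.join("\n").trim(), durationSeconds, segments };
}
```

Fields that cannot be found stay `null` (or an empty string for the text), which makes it easy to detect output that does not follow the sample layout.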

ASR Requirements

  • Duration: ≤ 2 hours
  • File size: ≤ 100MB
  • Formats: WAV, MP3, OGG OPUS, M4A
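These limits can be checked locally before any audio is uploaded. The following sketch rests on a few assumptions that the source does not settle: "100MB" is read conservatively as 100,000,000 bytes, "OGG OPUS" is mapped to the `.ogg` and `.opus` extensions, and the format is judged from the file extension rather than the file contents. `checkAsrInput` is a hypothetical helper.

```typescript
// Limits copied from the requirements list. Byte interpretation and the
// extension mapping are assumptions, labeled as such below.
const ASR_MAX_BYTES = 100 * 1000 * 1000; // assumes decimal megabytes
const ASR_MAX_SECONDS = 2 * 60 * 60; // 2 hours
const ASR_EXTENSIONS = [".wav", ".mp3", ".ogg", ".opus", ".m4a"]; // assumed mapping

function checkAsrInput(
  fileName: string,
  sizeBytes: number,
  durationSeconds?: number,
): string[] {
  const problems: string[] = [];
  const dot = fileName.lastIndexOf(".");
  const ext = dot === -1 ? "" : fileName.slice(dot).toLowerCase();

  if (ASR_EXTENSIONS.indexOf(ext) === -1) {
    problems.push(`unsupported extension "${ext}"`);
  }
  if (sizeBytes > ASR_MAX_BYTES) {
    problems.push("file is larger than 100MB");
  }
  // Duration is optional because it is often unknown before decoding.
  if (durationSeconds !== undefined && durationSeconds > ASR_MAX_SECONDS) {
    problems.push("audio is longer than 2 hours");
  }
  return problems;
}
```

In a Node script the size can come from `fs.statSync(path).size`; the duration usually requires an audio probing tool, which is why that argument is optional.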

Notes

  • Audio URLs expire after a limited time; use them directly and promptly when possible
  • Speech rate: negative = slower, positive = faster
  • Loudness rate: negative = quieter, positive = louder
Usage Guidance
This skill appears to do exactly what it says: run local TypeScript scripts that send text or audio to Coze through the SDK for TTS or ASR and print the results. Before installing or using it:
  1. Confirm how coze-coding-dev-sdk authenticates (API key, environment variables, or config file) and where those credentials must be placed; SKILL.md does not declare any required keys.
  2. Ensure Node, ts-node, and the Coze SDK are installed, or understand how npx will resolve them.
  3. Remember that uploading local audio or providing URLs transmits data to Coze's service. Do not send sensitive audio unless you trust Coze and your credential configuration.
  4. If you need stronger assurance, inspect the SDK's Config implementation (or run the scripts in an isolated environment) to see whether it reads environment variables or local config files and which network endpoints it calls.
Capability Analysis
Type: OpenClaw Skill
Name: coze-voice-gen
Version: 0.1.0
The skill provides Text-to-Speech (TTS) and Speech-to-Text (ASR) functionality using the coze-coding-dev-sdk. The scripts (scripts/tts.ts and scripts/asr.ts) are straightforward wrappers for the SDK, allowing the agent to generate audio from text or transcribe local/remote audio files. No evidence of malicious intent, data exfiltration, or prompt injection was found; the file-reading capability in the ASR script is consistent with its stated purpose.
Capability Assessment
Purpose & Capability
Name/description, SKILL.md examples, and included scripts (tts.ts, asr.ts) all implement TTS and ASR via the coze-coding-dev-sdk and rely on npx/ts-node to run. There are no unrelated binaries, credentials, or config paths requested.
Instruction Scope
The runtime instructions and scripts stay within expected scope: reading a local audio file (when requested), accepting a URL, base64-encoding local audio, and calling the SDK. The scripts print transcriptions or audio URIs to stdout. They transmit audio data to the coze SDK (i.e., to Coze's service) — which is expected for this functionality but important to be aware of.
Install Mechanism
There is no install spec. SKILL.md instructs users to run the scripts with 'npx ts-node'; npx will fetch ts-node, but the repository includes neither a package.json nor an explicit installation step for coze-coding-dev-sdk. Users will need to ensure the dependencies (coze-coding-dev-sdk and a TypeScript runtime) are available. No downloads from suspicious URLs and no archive extractions are present.
Credentials
The skill declares no required environment variables, and the scripts do not directly read env vars. However, both scripts instantiate a Config() from coze-coding-dev-sdk — that SDK may require API keys or config (via environment variables, config files, or other host credentials). The lack of declared required credentials/primaryEnv is a transparency gap users should verify against the SDK's docs.
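One quick, non-authoritative way to start that verification is to list which environment variables in the current shell look related to Coze. The sketch below is a guess-based audit: the substring `COZE` is an assumption, since the variable names the SDK actually reads are not stated here, and `findCandidateCredentialVars` is a hypothetical helper. It returns names only and never values.

```typescript
// Hypothetical audit helper. The "COZE" hint is a guess; the SDK's real
// variable names are not documented in this listing.
function findCandidateCredentialVars(
  env: Record<string, string | undefined>,
  hints: string[] = ["COZE"],
): string[] {
  return Object.keys(env)
    // Keep names that contain any hint, ignoring case.
    .filter((key) =>
      hints.some((hint) => key.toUpperCase().indexOf(hint.toUpperCase()) !== -1),
    )
    // Ignore variables that are declared but empty.
    .filter((key) => (env[key] ?? "") !== "")
    .sort();
}
```

In a Node script, calling it with `process.env` shows what a `Config()` instance could plausibly pick up from the environment. An empty result does not prove that no credentials are used, because the SDK may read config files or differently named variables.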
Persistence & Privilege
The skill does not request always:true or any elevated persistence. It does not attempt to modify other skills or system-wide settings; it only runs as-invoked.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install coze-voice-gen
  3. After installation, invoke the skill by name or use /coze-voice-gen
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.1.0
Initial release.
  • Provides Text-to-Speech (TTS) and Speech-to-Text (ASR) features using coze-coding-dev-sdk.
  • Supports single and batch text synthesis with customizable voices and audio parameters.
  • Returns audio URLs or transcription results directly to stdout.
  • Allows both URL and local file input for ASR.
  • Includes multiple built-in voice options for various use cases.
v1.0.0
Initial release with text-to-speech (TTS) and speech-to-text (ASR) capabilities:
  • Convert text to audio using various voices and output audio URLs to stdout.
  • Batch TTS generation and extensive customization (voice, format, rate, loudness).
  • Convert audio (from URL or local file) to transcribed text, printed directly to stdout.
  • Supports multiple audio formats and includes detailed requirements.
  • Simple CLI usage via npx with clear examples and option tables.
Metadata
Slug coze-voice-gen
Version 0.1.0
License MIT-0
All-time Installs 4
Active Installs 3
Total Versions 2
Frequently Asked Questions

What is coze-voice-gen?

Text-to-Speech (TTS) and Speech-to-Text (ASR) using coze-coding-dev-sdk. Returns results directly to stdout. It is an AI Agent Skill for Claude Code / OpenClaw, with 311 downloads so far.

How do I install coze-voice-gen?

Run "/install coze-voice-gen" in the OpenClaw or Claude Code chat to install it in one step. The skill ships no install spec, so Node with ts-node and coze-coding-dev-sdk must still be available when the scripts run.

Is coze-voice-gen free?

Yes, coze-voice-gen is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does coze-voice-gen support?

coze-voice-gen is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created coze-voice-gen?

It is built and maintained by hanxueyuan (@hanxueyuan); the current version is v0.1.0.
