coze-voice-gen

Author: hanxueyuan · GitHub ↗ · v0.1.0 · MIT-0
Cross-platform · ✓ Security check passed
Total downloads: 311
Favorites: 0
Current installs: 3
Versions: 2
Install in OpenClaw
/install coze-voice-gen
Description
Text-to-Speech (TTS) and Speech-to-Text (ASR) using coze-coding-dev-sdk. Returns results directly to stdout.
Usage (SKILL.md)

Coze Voice Generation

Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) using coze-coding-dev-sdk.

Text-to-Speech (TTS)

Single Audio

npx ts-node {baseDir}/scripts/tts.ts --text "Hello, welcome to our service!"

With Different Voice

npx ts-node {baseDir}/scripts/tts.ts \
  --text "This is a male voice" \
  --speaker zh_male_m191_uranus_bigtts

Batch Generation

npx ts-node {baseDir}/scripts/tts.ts \
  --texts "Chapter 1: Introduction" "Chapter 2: Getting Started" "Chapter 3: Advanced Topics" \
  --speaker zh_female_xueayi_saturn_bigtts

With Custom Parameters

npx ts-node {baseDir}/scripts/tts.ts \
  --text "Fast and loud announcement!" \
  --speech-rate 30 \
  --loudness-rate 20 \
  --format mp3 \
  --sample-rate 48000

TTS Options

Option                Description
--text <text>         Single text to synthesize
--texts <texts...>    Multiple texts for batch generation
--speaker <id>        Voice ID (default: zh_female_xiaohe_uranus_bigtts)
--format <fmt>        Output format: mp3, pcm, ogg_opus (default: mp3)
--sample-rate <hz>    Sample rate in Hz, 8000-48000 (default: 24000)
--speech-rate <n>     Speech rate, -50 to 100 (default: 0)
--loudness-rate <n>   Loudness, -50 to 100 (default: 0)
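
The ranges and defaults listed above can be checked before the script is invoked. The following is a minimal sketch, not code from scripts/tts.ts: the TtsOptions shape and the checkTtsOptions helper are hypothetical names, and treating 8000-48000 as an inclusive continuous range is an assumption (the service may accept only specific sample rates).

```typescript
// Hypothetical option shape mirroring the CLI flags in the options table;
// it is not taken from scripts/tts.ts.
interface TtsOptions {
  speaker?: string;
  format?: string;
  sampleRate?: number;
  speechRate?: number;
  loudnessRate?: number;
}

// Defaults and allowed formats as documented in the options table.
const TTS_DEFAULTS: Required<TtsOptions> = {
  speaker: "zh_female_xiaohe_uranus_bigtts",
  format: "mp3",
  sampleRate: 24000,
  speechRate: 0,
  loudnessRate: 0,
};
const TTS_FORMATS = ["mp3", "pcm", "ogg_opus"];

// True when value is a finite number in the inclusive range [min, max].
function inRange(value: number, min: number, max: number): boolean {
  return isFinite(value) && value >= min && value <= max;
}

// Fills in documented defaults and collects one message per invalid value.
function checkTtsOptions(
  opts: TtsOptions
): { resolved: Required<TtsOptions>; errors: string[] } {
  const resolved: Required<TtsOptions> = {
    speaker: opts.speaker ?? TTS_DEFAULTS.speaker,
    format: opts.format ?? TTS_DEFAULTS.format,
    sampleRate: opts.sampleRate ?? TTS_DEFAULTS.sampleRate,
    speechRate: opts.speechRate ?? TTS_DEFAULTS.speechRate,
    loudnessRate: opts.loudnessRate ?? TTS_DEFAULTS.loudnessRate,
  };
  const errors: string[] = [];
  if (TTS_FORMATS.indexOf(resolved.format) === -1) {
    errors.push(`format must be one of ${TTS_FORMATS.join(", ")}`);
  }
  // Assumption: the documented range is inclusive at both ends.
  if (!inRange(resolved.sampleRate, 8000, 48000)) {
    errors.push("sample-rate must be between 8000 and 48000");
  }
  if (!inRange(resolved.speechRate, -50, 100)) {
    errors.push("speech-rate must be between -50 and 100");
  }
  if (!inRange(resolved.loudnessRate, -50, 100)) {
    errors.push("loudness-rate must be between -50 and 100");
  }
  return { resolved, errors };
}

// Same values as the "With Custom Parameters" example.
const result = checkTtsOptions({ speechRate: 30, loudnessRate: 20, sampleRate: 48000 });
console.log(result.errors.length === 0 ? result.resolved : result.errors);
```

Returning a list of messages instead of throwing on the first problem lets a caller report every invalid flag at once.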

TTS Output

The script outputs audio URLs directly to stdout:

[1/1] Hello, welcome to our service!
  https://example.com/generated-audio.mp3
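
A caller that captures this stdout may want the URLs as structured data. The sketch below parses the two-line pattern shown above. It assumes that batch runs repeat the same pattern once per text, which the sample output (a single item) does not confirm; parseTtsOutput and TtsResult are hypothetical names.

```typescript
// One parsed entry: the "[index/total] text" header plus the URL under it.
interface TtsResult {
  index: number;
  total: number;
  text: string;
  url: string | null; // null when no URL line followed the header
}

// Parses stdout in the format shown in the sample output.
// Assumption: every item is a "[i/n] text" line followed by an indented URL.
function parseTtsOutput(stdout: string): TtsResult[] {
  const results: TtsResult[] = [];
  for (const line of stdout.split(/\r?\n/)) {
    const header = /^\[(\d+)\/(\d+)\]\s*(.*)$/.exec(line);
    if (header) {
      results.push({
        index: Number(header[1]),
        total: Number(header[2]),
        text: header[3] ?? "",
        url: null,
      });
      continue;
    }
    // Attach the first URL-looking line to the most recent header.
    const trimmed = line.trim();
    const last = results[results.length - 1];
    if (last && last.url === null && /^https?:\/\//.test(trimmed)) {
      last.url = trimmed;
    }
  }
  return results;
}

const sampleOutput =
  "[1/1] Hello, welcome to our service!\n" +
  "  https://example.com/generated-audio.mp3\n";
console.log(parseTtsOutput(sampleOutput));
```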

Available Voices

General Purpose:

  • zh_female_xiaohe_uranus_bigtts - Xiaohe (default)
  • zh_female_vv_uranus_bigtts - Vivi (Chinese & English)
  • zh_male_m191_uranus_bigtts - Yunzhou (male)
  • zh_male_taocheng_uranus_bigtts - Xiaotian (male)

Audiobook:

  • zh_female_xueayi_saturn_bigtts - Children's audiobook

Video Dubbing:

  • zh_male_dayi_saturn_bigtts - Dayi (male)
  • zh_female_mizai_saturn_bigtts - Mizai (female)
  • zh_female_jitangnv_saturn_bigtts - Motivational female

Role Playing:

  • saturn_zh_female_keainvsheng_tob - Cute girl
  • saturn_zh_male_shuanglangshaonian_tob - Cheerful boy

Speech-to-Text (ASR)

From URL

npx ts-node {baseDir}/scripts/asr.ts --url "https://example.com/audio.mp3"

From Local File

npx ts-node {baseDir}/scripts/asr.ts --file ./recording.mp3

ASR Options

Option           Description
--url <url>      Audio file URL
--file <path>    Local audio file path

ASR Output

Transcription is printed directly to stdout:

============================================================
TRANSCRIPTION
============================================================
Hello, this is the transcribed text from the audio file...
============================================================

Duration: 1m 30s
Segments: 5
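
If the transcription needs to be consumed by another program, the framed text and the trailing fields can be pulled out of stdout. This is a minimal sketch that assumes the layout is exactly as in the sample above (three rule lines of "=" characters, then Duration and Segments lines); parseAsrOutput and AsrResult are hypothetical names, and the duration is kept as the raw string because its full format is not documented.

```typescript
interface AsrResult {
  text: string;            // transcription between the second and third rule
  duration: string | null; // raw value, e.g. "1m 30s"
  segments: number | null;
}

// Parses stdout in the layout shown in the sample ASR output.
function parseAsrOutput(stdout: string): AsrResult {
  const lines = stdout.split(/\r?\n/);

  // Indexes of the "=====" rule lines that frame the title and the text.
  const rules: number[] = [];
  lines.forEach((line, i) => {
    if (/^={10,}$/.test(line.trim())) rules.push(i);
  });

  // Assumption: rule 1 opens the title, rule 2 closes it, rule 3 ends the text.
  const [, start, end] = rules;
  const text =
    start !== undefined && end !== undefined
      ? lines.slice(start + 1, end).join("\n").trim()
      : "";

  let duration: string | null = null;
  let segments: number | null = null;
  for (const line of lines) {
    const d = /^Duration:\s*(.+)$/.exec(line.trim());
    if (d) duration = (d[1] ?? "").trim();
    const s = /^Segments:\s*(\d+)$/.exec(line.trim());
    if (s) segments = Number(s[1]);
  }
  return { text, duration, segments };
}

const rule = "=".repeat(60);
const sampleAsr = [
  rule,
  "TRANSCRIPTION",
  rule,
  "Hello, this is the transcribed text from the audio file...",
  rule,
  "",
  "Duration: 1m 30s",
  "Segments: 5",
].join("\n");
console.log(parseAsrOutput(sampleAsr));
```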

ASR Requirements

  • Duration: ≤ 2 hours
  • File size: ≤ 100MB
  • Formats: WAV, MP3, OGG OPUS, M4A
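
These limits can be checked locally before any audio is uploaded. The sketch below is one way to do that under stated assumptions: "100MB" is read as 100 × 1024 × 1024 bytes, the formats are mapped to the file extensions .wav, .mp3, .ogg, .opus, and .m4a, and checkAsrInput and AudioInfo are hypothetical names. The caller supplies the size (for example from fs.statSync) and, if known, the duration.

```typescript
// Limits from the ASR requirements list.
const MAX_BYTES = 100 * 1024 * 1024; // assumption: 1 MB = 1024 * 1024 bytes
const MAX_SECONDS = 2 * 60 * 60;     // 2 hours
// Assumption: extension mapping for WAV, MP3, OGG OPUS, M4A.
const ASR_EXTENSIONS = [".wav", ".mp3", ".ogg", ".opus", ".m4a"];

interface AudioInfo {
  path: string;
  sizeBytes: number;
  durationSeconds?: number; // optional: skipped when unknown
}

// Returns a list of problems; an empty list means the file passes all checks.
function checkAsrInput(info: AudioInfo): string[] {
  const problems: string[] = [];

  const dot = info.path.lastIndexOf(".");
  const ext = dot >= 0 ? info.path.slice(dot).toLowerCase() : "";
  if (ASR_EXTENSIONS.indexOf(ext) === -1) {
    problems.push(`unsupported extension "${ext}"`);
  }
  if (info.sizeBytes > MAX_BYTES) {
    problems.push("file is larger than 100MB");
  }
  if (info.durationSeconds !== undefined && info.durationSeconds > MAX_SECONDS) {
    problems.push("audio is longer than 2 hours");
  }
  return problems;
}

// A 5 MB, 90-second MP3 passes every check.
console.log(checkAsrInput({ path: "./recording.mp3", sizeBytes: 5 * 1024 * 1024, durationSeconds: 90 }));
```

An extension check is only a heuristic: it does not verify the actual container or codec, so the service may still reject a file that passes it.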

Notes

  • Generated audio URLs expire after a limited time, so use them directly and promptly rather than storing them
  • Speech rate: negative = slower, positive = faster
  • Loudness rate: negative = quieter, positive = louder
Security recommendations
This skill appears to do exactly what it says: run local TypeScript scripts that call Coze's SDK for TTS or ASR and print the results. Before installing or using it:

  1. Confirm how coze-coding-dev-sdk authenticates (API key, environment variables, or config file) and where those credentials must be placed. The SKILL.md does not declare any required keys.
  2. Make sure Node, ts-node, and the Coze SDK are installed, or understand how npx will resolve them.
  3. Remember that uploading local audio or providing URLs transmits data to Coze's service. Do not send sensitive audio unless you trust Coze and your credential configuration.
  4. If you need stronger assurance, inspect the SDK's Config implementation (or run the scripts in an isolated environment) to see whether it reads environment variables or local config files and which network endpoints it calls.
Functional analysis
Type: OpenClaw Skill
Name: coze-voice-gen
Version: 0.1.0

The skill provides Text-to-Speech (TTS) and Speech-to-Text (ASR) functionality using coze-coding-dev-sdk. The scripts (scripts/tts.ts and scripts/asr.ts) are straightforward wrappers for the SDK, allowing the agent to generate audio from text or transcribe local or remote audio files. No evidence of malicious intent, data exfiltration, or prompt injection was found; the file-reading capability in the ASR script is consistent with its stated purpose.
Capability assessment
Purpose & Capability
Name/description, SKILL.md examples, and included scripts (tts.ts, asr.ts) all implement TTS and ASR via the coze-coding-dev-sdk and rely on npx/ts-node to run. There are no unrelated binaries, credentials, or config paths requested.
Instruction Scope
The runtime instructions and scripts stay within expected scope: reading a local audio file (when requested), accepting a URL, base64-encoding local audio, and calling the SDK. The scripts print transcriptions or audio URIs to stdout. They transmit audio data to the coze SDK (i.e., to Coze's service) — which is expected for this functionality but important to be aware of.
Install Mechanism
There is no install spec. SKILL.md instructs using 'npx ts-node' to run the scripts; that will provide ts-node but the repository doesn't include package.json or explicit installation of the coze-coding-dev-sdk. Users will need to ensure dependencies (coze-coding-dev-sdk and any TS runtime) are available. No downloads from suspicious URLs or archived extracts are present.
Credentials
The skill declares no required environment variables, and the scripts do not directly read env vars. However, both scripts instantiate a Config() from coze-coding-dev-sdk — that SDK may require API keys or config (via environment variables, config files, or other host credentials). The lack of declared required credentials/primaryEnv is a transparency gap users should verify against the SDK's docs.
Persistence & Privilege
The skill does not request always:true or any elevated persistence. It does not attempt to modify other skills or system-wide settings; it only runs as-invoked.
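How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install coze-voice-gen
  3. Once installation finishes, invoke the skill by name or trigger it with /coze-voice-gen.
  4. Provide the inputs described in the skill's option tables to get structured output.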
Version history
v0.1.0
Initial release.

  • Provides Text-to-Speech (TTS) and Speech-to-Text (ASR) features using coze-coding-dev-sdk.
  • Supports single and batch text synthesis with customizable voices and audio parameters.
  • Returns audio URLs or transcription results directly to stdout.
  • Allows both URL and local file input for ASR.
  • Includes multiple built-in voice options for various use cases.

v1.0.0
Initial release with text-to-speech (TTS) and speech-to-text (ASR) capabilities:

  • Convert text to audio using various voices and output audio URLs to stdout.
  • Batch TTS generation and extensive customization (voice, format, rate, loudness).
  • Convert audio (from URL or local file) to transcribed text, printed directly to stdout.
  • Supports multiple audio formats and includes detailed requirements.
  • Simple CLI usage via npx with clear examples and option tables.
Metadata
Slug: coze-voice-gen
Version: 0.1.0
License: MIT-0
Total installs: 4
Current installs: 3
Versions: 2
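FAQ

What is coze-voice-gen?

Text-to-Speech (TTS) and Speech-to-Text (ASR) using coze-coding-dev-sdk. Returns results directly to stdout. It is an AI agent skill plugin for Claude Code / OpenClaw, with 311 total downloads to date.

How do I install coze-voice-gen?

Run the command /install coze-voice-gen in the OpenClaw or Claude Code chat box for a one-step install. The install itself needs no extra configuration, although coze-coding-dev-sdk and any credentials it requires may have to be set up separately.

Is coze-voice-gen free?

Yes. coze-voice-gen is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does coze-voice-gen support?

coze-voice-gen is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops coze-voice-gen?

It is developed and maintained by hanxueyuan (@hanxueyuan). The current version is v0.1.0.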
