Description

A text-to-speech skill that works within mainland China, built on the SiliconFlow (硅基流动) API. Use when the user wants to convert text to speech in China without VPN. Supports CosyVoice2-0.5B (multilingual, emotion c...

README (SKILL.md)

China TTS: Text-to-Speech for Mainland China

Name: China Tts
Author: tobewin

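Built on the SiliconFlow (硅基流动) API, reachable directly from mainland China with no VPN required. Supports Chinese, English, Japanese, and Korean, plus dialects such as Cantonese and Sichuanese, along with emotion control and voice cloning.

Full voice list → references/voices.md
Usage scenarios and examples → references/examples.md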

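When to Trigger

"Convert this text to speech"
"Read this content aloud in a gentle female voice"
"Generate a podcast dialogue audio: [S1]... [S2]..."
"Read this passage in Cantonese"
"Clone this voice and read with it"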

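Prerequisites (First-Time Setup)

1. Visit cloud.siliconflow.cn and register with a phone number (directly reachable from mainland China)
2. Open the "API Keys" (API密钥) page, then create and copy an API key
3. Configure it in OpenClaw:
   export SILICONFLOW_API_KEY="sk-xxxxxxxxxxxxxxxx"
   or write it to ~/.openclaw/.env

Note: using a custom voice (voice cloning) requires completing real-name verification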
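Before making any request, it can help to confirm that the key is actually visible to the shell. The following is a minimal sketch; `check_api_key` is a hypothetical helper name, and the `sk-` prefix check is an assumption taken from the placeholder shown in the setup steps rather than a documented rule.

```shell
# Hypothetical helper: fail early if the API key is missing or looks wrong.
# Assumption: real keys start with "sk-", as in the placeholder above.
check_api_key() {
  case "${SILICONFLOW_API_KEY:-}" in
    "")
      echo "SILICONFLOW_API_KEY is not set" >&2
      return 1
      ;;
    sk-*)
      return 0
      ;;
    *)
      echo "SILICONFLOW_API_KEY does not start with sk-" >&2
      return 2
      ;;
  esac
}
```

A caller could run `check_api_key || exit 1` at the top of a script so that a missing key is reported locally instead of surfacing as a 401 from the API.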

Model Selection

Everyday reading / blog voice-over / multilingual    → CosyVoice2-0.5B (recommended default)
Podcast dialogue / two-person role play              → MOSS-TTSD-v0.5

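CosyVoice2-0.5B (recommended)

Model name: FunAudioLLM/CosyVoice2-0.5B
Features:
  - Supports Chinese, English, Japanese, and Korean
  - Supports Chinese dialects: Cantonese, Sichuanese, Shanghainese, Zhengzhou dialect, Changsha dialect, Tianjin dialect
  - Supports emotion control: happy, excited, sad, angry, and more
  - 8 built-in voices, plus custom voice cloning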

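MOSS-TTSD-v0.5 (dedicated to two-speaker dialogue)

Model name: fnlp/MOSS-TTSD-v0.5
Features:
  - Designed for dialogue, with two speaker voices
  - Uses [S1] and [S2] tags to distinguish speakers
  - Supports voice cloning (pass two voices through the references field)
  - Suited to AI podcasts, role play, and dialogue dubbing
  - Maximum input of 128000 characters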
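The choice between the two models can be made from the text itself. The sketch below assumes a simple rule: text carrying both [S1] and [S2] tags goes to MOSS-TTSD, and everything else goes to CosyVoice2. `pick_model` is a hypothetical helper, not part of the skill.

```shell
# Hypothetical helper: choose a model name from the input text.
# Rule assumed here: both speaker tags present -> dialogue model, otherwise the default model.
pick_model() {
  case "$1" in
    *"[S1]"*"[S2]"*|*"[S2]"*"[S1]"*)
      echo "fnlp/MOSS-TTSD-v0.5"
      ;;
    *)
      echo "FunAudioLLM/CosyVoice2-0.5B"
      ;;
  esac
}
```

For example, `pick_model '[S1]Hello[S2]Hi'` selects the dialogue model, while plain text or text with only one tag falls back to CosyVoice2.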

API Usage

Basic reading (CosyVoice2, built-in voice)

curl --location 'https://api.siliconflow.cn/v1/audio/speech' \
  --header "Authorization: Bearer $SILICONFLOW_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "FunAudioLLM/CosyVoice2-0.5B",
    "input": "你好，欢迎使用硅基流动语音合成服务。",
    "voice": "FunAudioLLM/CosyVoice2-0.5B:claire",
    "response_format": "mp3",
    "speed": 1.0,
    "gain": 0
  }' \
  --output output.mp3

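Emotion-controlled reading

# Prefix input with an emotion instruction, separated from the text by <|endofprompt|>
curl --location 'https://api.siliconflow.cn/v1/audio/speech' \
  --header "Authorization: Bearer $SILICONFLOW_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "FunAudioLLM/CosyVoice2-0.5B",
    "input": "你能用高兴的情感说吗？<|endofprompt|>今天真是太开心了，马上要放假了！",
    "voice": "FunAudioLLM/CosyVoice2-0.5B:diana",
    "response_format": "mp3"
  }' \
  --output output.mp3

Emotion instruction examples (内容... stands for the text to be read):

"你能用高兴的情感说吗？<|endofprompt|>内容..."    (happy: "Can you say this with a happy emotion?")
"请用悲伤的语气朗读：<|endofprompt|>内容..."      (sad: "Please read this in a sad tone:")
"用激动兴奋的语调：<|endofprompt|>内容..."        (excited: "In an excited tone:")
"请用平静舒缓的方式：<|endofprompt|>内容..."      (calm: "In a calm, soothing manner:")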

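Dialect reading

# Name the dialect in a natural-language instruction inside input; CosyVoice2 will recognize it
curl --location 'https://api.siliconflow.cn/v1/audio/speech' \
  --header "Authorization: Bearer $SILICONFLOW_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "FunAudioLLM/CosyVoice2-0.5B",
    "input": "请用粤语朗读：<|endofprompt|>多保重，早休息。",
    "voice": "FunAudioLLM/CosyVoice2-0.5B:anna",
    "response_format": "mp3"
  }' \
  --output output.mp3

Supported dialects: Cantonese, Sichuanese, Shanghainese, Zhengzhou dialect, Changsha dialect, Tianjin dialect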
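Emotion and dialect control both use the same convention: an instruction, the <|endofprompt|> separator, then the text to read. A small sketch of assembling that string follows; `with_instruction` is a hypothetical helper, and it only joins the two parts without JSON-escaping the result.

```shell
# Hypothetical helper: join an instruction and the text to read
# using the <|endofprompt|> separator.
# $1 = instruction (emotion or dialect), $2 = text to synthesize
with_instruction() {
  printf '%s<|endofprompt|>%s' "$1" "$2"
}
```

For instance, `with_instruction '请用粤语朗读：' '多保重，早休息。'` produces the same input string as the dialect example. If the text may contain double quotes or backslashes, it still has to be escaped before being placed inside the JSON body.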

Two-speaker dialogue (MOSS-TTSD, podcast scenario)

curl --location 'https://api.siliconflow.cn/v1/audio/speech' \
  --header "Authorization: Bearer $SILICONFLOW_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "fnlp/MOSS-TTSD-v0.5",
    "input": "[S1]大家好，欢迎收听今天的节目。[S2]今天我们来聊一聊人工智能的发展。[S1]是的，最近 AI 的进步真的很惊人。",
    "voice": "fnlp/MOSS-TTSD-v0.5:alex",
    "response_format": "mp3",
    "speed": 1.0,
    "gain": 0,
    "max_tokens": 2048
  }' \
  --output podcast.mp3

⚠️ MOSS-TTSD dialogue format rules:

[S1] tag = speaker 1
[S2] tag = speaker 2
Both tags must appear, and they must alternate
For single-speaker text, use CosyVoice2 rather than MOSS-TTSD
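The format rules above can be checked locally before sending a request. This is a sketch under the assumption that "alternate" means no speaker tag appears twice in a row; `valid_dialogue` is a hypothetical helper name.

```shell
# Hypothetical check for MOSS-TTSD input:
# both [S1] and [S2] must appear, and the same tag must never repeat back to back.
valid_dialogue() {
  # Extract the speaker tags in order, one per line (empty if there are none).
  tags=$(printf '%s\n' "$1" | grep -o '\[S[12]]' || true)
  prev=""
  seen_s1=0
  seen_s2=0
  while IFS= read -r tag; do
    [ -n "$tag" ] || continue
    if [ "$tag" = "$prev" ]; then
      return 1   # same speaker twice in a row
    fi
    case "$tag" in
      "[S1]") seen_s1=1 ;;
      "[S2]") seen_s2=1 ;;
    esac
    prev=$tag
  done <<EOF
$tags
EOF
  [ "$seen_s1" -eq 1 ] && [ "$seen_s2" -eq 1 ]
}
```

A script might call `valid_dialogue "$text" || echo "use CosyVoice2 or fix the tags"` to catch a likely 400 response before spending a request.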

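Using a custom cloned voice (requires real-name verification)

# First upload the reference audio (one-time step; a clear recording under 30 seconds)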
curl --location 'https://api.siliconflow.cn/v1/uploads/audio/voice' \
  --header "Authorization: Bearer $SILICONFLOW_API_KEY" \
  --form 'model="FunAudioLLM/CosyVoice2-0.5B"' \
  --form 'customName="my-voice"' \
  --form 'text="在一无所知中，梦里的一天结束了，一个新的轮回便会开始"' \
  --form 'file=@/path/to/reference.mp3'

# The response contains a uri field, in the form speech:my-voice:xxxxx:xxxxx
# Pass that uri as the voice parameter
curl --location 'https://api.siliconflow.cn/v1/audio/speech' \
  --header "Authorization: Bearer $SILICONFLOW_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "FunAudioLLM/CosyVoice2-0.5B",
    "input": "你好，这是我的克隆声音。",
    "voice": "speech:my-voice:xxxxx:xxxxx",
    "response_format": "mp3"
  }' \
  --output cloned.mp3
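To reuse the uploaded voice in a script, the uri has to be pulled out of the upload response. The sketch below assumes the response is a JSON object containing a `"uri": "..."` pair; the exact response shape is an assumption, and `extract_uri` is a hypothetical helper. A real JSON parser such as jq would be more robust than this sed pattern.

```shell
# Hypothetical helper: extract the value of "uri" from a JSON response string.
# Assumes a flat "uri": "value" pair with no escaped quotes inside the value.
extract_uri() {
  printf '%s\n' "$1" | sed -n 's/.*"uri"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

With the upload response captured in a variable, `VOICE=$(extract_uri "$response")` yields a value that can be passed as the voice parameter in the second request.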

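Parameters

model (required):
  FunAudioLLM/CosyVoice2-0.5B   everyday default
  fnlp/MOSS-TTSD-v0.5            two-speaker dialogue

input (required):
  Text to convert, up to 128000 characters
  ⚠️ Do not add extra spaces before or after the text
  CosyVoice2 emotion control format:
    "emotion instruction<|endofprompt|>body text"
  MOSS-TTSD dialogue format:
    "[S1]speaker 1 content[S2]speaker 2 content"

voice (required):
  Built-in: FunAudioLLM/CosyVoice2-0.5B:alex, etc.
  Custom clone: speech:name:xxxxx:xxxxx
  See references/voices.md for the full voice list

response_format (optional, default mp3):
  mp3    general purpose, recommended default
  wav    lossless, larger files
  opus   high compression, suited to streaming
  pcm    raw data, requires your own handling

sample_rate (optional):
  mp3: 32000 or 44100 (default 44100)
  wav/pcm: 8000, 16000, 24000, 32000, or 44100 (default 44100)
  opus: 48000 only

speed (optional, default 1.0):
  Range: 0.25 to 4.0
  0.75 = slow, 1.0 = normal, 1.5 = fast

gain (optional, default 0):
  Range: -10 to 10 (in dB)
  Positive values raise the volume, negative values lower it

max_tokens (optional, MOSS-TTSD only):
  Default 2048, maximum 4096
  input + output must not exceed 32k tokens in total

stream (optional, default true):
  true = streaming output (audio is returned as it is generated)
  false = return only after generation completes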
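The sample_rate values depend on response_format, which is easy to get wrong. A minimal sketch of a local check that encodes the combinations listed above; `valid_sample_rate` is a hypothetical helper.

```shell
# Hypothetical helper: check a response_format / sample_rate pair
# against the combinations listed in the parameter table.
# $1 = response_format, $2 = sample_rate
valid_sample_rate() {
  case "$1:$2" in
    mp3:32000|mp3:44100) return 0 ;;
    wav:8000|wav:16000|wav:24000|wav:32000|wav:44100) return 0 ;;
    pcm:8000|pcm:16000|pcm:24000|pcm:32000|pcm:44100) return 0 ;;
    opus:48000) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `valid_sample_rate opus 44100` fails, because opus accepts 48000 only.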

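Billing

Billing method: charged by the UTF-8 byte count of the input text
  English letter = 1 byte per character
  Chinese character = 3 bytes per character

Actual cost is very low; the free quota for new users is usually enough to generate several hours of audio
Top-up methods: Alipay / WeChat Pay, minimum top-up of 10 yuan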
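Since billing follows the UTF-8 byte count, the billable size of a piece of text can be estimated locally. A minimal sketch, assuming the text is stored as UTF-8; `utf8_bytes` is a hypothetical helper name.

```shell
# Hypothetical helper: count the UTF-8 bytes of a string.
# wc -c counts bytes; tr strips the padding some wc implementations add.
utf8_bytes() {
  printf '%s' "$1" | wc -c | tr -d ' '
}

utf8_bytes 'hello'   # → 5
utf8_bytes '你好'     # → 6
```

Two Chinese characters therefore cost more billable bytes than five English letters, which matters when estimating usage for long Chinese scripts.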

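File Save Path

Generated audio files are saved to the current OpenClaw workspace (or the current sub-Agent workspace), named with a timestamp, and all earlier audio files are kept.

Path Convention

# Preferred: the current workspace directory ($OPENCLAW_WORKSPACE, falling back to $PWD)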
OUTPUT_DIR="${OPENCLAW_WORKSPACE:-$PWD}/tts"
mkdir -p "$OUTPUT_DIR"
FILENAME="tts_$(date +%Y%m%d_%H%M%S).mp3"
OUTPUT_PATH="$OUTPUT_DIR/$FILENAME"

# Full curl command example
curl --location 'https://api.siliconflow.cn/v1/audio/speech' \
  --header "Authorization: Bearer $SILICONFLOW_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{...}' \
  --output "$OUTPUT_PATH"

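File naming rule: tts_YYYYMMDD_HHMMSS.mp3
Example: tts_20260321_143052.mp3

Sub-Agent scenarios:

When each sub-Agent has its own workspace, files are saved in that sub-Agent's workspace under the tts/ subdirectory
When a main Agent dispatches multiple sub-Agents, each saves its own audio without interfering with the others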

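Output Format

🔊 Speech generation complete
━━━━━━━━━━━━━━━━━━━━
Model: FunAudioLLM/CosyVoice2-0.5B
Voice: claire (gentle female voice)
Format: MP3 / 44100Hz
Text length: about X characters

Saved to: {workspace}/tts/tts_20260321_143052.mp3

Playback commands:
  macOS:   afplay {path above}
  Linux:   mpv {path above} / ffplay {path above}
  Any:     vlc {path above}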
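Which playback command to suggest depends on what is installed. The sketch below picks the first available player from a list of candidates; `pick_player` is a hypothetical helper, and the order in which candidates are tried is an assumption.

```shell
# Hypothetical helper: print the first command from the argument list
# that exists on this machine; return non-zero if none is found.
pick_player() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}
```

A call such as `pick_player afplay mpv ffplay vlc` returns the first of the players listed above that is installed, so the completion message can show a command that will actually run.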

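Common Error Handling

401  → API key not configured or no longer valid; obtain a new one
400  → Invalid parameters:
       - input has extra leading or trailing spaces → remove them
       - MOSS-TTSD input is missing [S1][S2] tags → check the format
       - voice is malformed → check the voice name
429  → Rate limit exceeded; wait a few seconds and retry
403  → Custom voice used without completing real-name verification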
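The status codes above can be turned into readable messages in a wrapper script. This is a sketch only: `explain_status` and `tts_request` are hypothetical helpers, and the messages simply restate the table. The request function uses curl's `--write-out '%{http_code}'` to print the status code after saving the body.

```shell
# Hypothetical helper: map an HTTP status code to the advice in the table above.
explain_status() {
  case "$1" in
    200) echo "ok" ;;
    400) echo "bad request: check input whitespace, [S1]/[S2] tags, and voice name" ;;
    401) echo "API key missing or invalid" ;;
    403) echo "custom voice requires real-name verification" ;;
    429) echo "rate limited: wait a few seconds and retry" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Hypothetical wrapper: send a request and print only the HTTP status code.
# $1 = JSON body, $2 = output file path
tts_request() {
  curl --silent --location 'https://api.siliconflow.cn/v1/audio/speech' \
    --header "Authorization: Bearer $SILICONFLOW_API_KEY" \
    --header 'Content-Type: application/json' \
    --data "$1" --output "$2" --write-out '%{http_code}'
}
```

A script could then run `status=$(tts_request "$body" "$OUTPUT_PATH")` followed by `explain_status "$status"`. Note that on a failed request the file at the output path holds the error response rather than audio, so it should be checked or removed before playback.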

Usage Guidance

This skill appears coherent for converting text to speech via SiliconFlow. Before installing:
1) Verify you trust the provider (api.siliconflow.cn) because your API key will be sent to that domain.
2) Prefer exporting the API key in-session rather than permanently writing it to ~/.openclaw/.env if you want less persistence; if you do save it, protect the file (file permissions) and rotate/revoke the key when no longer needed.
3) Voice cloning requires uploading audio and real-name verification. Do not upload recordings of other people without consent, and avoid private/confidential audio.
4) Monitor API usage and billing on the provider side (Alipay/WeChat payments mentioned).
5) Because the skill's source/homepage are not provided, if you need stronger assurance, confirm the provider/site independently and consider creating a dedicated API key with limited scope for this skill.

Capability Analysis

Type: OpenClaw Skill
Name: china-tts
Version: 1.0.2

The 'china-tts' skill is a legitimate integration for the SiliconFlow Text-to-Speech API, providing instructions and examples for converting text to audio using models like CosyVoice2 and MOSS-TTSD. It uses standard curl commands to interact with the official 'api.siliconflow.cn' endpoint and requires a user-provided API key. No evidence of data exfiltration, malicious execution, or prompt injection was found across SKILL.md, references/examples.md, or references/voices.md.

Capability Assessment

✓ Purpose & Capability

Name/description, required binary (curl), and required env var (SILICONFLOW_API_KEY) all align with the SKILL.md examples that use curl to call api.siliconflow.cn. No unrelated services or credentials are requested.

ℹ Instruction Scope

Instructions are limited to calling the SiliconFlow API, uploading reference audio for voice cloning, and saving generated audio to the agent workspace. The SKILL.md suggests storing the API key via export or in ~/.openclaw/.env and references OPENCLAW_WORKSPACE (used as a fallback), which are reasonable but worth noting because they involve persisting credentials and relying on an environment variable not declared in requires.env.

✓ Install Mechanism

No install spec or external downloads — instruction-only skill that relies on curl being present. This is low-risk from an installation perspective.

ℹ Credentials

Only one credential is required (SILICONFLOW_API_KEY) and it's used as expected in Authorization headers. The SKILL.md also references OPENCLAW_WORKSPACE (optional fallback) and instructs writing the API key to ~/.openclaw/.env, which is practical but means the key will be stored on disk unless the user chooses an ephemeral export. Be aware of the privacy implications of storing keys and uploaded reference audio.

✓ Persistence & Privilege

always:false and user-invocable:true. The skill does not request persistent platform-wide privileges or modify other skills' configs. It only suggests writing files into the agent workspace or the user's ~/.openclaw/.env if the user follows the instructions.

Version History

v1.0.2

Fixed security-scan issues: metadata format changed to single-line JSON, and SILICONFLOW_API_KEY is now declared

v1.0.1

- Added a section detailing the default audio file save path and naming convention (`tts/tts_YYYYMMDD_HHMMSS.mp3`) in the working directory or per-Agent workspace.
- Provided example shell commands and directory setup for saving generated audio files.
- Updated output format instructions to reflect new file save locations and playback commands.
- Improved clarity for users regarding where output files are stored, especially in multi-Agent scenarios.

v1.0.0

Initial release of china-tts, a domestic text-to-speech skill for China based on the SiliconFlow API.
- Converts text to speech (TTS) in China without VPN; supports multiple languages and dialects, including emotion control and custom voice cloning.
- Two supported models: CosyVoice2-0.5B (multilingual, emotion/dialect) and MOSS-TTSD-v0.5 (dual-speaker podcast style).
- Eight built-in voices plus custom voices; adjust speed and gain.
- Requires SiliconFlow API key, domestic payment supported (Alipay/WeChat).
- Includes setup instructions, usage examples, API details, error handling, and playback tips.

Metadata

Slug china-tts

Version 1.0.2

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 3

Frequently Asked Questions

What is China Tts?

A text-to-speech skill that works within mainland China, built on the SiliconFlow (硅基流动) API. Use when the user wants to convert text to speech in China without VPN. Supports CosyVoice2-0.5B (multilingual, emotion c... It is an AI Agent Skill for Claude Code / OpenClaw, with 209 downloads so far.

How do I install China Tts?

Run "/install china-tts" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is China Tts free?

Yes, China Tts is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does China Tts support?

China Tts is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created China Tts?

It is built and maintained by ToBeWin (@tobewin); the current version is v1.0.2.
