qingzhe2020

ifly-voiceclone-tts

Author: Iflytek AIcloud · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 208
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install ifly-voiceclone-tts
Description
iFlytek Voice Clone tts(声音复刻) — train a custom voice model from audio samples and synthesize speech with the cloned voice. Supports the full workflow: get tr...
Usage (SKILL.md)

ifly-voiceclone-tts

Clone a voice from audio samples and synthesize speech with it, using iFlytek's Voice Clone (声音复刻) API. Two-phase workflow: train a voice model, then synthesize speech with it.

Setup

  1. Create an app in the iFlytek console (讯飞控制台) with the one-sentence voice clone (一句话声音复刻) service enabled
  2. Set environment variables:
    export IFLY_APP_ID="your_app_id"
    export IFLY_API_KEY="your_api_key"
    export IFLY_API_SECRET="your_api_secret"
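
To catch a missing variable before any network call is made, a small pre-flight check can help. The sketch below is illustrative only: the helper name `load_credentials` is hypothetical and is not part of `scripts/voiceclone.py`; only the three variable names come from this document.

```python
import os

# The three variable names are taken from the Setup step above.
REQUIRED_VARS = ("IFLY_APP_ID", "IFLY_API_KEY", "IFLY_API_SECRET")


def load_credentials(env=None):
    """Return the three credentials as a dict, or raise if any is missing.

    Hypothetical helper for illustration; the real script may read the
    environment differently.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_credentials()` with no argument reads the real process environment; passing a dict makes the check easy to test.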
    

Workflow

Phase 1: Train a Voice Model

Step 1 — Get training text

python3 scripts/voiceclone.py train get-text

This returns a list of text segments with segId. You need to record yourself reading one of these texts.

Step 2 — Create a training task

python3 scripts/voiceclone.py train create --name "MyVoice" --sex female --engine omni_v1

Returns task_id. Supported engines:

  • omni_v1 — Multi-style universal voice (recommended)

Gender: male/female (or 1/2).

Step 3 — Upload audio

# Local file:
python3 scripts/voiceclone.py train upload --task-id 12345 --audio recording.wav --text-id 5001 --seg-id 1

# URL:
python3 scripts/voiceclone.py train upload --task-id 12345 --audio-url "https://example.com/voice.wav" --text-id 5001 --seg-id 1

Audio requirements:

  • Format: WAV/MP3/M4A/PCM
  • Duration: match the training text (typically 3-60 seconds)
  • Quality: clear recording, minimal background noise

Step 4 — Submit for training

python3 scripts/voiceclone.py train submit --task-id 12345

Step 5 — Check status (poll until done)

python3 scripts/voiceclone.py train status --task-id 12345

When complete, returns the res_id (voice resource ID) needed for synthesis.

Quick one-shot training

python3 scripts/voiceclone.py train quick \
    --audio recording.wav \
    --name "MyVoice" \
    --sex female \
    --wait

This combines create → upload → submit → poll in one command. --wait polls every 30s until training completes and prints the res_id.
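
The polling behaviour of --wait can be pictured roughly as follows. This is a sketch under assumptions, not the script's actual code: `get_status` is a hypothetical callable, and the `"state"` values `"done"` and `"failed"` are invented for illustration. Only the 30-second interval and the `res_id` result come from this document.

```python
import time


def wait_for_res_id(get_status, interval=30.0, timeout=1800.0, sleep=time.sleep):
    """Poll get_status() until training completes, then return the res_id.

    get_status is a hypothetical callable returning a dict such as
    {"state": "training"} or {"state": "done", "res_id": "..."}.
    The state names are assumptions, not the real API's values.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status.get("state") == "done":
            return status["res_id"]
        if status.get("state") == "failed":
            raise RuntimeError("training failed: %r" % (status,))
        if waited >= timeout:
            raise TimeoutError("training did not finish within %s seconds" % timeout)
        sleep(interval)  # 30 s by default, matching the interval described above
        waited += interval
```

The `timeout` guard is an addition of this sketch; the document does not say whether --wait ever gives up on its own.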

Phase 2: Synthesize Speech

# Basic synthesis
python3 scripts/voiceclone.py synth "你好,这是我的声音克隆。" --res-id YOUR_RES_ID

# With output file
python3 scripts/voiceclone.py synth "Hello world" --res-id YOUR_RES_ID --output hello.mp3

# From file
python3 scripts/voiceclone.py synth --file article.txt --res-id YOUR_RES_ID -o article.mp3

# From stdin
echo "测试语音合成" | python3 scripts/voiceclone.py synth --res-id YOUR_RES_ID

# Adjust parameters
python3 scripts/voiceclone.py synth "快一点" --res-id YOUR_RES_ID --speed 70 --volume 80

Train Subcommands

Command | Description
train get-text | Get training text segments
train create | Create a training task
train upload | Upload audio to a task
train submit | Submit task for training
train status | Check training status
train quick | One-shot: create + upload + submit

Synthesis Options

Flag | Default | Description
--res-id | (required) | Voice resource ID from training
--output / -o | output.mp3 | Output audio file path
--format | mp3 | Audio format: mp3, pcm, speex, opus
--sample-rate | 16000 | Sample rate: 8000, 16000, 24000
--speed | 50 | Speed 0–100 (50=normal)
--volume | 50 | Volume 0–100 (50=normal)
--pitch | 50 | Pitch 0–100 (50=normal)
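
When driving the CLI from another Python program, it can help to validate the 0–100 ranges before spawning the process. The sketch below builds an argument list from the flags in the table above; the function `build_synth_command` is a hypothetical wrapper, not something the skill ships.

```python
def build_synth_command(text, res_id, output="output.mp3", speed=50, volume=50, pitch=50):
    """Build an argv list for the synth subcommand, using the documented flags.

    Hypothetical wrapper for illustration. The range check mirrors the
    0-100 limits listed in the options table.
    """
    for name, value in (("speed", speed), ("volume", volume), ("pitch", pitch)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return [
        "python3", "scripts/voiceclone.py", "synth", text,
        "--res-id", res_id,
        "--output", output,
        "--speed", str(speed),
        "--volume", str(volume),
        "--pitch", str(pitch),
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True, capture_output=True, text=True)`. Passing a list rather than a shell string avoids quoting problems with the text argument.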

Notes

  • Training API: HTTP REST at http://opentrain.xfyousheng.com/voice_train (MD5-based token auth)
  • Synthesis API: WebSocket at wss://cn-huabei-1.xf-yun.com/v1/private/voice_clone (HMAC-SHA256 URL auth)
  • vcn: always x6_clone for cloned voice synthesis
  • Engine omni_v1: multi-style universal voice, supports cn/en/jp/ko/ru
  • Training text: use get-text to find available text segments — you must record yourself reading the corresponding text
  • Training time: typically 2–10 minutes depending on load
  • No pip dependencies: uses pure Python stdlib (built-in WebSocket client)
  • Env vars: IFLY_APP_ID, IFLY_API_KEY, IFLY_API_SECRET
  • Output: prints absolute path of saved audio to stdout
  • API doc: https://www.xfyun.cn/doc/spark/voiceclone.html
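
For readers unfamiliar with "HMAC-SHA256 URL auth", the sketch below shows the signing scheme that iFlytek documents for its WebSocket APIs in general. Whether `scripts/voiceclone.py` builds its URL in exactly this way is an assumption, and `build_auth_url` is a hypothetical name. Check the API doc linked above before relying on the details.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate
from urllib.parse import urlencode, urlparse


def build_auth_url(ws_url, api_key, api_secret, date=None):
    """Return ws_url with authorization, date and host query parameters.

    Assumed scheme: sign "host / date / request-line" with HMAC-SHA256,
    then base64-encode the resulting authorization string.
    """
    parsed = urlparse(ws_url)
    host, path = parsed.netloc, parsed.path
    date = date or formatdate(usegmt=True)  # RFC 1123 date, e.g. "... GMT"

    # String to sign: host, date and the HTTP request line.
    signature_origin = f"host: {host}\ndate: {date}\nGET {path} HTTP/1.1"
    digest = hmac.new(
        api_secret.encode("utf-8"),
        signature_origin.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")

    authorization_origin = (
        f'api_key="{api_key}", algorithm="hmac-sha256", '
        f'headers="host date request-line", signature="{signature}"'
    )
    authorization = base64.b64encode(authorization_origin.encode("utf-8")).decode("ascii")

    query = urlencode({"authorization": authorization, "date": date, "host": host})
    return f"{ws_url}?{query}"
```

Note that the API secret never appears in the URL, only a signature derived from it. The API key does appear inside the base64-encoded authorization parameter.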

Common Error Codes Quick Reference

Hit an error? Don't panic: look it up in the tables below to see what to do.

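🎤 Voice Training API: Common Error Codes

Error code | What happened | How to fix it
10000 | Token expired | Check whether the token has expired and refresh it.
10001 | Missing request header parameters | Make sure the request headers include X-AppId and X-Token.
10015 | The training task does not belong to you | The task does not belong to the current app; check that the appid is correct.
10016 | Invalid appid | This appid has not been authorized; contact iFlytek to have one assigned.
10017 | Training type not authorized | The app has no permission for this training type; contact iFlytek technical staff to enable it.
10018 | No training channels allocated | The authorized number of training channels is insufficient; contact your iFlytek sales representative to add more.
10019 | appid authorization expired | The authorization has lapsed; contact your sales representative about renewal.
10020 | IP address not authorized | Your IP address is not on the allowlist; send it to iFlytek to have it added.
10021 | No training count allocated | The training count is used up; contact iFlytek to increase it.
20001 | Invalid textId, or the training text is empty | Check textId and textSegId; run train get-text to confirm them.
20002 | Invalid textSegId | This segment ID does not exist; run train get-text to see the valid IDs.
60000 | Training task does not exist | Check whether taskId was entered incorrectly, then retry.
90001 | Illegal request | Check the request structure against the API protocol.
90002 | Incorrect request parameters | A parameter is wrong (for example "textId must not be blank"); read the error message closely.
99999 | Internal system error | Contact iFlytek technical staff to investigate.

💡 Tip: permission and authorization problems (10016-10021) generally have to be resolved by iFlytek. You can submit a ticket at https://console.xfyun.cn/workorder/commit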
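
The training error table above sorts naturally into three kinds of response: refresh the token, fix the request yourself, or contact iFlytek. A caller could encode that triage as below. The function name and the category labels are hypothetical; the code-to-category mapping follows the table and tip above.

```python
# Codes and groupings follow the training error table above.
CONTACT_IFLYTEK = set(range(10016, 10022)) | {99999}  # 10016-10021 plus internal error
FIX_REQUEST = {10001, 10015, 20001, 20002, 60000, 90001, 90002}


def triage_training_error(code):
    """Map a training API error code to a coarse next step.

    Hypothetical helper; the returned labels are illustrative only.
    """
    if code == 10000:
        return "refresh_token"
    if code in FIX_REQUEST:
        return "fix_request"
    if code in CONTACT_IFLYTEK:
        return "contact_iflytek"
    return "unknown"
```

Note that 10019 means something different on the synthesis API (session timeout), so this mapping should only be applied to responses from the training endpoints.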


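🎵 Speech Synthesis API: Common Error Codes

Error code | What happened | How to fix it
10009 | Invalid input data | Check the format of the input data.
10010 | License count exhausted | No license is available or all licenses are in use; submit a ticket to iFlytek.
10019 | Session timeout | Check that the connection is closed once all data has been sent.
10043 | Audio decoding failed | Check the aue parameter. If it is set to speex, make sure the audio really is speex-encoded and that the segment compression and frame size are consistent.
10114 | Session timeout | The session ran too long; check whether sending data took more than 60 seconds.
10139 | Parameter error | Check the parameters for mistakes.
10160 | Malformed request JSON | Check that the data sent is valid JSON.
10161 | base64 decoding failed | Check that the data is base64-encoded.
10163 | Parameter validation failed | See the detailed description for the cause and compare the request against the API documentation.
10200 | Data read timeout | Check whether no data was sent for a cumulative 10 seconds while the connection stayed open.
10222 | Upload size limit exceeded, or SSL problem | 1. Check whether the uploaded data (text, audio, images, etc.) exceeds the API limit. 2. For SSL certificate problems, export the log and attach it to a ticket: https://console.xfyun.cn/workorder/commit
10223 | Load balancer cannot find a node | Server-side problem; submit a ticket.
10313 | appid and apikey do not match | Check that the appid is correct and valid.
10317 | Invalid version | The version number is wrong; submit a ticket so technical staff can handle it.
10700 | Engine error | Check inputs and outputs against the developer documentation using the reported reason; if that does not help, submit a ticket with the sid and error message.
11200 | Feature not authorized | Check that the appid is correct and has the relevant service added; check whether the total call volume has been exceeded or the authorization has expired; confirm the feature authorization. If all of these look fine, contact the sales team.
11201 | Daily interaction limit exceeded | The quota is used up; submit the app for review to raise the limit, or contact sales to purchase the enterprise-grade API for higher volume.
11503 | Invalid internal response data | Submit a ticket so iFlytek can investigate.
11502 | Service configuration error | This is a problem on iFlytek's side; submit a ticket.
100001~100010 | Engine call error | Submit a ticket with the sid and error message so technical staff can investigate.

💡 Important: error codes 100001-100010 may indicate engine-level problems. When submitting a ticket, include:

  • sid (request session ID)
  • The complete error message
  • Steps to reproduce

That lets technical staff locate the problem quickly.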


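🆘 What to do when something goes wrong

  1. Look up the error code: the tables above cover the common errors.
  2. Check the parameters: many errors come from mistyped parameters, so compare them carefully against the API documentation.
  3. Submit a ticket: if the code is not in the tables or you cannot resolve it, submit a ticket at https://console.xfyun.cn/workorder/commit
  4. Purchase or upgrade the service if you need more call volume or features.

🎉 Happy developing! If you have other questions, feel free to ask.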

Security recommendations
This skill implements the claimed iFlytek voice-cloning workflow, but there are important warnings before you install or use it:
  • The skill requires API credentials (IFLY_APP_ID, IFLY_API_KEY, IFLY_API_SECRET) even though the registry metadata didn't list them. Don't provide keys unless you trust the code and owner.
  • The script talks to training endpoints over plain HTTP and disables TLS verification for the TTS WebSocket. That means your API keys and data could be intercepted on the network. Avoid using production/privileged keys; prefer a throwaway/test account and run only on a trusted network.
  • The code will fetch audio by URL if you pass --audio-url; that can cause network I/O from arbitrary hosts. Consider running in an isolated environment (VM/container) if you don't fully trust the source.
  • If you need to proceed: review the full script yourself (or ask the publisher to fix metadata), confirm endpoints are the official iFlytek endpoints, and ideally patch the code to use HTTPS for token/train endpoints and to enable proper certificate validation for the WebSocket.
Functional analysis
Type: OpenClaw Skill
Name: ifly-voiceclone-tts
Version: 1.0.0
The script `scripts/voiceclone.py` implements voice cloning but contains significant security vulnerabilities. It transmits sensitive API credentials and audio data over unencrypted HTTP (via `TRAIN_BASE_URL` and `AUTH_TOKEN_URL`) and explicitly disables SSL certificate verification (`ssl.CERT_NONE`) in its custom WebSocket implementation. While these flaws appear to be poor security practices rather than intentional malice, they expose the agent to credential theft and man-in-the-middle attacks.
Capability assessment
Purpose & Capability
The skill's name and description match the included script: it implements iFlytek voice training and TTS. However, the published registry metadata lists no required environment variables or primary credential, while both SKILL.md and the script require IFLY_APP_ID, IFLY_API_KEY, and IFLY_API_SECRET. That omission is an inconsistency and should be corrected or clarified.
Instruction Scope
Runtime instructions are narrowly scoped to the voice-training/synthesis workflow, but the implementation sends authentication data to HTTP endpoints (TRAIN_BASE_URL and AUTH_TOKEN_URL are http://) and uses a WebSocket client that disables certificate validation (ssl.CERT_NONE and check_hostname=False). These make secret transmission and TLS integrity vulnerable to interception. The skill also allows uploading audio by URL (it may fetch arbitrary URLs) which is expected for upload but increases network exposure.
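The certificate-validation gap described above could be closed along the following lines. This is a sketch using Python's standard `ssl` module, not a patch against the actual script: how the script's built-in WebSocket client accepts a context is not shown in this listing, and the helper names are hypothetical.

```python
import ssl


def make_verified_context():
    """Return a TLS context that validates certificates and hostnames.

    ssl.create_default_context() already enables both checks; the
    assignments below just make the intent explicit.
    """
    context = ssl.create_default_context()
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context


def is_verifying(context):
    """True if the context rejects invalid certificates and wrong hostnames."""
    return context.verify_mode == ssl.CERT_REQUIRED and context.check_hostname
```

A context configured with `check_hostname = False` and `verify_mode = ssl.CERT_NONE`, as the analysis reports for this skill, would fail the `is_verifying` check.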
Install Mechanism
No install spec; the skill is an instruction + a single Python script using only the stdlib. Nothing is downloaded or extracted at install time, so installation risk is low.
Credentials
The code and SKILL.md require three credentials (IFLY_APP_ID, IFLY_API_KEY, IFLY_API_SECRET). That is proportionate to calling iFlytek APIs, but the registry metadata does not declare them (mismatch). More importantly, those secrets are transmitted to HTTP endpoints and used in client-side signing — sending them to plaintext HTTP endpoints risks exposure. The script also prints/writes output audio files (expected).
Persistence & Privilege
The skill does not request persistent/always-on privileges and does not modify other skills or system-wide config. Autonomous invocation is allowed (platform default) but not combined with other privilege escalations.
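How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install ifly-voiceclone-tts
  3. Once installed, invoke the Skill by name or trigger it with /ifly-voiceclone-tts
  4. Provide the required input according to the Skill's parameter documentation to get structured output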
Version history
v1.0.0
ifly-voiceclone-tts v1.0.0
  • Initial release of the iFlytek Voice Clone TTS skill.
  • Supports end-to-end workflow: get training text, create task, upload audio, submit for training, poll results, and synthesize speech with the cloned voice.
  • Command-line interface for both training and synthesis phases, with detailed subcommands and options.
  • Uses only the Python standard library; no external dependencies required.
  • Includes detailed documentation and troubleshooting guidance, including error code reference tables.
Metadata
Slug ifly-voiceclone-tts
Version 1.0.0
License MIT-0
Total installs 0
Current installs 0
Versions 1
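FAQ

What is ifly-voiceclone-tts?

iFlytek Voice Clone TTS (声音复刻): train a custom voice model from audio samples and synthesize speech with the cloned voice. Supports the full workflow: get tr... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 208 total downloads to date.

How do I install ifly-voiceclone-tts?

Run the command "/install ifly-voiceclone-tts" in the OpenClaw or Claude Code chat box for a one-step install. Installation itself needs no extra configuration, but using the skill requires the IFLY_APP_ID, IFLY_API_KEY, and IFLY_API_SECRET environment variables described under Setup.

Is ifly-voiceclone-tts free?

Yes. ifly-voiceclone-tts is completely free under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does ifly-voiceclone-tts support?

ifly-voiceclone-tts is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who develops ifly-voiceclone-tts?

It is developed and maintained by Iflytek AIcloud (@qingzhe2020); the current version is v1.0.0.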
