Description

iFlytek Voice Clone TTS (声音复刻): train a custom voice model from audio samples and synthesize speech with the cloned voice. Supports the full workflow: get tr...

Usage (SKILL.md)

ifly-voiceclone-tts

Name: ifly-voiceclone-tts
Author: qingzhe2020

Clone a voice from audio samples and synthesize speech with it, using iFlytek's Voice Clone (声音复刻) API. Two-phase workflow: train a voice model, then synthesize speech with it.

Setup

Create an app in the iFlytek console (讯飞控制台) with the One-Sentence Voice Clone (一句话声音复刻) service enabled.

Set environment variables:

export IFLY_APP_ID="your_app_id"
export IFLY_API_KEY="your_api_key"
export IFLY_API_SECRET="your_api_secret"
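Before running any command it can help to confirm that all three variables are actually visible to Python. The following is a minimal sketch; the helper name `missing_credentials` is hypothetical and is not part of `scripts/voiceclone.py`.

```python
import os

# The three variables this document says the skill needs.
REQUIRED_VARS = ("IFLY_APP_ID", "IFLY_API_KEY", "IFLY_API_SECRET")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All iFlytek credentials are set.")
```

The check only reports variable names, never their values, so it is safe to run in a shared terminal.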

Workflow

Phase 1: Train a Voice Model

Step 1 — Get training text

python3 scripts/voiceclone.py train get-text

This returns a list of text segments with segId. You need to record yourself reading one of these texts.

Step 2 — Create a training task

python3 scripts/voiceclone.py train create --name "MyVoice" --sex female --engine omni_v1

Returns task_id. Supported engines:

omni_v1 — Multi-style universal voice (recommended)

Gender: male/female (or 1/2).

Step 3 — Upload audio

# Local file:
python3 scripts/voiceclone.py train upload --task-id 12345 --audio recording.wav --text-id 5001 --seg-id 1

# URL:
python3 scripts/voiceclone.py train upload --task-id 12345 --audio-url "https://example.com/voice.wav" --text-id 5001 --seg-id 1

Audio requirements:

Format: WAV/MP3/M4A/PCM
Duration: match the training text (typically 3-60 seconds)
Quality: clear recording, minimal background noise
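A recording can be sanity-checked locally before upload. The sketch below is an illustration only: `check_recording` is a hypothetical helper, the 3 to 60 second bounds are taken from the "typical" range above rather than from a documented hard limit, and duration is measured only for WAV files because the standard library cannot decode the other formats.

```python
import wave
from pathlib import Path

# Extensions corresponding to the formats listed above.
ALLOWED_SUFFIXES = {".wav", ".mp3", ".m4a", ".pcm"}


def check_recording(path, min_seconds=3.0, max_seconds=60.0):
    """Return a list of problems found; an empty list means the file looks usable."""
    path = Path(path)
    problems = []
    suffix = path.suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported extension: {suffix or '(none)'}")
        return problems
    if suffix == ".wav":
        # Duration = number of frames / frames per second, read from the WAV header.
        with wave.open(str(path), "rb") as wav:
            seconds = wav.getnframes() / wav.getframerate()
        if not min_seconds <= seconds <= max_seconds:
            problems.append(
                f"duration {seconds:.1f}s outside {min_seconds}-{max_seconds}s"
            )
    return problems
```

For example, `check_recording("recording.wav")` returns `[]` when the file is a WAV of acceptable length. Recording quality (noise, clipping) still has to be judged by ear.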

Step 4 — Submit for training

python3 scripts/voiceclone.py train submit --task-id 12345

Step 5 — Check status (poll until done)

python3 scripts/voiceclone.py train status --task-id 12345

When complete, returns the res_id (voice resource ID) needed for synthesis.

Quick one-shot training

python3 scripts/voiceclone.py train quick \
    --audio recording.wav \
    --name "MyVoice" \
    --sex female \
    --wait

This combines create → upload → submit → poll in one command. --wait polls every 30s until training completes and prints the res_id.
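The polling behaviour behind `--wait` can be pictured as a simple loop. This is a sketch under stated assumptions, not the script's actual code: `get_status` is a hypothetical callable, and the `state` / `res_id` keys in the dictionaries it returns are invented for illustration. Only the 30-second interval comes from the description above.

```python
import time


def wait_for_res_id(get_status, interval=30.0, timeout=1800.0, sleep=time.sleep):
    """Call get_status() until it reports a res_id, a failure, or the timeout passes.

    get_status is a hypothetical callable returning a dict such as
    {"state": "training"}, {"state": "done", "res_id": "..."} or {"state": "failed"}.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status.get("state") == "done":
            return status["res_id"]
        if status.get("state") == "failed":
            raise RuntimeError(f"training failed: {status}")
        if waited >= timeout:
            raise TimeoutError(f"no result after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Passing `sleep` as a parameter keeps the loop testable without real delays. The timeout guards against polling forever if a task stalls.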

Phase 2: Synthesize Speech

# Basic synthesis
python3 scripts/voiceclone.py synth "你好，这是我的声音克隆。" --res-id YOUR_RES_ID

# With output file
python3 scripts/voiceclone.py synth "Hello world" --res-id YOUR_RES_ID --output hello.mp3

# From file
python3 scripts/voiceclone.py synth --file article.txt --res-id YOUR_RES_ID -o article.mp3

# From stdin
echo "测试语音合成" | python3 scripts/voiceclone.py synth --res-id YOUR_RES_ID

# Adjust parameters
python3 scripts/voiceclone.py synth "快一点" --res-id YOUR_RES_ID --speed 70 --volume 80
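When the synthesis command is driven from another Python program, building the argument list explicitly avoids shell-quoting problems with Chinese text or spaces. The helper below is a hypothetical wrapper that only uses flags shown in this document; it assumes it is run from the skill's root directory so that `scripts/voiceclone.py` resolves.

```python
def build_synth_command(text, res_id, output=None, speed=None, volume=None):
    """Build the argv list for `voiceclone.py synth` using flags from this document."""
    cmd = ["python3", "scripts/voiceclone.py", "synth", text, "--res-id", res_id]
    if output is not None:
        cmd += ["--output", output]
    if speed is not None:
        cmd += ["--speed", str(speed)]
    if volume is not None:
        cmd += ["--volume", str(volume)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True, capture_output=True, text=True)`. According to the notes in this document, the script prints the absolute path of the saved audio to stdout.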

Train Subcommands

Command	Description
`train get-text`	Get training text segments
`train create`	Create a training task
`train upload`	Upload audio to a task
`train submit`	Submit task for training
`train status`	Check training status
`train quick`	One-shot: create + upload + submit (add `--wait` to poll until training completes)

Synthesis Options

Flag	Default	Description
`--res-id`	(required)	Voice resource ID from training
`--output` / `-o`	`output.mp3`	Output audio file path
`--format`	`mp3`	Audio format: mp3, pcm, speex, opus
`--sample-rate`	`16000`	Sample rate: 8000, 16000, 24000
`--speed`	`50`	Speed 0–100 (50=normal)
`--volume`	`50`	Volume 0–100 (50=normal)
`--pitch`	`50`	Pitch 0–100 (50=normal)
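The ranges in the table can be checked before a request is sent. The following is an illustrative sketch; `validate_synth_options` is a hypothetical function, and whether the script itself performs such validation is not stated in this document.

```python
# Allowed values copied from the options table above.
ALLOWED_FORMATS = ("mp3", "pcm", "speex", "opus")
ALLOWED_SAMPLE_RATES = (8000, 16000, 24000)


def validate_synth_options(fmt="mp3", sample_rate=16000, speed=50, volume=50, pitch=50):
    """Return a list of error strings; empty means every option is in range."""
    errors = []
    if fmt not in ALLOWED_FORMATS:
        errors.append(f"format must be one of {ALLOWED_FORMATS}, got {fmt!r}")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        errors.append(
            f"sample rate must be one of {ALLOWED_SAMPLE_RATES}, got {sample_rate}"
        )
    for name, value in (("speed", speed), ("volume", volume), ("pitch", pitch)):
        if not 0 <= value <= 100:
            errors.append(f"{name} must be between 0 and 100, got {value}")
    return errors
```

Collecting all errors in a list, instead of raising on the first one, lets a caller report every bad option in a single pass.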

Notes

Training API: HTTP REST at http://opentrain.xfyousheng.com/voice_train (MD5-based token auth)
Synthesis API: WebSocket at wss://cn-huabei-1.xf-yun.com/v1/private/voice_clone (HMAC-SHA256 URL auth)
vcn: always x6_clone for cloned voice synthesis
Engine omni_v1: multi-style universal voice, supports cn/en/jp/ko/ru
Training text: use get-text to find available text segments — you must record yourself reading the corresponding text
Training time: typically 2–10 minutes depending on load
No pip dependencies: uses pure Python stdlib (built-in WebSocket client)
Env vars: IFLY_APP_ID, IFLY_API_KEY, IFLY_API_SECRET
Output: prints absolute path of saved audio to stdout
API doc: https://www.xfyun.cn/doc/spark/voiceclone.html
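For readers unfamiliar with HMAC-SHA256 URL auth, the sketch below shows the signing scheme that iFlytek documents for its WebSocket APIs in general. It is an assumption that the voice-clone endpoint and this script follow exactly this layout (the script's source is not reproduced here), and `build_auth_url` is a hypothetical name. Check the API doc linked above before relying on it.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate
from urllib.parse import urlencode, urlparse


def build_auth_url(url, api_key, api_secret, timeval=None):
    """Return `url` with authorization, date and host query parameters appended."""
    parsed = urlparse(url)
    host, path = parsed.netloc, parsed.path
    date = formatdate(timeval=timeval, usegmt=True)  # RFC 1123 date in GMT
    # The signed string covers the host, the date and the HTTP request line.
    signature_origin = f"host: {host}\ndate: {date}\nGET {path} HTTP/1.1"
    digest = hmac.new(
        api_secret.encode(), signature_origin.encode(), hashlib.sha256
    ).digest()
    signature = base64.b64encode(digest).decode()
    authorization_origin = (
        f'api_key="{api_key}", algorithm="hmac-sha256", '
        f'headers="host date request-line", signature="{signature}"'
    )
    authorization = base64.b64encode(authorization_origin.encode()).decode()
    query = urlencode({"authorization": authorization, "date": date, "host": host})
    return f"{url}?{query}"
```

In this scheme the API secret never leaves the client. Only the API key and a signature derived from the secret appear in the URL, which is still sensitive and should travel over a verified TLS connection.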

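Quick Reference for Common Error Codes ฅ⁽͑˙˙⁾ฅ

Hit an error? Don't panic. The lookup tables below will tell you what to do ✧｡･ﾟ:*･

🎤 Voice Training API - Common Error Codes

Error code	Oops! What happened?	How do I fix it?
10000	The token has expired (ˊᵕˋ)	Check whether the token has expired and refresh it.
10001	Request header parameters are missing (⊙_⊙)	Make sure the request headers include `X-AppId` and `X-Token`.
10015	This training task isn't yours (›´ω`‹ )	The task does not belong to the current app; check that the appid is correct.
10016	The appid is invalid (°°)	This appid has not been authorized; contact iFlytek to have one assigned.
10017	This training type is not authorized (๑•́ ₃ •̀๑)	You lack permission for this training type; contact iFlytek technical staff to enable it.
10018	No training channels have been allocated (｡•́︿•̀｡)	The training-channel quota is insufficient; contact your iFlytek sales representative to add channels.
10019	The appid authorization has expired (╥_╥)	The authorization has run out; ask your sales representative about renewing it.
10020	The IP address is not authorized (⊙﹏⊙)	Your IP address is not on the allowlist; send it to iFlytek to have it added.
10021	No training attempts have been allocated (´；ω；`)	The training attempts are used up; contact iFlytek to add more.
20001	The textId is invalid or the training text is empty (°°)	Check that textId and textSegId are correct; the `train get-text` command can confirm them.
20002	The textSegId is invalid (⊙_⊙)	This segment ID does not exist; run `train get-text` to see the valid IDs.
60000	The training task does not exist (；ω；`)	Check whether the taskId was mistyped, then try again.
90001	The request is illegal (°°)	Check the request structure against the API protocol.
90002	The request parameters are incorrect (´；ω；`)	A parameter is wrong, for example "textId must not be blank"; read the error message carefully.
99999	Internal system error (╥_╥)	This one is harder to diagnose; contact iFlytek technical staff to investigate.

💡 Tip: permission and authorization problems (10016-10021) generally have to be handled by iFlytek. You can submit a work order: https://console.xfyun.cn/workorder/commit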
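The training-API table lends itself to a small triage helper. The sketch below is illustrative only: `triage_training_error` is a hypothetical function, its grouping mirrors the table and the tip about codes 10016-10021, and it does not cover every code.

```python
# Codes 10016-10021 are the authorization-related codes from the table.
AUTHORIZATION_CODES = range(10016, 10022)


def triage_training_error(code):
    """Map a training-API error code to a short suggested next step."""
    if code == 10000:
        return "refresh the token and retry"
    if code == 10001:
        return "add the X-AppId and X-Token request headers"
    if code == 10015:
        return "check that the appid owns this task"
    if code in AUTHORIZATION_CODES:
        return "authorization issue: contact iFlytek or submit a work order"
    if code in (20001, 20002):
        return "re-check textId and textSegId with `train get-text`"
    if code == 60000:
        return "check the taskId"
    if code in (90001, 90002):
        return "check the request structure and parameters"
    return "unknown or internal error: read the message, then contact iFlytek"
```

For example, `triage_training_error(10018)` returns the authorization hint, because the code falls in the 10016-10021 range.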

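🎵 Speech Synthesis API - Common Error Codes

Error code	Oops! What happened?	How do I fix it?
10009	The input data is illegal (⊙_⊙)	Check that the input data is in the correct format.
10010	All licenses are in use (°°)	There is no license, or the licensed quantity is used up; submit a work order to iFlytek.
10019	The session timed out (ˊᵕˋ)	Check that the connection is closed once all data has been sent.
10043	Audio decoding failed (｡•́︿•̀｡)	Check the `aue` parameter. If it is set to speex, make sure the audio really is speex-encoded and compressed in segments that match the frame size.
10114	The session timed out (´；ω；`)	The session ran too long; check whether sending the data took more than 60 seconds.
10139	Parameter error (⊙_⊙)	Check whether any parameter is misspelled or wrong.
10160	The request JSON is malformed (°°)	Check that the data sent is valid JSON.
10161	base64 decoding failed (╥_╥)	Check that the data is base64-encoded.
10163	Parameter validation failed (´；ω；`)	See the detailed description for the cause and compare the request against the API documentation.
10200	Reading data timed out (°°)	Check whether 10 seconds in total passed without data being sent and without the connection being closed.
10222	Upload limit exceeded or SSL problem (⊙﹏⊙)	(1) Check whether the uploaded data (text, audio, images, etc.) exceeds the API limit. (2) For SSL certificate problems, export the log and attach it to a work order: https://console.xfyun.cn/workorder/commit
10223	The load balancer cannot find a node (°°)	This is a server-side problem; submit a work order.
10313	The appid and apikey do not match (⊙_⊙)	Check that the appid is correct and valid.
10317	The version is illegal (°°)	The version number is wrong; submit a work order so technical staff can handle it.
10700	Engine error (´；ω；`)	Check the input and output against the developer documentation according to the reported cause; if that does not help, submit a work order with the sid and the error message.
11200	The feature is not authorized (°°)	First check that the appid is correct and that the relevant service has been added under it. Then check whether the total call volume has been exceeded or has expired, and confirm the feature's authorization status. If everything looks fine, contact sales.
11201	The daily interaction limit has been exceeded (╥_╥)	The quota is used up. Submit the app for review to raise the limit, or contact sales to buy an enterprise-level API with a much larger quota.
11503	The service returned bad internal response data (°°)	Submit a work order so iFlytek can investigate.
11502	Service configuration error (⊙_⊙)	This is a problem on iFlytek's side; submit a work order.
100001~100010	Engine call error (´；ω；`)	Submit a work order with the sid and the error message so technical staff can investigate.

💡 Important! Error codes 100001-100010 may indicate engine-level problems. When submitting a work order, remember to include:

sid (the request session ID)

The complete error message

Steps to reproduce

That way the technical staff can locate the problem quickly ✧٩(ˊᗜˋ*)و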

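🆘 What should I do when something goes wrong?

Check the error code first: the tables above cover most common errors, so see whether yours is listed ๑•̀ㅂ•́)و✧
Check the parameters: many errors come from a mistyped parameter, so compare the request carefully against the API documentation.
Submit a work order: if the code is not in the tables, or you cannot resolve the problem, submit a work order here: https://console.xfyun.cn/workorder/commit
Buy or upgrade the service: if you need more call volume or more features:
- One-Sentence Voice Clone (一句话声音复刻) console
- Purchase a service package

🎉 Happy developing! If you have other questions, feel free to ask me any time. (´▽`ʃ♡ƪ)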

Security Recommendations

This skill implements the claimed iFlytek voice-cloning workflow, but there are important warnings before you install or use it:
- The skill requires API credentials (IFLY_APP_ID, IFLY_API_KEY, IFLY_API_SECRET) even though the registry metadata didn't list them. Don't provide keys unless you trust the code and owner.
- The script talks to training endpoints over plain HTTP and disables TLS verification for the TTS WebSocket. That means your API keys and data could be intercepted on the network. Avoid using production/privileged keys; prefer a throwaway/test account and run only on a trusted network.
- The code will fetch audio by URL if you pass --audio-url; that can cause network I/O from arbitrary hosts. Consider running in an isolated environment (VM/container) if you don't fully trust the source.
- If you need to proceed: review the full script yourself (or ask the publisher to fix metadata), confirm endpoints are the official iFlytek endpoints, and ideally patch the code to use HTTPS for token/train endpoints and to enable proper certificate validation for the WebSocket. If you want, I can list the exact code lines that disable TLS verification and where the HTTP endpoints are used so you can request fixes or make the patch yourself.

Functional Analysis

Type: OpenClaw Skill
Name: ifly-voiceclone-tts
Version: 1.0.0

The script `scripts/voiceclone.py` implements voice cloning but contains significant security vulnerabilities. It transmits sensitive API credentials and audio data over unencrypted HTTP (via `TRAIN_BASE_URL` and `AUTH_TOKEN_URL`) and explicitly disables SSL certificate verification (`ssl.CERT_NONE`) in its custom WebSocket implementation. While these flaws appear to be poor security practices rather than intentional malice, they expose the agent to credential theft and man-in-the-middle attacks.

Capability Assessment

ℹ Purpose & Capability

The skill's name and description match the included script: it implements iFlytek voice training and TTS. However the published registry metadata lists no required environment variables or primary credential while both SKILL.md and the script require IFLY_APP_ID, IFLY_API_KEY, and IFLY_API_SECRET. That metadata omission is an incoherence and should be corrected/clarified.

⚠ Instruction Scope

Runtime instructions are narrowly scoped to the voice-training/synthesis workflow, but the implementation sends authentication data to HTTP endpoints (TRAIN_BASE_URL and AUTH_TOKEN_URL are http://) and uses a WebSocket client that disables certificate validation (ssl.CERT_NONE and check_hostname=False). These make secret transmission and TLS integrity vulnerable to interception. The skill also allows uploading audio by URL (it may fetch arbitrary URLs) which is expected for upload but increases network exposure.
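The certificate-validation problem described here has a straightforward remedy in the standard library. The sketch below shows a TLS context with verification enabled. `make_verified_context` is a hypothetical helper, and how it would be wired into the script's custom WebSocket client is not shown because that code is not reproduced in this document.

```python
import ssl


def make_verified_context():
    """Return a client TLS context that verifies certificates and hostnames."""
    context = ssl.create_default_context()
    # Both settings are already the defaults of create_default_context();
    # stating them explicitly documents the intent and guards against regressions.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context
```

A socket would then be wrapped with `context.wrap_socket(sock, server_hostname=host)`, so that a certificate that does not match the host causes the handshake to fail instead of being silently accepted.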

✓ Install Mechanism

No install spec; the skill is an instruction + a single Python script using only the stdlib. Nothing is downloaded or extracted at install time, so installation risk is low.

⚠ Credentials

The code and SKILL.md require three credentials (IFLY_APP_ID, IFLY_API_KEY, IFLY_API_SECRET). That is proportionate to calling iFlytek APIs, but the registry metadata does not declare them (mismatch). More importantly, those secrets are transmitted to HTTP endpoints and used in client-side signing — sending them to plaintext HTTP endpoints risks exposure. The script also prints/writes output audio files (expected).

✓ Persistence & Privilege

The skill does not request persistent/always-on privileges and does not modify other skills or system-wide config. Autonomous invocation is allowed (platform default) but not combined with other privilege escalations.

Version History

v1.0.0

ifly-voiceclone-tts v1.0.0
- Initial release of the iFlytek Voice Clone TTS skill.
- Supports end-to-end workflow: get training text, create task, upload audio, submit for training, poll results, and synthesize speech with the cloned voice.
- Command-line interface for both training and synthesis phases, with detailed subcommands and options.
- Uses only the Python standard library; no external dependencies required.
- Includes detailed documentation and troubleshooting guidance, including error code reference tables.

Metadata

Slug ifly-voiceclone-tts

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Number of versions 1

FAQ