local-voice-reply
/install local-voice-reply
Local Voice Reply
Use this skill to turn text into a cloned/custom-voice audio reply and deliver it reliably to Feishu or Discord.
Structured skill definition
- Purpose: local low-latency voice replies in Opus/Ogg.
- Channels: Feishu + Discord.
- Default voice: `juno` (reference file: `voice/juno_ref.wav`).
- Custom voice modes:
  - File-based: replace/update `voice/juno_ref.wav`.
  - Registry-based: upload/register voices via `POST /voice/register`, then call by `voice_name`.
- Output: `.opus` (Ogg container) under `.openclaw/media/outbound/voice-server-v3/` (or `TARVIS_VOICE_OUTPUT_DIR`).
- Control scripts:
  - `scripts/send_voice_reply.ps1` (server API path)
  - `scripts/generate_cuda_voice.ps1` (stable local CUDA generation path)
Server implementation is kept with the skill (not workspace root):
- `server/voice_server_v3.py` (FastAPI routes)
- `server/voice_engine.py` (generation and cache engine)
Voice assets are also colocated with the skill:
- `voice/`
Runtime requirements
- `ffmpeg` must be installed and available on `PATH` (required for Opus encoding).
- Python packages required by the server:
  - `fastapi`
  - `uvicorn`
  - `python-multipart`
  - `chatterbox-tts`
  - `torch`
  - `torchaudio`
  - `numpy`
- On first startup, `ChatterboxTTS.from_pretrained()` may download model assets, so the initial run can require network access and additional disk space.
- Optional env vars:
  - `TARVIS_VOICE_OUTPUT_DIR` to override where generated Opus files are written.
  - `TARVIS_VOICE_DEVICE` to force device selection (`cuda`/`gpu`, `mps`, or `cpu`).
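The two optional variables above could be read along the following lines. This is a minimal sketch, not the server's actual code: the function names are hypothetical, the default directory is assumed to sit under the user's home folder, and treating `gpu` as an alias for `cuda` is an assumption based on the `cuda/gpu` wording.

```python
import os
from pathlib import Path

# Assumption: the default output directory lives under the user's home folder.
DEFAULT_OUTPUT_DIR = (
    Path.home() / ".openclaw" / "media" / "outbound" / "voice-server-v3"
)


def resolve_output_dir(env=None):
    """Return TARVIS_VOICE_OUTPUT_DIR when set, else the default directory."""
    env = os.environ if env is None else env
    override = env.get("TARVIS_VOICE_OUTPUT_DIR", "").strip()
    return Path(override) if override else DEFAULT_OUTPUT_DIR


def resolve_device(env=None):
    """Normalize TARVIS_VOICE_DEVICE; None means no override was requested."""
    env = os.environ if env is None else env
    raw = env.get("TARVIS_VOICE_DEVICE", "").strip().lower()
    if raw in ("cuda", "gpu"):  # assumed: "gpu" is an alias for "cuda"
        return "cuda"
    if raw in ("mps", "cpu"):
        return raw
    return None  # unset or unrecognized: the caller falls back to auto-detection
```

Passing a plain dict as `env` makes the helpers easy to exercise without touching the real environment.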
Persistence behavior
- Uploaded voice samples from `POST /voice/register` are persisted under `server/voices/`.
- Cache and registry data are persisted under `server/voice_cache/`.
- Generated Opus outputs are written under `.openclaw/media/outbound/voice-server-v3/` by default (or `TARVIS_VOICE_OUTPUT_DIR` when set).
- `POST /output/cleanup` only deletes staged `.opus` files inside the configured output directory and their `.json` sidecar files.
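The narrow scope of the cleanup route can be illustrated as follows. This is a sketch of the described behavior only, not the server's implementation: `cleanup_staged` is a hypothetical name, and the sidecar is assumed to share the Opus file's stem (`reply.opus` paired with `reply.json`).

```python
from pathlib import Path


def cleanup_staged(output_dir):
    """Delete staged .opus files and their .json sidecars in output_dir only."""
    root = Path(output_dir)
    removed = []
    # Materialize the listing first so deletions do not disturb iteration.
    for opus in list(root.glob("*.opus")):
        sidecar = opus.with_suffix(".json")  # assumed sidecar naming
        opus.unlink()
        removed.append(opus.name)
        if sidecar.exists():
            sidecar.unlink()
            removed.append(sidecar.name)
    return sorted(removed)
```

Files with other extensions, and `.json` files that have no matching `.opus`, are left untouched, and the non-recursive glob keeps the deletion inside the configured directory.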
Use this workflow
- Ensure the local v3.3 TTS server is running from this skill folder:
  `python -m uvicorn --app-dir server voice_server_v3:app --host 127.0.0.1 --port 8000`
- Call `/speak` with `text` (and optional `speed`, `exaggeration`, `cfg`).
  - `voice_name` defaults to `juno`.
- Receive Opus directly from the server (`audio/ogg`) in the Juno voice.
- Save the final media into the allowed path:
  `C:\Users\hanli\.openclaw\media\outbound\`
- Send with the `message` tool:
  - `action=send`
  - `filePath=<allowed-path>`
  - `asVoice=true`
  - For Feishu: `channel=feishu`
  - For Discord: `channel=discord`
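The workflow above can be sketched in Python using only the standard library. Several details are assumptions rather than documented behavior: that `/speak` accepts a JSON body (the server may expect form fields instead), the helper names, and the representation of the `message` tool arguments as a plain dict.

```python
import json
import urllib.request
from pathlib import Path

BASE_URL = "http://127.0.0.1:8000"  # host and port from the uvicorn command


def build_speak_request(text, voice_name="juno", speed=1.2,
                        exaggeration=None, cfg=None):
    """Build a POST to /speak. Assumption: the route takes a JSON body."""
    payload = {"text": text, "voice_name": voice_name, "speed": speed}
    if exaggeration is not None:
        payload["exaggeration"] = exaggeration
    if cfg is not None:
        payload["cfg"] = cfg
    return urllib.request.Request(
        BASE_URL + "/speak",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_reply(audio_bytes, outbound_dir, name="reply.opus"):
    """Write the returned Ogg/Opus bytes into the allowed outbound folder."""
    out = Path(outbound_dir) / name
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_bytes(audio_bytes)
    return out


def build_send_args(file_path, channel):
    """Mirror the message-tool parameters listed in the workflow."""
    if channel not in ("feishu", "discord"):
        raise ValueError("channel must be 'feishu' or 'discord'")
    return {"action": "send", "filePath": str(file_path),
            "asVoice": True, "channel": channel}


def speak_to_file(text, outbound_dir):
    """Call the running server and stage the audio for sending."""
    with urllib.request.urlopen(build_speak_request(text), timeout=120) as resp:
        return save_reply(resp.read(), outbound_dir)
```

With the server running, `build_send_args(speak_to_file("Hello", outbound_dir), "feishu")` yields the arguments to hand to the `message` tool, where `outbound_dir` is the allowed path from the workflow.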
Voice customization guide
A) Replace default Juno reference
- Replace `voice/juno_ref.wav` with your target reference voice sample.
- Keep the sample clean (single speaker, low noise, clear pronunciation).
- Restart the server and test with `voice_name=juno`.
B) Register additional named voices
- Call `POST /voice/register` with a reference sample and target `voice_name`.
- Confirm registration under `server/voices/`.
- Generate with that `voice_name` in `/speak` or `/speak_stream`.
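A registration call for option B might look like the sketch below. It assumes the route accepts a multipart form upload (suggested by the `python-multipart` requirement, but not confirmed here); the form field name `file` and the helper name are hypothetical, so check `server/voice_server_v3.py` for the real parameter names.

```python
import urllib.request
import uuid

REGISTER_URL = "http://127.0.0.1:8000/voice/register"


def build_register_request(voice_name, sample_bytes, filename="sample.wav"):
    """Build a multipart/form-data POST carrying a voice name and a WAV sample."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="voice_name"\r\n\r\n'
        f"{voice_name}\r\n"
        f"--{boundary}\r\n"
        # Assumption: the upload field is called "file".
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        REGISTER_URL,
        data=head + sample_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Sending it is then a matter of passing the request to `urllib.request.urlopen` while the server is running, after which the sample should appear under `server/voices/`.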
Defaults
- `voice_name`: `juno`
- `speed`: `1.2`
- Output format: Opus in Ogg container from server `/speak` (no post-conversion)
- Discord compatibility: Ogg/Opus is supported and can be sent as voice/audio with `asVoice=true`
Speed Improvements In This Version
- Caches model capability lookups once at startup.
- Uses `torch.inference_mode()` during synthesis to reduce overhead.
- Reuses phrase cache for both `/speak` and `/speak_stream`.
- Improves chunking behavior for long CJK text to avoid oversized chunks.
- Keeps latency metrics for benchmarking and tuning.
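To illustrate the CJK chunking point, here is one way to keep chunks bounded. It is an illustrative sketch, not the algorithm in `server/voice_engine.py`: the punctuation set, the `max_chars` default, and the function name are all assumptions.

```python
import re

# Split after CJK sentence/clause punctuation and ASCII "!" or "?".
_BREAKS = re.compile(r"(?<=[。!?;,、!?])")


def chunk_text(text, max_chars=40):
    """Group punctuation-delimited pieces into chunks of at most max_chars."""
    pieces = [p for p in _BREAKS.split(text) if p]
    chunks, current = [], ""
    for piece in pieces:
        # A piece with no usable break point is hard-split at the size cap.
        while len(piece) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:max_chars])
            piece = piece[max_chars:]
        if current and len(current) + len(piece) > max_chars:
            chunks.append(current)
            current = ""
        current += piece
    if current:
        chunks.append(current)
    return chunks
```

Because CJK text has no spaces, whitespace-based splitting can return an entire paragraph as one chunk; breaking on punctuation with a hard cap avoids that while preserving the original text when the chunks are joined.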
Common failure and fix
- Error: `LocalMediaAccessError ... path-not-allowed`
- Fix: copy the file into `.openclaw/media/outbound` before sending.
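The fix can be automated with a small staging helper. This is a sketch under assumptions: `stage_for_send` is a hypothetical name, and the caller supplies the allowed directory (for example the `.openclaw/media/outbound` folder under the user's profile).

```python
import shutil
from pathlib import Path


def stage_for_send(src, allowed_dir):
    """Return a path inside allowed_dir, copying the file there if needed."""
    src = Path(src).resolve()
    allowed = Path(allowed_dir).resolve()
    if allowed in src.parents:
        return src  # already inside the allowed folder, nothing to copy
    allowed.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, allowed / src.name))
```

Passing the returned path as `filePath` avoids the `path-not-allowed` rejection for files generated elsewhere.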
Script
Use `scripts/send_voice_reply.ps1` to generate Opus directly with defaults (`voice_name=juno`, `speed=1.2`).
It auto-selects `/speak_stream` for longer text (or when `-Stream` is passed) for better throughput.
For stable CUDA generation command patterns under stricter exec approval policies, use:
  `scripts/generate_cuda_voice.ps1 -Text "..."`
This keeps the outer command shape fixed so `allow-always` is more reusable.
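The endpoint auto-selection described for `send_voice_reply.ps1` amounts to a length check plus an override. The sketch below expresses that rule in Python for illustration; the 200-character threshold is a made-up value, since the script's real cutoff is not stated here.

```python
STREAM_THRESHOLD_CHARS = 200  # hypothetical cutoff, not taken from the script


def choose_endpoint(text, force_stream=False, threshold=STREAM_THRESHOLD_CHARS):
    """Pick /speak_stream for long text or when streaming is forced."""
    if force_stream or len(text) > threshold:
        return "/speak_stream"
    return "/speak"
```

Here `force_stream=True` plays the role of the `-Stream` switch.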
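- Make sure OpenClaw is installed (local or Docker deployment).
- Enter the install command in the chat box: `/install local-voice-reply`
- After installation completes, invoke the Skill by name or trigger it with `/local-voice-reply`.
- Provide the required inputs according to the Skill's parameter description to get structured output.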
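What is local-voice-reply?
Local OPUS/Ogg voice-reply pipeline for Feishu/Discord with structured voice customization. Default voice is Juno (`voice/juno_ref.wav`), with support for re... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 444 cumulative downloads to date.
How do I install local-voice-reply?
Run the command `/install local-voice-reply` in the OpenClaw or Claude Code chat box for one-step installation; no additional configuration is required.
Is local-voice-reply free?
Yes. local-voice-reply is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.
Which platforms does local-voice-reply support?
local-voice-reply is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who develops local-voice-reply?
It is developed and maintained by tenured-master-chef-607 (@tenured-master-chef-607); the current version is v3.3.3.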