tenured-master-chef-607

local-voice-reply

Author: tenured-master-chef-607 · GitHub · v3.3.3 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 444
Favorites: 0
Current installs: 2
Versions: 7
Install in OpenClaw
/install local-voice-reply
Description
Local OPUS/Ogg voice-reply pipeline for Feishu/Discord with structured voice customization. Default voice is Juno (`voice/juno_ref.wav`), with support for re...
Usage instructions (SKILL.md)

Local Voice Reply

Use this skill to turn text into a cloned/custom-voice audio reply and deliver it reliably to Feishu or Discord.

Structured skill definition

  • Purpose: local low-latency voice replies in Opus/Ogg.
  • Channels: Feishu + Discord.
  • Default voice: juno (reference file: voice/juno_ref.wav).
  • Custom voice modes:
    1. File-based: replace/update voice/juno_ref.wav.
    2. Registry-based: upload/register voices via POST /voice/register, then call by voice_name.
  • Output: .opus (Ogg container) under .openclaw/media/outbound/voice-server-v3/ (or TARVIS_VOICE_OUTPUT_DIR).
  • Control scripts:
    • scripts/send_voice_reply.ps1 (server API path)
    • scripts/generate_cuda_voice.ps1 (stable local CUDA generation path)

Server implementation is kept with the skill (not workspace root):

  • server/voice_server_v3.py (FastAPI routes)
  • server/voice_engine.py (generation and cache engine)

Voice assets are also colocated with the skill:

  • voice/

Runtime requirements

  • ffmpeg must be installed and available on PATH (required for Opus encoding).
  • Python packages required by the server:
    • fastapi
    • uvicorn
    • python-multipart
    • chatterbox-tts
    • torch
    • torchaudio
    • numpy
  • On first startup, ChatterboxTTS.from_pretrained() may download model assets, so initial run can require network access and additional disk.
  • Optional env vars:
    • TARVIS_VOICE_OUTPUT_DIR to override where generated Opus files are written.
    • TARVIS_VOICE_DEVICE to force device selection (cuda/gpu, mps, or cpu).
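The requirements above can be checked before the server is started. The following is a minimal sketch, not part of the skill: `preflight` is a hypothetical helper, and the import names in `ASSUMED_MODULES` (in particular `multipart` for python-multipart and `chatterbox` for chatterbox-tts) are assumptions, since PyPI package names and import names can differ.

```python
import importlib.util
import shutil

# Assumed import names for the packages listed above; adjust if your
# installed versions expose different top-level module names.
ASSUMED_MODULES = (
    "fastapi", "uvicorn", "multipart", "chatterbox",
    "torch", "torchaudio", "numpy",
)


def preflight(modules=ASSUMED_MODULES, tool="ffmpeg"):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # shutil.which searches PATH the same way the shell would.
    if shutil.which(tool) is None:
        problems.append(f"{tool} not found on PATH")
    for name in modules:
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python module not importable: {name}")
    return problems
```

Calling `preflight()` and printing each returned line gives a quick readiness report without loading torch or the TTS model.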

Persistence behavior

  • Uploaded voice samples from POST /voice/register are persisted under server/voices/.
  • Cache and registry data are persisted under server/voice_cache/.
  • Generated Opus outputs are written under .openclaw/media/outbound/voice-server-v3/ by default (or TARVIS_VOICE_OUTPUT_DIR when set).
  • POST /output/cleanup only deletes staged .opus files inside the configured output directory and their .json sidecar files.
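The output-directory and cleanup rules above can be sketched as follows. This is an illustration under stated assumptions, not the server's code: both function names are hypothetical, the default path is assumed to be relative to the user's home directory, and the sidecar is assumed to share the file stem (`clip.opus` paired with `clip.json`).

```python
import os
from pathlib import Path


def resolve_output_dir(env=os.environ, home=None):
    """Use TARVIS_VOICE_OUTPUT_DIR when set, otherwise the default directory."""
    override = env.get("TARVIS_VOICE_OUTPUT_DIR")
    if override:
        return Path(override)
    # Assumption: the default lives under the user's home directory.
    base = Path(home) if home is not None else Path.home()
    return base / ".openclaw" / "media" / "outbound" / "voice-server-v3"


def cleanup_outputs(output_dir):
    """Delete .opus files directly inside output_dir and their .json sidecars."""
    root = Path(output_dir).resolve()
    removed = []
    for opus in sorted(root.glob("*.opus")):
        # Assumed sidecar naming: same stem with a .json suffix.
        for path in (opus, opus.with_suffix(".json")):
            target = path.resolve()
            try:
                # Skip anything that resolves outside the output directory.
                target.relative_to(root)
            except ValueError:
                continue
            if target.is_file():
                target.unlink()
                removed.append(target.name)
    return removed
```

Files with other extensions are never touched, which mirrors the statement that cleanup only removes staged `.opus` files and their sidecars.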

Use this workflow

  1. Ensure local v3.3 TTS server is running from this skill folder:
    • python -m uvicorn --app-dir server voice_server_v3:app --host 127.0.0.1 --port 8000
  2. Call /speak with text (and optional speed, exaggeration, cfg).
    • voice_name defaults to juno.
  3. Receive Opus directly from server (audio/ogg) in Juno voice.
  4. Save final media into allowed path:
    • C:\Users\hanli\.openclaw\media\outbound\ (path from the author's machine; substitute your own user profile)
  5. Send with message tool:
    • action=send
    • filePath=<allowed-path>
    • asVoice=true
    • For Feishu: channel=feishu
    • For Discord: channel=discord
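Steps 2 and 5 of the workflow can be sketched in Python. The request encoding is an assumption: the source names the parameters (`text`, `voice_name`, `speed`, `exaggeration`, `cfg`) but does not say whether /speak expects JSON or form fields, so a JSON body is used here for illustration. Both helper names are hypothetical.

```python
import json
import urllib.request

# Host and port taken from the uvicorn command in step 1.
BASE_URL = "http://127.0.0.1:8000"


def build_speak_request(text, voice_name="juno", speed=1.2,
                        exaggeration=None, cfg=None):
    """Build (but do not send) a POST /speak request.

    Assumption: the server accepts a JSON body with these field names.
    """
    payload = {"text": text, "voice_name": voice_name, "speed": speed}
    # Optional parameters are only sent when the caller sets them.
    if exaggeration is not None:
        payload["exaggeration"] = exaggeration
    if cfg is not None:
        payload["cfg"] = cfg
    return urllib.request.Request(
        f"{BASE_URL}/speak",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def message_tool_args(file_path, channel):
    """Collect the message-tool parameters listed in step 5."""
    if channel not in ("feishu", "discord"):
        raise ValueError(f"unsupported channel: {channel}")
    return {"action": "send", "filePath": file_path,
            "asVoice": True, "channel": channel}
```

To run the request against a live server, pass the result to `urllib.request.urlopen`, read the response body, and write it to a file under the allowed outbound path before building the message-tool arguments.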

Voice customization guide

A) Replace default Juno reference

  1. Replace voice/juno_ref.wav with your target reference voice sample.
  2. Keep sample clean (single speaker, low noise, clear pronunciation).
  3. Restart server and test with voice_name=juno.

B) Register additional named voices

  1. Call POST /voice/register with a reference sample and target voice_name.
  2. Confirm registration under server/voices/.
  3. Generate with that voice_name in /speak or /speak_stream.
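A registration call could look like the sketch below. It assumes a multipart/form-data upload (suggested by the python-multipart dependency, but not confirmed by the source) and uses the hypothetical form field names `voice_name` and `file`; check the route signature in server/voice_server_v3.py before relying on either.

```python
import urllib.request
import uuid


def build_register_request(voice_name, wav_bytes, filename="ref.wav",
                           base_url="http://127.0.0.1:8000"):
    """Build (but do not send) a multipart POST /voice/register request.

    Assumptions: multipart/form-data encoding and the field names
    "voice_name" and "file".
    """
    boundary = uuid.uuid4().hex
    # Text part with the voice name, then the header of the file part.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="voice_name"\r\n\r\n'
        f"{voice_name}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode("utf-8")
    # Closing boundary that terminates the multipart body.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/voice/register",
        data=head + wav_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

After sending the request with `urllib.request.urlopen`, step 2 above still applies: confirm that the sample appears under server/voices/.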

Defaults

  • voice_name: juno
  • speed: 1.2
  • Output format: Opus in Ogg container from server /speak (no post-conversion)
  • Discord compatibility: Ogg/Opus is supported and can be sent as voice/audio with asVoice=true
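Since the output format matters for both channels, a quick sanity check on a generated file can be useful. The sketch below is a heuristic, not a full parser: it relies on the Ogg page capture pattern `OggS` at offset 0 and on the `OpusHead` identification packet appearing in the first page. The function name is hypothetical.

```python
def looks_like_ogg_opus(data: bytes) -> bool:
    """Heuristically detect Opus audio in an Ogg container.

    Checks for the Ogg capture pattern at the start of the data and for
    the OpusHead identification header within the first 64 bytes.
    """
    return data[:4] == b"OggS" and b"OpusHead" in data[:64]
```

Reading the first 64 bytes of a file with `open(path, "rb").read(64)` is enough input for this check.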

Speed Improvements In This Version

  • Caches model capability lookups once at startup.
  • Uses torch.inference_mode() during synthesis to reduce overhead.
  • Reuses phrase cache for both /speak and /speak_stream.
  • Improves chunking behavior for long CJK text to avoid oversized chunks.
  • Keeps latency metrics for benchmarking and tuning.
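The CJK chunking point can be illustrated with a small sketch. This is not the engine's algorithm: the punctuation set, the 60-character limit, and the function name are all assumptions. It splits at sentence-ending punctuation, packs sentences into chunks, and hard-wraps any single sentence that exceeds the limit, which is the case that produces oversized chunks in sparsely punctuated CJK text.

```python
import re

# Zero-width split point after sentence-ending punctuation (CJK and Latin).
# The punctuation set is an assumption; note that "." also splits decimals.
_SENTENCE_END = re.compile(r"(?<=[。！？；.!?;])")


def chunk_text(text, max_chars=60):
    """Split text into chunks of at most max_chars characters."""
    chunks, current = [], ""
    for sentence in _SENTENCE_END.split(text):
        if not sentence:
            continue  # re.split leaves an empty string after a trailing mark
        # Hard-wrap a sentence that is longer than the limit on its own.
        pieces = [sentence[i:i + max_chars]
                  for i in range(0, len(sentence), max_chars)]
        for piece in pieces:
            # Start a new chunk when adding this piece would overflow.
            if current and len(current) + len(piece) > max_chars:
                chunks.append(current)
                current = ""
            current += piece
    if current:
        chunks.append(current)
    return chunks
```

Joining the returned chunks reproduces the input exactly, so nothing is dropped between synthesis calls.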

Common failure and fix

  • Error: LocalMediaAccessError ... path-not-allowed
  • Fix: copy the file into .openclaw/media/outbound before sending.
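The fix above can be automated with a small staging helper. This is a sketch with a hypothetical function name; it copies the file into the outbound directory only when it is not already inside it and returns the path to pass as filePath.

```python
import shutil
from pathlib import Path


def stage_for_send(src, outbound_dir):
    """Copy src into outbound_dir if needed and return the staged path."""
    src = Path(src).resolve()
    outbound = Path(outbound_dir).resolve()
    try:
        # relative_to succeeds only when src is already under outbound.
        src.relative_to(outbound)
        return src
    except ValueError:
        pass
    outbound.mkdir(parents=True, exist_ok=True)
    dest = outbound / src.name
    # copy2 also preserves file timestamps where the platform allows it.
    shutil.copy2(src, dest)
    return dest
```

A file with the same name in the outbound directory is overwritten, so unique output names are advisable when several replies are staged at once.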

Script

Use scripts/send_voice_reply.ps1 to generate Opus directly with defaults (voice_name=juno, speed=1.2). It auto-selects /speak_stream for longer text (or when -Stream is passed) for better throughput.
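The endpoint selection described above amounts to a simple rule. The sketch below is illustrative only: the function name is hypothetical and the 200-character threshold is a placeholder, since the script's actual cutoff is not documented here.

```python
def pick_endpoint(text, stream=False, stream_threshold=200):
    """Choose between /speak and /speak_stream.

    Streaming is used when explicitly requested or when the text is longer
    than stream_threshold characters (a placeholder value).
    """
    if stream or len(text) > stream_threshold:
        return "/speak_stream"
    return "/speak"
```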

For stable CUDA generation command patterns under stricter exec approval policies, use:

  • scripts/generate_cuda_voice.ps1 -Text "..."

This keeps the outer command shape fixed, so an allow-always approval is more reusable.
Security recommendations
This skill appears to be what it claims: a local TTS server that produces Opus/Ogg outputs. Before installing, be aware: (1) you must have ffmpeg on PATH and install heavyweight Python deps (torch, torchaudio, chatterbox-tts) — initial startup may download large model files and use significant disk and GPU/CPU resources; (2) uploaded voice samples and generated audio are persisted locally under the skill's folders and by default in ~/.openclaw/media/outbound — only register voice samples you trust; (3) SKILL.md mentions helper scripts that are not present in the bundle—confirm whether those scripts are provided separately or replaced by your own invocation; (4) the service can read files referenced by its manifest if that file is edited, so avoid placing sensitive files under the skill's voice/manifest paths. If you need network isolation, prevent ChatterboxTTS.from_pretrained() from downloading by pre-providing model artifacts or blocking outbound network during startup.
Functional analysis
Type: OpenClaw Skill
Name: local-voice-reply
Version: 3.3.3

The skill bundle provides a legitimate local text-to-speech (TTS) pipeline using FastAPI, ChatterboxTTS, and ffmpeg. The code in 'server/voice_engine.py' and 'server/voice_server_v3.py' is well-structured and includes security best practices such as path traversal guards (using .relative_to() checks) and safe subprocess execution (using argument lists instead of shell strings). While 'SKILL.md' contains a hardcoded local path ('C:\Users\hanli\...') in its instructions to the agent, this appears to be a developer artifact rather than a malicious injection. The high-risk capabilities (file system access and subprocess execution) are strictly aligned with the stated purpose of generating and managing audio files.
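To illustrate what "argument lists instead of shell strings" means for the ffmpeg step, here is a minimal sketch. It is not the skill's code: the function names, the 32k bitrate, and the 60-second timeout are assumptions. The ffmpeg options used (`-y`, `-i`, `-c:a libopus`, `-b:a`, `-f ogg`) are standard.

```python
import subprocess


def build_ffmpeg_opus_cmd(wav_path, opus_path, bitrate="32k"):
    """Return the argument list for encoding WAV to Opus in an Ogg container."""
    return [
        "ffmpeg", "-y",            # overwrite the output file without prompting
        "-i", str(wav_path),       # input file
        "-c:a", "libopus",         # Opus audio encoder
        "-b:a", bitrate,           # target audio bitrate (assumed value)
        "-f", "ogg",               # force the Ogg container
        str(opus_path),
    ]


def encode_opus(wav_path, opus_path, timeout_sec=60):
    """Run ffmpeg without a shell so paths are never shell-interpreted."""
    subprocess.run(
        build_ffmpeg_opus_cmd(wav_path, opus_path),
        check=True,            # raise CalledProcessError on a non-zero exit
        timeout=timeout_sec,   # assumed timeout; raises TimeoutExpired
        capture_output=True,
    )
```

Because each path is a single list element, a filename containing spaces or shell metacharacters is passed to ffmpeg verbatim rather than being parsed by a shell.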
Capability assessment
Purpose & Capability
Name/description (local OPUS/Ogg voice replies for Feishu/Discord) aligns with included FastAPI server and TTS engine. Required tools (ffmpeg, Python libraries including torch/torchaudio/chatterbox-tts) are proportional to the stated functionality.
Instruction Scope
SKILL.md instructs running the local uvicorn server and calling /speak endpoints, saving outputs under .openclaw/media/outbound; those instructions are consistent with code. One small mismatch: SKILL.md references control scripts (scripts/send_voice_reply.ps1 and scripts/generate_cuda_voice.ps1) that are not present in the file manifest—this may be an omission or packaging error. The skill persists uploaded voices and cache data under its own server folders and writes outputs into the user's .openclaw media dir (or TARVIS_VOICE_OUTPUT_DIR).
Install Mechanism
No install spec (instruction-only) and the server code is bundled with the skill, so install risk is low. However, the runtime requires large Python packages (torch/torchaudio/chatterbox-tts) and ffmpeg; ChatterboxTTS.from_pretrained() may download model artifacts over the network on first run, which is expected but can be large.
Credentials
No required credentials or secret env vars. Optional env vars (TARVIS_VOICE_OUTPUT_DIR, TARVIS_VOICE_DEVICE, TARVIS_VOICE_FFMPEG_TIMEOUT_SEC, TARVIS_VOICE_PHRASE_RAM_CACHE_ITEMS) are relevant to operation and proportionate. The code reads only these environment variables (plus standard log-level).
Persistence & Privilege
The skill persists uploaded voice samples under server/voices/, cache and registry data under server/voice_cache/, and writes generated .opus files to the configured output directory (default: ~/.openclaw/media/outbound/voice-server-v3). It does not request always:true or global privileges; persistence is limited to its own directories and the configured output path.
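How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install local-voice-reply
  3. Once installation finishes, invoke the skill by name or trigger it with /local-voice-reply.
  4. Provide the inputs described in the skill's parameter documentation to get structured output.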
Version history
v3.3.3
docs: structured skill definition + explicit voice customization guide (juno_ref replacement and /voice/register flow)
v3.3.2
docs: rewrite skill description; clarify local low-latency voice customization via juno_ref/registered voices
v1.0.2
docs: clarify juno_ref.wav voice customization support in skill description
v3.3.1
Add MIT LICENSE and keep v3.3 voice-server structure.
v3.3.0
v3.3: split voice engine module, improved caching + inference mode, added CUDA generation script, and refreshed docs.
v1.0.1
  • Updated skill description for clarity and user focus.
  • Clarified that OPUS files are generated to match Feishu audio requirements.
  • No functional or workflow changes; documentation improvements only.
v1.0.0
Summary: Initial release with an optimized workflow for generating and sending cloned-voice Feishu audio replies using a local TTS server.
  • Introduces local Chatterbox TTS server integration with direct Opus output and structured FastAPI endpoints.
  • Implements efficient caching for model lookups and repeated phrase synthesis, reducing response times.
  • Enhances handling of long CJK text via improved chunking and stream support.
  • Provides a PowerShell script for automatic voice reply generation and selection of streaming for long texts.
  • Voice files are stored in the skill directory; correct media paths are enforced for Feishu compatibility.
Metadata
Slug local-voice-reply
Version 3.3.3
License MIT-0
Total installs 2
Current installs 2
Versions 7
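FAQ

What is local-voice-reply?

Local OPUS/Ogg voice-reply pipeline for Feishu/Discord with structured voice customization. Default voice is Juno (`voice/juno_ref.wav`), with support for re... It is an AI agent skill plugin for Claude Code / OpenClaw, with 444 total downloads to date.

How do I install local-voice-reply?

Run the command "/install local-voice-reply" in the OpenClaw or Claude Code chat box to install the skill in one step. The runtime requirements listed in SKILL.md (ffmpeg on PATH and the Python packages) must still be in place before the server can run.

Is local-voice-reply free?

Yes. local-voice-reply is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does local-voice-reply support?

local-voice-reply is listed as cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops local-voice-reply?

It is developed and maintained by tenured-master-chef-607 (@tenured-master-chef-607); the current version is v3.3.3.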
