Edge TTS Voice System
/install edge-tts-voice-system
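A privacy-focused voice system for OpenClaw. Speech recognition runs locally with faster-whisper, and outbound replies are synthesized with Edge TTS and cached on disk. The edge-tts library calls Microsoft's online speech service, so generating a reply that is not already cached needs a network connection and sends the reply text to that service.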
Features
- Outbound replies: Edge TTS with cached audio output
- Accurate STT: faster-whisper base model for speech recognition
- Local STT: transcription runs on your machine, so no audio is uploaded for recognition
- Privacy-focused: inbound audio is processed locally; only the reply text goes to the Edge TTS service
- Easy integration: Ready-to-use Python and bash scripts
- Voice conversations: Natural back-and-forth voice interactions
Quick Start
Installation
# Install the skill
clawhub install lessac_offline_voice_system
# Or manually from this directory
./scripts/install.sh
Basic Usage
from scripts.voice_handler import VoiceHandler
handler = VoiceHandler()
# Transcribe audio to text
text = handler.audio_to_text("voice_message.ogg")
print(f"You said: {text}")
# Generate voice response
audio_file = handler.text_to_audio("Hello, this is a voice response.")
Command Line
# Transcribe audio
./scripts/voice_integration.sh transcribe voice_message.ogg
# Generate TTS
./scripts/voice_integration.sh tts "Hello world" output.wav
# Full voice processing
./scripts/voice_integration.sh process voice_message.ogg
Components
1. Text-to-Speech (TTS)
- Voice: Edge-supported voice (default en-IE-ConnorNeural)
- Library: Edge TTS (edge-tts)
- Quality: Natural speech with cached output
- Sample rate: provider-defined
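As a rough illustration of cached Edge TTS output, the sketch below hashes the voice and text into a file name and only calls edge-tts on a cache miss. The function names and the `tts_cache` directory are assumptions made for this example and are not taken from the skill's own wrapper.

```python
import hashlib
from pathlib import Path

DEFAULT_VOICE = "en-IE-ConnorNeural"  # default voice named in this document


def cache_path(text: str, voice: str, cache_dir: Path) -> Path:
    """Return a deterministic file path for a (voice, text) pair (hypothetical helper)."""
    digest = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.mp3"  # edge-tts writes MP3 audio by default


async def synthesize(text: str, voice: str = DEFAULT_VOICE,
                     cache_dir: Path = Path("tts_cache")) -> Path:
    """Synthesize text with edge-tts, reusing a cached file when one exists."""
    target = cache_path(text, voice, cache_dir)
    if target.exists():
        return target  # cache hit: skip the call to the speech service
    import edge_tts  # imported lazily so cache_path works without the package
    cache_dir.mkdir(parents=True, exist_ok=True)
    await edge_tts.Communicate(text, voice).save(str(target))
    return target

# Usage (requires edge-tts to be installed):
#   import asyncio
#   print(asyncio.run(synthesize("Hello, this is a voice response.")))
```

Keying the cache on both voice and text means that switching voices never returns audio recorded in the old voice.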
2. Speech-to-Text (STT)
- Model: faster-whisper base
- Accuracy: High, comparable to cloud services
- Languages: Multi-language support (auto-detected)
- Speed: ~2 seconds for typical audio
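A minimal sketch of transcription with faster-whisper, assuming CPU inference with int8 quantization (the skill's actual device and compute settings are not stated here). `join_segments` and `transcribe` are illustrative names, not the skill's API.

```python
from typing import Iterable, Tuple


def join_segments(segments: Iterable) -> str:
    """Concatenate segment texts into one string, skipping empty segments."""
    return " ".join(seg.text.strip() for seg in segments if seg.text.strip())


def transcribe(path: str, model_size: str = "base") -> Tuple[str, str]:
    """Return (text, detected_language) for an audio file."""
    from faster_whisper import WhisperModel  # lazy import: heavy dependency
    # device and compute_type are assumptions chosen for modest hardware
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(path)  # segments is a lazy generator
    return join_segments(segments), info.language
```

Because `segments` is a generator, the actual decoding happens while `join_segments` iterates over it, not when `model.transcribe` returns.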
3. Audio Processing
- Formats: OGG/Opus, WAV, MP3 (via ffmpeg)
- Conversion: Automatic format handling
- Quality: 16kHz mono for optimal recognition
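The conversion to 16kHz mono could look roughly like the following, which shells out to ffmpeg with its standard `-ar` (sample rate) and `-ac` (channel count) options. The helper names are hypothetical, and the skill's own scripts may invoke ffmpeg differently.

```python
import subprocess
from typing import List


def ffmpeg_to_wav_cmd(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command that resamples any input to 16 kHz mono."""
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst]


def convert_to_wav(src: str, dst: str) -> str:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_to_wav_cmd(src, dst), check=True, capture_output=True)
    return dst

# Usage (requires ffmpeg on PATH):
#   convert_to_wav("voice_message.ogg", "voice_message.wav")
```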
Performance
- TTS Load time: ~2 seconds (one-time)
- TTS Generation: ~3-4 seconds
- STT Transcription: ~2 seconds
- Total response time: 5-7 seconds
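These timings depend on hardware and audio length, so it may be worth measuring each stage yourself. The `timed` context manager below is a hypothetical helper for that purpose and is not part of the skill.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, results: dict):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

# Usage with the handler shown in Basic Usage:
#   timings = {}
#   with timed("stt", timings):
#       text = handler.audio_to_text("voice_message.ogg")
#   with timed("tts", timings):
#       handler.text_to_audio(text)
#   print(timings)
```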
Integration with OpenClaw
Automatic Voice Processing
When installed, the skill can be configured to automatically:
- Detect incoming voice messages
- Transcribe them silently
- Generate AI responses
- Convert responses to voice
- Send voice replies back
OpenClaw reply TTS configuration
The built-in OpenClaw reply TTS path is not the local voice pipeline used by this skill.
This skill now uses a local Edge TTS reply path instead, with cached output
stored under /root/.openclaw/tts/cache.
Default outbound voice:
en-IE-ConnorNeural
Relevant files:
- tts_edge_wrapper.py
- voice_handler.py
- voice_integration.sh
- scripts/install.sh
If you need to change the voice, set:
export OPENCLAW_EDGE_TTS_VOICE="en-IE-ConnorNeural"
or replace it with another Edge-supported voice.
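A sketch of how a wrapper might resolve the voice from the environment and look up alternatives. `resolve_voice` and `voices_for_locale` are assumed names; `edge_tts.list_voices()` is the library's voice listing call and needs network access.

```python
import os
from typing import List, Mapping, Optional

DEFAULT_VOICE = "en-IE-ConnorNeural"  # default named in this document


def resolve_voice(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OPENCLAW_EDGE_TTS_VOICE if set and non-blank, else the default."""
    env = os.environ if env is None else env
    return env.get("OPENCLAW_EDGE_TTS_VOICE", "").strip() or DEFAULT_VOICE


async def voices_for_locale(locale: str) -> List[str]:
    """List Edge voice short names for a locale such as 'en-IE'."""
    import edge_tts  # lazy import so resolve_voice works without the package
    voices = await edge_tts.list_voices()
    return sorted(v["ShortName"] for v in voices if v["Locale"] == locale)

# Usage:
#   import asyncio
#   print(resolve_voice())
#   print(asyncio.run(voices_for_locale("en-IE")))
```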
Reinstall after OpenClaw updates
After an OpenClaw system update, rerun the installer to restore the voice stack:
cd /root/.openclaw/workspace/skills/lessac_offline_voice_system
./scripts/install.sh
This refreshes:
- the Python venv dependencies (faster-whisper, edge-tts, soundfile)
- the runtime cache directory
- the local voice wrappers
- the config file under /root/.openclaw/tts/config.json
Manual Integration
# In your OpenClaw agent or custom script
import sys
sys.path.append("/path/to/skill/scripts")

from voice_handler import VoiceHandler


class YourAgent:
    def __init__(self):
        self.voice = VoiceHandler()

    def handle_voice_message(self, audio_file):
        # Transcribe
        text = self.voice.audio_to_text(audio_file)
        # Generate response (your AI logic here)
        response = self.generate_response(text)
        # Convert to voice
        voice_response = self.voice.text_to_audio(response)
        return voice_response
Configuration
Voice Model Selection
The skill uses Edge TTS by default. To use a different voice:
- Set OPENCLAW_EDGE_TTS_VOICE to a supported Edge voice
- Re-run the installer to refresh the cache and wrappers
STT Model Selection
Change the faster-whisper model size in scripts/voice_handler.py:
- "tiny": Fastest, lower accuracy
- "base": Default, good balance
- "small": Higher accuracy, slower
- "medium": Best accuracy, slowest
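If the model size comes from user input or configuration, a small guard can keep it within the sizes listed above. `pick_model_size` is a hypothetical helper and does not reflect how scripts/voice_handler.py is actually structured.

```python
from typing import Optional

MODEL_SIZES = ("tiny", "base", "small", "medium")  # sizes listed in this section


def pick_model_size(requested: Optional[str], default: str = "base") -> str:
    """Return the requested size if it is one of MODEL_SIZES, else the default."""
    size = (requested or "").strip().lower()
    return size if size in MODEL_SIZES else default

# Usage (requires faster-whisper):
#   from faster_whisper import WhisperModel
#   model = WhisperModel(pick_model_size("small"))
```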
Troubleshooting
Common Issues
- "No module named 'edge_tts'"
  pip install edge-tts
- "ffmpeg not found"
  sudo apt-get install ffmpeg
- Out of memory with large models
  - Use the "tiny" or "base" STT model
  - Use a different Edge voice if needed
- Slow TTS generation
  - First generation loads model (~2s)
  - Subsequent generations are faster (~0.3s per sentence)
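The first two issues can be checked up front with a short script. The sketch below assumes the import names `faster_whisper`, `edge_tts` and `soundfile` for the packages the installer refreshes; `missing_prerequisites` is an illustrative name.

```python
import importlib.util
import shutil
from typing import List, Sequence


def missing_prerequisites(
    binaries: Sequence[str] = ("ffmpeg",),
    modules: Sequence[str] = ("faster_whisper", "edge_tts", "soundfile"),
) -> List[str]:
    """Return labels for every required binary or module that cannot be found."""
    missing = [f"binary:{b}" for b in binaries if shutil.which(b) is None]
    missing += [f"module:{m}" for m in modules
                if importlib.util.find_spec(m) is None]
    return missing

# Usage:
#   problems = missing_prerequisites()
#   print("all prerequisites found" if not problems else problems)
```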
Debug Mode
Enable debug output:
export VOICE_DEBUG=1
./scripts/voice_integration.sh process audio.ogg
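How the scripts interpret VOICE_DEBUG is not documented here. One plausible approach in a Python handler, shown purely as an assumption, is to map the variable onto the logging level:

```python
import logging
import os
from typing import Mapping, Optional


def configure_logging(env: Optional[Mapping[str, str]] = None) -> int:
    """Enable DEBUG logging when VOICE_DEBUG is set to anything but '' or '0'."""
    env = os.environ if env is None else env
    debug = env.get("VOICE_DEBUG", "") not in ("", "0")
    level = logging.DEBUG if debug else logging.INFO
    logging.basicConfig(level=level,
                        format="%(levelname)s %(name)s: %(message)s")
    return level
```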
Files
- scripts/install.sh - Installation script
- scripts/voice_handler.py - Main Python handler
- scripts/piper_tts.py - Edge TTS wrapper
- scripts/voice_integration.sh - Bash interface
- references/voice_models.md - Voice model information
- assets/ - Voice model files (downloaded during install)
Dependencies
- Python 3.8+
- ffmpeg
- Python packages (installed automatically):
  - faster-whisper
  - edge-tts
  - soundfile
License
Open source. See included LICENSE file.
Support
For issues or questions:
- Check the troubleshooting section
- Review the references/ directory
- Open an issue on the skill repository
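- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: /install edge-tts-voice-system
- Once installed, invoke the Skill by name or trigger it with /edge-tts-voice-system
- Provide the inputs described in the Skill's parameter notes to get structured output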
What is Edge TTS Voice System?
Local voice system for OpenClaw using faster-whisper for inbound transcription and Edge TTS for outbound replies. Use when you need private voice workflows,... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 100 downloads to date.
How do I install Edge TTS Voice System?
Run the command "/install edge-tts-voice-system" in the OpenClaw or Claude Code chat box. It installs in one step with no extra configuration.
Is Edge TTS Voice System free?
Yes. Edge TTS Voice System is free, is released under the MIT-0 license, and can be freely downloaded, installed and used.
Which platforms does Edge TTS Voice System support?
Edge TTS Voice System is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who develops Edge TTS Voice System?
It is developed and maintained by Stephen Redmond - Straitéis AI (@stephenredmond-straiteis). The current version is v2.1.0.