Local Voice Agent
/install local-voice-agent
Voice Agent - OpenClaw Skill
Complete voice-to-voice AI assistant for hands-free operation.
Architecture
User Voice → Whisper STT → Text → OpenClaw AI → Text → Pocket-TTS → Voice Response
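The flow above can be sketched as three functions chained together. This is only an illustration: `run_pipeline` and the `fake_*` stages are hypothetical names, not part of the skill, and the stand-in stages do no real audio processing.

```python
from typing import Callable

# Each stage is a plain function so real backends can be swapped in later.
Transcribe = Callable[[bytes], str]   # audio -> text (Whisper STT stage)
Respond = Callable[[str], str]        # text -> text (OpenClaw AI stage)
Synthesize = Callable[[str], bytes]   # text -> audio (Pocket-TTS stage)


def run_pipeline(audio: bytes, stt: Transcribe, ai: Respond, tts: Synthesize) -> bytes:
    """Pass one utterance through STT, the assistant, and TTS in order."""
    text_in = stt(audio)
    text_out = ai(text_in)
    return tts(text_out)


# Stand-in stages for illustration only (hypothetical, no real audio work).
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_ai(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")
```

Keeping the stages as interchangeable callables means each one can be tested or replaced without touching the other two.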
Prerequisites
1. Whisper.cpp (Speech-to-Text)
# Clone and build
git clone https://github.com/ggerganov/whisper.cpp ~/.local/whisper.cpp
cd ~/.local/whisper.cpp
make -j4
# Download tiny model (fast, low-resource)
bash ./models/download-ggml-model.sh tiny
Test:
./build/bin/whisper-cli -m models/ggml-tiny.bin -f samples/jfk.wav
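To call the same test command from a script, something like the following may work. It is a sketch that assumes the clone location used above; `whisper_cmd` and `transcribe` are hypothetical helper names, and only the `whisper-cli` options `-m`, `-f`, `-l`, `-t`, and `-nt` are used.

```python
import subprocess
from pathlib import Path

# Assumed clone location, taken from the build steps above.
WHISPER_DIR = Path.home() / ".local" / "whisper.cpp"


def whisper_cmd(wav: str, model: str = "tiny", language: str = "en", threads: int = 4) -> list[str]:
    """Build the whisper-cli argument list for one WAV file."""
    return [
        str(WHISPER_DIR / "build" / "bin" / "whisper-cli"),
        "-m", str(WHISPER_DIR / "models" / f"ggml-{model}.bin"),
        "-f", wav,
        "-l", language,
        "-t", str(threads),
        "-nt",  # print the text without timestamps
    ]


def transcribe(wav: str) -> str:
    """Run whisper-cli and return whatever it prints on stdout."""
    result = subprocess.run(whisper_cmd(wav), capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Calling `transcribe("samples/jfk.wav")` from inside the whisper.cpp directory should print the transcript of the bundled sample, provided the build and model download succeeded.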
2. Pocket-TTS (Text-to-Speech)
Option A: Use existing server
export POCKET_TTS_URL="http://localhost:5000"
Option B: Install locally
# Clone your Pocket-TTS server
cd /path/to/pockettts
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python3 -m app.main --host 0.0.0.0 --port 5000
3. FFmpeg (Audio Conversion)
sudo apt-get install -y ffmpeg
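FFmpeg is used here to turn arbitrary input into the 16 kHz mono WAV that Whisper.cpp reads. A minimal sketch of that conversion step follows; `ffmpeg_to_wav_cmd` and `ensure_wav` are hypothetical helpers, and the extension check is a simplification (a `.wav` file at another sample rate would still need converting).

```python
import subprocess
from pathlib import Path


def ffmpeg_to_wav_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that writes 16 kHz, mono, 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",       # overwrite dst if it already exists
        "-i", src,
        "-vn",                # ignore any video stream (e.g. .mp4 input)
        "-ar", "16000",       # sample rate
        "-ac", "1",           # mono
        "-c:a", "pcm_s16le",  # 16-bit PCM
        dst,
    ]


def ensure_wav(src: str) -> str:
    """Return a WAV path for src, converting with ffmpeg when needed."""
    path = Path(src)
    if path.suffix.lower() == ".wav":
        return src  # assumed to be usable as-is
    dst = str(path.with_suffix(".wav"))
    subprocess.run(ffmpeg_to_wav_cmd(src, dst), check=True)
    return dst
```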
Quick Start
Voice Command (One-shot)
# Record → Transcribe → Process → Speak
./bin/voice-agent "What's the weather today?"
Interactive Mode
# Continuous voice conversation
./bin/voice-agent --interactive
Voice File Processing
# Transcribe existing audio file
./bin/voice-to-text recording.wav
# Generate voice from text
./bin/text-to-voice "Hello world!" output.wav
Configuration
Edit config/voices.yaml:
# Default voices
stt:
  model: tiny        # tiny, small, medium (larger = more accurate, slower)
  language: en       # en, ne, hi, etc.
tts:
  url: http://localhost:5000
  voice: peter voice # Your custom voice
  format: wav        # wav, mp3
# Performance
performance:
  threads: 4         # CPU threads for Whisper
  realtime: true     # Faster-than-realtime processing
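One way to treat these settings in code is to merge user overrides onto defaults and reject values outside the documented options. The sketch below works on plain dictionaries; reading the YAML file itself would need a YAML library (for example PyYAML's `yaml.safe_load`), which is left out so the example stays standard-library only. `merge_config` and the constant names are hypothetical.

```python
from copy import deepcopy

# Defaults mirror the values shown in config/voices.yaml above.
DEFAULTS = {
    "stt": {"model": "tiny", "language": "en"},
    "tts": {"url": "http://localhost:5000", "voice": "peter voice", "format": "wav"},
    "performance": {"threads": 4, "realtime": True},
}
ALLOWED_MODELS = {"tiny", "small", "medium"}
ALLOWED_FORMATS = {"wav", "mp3"}


def merge_config(overrides: dict) -> dict:
    """Return DEFAULTS updated with overrides, validating documented options."""
    config = deepcopy(DEFAULTS)
    for section, values in overrides.items():
        if section not in config:
            raise ValueError(f"unknown section: {section}")
        config[section].update(values)
    if config["stt"]["model"] not in ALLOWED_MODELS:
        raise ValueError(f"unsupported STT model: {config['stt']['model']}")
    if config["tts"]["format"] not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported TTS format: {config['tts']['format']}")
    if config["performance"]["threads"] < 1:
        raise ValueError("threads must be at least 1")
    return config
```

For example, `merge_config({"stt": {"model": "small"}})` keeps every other default and only switches the Whisper model.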
API Endpoints
POST /v1/voice/command
Voice command processing:
curl -X POST "http://localhost:5000/v1/voice/command" \
-F "[email protected]" \
-F "action=openclaw"
Response:
{
  "transcription": "What's the weather today?",
  "response_text": "The weather in Kathmandu is partly cloudy, 22 degrees Celsius.",
  "audio_response": "/tmp/response.wav"
}
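A client might parse this response into a typed object before using it. The sketch below assumes the three fields shown in the sample are always present; `VoiceCommandResult` and `parse_voice_response` are hypothetical names.

```python
import json
from dataclasses import dataclass

REQUIRED_FIELDS = ("transcription", "response_text", "audio_response")


@dataclass
class VoiceCommandResult:
    transcription: str   # what the user said
    response_text: str   # the assistant's reply as text
    audio_response: str  # path to the synthesized reply


def parse_voice_response(body: str) -> VoiceCommandResult:
    """Decode the JSON body and fail loudly if a documented field is absent."""
    data = json.loads(body)
    missing = [key for key in REQUIRED_FIELDS if key not in data]
    if missing:
        raise ValueError(f"response is missing fields: {missing}")
    return VoiceCommandResult(*(data[key] for key in REQUIRED_FIELDS))
```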
GET /v1/voices
List available TTS voices:
curl http://localhost:5000/v1/voices
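The same request can be made from Python with the standard library. This sketch reads the base URL from `POCKET_TTS_URL` and falls back to the default shown in this document; the helper names are hypothetical, and the shape of the returned JSON is not specified here, so it is returned as-is.

```python
import json
import os
from typing import Mapping, Optional
from urllib.request import urlopen

DEFAULT_TTS_URL = "http://localhost:5000"


def voices_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Build the /v1/voices URL from POCKET_TTS_URL or the default."""
    env = os.environ if env is None else env
    base = env.get("POCKET_TTS_URL", DEFAULT_TTS_URL).rstrip("/")
    return f"{base}/v1/voices"


def list_voices(timeout: float = 5.0):
    """Fetch the voice list; requires the TTS server to be running."""
    with urlopen(voices_url(), timeout=timeout) as response:
        return json.load(response)
```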
Use Cases
1. Daily Briefings (Voice)
./bin/voice-agent "Give me my morning briefing"
2. Voice Notes
./bin/voice-agent "Remind me to call Peter at 3 PM"
3. Hands-Free Coding
./bin/voice-agent "Show me the status of my git repository"
4. Accessibility
Suited to users who prefer voice interaction or who have mobility constraints.
Scripts
bin/voice-to-text
Convert speech to text:
./bin/voice-to-text input.wav
./bin/voice-to-text input.ogg # Auto-converts with ffmpeg
./bin/voice-to-text input.mp4 # Extracts audio from video
bin/text-to-voice
Convert text to speech:
./bin/text-to-voice "Hello world!" output.wav
./bin/text-to-voice --voice "usha lama" "Namaste!" greeting.wav
bin/voice-agent
Full voice pipeline:
./bin/voice-agent "What time is it?"
./bin/voice-agent --interactive # Conversation mode
./bin/voice-agent --file recording.wav # Process file
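The three invocations above imply a small command-line interface. How the real script parses its arguments is not shown here, so the following is only a sketch of one possible layout using `argparse`; `build_parser` and `select_mode` are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented usage: text, --interactive, or --file."""
    parser = argparse.ArgumentParser(prog="voice-agent")
    parser.add_argument("text", nargs="?", help="one-shot command text")
    parser.add_argument("--interactive", action="store_true", help="conversation mode")
    parser.add_argument("--file", metavar="AUDIO", help="process an existing recording")
    return parser


def select_mode(argv: list[str]) -> str:
    """Decide which mode the given arguments ask for."""
    args = build_parser().parse_args(argv)
    if args.interactive:
        return "interactive"
    if args.file:
        return "file"
    if args.text:
        return "one-shot"
    raise SystemExit("nothing to do: pass text, --interactive, or --file")
```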
Troubleshooting
Whisper.cpp Errors
"failed to read audio file"
- Convert to WAV first:
ffmpeg -i input.ogg -ar 16000 -ac 1 output.wav
"model not found"
- Download model:
bash models/download-ggml-model.sh tiny
Pocket-TTS Errors
"Connection refused"
- Start TTS server:
python3 -m app.main
- Check URL:
export POCKET_TTS_URL="http://localhost:5000"
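Before digging further into a "Connection refused" error, it can help to confirm whether anything is listening at the configured address. The sketch below only tests that a TCP connection can be opened; it says nothing about whether the service behind the port is actually Pocket-TTS. `is_reachable` is a hypothetical helper.

```python
import socket
from urllib.parse import urlparse


def is_reachable(url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    host = parsed.hostname or "localhost"
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False
```

For example, `is_reachable("http://localhost:5000")` returning `False` suggests the server is not running or is bound to a different port.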
"Voice not found"
- List voices:
curl http://localhost:5000/v1/voices
- Clone custom voice if needed
Performance Issues
Slow transcription
- Use smaller model:
tiny instead of small
- Reduce audio sample rate:
ffmpeg -i input.wav -ar 16000 output.wav
Slow TTS
- Use shorter text
- Generate in background
Examples
See examples/ directory for:
- morning-briefing.sh - Automated voice briefing
- voice-reminder.sh - Voice-based reminders
- conversation-mode.sh - Interactive voice chat
Performance
| Model | RAM | Speed (1 min audio) | Accuracy |
|---|---|---|---|
| tiny | 500MB | ~30 sec | ~90% |
| small | 1GB | ~60 sec | ~95% |
| medium | 2GB | ~120 sec | ~98% |
Recommendation: Start with tiny, upgrade to small if needed.
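The table can also be read as a real-time factor (processing time divided by audio duration): tiny is about 0.5, small about 1.0, medium about 2.0. The sketch below uses those approximate figures to pick the largest model that fits a RAM budget and a speed limit; the numbers are copied from the table (with 1GB and 2GB rounded to 1000 and 2000 MB), and the function names are hypothetical.

```python
from typing import Optional

# (name, approx. RAM in MB, approx. seconds to transcribe 60 s of audio)
MODELS = [
    ("tiny", 500, 30),
    ("small", 1000, 60),
    ("medium", 2000, 120),
]


def realtime_factor(seconds_per_minute: float) -> float:
    """Processing time per second of audio; below 1.0 is faster than real time."""
    return seconds_per_minute / 60.0


def largest_model_within(ram_mb: int, max_rtf: float = 1.0) -> Optional[str]:
    """Pick the largest listed model that fits the RAM and speed limits."""
    choice = None
    for name, ram, seconds in MODELS:  # ordered smallest to largest
        if ram <= ram_mb and realtime_factor(seconds) <= max_rtf:
            choice = name
    return choice
```

With the default limit, a machine with 1500 MB free would get `small`, which matches the recommendation to move beyond `tiny` only when resources allow.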
License
MIT License - See LICENSE file
Credits
- Whisper.cpp by Georgi Gerganov (ggerganov/whisper.cpp)
- Pocket-TTS by Kyutai Labs (kyutai-labs/pocket-tts)
- OpenClaw by OpenClaw Team (openclaw/openclaw)
Support
- GitHub Issues: [Your Repo Link]
- OpenClaw Discord: https://discord.com/invite/clawd
- Documentation: [Your Docs Link]
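- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box:
/install local-voice-agent
- Once installation finishes, invoke the Skill by name or trigger it with /local-voice-agent.
- Provide the inputs described in the Skill's parameter documentation to get structured output.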
What is Local Voice Agent?
Complete offline voice-to-voice AI assistant for OpenClaw (Whisper.cpp STT + Pocket-TTS). 100% local processing, no cloud APIs, no costs. Use for hands-free... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 144 downloads to date.
How do I install Local Voice Agent?
Run the command "/install local-voice-agent" in the OpenClaw or Claude Code chat box to install it in one step, with no extra configuration.
Is Local Voice Agent free?
Yes. Local Voice Agent is completely free, is released under the MIT-0 license, and can be freely downloaded, installed, and used.
Which platforms does Local Voice Agent support?
Local Voice Agent is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who develops Local Voice Agent?
It is developed and maintained by Pinological (@pinological). The current version is v1.0.2.