Gemini Live Phone Bridge
Real-time voice AI over phone calls using Google Gemini's native audio capabilities.
Architecture
Phone ↔ Twilio ↔ WebSocket (μ-law 8kHz) ↔ Bridge (PCM transcoding) ↔ Gemini Live API (24kHz PCM)
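The first transcoding hop in the pipeline above, μ-law to linear PCM, can be sketched in pure Python. This is a minimal illustration of standard G.711 μ-law decoding, not the code in scripts/bridge.py; the function names are hypothetical, and resampling from 8kHz to the rate Gemini expects is a separate step not shown here.

```python
import struct


def ulaw_byte_to_pcm16(u: int) -> int:
    """Decode one G.711 mu-law byte into a signed 16-bit linear sample."""
    u = ~u & 0xFF                      # mu-law bytes are transmitted bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


# Precompute all 256 values once; decoding a frame is then a table lookup.
_ULAW_TABLE = [ulaw_byte_to_pcm16(b) for b in range(256)]


def ulaw_to_pcm16(frame: bytes) -> bytes:
    """Convert a mu-law frame to little-endian 16-bit PCM bytes."""
    samples = [_ULAW_TABLE[b] for b in frame]
    return struct.pack(f"<{len(samples)}h", *samples)
```

A 20ms frame at 8kHz is 160 μ-law bytes and decodes to 320 bytes of 16-bit PCM.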
Quick Start
# Set required env vars
export GOOGLE_API_KEY="your-key"
export TWILIO_AUTH_TOKEN="your-token"
# Run the bridge
python scripts/bridge.py --port 3335
Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /gemini-live/status | GET | Health check + active calls |
| /gemini-live/incoming | POST | TwiML for inbound calls (Twilio webhook) |
| /gemini-live/stream | WS | Twilio Media Stream WebSocket |
| /gemini-live/call | POST | Initiate outbound call |
| /gemini-live/twiml | POST | TwiML for outbound calls |
| /gemini-live/call-status | POST | Twilio call status webhook |
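On the /gemini-live/stream WebSocket, Twilio Media Streams delivers JSON text messages; audio arrives in "media" events as base64-encoded μ-law. The sketch below shows one way to pull the audio bytes out of such a message. The function name is hypothetical and the bridge's own handler may be structured differently.

```python
import base64
import json
from typing import Optional


def extract_ulaw_payload(message: str) -> Optional[bytes]:
    """Return raw mu-law bytes from a Twilio 'media' event, or None otherwise."""
    event = json.loads(message)
    if event.get("event") != "media":
        # Other events such as 'connected', 'start', 'stop' and 'mark' carry no audio.
        return None
    return base64.b64decode(event["media"]["payload"])
```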
Outbound Call API
curl -X POST https://your-domain/gemini-live/call \
  -H 'Content-Type: application/json' \
  -d '{"to": "+1234567890", "greeting": "Hello! This is Marcia."}'
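The same request can be built from Python with only the standard library. This is a sketch assuming the endpoint accepts exactly the JSON body shown in the curl example; the helper name build_call_request is hypothetical, and the response format is not documented here, so the example stops at constructing the request.

```python
import json
import urllib.request


def build_call_request(base_url: str, to: str, greeting: str) -> urllib.request.Request:
    """Build a POST request for the /gemini-live/call endpoint."""
    body = json.dumps({"to": to, "greeting": greeting}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/gemini-live/call",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_call_request("https://your-domain", "+1234567890", "Hello! This is Marcia.")
# To actually place the call, send it: urllib.request.urlopen(req)
```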
Configuration
All settings can be supplied via CLI arguments or environment variables:
Core
- --model: Gemini model (default: gemini-2.5-flash-native-audio-latest)
- --voice: Gemini voice: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, Zephyr (default: Kore)
- --from-number: Twilio outbound number (default: the TWILIO_FROM environment variable)
- --system-prompt: AI persona system prompt
- --max-duration: Max call seconds (default: 300)
VAD (Voice Activity Detection)
- --vad-enabled / --no-vad: Toggle server-side VAD (default: on)
- --vad-silence-ms: Silence duration to trigger activityEnd (default: 500)
- --vad-energy-threshold: RMS energy threshold (default: 0.01)
- --vad-speech-min-ms: Min speech duration before activityStart (default: 100)
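To make the interaction of the three VAD parameters concrete, here is a minimal energy-based state machine. It is a sketch under assumptions, not the bridge's implementation: the EnergyVad class and its method names are hypothetical, and frames are assumed to be floats normalized to [-1, 1], which is what an RMS threshold of 0.01 suggests.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


@dataclass
class EnergyVad:
    energy_threshold: float = 0.01   # mirrors --vad-energy-threshold
    silence_ms: int = 500            # mirrors --vad-silence-ms
    speech_min_ms: int = 100         # mirrors --vad-speech-min-ms
    speaking: bool = False
    _speech_ms: int = 0
    _silence_ms: int = 0

    def process(self, samples: Sequence[float], frame_ms: int) -> Optional[str]:
        """Feed one frame; return 'activityStart', 'activityEnd', or None."""
        loud = rms(samples) >= self.energy_threshold
        if not self.speaking:
            # Require continuous speech for speech_min_ms before starting.
            self._speech_ms = self._speech_ms + frame_ms if loud else 0
            if self._speech_ms >= self.speech_min_ms:
                self.speaking = True
                self._silence_ms = 0
                return "activityStart"
            return None
        # While speaking, any loud frame resets the silence timer.
        self._silence_ms = 0 if loud else self._silence_ms + frame_ms
        if self._silence_ms >= self.silence_ms:
            self.speaking = False
            self._speech_ms = 0
            return "activityEnd"
        return None
```

With the defaults and 20ms frames, five consecutive loud frames produce activityStart, and 25 consecutive quiet frames after that produce activityEnd.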
Echo Suppression
- --echo-multiplier: VAD threshold multiplier during agent speech (default: 3.0)
- --echo-decay-ms: Decay time after agent stops speaking (default: 300)
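One plausible reading of these two settings is that the VAD energy threshold is raised while the agent is speaking and relaxes back to its base value over the decay window. The sketch below assumes a linear decay, which the documentation does not specify; effective_threshold and its parameters are hypothetical names.

```python
def effective_threshold(
    base: float,
    agent_speaking: bool,
    ms_since_agent_stopped: float,
    echo_multiplier: float = 3.0,   # mirrors --echo-multiplier
    echo_decay_ms: float = 300.0,   # mirrors --echo-decay-ms
) -> float:
    """VAD threshold adjusted for echo of the agent's own audio."""
    if agent_speaking:
        return base * echo_multiplier
    if echo_decay_ms <= 0 or ms_since_agent_stopped >= echo_decay_ms:
        return base
    # Assumed linear ramp from (base * multiplier) back down to base.
    remaining = 1.0 - ms_since_agent_stopped / echo_decay_ms
    return base * (1.0 + (echo_multiplier - 1.0) * remaining)
```

Under this assumption, a base threshold of 0.01 becomes 0.03 during agent speech, about 0.02 halfway through the decay window, and 0.01 again once 300ms have passed.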
Twilio Setup
- Buy a phone number on Twilio
- Set Voice webhook: https://your-domain/gemini-live/incoming (HTTP POST)
- Set Call status URL: https://your-domain/gemini-live/call-status (HTTP POST)
- Ensure geo-permissions are enabled for target countries
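When Twilio hits the Voice webhook, it expects TwiML in response. For a bidirectional media stream, Twilio's mechanism is a Stream noun nested inside a Connect verb. The sketch below builds such a document pointing at the bridge's WebSocket endpoint; the exact TwiML the bridge returns is not shown in this README, so treat this as an illustration, and build_stream_twiml as a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(stream_url: str) -> str:
    """Return TwiML that connects the call to a media stream WebSocket."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", {"url": stream_url})
    return ET.tostring(response, encoding="unicode")


twiml = build_stream_twiml("wss://your-domain/gemini-live/stream")
```

Note the wss:// scheme: Twilio connects to the stream URL over a secure WebSocket, which is why the reverse proxy in front of the bridge needs WebSocket support.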
Network Requirements
The bridge must be accessible from the internet (Twilio connects to it). Recommended: Caddy reverse proxy with WebSocket support.
# Caddy config example
handle /gemini-live/* {
    reverse_proxy localhost:3335 {
        flush_interval -1
        transport http {
            read_timeout 0
            write_timeout 0
        }
    }
}
Performance
Latency benchmarks (Gemini 2.5 Flash Native Audio):
| Config | Median | Min | Max |
|---|---|---|---|
| No VAD, 200ms buffer | 3,660ms | 2,360ms | 5,180ms |
| Server VAD, 50ms buffer | 2,500ms | 2,080ms | 6,980ms |
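In these benchmarks, server-side VAD with a 50ms buffer reduces median latency by ~32% (3,660ms to 2,500ms) compared with no VAD and a 200ms buffer, although the maximum observed latency rises from 5,180ms to 6,980ms.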
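The percentage can be checked directly from the table. The helper below is only a worked calculation on the published figures, with a hypothetical function name.

```python
def percent_reduction(before_ms: float, after_ms: float) -> float:
    """Relative reduction from before to after, in percent (negative = increase)."""
    return (before_ms - after_ms) / before_ms * 100.0


median_gain = percent_reduction(3660, 2500)   # median latency, no VAD vs server VAD
max_change = percent_reduction(5180, 6980)    # max latency; negative means it got worse
```

The median figure works out to roughly 32%, while the negative value for the maximum reflects the higher worst-case latency in the server VAD row.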
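- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: /install gemini-live-phone
- Once installed, invoke the Skill by name or trigger it with /gemini-live-phone
- Provide the inputs described in the Skill's parameter documentation to get structured output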
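What is Gemini Live Phone?
Gemini Live Phone bridges Twilio phone calls to the Google Gemini Live API for real-time AI voice conversations. No STT/TTS middleware is required, and it includes VAD and echo suppression. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 294 downloads to date.
How do I install Gemini Live Phone?
Run the command /install gemini-live-phone in the OpenClaw or Claude Code chat box for a one-step install; the installation itself needs no extra configuration.
Is Gemini Live Phone free?
Yes. Gemini Live Phone is completely free and released under the MIT-0 license, so you can download, install, and use it freely.
Which platforms does Gemini Live Phone support?
Gemini Live Phone is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who develops Gemini Live Phone?
It is developed and maintained by ABFS Tech (@quantdeveloperusa). The current version is v1.0.1.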