quantdeveloperusa

Gemini Live Phone

Author: ABFS Tech · GitHub · v1.0.1 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 294
Favorites: 0
Current installs: 0
Versions: 3
Install in OpenClaw
/install gemini-live-phone
Description
Bridge Twilio phone calls to Google Gemini Live API for real-time AI voice conversations. No STT/TTS middleware required. Includes VAD and echo suppression.
Usage (SKILL.md)

Gemini Live Phone Bridge

Real-time voice AI over phone calls using Google Gemini's native audio capabilities.

Architecture

Phone ↔ Twilio ↔ WebSocket (μ-law 8kHz) ↔ Bridge (PCM transcoding) ↔ Gemini Live API (24kHz PCM)
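
The transcoding step in the diagram can be illustrated with a small G.711 μ-law decoder. This is a minimal sketch under stated assumptions, not the bridge's actual code: the function names `mulaw_byte_to_pcm16` and `mulaw_to_pcm16` are hypothetical, and resampling between 8 kHz and the Gemini-side sample rate is left out.

```python
import struct

MULAW_BIAS = 0x84  # bias (132) that the G.711 mu-law encoder adds before compression


def mulaw_byte_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte into a signed 16-bit linear sample."""
    byte = ~byte & 0xFF              # mu-law bytes are stored bit-inverted
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS
    return -magnitude if sign else magnitude


def mulaw_to_pcm16(payload: bytes) -> bytes:
    """Decode a mu-law payload into little-endian 16-bit PCM (hypothetical helper)."""
    samples = [mulaw_byte_to_pcm16(b) for b in payload]
    return struct.pack(f"<{len(samples)}h", *samples)
```

Twilio Media Stream payloads arrive base64-encoded, so a caller would run `base64.b64decode` on the payload before passing it to a decoder like this one.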

Quick Start

# Set required env vars
export GOOGLE_API_KEY="your-key"
export TWILIO_AUTH_TOKEN="your-token"

# Run the bridge
python scripts/bridge.py --port 3335

Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /gemini-live/status | GET | Health check + active calls |
| /gemini-live/incoming | POST | TwiML for inbound calls (Twilio webhook) |
| /gemini-live/stream | WS | Twilio Media Stream WebSocket |
| /gemini-live/call | POST | Initiate outbound call |
| /gemini-live/twiml | POST | TwiML for outbound calls |
| /gemini-live/call-status | POST | Twilio call status webhook |
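
To make the relationship between the TwiML endpoints and the stream endpoint concrete, here is a minimal sketch of the kind of TwiML a handler for /gemini-live/incoming could return. It assumes the bridge uses Twilio's `<Connect><Stream>` verb for bidirectional audio, which the source does not confirm. `build_stream_twiml` and its `public_url` parameter are hypothetical names.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(public_url: str) -> str:
    """Return TwiML that connects a call to a Media Stream WebSocket.

    Assumption: the WebSocket lives at <public_url>/gemini-live/stream.
    """
    if not public_url.startswith("https://"):
        raise ValueError("public_url must be an https:// URL that you control")
    host_and_path = public_url[len("https://"):].rstrip("/")
    ws_url = "wss://" + host_and_path + "/gemini-live/stream"

    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=ws_url)
    return ET.tostring(response, encoding="unicode")
```

Because the stream URL is derived from a single base URL, whatever value that base takes determines where Twilio sends the call audio.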

Outbound Call API

curl -X POST https://your-domain/gemini-live/call \
  -H 'Content-Type: application/json' \
  -d '{"to": "+1234567890", "greeting": "Hello! This is Marcia."}'
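
The same request can be built from Python. The sketch below only constructs the request object and validates the number format first. `build_call_request` is a hypothetical helper, and the response schema of /gemini-live/call is not documented here, so no response parsing is shown.

```python
import json
import re
import urllib.request

# E.164: a plus sign, then up to 15 digits, the first of which is non-zero.
E164_PATTERN = re.compile(r"^\+[1-9]\d{1,14}$")


def build_call_request(base_url: str, to: str, greeting: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for an outbound call."""
    if not E164_PATTERN.match(to):
        raise ValueError(f"not an E.164 phone number: {to!r}")
    body = json.dumps({"to": to, "greeting": greeting}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/gemini-live/call",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it.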

Configuration

All settings via CLI args or environment variables:

Core

  • --model — Gemini model (default: gemini-2.5-flash-native-audio-latest)
  • --voice — Gemini voice: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, Zephyr (default: Kore)
  • --from-number — Twilio outbound number (default: env TWILIO_FROM)
  • --system-prompt — AI persona system prompt
  • --max-duration — Max call seconds (default: 300)
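
As an illustration of how these flags and their documented defaults fit together, here is a minimal `argparse` sketch. It is not taken from scripts/bridge.py. `build_parser` is a hypothetical name, and only the flags and defaults listed above are included.

```python
import argparse
import os

VOICES = ["Puck", "Charon", "Kore", "Fenrir", "Aoede", "Leda", "Orus", "Zephyr"]


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented core options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Gemini Live phone bridge (sketch)")
    parser.add_argument("--model", default="gemini-2.5-flash-native-audio-latest")
    parser.add_argument("--voice", default="Kore", choices=VOICES)
    # Falls back to the TWILIO_FROM environment variable, as documented.
    parser.add_argument("--from-number", default=os.environ.get("TWILIO_FROM"))
    parser.add_argument("--system-prompt", default=None)
    parser.add_argument("--max-duration", type=int, default=300,
                        help="maximum call length in seconds")
    return parser
```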

VAD (Voice Activity Detection)

  • --vad-enabled / --no-vad — Toggle server-side VAD (default: on)
  • --vad-silence-ms — Silence duration to trigger activityEnd (default: 500)
  • --vad-energy-threshold — RMS energy threshold (default: 0.01)
  • --vad-speech-min-ms — Min speech duration before activityStart (default: 100)
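
The interplay of the three VAD parameters can be sketched as a small energy-based state machine. This shows how such parameters are commonly used and is not the bridge's implementation. The `EnergyVad` class, the 20 ms frame length, and the normalisation of samples to [-1.0, 1.0] are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of samples normalised to [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


@dataclass
class EnergyVad:
    energy_threshold: float = 0.01  # --vad-energy-threshold
    silence_ms: int = 500           # --vad-silence-ms
    speech_min_ms: int = 100        # --vad-speech-min-ms
    frame_ms: int = 20              # assumed frame length
    speaking: bool = False
    _speech_run_ms: int = 0
    _silence_run_ms: int = 0

    def process(self, frame: Sequence[float]) -> Optional[str]:
        """Feed one frame; return 'activityStart', 'activityEnd', or None."""
        if rms(frame) >= self.energy_threshold:
            self._speech_run_ms += self.frame_ms
            self._silence_run_ms = 0
            if not self.speaking and self._speech_run_ms >= self.speech_min_ms:
                self.speaking = True
                return "activityStart"
        else:
            self._silence_run_ms += self.frame_ms
            self._speech_run_ms = 0
            if self.speaking and self._silence_run_ms >= self.silence_ms:
                self.speaking = False
                return "activityEnd"
        return None
```

With the defaults, five consecutive loud 20 ms frames trigger `activityStart`, and twenty-five consecutive quiet frames after that trigger `activityEnd`.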

Echo Suppression

  • --echo-multiplier — VAD threshold multiplier during agent speech (default: 3.0)
  • --echo-decay-ms — Decay time after agent stops speaking (default: 300)
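
One plausible reading of these two options is that the VAD energy threshold is raised while the agent is speaking and then relaxes back to its base value over the decay window. The sketch below assumes a linear decay, which the source does not specify. `effective_threshold` and its parameters are hypothetical.

```python
from typing import Optional


def effective_threshold(
    base_threshold: float,
    echo_multiplier: float = 3.0,    # --echo-multiplier
    echo_decay_ms: float = 300.0,    # --echo-decay-ms
    agent_speaking: bool = False,
    ms_since_agent_stopped: Optional[float] = None,
) -> float:
    """Return the VAD threshold to apply at this moment (assumed linear decay)."""
    if agent_speaking:
        return base_threshold * echo_multiplier
    if ms_since_agent_stopped is None or ms_since_agent_stopped >= echo_decay_ms:
        return base_threshold
    # Fraction of the decay window still remaining, from 1.0 down to 0.0.
    remaining = 1.0 - ms_since_agent_stopped / echo_decay_ms
    return base_threshold * (1.0 + (echo_multiplier - 1.0) * remaining)
```

Raising the threshold while audio is being played to the caller makes it less likely that the agent's own voice, echoed back over the phone line, is mistaken for caller speech.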

Twilio Setup

  1. Buy a phone number on Twilio
  2. Set Voice webhook: https://your-domain/gemini-live/incoming (HTTP POST)
  3. Set Call status URL: https://your-domain/gemini-live/call-status (HTTP POST)
  4. Ensure geo-permissions are enabled for target countries
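
Because the webhook URLs configured above are publicly reachable, a bridge would typically verify that incoming requests really originate from Twilio. The source does not say whether scripts/bridge.py does this. The sketch below follows Twilio's documented `X-Twilio-Signature` scheme for form-encoded POST webhooks using only the standard library. The function names are hypothetical, and in practice the `twilio` package's `RequestValidator` covers the same ground.

```python
import base64
import hashlib
import hmac
from typing import Mapping


def compute_twilio_signature(auth_token: str, url: str, params: Mapping[str, str]) -> str:
    """Signature = base64(HMAC-SHA1(auth_token, url + sorted key/value pairs))."""
    payload = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(
        auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_twilio_request(
    auth_token: str, url: str, params: Mapping[str, str], header_signature: str
) -> bool:
    """Compare the expected signature with the X-Twilio-Signature header value."""
    expected = compute_twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected, header_signature)
```

The `url` argument must be the exact public URL Twilio requested, which matters when the bridge sits behind a reverse proxy.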

Network Requirements

The bridge must be accessible from the internet (Twilio connects to it). Recommended: Caddy reverse proxy with WebSocket support.

# Caddy config example
handle /gemini-live/* {
    reverse_proxy localhost:3335 {
        flush_interval -1
        transport http {
            read_timeout 0
            write_timeout 0
        }
    }
}

Performance

Latency benchmarks (Gemini 2.5 Flash Native Audio):

| Config | Median | Min | Max |
| --- | --- | --- | --- |
| No VAD, 200ms buffer | 3,660ms | 2,360ms | 5,180ms |
| Server VAD, 50ms buffer | 2,500ms | 2,080ms | 6,980ms |

Server-side VAD with a 50ms buffer reduces median latency by ~32% (3,660ms to 2,500ms) compared with the no-VAD, 200ms-buffer configuration.
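
The percentage can be checked directly from the benchmark table. The helper name `relative_reduction` is hypothetical; the figures are the ones listed above.

```python
def relative_reduction(before_ms: float, after_ms: float) -> float:
    """Fractional latency reduction; negative values mean latency got worse."""
    return (before_ms - after_ms) / before_ms


median_change = relative_reduction(3660, 2500)
min_change = relative_reduction(2360, 2080)
max_change = relative_reduction(5180, 6980)

print(f"median: {median_change:+.1%}")
print(f"min:    {min_change:+.1%}")
print(f"max:    {max_change:+.1%}")
```

The median and minimum improve, while the maximum latency is higher in the server-VAD run, so worst-case behaviour is worth measuring separately.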

Security recommendations
This skill appears to implement a Twilio→Gemini real-time bridge, but there are red flags you should address before running it with your production credentials:

  • Hardcoded defaults: the script contains a default TWILIO_ACCOUNT_SID, default TWILIO_FROM phone number, and a default PUBLIC_URL (https://athena.abfs.tech). If you don't explicitly set TWILIO_ACCOUNT_SID, TWILIO_FROM, and PUBLIC_URL, the bridge may behave unexpectedly (attempt to use those defaults or embed that external domain in generated TwiML). Replace those defaults with values you control.
  • Environment variables: SKILL.md asks for GOOGLE_API_KEY and TWILIO_AUTH_TOKEN; the code will also accept GEMINI_API_KEY and other TWILIO_* env vars. Ensure you supply the correct key name (GEMINI_API_KEY is supported) and do not accidentally expose your credentials to any third party.
  • Logging and local files: the bridge writes structured logs to /tmp/openclaw, including hostname and runtime metadata. Review the logger behavior and the log retention/location if you care about the privacy of metadata or call events.
  • Network exposure: this server must be publicly reachable for Twilio webhooks; run it in a hardened environment (firewalled host, TLS via reverse proxy) and verify the WebSocket/TwiML endpoints before connecting real phone numbers.
  • Review full code paths: confirm how TwiML responses are constructed (ensure they do not redirect audio/media to the default PUBLIC_URL or any external endpoint you don't control), and verify that no audio or call data is forwarded to third parties without your consent.

What would change the assessment: seeing the remaining code that generates TwiML and outbound requests (to confirm whether the default PUBLIC_URL is used to route media or callbacks), removal of hardcoded account values, or an explicit comment from the author that those defaults are only placeholders and will never be used at runtime.

If you cannot validate those points, treat this skill as suspicious and run it in an isolated/test environment first.
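
The first two recommendations can be turned into a pre-flight check that runs before the bridge starts. This is a minimal sketch, not part of the skill. `preflight_problems` is a hypothetical helper, and the variable names and flagged host are the ones mentioned in the review above.

```python
import os
from typing import List, Mapping

# Variables the review says must be set explicitly so that no default is used.
REQUIRED_EXPLICIT = ("TWILIO_ACCOUNT_SID", "TWILIO_FROM", "PUBLIC_URL", "TWILIO_AUTH_TOKEN")
API_KEY_NAMES = ("GEMINI_API_KEY", "GOOGLE_API_KEY")
FLAGGED_DEFAULT_HOST = "athena.abfs.tech"


def preflight_problems(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = [f"{name} is not set" for name in REQUIRED_EXPLICIT if not env.get(name)]
    if not any(env.get(name) for name in API_KEY_NAMES):
        problems.append("neither GEMINI_API_KEY nor GOOGLE_API_KEY is set")
    if FLAGGED_DEFAULT_HOST in env.get("PUBLIC_URL", ""):
        problems.append(f"PUBLIC_URL points at the flagged default host {FLAGGED_DEFAULT_HOST}")
    return problems


if __name__ == "__main__":
    for problem in preflight_problems(os.environ):
        print("preflight:", problem)
```

A check like this only guards against the documented defaults being picked up silently; it does not replace reading the TwiML-generation code.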
Functional analysis
Type: OpenClaw Skill
Name: gemini-live-phone
Version: 1.0.1

The skill bridges Twilio calls to the Gemini Live API but contains hardcoded configuration defaults in `scripts/bridge.py` that point to external infrastructure. Specifically, it includes a hardcoded Twilio Account SID and a default `public_url` pointing to `athena.abfs.tech`. If a user fails to override these defaults via environment variables, outbound calls initiated through the bridge will attempt to fetch TwiML instructions and send status callbacks to this external domain, potentially leading to call hijacking or metadata exfiltration. While these may be remnants of development, they represent a significant security risk.
Capability assessment
Purpose & Capability
Name, description, requirements, dependencies (google-genai, twilio) and binaries (python3, uvicorn) align with a Twilio→Gemini real‑time bridge. However the code embeds unexpected defaults (a hardcoded TWILIO_ACCOUNT_SID, default TWILIO_FROM number, and a default PUBLIC_URL pointing at https://athena.abfs.tech) that are not justified by the SKILL.md and are unusual for a user‑run bridge.
Instruction Scope
SKILL.md instructs users only to set GOOGLE_API_KEY and TWILIO_AUTH_TOKEN and run the bridge. The code also reads and uses other environment variables if present (TWILIO_ACCOUNT_SID, TWILIO_FROM, PUBLIC_URL, GEMINI_API_KEY), writes structured logs to /tmp/openclaw with hostname and runtime metadata, and likely generates TwiML and handles WebSocket traffic. The default PUBLIC_URL suggests the service may reference an external domain by default, which could redirect Twilio traffic or metadata externally if not overridden.
Install Mechanism
No install spec; this is instruction + code. Dependencies are standard Python packages listed in requirements.txt (fastapi, uvicorn, google-genai, twilio, etc.). No remote downloads or arbitrary archives in the manifest.
Credentials
Declared required envs (GOOGLE_API_KEY, TWILIO_AUTH_TOKEN) map to Gemini and Twilio usage, which is expected. But the code prefers GEMINI_API_KEY or GOOGLE_API_KEY (both supported) and also will use TWILIO_ACCOUNT_SID and TWILIO_FROM from env or fall back to hardcoded defaults. The presence of hardcoded account SID and phone number is unexpected and disproportionate — it could cause accidental cross‑account interactions or leak routing if defaults are used. The logger also records hostname/runtime info to local files, which could contain sensitive metadata.
Persistence & Privilege
Skill is not always-enabled and is user-invocable. It writes local log files under /tmp/openclaw but does not request system-wide persistent privileges or modify other skills. No evidence of modifying system or other skill configs.
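How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install gemini-live-phone
  3. Once installed, invoke the Skill by name or trigger it with /gemini-live-phone
  4. Provide the inputs described in the Skill's parameter documentation to get structured output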
Version history
v1.0.1
  • Added version field to SKILL metadata (now version 1.0.1)
  • Updated description to be more concise and mention key features directly
  • No functionality or endpoint changes; documentation-only update
  • No breaking changes for users or API consumers
v1.1.0
Add OpenClaw native structured logging for VAD events, call lifecycle, and greeting detection
v1.0.0
Initial release: Twilio to Gemini Live API bridge with server-side VAD and echo suppression
Metadata
Slug gemini-live-phone
Version 1.0.1
License MIT-0
Total installs 0
Current installs 0
Versions 3
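FAQ

What is Gemini Live Phone?

Bridge Twilio phone calls to Google Gemini Live API for real-time AI voice conversations. No STT/TTS middleware required. Includes VAD and echo suppression. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 294 total downloads to date.

How do I install Gemini Live Phone?

Run the command "/install gemini-live-phone" in the OpenClaw or Claude Code chat box for a one-step install; no extra configuration is required.

Is Gemini Live Phone free?

Yes. Gemini Live Phone is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.

Which platforms does Gemini Live Phone support?

Gemini Live Phone is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Gemini Live Phone?

It is developed and maintained by ABFS Tech (@quantdeveloperusa); the current version is v1.0.1.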
