
Gemini Live Phone

Name: Gemini Live Phone
Author: quantdeveloperusa

By ABFS Tech · GitHub · v1.0.1 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 294

Install in OpenClaw

/install gemini-live-phone

Description

Bridges Twilio phone calls to the Google Gemini Live API for real-time AI voice conversations. No STT/TTS middleware is required. Includes voice activity detection (VAD) and echo suppression.

Usage (SKILL.md)

Gemini Live Phone Bridge

Real-time voice AI over phone calls using Google Gemini's native audio capabilities.

Architecture

Phone ↔ Twilio ↔ WebSocket (μ-law 8kHz) ↔ Bridge (PCM transcoding) ↔ Gemini Live API (24kHz PCM)
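The "PCM transcoding" stage in the middle of this pipeline can be sketched in pure Python. This is a minimal illustration, not the bridge's actual code. It makes four assumptions:

- Twilio delivers 8 kHz G.711 μ-law.
- PCM is 16-bit little-endian.
- Inbound audio is upsampled 2x, on the assumption that Gemini takes 16 kHz input.
- Gemini's 24 kHz output is reduced to 8 kHz by averaging each group of 3 samples.

A production bridge would use a proper low-pass resampler. All function names are hypothetical. The code avoids the standard-library `audioop` module, which was removed in Python 3.13.

```python
import struct

BIAS = 0x84   # G.711 mu-law bias (132)
CLIP = 32635  # largest magnitude accepted before biasing


def ulaw_to_pcm16(code: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit PCM value."""
    u = ~code & 0xFF
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def pcm16_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM value as a G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent, mask = 7, 0x4000
    # The segment (exponent) is the position of the highest set bit above bit 7.
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def twilio_to_pcm(payload: bytes) -> bytes:
    """8 kHz mu-law bytes (Twilio) -> 8 kHz 16-bit little-endian PCM."""
    samples = [ulaw_to_pcm16(b) for b in payload]
    return struct.pack(f"<{len(samples)}h", *samples)


def upsample_2x(pcm: bytes) -> bytes:
    """8 kHz -> 16 kHz by linear interpolation (assumed Gemini input rate)."""
    s = struct.unpack(f"<{len(pcm) // 2}h", pcm)
    out = []
    for i, v in enumerate(s):
        nxt = s[i + 1] if i + 1 < len(s) else v
        out.extend((v, (v + nxt) // 2))
    return struct.pack(f"<{len(out)}h", *out)


def gemini_to_twilio(pcm24k: bytes) -> bytes:
    """24 kHz 16-bit PCM (Gemini) -> 8 kHz mu-law by averaging each 3 samples."""
    s = struct.unpack(f"<{len(pcm24k) // 2}h", pcm24k)
    out = bytearray()
    for i in range(0, len(s) - len(s) % 3, 3):
        out.append(pcm16_to_ulaw((s[i] + s[i + 1] + s[i + 2]) // 3))
    return bytes(out)
```

The averaging step acts as a crude anti-aliasing filter. Plain decimation (keeping every third sample) would fold energy above 4 kHz back into the audible band.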

Quick Start

# Set required env vars
export GOOGLE_API_KEY="your-key"
export TWILIO_AUTH_TOKEN="your-token"

# Run the bridge
python scripts/bridge.py --port 3335

Endpoints

Endpoint	Method	Description
`/gemini-live/status`	GET	Health check + active calls
`/gemini-live/incoming`	POST	TwiML for inbound calls (Twilio webhook)
`/gemini-live/stream`	WS	Twilio Media Stream WebSocket
`/gemini-live/call`	POST	Initiate outbound call
`/gemini-live/twiml`	POST	TwiML for outbound calls
`/gemini-live/call-status`	POST	Twilio call status webhook
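The listing does not show the TwiML that `/gemini-live/incoming` returns. A plausible minimal sketch assumes it uses Twilio's `<Connect><Stream>` verb to open a bidirectional Media Stream to `/gemini-live/stream`. Twilio expects a `wss://` URL there. The `public_host` parameter is hypothetical and should be a host you control.

```python
import xml.etree.ElementTree as ET


def incoming_twiml(public_host: str) -> str:
    """TwiML that connects an inbound caller to the bridge's Media Stream."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # <Connect><Stream> streams call audio both ways over the WebSocket.
    ET.SubElement(connect, "Stream", url=f"wss://{public_host}/gemini-live/stream")
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body
```

Building the document with ElementTree rather than string formatting ensures that attribute values are XML-escaped.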

Outbound Call API

curl -X POST https://your-domain/gemini-live/call \
  -H 'Content-Type: application/json' \
  -d '{"to": "+1234567890", "greeting": "Hello! This is Marcia."}'
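The same request can be issued from Python with only the standard library. This sketch prepares the request without sending it. The response body of `/gemini-live/call` is not documented in the listing, so the commented-out send step just prints whatever comes back.

```python
import json
import urllib.request


def build_call_request(base_url: str, to: str, greeting: str) -> urllib.request.Request:
    """Prepare a JSON POST to the bridge's /gemini-live/call endpoint."""
    body = json.dumps({"to": to, "greeting": greeting}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/gemini-live/call",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending it requires a running bridge:
# req = build_call_request("https://your-domain", "+1234567890", "Hello! This is Marcia.")
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode())
```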

Configuration

All settings can be supplied as CLI arguments or environment variables:

Core

--model — Gemini model (default: gemini-2.5-flash-native-audio-latest)
--voice — Gemini voice: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, Zephyr (default: Kore)
--from-number — Twilio outbound number (default: env TWILIO_FROM)
--system-prompt — AI persona system prompt
--max-duration — Max call seconds (default: 300)
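The Core options can be sketched with `argparse`. The flag names, voices, and defaults come from the list above. The listing names an environment fallback only for `--from-number` (`TWILIO_FROM`), so that is the only one wired up here. The empty default prompt and the fail-fast check for a missing caller ID are assumptions. The check deliberately avoids any hardcoded phone number.

```python
import argparse
import os

VOICES = ["Puck", "Charon", "Kore", "Fenrir", "Aoede", "Leda", "Orus", "Zephyr"]


def parse_core_args(argv=None, env=None):
    """Parse the documented Core flags; TWILIO_FROM is the only env fallback used."""
    env = os.environ if env is None else env
    p = argparse.ArgumentParser(description="Gemini Live phone bridge (sketch)")
    p.add_argument("--model", default="gemini-2.5-flash-native-audio-latest")
    p.add_argument("--voice", choices=VOICES, default="Kore")
    p.add_argument("--from-number", default=env.get("TWILIO_FROM"))
    p.add_argument("--system-prompt", default="")  # assumed default
    p.add_argument("--max-duration", type=int, default=300, help="max call length in seconds")
    args = p.parse_args(argv)
    if not args.from_number:
        # Exit with status 2 instead of silently using a built-in number.
        p.error("set --from-number or TWILIO_FROM")
    return args
```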

VAD (Voice Activity Detection)

--vad-enabled / --no-vad — Toggle server-side VAD (default: on)
--vad-silence-ms — Silence duration to trigger activityEnd (default: 500)
--vad-energy-threshold — RMS energy threshold (default: 0.01)
--vad-speech-min-ms — Min speech duration before activityStart (default: 100)
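The three VAD parameters suggest a simple energy-based state machine:

- Report `activityStart` once frame RMS has stayed at or above the threshold for the minimum speech time.
- Report `activityEnd` once it has stayed below the threshold for the silence time.

This is a sketch under those assumptions, not the bridge's implementation. Samples are assumed to be floats in [-1, 1], and the class name is hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class EnergyVAD:
    """Energy-threshold VAD using the documented default values."""
    energy_threshold: float = 0.01  # --vad-energy-threshold (RMS)
    silence_ms: int = 500           # --vad-silence-ms
    speech_min_ms: int = 100        # --vad-speech-min-ms
    active: bool = field(default=False, init=False)
    _speech_run: int = field(default=0, init=False)
    _silence_run: int = field(default=0, init=False)

    def process(self, frame, frame_ms):
        """Feed one frame; return 'activityStart', 'activityEnd', or None."""
        rms = math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0
        if rms >= self.energy_threshold:
            self._silence_run = 0
            self._speech_run += frame_ms
            if not self.active and self._speech_run >= self.speech_min_ms:
                self.active = True
                return "activityStart"
        else:
            # Speech must be contiguous to count toward speech_min_ms.
            self._speech_run = 0
            if self.active:
                self._silence_run += frame_ms
                if self._silence_run >= self.silence_ms:
                    self.active = False
                    self._silence_run = 0
                    return "activityEnd"
        return None
```

Requiring a minimum run of speech filters out clicks and short noise bursts. The silence window keeps brief pauses inside a sentence from ending the turn.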

Echo Suppression

--echo-multiplier — VAD threshold multiplier during agent speech (default: 3.0)
--echo-decay-ms — Decay time after agent stops speaking (default: 300)
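The purpose is presumably to stop the agent's own voice, echoed back through the caller's handset, from being detected as the caller speaking. The listing does not say how the post-speech decay is shaped. This sketch assumes a linear ramp from the full multiplier back to 1x over `--echo-decay-ms`. The function and its parameters are hypothetical.

```python
def effective_threshold(base, now_ms, agent_speaking, agent_stopped_at_ms=None,
                        multiplier=3.0, decay_ms=300):
    """VAD energy threshold with echo suppression (linear decay is an assumption)."""
    if agent_speaking:
        return base * multiplier
    if agent_stopped_at_ms is None:
        return base
    elapsed = now_ms - agent_stopped_at_ms
    if elapsed >= decay_ms:
        return base
    # Ramp the multiplier from `multiplier` down to 1.0 over decay_ms.
    remaining = 1.0 - elapsed / decay_ms
    return base * (1.0 + (multiplier - 1.0) * remaining)
```

The decay covers echo that is still in flight after playback stops, without permanently desensitizing barge-in detection.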

Twilio Setup

Buy a phone number on Twilio
Set Voice webhook: https://your-domain/gemini-live/incoming (HTTP POST)
Set Call status URL: https://your-domain/gemini-live/call-status (HTTP POST)
Ensure geo-permissions are enabled for target countries
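Both webhook URLs are publicly reachable. The listing does not say whether the bridge checks that requests to them really come from Twilio. Validating the `X-Twilio-Signature` header with `TWILIO_AUTH_TOKEN` is a sensible hardening step. The sketch follows Twilio's documented scheme:

1. Start with the full request URL.
2. Append the POST parameters, sorted by name, as name+value pairs.
3. Compute an HMAC-SHA1 keyed with the auth token.
4. Base64-encode the result.

The `twilio` package's `RequestValidator` implements the same check. Behind a reverse proxy, the URL must be the public one Twilio called, not the internal address.

```python
import base64
import hashlib
import hmac


def twilio_signature(auth_token: str, url: str, params: dict) -> str:
    """Expected X-Twilio-Signature for a form-encoded POST webhook."""
    payload = url + "".join(k + params[k] for k in sorted(params))
    digest = hmac.new(auth_token.encode(), payload.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def is_valid_twilio_request(auth_token: str, url: str, params: dict, signature: str) -> bool:
    """Compare against the header Twilio sent, in constant time."""
    expected = twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected.encode(), signature.encode())
```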

Network Requirements

The bridge must be reachable from the public internet, because Twilio sends webhook requests to it and opens the Media Stream WebSocket to it. Recommended: a Caddy reverse proxy with WebSocket support.

# Caddy config example
handle /gemini-live/* {
    reverse_proxy localhost:3335 {
        flush_interval -1
        transport http {
            read_timeout 0
            write_timeout 0
        }
    }
}

Performance

Latency benchmarks (Gemini 2.5 Flash Native Audio):

Config	Median	Min	Max
No VAD, 200ms buffer	3,660ms	2,360ms	5,180ms
Server VAD, 50ms buffer	2,500ms	2,080ms	6,980ms

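The server-side VAD configuration reduces median latency by ~32%. The two configurations also differ in buffer size (200 ms vs 50 ms), so not all of the gain is necessarily due to VAD. The maximum observed latency rose from 5,180 ms to 6,980 ms.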
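The ~32% figure is the relative drop between the two median latencies in the table:

```latex
\frac{3660\,\text{ms} - 2500\,\text{ms}}{3660\,\text{ms}} = \frac{1160}{3660} \approx 0.317 \approx 32\%
```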

Security Recommendations

This skill appears to implement a Twilio→Gemini real-time bridge, but there are red flags you should address before running it with your production credentials:

- Hardcoded defaults: the script contains a default TWILIO_ACCOUNT_SID, a default TWILIO_FROM phone number, and a default PUBLIC_URL (https://athena.abfs.tech). If you don't explicitly set TWILIO_ACCOUNT_SID, TWILIO_FROM, and PUBLIC_URL, the bridge may behave unexpectedly: it may fall back to those defaults or embed that external domain in generated TwiML. Replace those defaults with values you control.
- Environment variables: SKILL.md asks for GOOGLE_API_KEY and TWILIO_AUTH_TOKEN, but the code also accepts GEMINI_API_KEY and other TWILIO_* variables. Make sure you supply the key under a supported name (GEMINI_API_KEY also works), and do not accidentally expose your credentials to any third party.
- Logging and local files: the bridge writes structured logs to /tmp/openclaw, including hostname and runtime metadata. Review the logger behavior, log location, and retention if the privacy of metadata or call events matters to you.
- Network exposure: the server must be publicly reachable for Twilio webhooks. Run it in a hardened environment (firewalled host, TLS via a reverse proxy) and verify the WebSocket and TwiML endpoints before connecting real phone numbers.
- Review full code paths: confirm how TwiML responses are constructed (make sure they do not redirect audio or media to the default PUBLIC_URL or any other external endpoint you don't control), and verify that no audio or call data is forwarded to third parties without your consent.

What would change the assessment:

- seeing the remaining code that generates TwiML and outbound requests, to confirm whether the default PUBLIC_URL is used to route media or callbacks;
- removal of the hardcoded account values; or
- an explicit statement from the author that those defaults are only placeholders and are never used at runtime.

If you cannot validate those points, treat this skill as suspicious and run it in an isolated test environment first.
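One way to act on the first two points is a preflight check that refuses to start unless every credential and routing value is set explicitly, so no hardcoded fallback can ever be used. This is a hypothetical sketch, not part of the skill. Checking GEMINI_API_KEY before GOOGLE_API_KEY is an assumption, because the real precedence is not documented.

```python
import os

# Values the review says fall back to hardcoded defaults when unset,
# plus the Twilio auth token that SKILL.md lists as required.
REQUIRED = ("TWILIO_AUTH_TOKEN", "TWILIO_ACCOUNT_SID", "TWILIO_FROM", "PUBLIC_URL")


def preflight(env=None) -> dict:
    """Return explicit settings, or raise listing everything that is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    # Either name is accepted for the Gemini key; GEMINI_API_KEY first is assumed.
    api_key = env.get("GEMINI_API_KEY") or env.get("GOOGLE_API_KEY")
    if not api_key:
        missing.append("GEMINI_API_KEY or GOOGLE_API_KEY")
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    return {
        "api_key": api_key,
        "auth_token": env["TWILIO_AUTH_TOKEN"],
        "account_sid": env["TWILIO_ACCOUNT_SID"],
        "from_number": env["TWILIO_FROM"],
        "public_url": env["PUBLIC_URL"].rstrip("/"),
    }
```

Call `preflight()` once at startup and exit on `RuntimeError`. All missing names are reported together, so a misconfigured deployment can be fixed in one pass.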

Functional Analysis

Type: OpenClaw Skill
Name: gemini-live-phone
Version: 1.0.1

The skill bridges Twilio calls to the Gemini Live API but contains hardcoded configuration defaults in `scripts/bridge.py` that point to external infrastructure. Specifically, it includes a hardcoded Twilio Account SID and a default `public_url` pointing to `athena.abfs.tech`. If a user fails to override these defaults via environment variables, outbound calls initiated through the bridge will attempt to fetch TwiML instructions from, and send status callbacks to, this external domain. This could lead to call hijacking or metadata exfiltration. These may be remnants of development, but they represent a significant security risk.
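The risk arises because an outbound call tells Twilio where to fetch instructions and where to report status. Whatever host PUBLIC_URL names receives both. A hypothetical guard is to build the call parameters only from a host on an explicit allow-list. The keyword names match the `twilio` package's `client.calls.create()`. The `allowed_hosts` safeguard and the function name are assumptions, not part of the skill.

```python
from urllib.parse import urlparse


def outbound_call_kwargs(public_url, to, from_number, allowed_hosts):
    """Keyword arguments for twilio's client.calls.create(), with a host allow-list."""
    parsed = urlparse(public_url)
    if parsed.scheme != "https" or parsed.hostname not in allowed_hosts:
        raise ValueError(f"refusing callback base {public_url!r}")
    base = public_url.rstrip("/")
    return {
        "to": to,
        "from_": from_number,
        # Twilio fetches the call's TwiML instructions from this URL ...
        "url": f"{base}/gemini-live/twiml",
        "method": "POST",
        # ... and POSTs call status events to this one.
        "status_callback": f"{base}/gemini-live/call-status",
        "status_callback_method": "POST",
    }


# With the twilio package installed (not executed here):
# from twilio.rest import Client
# Client(account_sid, auth_token).calls.create(**outbound_call_kwargs(...))
```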

Capability Assessment

ℹ Purpose & Capability

Name, description, requirements, dependencies (google-genai, twilio) and binaries (python3, uvicorn) align with a Twilio→Gemini real-time bridge. However, the code embeds unexpected defaults that are not justified by the SKILL.md and are unusual for a user-run bridge: a hardcoded TWILIO_ACCOUNT_SID, a default TWILIO_FROM number, and a default PUBLIC_URL pointing at https://athena.abfs.tech.

⚠ Instruction Scope

SKILL.md only instructs you to set GOOGLE_API_KEY and TWILIO_AUTH_TOKEN and run the bridge. The code does more than that:

- It reads (and uses) other environment variables if present: TWILIO_ACCOUNT_SID, TWILIO_FROM, PUBLIC_URL, GEMINI_API_KEY.
- It writes structured logs to /tmp/openclaw with hostname and runtime metadata.
- It presumably generates TwiML and serves the Media Stream WebSocket.

The default PUBLIC_URL suggests the service may reference an external domain by default. If not overridden, this could redirect Twilio traffic or metadata externally.

✓ Install Mechanism

No install spec; this is instruction + code. Dependencies are standard Python packages listed in requirements.txt (fastapi, uvicorn, google-genai, twilio, etc.). No remote downloads or arbitrary archives in the manifest.

⚠ Credentials

The declared required environment variables (GOOGLE_API_KEY, TWILIO_AUTH_TOKEN) map to Gemini and Twilio usage, which is expected. However, the code accepts either GEMINI_API_KEY or GOOGLE_API_KEY. It also reads TWILIO_ACCOUNT_SID and TWILIO_FROM from the environment, falling back to hardcoded defaults when they are unset.

The hardcoded account SID and phone number are unexpected and disproportionate. If the defaults are used, they could cause accidental cross-account interactions or leak routing. The logger also records hostname and runtime information to local files, which could contain sensitive metadata.

✓ Persistence & Privilege

Skill is not always-enabled and is user-invocable. It writes local log files under /tmp/openclaw but does not request system-wide persistent privileges or modify other skills. No evidence of modifying system or other skill configs.

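How to Use

Make sure OpenClaw is installed (local or Docker deployment).
Enter the install command in the chat box: /install gemini-live-phone
Once installed, invoke the skill by name or trigger it with /gemini-live-phone.
Provide the required inputs as described in the skill's parameter documentation to get structured output.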

Version History

v1.0.1

- Added version field to SKILL metadata (now version 1.0.1)
- Updated description to be more concise and mention key features directly
- No functionality or endpoint changes; documentation only update
- No breaking changes for users or API consumers

v1.1.0

Add OpenClaw native structured logging for VAD events, call lifecycle, and greeting detection

v1.0.0

Initial release: Twilio to Gemini Live API bridge with server-side VAD and echo suppression

Metadata

Slug: gemini-live-phone

Version: 1.0.1

License: MIT-0

Total installs: 0

Current installs: 0

Versions: 3

FAQ