
Gemini Live Phone

Name: Gemini Live Phone
Author: quantdeveloperusa

by ABFS Tech · GitHub ↗ · v1.0.1 · MIT-0

cross-platform ⚠ suspicious

Downloads: 294

Install in OpenClaw

/install gemini-live-phone

Description

Bridge Twilio phone calls to Google Gemini Live API for real-time AI voice conversations. No STT/TTS middleware required. Includes VAD and echo suppression.

README (SKILL.md)

Gemini Live Phone Bridge

Real-time voice AI over phone calls using Google Gemini's native audio capabilities.

Architecture

Phone ↔ Twilio ↔ WebSocket (μ-law 8kHz) ↔ Bridge (PCM transcoding) ↔ Gemini Live API (24kHz PCM)
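
The transcoding step in the diagram above can be illustrated with a G.711 μ-law decoder. This is a minimal sketch, not the code from scripts/bridge.py: the function names are hypothetical, and resampling from 8 kHz to the rate used on the Gemini side is deliberately left out.

```python
import struct

ULAW_BIAS = 0x84  # bias added by the G.711 mu-law encoder


def ulaw_byte_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit PCM sample."""
    u = ~byte & 0xFF                 # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + ULAW_BIAS) << exponent) - ULAW_BIAS
    return -magnitude if sign else magnitude


def ulaw_to_pcm16(payload: bytes) -> bytes:
    """Decode a mu-law buffer to little-endian 16-bit PCM (same sample rate)."""
    samples = [ulaw_byte_to_pcm16(b) for b in payload]
    return struct.pack(f"<{len(samples)}h", *samples)
```

Each μ-law byte expands to two PCM bytes, so a 160-byte frame (20 ms at 8 kHz) becomes 320 bytes before any resampling.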

Quick Start

# Set required env vars
export GOOGLE_API_KEY="your-key"
export TWILIO_AUTH_TOKEN="your-token"

# Run the bridge
python scripts/bridge.py --port 3335

Endpoints

Endpoint	Method	Description
`/gemini-live/status`	GET	Health check + active calls
`/gemini-live/incoming`	POST	TwiML for inbound calls (Twilio webhook)
`/gemini-live/stream`	WS	Twilio Media Stream WebSocket
`/gemini-live/call`	POST	Initiate outbound call
`/gemini-live/twiml`	POST	TwiML for outbound calls
`/gemini-live/call-status`	POST	Twilio call status webhook
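
For orientation, a TwiML webhook such as `/gemini-live/incoming` typically answers with a document that tells Twilio to open a Media Stream to a WebSocket URL. The sketch below builds one with the standard library. It is an assumption about the general shape, since the README does not show the response that scripts/bridge.py actually generates; `build_stream_twiml` and `public_host` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(public_host: str) -> str:
    """Return TwiML that connects the call to the bridge's stream endpoint."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # The path matches the WS endpoint listed in the table above.
    ET.SubElement(connect, "Stream", url=f"wss://{public_host}/gemini-live/stream")
    return ET.tostring(response, encoding="unicode")


twiml = build_stream_twiml("your-domain")
```

Checking that the `url` attribute points at a host you control is a quick way to confirm that media is not being routed elsewhere.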

Outbound Call API

curl -X POST https://your-domain/gemini-live/call \
  -H 'Content-Type: application/json' \
  -d '{"to": "+1234567890", "greeting": "Hello! This is Marcia."}'
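
The same request can be prepared from Python with only the standard library. This is a sketch that mirrors the curl call above; `build_call_request` is a hypothetical helper, and the shape of the bridge's response is not documented here, so none is assumed.

```python
import json
import urllib.request


def build_call_request(base_url: str, to: str, greeting: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the outbound call endpoint."""
    body = json.dumps({"to": to, "greeting": greeting}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/gemini-live/call",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_call_request("https://your-domain", "+1234567890", "Hello! This is Marcia.")
# To actually place the call: urllib.request.urlopen(req, timeout=10)
```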

Configuration

All settings via CLI args or environment variables:

Core

--model — Gemini model (default: gemini-2.5-flash-native-audio-latest)
--voice — Gemini voice: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, Zephyr (default: Kore)
--from-number — Twilio outbound number (default: env TWILIO_FROM)
--system-prompt — AI persona system prompt
--max-duration — Max call seconds (default: 300)

VAD (Voice Activity Detection)

--vad-enabled / --no-vad — Toggle server-side VAD (default: on)
--vad-silence-ms — Silence duration to trigger activityEnd (default: 500)
--vad-energy-threshold — RMS energy threshold (default: 0.01)
--vad-speech-min-ms — Min speech duration before activityStart (default: 100)
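
How the three VAD parameters could interact is sketched below as a small energy-based state machine. It is an illustration under stated assumptions, not the bridge's implementation: the class name, the per-frame interface, and the use of samples normalized to [-1.0, 1.0] are all hypothetical. Only the default values and the activityStart/activityEnd event names come from the options above.

```python
import math
from dataclasses import dataclass


@dataclass
class EnergyVad:
    energy_threshold: float = 0.01  # --vad-energy-threshold
    silence_ms: int = 500           # --vad-silence-ms
    speech_min_ms: int = 100        # --vad-speech-min-ms
    speaking: bool = False
    loud_for_ms: int = 0            # consecutive time above the threshold
    quiet_for_ms: int = 0           # consecutive time below the threshold

    def process(self, samples, frame_ms):
        """Feed one frame of normalized samples; return an event name or None."""
        rms = math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0
        if rms >= self.energy_threshold:
            self.loud_for_ms += frame_ms
            self.quiet_for_ms = 0
            if not self.speaking and self.loud_for_ms >= self.speech_min_ms:
                self.speaking = True
                return "activityStart"
        else:
            self.quiet_for_ms += frame_ms
            if not self.speaking:
                self.loud_for_ms = 0  # a short blip never becomes speech
            elif self.quiet_for_ms >= self.silence_ms:
                self.speaking = False
                self.loud_for_ms = 0
                return "activityEnd"
        return None
```

With 20 ms frames and the defaults, this sketch reports activityStart on the fifth consecutive loud frame and activityEnd on the twenty-fifth consecutive quiet frame.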

Echo Suppression

--echo-multiplier — VAD threshold multiplier during agent speech (default: 3.0)
--echo-decay-ms — Decay time after agent stops speaking (default: 300)
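
One way to read the two echo options is as a temporary raise of the VAD energy threshold. The sketch below assumes a linear decay back to the base threshold. The README does not specify the decay curve, so that choice, along with the function name and its arguments, is hypothetical.

```python
def effective_threshold(base, ms_since_agent_stopped, multiplier=3.0, decay_ms=300):
    """Return the VAD threshold to apply right now.

    ms_since_agent_stopped is None while the agent is still speaking,
    otherwise the elapsed time since its audio ended.
    """
    if ms_since_agent_stopped is None:
        return base * multiplier              # full suppression during agent speech
    if ms_since_agent_stopped >= decay_ms:
        return base                           # decay window is over
    remaining = 1.0 - ms_since_agent_stopped / decay_ms
    return base * (1.0 + (multiplier - 1.0) * remaining)  # assumed linear decay
```

With the defaults and a base threshold of 0.01, the caller's audio has to exceed roughly 0.03 RMS to interrupt the agent, easing back to 0.01 over the 300 ms after it stops.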

Twilio Setup

Buy a phone number on Twilio
Set Voice webhook: https://your-domain/gemini-live/incoming (HTTP POST)
Set Call status URL: https://your-domain/gemini-live/call-status (HTTP POST)
Ensure geo-permissions are enabled for target countries

Network Requirements

The bridge must be accessible from the internet (Twilio connects to it). Recommended: Caddy reverse proxy with WebSocket support.

# Caddy config example
handle /gemini-live/* {
    reverse_proxy localhost:3335 {
        flush_interval -1
        transport http {
            read_timeout 0
            write_timeout 0
        }
    }
}

Performance

Latency benchmarks (Gemini 2.5 Flash Native Audio):

Config	Median	Min	Max
No VAD, 200ms buffer	3,660ms	2,360ms	5,180ms
Server VAD, 50ms buffer	2,500ms	2,080ms	6,980ms

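With server-side VAD and a 50 ms buffer, median latency drops by ~32% (3,660 ms to 2,500 ms) relative to no VAD with a 200 ms buffer, although the maximum observed latency rises from 5,180 ms to 6,980 ms.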

Usage Guidance

This skill appears to implement a Twilio→Gemini real‑time bridge, but there are red flags you should address before running it with your production credentials:

- Hardcoded defaults: the script contains a default TWILIO_ACCOUNT_SID, default TWILIO_FROM phone number, and a default PUBLIC_URL (https://athena.abfs.tech). If you don't explicitly set TWILIO_ACCOUNT_SID, TWILIO_FROM, and PUBLIC_URL, the bridge may behave unexpectedly (attempt to use those defaults or embed that external domain in generated TwiML). Replace those defaults with values you control.
- Environment variables: SKILL.md asks for GOOGLE_API_KEY and TWILIO_AUTH_TOKEN; the code will also accept GEMINI_API_KEY and other TWILIO_* env vars. Ensure you supply the correct key name (GEMINI_API_KEY is supported) and do not accidentally expose your credentials to any third party.
- Logging and local files: the bridge writes structured logs to /tmp/openclaw including hostname and runtime metadata. Review the logger behavior and the log retention/location if you care about privacy of metadata or call events.
- Network exposure: this server must be publicly reachable for Twilio webhooks; run it in a hardened environment (firewalled host, TLS via reverse proxy) and verify the WebSocket/TwiML endpoints before connecting real phone numbers.
- Review full code paths: confirm how TwiML responses are constructed (ensure they do not redirect audio/media to the default PUBLIC_URL or any external endpoint you don't control), and verify that no audio or call data is forwarded to third parties without your consent.

What would change the assessment: seeing the remaining code that generates TwiML and outbound requests (to confirm whether the default PUBLIC_URL is used to route media or callbacks), removal of hardcoded account values, or an explicit comment from the author that those defaults are only placeholders and will never be used in runtime.
If you cannot validate those points, treat this skill as suspicious and run it in an isolated/test environment first.
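
A pre-flight check along the following lines can catch a missing override before the bridge falls back to a hardcoded default. It is a sketch: the variable names come from the guidance above, while `missing_settings` and the decision to treat all of them as mandatory are assumptions, not behavior of scripts/bridge.py.

```python
import os

# Settings that the guidance above says have hardcoded fallbacks or are required.
REQUIRED = ("TWILIO_AUTH_TOKEN", "TWILIO_ACCOUNT_SID", "TWILIO_FROM", "PUBLIC_URL")
# Either name is reported as accepted for the Gemini API key.
API_KEY_NAMES = ("GEMINI_API_KEY", "GOOGLE_API_KEY")


def missing_settings(env=None):
    """Return the names of settings that are unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if not any(env.get(name) for name in API_KEY_NAMES):
        missing.append(" or ".join(API_KEY_NAMES))
    return missing
```

Calling `missing_settings()` before launching the bridge and refusing to start when the list is non-empty keeps the defaults from being used silently.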

Capability Analysis

Type: OpenClaw Skill
Name: gemini-live-phone
Version: 1.0.1

The skill bridges Twilio calls to the Gemini Live API but contains hardcoded configuration defaults in `scripts/bridge.py` that point to external infrastructure. Specifically, it includes a hardcoded Twilio Account SID and a default `public_url` pointing to `athena.abfs.tech`. If a user fails to override these defaults via environment variables, outbound calls initiated through the bridge will attempt to fetch TwiML instructions and send status callbacks to this external domain, potentially leading to call hijacking or metadata exfiltration. While these may be remnants of development, they represent a significant security risk.

Capability Assessment

ℹ Purpose & Capability

Name, description, requirements, dependencies (google-genai, twilio) and binaries (python3, uvicorn) align with a Twilio→Gemini real‑time bridge. However the code embeds unexpected defaults (a hardcoded TWILIO_ACCOUNT_SID, default TWILIO_FROM number, and a default PUBLIC_URL pointing at https://athena.abfs.tech) that are not justified by the SKILL.md and are unusual for a user‑run bridge.

⚠ Instruction Scope

SKILL.md instructs only to set GOOGLE_API_KEY and TWILIO_AUTH_TOKEN and run the bridge. The code also reads and uses other environment variables if present (TWILIO_ACCOUNT_SID, TWILIO_FROM, PUBLIC_URL, GEMINI_API_KEY), writes structured logs to /tmp/openclaw with hostname and runtime metadata, and likely emits TwiML and handles WebSocket traffic. The default PUBLIC_URL suggests the service may reference an external domain by default, which could redirect Twilio traffic or metadata externally if not overridden.

✓ Install Mechanism

No install spec; this is instruction + code. Dependencies are standard Python packages listed in requirements.txt (fastapi, uvicorn, google-genai, twilio, etc.). No remote downloads or arbitrary archives in the manifest.

⚠ Credentials

Declared required envs (GOOGLE_API_KEY, TWILIO_AUTH_TOKEN) map to Gemini and Twilio usage, which is expected. But the code prefers GEMINI_API_KEY or GOOGLE_API_KEY (both supported) and also will use TWILIO_ACCOUNT_SID and TWILIO_FROM from env or fall back to hardcoded defaults. The presence of hardcoded account SID and phone number is unexpected and disproportionate — it could cause accidental cross‑account interactions or leak routing if defaults are used. The logger also records hostname/runtime info to local files, which could contain sensitive metadata.

✓ Persistence & Privilege

Skill is not always-enabled and is user-invocable. It writes local log files under /tmp/openclaw but does not request system-wide persistent privileges or modify other skills. No evidence of modifying system or other skill configs.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install gemini-live-phone
After installation, invoke the skill by name or use /gemini-live-phone
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.1

- Added version field to SKILL metadata (now version 1.0.1)
- Updated description to be more concise and mention key features directly
- No functionality or endpoint changes; documentation-only update
- No breaking changes for users or API consumers

v1.1.0

Add OpenClaw native structured logging for VAD events, call lifecycle, and greeting detection

v1.0.0

Initial release: Twilio to Gemini Live API bridge with server-side VAD and echo suppression

Metadata

Slug gemini-live-phone

Version 1.0.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 3

Frequently Asked Questions

What is Gemini Live Phone?

Gemini Live Phone bridges Twilio phone calls to the Google Gemini Live API for real-time AI voice conversations. No STT/TTS middleware is required, and it includes VAD and echo suppression. It is an AI Agent Skill for Claude Code / OpenClaw, with 294 downloads so far.

How do I install Gemini Live Phone?

Run "/install gemini-live-phone" in the OpenClaw or Claude Code chat to install it in one step. Running the bridge afterwards still requires the environment variables and Twilio setup described in the README.

Is Gemini Live Phone free?

Yes, Gemini Live Phone is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Gemini Live Phone support?

Gemini Live Phone is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Gemini Live Phone?

It is built and maintained by ABFS Tech (@quantdeveloperusa); the current version is v1.0.1.
