
Gemini Voice Assistant

by Ali Mostafa Radwan · GitHub ↗ · v1.0.0
cross-platform ⚠ suspicious
688 Downloads · 1 Star · 1 Active Install · 1 Version
Install in OpenClaw
/install gemini-voice-assistant
Description
Voice-to-voice AI assistant using Gemini Live API. Speak to the AI and get spoken responses. Use when you want to have natural voice conversations with an AI...
README (SKILL.md)

Gemini Voice Assistant

A voice-to-voice AI assistant powered by Google's Gemini Live API. Speak to the AI and it responds with natural-sounding voice.

Usage

Text Mode

cd ~/.openclaw/agents/kashif/skills/gemini-assistant && python3 handler.py "Your question or message"

Voice Mode

cd ~/.openclaw/agents/kashif/skills/gemini-assistant && python3 handler.py --audio /path/to/audio.ogg "optional context"

Response Format

The handler returns a JSON response:

{
  "message": "[[audio_as_voice]]\nMEDIA:/tmp/gemini_voice_xxx.ogg",
  "text": "Text response from Gemini"
}
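
A caller that consumes this response might separate the spoken-audio path from the text along these lines. This is a minimal sketch, assuming the `MEDIA:` prefix always sits on its own line inside `message` as in the example above; `parse_handler_response` is a hypothetical helper and is not part of the skill.

```python
import json


def parse_handler_response(raw: str) -> dict:
    """Split the handler's JSON output into its text and optional media path."""
    data = json.loads(raw)
    media_path = None
    # Assumption: the audio file is announced on a line starting with "MEDIA:".
    for line in data.get("message", "").splitlines():
        if line.startswith("MEDIA:"):
            media_path = line[len("MEDIA:"):].strip()
    return {"text": data.get("text", ""), "media_path": media_path}


# Sample payload copied from the documented response format.
example = json.dumps({
    "message": "[[audio_as_voice]]\nMEDIA:/tmp/gemini_voice_xxx.ogg",
    "text": "Text response from Gemini",
})
print(parse_handler_response(example))
```

If `message` carries no `MEDIA:` line, for instance in text mode, `media_path` stays `None` and only `text` is meaningful.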

Configuration

Set your Gemini API key:

export GEMINI_API_KEY="your-api-key-here"

Or create a .env file in the skill directory:

GEMINI_API_KEY=your-api-key-here

Model Options

The default model is gemini-2.5-flash-native-audio-preview-12-2025 for audio support.

To use a different model, edit handler.py:

MODEL = "gemini-2.0-flash-exp"  # For text-only

Requirements

  • google-genai>=1.0.0
  • numpy>=1.24.0
  • soundfile>=0.12.0
  • librosa>=0.10.0 (for audio input)
  • FFmpeg (for audio conversion)
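
Before running the handler, it may help to confirm that these requirements are actually present. The following is a rough preflight sketch, not something shipped with the skill: `missing_requirements` is a hypothetical helper, and the import names (`google.genai`, `numpy`, `soundfile`, `librosa`) are assumed from the package names listed above.

```python
import importlib.util
import shutil


def missing_requirements(modules, binaries):
    """Return the names of modules and binaries that cannot be located."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # A missing parent package (e.g. "google") raises instead of returning None.
            found = False
        if not found:
            missing.append(name)
    for name in binaries:
        # shutil.which searches PATH only.
        if shutil.which(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_requirements(
        ["google.genai", "numpy", "soundfile", "librosa"],
        ["ffmpeg"],
    ))
```

An empty list means everything was found. Note that this check looks for `ffmpeg` on `PATH`, which is not necessarily the same location the handler itself uses.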

Features

  • 🎙️ Voice input/output support
  • 💬 Text conversations
  • 🔧 Configurable system instructions
  • ⚡ Fast responses with Gemini Flash
Usage Guidance
What to consider before installing:

  • Metadata mismatch: the registry metadata claims no required env vars, but skill.json and handler.py require GEMINI_API_KEY. Verify the source and ask the publisher to correct the metadata before trusting the package.
  • Secrets: the skill will read a .env file in its directory and import values into the process environment if present. Do not put unrelated secrets in that .env file; only store the Gemini API key there if you accept the risk.
  • Network and privacy: the skill uses google-genai to connect to Google's Gemini service, so any voice or text you send will go to Google's servers. If you have privacy concerns, do not use it with sensitive data.
  • Local files: the skill writes audio to /tmp/gemini_voice_<id>.ogg and removes an intermediate WAV file. OGG files may persist until cleared; consider automatic cleanup or a different output directory if multiple users share the system.
  • Dependencies and binaries: you must install the listed Python packages and ensure FFmpeg is available at the expected path (handler.py uses /usr/bin/ffmpeg). Confirm the google-genai package you install is the official one and review its network behavior.
  • Source trust: the skill has no homepage and an unknown source/owner. If you need strong assurance, request a verified source, a repository link, or an upstream release to inspect before running it with your API key.

If those concerns are acceptable and you trust the publisher, the code itself is consistent with its stated functionality; otherwise treat this as untrusted until the metadata and provenance issues are resolved.
Capability Analysis
Type: OpenClaw Skill · Name: gemini-voice-assistant · Version: 1.0.0

The skill is classified as suspicious due to a significant prompt injection vulnerability in `scripts/handler.py`. The `system_instruction` parameter, which can be provided via `request_data` or CLI arguments, is directly passed to the Gemini API without sanitization or validation. This allows an attacker or a compromised OpenClaw agent to inject arbitrary instructions into the LLM's system prompt, potentially manipulating the AI's behavior, extracting sensitive information, or causing unintended actions. While the skill itself does not contain malicious instructions, this design flaw creates a critical attack surface.
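
One way to narrow this attack surface is to validate `system_instruction` before it is forwarded. The sketch below is illustrative only: `validate_system_instruction` and the 2000-character limit are hypothetical, and checks of this kind reduce accidental or malformed input but do not by themselves prevent prompt injection from a caller who controls the value.

```python
import unicodedata

MAX_INSTRUCTION_CHARS = 2000  # hypothetical limit, chosen for illustration


def validate_system_instruction(value) -> str:
    """Type-check, strip control characters from, and length-limit the value."""
    if not isinstance(value, str):
        raise TypeError("system_instruction must be a string")
    # Keep newlines and tabs; drop every other character in a Unicode "C" category
    # (control, format, surrogate, private use, unassigned).
    cleaned = "".join(
        ch for ch in value
        if ch in "\n\t" or unicodedata.category(ch)[0] != "C"
    ).strip()
    if not cleaned:
        raise ValueError("system_instruction is empty after cleaning")
    if len(cleaned) > MAX_INSTRUCTION_CHARS:
        raise ValueError("system_instruction exceeds the allowed length")
    return cleaned


print(validate_system_instruction("  Be brief.\x00\n"))
```

A stricter design would accept only a fixed set of named instruction presets, so that callers never supply free-form system prompt text at all.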
Capability Assessment
Purpose & Capability
The handler.py implements a Gemini Live audio/text client, depends on google-genai and audio libraries, and uses ffmpeg for conversion, all of which is coherent with a 'Gemini Voice Assistant'. However, the registry metadata provided to the evaluator claimed 'Required env vars: none' while skill.json and the code require GEMINI_API_KEY. That metadata mismatch is an inconsistency you should resolve before trusting the package source.
Instruction Scope
SKILL.md instructions map directly to the CLI entrypoint in handler.py. The runtime reads a .env file in the skill directory (documented) and uses GEMINI_API_KEY from the environment; it writes temporary audio to /tmp and invokes ffmpeg. The instructions do not attempt to read unrelated system files or send data to endpoints other than the Gemini API.
Install Mechanism
There is no automated install spec (instruction-only behavior plus a Python script). Dependencies are standard Python packages and FFmpeg is expected to be present on the host. No external archive downloads or custom installers are present in the skill bundle.
Credentials
Requiring a single GEMINI_API_KEY is proportionate to contacting Gemini. The code will also load any key-value pairs from a local .env file into the process environment (only if present), so any secrets stored there may be read by the skill — ensure that .env contains only the intended API key. The earlier registry claim of 'no env vars' contradicts the code and skill.json, which is concerning.
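
A narrower alternative to loading every pair from `.env` is to read only the one key the skill needs and leave the process environment untouched. The following is a minimal sketch under that assumption; `read_api_key` is a hypothetical helper and does not describe how handler.py currently behaves.

```python
import os
from pathlib import Path
from typing import Optional


def read_api_key(env_file: Path) -> Optional[str]:
    """Return GEMINI_API_KEY from the environment, else from env_file, else None."""
    key = os.environ.get("GEMINI_API_KEY")
    if key:
        return key
    if not env_file.is_file():
        return None
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # Every other entry in the file is ignored and never exported.
        if name.strip() == "GEMINI_API_KEY":
            return value.strip().strip('"').strip("'")
    return None
```

Because nothing is written to `os.environ`, unrelated entries in the same file are never exposed to the rest of the process.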
Persistence & Privilege
The skill does not request always:true and does not modify other skills or global config. It does create audio files under /tmp and leaves OGG output there; this is local persistence but not an elevated platform privilege.
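
Leftover OGG output could be cleared periodically with something like the sketch below. It assumes the `gemini_voice_<id>.ogg` naming described above; `remove_stale_outputs` and the one-hour default are hypothetical and not part of the skill.

```python
import time
from pathlib import Path


def remove_stale_outputs(directory: Path, max_age_seconds: float = 3600.0) -> list:
    """Delete gemini_voice_*.ogg files older than max_age_seconds; return their paths."""
    cutoff = time.time() - max_age_seconds
    removed = []
    for path in directory.glob("gemini_voice_*.ogg"):
        # Only regular files whose last modification predates the cutoff are removed.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

For example, `remove_stale_outputs(Path("/tmp"))` would remove matching files last modified more than an hour ago. On a shared system, writing output to a per-user directory instead of /tmp avoids touching other users' files.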
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install gemini-voice-assistant
  3. After installation, invoke the skill by name or use /gemini-voice-assistant
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release - Voice-to-voice AI assistant using Gemini Live API
Metadata
Slug gemini-voice-assistant
Version 1.0.0
License
All-time Installs 1
Active Installs 1
Total Versions 1
Frequently Asked Questions

What is Gemini Voice Assistant?

Voice-to-voice AI assistant using Gemini Live API. Speak to the AI and get spoken responses. Use when you want to have natural voice conversations with an AI... It is an AI Agent Skill for Claude Code / OpenClaw, with 688 downloads so far.

How do I install Gemini Voice Assistant?

Run "/install gemini-voice-assistant" in the OpenClaw or Claude Code chat to install it in one step. Before the skill will run, you also need to set GEMINI_API_KEY and have the listed Python packages and FFmpeg available.

Is Gemini Voice Assistant free?

Yes, Gemini Voice Assistant is free to download, install, and use. Note that the listing's License field is empty, and requests go through your own Gemini API key, so Google's API pricing and quotas apply.

Which platforms does Gemini Voice Assistant support?

Gemini Voice Assistant is listed as cross-platform and runs wherever OpenClaw / Claude Code is available, provided the Python dependencies and FFmpeg are installed.

Who created Gemini Voice Assistant?

It is built and maintained by Ali Mostafa Radwan (@alimostafaradwan); the current version is v1.0.0.
