
Gemini Voice Assistant

Name: Gemini Voice Assistant
Author: alimostafaradwan

By Ali Mostafa Radwan · GitHub · v1.0.0

cross-platform ⚠ suspicious

Total downloads: 688

Install in OpenClaw

/install gemini-voice-assistant

Description

Voice-to-voice AI assistant using Gemini Live API. Speak to the AI and get spoken responses. Use when you want to have natural voice conversations with an AI...

Usage Instructions (SKILL.md)

Gemini Voice Assistant

A voice-to-voice AI assistant powered by Google's Gemini Live API. Speak to the AI and it responds with natural-sounding voice.

Usage

Text Mode

cd ~/.openclaw/agents/kashif/skills/gemini-assistant && python3 handler.py "Your question or message"

Voice Mode

cd ~/.openclaw/agents/kashif/skills/gemini-assistant && python3 handler.py --audio /path/to/audio.ogg "optional context"

Response Format

The handler returns a JSON response:

{
  "message": "[[audio_as_voice]]\nMEDIA:/tmp/gemini_voice_xxx.ogg",
  "text": "Text response from Gemini"
}
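
A caller might consume this response as sketched below. This is a minimal example, assuming the handler prints the JSON object shown above and that the `message` field always holds the `[[audio_as_voice]]` tag followed by a `MEDIA:` line. The helper name `parse_handler_output` and the keys it returns are hypothetical and not part of the skill.

```python
import json


def parse_handler_output(raw: str) -> dict:
    """Split a handler response into its text reply and audio path.

    Assumes the JSON shape shown under Response Format. The helper name
    and the returned keys are illustrative, not part of the skill.
    """
    data = json.loads(raw)
    audio_path = None
    # The message field carries a tag line and a "MEDIA:<path>" line.
    for line in data.get("message", "").splitlines():
        if line.startswith("MEDIA:"):
            audio_path = line[len("MEDIA:"):].strip()
    return {"text": data.get("text", ""), "audio_path": audio_path}


# Sample input built from the documented response format.
sample = json.dumps({
    "message": "[[audio_as_voice]]\nMEDIA:/tmp/gemini_voice_xxx.ogg",
    "text": "Text response from Gemini",
})
result = parse_handler_output(sample)
print(result["audio_path"])
print(result["text"])
```

If the response has no `MEDIA:` line, for example in a text-only exchange, `audio_path` is `None` and only `text` is populated.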

Configuration

Set your Gemini API key:

export GEMINI_API_KEY="your-api-key-here"

Or create a .env file in the skill directory:

GEMINI_API_KEY=your-api-key-here
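
The two configuration options above can be combined into one lookup, sketched here. It is not the code in handler.py: the function `load_gemini_key` and its `environ` parameter are hypothetical, and the sketch reads only the `GEMINI_API_KEY` entry from the `.env` file instead of importing every key-value pair into the process environment.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def load_gemini_key(env_path: str = ".env",
                    environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return GEMINI_API_KEY from the environment, else from a .env file.

    Hypothetical helper: only the GEMINI_API_KEY line of the file is used,
    and nothing is written back into os.environ.
    """
    environ = os.environ if environ is None else environ
    key = environ.get("GEMINI_API_KEY")
    if key:
        return key
    path = Path(env_path)
    if not path.is_file():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines that are not NAME=value pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == "GEMINI_API_KEY":
            return value.strip().strip('"').strip("'") or None
    return None


# Demo with a throwaway .env file and an empty environment mapping.
with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    env_file.write_text(
        'UNRELATED_SECRET=x\nGEMINI_API_KEY="your-api-key-here"\n',
        encoding="utf-8",
    )
    demo_key = load_gemini_key(str(env_file), environ={})
print(demo_key)
```

The environment variable takes precedence over the file here. That ordering is a design choice of this sketch, not documented behavior of the skill.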

Model Options

The default model is gemini-2.5-flash-native-audio-preview-12-2025 for audio support.

To use a different model, edit handler.py:

MODEL = "gemini-2.0-flash-exp"  # For text-only

Requirements

google-genai>=1.0.0
numpy>=1.24.0
soundfile>=0.12.0
librosa>=0.10.0 (for audio input)
FFmpeg (for audio conversion)
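
Before running the handler, the requirements above could be checked with a short script like the following. It is a sketch under stated assumptions: the import names (`google.genai`, `numpy`, `soundfile`, `librosa`) are the usual module names for these packages, the function `missing_requirements` is hypothetical, and version constraints are not verified. FFmpeg is looked up on `PATH`, which may differ from the location the handler itself expects.

```python
import importlib.util
import shutil

# Maps each listed requirement to the module name it is imported as (assumed).
REQUIRED_MODULES = {
    "google-genai": "google.genai",
    "numpy": "numpy",
    "soundfile": "soundfile",
    "librosa": "librosa",
}


def missing_requirements() -> list:
    """Return the names of requirements that cannot be located.

    Only presence is checked; minimum versions are not compared.
    """
    missing = []
    for dist, module in REQUIRED_MODULES.items():
        try:
            found = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            # Raised when a parent package such as "google" is absent.
            found = False
        if not found:
            missing.append(dist)
    # FFmpeg is an external binary, so look for it on PATH.
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg")
    return missing


print(missing_requirements())
```

An empty list means every dependency was found. Any names printed still need to be installed.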

Features

🎙️ Voice input/output support
💬 Text conversations
🔧 Configurable system instructions
⚡ Fast responses with Gemini Flash

Security Recommendations

What to consider before installing:
- Metadata mismatch: the registry metadata claims no required env vars, but skill.json and handler.py require GEMINI_API_KEY. Verify the source and ask the publisher to correct metadata before trusting the package.
- Secrets: the skill will read a .env file in its directory and import values into the process environment if present. Do not put unrelated secrets in that .env file; only store the Gemini API key there if you accept the risk.
- Network and privacy: the skill uses google-genai to connect to Google's Gemini service, so any voice/text you send will go to Google's servers. If you have privacy concerns, do not use it with sensitive data.
- Local files: the skill writes audio to /tmp/gemini_voice_<id>.ogg and removes an intermediate WAV file. OGG files may persist until cleared; consider automatic cleanup or a different output directory if multiple users share the system.
- Dependencies and binaries: you must install the listed Python packages and ensure FFmpeg is available at the expected path (handler.py uses /usr/bin/ffmpeg). Confirm the google-genai package you install is the official one and review its network behavior.
- Source trust: the skill has no homepage and an unknown source/owner. If you need strong assurance, request a verified source, a repository link, or an upstream release to inspect before running it with your API key.

If those concerns are acceptable and you trust the publisher, the code itself is consistent with its stated functionality; otherwise treat this as untrusted until the metadata and provenance issues are resolved.
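
The automatic cleanup suggested for local files could look like the sketch below. It assumes output files match the pattern `gemini_voice_*.ogg` in the chosen directory, as described above. The function `remove_stale_audio`, its parameters, and the one-hour age limit are hypothetical choices, not part of the skill.

```python
import glob
import os
import tempfile
import time


def remove_stale_audio(directory="/tmp", max_age_seconds=3600, now=None):
    """Delete gemini_voice_*.ogg files older than max_age_seconds.

    Hypothetical helper; returns the list of paths that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for path in glob.glob(os.path.join(directory, "gemini_voice_*.ogg")):
        try:
            if now - os.path.getmtime(path) > max_age_seconds:
                os.remove(path)
                removed.append(path)
        except OSError:
            # The file vanished or is not ours to delete; skip it.
            continue
    return removed


# Demo in a throwaway directory, pretending two hours have passed.
with tempfile.TemporaryDirectory() as tmp:
    old_file = os.path.join(tmp, "gemini_voice_demo.ogg")
    open(old_file, "wb").close()
    other = os.path.join(tmp, "notes.txt")
    open(other, "wb").close()
    removed = remove_stale_audio(tmp, max_age_seconds=3600,
                                 now=time.time() + 7200)
    kept_other = os.path.exists(other)
print(len(removed))
```

Only files matching the output pattern are touched, so unrelated files in a shared directory are left alone.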

Functional Analysis

Type: OpenClaw Skill
Name: gemini-voice-assistant
Version: 1.0.0

The skill is classified as suspicious due to a significant prompt injection vulnerability in `scripts/handler.py`. The `system_instruction` parameter, which can be provided via `request_data` or CLI arguments, is directly passed to the Gemini API without sanitization or validation. This allows an attacker or a compromised OpenClaw agent to inject arbitrary instructions into the LLM's system prompt, potentially manipulating the AI's behavior, extracting sensitive information, or causing unintended actions. While the skill itself does not contain malicious instructions, this design flaw creates a critical attack surface.
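
A basic validation step for `system_instruction` might look like the sketch below. It is illustrative only: the function `validate_system_instruction` and the 2000-character limit are hypothetical, and checks of this kind reject malformed input but cannot by themselves prevent prompt injection. Restricting who may set the parameter remains the stronger control.

```python
import unicodedata

MAX_INSTRUCTION_CHARS = 2000  # arbitrary illustrative limit


def validate_system_instruction(value):
    """Return a cleaned system instruction or raise ValueError.

    Hypothetical helper: enforces type and length, and strips control
    characters other than newline and tab.
    """
    if not isinstance(value, str):
        raise ValueError("system_instruction must be a string")
    # Unicode category "Cc" covers control characters such as NUL and BEL.
    cleaned = "".join(
        ch for ch in value
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    ).strip()
    if not cleaned:
        raise ValueError("system_instruction is empty")
    if len(cleaned) > MAX_INSTRUCTION_CHARS:
        raise ValueError("system_instruction is too long")
    return cleaned


print(validate_system_instruction("Answer briefly and politely.\x00"))
```

A stricter variant would accept only instructions drawn from a fixed list maintained by the operator, at the cost of the configurability listed under Features.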

Capability Assessment

ℹ Purpose & Capability

The handler.py implements a Gemini Live audio/text client, depends on google-genai and audio libraries, and uses ffmpeg for conversion, all of which is coherent with a 'Gemini Voice Assistant'. However, the registry metadata provided to the evaluator claimed 'Required env vars: none' while skill.json and the code require GEMINI_API_KEY. Resolve that metadata mismatch before trusting the package source.

✓ Instruction Scope

SKILL.md instructions map directly to the CLI entrypoint in handler.py. The runtime reads a .env file in the skill directory (documented) and uses GEMINI_API_KEY from the environment; it writes temporary audio to /tmp and invokes ffmpeg. The instructions do not attempt to read unrelated system files or send data to endpoints other than the Gemini API.

✓ Install Mechanism

There is no automated install spec (instruction-only behavior plus a Python script). Dependencies are standard Python packages and FFmpeg is expected to be present on the host. No external archive downloads or custom installers are present in the skill bundle.

ℹ Credentials

Requiring a single GEMINI_API_KEY is proportionate to contacting Gemini. The code will also load any key-value pairs from a local .env file into the process environment (only if present), so any secrets stored there may be read by the skill — ensure that .env contains only the intended API key. The earlier registry claim of 'no env vars' contradicts the code and skill.json, which is concerning.

✓ Persistence & Privilege

The skill does not request always:true and does not modify other skills or global config. It does create audio files under /tmp and leaves OGG output there; this is local persistence but not an elevated platform privilege.

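How to Use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install gemini-voice-assistant
Once installed, invoke the Skill by name or trigger it with /gemini-voice-assistant
Provide the required inputs described in the Skill's parameter documentation to receive structured output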

Version History

v1.0.0

Initial release - Voice-to-voice AI assistant using Gemini Live API

Metadata

Slug: gemini-voice-assistant

Version: 1.0.0

License: none listed

Total installs: 1

Current installs: 1

Version count: 1

FAQ