← 返回 Skills 市场

Gemini STT

Name: Gemini STT
Author: araa47

作者 araa47 · GitHub ↗ · v1.1.0

linuxdarwin ✓ 安全检测通过

3114

总下载

当前安装

版本数

在 OpenClaw 中安装

/install gemini-stt

功能描述

Transcribe audio files using Google's Gemini API or Vertex AI

使用说明 (SKILL.md)

Gemini Speech-to-Text Skill

Transcribe audio files using Google's Gemini API or Vertex AI. Default model is gemini-2.0-flash-lite for fastest transcription.

Authentication (choose one)

Option 1: Vertex AI with Application Default Credentials (Recommended)

gcloud auth application-default login
gcloud config set project YOUR_PROJECT_ID

The script will automatically detect and use ADC when available.

Option 2: Direct Gemini API Key

Set GEMINI_API_KEY in environment (e.g., ~/.env or ~/.clawdbot/.env)

Requirements

Python 3.10+ (no external dependencies)
Either GEMINI_API_KEY or gcloud CLI with ADC configured

Supported Formats

.ogg / .opus (Telegram voice messages)
.mp3
.wav
.m4a

Usage

# Auto-detect auth (tries ADC first, then GEMINI_API_KEY)
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg

# Force Vertex AI
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg --vertex

# With a specific model
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg --model gemini-2.5-pro

# Vertex AI with specific project and region
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg --vertex --project my-project --region us-central1

# With Clawdbot media
python ~/.claude/skills/gemini-stt/transcribe.py ~/.clawdbot/media/inbound/voice-message.ogg

Options

Option	Description
`\x3Caudio_file>`	Path to the audio file (required)
`--model`, `-m`	Gemini model to use (default: `gemini-2.0-flash-lite`)
`--vertex`, `-v`	Force use of Vertex AI with ADC
`--project`, `-p`	GCP project ID (for Vertex, defaults to gcloud config)
`--region`, `-r`	GCP region (for Vertex, default: `us-central1`)

Supported Models

Any Gemini model that supports audio input can be used. Recommended models:

Model	Notes
`gemini-2.0-flash-lite`	Default. Fastest transcription speed.
`gemini-2.0-flash`	Fast and cost-effective.
`gemini-2.5-flash-lite`	Lightweight 2.5 model.
`gemini-2.5-flash`	Balanced speed and quality.
`gemini-2.5-pro`	Higher quality, slower.
`gemini-3-flash-preview`	Latest flash model.
`gemini-3-pro-preview`	Latest pro model, best quality.

See Gemini API Models for the latest list.

How It Works

Reads the audio file and base64 encodes it
Auto-detects authentication:
- If ADC is available (gcloud), uses Vertex AI endpoint
- Otherwise, uses GEMINI_API_KEY with direct Gemini API
Sends to the selected Gemini model with transcription prompt
Returns the transcribed text

Example Integration

For Clawdbot voice message handling:

# Transcribe incoming voice message
TRANSCRIPT=$(python ~/.claude/skills/gemini-stt/transcribe.py "$AUDIO_PATH")
echo "User said: $TRANSCRIPT"

Error Handling

The script exits with code 1 and prints to stderr on:

No authentication available (neither ADC nor GEMINI_API_KEY)
File not found
API errors
Missing GCP project (when using Vertex)

Notes

Uses Gemini 2.0 Flash Lite by default for fastest transcription
No external Python dependencies (uses stdlib only)
Automatically detects MIME type from file extension
Prefers Vertex AI with ADC when available (no API key management needed)

安全使用建议

This skill is coherent with its stated purpose, but before installing: (1) be aware it requires authentication—either set GEMINI_API_KEY or run 'gcloud auth application-default login' and ensure a proper GCP project is configured; the registry metadata currently omits these requirements. (2) Using ADC (gcloud) will cause the script to call 'gcloud auth print-access-token' and use your ADC permissions to call Vertex; prefer a least-privilege service account or isolated environment if you are concerned about exposing broader GCP credentials. (3) GEMINI_API_KEY should be stored securely (not in world-readable files). (4) Review and run the script in a safe environment if you want to inspect network calls; endpoints contacted are standard Google APIs (generativelanguage.googleapis.com and *.aiplatform.googleapis.com). If you need the metadata fixed or want the skill to declare GEMINI_API_KEY / GOOGLE_CLOUD_PROJECT as required, request that from the publisher before trusting it in production.

功能分析

Type: OpenClaw Skill Name: gemini-stt Version: 1.1.0 The skill is designed to transcribe audio files using Google's Gemini API or Vertex AI. The `transcribe.py` script legitimately uses `subprocess` to interact with the `gcloud` CLI for authentication (retrieving access tokens and project IDs) and sends base64-encoded audio data to official Google API endpoints. There is no evidence of data exfiltration to unauthorized parties, malicious execution, persistence mechanisms, or prompt injection attempts against the OpenClaw agent in `SKILL.md`. All actions are aligned with the stated purpose of speech-to-text transcription.

能力评估

ℹ Purpose & Capability

Skill name/description (Gemini/Vertex STT) match the code and runtime instructions. The only mismatch is registry metadata claiming 'no required env vars' while SKILL.md and the script require either GEMINI_API_KEY or Google ADC (gcloud). This is an inconsistency in metadata, not in functionality.

✓ Instruction Scope

Runtime instructions and the script are scoped to reading an audio file, base64-encoding it, and calling Google Gemini or Vertex endpoints. It invokes 'gcloud' only to obtain an access token/project configuration. It does not read unrelated system files or send data to unexpected endpoints.

✓ Install Mechanism

No install spec; the skill is instruction-only with a single Python script that uses only the standard library. Low risk from installation artifacts.

ℹ Credentials

Authentication requirements (GEMINI_API_KEY or gcloud ADC and possibly GOOGLE_CLOUD_PROJECT/CLOUDSDK_CORE_PROJECT) are appropriate for contacting Gemini/Vertex. However, the skill metadata declares no required environment variables or primary credential, which is inaccurate and could mislead users about needed credentials.

✓ Persistence & Privilege

The skill does not request permanent inclusion (always:false), does not modify other skills or system settings, and does not persist credentials. It runs commands locally (gcloud) but does not escalate privileges or change system-wide configuration.

如何使用

确保已安装 OpenClaw（本地或 Docker 部署）
在对话框中输入安装命令：/install gemini-stt
安装完成后，直接呼叫该 Skill 的名称或使用 /gemini-stt 触发
根据 Skill 的参数说明提供必要输入，即可获得结构化输出

版本历史

v1.1.0

Added support for Google Vertex AI with Application Default Credentials (ADC). Now supports both GEMINI_API_KEY and gcloud ADC authentication methods. Auto-detects authentication method.

v1.0.0

Initial release of Gemini-based Speech-to-Text skill. Optimized for speed with gemini-2.0-flash-lite default.

元数据

Slug gemini-stt

版本 1.1.0

许可证 —

累计安装 11

当前安装数 11

历史版本数 2

常见问题

Gemini STT 是什么？

Transcribe audio files using Google's Gemini API or Vertex AI. 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件，目前累计下载 3114 次。

如何安装 Gemini STT？

在 OpenClaw 或 Claude Code 对话框中运行命令「/install gemini-stt」即可一键安装，无需额外配置。

Gemini STT 是免费的吗？

是的，Gemini STT 完全免费（开源免费），可自由下载、安装和使用。

Gemini STT 支持哪些平台？

Gemini STT 跨平台运行，可在任意部署了 OpenClaw / Claude Code 的环境中使用（linux, darwin）。

谁开发了 Gemini STT？

由 araa47（@araa47）开发并维护，当前版本 v1.1.0。