Voice Assistant

Name: Voice Assistant
Author: charantejmandali18

Description

Real-time voice assistant for OpenClaw. Streams mic audio through configurable STT (Deepgram or ElevenLabs) into your OpenClaw agent, then speaks the response via configurable TTS (Deepgram Aura or ElevenLabs). Sub-2s time-to-first-audio with full streaming at every stage.
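One way to picture "full streaming at every stage" is a chain of async generators, where each stage starts emitting output before the previous stage has finished. The following is only a minimal sketch with stub stages: `mic`, `fake_stt`, `fake_agent`, and `fake_tts` are hypothetical stand-ins, not the skill's actual code or any provider's API.

```python
import asyncio
from typing import AsyncIterator


async def mic(chunks: list[bytes]) -> AsyncIterator[bytes]:
    """Stand-in for the microphone stream."""
    for chunk in chunks:
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield chunk


async def fake_stt(audio: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Hypothetical STT stage: emits one text fragment per audio chunk."""
    async for chunk in audio:
        yield chunk.decode()


async def fake_agent(fragments: AsyncIterator[str]) -> AsyncIterator[str]:
    """Hypothetical agent stage: answers each fragment as soon as it arrives."""
    async for fragment in fragments:
        yield f"echo:{fragment}"


async def fake_tts(tokens: AsyncIterator[str]) -> AsyncIterator[bytes]:
    """Hypothetical TTS stage: turns each text token into 'audio' bytes."""
    async for token in tokens:
        yield token.encode()


async def run_pipeline(chunks: list[bytes]) -> list[bytes]:
    """Chain the stages; items flow through one at a time rather than in a batch."""
    out = []
    async for audio in fake_tts(fake_agent(fake_stt(mic(chunks)))):
        out.append(audio)
    return out


print(asyncio.run(run_pipeline([b"hello", b"world"])))
```

Because every stage is a generator, the first output chunk is available as soon as the first input chunk has passed through all three stages, which is the property a low time-to-first-audio depends on.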

Security Recommendations

This package implements the described voice pipeline and will stream your microphone audio and transcripts to third-party STT/TTS services (Deepgram and/or ElevenLabs) and to whatever OpenClaw gateway URL you provide. Before installing:

1) Be aware that you must supply API keys (DEEPGRAM_API_KEY and/or ELEVENLABS_API_KEY) and your OPENCLAW_GATEWAY_URL/OPENCLAW_MODEL. The registry metadata does NOT list these, so the manifest is misleading.
2) Only install if you trust the skill author and the third-party providers; audio and transcripts will leave your machine.
3) Inspect scripts/server.py locally (it is included in the package) and run it in a limited environment (local machine or sandbox) before granting broader access.
4) If you don't want to expose real data, test with dummy keys and a local gateway first.
5) Consider updating the manifest to correctly declare required secrets (primaryEnv should reference the actual API key variable), or ask the publisher for clarification.
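Before starting the server, it can help to check which of the settings named above are still unset. The sketch below is an illustration, not part of the skill: the variable names come from this review, while the provider labels `"deepgram"` and `"elevenlabs"` and the helper `missing_settings` are assumptions.

```python
import os
from typing import Mapping

# Key names come from the review above; the provider labels are assumed
# values and have not been checked against the skill's code.
PROVIDER_KEYS = {
    "deepgram": "DEEPGRAM_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
}
GATEWAY_VARS = ("OPENCLAW_GATEWAY_URL", "OPENCLAW_MODEL")


def missing_settings(stt: str, tts: str, env: Mapping[str, str]) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    needed = {PROVIDER_KEYS[stt], PROVIDER_KEYS[tts], *GATEWAY_VARS}
    return sorted(name for name in needed if not env.get(name))


if __name__ == "__main__":
    # Hypothetical choice: Deepgram for STT, ElevenLabs for TTS.
    missing = missing_settings("deepgram", "elevenlabs", os.environ)
    print("missing:", ", ".join(missing) or "none")
```

Passing the environment in as a mapping makes it easy to try the check with dummy values before any real key is present on the machine.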

Functional Analysis

Type: OpenClaw Skill
Name: voice-assistant
Version: 0.1.0

The OpenClaw Voice Assistant skill is designed to provide a real-time voice interface. It runs a local FastAPI server (`scripts/server.py`) that handles audio streaming from the browser, interacts with external Speech-to-Text (STT) and Text-to-Speech (TTS) providers (Deepgram/ElevenLabs), and communicates with the OpenClaw gateway. The skill requires API keys for STT/TTS services, which are loaded from a local `.env` file. All network access (to STT/TTS APIs and the OpenClaw gateway) and file operations (reading `.env`, serving static files) are directly aligned with its stated purpose. The `SKILL.md` instructions guide the OpenClaw agent to perform setup and execution tasks (e.g., `cp .env.example .env`, `uv run scripts/server.py`), which are necessary for the skill's operation and do not exhibit prompt injection attempts for malicious ends. No evidence of data exfiltration, malicious execution, persistence mechanisms, or obfuscation for harmful intent was found.
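For readers unfamiliar with `.env` files, the sketch below shows the general idea of reading `KEY=VALUE` pairs from one. It is a simplified, standard-library illustration and makes no claim about how `scripts/server.py` actually loads its settings; `parse_env` and the sample values (including the localhost URL) are hypothetical.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


# Placeholder values only; never commit real keys.
SAMPLE = """
# placeholder values only
DEEPGRAM_API_KEY=dummy-key
OPENCLAW_GATEWAY_URL="http://localhost:8000"
"""

print(parse_env(SAMPLE))
```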

Capability Assessment

⚠ Purpose & Capability

The code and SKILL.md implement a real-time STT→LLM→TTS voice pipeline (Deepgram/ElevenLabs + OpenClaw gateway), which matches the name/description. However, the registry metadata is inconsistent: it declares no required env vars and lists VOICE_STT_PROVIDER as the primary credential, but the server actually expects and uses sensitive API keys (DEEPGRAM_API_KEY, ELEVENLABS_API_KEY) plus OPENCLAW_GATEWAY_URL/OPENCLAW_MODEL. The primaryEnv should point at a secret such as DEEPGRAM_API_KEY or ELEVENLABS_API_KEY, not at the provider selector. The mismatch is confusing and understates the credentials the skill actually needs.

✓ Instruction Scope

SKILL.md provides concrete runtime instructions (copy .env.example to .env, fill in API keys, run uv run scripts/server.py, open browser). The runtime instructions and server code only reference expected files (.env) and the OpenClaw gateway; they stream microphone audio to configured STT/TTS providers and the OpenClaw gateway as described. There are no instructions to read unrelated system files or exfiltrate secrets beyond the STT/TTS and gateway endpoints.

✓ Install Mechanism

Install spec is a single brew formula 'uv' which is a standard package-manager install path (lower risk). The skill includes Python code and a pyproject.toml declaring normal Python dependencies (fastapi, uvicorn, httpx, websockets). No arbitrary downloads, URL shorteners, or extracted remote archives are present in the provided install spec.

⚠ Credentials

The skill requires multiple sensitive environment variables at runtime (DEEPGRAM_API_KEY, ELEVENLABS_API_KEY, OPENCLAW_GATEWAY_URL, OPENCLAW_MODEL) but the registry metadata lists no required env vars and sets primaryEnv to VOICE_STT_PROVIDER (a non-secret). This is misleading: users will need to supply API keys for third-party STT/TTS providers and a gateway URL, but the manifest does not declare them. Requesting multiple third-party API keys is reasonable for a voice skill, but the metadata/manifest should reflect that clearly.
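The mismatch described above can be expressed as a small consistency check between what a manifest declares and what the server reads. This is a sketch under assumptions: `primaryEnv` is the field named in this review, but `requiredEnv` is a hypothetical field name and the dictionary shape is illustrative, not the registry's actual manifest schema.

```python
# Variables this review says the server reads at runtime.
RUNTIME_ENV = {
    "DEEPGRAM_API_KEY",
    "ELEVENLABS_API_KEY",
    "OPENCLAW_GATEWAY_URL",
    "OPENCLAW_MODEL",
}
SECRETS = {"DEEPGRAM_API_KEY", "ELEVENLABS_API_KEY"}


def audit_manifest(manifest: dict) -> list[str]:
    """List mismatches between a manifest and the runtime variables."""
    problems = []
    declared = set(manifest.get("requiredEnv", []))  # hypothetical field name
    undeclared = sorted(RUNTIME_ENV - declared)
    if undeclared:
        problems.append("undeclared: " + ", ".join(undeclared))
    if manifest.get("primaryEnv") not in SECRETS:
        problems.append("primaryEnv is not a secret")
    return problems


# Mirrors the situation described above: nothing declared, selector as primaryEnv.
AS_PUBLISHED = {"requiredEnv": [], "primaryEnv": "VOICE_STT_PROVIDER"}

for problem in audit_manifest(AS_PUBLISHED):
    print(problem)
```

A manifest that declares all four variables and points `primaryEnv` at one of the API keys would pass this check with an empty list.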

✓ Persistence & Privilege

The skill does not request always:true and does not modify other skills or system-wide settings. It runs as a local server and uses normal network connections to STT/TTS providers and the OpenClaw gateway. Autonomous invocation remains possible (platform default) but is not combined with unusual privileges here.

Version History

v0.1.0

Initial release: real-time voice interface with configurable STT (Deepgram/ElevenLabs) and TTS (Deepgram/ElevenLabs), sub-2s latency, barge-in support

Metadata

Slug voice-assistant

Version 0.1.0

License not specified

Total installs 17

Current installs 15

Versions released 1

FAQ

What is Voice Assistant?

Real-time voice assistant for OpenClaw. Streams mic audio through configurable STT (Deepgram or ElevenLabs) into your OpenClaw agent, then speaks the response via configurable TTS (Deepgram Aura or ElevenLabs). Sub-2s time-to-first-audio with full streaming at every stage. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 1875 cumulative downloads to date.

How do I install Voice Assistant?

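Run the command `/install voice-assistant` in the OpenClaw or Claude Code chat box to install it in one step. The install itself needs no extra configuration, but running the skill requires a `.env` file containing your STT/TTS API keys and OpenClaw gateway settings, as noted under Instruction Scope and Credentials.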

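Is Voice Assistant free?

Yes. Voice Assistant is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does Voice Assistant support?

Voice Assistant is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Voice Assistant?

Developed and maintained by Charan Tej Mandali (@charantejmandali18); the current version is v0.1.0.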
