Voice Chat Bridge

Name: Voice Chat Bridge
Author: zhangqinghua2015

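Description

Automatically handles voice messages: transcribes speech to text, generates a context-aware reply, and synthesizes a spoken response. Activates automatically when a voice or audio message is received.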

Security recommendations

Summary of what to check before installing:
- Transparency: The skill metadata declares no required environment variables, but the code expects several (OPENCLAW_SESSION_ID, OPENCLAW_GATEWAY_URL, OPENCLAW_API_KEY, EDGE_TTS_BIN, VOSK_MODEL_DIR, SHERPA_MODEL_DIR, etc.). Treat that as a red flag, and confirm with the author which env vars are required and why.
- Paths and triggers: SKILL.md and the README reference different install paths (/root/.agents/... vs /root/.openclaw/...), and the docs invoke the runtime scripts through hard-coded absolute paths. Verify the actual installation location and update the triggers (SOUL.md) to avoid accidental execution in the wrong context.
- Session reuse: run_voice_chat.py can invoke the OpenClaw CLI with an OPENCLAW_SESSION_ID to reuse the current session's LLM context. If you provide a session id, the skill operates with that session's context, so only grant this to skills you fully trust. If unsure, do not set OPENCLAW_SESSION_ID and accept degraded behavior.
- Model and dependency risks: The skill requires heavy native packages and large models (sherpa-onnx, downloaded model archives, Vosk). Install and run it in an isolated environment (container/VM) and be prepared for large disk and CPU usage. The README instructs downloading models from GitHub releases (a legitimate host), but verify checksums if possible.
- Network/data exfiltration: The code does not contain obvious exfiltration endpoints, but it can call external TTS services (Edge TTS) and will invoke system commands. Review and restrict EDGE_TTS_BIN if you want to force local-only TTS (pyttsx3) and avoid cloud TTS.
- Least privilege: To test, run the skill offline with Vosk and pyttsx3 only (set STT_ENGINE=vosk and leave EDGE_TTS_BIN unset), and do not provide OPENCLAW_SESSION_ID or any API keys. Run in a safe environment and confirm behavior before enabling automatic triggers in your live gateway.
- If you are not comfortable with the undeclared env requirements and hard-coded path assumptions, treat this skill as untrusted and avoid granting it session-level credentials or enabling autonomous triggers until the author fixes the metadata and path inconsistencies.
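
The least-privilege advice above can be sketched as a small helper that builds a stripped-down environment for a trial run. This is only an illustration: the function name offline_test_env and the SENSITIVE_VARS tuple are hypothetical, and the variable names are taken from the review text rather than from a confirmed reading of the skill's code.

```python
import os

# Variables named in the review that either grant session/API access
# or enable cloud TTS. The list is an assumption based on the review text.
SENSITIVE_VARS = (
    "OPENCLAW_SESSION_ID",
    "OPENCLAW_GATEWAY_URL",
    "OPENCLAW_API_KEY",
    "EDGE_TTS_BIN",
)


def offline_test_env(base=None):
    """Return a copy of the environment with session/cloud variables removed.

    `base` defaults to the current process environment; passing a dict
    makes the helper easy to test without touching os.environ.
    """
    env = dict(os.environ if base is None else base)
    for name in SENSITIVE_VARS:
        env.pop(name, None)       # drop credentials and cloud TTS binary
    env["STT_ENGINE"] = "vosk"    # force the local STT engine, as the review suggests
    return env
```

The returned mapping can be passed as the env argument of subprocess.run when launching a script in a container or VM, so the child process never sees the session id or API key.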

Functional analysis

Type: OpenClaw Skill
Name: voice-chat
Version: 2.3.5

The voice-chat skill provides a comprehensive bridge for processing voice messages using STT (Sherpa-ONNX and Vosk) and TTS (Edge-TTS and pyttsx3) engines. The code logic in scripts like `transcribe_audio.py` and `reply_with_tts.py` is well-structured, using subprocess calls to system utilities like ffmpeg and the openclaw CLI in a manner consistent with its stated purpose. The SKILL.md and README.md files provide clear, functional instructions for the AI agent to automate the voice-to-text-to-voice loop without any evidence of malicious prompt injection, data exfiltration, or unauthorized persistence.
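
As a rough illustration of the subprocess pattern described above, the sketch below converts an incoming audio file to mono WAV with ffmpeg. It is not the skill's actual code: the function names and the 16 kHz sample rate are assumptions (16 kHz mono is a common input format for Vosk models), and only standard ffmpeg options (-y, -i, -ar, -ac) are used.

```python
import subprocess


def ffmpeg_to_wav_cmd(src, dst, sample_rate=16000):
    """Build the ffmpeg argument list for a mono WAV conversion.

    The sample rate is an assumed default, not a value from the skill.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(src),           # input media path from the message
        "-ar", str(sample_rate),  # resample to the target rate
        "-ac", "1",               # downmix to a single channel
        str(dst),
    ]


def convert_to_wav(src, dst, sample_rate=16000):
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(
        ffmpeg_to_wav_cmd(src, dst, sample_rate),
        check=True,
        capture_output=True,
    )
```

Passing an argument list instead of a shell string means a media path containing spaces or shell metacharacters is handed to ffmpeg as a single argument, which matters when file names come from received messages.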

Capability assessment

ℹ Purpose & Capability

The code and documentation implement a coherent voice processing pipeline (STT via Sherpa-ONNX/Vosk, LLM-driven replies, Edge TTS / pyttsx3 fallback) which matches the skill description. However the repository also references additional STT engines, LLM configuration, and model directories not declared in the skill metadata. Some helper modules (telegram/feishu handlers) import network libraries and contain legacy API helpers even though the runtime intends OpenClaw to handle transport — this is plausible but not strictly minimal for the stated purpose.

⚠ Instruction Scope

SKILL.md and the scripts instruct the agent to run local scripts at specific absolute paths (/root/.agents/skills/voice-chat/... in some documents, /root/.openclaw/... in others), call local binaries (ffmpeg, edge-tts), and invoke the OpenClaw CLI to reuse a session (openclaw agent --session-id ...). The skill reads message media paths and writes temporary files under /tmp/voice-chat. It also expects the OPENCLAW_SESSION_ID, OPENCLAW_GATEWAY_URL, and OPENCLAW_API_KEY environment variables at runtime (used by run_voice_chat.py), yet the skill metadata lists no required environment variables; this mismatch is a scope and transparency concern. The instructions also call for automatic activation whenever a voice message is received, so a misconfigured trigger could cause frequent autonomous runs.

ℹ Install Mechanism

There is no automated install spec (instruction-only), which lowers installer risk; however the skill requires heavy third-party packages and large model downloads (sherpa-onnx, Vosk models) and README instructs users to wget GitHub release archives (GitHub is a legitimate host). No arbitrary personal server or URL shorteners are used in the provided docs. Because large models and native dependencies are required, installation is non-trivial and should be done in an isolated environment.
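
Since the models are fetched manually, a checksum comparison is one way to confirm that a downloaded archive matches what the publisher released. The sketch below uses only Python's standard hashlib; the function names are hypothetical, and it assumes a SHA-256 digest is available from the release page, which the README may or may not provide.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read 1 MiB at a time so large model archives do not fill memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected_hex):
    """Return True if the file's digest equals the published one."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Reading in fixed-size chunks keeps memory use flat regardless of archive size, which is relevant for multi-hundred-megabyte model files.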

⚠ Credentials

The skill metadata declares no required environment variables or primary credential, but the code reads multiple environment variables (OPENCLAW_SESSION_ID, OPENCLAW_GATEWAY_URL, OPENCLAW_API_KEY, EDGE_TTS_BIN, VOSK_MODEL_DIR, SHERPA_MODEL_DIR, STT_ENGINE, etc.) and optionally LLM API information. This omission is an inconsistency: in its "full" mode the skill depends on secrets or session identifiers, but it does not advertise that to the installer. Any skill that reuses an OpenClaw session ID can act with that session's context, and this capability should be made explicit to administrators.
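
One way to reproduce this kind of finding is to scan the skill's Python sources for environment lookups and compare them with what the metadata declares. The sketch below is a regex heuristic under stated assumptions: it only recognises literal names passed to os.environ.get, os.getenv, or os.environ[...], the helper names are hypothetical, and it will miss names built dynamically or read through other wrappers.

```python
import re

# A quoted identifier, e.g. "OPENCLAW_API_KEY" or 'STT_ENGINE'.
_NAME = r"""["']([A-Za-z_][A-Za-z0-9_]*)["']"""

# Matches os.environ.get("X"), os.getenv("X"), and os.environ["X"].
ENV_PATTERN = re.compile(
    r"os\.(?:environ\.get|getenv)\(\s*" + _NAME
    + r"|os\.environ\[\s*" + _NAME
)


def env_vars_in_source(source):
    """Return the set of environment variable names referenced in `source`."""
    names = set()
    for match in ENV_PATTERN.finditer(source):
        # Exactly one of the two capture groups is filled per match.
        names.add(match.group(1) or match.group(2))
    return names


def undeclared(source, declared):
    """Return referenced names that are missing from the declared list."""
    return sorted(env_vars_in_source(source) - set(declared))
```

Running undeclared over each script with the metadata's declared list (empty, according to the review) would list every variable an administrator should ask the author about.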

ℹ Persistence & Privilege

The skill is not forced always-on and does not declare elevated platform privileges. It runs subprocesses (openclaw CLI, ffmpeg, edge-tts) to reuse the current session's LLM context if OPENCLAW_SESSION_ID is set. Autonomous invocation is allowed by default (normal for skills) — combined with the undeclared session usage this increases blast radius, but the skill does not request persistent modifications to other skills or system-wide config.

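Version history

v2.3.5

Includes the complete Sherpa-ONNX fix (numpy normalization)

v2.3.4

Fixed the Sherpa-ONNX sample normalization issue

v2.3.2

Corrected the Sherpa-ONNX model name and download method

v2.3.1

Switched to the Sherpa-ONNX engine and removed the CUDA/NVIDIA dependency

v2.3.0

Integrated SenseVoice dual-engine STT with automatic fallback

v2.0.0

Integrated SenseVoice dual-engine STT with automatic fallback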

v2.2.1

- Updated documentation in CHANGELOG.md, DESIGN.md, and README.md to clarify and improve guidance.
- No functional or behavioral changes to the skill logic.
- Improved wording and formatting for better readability and understanding.

v2.2.0

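- Renamed the skill from voice-chat-bridge to voice-chat.
- Changed related file paths from voice-chat-bridge to voice-chat throughout.
- Changed the temporary file directory to /tmp/voice-chat/.
- Updated the documentation (README.md, SKILL.md) to match the new name, paths, and description.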

v2.1.7

- Updated documentation in README.md for clarity and completeness.
- No changes to the skill's logic or features.

v2.1.6

- Updated README.md with no code or feature changes.
- Documentation now reflects the current skill process and dependencies more clearly.

v2.1.5

voice-chat 2.1.5
- Updated scripts/reply_with_tts.py with minor changes.
- No updates to functionality or documentation.

v2.1.4

- No changes detected in this version; documentation and functionality remain the same.

v2.1.3

voice-chat 2.1.3
- Updated scripts/config.py and scripts/reply_with_tts.py.
- No user-facing feature or documentation changes.

v2.1.2

- Updated SKILL.md with a plain-language task and workflow description.
- Expanded and clarified usage steps with direct command examples for STT and TTS scripts.
- Listed the audio processing flow with context prompt for LLM and technical details.
- Provided clear dependency, file path, and trigger information.
- Removed previous long-form code and architecture explanations for conciseness.
- Added a short summary and description fields at the top.

v2.1.1

- Added complete auto-loop for voice messaging: receive, transcribe, analyze, TTS reply, and send back in chat.
- Now supports both Telegram and Feishu voice/audio messages automatically.
- Integrates Vosk for local speech-to-text and edge-tts for speech synthesis (OGG/Opus).
- Current LLM step is a placeholder with simple replies; ready for future LLM integration in code.
- Outputs transcript, reply text, and path to generated voice file for each message.

Metadata

Slug: voice-chat

Version: 2.3.5

License: MIT-0

Total installs: 0

Current installs: 0

Versions released: 15

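FAQ

What is Voice Chat Bridge?

It automatically handles voice messages: it transcribes speech to text, generates a context-aware reply, and synthesizes a spoken response. It activates automatically when a voice or audio message is received. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 238 downloads to date.

How do I install Voice Chat Bridge?

Run the command "/install voice-chat" in an OpenClaw or Claude Code chat box to install it in one step. The listing states that no extra configuration is needed, but the security review above notes that the STT models and several environment variables have to be set up separately.

Is Voice Chat Bridge free?

Yes. Voice Chat Bridge is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Voice Chat Bridge support?

Voice Chat Bridge is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Voice Chat Bridge?

It is developed and maintained by zhangqinghua2015 (@zhangqinghua2015); the current version is v2.3.5.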
