Voice TTS/ASR

Name: Voice TTS/ASR
Author: believe3344

v2.0.1 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 600

Install in OpenClaw

/install voice-tts

Description

A voice input (Whisper ASR) and voice output (Edge TTS) skill. It supports per-agent voices and can call send_voice_reply.mjs to send Telegram voice messages.

Security recommendations

Before installing:

1) Verify that the package includes the Python wrapper scripts at scripts/whisper and scripts/edge_tts. The code references them, but they are not present in the provided files; without them, ASR/TTS calls will fail.
2) Be aware that install.sh will pip install edge-tts and whisper and download a large Whisper model (hundreds of MB) from the network. Plan disk space and network usage accordingly.
3) The skill reads ~/.openclaw/openclaw.json to obtain Telegram bot tokens. Make sure that file is trustworthy and that you are comfortable with the skill accessing your bot tokens, or pass --token to send_voice_reply.mjs instead.
4) Config parsing uses vm.runInNewContext rather than JSON.parse, so the file contents are evaluated as JavaScript in a VM context. Only use the skill if you trust your openclaw.json.
5) If you proceed, test in a sandboxed environment first (with no sensitive tokens) and confirm that TTS/ASR work and that the Python wrappers are present and functional.

If the wrappers are missing, request the complete package from the author or decline installation.
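The first check above can be scripted before running install.sh. The following is a minimal sketch, assuming the skill has already been unpacked into a local directory. Only the two wrapper paths come from the review; the function name and the directory argument are hypothetical.

```python
from pathlib import Path

# Wrapper paths named in the review. Where the skill is unpacked is an assumption
# and must be supplied by the caller.
REQUIRED_WRAPPERS = ("scripts/whisper", "scripts/edge_tts")


def missing_wrappers(skill_dir):
    """Return the required wrapper paths that are not regular files under skill_dir."""
    root = Path(skill_dir)
    return [rel for rel in REQUIRED_WRAPPERS if not (root / rel).is_file()]
```

Calling `missing_wrappers("path/to/voice-tts")` returns an empty list when both wrappers exist. The check matches exact file names, so a wrapper that still carries the .txt suffix mentioned in the v1.0.0 notes would be reported as missing.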

Functional analysis

Type: OpenClaw Skill · Name: voice-tts · Version: 2.0.1

The skill exhibits several high-risk patterns and vulnerabilities. Most notably, 'lib/config.mjs' and 'scripts/send_voice_reply.mjs' use 'vm.runInNewContext' to parse the main OpenClaw configuration file (~/.openclaw/openclaw.json), which constitutes a potential Remote Code Execution (RCE) vulnerability if the configuration file is tampered with. Additionally, 'bin/voice-asr.mjs' employs a prompt-injection technique by appending mandatory 'system-level' instructions to transcribed text to force the agent into using specific tools, which could be abused to hijack agent behavior. Finally, core logic files referenced in the code ('scripts/whisper' and 'scripts/edge_tts') are missing from the provided bundle, preventing a full security audit of the actual ASR/TTS execution.

Capability assessment

⚠ Purpose & Capability

The skill's name and description (Whisper ASR + Edge TTS, Telegram send) align with the binaries and Python packages it installs. However, bin/voice-asr.mjs and bin/voice-tts.mjs call Python wrapper scripts at scripts/whisper and scripts/edge_tts that are not present in the provided file manifest. This will break runtime behavior and is an inconsistency between the claimed capability and the available files.

ℹ Instruction Scope

Runtime instructions and scripts perform expected actions: transcribe audio, synthesize MP3, copy/archive inbound files (~/.openclaw/media/inbound) into the agent workspace, and use curl to POST to Telegram. The skill reads ~/.openclaw/openclaw.json (to get skill config and Telegram tokens) and environment variables (OPENCLAW_WORKSPACE, OPENCLAW_AGENT_ID, TELEGRAM_BOT_TOKEN). These are relevant to sending messages, but they mean the skill will access local agent configuration and any Telegram tokens stored there.
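To make the "curl to POST to Telegram" step concrete, here is a hedged sketch of what such a call could look like. The review does not show the actual command, so the use of the Telegram Bot API `sendVoice` method, its `chat_id` and `voice` fields, and the function name below are assumptions for illustration, not the skill's confirmed behavior. The function only builds the argument list and does not run it.

```python
def build_send_voice_command(token, chat_id, audio_path):
    """Build (but do not execute) a curl argv for a Telegram sendVoice upload.

    Assumption: the skill uses the Bot API sendVoice method; the review only
    says that it POSTs to Telegram with curl.
    """
    url = f"https://api.telegram.org/bot{token}/sendVoice"
    return [
        "curl", "--silent", "--show-error", "--fail",
        "-X", "POST", url,
        "-F", f"chat_id={chat_id}",      # multipart form field
        "-F", f"voice=@{audio_path}",    # '@' tells curl to upload the file
    ]
```

Building an argument list rather than a shell string avoids shell interpolation of the file path. Note that the bot token is part of the URL, so anyone who can see the process arguments can see the token; this is one reason the token handling deserves scrutiny.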

ℹ Install Mechanism

There is no registry install spec; the provided install.sh installs Python packages (edge-tts, whisper, click) via pip and downloads Whisper models (potentially large, e.g., ~800MB) using whisper.load_model. This is expected for a local Whisper-based ASR but involves network downloads and heavy disk usage. The script uses apt/brew and pip (standard sources) — no arbitrary binary downloads, but the heavy model download and pip installs are significant and should be expected/approved.
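Because the model download is large, it may help to check free disk space before running install.sh. This is a minimal sketch: the 800 MB figure is the review's example, while the function name, the headroom factor, and the directory to check are assumptions.

```python
import shutil

# The review's example size (~800MB); the real size depends on the chosen model.
ESTIMATED_MODEL_BYTES = 800 * 1024 * 1024


def has_room_for_model(directory, required_bytes=ESTIMATED_MODEL_BYTES, headroom=1.5):
    """Return True if `directory` has at least required_bytes * headroom free.

    `headroom` is a hypothetical safety factor for pip packages and temp files.
    """
    free = shutil.disk_usage(directory).free
    return free >= required_bytes * headroom
```

Pass the directory where the model will be stored; the sketch does not assume any particular cache location.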

ℹ Credentials

The skill does not request unrelated credentials, but it reads openclaw.json to locate Telegram bot tokens and falls back to the TELEGRAM_BOT_TOKEN environment variable. That is appropriate for a Telegram sender, but it gives the skill access to any bot tokens present in your config. In addition, config parsing uses vm.runInNewContext instead of JSON.parse, which evaluates the file content as JavaScript expressions in a VM context. Parsing the local config is needed for functionality, but using vm to evaluate a user-supplied file increases risk if the config file is untrusted or has been modified.
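The difference between evaluating a config file and parsing it as data can be illustrated with a short sketch. It is written in Python rather than the skill's JavaScript, and it is not the skill's code: the `telegram.botToken` key layout is hypothetical (the review does not show the openclaw.json schema), and giving an explicit token priority over the config is an assumption based on the --token advice.

```python
import json
import os


def load_config(text):
    """Parse config text strictly as JSON data; nothing in it is executed."""
    data = json.loads(text)  # raises ValueError on anything that is not pure JSON
    if not isinstance(data, dict):
        raise ValueError("config root must be a JSON object")
    return data


def resolve_token(cli_token, config, env=None):
    """Pick a bot token: explicit token, then config, then TELEGRAM_BOT_TOKEN."""
    env = os.environ if env is None else env
    if cli_token:
        return cli_token
    # Hypothetical key layout; the real openclaw.json schema is not shown in the review.
    section = config.get("telegram")
    token = section.get("botToken") if isinstance(section, dict) else None
    if token:
        return token
    return env.get("TELEGRAM_BOT_TOKEN")
```

A strict parser rejects input such as `{"a": 1} ; process.exit()` instead of running it. The trade-off is that strict JSON also rejects comments and trailing commas, so a config that relies on those would need a tolerant parser that still treats the file purely as data.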

✓ Persistence & Privilege

The skill does not request always:true nor modify other skills or global system settings. It archives inbound audio into the agent workspace and creates/deletes temporary MP3 files; these behaviors are consistent with its purpose and scoped to its own workspace.

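How to use

Make sure OpenClaw is installed (locally or via Docker).
Enter the install command in the chat box: /install voice-tts
Once installation finishes, invoke the skill by name or trigger it with /voice-tts
Provide the inputs described in the skill's parameter documentation to get structured output.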

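Version history

v2.0.1

Removed the voice-reply hook description. The notes now state explicitly that ASR and TTS are independent capabilities and that sending voice messages requires calling send_voice_reply.mjs manually.

v2.0.0

Per-agent voices + local Whisper ASR + automatic token lookup in send_voice_reply + unified error codes + automatic archiving of voice files + config.default.json usable with zero configuration.

v1.0.0

voice-tts 1.0.0
- Initial release: a complete voice input/output (ASR + TTS) solution with multi-platform support (Telegram, Discord, WhatsApp, Feishu).
- Added a local Whisper speech-to-text script with a choice of models.
- Added an Edge TTS text-to-speech script with multi-language and speech-rate options.
- Trigger scenarios explained: automatically detects voice messages and explicit voice requests, and always produces dual output (text + voice).
- Provides automatic batch voice processing and a voice-reply hook script.
- Detailed multi-platform integration examples and troubleshooting for common failures.
- After installation, remove the .txt suffix from the file names in the scripts/ directory; the scripts will not work until this is done.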

Metadata

Slug: voice-tts

Version: 2.0.1

License: MIT-0

Total installs: 4

Current installs: 4

Versions: 3

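FAQ

What is Voice TTS/ASR?

A voice input (Whisper ASR) and voice output (Edge TTS) skill. It supports per-agent voices and can call send_voice_reply.mjs to send Telegram voice messages. It is an AI agent skill plugin for Claude Code / OpenClaw and has 600 total downloads so far.

How do I install Voice TTS/ASR?

Run the command "/install voice-tts" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is required.

Is Voice TTS/ASR free?

Yes. Voice TTS/ASR is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Voice TTS/ASR support?

Voice TTS/ASR is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Voice TTS/ASR?

It is developed and maintained by believe3344 (@believe3344). The current version is v2.0.1.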