stephenredmond-straiteis

Edge TTS Voice System

Author: Stephen Redmond - Straitéis AI · GitHub · v2.1.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 100
Favorites: 0
Current installs: 0
Versions: 2
Install in OpenClaw
/install edge-tts-voice-system
Description
Local voice system for OpenClaw using faster-whisper for inbound transcription and Edge TTS for outbound replies. Use when you need private voice workflows,...
Usage instructions (SKILL.md)

Edge TTS Voice System

A complete, privacy-focused voice system for OpenClaw that works entirely offline. No internet required, no data leaves your machine.

Features

  • Outbound replies: Edge TTS with cached audio output
  • Accurate STT: faster-whisper base model for speech recognition
  • Fully offline: No internet connection required
  • Privacy-focused: All processing happens locally
  • Easy integration: Ready-to-use Python and bash scripts
  • Voice conversations: Natural back-and-forth voice interactions

Quick Start

Installation

# Install the skill
clawhub install lessac_offline_voice_system

# Or manually from this directory
./scripts/install.sh

Basic Usage

from scripts.voice_handler import VoiceHandler

handler = VoiceHandler()

# Transcribe audio to text
text = handler.audio_to_text("voice_message.ogg")
print(f"You said: {text}")

# Generate voice response
audio_file = handler.text_to_audio("Hello, this is a voice response.")

Command Line

# Transcribe audio
./scripts/voice_integration.sh transcribe voice_message.ogg

# Generate TTS
./scripts/voice_integration.sh tts "Hello world" output.wav

# Full voice processing
./scripts/voice_integration.sh process voice_message.ogg

Components

1. Text-to-Speech (TTS)

  • Voice: Edge-supported voice (default en-IE-ConnorNeural)
  • Library: Edge TTS (edge-tts)
  • Quality: Natural speech with cached output
  • Sample rate: provider-defined

2. Speech-to-Text (STT)

  • Model: faster-whisper base
  • Accuracy: High, comparable to cloud services
  • Languages: Multi-language support (auto-detected)
  • Speed: ~2 seconds for typical audio

3. Audio Processing

  • Formats: OGG/Opus, WAV, MP3 (via ffmpeg)
  • Conversion: Automatic format handling
  • Quality: 16kHz mono for optimal recognition
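
The 16 kHz mono conversion described above can be sketched as an ffmpeg argument list. This is a minimal illustration, not the skill's actual code: the function names `build_ffmpeg_cmd` and `convert_audio` are hypothetical, while `-y`, `-i`, `-ar`, and `-ac` are standard ffmpeg options.

```python
import subprocess
from typing import List


def build_ffmpeg_cmd(src: str, dst: str) -> List[str]:
    """Return an ffmpeg argv that converts src to 16 kHz mono audio at dst."""
    return [
        "ffmpeg",
        "-y",            # overwrite dst if it already exists
        "-i", src,       # input file (OGG/Opus, WAV, MP3, ...)
        "-ar", "16000",  # resample to 16 kHz
        "-ac", "1",      # downmix to a single channel
        dst,
    ]


def convert_audio(src: str, dst: str) -> None:
    """Run the conversion; requires ffmpeg on PATH and raises on failure."""
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True, capture_output=True)
```

Passing the command as a list means the file names reach ffmpeg as single arguments and are never parsed by a shell.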

Performance

  • TTS Load time: ~2 seconds (one-time)
  • TTS Generation: ~3-4 seconds
  • STT Transcription: ~2 seconds
  • Total response time: 5-7 seconds

Integration with OpenClaw

Automatic Voice Processing

When installed, the skill can be configured to automatically:

  1. Detect incoming voice messages
  2. Transcribe them silently
  3. Generate AI responses
  4. Convert responses to voice
  5. Send voice replies back

OpenClaw reply TTS configuration

The built-in OpenClaw reply TTS path is not the local voice pipeline used by this skill. This skill now uses a local Edge TTS reply path instead, with cached output stored under /root/.openclaw/tts/cache.

Default outbound voice:

  • en-IE-ConnorNeural

Relevant files:

  • tts_edge_wrapper.py
  • voice_handler.py
  • voice_integration.sh
  • scripts/install.sh

If you need to change the voice, set:

export OPENCLAW_EDGE_TTS_VOICE="en-IE-ConnorNeural"

or replace it with another Edge-supported voice.
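
A minimal sketch of how a script might honour this variable, assuming the `edge-tts` Python package is installed. The helper names `resolve_voice` and `synthesize` are hypothetical and are not taken from the skill's wrappers; `edge_tts.Communicate(...).save(...)` is the package's documented entry point, and it produces MP3 audio by default.

```python
import os

DEFAULT_VOICE = "en-IE-ConnorNeural"


def resolve_voice() -> str:
    """Return the configured Edge voice, falling back to the documented default."""
    return os.environ.get("OPENCLAW_EDGE_TTS_VOICE", "").strip() or DEFAULT_VOICE


async def synthesize(text: str, out_path: str) -> None:
    """Write speech for text to out_path using the edge-tts package."""
    # Imported lazily so that resolve_voice() works without the dependency.
    # Note: edge-tts sends the text to Microsoft's hosted voice service.
    import edge_tts

    await edge_tts.Communicate(text, resolve_voice()).save(out_path)
```

To try it, call `asyncio.run(synthesize("Hello, this is a voice response.", "reply.mp3"))` from a script with network access.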

Reinstall after OpenClaw updates

After an OpenClaw system update, rerun the installer to restore the voice stack:

cd /root/.openclaw/workspace/skills/lessac_offline_voice_system
./scripts/install.sh

This refreshes:

  • the Python venv dependencies (faster-whisper, edge-tts, soundfile)
  • the runtime cache directory
  • the local voice wrappers
  • the config file under /root/.openclaw/tts/config.json

Manual Integration

# In your OpenClaw agent or custom script
import sys
sys.path.append("/path/to/skill/scripts")
from voice_handler import VoiceHandler

class YourAgent:
    def __init__(self):
        self.voice = VoiceHandler()

    def generate_response(self, text):
        # Placeholder: plug in your own AI logic here
        raise NotImplementedError

    def handle_voice_message(self, audio_file):
        # Transcribe the incoming audio to text
        text = self.voice.audio_to_text(audio_file)

        # Generate a text response
        response = self.generate_response(text)

        # Convert the response to a voice file
        voice_response = self.voice.text_to_audio(response)

        return voice_response

Configuration

Voice Model Selection

The skill uses Edge TTS by default. To use a different voice:

  1. Set OPENCLAW_EDGE_TTS_VOICE to a supported Edge voice
  2. Re-run the installer to refresh the cache and wrappers

STT Model Selection

Change the faster-whisper model size in scripts/voice_handler.py:

  • "tiny": Fastest, lower accuracy
  • "base": Default, good balance
  • "small": Higher accuracy, slower
  • "medium": Best accuracy, slowest
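
The size choice above could be validated before the model is loaded. The following is a sketch under stated assumptions: `pick_model_size` and `load_model` are hypothetical helpers, not functions from scripts/voice_handler.py, and the CPU/int8 settings are just one reasonable configuration. `WhisperModel` is the faster-whisper class that takes a model size or path.

```python
SUPPORTED_SIZES = ("tiny", "base", "small", "medium")


def pick_model_size(requested: str = "") -> str:
    """Validate a model size name, defaulting to "base" when none is given."""
    size = (requested or "base").strip().lower()
    if size not in SUPPORTED_SIZES:
        raise ValueError(
            f"unsupported model size {size!r}; choose from {SUPPORTED_SIZES}"
        )
    return size


def load_model(requested: str = ""):
    """Load a faster-whisper model on CPU with int8 weights."""
    # Lazy import: faster-whisper is a heavy dependency, and the weights are
    # downloaded on first use unless they are already cached locally.
    from faster_whisper import WhisperModel

    return WhisperModel(pick_model_size(requested), device="cpu", compute_type="int8")
```

Rejecting unknown names early gives a clearer error than a failed model download.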

Troubleshooting

Common Issues

  1. "No module named 'piper'"

    pip install piper-tts
    
  2. "ffmpeg not found"

    sudo apt-get install ffmpeg
    
  3. Out of memory with large models

    • Use "tiny" or "base" STT model
    • Use a different Edge voice if needed
  4. Slow TTS generation

    • First generation loads model (~2s)
    • Subsequent generations are faster (~0.3s per sentence)

Debug Mode

Enable debug output:

export VOICE_DEBUG=1
./scripts/voice_integration.sh process audio.ogg

Files

  • scripts/install.sh - Installation script
  • scripts/voice_handler.py - Main Python handler
  • scripts/piper_tts.py - Edge TTS wrapper
  • scripts/voice_integration.sh - Bash interface
  • references/voice_models.md - Voice model information
  • assets/ - Voice model files (downloaded during install)

Dependencies

  • Python 3.8+
  • ffmpeg
  • Python packages (installed automatically):
    • faster-whisper
    • piper-tts
    • soundfile

License

Open source. See included LICENSE file.

Support

For issues or questions:

  1. Check the troubleshooting section
  2. Review the references/ directory
  3. Open an issue on the skill repository
Safe usage recommendations
Do not run the installer in a production environment until the issues below are addressed. Key things to consider before installing:

  • Offline claim vs. network use: despite saying 'fully offline', the skill installs 'edge-tts' (which uses hosted Edge voices), and faster-whisper downloads models from HuggingFace by default. Expect network activity and model downloads unless you pre-download models and replace the code.
  • Missing or inconsistent files: many parts of the docs and scripts reference tts_edge_wrapper.py, but the provided files include piper_tts.py instead. The installer copies tts_edge_wrapper.py, but that file is not in the manifest, so installation likely fails or leaves the system in a broken state.
  • Path and privilege mismatches: scripts embed /root/.openclaw/... paths while the installer defaults to $HOME/.openclaw/tts. Running as root may hide these issues; prefer a non-root test environment, and avoid running install.sh as root until you have audited and adjusted the paths.
  • Broken or undefined variables: voice_integration.sh references PIPER_TTS_SCRIPT (undefined) and uses a VENV_PYTHON path (/tmp/venv/bin/python) that does not match install.sh's venv location. Expect runtime failures; review and fix these variables before use.
  • Command injection risk: some functions build shell commands or Python -c strings by interpolating filenames without sanitization (the ffmpeg conversion and the faster-whisper transcribe snippet). If you feed untrusted filenames, an attacker could execute arbitrary shell or Python code. Sanitize inputs, or avoid shell=True and direct string interpolation.
  • Network and package installs: install.sh runs apt-get and pip operations. Review the packages and run in an isolated VM or container if you want to test. If you need true offline operation, you will have to modify the code to use local TTS models and pre-downloaded STT models and remove the reliance on 'edge-tts'.

Recommended next steps:

  1. Do a manual code review and ensure tts_edge_wrapper.py or an equivalent is present and correct.
  2. Fix hard-coded paths and undefined variables so they use the actual INSTALL_DIR/venv paths.
  3. Replace string-interpolated shell/Python invocations with safe argument lists or proper escaping.
  4. Decide whether you accept that Edge TTS and faster-whisper will use network resources; if not, switch to local-only components.
  5. Test the installer and runtime in an isolated environment (container or VM), and avoid running as root until paths and permissions are corrected.
Functional analysis
Type: OpenClaw Skill
Name: edge-tts-voice-system
Version: 2.1.0

The skill is classified as suspicious due to a critical shell injection vulnerability and deceptive documentation regarding privacy. In `scripts/voice_handler.py`, the `audio_to_text` function unsafely interpolates the `audio_file` variable into a shell command string executed via `subprocess.run(shell=True)`, which allows arbitrary command execution. Furthermore, `SKILL.md` and `README.md` repeatedly claim the system is 'fully offline' and that 'no data leaves your machine', which is factually incorrect because the implementation uses the `edge-tts` library to send text to Microsoft's cloud servers. The code also contains hardcoded paths to the `/root/` directory and inconsistent variable usage in `scripts/voice_integration.sh`, indicating a high-risk and poorly maintained codebase.
Capability assessment
Purpose & Capability
The skill claims to be 'fully offline' and 'privacy-focused', but the installer and docs explicitly install 'edge-tts' (which uses hosted Edge services) and refer to faster-whisper auto-downloading models from HuggingFace. The README even shows wget commands to retrieve models from huggingface.co. These behaviors contradict the 'no internet required' claim. Additionally, several referenced files (tts_edge_wrapper.py) are mentioned throughout docs and install.sh but are not present in the provided file list — the repo contains piper_tts.py instead. That mismatch suggests either missing files or sloppy packaging.
Instruction Scope
Runtime instructions and scripts embed hard-coded root paths (/root/.openclaw/tts) and make assumptions about install locations and environment (VENV paths). Several runtime scripts build shell/Python -c commands by interpolating user-supplied filenames/paths directly into strings, e.g. ffmpeg commands using shell=True and Python -c snippets with '%s' insertion. This can lead to command/code injection if untrusted filenames are passed. There are also undefined/incorrect variable references (PIPER_TTS_SCRIPT not defined, VENV_PYTHON default '/tmp/venv/bin/python' while install.sh creates INSTALL_DIR/venv), so the runtime scope is inconsistent and fragile.
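
One way to avoid the '%s'-into-`-c` pattern described above is to hand the filename to the child process as a separate argument and read it from `sys.argv`. The sketch below is illustrative only: `echo_filename_safely` and `CHILD_SNIPPET` are hypothetical names, and the child merely prints its argument where a real script would transcribe the file.

```python
import subprocess
import sys

# The child reads the filename from sys.argv, so the filename is treated as
# data and never becomes part of the Python source passed to -c.
CHILD_SNIPPET = "import sys; print(sys.argv[1])"


def echo_filename_safely(filename: str) -> str:
    """Run a child Python process that receives filename as an argument."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SNIPPET, filename],  # argv list, no shell
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.rstrip("\n")
```

Because no shell is involved and the source string is constant, quotes, semicolons, or `$(...)` in a filename have no special meaning.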
Install Mechanism
There is no registry install spec, but an included install.sh performs apt-get and pip installs, creates a venv, and runs tests. The packages installed (faster-whisper, edge-tts, soundfile) are standard PyPI packages — not inherently malicious — and README shows model downloads from HuggingFace (well-known host). This is moderate risk: installer will attempt system package installs (apt-get) and pip installs and expects network access; it is not a silent arbitrary binary download from an unknown host, but the 'fully offline' claim is inaccurate given these network operations.
Credentials
The skill declares no required credentials or sensitive environment variables, which is appropriate. It does support optional OPENCLAW_EDGE_TTS_* environment variables for voice configuration. No secrets are requested in the manifest. However, scripts expect to read/write to /root/.openclaw paths in several places, which implicitly assumes elevated privileges or a root install; that is disproportionate to a user-space skill and could lead to accidental writes to root-owned locations.
Persistence & Privilege
The skill does not set always:true and does not request elevated platform privileges. It does instruct installing files to a local directory and creating a venv/config under the install directory. However, hard-coded references to /root paths and the suggested re-run after OpenClaw updates (with specific root-bound paths) give the skill an implicit assumption of installation under root; this is inconsistent with the install.sh default (HOME/.openclaw/tts). Autonomous invocation is allowed (platform default) and is expected for skills.
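
The path mismatch noted above could be avoided by deriving every location from a single install directory. This is a sketch under stated assumptions: the `OPENCLAW_TTS_DIR` override variable and both helper names are hypothetical, while the `~/.openclaw/tts` default and the `venv` subdirectory follow the install.sh behaviour described in this review.

```python
import os
from pathlib import Path


def resolve_install_dir() -> Path:
    """Return the TTS directory: an explicit override, else ~/.openclaw/tts."""
    override = os.environ.get("OPENCLAW_TTS_DIR")  # hypothetical variable name
    if override:
        return Path(override).expanduser()
    return Path.home() / ".openclaw" / "tts"


def venv_python(install_dir: Path) -> Path:
    """Return the interpreter inside the venv that lives under install_dir."""
    sub = "Scripts/python.exe" if os.name == "nt" else "bin/python"
    return install_dir / "venv" / sub
```

With this approach the same code resolves to /root/.openclaw/tts only when it actually runs as root, and to the invoking user's home directory otherwise.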
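How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install edge-tts-voice-system
  3. After installation, invoke the Skill by name or trigger it with /edge-tts-voice-system
  4. Provide the required inputs according to the Skill's parameter documentation to get structured output
Version history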
v2.1.0
Full cleanup to Edge TTS outbound replies; refreshed reinstall flow and docs
v2.0.0
Rename from Lessac to Edge TTS; switch outbound replies to local Edge TTS; refresh install/reinstall docs and runtime wrappers
Metadata
Slug: edge-tts-voice-system
Version: 2.1.0
License: MIT-0
Total installs: 0
Current installs: 0
Version count: 2
FAQ

What is Edge TTS Voice System?

Local voice system for OpenClaw using faster-whisper for inbound transcription and Edge TTS for outbound replies. Use when you need private voice workflows,... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 100 total downloads to date.

How do I install Edge TTS Voice System?

Run the command "/install edge-tts-voice-system" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Edge TTS Voice System free?

Yes. Edge TTS Voice System is completely free under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does Edge TTS Voice System support?

Edge TTS Voice System is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Edge TTS Voice System?

It is developed and maintained by Stephen Redmond - Straitéis AI (@stephenredmond-straiteis); the current version is v2.1.0.
