stephenredmond-straiteis

Edge TTS Voice System

by Stephen Redmond - Straitéis AI · GitHub ↗ · v2.1.0 · MIT-0
cross-platform ⚠ suspicious
100 Downloads · 0 Stars · 0 Active Installs · 2 Versions
Install in OpenClaw
/install edge-tts-voice-system
Description
Local voice system for OpenClaw using faster-whisper for inbound transcription and Edge TTS for outbound replies. Use when you need private voice workflows,...
README (SKILL.md)

Edge TTS Voice System

A complete, privacy-focused voice system for OpenClaw that works entirely offline. No internet required, no data leaves your machine.

Features

  • Outbound replies: Edge TTS with cached audio output
  • Accurate STT: faster-whisper base model for speech recognition
  • Fully offline: No internet connection required
  • Privacy-focused: All processing happens locally
  • Easy integration: Ready-to-use Python and bash scripts
  • Voice conversations: Natural back-and-forth voice interactions

Quick Start

Installation

# Install the skill
clawhub install lessac_offline_voice_system

# Or manually from this directory
./scripts/install.sh

Basic Usage

from scripts.voice_handler import VoiceHandler

handler = VoiceHandler()

# Transcribe audio to text
text = handler.audio_to_text("voice_message.ogg")
print(f"You said: {text}")

# Generate voice response
audio_file = handler.text_to_audio("Hello, this is a voice response.")

Command Line

# Transcribe audio
./scripts/voice_integration.sh transcribe voice_message.ogg

# Generate TTS
./scripts/voice_integration.sh tts "Hello world" output.wav

# Full voice processing
./scripts/voice_integration.sh process voice_message.ogg

Components

1. Text-to-Speech (TTS)

  • Voice: Edge-supported voice (default en-IE-ConnorNeural)
  • Library: Edge TTS (edge-tts)
  • Quality: Natural speech with cached output
  • Sample rate: provider-defined

2. Speech-to-Text (STT)

  • Model: faster-whisper base
  • Accuracy: High, comparable to cloud services
  • Languages: Multi-language support (auto-detected)
  • Speed: ~2 seconds for typical audio

3. Audio Processing

  • Formats: OGG/Opus, WAV, MP3 (via ffmpeg)
  • Conversion: Automatic format handling
  • Quality: 16kHz mono for optimal recognition
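A sketch of the 16 kHz mono conversion, assuming ffmpeg is on PATH. The function names are hypothetical. The command is built as an argument list and run without a shell, so a filename cannot smuggle in extra commands.

```python
"""Convert incoming audio to 16 kHz mono WAV with ffmpeg, without a shell."""
import subprocess
from pathlib import Path
from typing import List


def ffmpeg_to_wav16k_cmd(src: str, dst: str) -> List[str]:
    """Build the ffmpeg argument list; each path stays a single argv entry."""
    return [
        "ffmpeg",
        "-nostdin",      # never read from the terminal
        "-y",            # overwrite dst if it already exists
        "-i", src,
        "-ar", "16000",  # resample to 16 kHz
        "-ac", "1",      # downmix to mono
        dst,
    ]


def to_wav16k(src: str, dst: str) -> Path:
    # A list argv with shell=False (the default) passes filenames containing
    # spaces, quotes or ';' literally instead of letting a shell parse them.
    subprocess.run(ffmpeg_to_wav16k_cmd(src, dst), check=True, capture_output=True)
    return Path(dst)
```

An output path that begins with `-` could still be read by ffmpeg as an option; passing absolute paths avoids that.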

Performance

  • TTS Load time: ~2 seconds (one-time)
  • TTS Generation: ~3-4 seconds
  • STT Transcription: ~2 seconds
  • Total response time: 5-7 seconds

Integration with OpenClaw

Automatic Voice Processing

When installed, the skill can be configured to automatically:

  1. Detect incoming voice messages
  2. Transcribe them silently
  3. Generate AI responses
  4. Convert responses to voice
  5. Send voice replies back
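The five steps above can be wired together roughly as follows. Everything here is hypothetical: detection by file extension, the `handle_incoming` name, and the callables injected for transcription, response generation, synthesis and sending are illustrations, not the skill's documented API.

```python
"""Sketch of the five-step voice reply loop with pluggable (hypothetical) steps."""
from pathlib import Path
from typing import Callable, Optional

# Hypothetical: treat these extensions as voice messages.
VOICE_EXTENSIONS = {".ogg", ".oga", ".opus", ".wav", ".mp3"}


def is_voice_message(path: str) -> bool:
    """Step 1: detect a voice message by file extension (an assumption)."""
    return Path(path).suffix.lower() in VOICE_EXTENSIONS


def handle_incoming(
    path: str,
    transcribe: Callable[[str], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str], str],
    send: Callable[[str], None],
) -> Optional[str]:
    """Run steps 2-5; return the transcript, or None if not a voice message."""
    if not is_voice_message(path):
        return None
    text = transcribe(path)         # Step 2: transcribe silently
    reply = respond(text)           # Step 3: generate the AI response
    audio_path = synthesize(reply)  # Step 4: convert the response to voice
    send(audio_path)                # Step 5: send the voice reply back
    return text
```

Injecting the steps as callables keeps the loop testable without audio files, a model download or network access.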

OpenClaw reply TTS configuration

This skill does not use OpenClaw's built-in reply TTS path. It uses its own Edge TTS reply path instead, with cached audio stored under /root/.openclaw/tts/cache.
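A minimal sketch of cached Edge TTS output, assuming the `edge-tts` package's `Communicate(text, voice).save(path)` coroutine. The SHA-256 file-naming scheme is an assumption, not necessarily how the skill names its cache entries. A cache hit avoids a round trip to the hosted Edge service; a miss sends the reply text to it.

```python
"""Cache Edge TTS replies by (voice, text) so repeated replies skip synthesis."""
import asyncio
import hashlib
from pathlib import Path

# The text above uses /root/.openclaw/tts/cache; Path.home() resolves to /root
# only when running as root, so this default follows the current user instead.
CACHE_DIR = Path.home() / ".openclaw" / "tts" / "cache"


def cache_path(text: str, voice: str, cache_dir: Path = CACHE_DIR) -> Path:
    """Map (voice, text) to a stable file name (a hypothetical naming scheme)."""
    digest = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.mp3"  # edge-tts writes MP3 by default


def synthesize_cached(text: str, voice: str, cache_dir: Path = CACHE_DIR) -> Path:
    """Return a cached MP3 for this reply, calling Edge TTS only on a miss."""
    out = cache_path(text, voice, cache_dir)
    if out.exists():
        return out  # cache hit: no request to the hosted Edge TTS service
    import edge_tts  # lazy import; synthesis needs network access

    cache_dir.mkdir(parents=True, exist_ok=True)
    tmp = out.with_suffix(".part")
    # Write to a temporary name first so an interrupted run never leaves a
    # truncated file that later looks like a valid cache hit.
    asyncio.run(edge_tts.Communicate(text, voice).save(str(tmp)))
    tmp.replace(out)
    return out
```

If your agent already runs an asyncio event loop, `await edge_tts.Communicate(...).save(...)` directly; `asyncio.run` raises when called from a running loop.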

Default outbound voice:

  • en-IE-ConnorNeural

Relevant files:

  • tts_edge_wrapper.py
  • voice_handler.py
  • voice_integration.sh
  • scripts/install.sh

If you need to change the voice, set:

export OPENCLAW_EDGE_TTS_VOICE="en-IE-ConnorNeural"

or replace it with another Edge-supported voice.
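How a script might read the voice setting and fall back to the documented default. `resolve_voice` is a hypothetical helper; the skill's own wrappers may handle the variable differently.

```python
"""Resolve the outbound voice from OPENCLAW_EDGE_TTS_VOICE (hypothetical helper)."""
import os
from typing import Mapping, Optional

DEFAULT_VOICE = "en-IE-ConnorNeural"  # the documented default outbound voice


def resolve_voice(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured voice; empty or unset values use the default."""
    env = os.environ if env is None else env
    return env.get("OPENCLAW_EDGE_TTS_VOICE", "").strip() or DEFAULT_VOICE
```

Running `edge-tts --list-voices` prints the voice names the service currently offers.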

Reinstall after OpenClaw updates

After an OpenClaw system update, rerun the installer to restore the voice stack:

cd /root/.openclaw/workspace/skills/lessac_offline_voice_system
./scripts/install.sh

This refreshes:

  • the Python venv dependencies (faster-whisper, edge-tts, soundfile)
  • the runtime cache directory
  • the local voice wrappers
  • the config file under /root/.openclaw/tts/config.json

Manual Integration

# In your OpenClaw agent or custom script
import sys
sys.path.append("/path/to/skill/scripts")  # replace with the skill's scripts directory
from voice_handler import VoiceHandler

class YourAgent:
    def __init__(self):
        self.voice = VoiceHandler()

    def generate_response(self, text):
        # Placeholder: replace with your AI logic
        return f"You said: {text}"

    def handle_voice_message(self, audio_file):
        # Transcribe the incoming audio to text
        text = self.voice.audio_to_text(audio_file)

        # Generate a text response
        response = self.generate_response(text)

        # Convert the response to audio; returns the generated audio file
        voice_response = self.voice.text_to_audio(response)

        return voice_response

Configuration

Voice Model Selection

The skill uses Edge TTS by default. To use a different voice:

  1. Set OPENCLAW_EDGE_TTS_VOICE to a supported Edge voice
  2. Re-run the installer to refresh the cache and wrappers

STT Model Selection

Change the faster-whisper model size in scripts/voice_handler.py:

  • "tiny": Fastest, lower accuracy
  • "base": Default, good balance
  • "small": Higher accuracy, slower
  • "medium": Highest accuracy of these options, slowest

Troubleshooting

Common Issues

  1. "No module named 'edge_tts'"

    pip install edge-tts
    
  2. "ffmpeg not found"

    sudo apt-get install ffmpeg
    
  3. Out of memory with large models

    • Use "tiny" or "base" STT model
    • Use a different Edge voice if needed
  4. Slow TTS generation

    • First generation loads model (~2s)
    • Subsequent generations are faster (~0.3s per sentence)

Debug Mode

Enable debug output:

export VOICE_DEBUG=1
./scripts/voice_integration.sh process audio.ogg
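One way a Python script could honor `VOICE_DEBUG=1`. This is an assumption about the mechanism; the bundled scripts may read the variable differently, and `debug_level` and `configure_logging` are hypothetical names.

```python
"""Hypothetical: map VOICE_DEBUG=1 to verbose logging."""
import logging
import os
from typing import Mapping, Optional


def debug_level(env: Optional[Mapping[str, str]] = None) -> int:
    """Return DEBUG when VOICE_DEBUG is exactly "1", else INFO."""
    env = os.environ if env is None else env
    return logging.DEBUG if env.get("VOICE_DEBUG") == "1" else logging.INFO


def configure_logging(env: Optional[Mapping[str, str]] = None) -> logging.Logger:
    # basicConfig only takes effect if the root logger has no handlers yet.
    logging.basicConfig(
        level=debug_level(env),
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    return logging.getLogger("voice")
```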

Files

  • scripts/install.sh - Installation script
  • scripts/voice_handler.py - Main Python handler
  • scripts/piper_tts.py - Edge TTS wrapper
  • scripts/voice_integration.sh - Bash interface
  • references/voice_models.md - Voice model information
  • assets/ - Voice model files (downloaded during install)

Dependencies

  • Python 3.8+
  • ffmpeg
  • Python packages (installed automatically):
    • faster-whisper
    • edge-tts
    • soundfile

License

MIT-0. See the included LICENSE file.

Support

For issues or questions:

  1. Check the troubleshooting section
  2. Review the references/ directory
  3. Open an issue on the skill repository
Usage Guidance
Do not run the installer in a production environment until the issues below are addressed. Key things to consider before installing:

  • Offline claim vs network use: despite saying "fully offline", the skill installs edge-tts (which uses hosted Edge voices), and faster-whisper downloads models from HuggingFace by default. Expect network activity and model downloads unless you pre-download models and replace the code.
  • Missing / inconsistent files: many parts of the docs and scripts reference tts_edge_wrapper.py, but the provided files include piper_tts.py instead. The installer copies tts_edge_wrapper.py, but that file is not in the manifest, so installation likely fails or leaves the system in a broken state.
  • Path / privilege mismatches: scripts embed /root/.openclaw/... paths, while the installer defaults to $HOME/.openclaw/tts. Running as root may hide these issues; prefer a non-root test environment. Avoid running install.sh as root until you audit and adjust the paths.
  • Broken / undefined variables and bugs: voice_integration.sh references PIPER_TTS_SCRIPT (undefined) and uses a VENV_PYTHON path (/tmp/venv/bin/python) that does not match install.sh's venv location. Expect runtime failures; review and fix these variables before use.
  • Command injection risk: some functions build shell commands or Python -c strings by interpolating filenames without sanitization (ffmpeg conversion, the faster-whisper transcribe snippet). If you feed untrusted filenames, an attacker could execute arbitrary shell or Python code. Sanitize inputs, or avoid shell=True and direct string interpolation.
  • Network & package installs: install.sh runs apt-get and pip operations. Review the packages and run in an isolated VM or container if you want to test. If you need true offline operation, modify the code to use local TTS models and pre-downloaded STT models, and remove the edge-tts dependency.
Recommended next steps:

  1. Do a manual code review and ensure tts_edge_wrapper.py or an equivalent is present and correct.
  2. Fix hard-coded paths and undefined variables to use the actual INSTALL_DIR/venv paths.
  3. Replace string-interpolated shell/Python invocations with safe argument lists or proper escaping.
  4. Decide whether you accept that Edge TTS and faster-whisper will use network resources; if not, modify the skill to use local-only components.
  5. Test the installer and runtime in an isolated environment (container or VM), and avoid running as root until paths and permissions are corrected.
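To illustrate step 3: a `python -c` snippet can receive the filename through `sys.argv` instead of `%s` interpolation. The snippet below only echoes its argument, as a stand-in for the real faster-whisper call; `run_snippet` and `SNIPPET` are hypothetical names.

```python
"""Pass filenames to a `python -c` snippet as argv, never by string interpolation."""
import subprocess
import sys
from typing import List

# The snippet reads the path from sys.argv[1]; the path is never spliced into code.
SNIPPET = "import sys; print(sys.argv[1])"


def run_snippet(python: str, path: str) -> str:
    """Run SNIPPET under the given interpreter with `path` as its first argument."""
    cmd: List[str] = [python, "-c", SNIPPET, path]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.rstrip("\n")
```

A hostile name such as `x'); import os; os.system('id'); ('.ogg` comes back unchanged as plain data, because it is never parsed as Python source or by a shell.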
Capability Analysis
Type: OpenClaw Skill · Name: edge-tts-voice-system · Version: 2.1.0

The skill is classified as suspicious due to a critical shell injection vulnerability and deceptive documentation regarding privacy. In `scripts/voice_handler.py`, the `audio_to_text` function unsafely interpolates the `audio_file` variable into a shell command string executed via `subprocess.run(shell=True)`, which allows for arbitrary command execution. Furthermore, `SKILL.md` and `README.md` repeatedly claim the system is 'fully offline' and that 'no data leaves your machine,' which is factually incorrect as the implementation uses the `edge-tts` library to send text to Microsoft's cloud servers. The code also contains hardcoded paths to the `/root/` directory and inconsistent variable usage in `scripts/voice_integration.sh`, indicating a high-risk and poorly maintained codebase.
Capability Assessment
Purpose & Capability
The skill claims to be 'fully offline' and 'privacy-focused', but the installer and docs explicitly install 'edge-tts' (which uses hosted Edge services) and refer to faster-whisper auto-downloading models from HuggingFace. The README even shows wget commands to retrieve models from huggingface.co. These behaviors contradict the 'no internet required' claim. Additionally, several referenced files (tts_edge_wrapper.py) are mentioned throughout docs and install.sh but are not present in the provided file list — the repo contains piper_tts.py instead. That mismatch suggests either missing files or sloppy packaging.
Instruction Scope
Runtime instructions and scripts embed hard-coded root paths (/root/.openclaw/tts) and make assumptions about install locations and environment (VENV paths). Several runtime scripts build shell/Python -c commands by interpolating user-supplied filenames/paths directly into strings, e.g. ffmpeg commands using shell=True and Python -c snippets with '%s' insertion. This can lead to command/code injection if untrusted filenames are passed. There are also undefined or incorrect variable references (PIPER_TTS_SCRIPT is not defined, and VENV_PYTHON defaults to '/tmp/venv/bin/python' while install.sh creates INSTALL_DIR/venv), so the runtime scope is inconsistent and fragile.
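A sketch of path handling that avoids hard-coded /root locations. It assumes the layout described on this page (install directory $HOME/.openclaw/tts containing venv/, cache/ and config.json); `VoicePaths` and `default_paths` are hypothetical names.

```python
"""Derive skill paths from the user's home instead of hard-coding /root."""
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class VoicePaths:
    base: Path  # assumed install dir, e.g. $HOME/.openclaw/tts

    @property
    def cache_dir(self) -> Path:
        return self.base / "cache"

    @property
    def config_file(self) -> Path:
        return self.base / "config.json"

    @property
    def venv_python(self) -> Path:
        # Assumes install.sh creates the venv at INSTALL_DIR/venv (POSIX layout).
        return self.base / "venv" / "bin" / "python"


def default_paths(home: Optional[Path] = None) -> VoicePaths:
    """Build paths under the given home, or the current user's home."""
    home = Path.home() if home is None else Path(home)
    return VoicePaths(home / ".openclaw" / "tts")
```

Deriving every path from one base keeps the cache, config and interpreter in agreement, which is the mismatch the analysis flags between VENV_PYTHON and install.sh.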
Install Mechanism
There is no registry install spec, but an included install.sh performs apt-get and pip installs, creates a venv, and runs tests. The packages installed (faster-whisper, edge-tts, soundfile) are standard PyPI packages — not inherently malicious — and README shows model downloads from HuggingFace (well-known host). This is moderate risk: installer will attempt system package installs (apt-get) and pip installs and expects network access; it is not a silent arbitrary binary download from an unknown host, but the 'fully offline' claim is inaccurate given these network operations.
Credentials
The skill declares no required credentials or sensitive environment variables, which is appropriate. It does support optional OPENCLAW_EDGE_TTS_* environment variables for voice configuration. No secrets are requested in the manifest. However, scripts expect to read/write to /root/.openclaw paths in several places, which implicitly assumes elevated privileges or a root install; that is disproportionate to a user-space skill and could lead to accidental writes to root-owned locations.
Persistence & Privilege
The skill does not set always:true and does not request elevated platform privileges. It does instruct installing files to a local directory and creating a venv/config under the install directory. However, hard-coded references to /root paths and the suggested re-run after OpenClaw updates (with specific root-bound paths) imply an installation under root, which is inconsistent with the install.sh default ($HOME/.openclaw/tts). Autonomous invocation is allowed (platform default) and is expected for skills.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install edge-tts-voice-system
  3. After installation, invoke the skill by name or use /edge-tts-voice-system
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v2.1.0
Full cleanup to Edge TTS outbound replies; refreshed reinstall flow and docs
v2.0.0
Rename from Lessac to Edge TTS; switch outbound replies to local Edge TTS; refresh install/reinstall docs and runtime wrappers
Metadata
Slug edge-tts-voice-system
Version 2.1.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 2
Frequently Asked Questions

What is Edge TTS Voice System?

Local voice system for OpenClaw using faster-whisper for inbound transcription and Edge TTS for outbound replies. Use when you need private voice workflows,... It is an AI Agent Skill for Claude Code / OpenClaw, with 100 downloads so far.

How do I install Edge TTS Voice System?

Run "/install edge-tts-voice-system" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Edge TTS Voice System free?

Yes, Edge TTS Voice System is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Edge TTS Voice System support?

Edge TTS Voice System is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Edge TTS Voice System?

It is built and maintained by Stephen Redmond - Straitéis AI (@stephenredmond-straiteis); the current version is v2.1.0.
