Description

Local voice system for OpenClaw using faster-whisper for inbound transcription and Edge TTS for outbound replies. Use when you need private voice workflows,...

README (SKILL.md)

Edge TTS Voice System

Name: Edge TTS Voice System
Author: stephenredmond-straiteis

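A complete, privacy-focused voice system for OpenClaw. Speech recognition runs locally with faster-whisper after a one-time model download, so inbound audio stays on your machine. Outbound replies are synthesized by Microsoft's online Edge TTS service, so reply text is sent over the network.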

Features

Outbound replies: Edge TTS with cached audio output
Accurate STT: faster-whisper base model for speech recognition
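Local transcription: faster-whisper runs on your machine once its model has been downloaded
Privacy-conscious: voice messages are transcribed locally; only reply text is sent to the Edge TTS service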
Easy integration: Ready-to-use Python and bash scripts
Voice conversations: Natural back-and-forth voice interactions

Quick Start

Installation

```
# Install the skill
clawhub install edge-tts-voice-system

# Or manually from this directory
./scripts/install.sh
```

Basic Usage

```
from scripts.voice_handler import VoiceHandler

handler = VoiceHandler()

# Transcribe audio to text
text = handler.audio_to_text("voice_message.ogg")
print(f"You said: {text}")

# Generate voice response
audio_file = handler.text_to_audio("Hello, this is a voice response.")
```

Command Line

```
# Transcribe audio
./scripts/voice_integration.sh transcribe voice_message.ogg

# Generate TTS
./scripts/voice_integration.sh tts "Hello world" output.wav

# Full voice processing
./scripts/voice_integration.sh process voice_message.ogg
```

Components

1. Text-to-Speech (TTS)

Voice: Edge-supported voice (default en-IE-ConnorNeural)
Library: Edge TTS (edge-tts)
Quality: Natural speech with cached output
Sample rate: provider-defined
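An Edge TTS call can look roughly like the following. This is a minimal sketch that assumes the `edge-tts` package's `Communicate` class. `synthesize` is a hypothetical helper, not the skill's own wrapper. The text is sent to Microsoft's online service, so this step needs network access.

```python
import asyncio


async def synthesize(text: str, out_path: str, voice: str = "en-IE-ConnorNeural") -> str:
    """Hypothetical helper: synthesize `text` with Edge TTS and save the audio to `out_path`."""
    if not text.strip():
        raise ValueError("text must not be empty")
    # Imported lazily so this module loads even when edge-tts is not installed.
    import edge_tts

    # Communicate(text, voice) prepares the request; save() streams the audio
    # from Microsoft's online service into out_path (requires network access).
    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(out_path)
    return out_path

# Usage: asyncio.run(synthesize("Hello world", "hello.mp3"))
```

Edge TTS returns MP3 audio by default. If a caller expects WAV, convert the file afterwards with ffmpeg.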

2. Speech-to-Text (STT)

Model: faster-whisper base
Accuracy: Good for clear speech; larger models trade speed for accuracy (see STT Model Selection)
Languages: Multi-language support (auto-detected)
Speed: ~2 seconds for typical audio
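The following is a minimal sketch of local transcription. It assumes the `faster-whisper` package's `WhisperModel` API. `transcribe_file` is a hypothetical name, not the skill's `audio_to_text`. On first use, weights for the chosen size (tiny, base, small or medium, as listed under STT Model Selection) are downloaded from Hugging Face. After that, inference runs locally.

```python
from typing import Tuple

# Sizes documented under STT Model Selection; faster-whisper accepts others too.
MODEL_SIZES = ("tiny", "base", "small", "medium")


def transcribe_file(path: str, model_size: str = "base") -> Tuple[str, str]:
    """Transcribe `path` locally; returns (text, detected_language_code)."""
    if model_size not in MODEL_SIZES:
        raise ValueError(f"unsupported model size: {model_size!r}")
    # Lazy import; WhisperModel downloads weights from Hugging Face if not cached.
    from faster_whisper import WhisperModel

    # int8 on CPU keeps memory use low; adjust device/compute_type for a GPU.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # No language= argument is passed, so the language is auto-detected.
    segments, info = model.transcribe(path)
    # `segments` is a lazy generator; decoding happens while it is consumed here.
    text = " ".join(segment.text.strip() for segment in segments)
    return text, info.language
```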

3. Audio Processing

Formats: OGG/Opus, WAV, MP3 (via ffmpeg)
Conversion: Automatic format handling
Sample format: resampled to 16 kHz mono, the input format Whisper models expect
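The conversion step can be sketched as below, assuming ffmpeg is on `PATH`. The helper builds an argument list instead of a shell string. That way, a filename reaches ffmpeg as a single argument and is never parsed by a shell. The helper names are hypothetical.

```python
import subprocess
from typing import List


def ffmpeg_to_wav16k_argv(src: str, dst: str) -> List[str]:
    """Build an ffmpeg argument list that converts `src` to 16 kHz mono WAV."""
    return [
        "ffmpeg",
        "-y",            # overwrite dst if it already exists
        "-i", src,       # input file, passed as one argument (no shell parsing)
        "-ar", "16000",  # resample to 16 kHz
        "-ac", "1",      # downmix to mono
        dst,
    ]


def convert_for_stt(src: str, dst: str) -> str:
    """Run the conversion without a shell; raises CalledProcessError on failure."""
    subprocess.run(ffmpeg_to_wav16k_argv(src, dst), check=True)
    return dst
```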

Performance

TTS Load time: ~2 seconds (one-time)
TTS Generation: ~3-4 seconds
STT Transcription: ~2 seconds
Total response time: 5-7 seconds

Integration with OpenClaw

Automatic Voice Processing

When installed, the skill can be configured to automatically:

Detect incoming voice messages
Transcribe them silently
Generate AI responses
Convert responses to voice
Send voice replies back
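The five steps above can be sketched as a small pipeline. Everything here is hypothetical. Detecting voice messages by file extension is an assumed heuristic. The callables stand in for the skill's transcription, your agent's response logic, the TTS step and your channel's send function.

```python
from pathlib import Path
from typing import Callable, Optional

# Assumption: extensions based on the formats listed under Audio Processing.
VOICE_EXTENSIONS = {".ogg", ".opus", ".wav", ".mp3"}


def is_voice_message(path: str) -> bool:
    """Detect a voice message by file extension (hypothetical heuristic)."""
    return Path(path).suffix.lower() in VOICE_EXTENSIONS


def handle_incoming(
    path: str,
    transcribe: Callable[[str], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str], str],
    send: Callable[[str], None],
) -> Optional[str]:
    """Run the steps in order; returns the reply audio path, or None if skipped."""
    if not is_voice_message(path):
        return None                # 1. not a voice message: nothing to do
    text = transcribe(path)        # 2. transcribe without echoing to the chat
    reply = respond(text)          # 3. generate the AI response
    audio = synthesize(reply)      # 4. convert the response to voice
    send(audio)                    # 5. send the voice reply back
    return audio
```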

OpenClaw reply TTS configuration

This skill does not use OpenClaw's built-in reply TTS. Instead, it synthesizes replies through its own Edge TTS wrapper and caches the generated audio under /root/.openclaw/tts/cache. Edge TTS is an online Microsoft service, so reply text is sent over the network even though the cache is local.
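A reply cache like this one can be keyed on the voice and the exact reply text. The sketch below assumes SHA-256 file names and MP3 output. The skill's actual cache layout is not documented here, so treat the names as hypothetical.

```python
import asyncio
import hashlib
from pathlib import Path

CACHE_DIR = Path("/root/.openclaw/tts/cache")  # location given in the text above


def cache_path(text: str, voice: str, cache_dir: Path = CACHE_DIR) -> Path:
    """Map (voice, text) to a stable file name so identical replies reuse audio."""
    key = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    return cache_dir / f"{key}.mp3"


async def cached_tts(text: str, voice: str, cache_dir: Path = CACHE_DIR) -> Path:
    """Return cached audio if present; otherwise synthesize it with edge-tts."""
    path = cache_path(text, voice, cache_dir)
    if path.exists():
        return path  # cache hit: no network call
    import edge_tts  # lazy import; only needed on a cache miss

    cache_dir.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".part")
    await edge_tts.Communicate(text, voice).save(str(tmp))
    tmp.replace(path)  # rename only after a complete download
    return path

# Usage: asyncio.run(cached_tts("Hello", "en-IE-ConnorNeural"))
```

Writing to a `.part` file first means an interrupted download is never mistaken for a cache hit.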

Default outbound voice:

en-IE-ConnorNeural

Relevant files:

tts_edge_wrapper.py
voice_handler.py
voice_integration.sh
scripts/install.sh

If you need to change the voice, set:

```
export OPENCLAW_EDGE_TTS_VOICE="en-IE-ConnorNeural"
```

or replace it with another Edge-supported voice.
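A wrapper might resolve the voice as sketched below, falling back to the documented default when the variable is unset or blank. `resolve_voice` and `english_voices` are hypothetical helpers. `english_voices` assumes the `edge-tts` package's `list_voices()` coroutine, which fetches the voice catalogue over the network.

```python
import os

DEFAULT_VOICE = "en-IE-ConnorNeural"


def resolve_voice(env=None) -> str:
    """Return OPENCLAW_EDGE_TTS_VOICE, or the default when it is unset or blank."""
    env = os.environ if env is None else env
    voice = env.get("OPENCLAW_EDGE_TTS_VOICE", "").strip()
    return voice or DEFAULT_VOICE


async def english_voices():
    """List short names of English Edge voices (network call)."""
    import edge_tts  # lazy import; requires the edge-tts package

    voices = await edge_tts.list_voices()
    return sorted(v["ShortName"] for v in voices if v["Locale"].startswith("en-"))
```

Run `english_voices` with `asyncio.run(english_voices())`. From the shell, `edge-tts --list-voices` prints the same catalogue.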

Reinstall after OpenClaw updates

After an OpenClaw system update, rerun the installer to restore the voice stack:

```
cd /root/.openclaw/workspace/skills/lessac_offline_voice_system
./scripts/install.sh
```

This refreshes:

the Python venv dependencies (faster-whisper, edge-tts, soundfile)
the runtime cache directory
the local voice wrappers
the config file under /root/.openclaw/tts/config.json

Manual Integration

```
# In your OpenClaw agent or custom script
import sys

sys.path.append("/path/to/skill/scripts")  # replace with the skill's scripts/ directory
from voice_handler import VoiceHandler


class YourAgent:
    def __init__(self):
        self.voice = VoiceHandler()

    def generate_response(self, text):
        # Placeholder: replace with your agent's AI logic
        return f"You said: {text}"

    def handle_voice_message(self, audio_file):
        # Transcribe the incoming voice message
        text = self.voice.audio_to_text(audio_file)

        # Generate a text response
        response = self.generate_response(text)

        # Convert the response to voice
        voice_response = self.voice.text_to_audio(response)

        return voice_response
```

Configuration

Voice Model Selection

The skill uses Edge TTS by default. To use a different voice:

Set OPENCLAW_EDGE_TTS_VOICE to a supported Edge voice
Re-run the installer to refresh the cache and wrappers

STT Model Selection

Change the faster-whisper model size in scripts/voice_handler.py:

"tiny": Fastest, lower accuracy
"base": Default, good balance
"small": Higher accuracy, slower
"medium": Best accuracy, slowest

Troubleshooting

Common Issues

"No module named 'edge_tts'"
```
pip install edge-tts
```
"ffmpeg not found"
```
sudo apt-get install ffmpeg
```
Out of memory with large models
- Use "tiny" or "base" STT model
Slow TTS generation
- Edge TTS synthesizes speech over the network, so latency depends on your connection
- Repeated replies are served from the local cache under /root/.openclaw/tts/cache

Debug Mode

Enable debug output:

```
export VOICE_DEBUG=1
./scripts/voice_integration.sh process audio.ogg
```
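A Python script could honor `VOICE_DEBUG` with the standard `logging` module, as in this sketch. The logger name `voice_handler` and the treatment of `0` as "off" are assumptions, not documented behavior.

```python
import logging
import os


def configure_logging(env=None) -> logging.Logger:
    """Set the log level from VOICE_DEBUG (any value other than empty or "0" enables debug)."""
    env = os.environ if env is None else env
    logger = logging.getLogger("voice_handler")
    debug = env.get("VOICE_DEBUG", "") not in ("", "0")
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        logger.addHandler(logging.StreamHandler())
    return logger
```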

Files

scripts/install.sh - Installation script
scripts/voice_handler.py - Main Python handler
scripts/piper_tts.py - Edge TTS wrapper
scripts/voice_integration.sh - Bash interface
references/voice_models.md - Voice model information
assets/ - Voice model files (downloaded during install)

Dependencies

Python 3.8+
ffmpeg
Python packages (installed automatically):
- faster-whisper
- edge-tts
- soundfile
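Before running the scripts, a quick check along these lines can confirm the prerequisites. The sketch assumes the import names `faster_whisper`, `edge_tts` and `soundfile` for the packages the installer sets up (faster-whisper, edge-tts, soundfile). `check_dependencies` is a hypothetical helper.

```python
import importlib.util
import shutil
import sys
from typing import Dict

# pip package name -> import name (assumed mapping for the installer's packages)
PY_MODULES = {
    "faster-whisper": "faster_whisper",
    "edge-tts": "edge_tts",
    "soundfile": "soundfile",
}


def check_dependencies() -> Dict[str, bool]:
    """Report which prerequisites are available to the current interpreter."""
    status = {
        "python>=3.8": sys.version_info >= (3, 8),
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }
    for pip_name, module in PY_MODULES.items():
        # find_spec locates the module without importing it
        status[pip_name] = importlib.util.find_spec(module) is not None
    return status
```

Run it with the skill's venv interpreter so the packages are checked where the installer put them.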

License

Open source. See included LICENSE file.

Support

For issues or questions:

Check the troubleshooting section
Review the references/ directory
Open an issue on the skill repository

Usage Guidance

Do not run the installer in a production environment until the issues below are addressed. Key things to consider before installing:

- Offline claim vs. network use: despite saying 'fully offline', the skill installs 'edge-tts' (which uses hosted Edge voices), and faster-whisper downloads models from Hugging Face by default. Expect network activity and model downloads unless you pre-download models and replace the code.
- Missing or inconsistent files: many parts of the docs and scripts reference tts_edge_wrapper.py, but the provided files include piper_tts.py instead. The installer copies tts_edge_wrapper.py, which is not in the manifest, so installation likely fails or leaves the system in a broken state.
- Path and privilege mismatches: scripts embed /root/.openclaw/... paths while the installer defaults to $HOME/.openclaw/tts. Running as root may hide these issues, so prefer a non-root test environment. Avoid running install.sh as root until you audit and adjust the paths.
- Broken or undefined variables: voice_integration.sh references PIPER_TTS_SCRIPT (undefined) and uses a VENV_PYTHON path (/tmp/venv/bin/python) that does not match install.sh's venv location. Expect runtime failures; review and fix these variables before use.
- Command injection risk: some functions build shell commands or Python -c strings by interpolating filenames without sanitization (the ffmpeg conversion and the faster-whisper transcribe snippet). If you feed in untrusted filenames, an attacker could execute arbitrary shell or Python code. Sanitize inputs, or avoid shell=True and direct string interpolation.
- Network and package installs: install.sh runs apt-get and pip operations. Review the packages, and run the installer in an isolated VM or container if you want to test it. For true offline operation, you will need to modify the code to use local TTS models and pre-downloaded STT models, and remove the reliance on 'edge-tts'.
Recommended next steps:

1. Do a manual code review and ensure tts_edge_wrapper.py (or an equivalent) is present and correct.
2. Fix hard-coded paths and undefined variables so they use the actual INSTALL_DIR/venv paths.
3. Replace string-interpolated shell/Python invocations with safe argument lists or proper escaping.
4. Decide whether you accept that Edge TTS and faster-whisper will use network resources; if not, modify the skill to use local-only components.
5. Test the installer and runtime in an isolated environment (container or VM), and avoid running as root until paths and permissions are corrected.
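For step 2, here is a sketch that derives every runtime path from one install directory. It assumes the installer's default of `$HOME/.openclaw/tts` and the `INSTALL_DIR/venv` layout described above. Reading `INSTALL_DIR` from the environment is an assumption. So are the `cache` and `config.json` locations under it, which are modeled on the /root paths in the docs. `resolve_paths` is hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def resolve_paths(env: Optional[Dict[str, str]] = None) -> Dict[str, Path]:
    """Derive runtime paths from one install directory instead of /root or /tmp."""
    env = os.environ if env is None else env
    home = Path(env.get("HOME", str(Path.home())))
    install_dir = Path(env.get("INSTALL_DIR", str(home / ".openclaw" / "tts")))
    return {
        "install_dir": install_dir,
        "venv_python": install_dir / "venv" / "bin" / "python",  # replaces /tmp/venv
        "cache_dir": install_dir / "cache",
        "config": install_dir / "config.json",
    }
```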

Capability Analysis

Type: OpenClaw Skill
Name: edge-tts-voice-system
Version: 2.1.0

The skill is classified as suspicious due to a critical shell injection vulnerability and deceptive documentation regarding privacy. In `scripts/voice_handler.py`, the `audio_to_text` function unsafely interpolates the `audio_file` variable into a shell command string executed via `subprocess.run(shell=True)`, which allows arbitrary command execution. Furthermore, `SKILL.md` and `README.md` repeatedly claim the system is 'fully offline' and that 'no data leaves your machine,' which is factually incorrect because the implementation uses the `edge-tts` library to send text to Microsoft's cloud servers. The code also contains hardcoded paths to the `/root/` directory and inconsistent variable usage in `scripts/voice_integration.sh`, indicating a high-risk and poorly maintained codebase.
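Where a shell string cannot be avoided, quote every interpolated value with `shlex.quote`. This keeps a hostile filename from being parsed as shell syntax. The sketch below assumes a POSIX shell, and its function names are hypothetical. An argument list without `shell=True` remains the safer default.

```python
import shlex
import subprocess


def build_ffmpeg_command(src: str, dst: str) -> str:
    """Build a shell command string in which each path is quoted as one word."""
    return (
        f"ffmpeg -y -i {shlex.quote(src)} "
        f"-ar 16000 -ac 1 {shlex.quote(dst)}"
    )


def convert_with_shell(src: str, dst: str) -> None:
    # shell=True is tolerable here only because every interpolated value is quoted.
    subprocess.run(build_ffmpeg_command(src, dst), shell=True, check=True)
```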

Capability Assessment

⚠ Purpose & Capability

The skill claims to be 'fully offline' and 'privacy-focused', but the installer and docs explicitly install 'edge-tts' (which uses hosted Edge services) and refer to faster-whisper auto-downloading models from HuggingFace. The README even shows wget commands to retrieve models from huggingface.co. These behaviors contradict the 'no internet required' claim. Additionally, several referenced files (tts_edge_wrapper.py) are mentioned throughout docs and install.sh but are not present in the provided file list — the repo contains piper_tts.py instead. That mismatch suggests either missing files or sloppy packaging.

⚠ Instruction Scope

Runtime instructions and scripts embed hard-coded root paths (/root/.openclaw/tts) and make assumptions about install locations and environment (VENV paths). Several runtime scripts build shell/Python -c commands by interpolating user-supplied filenames/paths directly into strings, e.g. ffmpeg commands using shell=True and Python -c snippets with '%s' insertion. This can lead to command or code injection if untrusted filenames are passed. There are also undefined/incorrect variable references (PIPER_TTS_SCRIPT is not defined, and the VENV_PYTHON default is '/tmp/venv/bin/python' while install.sh creates INSTALL_DIR/venv), so the runtime scope is inconsistent and fragile.
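The `%s` pattern in a Python `-c` snippet can be replaced by passing the path as a command-line argument. The child process then reads it from `sys.argv` as data. This is a minimal sketch: the snippet body is a placeholder that only echoes the path, not the skill's transcription code, and `run_snippet` is hypothetical.

```python
import subprocess
import sys

# The child reads the path from sys.argv instead of having it pasted into the
# source with '%s', so quotes or semicolons in a filename stay plain data.
TRANSCRIBE_SNIPPET = (
    "import sys\n"
    "path = sys.argv[1]\n"
    "print(path)  # placeholder: a real snippet would transcribe `path`\n"
)


def run_snippet(python: str, audio_path: str) -> str:
    """Run the snippet with `python -c`, passing audio_path as argv[1]."""
    result = subprocess.run(
        [python, "-c", TRANSCRIBE_SNIPPET, audio_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.rstrip("\n")
```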

ℹ Install Mechanism

There is no registry install spec, but an included install.sh performs apt-get and pip installs, creates a venv, and runs tests. The packages installed (faster-whisper, edge-tts, soundfile) are standard PyPI packages — not inherently malicious — and README shows model downloads from HuggingFace (well-known host). This is moderate risk: installer will attempt system package installs (apt-get) and pip installs and expects network access; it is not a silent arbitrary binary download from an unknown host, but the 'fully offline' claim is inaccurate given these network operations.

ℹ Credentials

The skill declares no required credentials or sensitive environment variables, which is appropriate. It does support optional OPENCLAW_EDGE_TTS_* environment variables for voice configuration. No secrets are requested in the manifest. However, scripts expect to read/write to /root/.openclaw paths in several places, which implicitly assumes elevated privileges or a root install; that is disproportionate to a user-space skill and could lead to accidental writes to root-owned locations.

ℹ Persistence & Privilege

The skill does not set always:true and does not request elevated platform privileges. It does instruct installing files to a local directory and creating a venv/config under the install directory. However, hard-coded references to /root paths and the suggested re-run after OpenClaw updates (with specific root-bound paths) give the skill an implicit assumption of installation under root; this is inconsistent with the install.sh default ($HOME/.openclaw/tts). Autonomous invocation is allowed (platform default) and is expected for skills.

Version History

v2.1.0

Full cleanup to Edge TTS outbound replies; refreshed reinstall flow and docs

v2.0.0

Rename from Lessac to Edge TTS; switch outbound replies to local Edge TTS; refresh install/reinstall docs and runtime wrappers

Metadata

Slug edge-tts-voice-system

Version 2.1.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is Edge TTS Voice System?

Local voice system for OpenClaw using faster-whisper for inbound transcription and Edge TTS for outbound replies. Use when you need private voice workflows,... It is an AI Agent Skill for Claude Code / OpenClaw, with 100 downloads so far.

How do I install Edge TTS Voice System?

Run "/install edge-tts-voice-system" in the OpenClaw or Claude Code chat to install it. Python 3.8+ and ffmpeg must also be available on the system.

Is Edge TTS Voice System free?

Yes, Edge TTS Voice System is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Edge TTS Voice System support?

Edge TTS Voice System runs anywhere OpenClaw / Claude Code is available. The bundled install.sh uses apt-get, however, so the manual install path assumes a Debian-based Linux system.

Who created Edge TTS Voice System?

It is built and maintained by Stephen Redmond - Straitéis AI (@stephenredmond-straiteis); the current version is v2.1.0.
