Description

Local voice system for OpenClaw using faster-whisper for inbound transcription and Edge TTS for outbound replies. Use when you need private voice workflows,...

Usage Instructions (SKILL.md)

Edge TTS Voice System

Name: Edge TTS Voice System
Author: stephenredmond-straiteis

A complete, privacy-focused voice system for OpenClaw that works entirely offline. No internet required, no data leaves your machine.

Features

Outbound replies: Edge TTS with cached audio output
Accurate STT: faster-whisper base model for speech recognition
Fully offline: No internet connection required
Privacy-focused: All processing happens locally
Easy integration: Ready-to-use Python and bash scripts
Voice conversations: Natural back-and-forth voice interactions

Quick Start

Installation

# Install the skill
clawhub install lessac_offline_voice_system

# Or manually from this directory
./scripts/install.sh

Basic Usage

from scripts.voice_handler import VoiceHandler

handler = VoiceHandler()

# Transcribe audio to text
text = handler.audio_to_text("voice_message.ogg")
print(f"You said: {text}")

# Generate voice response
audio_file = handler.text_to_audio("Hello, this is a voice response.")

Command Line

# Transcribe audio
./scripts/voice_integration.sh transcribe voice_message.ogg

# Generate TTS
./scripts/voice_integration.sh tts "Hello world" output.wav

# Full voice processing
./scripts/voice_integration.sh process voice_message.ogg

Components

1. Text-to-Speech (TTS)

Voice: Edge-supported voice (default en-IE-ConnorNeural)
Library: Edge TTS (edge-tts)
Quality: Natural speech with cached output
Sample rate: provider-defined
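The outbound path pairs Edge TTS with a file cache. The sketch below shows one way to do that with the `edge-tts` package's `Communicate(text, voice).save(path)` coroutine. The SHA-256 cache key, the `.mp3` suffix (edge-tts emits MP3 by default), and the helper names `cache_path` and `synthesize_cached` are assumptions, not the skill's actual wrapper. Synthesis itself runs on Microsoft's Edge service, so text leaves the machine on every cache miss.

```python
import asyncio
import hashlib
from pathlib import Path

# Hypothetical cache layout: one file per (voice, text) pair.
DEFAULT_VOICE = "en-IE-ConnorNeural"


def cache_path(text: str, voice: str, cache_dir: Path) -> Path:
    """Return a deterministic cache file path for (voice, text)."""
    digest = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{digest}.mp3"


async def _edge_tts_save(text: str, voice: str, out_path: Path) -> None:
    # Imported lazily so the cache logic works without edge-tts installed.
    import edge_tts

    # Sends the text to Microsoft's Edge TTS service and writes the audio stream.
    await edge_tts.Communicate(text, voice).save(str(out_path))


def synthesize_cached(text, voice=DEFAULT_VOICE, cache_dir=Path("tts_cache"), synth=None):
    """Return a path to audio for `text`, calling the TTS backend only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    out_path = cache_path(text, voice, cache_dir)
    if not out_path.exists():
        if synth is None:
            asyncio.run(_edge_tts_save(text, voice, out_path))
        else:
            synth(text, voice, out_path)  # injectable backend, e.g. for tests
    return out_path
```

Because the voice is hashed together with the text, changing the configured voice never serves audio that was generated with a previous voice.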

2. Speech-to-Text (STT)

Model: faster-whisper base
Accuracy: High, comparable to cloud services
Languages: Multi-language support (auto-detected)
Speed: ~2 seconds for typical audio
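Below is a minimal transcription sketch using faster-whisper's `WhisperModel` API, assuming CPU inference with int8 weights. `load_model`, `transcribe`, and `join_segments` are hypothetical helper names, and `scripts/voice_handler.py` may structure this differently. Caching the loaded model avoids paying the model load cost on every message.

```python
from functools import lru_cache


def join_segments(texts) -> str:
    """Join segment texts into one string, dropping empty pieces."""
    return " ".join(t.strip() for t in texts if t.strip())


@lru_cache(maxsize=None)
def load_model(model_size: str = "base"):
    # Imported lazily; the first load may download weights from Hugging Face.
    from faster_whisper import WhisperModel

    # int8 on CPU keeps memory use modest; use device="cuda" on a GPU host.
    return WhisperModel(model_size, device="cpu", compute_type="int8")


def transcribe(audio_path: str, model_size: str = "base") -> str:
    """Transcribe an audio file and return the recognized text."""
    # No language argument: Whisper auto-detects the spoken language.
    segments, _info = load_model(model_size).transcribe(audio_path)
    # `segments` is a generator; decoding happens while it is consumed here.
    return join_segments(segment.text for segment in segments)
```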

3. Audio Processing

Formats: OGG/Opus, WAV, MP3 (via ffmpeg)
Conversion: Automatic format handling
Quality: 16kHz mono for optimal recognition
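The conversion step can be expressed as a single ffmpeg invocation. This sketch, with hypothetical function names, builds the command as an argument list and runs it without `shell=True`. A filename is therefore always passed to ffmpeg as one argument and is never interpreted by a shell. The `-ar 16000` and `-ac 1` options produce the 16 kHz mono audio described above.

```python
import subprocess
from typing import List


def ffmpeg_to_wav16k_cmd(src: str, dst: str) -> List[str]:
    """Build an ffmpeg argument list that converts an input file to 16 kHz mono WAV."""
    return [
        "ffmpeg",
        "-y",            # overwrite dst if it already exists
        "-i", src,       # input: OGG/Opus, MP3, WAV, ...
        "-ar", "16000",  # resample to 16 kHz
        "-ac", "1",      # downmix to mono
        dst,
    ]


def convert_to_wav16k(src: str, dst: str) -> None:
    # List form (no shell=True): filenames are never parsed by a shell.
    subprocess.run(ffmpeg_to_wav16k_cmd(src, dst), check=True, capture_output=True)
```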

Performance

TTS Load time: ~2 seconds (one-time)
TTS Generation: ~3-4 seconds
STT Transcription: ~2 seconds
Total response time: 5-7 seconds

Integration with OpenClaw

Automatic Voice Processing

When installed, the skill can be configured to automatically:

Detect incoming voice messages
Transcribe them silently
Generate AI responses
Convert responses to voice
Send voice replies back
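The five steps above can be sketched as one function that receives each stage as a callable. The extension-based detection and the `VOICE_EXTENSIONS` set are assumptions for illustration. A real OpenClaw channel may expose a MIME type or a message flag instead, and none of these names come from the skill itself.

```python
from pathlib import Path
from typing import Callable, Optional

# Hypothetical heuristic: treat these audio extensions as voice messages.
VOICE_EXTENSIONS = {".ogg", ".opus", ".oga", ".wav", ".mp3", ".m4a"}


def is_voice_message(path: str) -> bool:
    """Step 1: detect a voice message by file extension."""
    return Path(path).suffix.lower() in VOICE_EXTENSIONS


def process_voice_message(
    path: str,
    transcribe: Callable[[str], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str], str],
    send: Callable[[str], None],
) -> Optional[str]:
    """Run steps 1-5; return the reply audio path, or None if not a voice message."""
    if not is_voice_message(path):
        return None
    text = transcribe(path)               # Step 2: transcribe silently
    reply_text = respond(text)            # Step 3: generate the AI response
    reply_audio = synthesize(reply_text)  # Step 4: convert the response to voice
    send(reply_audio)                     # Step 5: send the voice reply back
    return reply_audio
```

In practice, `transcribe` and `synthesize` would wrap `VoiceHandler.audio_to_text` and `VoiceHandler.text_to_audio`, and `respond` would call your agent.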

OpenClaw reply TTS configuration

This skill does not use OpenClaw's built-in reply TTS path. Instead, it sends replies through its own Edge TTS path and stores cached audio under /root/.openclaw/tts/cache.

Default outbound voice:

en-IE-ConnorNeural

Relevant files:

tts_edge_wrapper.py
voice_handler.py
voice_integration.sh
scripts/install.sh

To change the voice, set OPENCLAW_EDGE_TTS_VOICE (the example below shows the default value):

export OPENCLAW_EDGE_TTS_VOICE="en-IE-ConnorNeural"

Replace the value with any other Edge-supported voice name.
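The following sketch shows how a script might resolve the voice from `OPENCLAW_EDGE_TTS_VOICE` and fall back to `en-IE-ConnorNeural`. It also includes a helper that lists available voices through `edge_tts.list_voices()`. The function names and the locale filter are hypothetical. Listing voices queries Microsoft's service, so that call needs network access.

```python
import os

DEFAULT_VOICE = "en-IE-ConnorNeural"


def resolve_voice(env=None) -> str:
    """Return the configured Edge voice, falling back to the skill's default."""
    env = os.environ if env is None else env
    voice = env.get("OPENCLAW_EDGE_TTS_VOICE", "").strip()
    return voice or DEFAULT_VOICE


async def list_edge_voices(locale_prefix: str = "en-"):
    """List Edge voice short names whose locale starts with `locale_prefix`."""
    import edge_tts  # lazy import; the call below contacts Microsoft's service

    voices = await edge_tts.list_voices()
    return sorted(v["ShortName"] for v in voices if v["Locale"].startswith(locale_prefix))
```

For example, `asyncio.run(list_edge_voices("en-IE"))` would return the Irish English short names, and any of them is a valid value for the variable.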

Reinstall after OpenClaw updates

After an OpenClaw system update, rerun the installer to restore the voice stack:

cd /root/.openclaw/workspace/skills/lessac_offline_voice_system
./scripts/install.sh

This refreshes:

the Python venv dependencies (faster-whisper, edge-tts, soundfile)
the runtime cache directory
the local voice wrappers
the config file under /root/.openclaw/tts/config.json

Manual Integration

# In your OpenClaw agent or custom script
import sys

# Make the skill's scripts/ directory importable (adjust to your install path)
sys.path.append("/path/to/skill/scripts")
from voice_handler import VoiceHandler


class YourAgent:
    def __init__(self):
        self.voice = VoiceHandler()

    def generate_response(self, text):
        # Placeholder: replace with your AI logic
        return f"You said: {text}"

    def handle_voice_message(self, audio_file):
        # Transcribe the incoming voice message
        text = self.voice.audio_to_text(audio_file)

        # Generate a text response
        response = self.generate_response(text)

        # Convert the response to voice and return the generated audio file
        return self.voice.text_to_audio(response)

Configuration

Voice Model Selection

The skill uses Edge TTS by default. To use a different voice:

Set OPENCLAW_EDGE_TTS_VOICE to a supported Edge voice
Re-run the installer to refresh the cache and wrappers

STT Model Selection

Change the faster-whisper model size in scripts/voice_handler.py:

"tiny": Fastest, lower accuracy
"base": Default, good balance
"small": Higher accuracy, slower
"medium": Highest accuracy of these options, slowest
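Instead of editing `scripts/voice_handler.py` for every change, the model size could be read from the environment. This sketch uses a hypothetical `VOICE_STT_MODEL` variable, which the skill does not define, and rejects sizes outside the four listed above. The returned string is what you would pass to faster-whisper's `WhisperModel`.

```python
import os

# Restricted to the four sizes documented above.
STT_MODEL_SIZES = ("tiny", "base", "small", "medium")


def resolve_stt_model(env=None, default: str = "base") -> str:
    """Pick the faster-whisper model size from the hypothetical VOICE_STT_MODEL variable."""
    env = os.environ if env is None else env
    size = env.get("VOICE_STT_MODEL", default).strip().lower()
    if size not in STT_MODEL_SIZES:
        raise ValueError(f"unsupported STT model {size!r}; choose one of {STT_MODEL_SIZES}")
    return size
```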

Troubleshooting

Common Issues

"No module named 'edge_tts'"
```
pip install edge-tts
```
"ffmpeg not found"
```
sudo apt-get install ffmpeg
```
Out of memory with large models
- Use "tiny" or "base" STT model
Slow TTS generation
- First generation loads model (~2s)
- Subsequent generations are faster (~0.3s per sentence)
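A missing Python module or a missing ffmpeg binary can be caught before the first voice message with a small preflight check. This is a hypothetical helper, not part of the skill. Run it with the skill's venv Python so that it inspects the same environment the scripts use.

```python
import importlib.util
import shutil


def preflight(modules=("faster_whisper", "edge_tts", "soundfile"), binaries=("ffmpeg",)):
    """Return a list of human-readable problems; an empty list means every check passed."""
    problems = []
    for name in binaries:
        if shutil.which(name) is None:
            problems.append(f"{name} not found on PATH")
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python module {name!r} is not installed")
    return problems
```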

Debug Mode

Enable debug output:

export VOICE_DEBUG=1
./scripts/voice_integration.sh process audio.ogg
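The source does not show how the scripts read `VOICE_DEBUG`. One plausible Python-side pattern maps it to a logging level, as in this hypothetical `configure_logging` helper. `force=True` (Python 3.8+) lets the function reconfigure logging even if it was already set up.

```python
import logging
import os


def configure_logging(env=None) -> int:
    """Enable DEBUG logging when VOICE_DEBUG=1, otherwise INFO; return the level used."""
    env = os.environ if env is None else env
    level = logging.DEBUG if env.get("VOICE_DEBUG") == "1" else logging.INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace any handlers configured earlier
    )
    return level
```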

Files

scripts/install.sh - Installation script
scripts/voice_handler.py - Main Python handler
scripts/piper_tts.py - Edge TTS wrapper
scripts/voice_integration.sh - Bash interface
references/voice_models.md - Voice model information
assets/ - Voice model files (downloaded during install)

Dependencies

Python 3.8+
ffmpeg
Python packages (installed automatically):
- faster-whisper
- edge-tts
- soundfile

License

Open source. See included LICENSE file.

Support

For issues or questions:

Check the troubleshooting section
Review the references/ directory
Open an issue on the skill repository

Security Recommendations

Do not run the installer in a production environment until the issues below are addressed. Key things to consider before installing:

- Offline claim vs network use: despite saying 'fully offline', the skill installs 'edge-tts' (uses hosted Edge voices) and faster-whisper will download models from HuggingFace by default. Expect network activity and model downloads unless you pre-download models and replace the code.
- Missing / inconsistent files: many parts of the docs/scripts reference tts_edge_wrapper.py, but the provided files include piper_tts.py instead. The installer copies tts_edge_wrapper.py but that file is not in the manifest; installation likely fails or leaves the system in a broken state.
- Path / privilege mismatches: scripts embed /root/.openclaw/... paths while the installer defaults to $HOME/.openclaw/tts. Running as root may hide these issues; prefer a non-root test environment. Avoid running install.sh as root until you audit/adjust paths.
- Broken/undefined variables & bugs: voice_integration.sh references PIPER_TTS_SCRIPT (undefined) and uses a VENV_PYTHON path (/tmp/venv/bin/python) that does not match install.sh's venv location. Expect runtime failures; review and fix these variables before use.
- Command injection risk: some functions build shell commands or Python -c strings by interpolating filenames without sanitization (ffmpeg conversion, faster-whisper transcribe snippet). If you feed untrusted filenames, an attacker could execute arbitrary shell/Python code. Sanitize inputs or avoid shell=True / direct string interpolation.
- Network & package installs: install.sh runs apt-get/pip operations. Review the packages and run in an isolated VM/container if you want to test. If you need true offline operation, you'll need to modify the code to use local TTS models and pre-downloaded STT models and remove 'edge-tts' reliance.

Recommended next steps:

1) Do a manual code review and ensure tts_edge_wrapper.py or equivalent is present and correct.
2) Fix hard-coded paths and undefined variables to use the actual INSTALL_DIR/venv paths.
3) Replace string-interpolated shell/Python invocations with safe argument lists or proper escaping.
4) Decide whether you accept that Edge TTS and faster-whisper will use network resources; if not, modify to local-only components.
5) Test the installer and runtime in an isolated environment (container or VM) and avoid running as root until paths & permissions are corrected.
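The command-injection concern above comes from splicing filenames into `python -c` code or shell strings. A safer pattern passes the filename as a separate argument that the child process reads from `sys.argv`, so no filename is ever parsed as Python or shell syntax. In this sketch, a hypothetical snippet stands in for the skill's faster-whisper code.

```python
import subprocess
import sys

# Hypothetical stand-in for the skill's transcription snippet.
SNIPPET = "import sys; print('transcribing', sys.argv[1])"


def run_snippet_safely(audio_file: str) -> str:
    """Run a Python -c snippet with the filename as argv, never interpolated into code."""
    # Unsafe pattern to avoid:
    #   subprocess.run(f"python -c \"...{audio_file}...\"", shell=True)
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET, audio_file],  # list form: no shell parsing
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

With this pattern, a filename such as `x'); import os; ('.ogg` is printed verbatim and is not executed.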

Analysis

Type: OpenClaw Skill
Name: edge-tts-voice-system
Version: 2.1.0

The skill is classified as suspicious due to a critical shell injection vulnerability and deceptive documentation regarding privacy. In `scripts/voice_handler.py`, the `audio_to_text` function unsafely interpolates the `audio_file` variable into a shell command string executed via `subprocess.run(shell=True)`, which allows for arbitrary command execution. Furthermore, `SKILL.md` and `README.md` repeatedly claim the system is 'fully offline' and that 'no data leaves your machine,' which is factually incorrect as the implementation uses the `edge-tts` library to send text to Microsoft's cloud servers. The code also contains hardcoded paths to the `/root/` directory and inconsistent variable usage in `scripts/voice_integration.sh`, indicating a high-risk and poorly maintained codebase.

Capability Assessment

⚠ Purpose & Capability

The skill claims to be 'fully offline' and 'privacy-focused', but the installer and docs explicitly install 'edge-tts' (which uses hosted Edge services) and refer to faster-whisper auto-downloading models from HuggingFace. The README even shows wget commands to retrieve models from huggingface.co. These behaviors contradict the 'no internet required' claim. Additionally, several referenced files (tts_edge_wrapper.py) are mentioned throughout docs and install.sh but are not present in the provided file list — the repo contains piper_tts.py instead. That mismatch suggests either missing files or sloppy packaging.

⚠ Instruction Scope

Runtime instructions and scripts embed hard-coded root paths (/root/.openclaw/tts) and make assumptions about install locations and environment (VENV paths). Several runtime scripts build shell/Python -c commands by interpolating user-supplied filenames/paths directly into strings, e.g. ffmpeg commands using shell=True and Python -c snippets with '%s' insertion. This can lead to command/code injection if untrusted filenames are passed. There are also undefined/incorrect variable references (PIPER_TTS_SCRIPT not defined, VENV_PYTHON default '/tmp/venv/bin/python' while install.sh creates INSTALL_DIR/venv), so the runtime scope is inconsistent and fragile.

ℹ Install Mechanism

There is no registry install spec, but an included install.sh performs apt-get and pip installs, creates a venv, and runs tests. The packages installed (faster-whisper, edge-tts, soundfile) are standard PyPI packages — not inherently malicious — and README shows model downloads from HuggingFace (well-known host). This is moderate risk: installer will attempt system package installs (apt-get) and pip installs and expects network access; it is not a silent arbitrary binary download from an unknown host, but the 'fully offline' claim is inaccurate given these network operations.

ℹ Credentials

The skill declares no required credentials or sensitive environment variables, which is appropriate. It does support optional OPENCLAW_EDGE_TTS_* environment variables for voice configuration. No secrets are requested in the manifest. However, scripts expect to read/write to /root/.openclaw paths in several places, which implicitly assumes elevated privileges or a root install; that is disproportionate to a user-space skill and could lead to accidental writes to root-owned locations.

ℹ Persistence & Privilege

The skill does not set always:true and does not request elevated platform privileges. It does instruct installing files to a local directory and creating a venv/config under the install directory. However, hard-coded references to /root paths and the suggested re-run after OpenClaw updates (with specific root-bound paths) give the skill an implicit assumption of installation under root; this is inconsistent with the install.sh default ($HOME/.openclaw/tts). Autonomous invocation is allowed (platform default) and is expected for skills.
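To remove the `/root` versus `$HOME` mismatch, every runtime path can be derived from one base directory. This sketch assumes the install.sh default of `$HOME/.openclaw/tts` and the `venv`, `cache`, and `config.json` layout described in this document. The `OPENCLAW_TTS_DIR` override variable and the `resolve_paths` helper are hypothetical.

```python
import os
from pathlib import Path


def resolve_paths(env=None) -> dict:
    """Derive all runtime paths from one base directory instead of hard-coded /root paths."""
    env = os.environ if env is None else env
    # Hypothetical override; otherwise use install.sh's default $HOME/.openclaw/tts.
    base = Path(env.get("OPENCLAW_TTS_DIR") or Path.home() / ".openclaw" / "tts").expanduser()
    return {
        "base": base,
        "venv_python": base / "venv" / "bin" / "python",
        "cache": base / "cache",
        "config": base / "config.json",
    }
```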

Version History

v2.1.0

Full cleanup to Edge TTS outbound replies; refreshed reinstall flow and docs

v2.0.0

Rename from Lessac to Edge TTS; switch outbound replies to local Edge TTS; refresh install/reinstall docs and runtime wrappers

Metadata

Slug edge-tts-voice-system

Version 2.1.0

License MIT-0

Total installs 0

Current installs 0

Version count 2

FAQ