Description

Give your agent a voice. Use when the user wants the agent to speak, read aloud, or have voice responses.

Usage (SKILL.md)

Her Voice 🎙️

Name: Her Voice
Author: matusvojtek

Give your agent a voice. Audio responses powered by Kokoro TTS — a compact, naturally expressive model running entirely on-device.

✨ Features

Low-latency responses thanks to on-the-fly audio streaming. 100% free, no API keys required. Inspired by Samantha and Sky.

⚡ On-the-fly Streaming — Audio plays as it generates, very low latency
👄 The Voice of an angel — Cutting-edge local text-to-speech model Kokoro TTS
🧠 TTS Daemon — Keep the model warm in RAM for instant responses (can be disabled to save RAM)
🖥️ Persist Mode — Drag & drop audio, paste text, use as a voice station
🔧 Fully Configurable — Voice, speed, visualizer, notification sounds
🍎 MLX + PyTorch — Native Metal acceleration on Apple Silicon, PyTorch fallback on Linux and Intel Macs
🎨 Real-time Visualizer — Floating 60fps LED bars that react to speech (macOS only)

First-Run Setup

python3 SKILL_DIR/scripts/setup.py

Note: SKILL_DIR is the root directory of this skill — the agent resolves it automatically when running commands.

The setup wizard will:

Detect platform and select TTS engine (MLX on Apple Silicon, PyTorch elsewhere)
Find or install the appropriate TTS backend (mlx-audio or kokoro)
Install espeak-ng (Homebrew on macOS, apt on Linux)
Patch espeak loader if needed (macOS compatibility)
Compile the native visualizer binary (macOS only)
Download the Kokoro model
Create config at ~/.her-voice/config.json
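The engine-selection rule in the first step can be sketched in Python. This is an illustration, not the code in setup.py. `select_engine` is a hypothetical name, and the sketch assumes "Apple Silicon" is detected as `platform.system() == "Darwin"` with `platform.machine() == "arm64"`. An explicit `tts_engine` config value (`auto`, `mlx`, or `pytorch`) takes precedence over detection.

```python
import platform
from typing import Optional


def select_engine(tts_engine: str = "auto",
                  system: Optional[str] = None,
                  machine: Optional[str] = None) -> str:
    """Hypothetical helper: choose "mlx" or "pytorch" as the setup wizard describes."""
    if tts_engine not in ("auto", "mlx", "pytorch"):
        raise ValueError(f"tts_engine must be auto, mlx, or pytorch, not {tts_engine!r}")
    system = system or platform.system()     # e.g. "Darwin", "Linux", "Windows"
    machine = machine or platform.machine()  # e.g. "arm64", "x86_64"
    if system == "Windows":
        raise RuntimeError("Her Voice does not support Windows")
    if tts_engine != "auto":
        return tts_engine  # an explicit config value wins over detection
    apple_silicon = system == "Darwin" and machine == "arm64"
    return "mlx" if apple_silicon else "pytorch"
```

An x86_64 Python running under Rosetta reports `x86_64`, so a check like this would pick PyTorch even on an Apple Silicon Mac.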

Check status anytime:

python3 SKILL_DIR/scripts/setup.py status

Post-Setup: Names & Pronunciation

After setup, configure the agent and user names:

python3 SKILL_DIR/scripts/config.py set agent_name "Jackie"
python3 SKILL_DIR/scripts/config.py set user_name "Matúš"
python3 SKILL_DIR/scripts/config.py set user_name_tts "Mah-toosh"

TTS pronunciation tip: If the user's name is non-English, figure out a phonetic English spelling that Kokoro will pronounce correctly. Store it in user_name_tts and use that spelling whenever speaking the name aloud. The real name stays in user_name for display purposes.
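A minimal sketch of that rule follows. It assumes the config file has already been loaded into a dict, and the helper names are hypothetical, not part of the skill's scripts. The chat text uses the real name, while the text sent to TTS uses the phonetic spelling when one is set.

```python
def spoken_user_name(config: dict) -> str:
    """Name to pass to TTS: the phonetic spelling if set, else the real name."""
    return config.get("user_name_tts") or config.get("user_name", "")


def greeting(config: dict) -> tuple:
    """Return (text shown in chat, text sent to speak.py) for a simple greeting."""
    shown = f"Hi, {config.get('user_name', '')}!"
    spoken = f"Hi, {spoken_user_name(config)}!"
    return shown, spoken
```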

Speaking Text

# Basic usage
python3 SKILL_DIR/scripts/speak.py "Hello, world!"

# Skip visualizer for this call
python3 SKILL_DIR/scripts/speak.py --no-viz "Quick note"

# Save to file instead of playing
python3 SKILL_DIR/scripts/speak.py --save /tmp/output.wav "Save this"

# Override voice or speed
python3 SKILL_DIR/scripts/speak.py --voice af_bella --speed 1.2 "Faster!"

# Pipe text from stdin
echo "Piped text" | python3 SKILL_DIR/scripts/speak.py

Options

Flag	Description
`--no-viz`	Skip the visualizer for this call
`--persist`	Keep visualizer open after playback ends
`--save PATH`	Save audio to WAV file instead of playing
`--voice NAME`	Override the configured voice
`--speed N`	Override the configured speed multiplier
`--mode MODE`	Override visualizer mode (`v2` or `classic`)
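When an agent calls speak.py from Python instead of a shell, it can assemble the command from the flags in this table. The sketch below uses a hypothetical helper, `speak_argv`, and a plain `skill_dir` string that stands in for SKILL_DIR. The positive-speed check is an assumption added for illustration, not a documented rule.

```python
from typing import List, Optional

VIZ_MODES = ("v2", "classic")


def speak_argv(skill_dir: str, text: str, *, no_viz: bool = False,
               persist: bool = False, save: Optional[str] = None,
               voice: Optional[str] = None, speed: Optional[float] = None,
               mode: Optional[str] = None) -> List[str]:
    """Build a speak.py command line from the documented flags."""
    argv = ["python3", f"{skill_dir}/scripts/speak.py"]
    if no_viz:
        argv.append("--no-viz")
    if persist:
        argv.append("--persist")
    if save is not None:
        argv += ["--save", save]
    if voice is not None:
        argv += ["--voice", voice]
    if speed is not None:
        if speed <= 0:
            raise ValueError("speed must be a positive multiplier")
        argv += ["--speed", str(speed)]
    if mode is not None:
        if mode not in VIZ_MODES:
            raise ValueError(f"mode must be one of {VIZ_MODES}")
        argv += ["--mode", mode]
    argv.append(text)  # the text to speak goes last, as in the examples
    return argv
```

Passing the resulting list to `subprocess.run` (without `shell=True`) avoids quoting problems when the text contains quotes or `!`.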

Agent Workflow

When the user wants voice responses:

Check voice mode — is voice enabled or did the user ask for it?
Play the notification sound (macOS only; gives instant feedback while TTS generates):
```
afplay /System/Library/Sounds/Blow.aiff &
```

Speak the response:

python3 SKILL_DIR/scripts/speak.py "Response text here"

Always provide text alongside voice — accessibility matters.
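The workflow steps above can be sketched as a small Python orchestration. This is hypothetical glue, not a script shipped with the skill. It assumes the config dict uses the `notification_sound` keys from Key Settings, and that the chime path follows the `/System/Library/Sounds/NAME.aiff` pattern from the `afplay` example. The chime starts without waiting, like `&` in the shell, so it overlaps TTS generation. The caller is still responsible for showing the reply text.

```python
import subprocess
import sys


def voice_commands(text: str, skill_dir: str, config: dict,
                   platform_name: str = sys.platform) -> list:
    """Build the commands for one voice reply: optional chime, then speech."""
    commands = []
    sound_cfg = config.get("notification_sound", {})
    # afplay and /System/Library/Sounds exist only on macOS.
    if platform_name == "darwin" and sound_cfg.get("enabled", True):
        sound = sound_cfg.get("sound", "Blow")
        commands.append(["afplay", f"/System/Library/Sounds/{sound}.aiff"])
    commands.append(["python3", f"{skill_dir}/scripts/speak.py", text])
    return commands


def run_voice_reply(commands: list) -> None:
    """Start any chime in the background, then block until speech finishes."""
    *chimes, speech = commands
    for chime in chimes:
        subprocess.Popen(chime)  # not waited on; a short chime ends on its own
    subprocess.run(speech, check=True)
```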

Notification Sound

The notification sound plays instantly (~0.1s) while TTS generates (~0.3-3s). This gives the user immediate feedback that the agent is responding.

Configure in ~/.her-voice/config.json:

{
  "notification_sound": {
    "enabled": true,
    "sound": "Blow"
  }
}

Available macOS sounds: Blow, Bottle, Frog, Funk, Glass, Hero, Morse, Ping, Pop, Purr, Sosumi, Submarine, Tink. Located in /System/Library/Sounds/.
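A sketch of how the `notification_sound` block could be resolved to a file path. It validates the name against the list above before building the path, which also rejects values like `../something`. The helper name and the whitelist approach are assumptions; the skill's own validation is not described here.

```python
from pathlib import Path

MACOS_SOUNDS = {
    "Blow", "Bottle", "Frog", "Funk", "Glass", "Hero", "Morse",
    "Ping", "Pop", "Purr", "Sosumi", "Submarine", "Tink",
}
SOUNDS_DIR = Path("/System/Library/Sounds")


def notification_sound_path(config: dict):
    """Return the .aiff path for the configured chime, or None when disabled."""
    cfg = config.get("notification_sound", {})
    if not cfg.get("enabled", True):
        return None
    name = cfg.get("sound", "Blow")  # documented default
    if name not in MACOS_SOUNDS:
        raise ValueError(f"unknown sound {name!r}; choose one of {sorted(MACOS_SOUNDS)}")
    return SOUNDS_DIR / f"{name}.aiff"
```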

TTS Daemon

The daemon keeps the Kokoro model warm in RAM, eliminating ~1.1s of startup overhead per call.

The daemon auto-resolves the mlx-audio venv — no need to find the venv Python manually.

# Start (persists in background)
nohup python3 SKILL_DIR/scripts/daemon.py start > /tmp/her-voice-daemon.log 2>&1 & disown

# Status
python3 SKILL_DIR/scripts/daemon.py status

# Stop
python3 SKILL_DIR/scripts/daemon.py stop

# Restart
python3 SKILL_DIR/scripts/daemon.py restart
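For agents that launch the daemon from Python rather than a shell, the following is a rough equivalent of the `nohup ... & disown` line. It is a sketch, not part of the skill. `should_start_daemon` and `start_detached` are hypothetical helpers, and reading `daemon.auto_start` this way assumes the nested config layout shown under Key Settings.

```python
import subprocess
from pathlib import Path


def should_start_daemon(config: dict) -> bool:
    """daemon.auto_start is advisory: the agent, not the daemon, acts on it."""
    return bool(config.get("daemon", {}).get("auto_start", True))


def start_detached(argv: list, log_path: Path) -> subprocess.Popen:
    """Start argv in the background with stdout and stderr appended to log_path."""
    log = open(log_path, "ab")
    try:
        return subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,  # same as 2>&1
            start_new_session=True,    # own session, so a terminal hangup does not reach it
        )
    finally:
        log.close()  # the child keeps its own copy of the file descriptor
```

Usage might look like `start_detached(["python3", f"{skill_dir}/scripts/daemon.py", "start"], Path("/tmp/her-voice-daemon.log"))`, where `skill_dir` holds the resolved SKILL_DIR.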

speak.py auto-detects the daemon: uses it if available, falls back to direct model loading.
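One way to implement that detection is to try connecting to the daemon's Unix socket and fall back when nothing answers. This is a sketch under assumptions: it presumes a stream socket at the default `daemon.socket_path`, and it only tests connectability. The request format the daemon expects is not documented here, and speak.py's actual logic may differ.

```python
import socket
from pathlib import Path

DEFAULT_SOCKET = Path.home() / ".her-voice" / "tts.sock"


def daemon_available(socket_path=DEFAULT_SOCKET, timeout: float = 0.2) -> bool:
    """Return True if something accepts connections on the daemon's Unix socket."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(str(socket_path))
        return True
    except OSError:
        # Missing file, stale socket (connection refused), or timeout.
        return False
    finally:
        sock.close()


def choose_backend(socket_path=DEFAULT_SOCKET) -> str:
    """Use the warm daemon when reachable, otherwise load the model directly."""
    return "daemon" if daemon_available(socket_path) else "direct"
```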

The daemon is optional. Without it, speech still works — just ~1s slower per call as the model loads each time. Skip the daemon to save ~2.3GB RAM.

Note: The daemon writes its PID file and socket after the model is fully loaded and ready to accept connections. They live in ~/.her-voice/ with restricted permissions (owner-only access). The daemon won't survive a reboot — start it again after restart if needed.
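A status check in the spirit of this note could verify that a file is owner-only and that the PID it records still belongs to a running process. The sketch is hypothetical: the PID file's name is not given in this document (`daemon.pid` in the test is made up), and refusing symlinks mirrors the symlink hardening mentioned in the security notes rather than any documented code.

```python
import os
import stat
from pathlib import Path


def owner_only(path) -> bool:
    """True if path is not a symlink and grants no group or other permissions."""
    st = os.lstat(path)
    if stat.S_ISLNK(st.st_mode):
        return False  # refuse symlinks outright
    return (st.st_mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def pid_alive(pid_file) -> bool:
    """Read a PID file and check whether that process still exists."""
    try:
        pid = int(Path(pid_file).read_text().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable, or malformed PID file
    if pid <= 0:
        return False  # 0 and negatives address process groups, not one process
    try:
        os.kill(pid, 0)  # signal 0 only checks existence; nothing is delivered
    except ProcessLookupError:
        return False  # stale PID file: the daemon has exited
    except PermissionError:
        return True   # the PID exists but belongs to another user
    return True
```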

Visualizer

A floating overlay with three animated LED bars that react to speech in real-time. 60fps, native macOS (Cocoa + AVFoundation). macOS only — on other platforms, audio plays without the visualizer.

Modes

v2 (default) — Three-tier pure red, center raw amplitude, sides with lag
classic — Original smooth gradient look

Controls

Key	Action
ESC	Quit
Space	Pause/Resume (file mode)
← →	Seek ±5s (file mode)
⌘V	Paste text to speak (persist mode)

Persist Mode

Keep the visualizer on screen between playbacks. Use as a standalone voice station:

# Launch in persist mode (stays open, idle breathing animation)
~/.her-voice/bin/her-voice-viz --persist

# Stream mode + persist (stays open after speech ends)
python3 SKILL_DIR/scripts/speak.py --persist "Hello!"

In persist mode:

Drag & drop audio files (.wav, .mp3, .aiff, .m4a) onto the visualizer to play them
⌘V pastes clipboard text → streams directly from TTS daemon with full visualizer animation
Idle breathing — subtle center bar pulse when waiting for input

Standalone Usage

# Play a file with visualizer
~/.her-voice/bin/her-voice-viz --audio /path/to/file.wav

# Demo mode (simulated audio)
~/.her-voice/bin/her-voice-viz --demo

# Stream raw PCM
cat audio.raw | ~/.her-voice/bin/her-voice-viz --stream --sample-rate 24000
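To feed a WAV file into `--stream`, its header has to be stripped first. The sketch below does that with Python's `wave` module and pipes the frames to the visualizer. The sample format is an assumption. This document only fixes the rate (24000 Hz), so mono 16-bit samples are assumed and should be checked against the visualizer source (HerVoice.swift) before relying on them.

```python
import os
import subprocess
import wave

SAMPLE_RATE = 24000  # matches --sample-rate in the command above


def wav_to_raw(wav_path) -> bytes:
    """Return the PCM frames of a WAV file without its header."""
    with wave.open(str(wav_path), "rb") as w:
        if w.getframerate() != SAMPLE_RATE:
            raise ValueError(f"expected {SAMPLE_RATE} Hz audio, got {w.getframerate()} Hz")
        # Assumed input format for --stream: mono, 16-bit samples (unconfirmed).
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        return w.readframes(w.getnframes())


def stream_to_visualizer(pcm: bytes, viz: str = "~/.her-voice/bin/her-voice-viz") -> None:
    """Pipe raw PCM into the visualizer, like `cat audio.raw | her-voice-viz --stream`."""
    argv = [os.path.expanduser(viz), "--stream", "--sample-rate", str(SAMPLE_RATE)]
    subprocess.run(argv, input=pcm, check=True)
```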

Disable Visualizer

python3 SKILL_DIR/scripts/config.py set visualizer.enabled false

Configuration

Config file: ~/.her-voice/config.json

# View all settings
python3 SKILL_DIR/scripts/config.py status

# Get a value
python3 SKILL_DIR/scripts/config.py get voice

# Set a value (dot notation for nested keys)
python3 SKILL_DIR/scripts/config.py set speed 1.1
python3 SKILL_DIR/scripts/config.py set visualizer.mode classic
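Dot notation maps each segment to one level of nesting, so `visualizer.mode` refers to `config["visualizer"]["mode"]`. The sketch below illustrates that mapping. The helpers are hypothetical, and so is `parse_cli_value`: config.py may convert command-line strings such as `1.1` or `false` differently.

```python
import json


def get_dotted(config: dict, key: str):
    """Look up "visualizer.mode" as config["visualizer"]["mode"]."""
    node = config
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(key)
        node = node[part]
    return node


def set_dotted(config: dict, key: str, value) -> None:
    """Set a nested value, creating intermediate objects as needed."""
    *parents, leaf = key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
        if not isinstance(node, dict):
            raise TypeError(f"{part!r} in {key!r} is not an object")
    node[leaf] = value


def parse_cli_value(raw: str):
    """Interpret CLI text as JSON when possible ("1.1" -> 1.1, "false" -> False)."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw  # bare words such as "classic" stay strings
```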

Key Settings

Key	Default	Description
`agent_name`	`""`	Agent's name (e.g. "Jackie")
`user_name`	`""`	User's real name
`user_name_tts`	`""`	Phonetic spelling for TTS (e.g. "Mah-toosh" for Matúš)
`voice`	`af_heart`	Base voice name
`voice_blend`	`{"af_heart": 0.6, "af_sky": 0.4}`	Voice blend weights
`speed`	`1.05`	Speech speed multiplier
`language`	`en`	Language code
`tts_engine`	`auto`	TTS engine: `auto`, `mlx`, or `pytorch`
`model`	`mlx-community/Kokoro-82M-bf16`	Model identifier (MLX)
`visualizer.enabled`	`true`	Show visualizer overlay
`visualizer.mode`	`v2`	Animation mode (v2/classic)
`visualizer.remember_position`	`true`	Save window position between sessions
`notification_sound.enabled`	`true`	Play sound before speaking
`notification_sound.sound`	`Blow`	macOS system sound name
`daemon.auto_start`	`true`	Advisory flag only — the daemon never self-starts. When `true`, the agent should start it on first voice use (saves ~1s/call, costs ~2.3GB RAM)
`daemon.socket_path`	`~/.her-voice/tts.sock`	Unix socket path
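A config file only needs the keys a user changes if missing keys are filled from these defaults. The sketch below shows one way to do that. The default values mirror the table, but the merge strategy is an assumption, not the skill's documented behavior. Nested sections are merged key by key, while `voice_blend` is replaced as a whole so that a new blend does not inherit the default voices.

```python
import copy

DEFAULTS = {
    "agent_name": "",
    "user_name": "",
    "user_name_tts": "",
    "voice": "af_heart",
    "voice_blend": {"af_heart": 0.6, "af_sky": 0.4},
    "speed": 1.05,
    "language": "en",
    "tts_engine": "auto",
    "model": "mlx-community/Kokoro-82M-bf16",
    "visualizer": {"enabled": True, "mode": "v2", "remember_position": True},
    "notification_sound": {"enabled": True, "sound": "Blow"},
    "daemon": {"auto_start": True, "socket_path": "~/.her-voice/tts.sock"},
}


def merge_defaults(user: dict, defaults: dict = DEFAULTS,
                   replace_whole=frozenset({"voice_blend"})) -> dict:
    """Recursively fill keys missing from the user's config with defaults."""
    merged = copy.deepcopy(defaults)  # never mutate the shared defaults
    for key, value in user.items():
        nested = isinstance(value, dict) and isinstance(merged.get(key), dict)
        if nested and key not in replace_whole:
            merged[key] = merge_defaults(value, merged[key], replace_whole)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```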

Voice Selection

Voice Blending

Mix multiple voices for a unique sound. Configure voice_blend in config:

{
  "voice_blend": {"af_heart": 0.6, "af_sky": 0.4}
}

The blended voice is stored as a .safetensors file in the model's voices directory (e.g., af_heart_60_af_sky_40.safetensors). It is created the first time you run TTS with the blend configured; after that, speak.py finds and loads the pre-blended file automatically.
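The sketch below illustrates the naming pattern and one plausible blending rule. Both are assumptions. The filename format is inferred from the single example above: names in config order, each weight times 100. Blending as a normalized weighted sum of voice vectors is a common approach, but this document does not confirm it. Real voices are tensors loaded from .safetensors files; plain lists keep the example dependency-free.

```python
def blend_filename(weights: dict) -> str:
    """{"af_heart": 0.6, "af_sky": 0.4} -> "af_heart_60_af_sky_40.safetensors"."""
    parts = [f"{name}_{round(w * 100)}" for name, w in weights.items()]
    return "_".join(parts) + ".safetensors"


def blend_vectors(voices: dict, weights: dict) -> list:
    """Weighted sum of equally sized voice vectors, weights normalized to sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("blend weights must sum to a positive number")
    names = list(weights)
    length = len(voices[names[0]])
    if any(len(voices[n]) != length for n in names):
        raise ValueError("all voices must have the same shape")
    return [
        sum(weights[n] / total * voices[n][i] for n in names)
        for i in range(length)
    ]
```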

Error Handling

Error	Cause	Fix
"mlx-audio not found"	Venv missing or broken	Run `setup.py`
"espeak-ng not found"	Phonemizer missing	`brew install espeak-ng` (macOS) or `sudo apt install espeak-ng` (Linux)
Compilation failed	Xcode tools missing	`xcode-select --install`
"Model not found"	First run, no download	Run `setup.py` or speak once
Daemon "not running"	Crashed or rebooted	Start daemon again
No sound output	macOS audio permissions	Check System Settings → Sound → Output
Visualizer not showing	Binary not compiled	Run `setup.py`
"kokoro not found"	PyTorch venv missing	Run `setup.py`
PyTorch CUDA error	GPU driver mismatch	`pip install torch --force-reinstall` in kokoro venv
"soundfile not found"	Missing dependency	`pip install soundfile` in kokoro venv

Requirements

macOS + Apple Silicon recommended for best experience (MLX engine + visualizer + notification sounds)
Linux/Intel Mac supported via PyTorch Kokoro engine (no visualizer)
Windows is not supported
Xcode Command Line Tools for visualizer on macOS (xcode-select --install)
espeak-ng for phonemization (brew install espeak-ng on macOS, apt install espeak-ng on Linux)
~500MB disk (model + venv)
~2.3GB RAM when daemon is running
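Several rows of the error-handling table can be caught before the first call with a quick preflight check. This is a hypothetical helper. It looks for `espeak-ng` on PATH and for the config file and visualizer binary under `~/.her-voice`, using the paths mentioned elsewhere in this document. The lookup function and home directory can be injected, which keeps the check testable.

```python
import shutil
import sys
from pathlib import Path
from typing import Callable, Optional


def preflight(home: Optional[Path] = None,
              which: Callable[[str], Optional[str]] = shutil.which,
              platform_name: str = sys.platform) -> list:
    """Return (problem, suggested fix) pairs for common setup issues."""
    state = (home or Path.home()) / ".her-voice"
    problems = []
    if which("espeak-ng") is None:
        fix = "brew install espeak-ng" if platform_name == "darwin" else "sudo apt install espeak-ng"
        problems.append(("espeak-ng not found", fix))
    if not (state / "config.json").exists():
        problems.append(("setup has not been run", "run setup.py"))
    # The visualizer is macOS only, so its binary is only expected there.
    if platform_name == "darwin" and not (state / "bin" / "her-voice-viz").exists():
        problems.append(("visualizer binary not compiled", "run setup.py"))
    return problems
```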

Uninstall

Remove all Her Voice data (config, venvs, compiled binary, daemon state):

python3 SKILL_DIR/scripts/daemon.py stop
rm -rf ~/.her-voice

How It Works

Kokoro 82M — A compact neural TTS model with two backends: MLX (Apple's framework for native Metal GPU acceleration on Apple Silicon) and PyTorch (works everywhere). The engine is auto-detected based on platform, or can be forced via the tts_engine config option (auto, mlx, or pytorch)
Streaming — Audio generates and plays simultaneously. First sound in ~0.3s (with daemon) vs ~3s batch
Visualizer — Native macOS app (Swift/Cocoa) reads raw PCM from stdin, plays via AVAudioEngine with real-time amplitude metering
Daemon — Unix socket server holding the model in RAM. Eliminates Python import + model load overhead on every call
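Streaming plays earlier audio while later audio is still being generated, so the first sound arrives after one chunk rather than after the whole text. The sketch below shows that producer/consumer pattern. Sentence-level chunking, the thread-and-queue design, and the `synthesize`/`play` callables are illustrative stand-ins, not Kokoro's or speak.py's actual pipeline.

```python
import queue
import re
import threading
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def stream_speech(text: str,
                  synthesize: Callable[[str], bytes],
                  play: Callable[[bytes], None]) -> None:
    """Synthesize sentence by sentence on a worker thread and play each chunk as it arrives."""
    chunks = queue.Queue(maxsize=4)  # small buffer: generation may run a few chunks ahead
    done = object()                  # sentinel marking the end of the stream
    errors = []

    def producer() -> None:
        try:
            for sentence in split_sentences(text):
                chunks.put(synthesize(sentence))
        except Exception as exc:  # hand the failure to the consuming thread
            errors.append(exc)
        finally:
            chunks.put(done)

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    while (chunk := chunks.get()) is not done:
        play(chunk)  # playback of chunk N overlaps with synthesis of chunk N+1
    worker.join()
    if errors:
        raise errors[0]
```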

Security Recommendations

What to consider before installing:

- This skill installs Python virtual environments and pip packages (mlx-audio, kokoro, numpy, etc.) and may download a large TTS model. Make sure you have enough disk space and are comfortable allowing those network downloads.
- On macOS it will attempt to compile a Swift visualizer and may run Homebrew to install espeak-ng; follow the prompts and allow only the actions you trust.
- On macOS, setup may patch a third-party module inside the created venv to fix library loading. The change targets the venv, not system packages, but you may want to inspect the patch before allowing it.
- The optional daemon listens on a Unix socket under ~/.her-voice with restrictive permissions (0600). Any local process running under your user account could connect; the socket is not exposed to the network.
- No API keys or external endpoints are hardcoded; however, pip and model downloads require internet access. If you need to be cautious, run setup in an isolated environment (VM or container) or review the scripts first.
- If you are not comfortable with code that installs packages, writes to your home directory, compiles binaries, or modifies venv-installed modules, either do not install it or audit the code before running it.

Functional Analysis

Type: OpenClaw Skill
Name: her-voice
Version: 1.0.2

The OpenClaw skill bundle (version 1.0.2) is classified as benign. The code and documentation are transparent and consistent with the stated purpose of providing a local text-to-speech agent. Notably, `CHANGELOG.md` details numerous critical security fixes in this version, addressing potential vulnerabilities such as local symlink attacks, buffer overflows in the Swift visualizer (fixed via `strlcpy`), resource exhaustion, and path injection. The Python scripts (`config.py`, `daemon.py`, `speak.py`) and the Swift visualizer (`HerVoice.swift`) incorporate these fixes, with robust input validation, secure file handling (e.g., restrictive permissions, symlink checks), and safe execution practices. There is no evidence of intentionally malicious behavior such as data exfiltration or unauthorized remote control.

Capability Assessment

✓ Purpose & Capability

Name/description (local TTS and visualizer) align with the included files and runtime steps. The package reasonably needs Python, espeak-ng, venvs, optional Swift compilation, and model files; those are present and used for Kokoro/MLX TTS and the macOS visualizer.

ℹ Instruction Scope

The SKILL.md and scripts direct the agent to run setup.py, start/stop a local daemon, run speak.py, and optionally compile/run a Swift visualizer. These instructions are within the stated TTS scope, but the setup step will install packages, create venvs, download models, and — on macOS — patch a third‑party module in the venv to fix library loading. Those actions are functional for the feature but worth the user's awareness.

ℹ Install Mechanism

There is no registry 'install' spec, but setup.py will create virtual environments and invoke pip to install mlx-audio/kokoro and dependencies, and on macOS will call swiftc to compile the visualizer and may run Homebrew. This involves network downloads from PyPI and model sources (expected for TTS), which is moderate risk but proportionate to the feature.

✓ Credentials

The skill requests no environment variables or external API keys, and its configuration is stored under ~/.her-voice. The declared binaries (python3, espeak-ng) match the functionality, and no unrelated credentials or system paths are requested.

✓ Persistence & Privilege

The daemon is optional (not always:true). It creates confined files in ~/.her-voice, a UNIX socket and PID file with restrictive permissions (0600), and does not modify other skills or system-wide agent settings. Running a background daemon is expected for low-latency TTS.

Version History

v1.0.2

**Improved daemon security, config clarity, and doc hints.**

- TTS daemon files (PID, socket) are now stored in `~/.her-voice/` with owner-only permissions for better security.
- Added documentation tip: agent now automatically resolves `SKILL_DIR` when running commands.
- Improved configuration instructions for clarity and accuracy.
- Updated changelog and documentation to match these behavioral updates.
- No breaking changes; functionality remains the same.

v1.0.1

- Rewrote and condensed documentation in SKILL.md for improved clarity and focus.
- Simplified the description to highlight key use cases and features.
- Core functionality and configuration details remain unchanged.
- No new features or breaking changes introduced in this version.

v1.0.0

Her Voice 1.0.0

- Introduces Kokoro TTS for fast, natural on-device voice responses with no cloud, API keys, or subscriptions.
- Features on-the-fly audio streaming for minimal latency and instant feedback notification sounds.
- Offers optional real-time audio visualizer overlay (macOS only) with persist and standalone modes.
- Fully configurable: voices, speeds, pronunciation, visualizer, and notification sounds.
- Includes a RAM-resident TTS daemon for instant startup, with easy CLI and config management.
- Simple setup script auto-detects best TTS backend and compiles native components as needed.

Metadata

Slug: her-voice

Version: 1.0.2

License: not specified

Total installs: 0

Current installs: 0

Number of versions: 3

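FAQ

What is Her Voice?

Give your agent a voice. Use when the user wants the agent to speak, read aloud, or have voice responses. It is an AI agent skill for Claude Code / OpenClaw and has been downloaded 707 times to date.

How do I install Her Voice?

Run `/install her-voice` in an OpenClaw or Claude Code chat to install it with one command. Then run the first-run setup (`python3 SKILL_DIR/scripts/setup.py`) to install the TTS backend and download the model.

Is Her Voice free?

Yes. Her Voice is completely free and open source, with no API keys required; you can download, install, and use it freely.

Which platforms does Her Voice support?

Her Voice runs on macOS (Apple Silicon recommended, for the MLX engine, visualizer, and notification sounds) and on Linux and Intel Macs via the PyTorch engine, without the visualizer. Windows is not supported.

Who develops Her Voice?

Her Voice is developed and maintained by matusvojtek (@matusvojtek). The current version is v1.0.2.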
