Description

Manage and configure Kiwi Voice assistant service. Use when starting/stopping Kiwi, editing voice config, checking logs, troubleshooting audio issues, or man...

README (SKILL.md)

Kiwi Voice

Name: Kiwi Voice
Author: yuangu260

Kiwi Voice is a standalone Python service that provides a voice interface to OpenClaw. It connects to the Gateway via WebSocket (session agent:kiwi-voice:kiwi-voice).

Skill directory: ~/.openclaw/workspace/skills/kiwi-voice

Start / Stop

```powershell
# Start (PowerShell)
cd ~/.openclaw/workspace/skills/kiwi-voice
.\start.ps1

# Or directly
.\venv\Scripts\activate
python -m kiwi
```

Stop: press Ctrl+C in the terminal where Kiwi is running.

Configuration

Main config: config.yaml. Secrets: .env (not committed).
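The page does not say how Kiwi loads .env. The following is a rough, hypothetical stdlib-only parser for simple KEY=VALUE files. The real service may use a library such as python-dotenv instead. `load_env_file` and `apply_env` are names invented for this sketch.

```python
from __future__ import annotations

import os
from pathlib import Path


def load_env_file(path: str | Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (hypothetical helper).

    Blank lines and '#' comments are skipped, and matching single or double
    quotes around a value are stripped. Multi-line values, escapes and
    `export` prefixes are NOT handled.
    """
    values: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
            value = value[1:-1]
        values[key.strip()] = value
    return values


def apply_env(values: dict[str, str]) -> None:
    """Export parsed values without overriding variables already set in the shell."""
    for key, value in values.items():
        os.environ.setdefault(key, value)
```

Using `setdefault` lets a value exported in the shell take precedence over the file, which is a common convention but only an assumption about Kiwi's behavior.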

TTS Provider

```yaml
# config.yaml -> tts.provider: elevenlabs | piper | qwen3
tts:
  provider: "elevenlabs"
  elevenlabs:
    voice_id: "aEO01A4wXwd1O8GPgGlF"      # ElevenLabs voice ID
    model_id: "eleven_multilingual_v2"
    stability: 0.45
    similarity_boost: 0.75
    speed: 1.0
```

.env key: KIWI_ELEVENLABS_API_KEY
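Below is a minimal pre-flight check for the settings above. It assumes the `tts` section has already been loaded into a Python dict, for example with PyYAML's `yaml.safe_load`. `check_tts_config` is a hypothetical helper, not part of Kiwi. The 0 to 1 range check assumes the ElevenLabs convention for `stability` and `similarity_boost`.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Providers listed in the config.yaml comment above.
SUPPORTED_TTS_PROVIDERS = {"elevenlabs", "piper", "qwen3"}


def check_tts_config(tts: dict, env: Mapping[str, str] | None = None) -> list[str]:
    """Return a list of problems found in the `tts` section (hypothetical helper)."""
    env = os.environ if env is None else env
    problems: list[str] = []
    provider = tts.get("provider")
    if provider not in SUPPORTED_TTS_PROVIDERS:
        problems.append(f"unknown tts.provider: {provider!r}")
    if provider == "elevenlabs":
        # The API key is read from .env, not from config.yaml.
        if not env.get("KIWI_ELEVENLABS_API_KEY"):
            problems.append("KIWI_ELEVENLABS_API_KEY is not set")
        settings = tts.get("elevenlabs") or {}
        # Assumed: ElevenLabs expresses these two settings as fractions in [0, 1].
        for name in ("stability", "similarity_boost"):
            value = settings.get(name)
            if value is not None and not 0.0 <= value <= 1.0:
                problems.append(f"tts.elevenlabs.{name} should be between 0 and 1")
        if not settings.get("voice_id"):
            problems.append("tts.elevenlabs.voice_id is empty")
    return problems
```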

STT

```yaml
# config.yaml -> stt
stt:
  model: "large"          # tiny | base | small | medium | large
  device: "cuda"          # cuda | cpu
  compute_type: "float16"
  language: "ru"
```
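The capability analysis below names faster-whisper as the STT library. Assuming the `stt` values are passed to its `WhisperModel` constructor, the hypothetical helper below turns them into constructor arguments. It falls back to CPU with `int8` when CUDA is requested but unavailable. Kiwi's actual fallback behavior is not documented here.

```python
from __future__ import annotations

# Allowed values, taken from the comments in the config.yaml snippet above.
VALID_MODELS = {"tiny", "base", "small", "medium", "large"}
VALID_DEVICES = {"cuda", "cpu"}


def whisper_kwargs(stt: dict, cuda_available: bool) -> dict:
    """Map the `stt` section to faster-whisper WhisperModel keyword arguments.

    Hypothetical helper. float16 is a GPU-oriented compute type, so when CUDA
    is requested but missing this sketch switches to CPU with int8 weights.
    """
    model = stt["model"]
    if model not in VALID_MODELS:
        raise ValueError(f"unsupported stt.model: {model!r}")
    device = stt["device"]
    if device not in VALID_DEVICES:
        raise ValueError(f"unsupported stt.device: {device!r}")
    # Assumed defaults when compute_type is omitted.
    compute_type = stt.get("compute_type") or ("float16" if device == "cuda" else "int8")
    if device == "cuda" and not cuda_available:
        device, compute_type = "cpu", "int8"
    return {"model_size_or_path": model, "device": device, "compute_type": compute_type}
```

With faster-whisper installed, usage would look like `model = WhisperModel(**whisper_kwargs(cfg, torch.cuda.is_available()))`, followed by `segments, info = model.transcribe("clip.wav", language="ru")`. The first call may download the model weights.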

LLM

```yaml
# config.yaml -> llm
llm:
  model: "openai/gpt-5.2"
  chat_timeout: 120
```
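The unit of `chat_timeout` is not stated. Assuming it is seconds, a request that runs past it could be abandoned with `asyncio.wait_for`. In the sketch below, `send_chat` stands in for whatever coroutine Kiwi actually uses to talk to the Gateway. Both `send_chat` and `ask_with_timeout` are hypothetical.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable


async def ask_with_timeout(
    send_chat: Callable[[str], Awaitable[str]],
    text: str,
    chat_timeout: float = 120.0,
) -> str | None:
    """Await a chat reply, giving up after `chat_timeout` seconds (assumed unit).

    Returns None on timeout; wait_for cancels the pending request.
    """
    try:
        return await asyncio.wait_for(send_chat(text), timeout=chat_timeout)
    except asyncio.TimeoutError:
        return None
```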

Audio Devices

```yaml
# config.yaml -> audio
audio:
  output_device: null   # null = system default
  input_device: null    # null = system default
```

To list available devices, run: `python -c "import sounddevice; print(sounddevice.query_devices())"`
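A small filter over the same data lets you pick a device by name instead of scanning the whole list. This sketch assumes the dict keys that `sounddevice.query_devices()` returns (`name`, `max_input_channels`, `max_output_channels`). Whether Kiwi's `audio.output_device` accepts the resulting index is an assumption to verify against the code. `find_device` is a hypothetical helper.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence


def find_device(devices: Sequence[Mapping], name_part: str, kind: str) -> int | None:
    """Return the index of the first device whose name contains `name_part`.

    `devices` is shaped like sounddevice.query_devices() output, and `kind` is
    "input" or "output". A device only matches if it has channels of that kind.
    """
    channel_key = f"max_{kind}_channels"
    needle = name_part.lower()
    for index, device in enumerate(devices):
        if needle in device["name"].lower() and device[channel_key] > 0:
            return index
    return None
```

In practice you would call `find_device(sounddevice.query_devices(), "headset", "output")`. The position in that list matches the sounddevice device ID.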

Voice Security

```yaml
# config.yaml -> security
security:
  telegram_approval_enabled: true
```

.env keys: KIWI_TELEGRAM_BOT_TOKEN, KIWI_TELEGRAM_CHAT_ID
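The approval flow itself lives in `kiwi/voice_security.py`, which is not shown here. The sketch below covers only the notification half: it builds, without sending, a Telegram Bot API `sendMessage` request from the two .env keys. `build_approval_request` and the message wording are hypothetical, and collecting the user's answer is left out.

```python
from __future__ import annotations

import json
import os
import urllib.request
from collections.abc import Mapping


def build_approval_request(command: str, env: Mapping[str, str] | None = None) -> urllib.request.Request:
    """Build (but do not send) a Telegram sendMessage request asking for approval.

    Raises KeyError if KIWI_TELEGRAM_BOT_TOKEN or KIWI_TELEGRAM_CHAT_ID is unset.
    """
    env = os.environ if env is None else env
    token = env["KIWI_TELEGRAM_BOT_TOKEN"]
    chat_id = env["KIWI_TELEGRAM_CHAT_ID"]
    payload = {"chat_id": chat_id, "text": f"Kiwi wants to run: {command}\nApprove?"}
    return urllib.request.Request(
        url=f"https://api.telegram.org/bot{token}/sendMessage",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

`urllib.request.urlopen(request, timeout=10)` would send it. Keep the bot token out of logs, since it is embedded in the URL.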

Logs and Troubleshooting

All logs are written to the logs/ directory (gitignored). Crash logs go to logs/kiwi_crash_*.log and the startup log to logs/kiwi_startup.log. For the runtime log, redirect stdout or check the terminal output.
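When the service dies, the newest crash log is usually the first thing to read. The helpers below find it and print its tail. They assume the service is run from the skill directory, so that `logs/` is relative to the working directory. `latest_crash_log` and `tail` are hypothetical helpers.

```python
from __future__ import annotations

from pathlib import Path


def latest_crash_log(log_dir: str | Path = "logs") -> Path | None:
    """Return the most recently modified kiwi_crash_*.log in `log_dir`, or None."""
    log_dir = Path(log_dir)
    if not log_dir.is_dir():
        return None
    candidates = list(log_dir.glob("kiwi_crash_*.log"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def tail(path: Path, lines: int = 40) -> str:
    """Return the last `lines` lines of a text file, tolerating bad encodings."""
    text = path.read_text(encoding="utf-8", errors="replace")
    return "\n".join(text.splitlines()[-lines:])
```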

Common Issues

No audio output: check audio.output_device in config.yaml and run the device list command above.

Slow TTS response: check that tts.elevenlabs.use_streaming_endpoint is true and that optimize_streaming_latency is set to 3 or 4.

STT not recognizing speech: check realtime.min_speech_volume (default 0.015). Raise it if background noise is picked up as speech, and lower it if quiet speech is missed. Also check stt.model: large is the most accurate but loads the slowest.
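The threshold's exact semantics are not documented. This sketch assumes `min_speech_volume` is compared against the RMS level of float samples in [-1.0, 1.0]. Under that assumption, a lower threshold lets quieter sounds count as speech and a higher one rejects them. `rms` and `is_speech` are illustrative names, not Kiwi functions.

```python
from __future__ import annotations

import math
from collections.abc import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of float samples in the range [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(samples: Sequence[float], min_speech_volume: float = 0.015) -> bool:
    """Treat a block as speech only if its RMS level reaches the threshold."""
    return rms(samples) >= min_speech_volume
```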

WebSocket connection failed: ensure the OpenClaw Gateway is running on the configured websocket.host:port (default 127.0.0.1:18789).
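A plain TCP connect is a quick way to tell whether anything is listening on the Gateway address. It only proves the port is open, not that the WebSocket handshake or the session will succeed. `gateway_reachable` is a hypothetical helper using the default host and port above.

```python
import socket


def gateway_reachable(host: str = "127.0.0.1", port: int = 18789, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host and timeouts.
        return False
```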

Voice Profiles

Voice profiles are stored in the voice_profiles/ directory as JSON files containing speaker embeddings.

Owner profile is auto-created. Friends can be added via voice command "Kiwi, remember me as [name]".

To reset all profiles, delete voice_profiles/*.json and restart the service.
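Speaker identification with embeddings usually comes down to comparing vectors. This sketch assumes each profile file holds an `embedding` list and an optional `name`. It uses cosine similarity with an arbitrary 0.75 threshold. Kiwi's real file schema, metric and threshold (in `kiwi/speaker_manager.py`) may differ, and all function names here are hypothetical.

```python
from __future__ import annotations

import json
import math
from pathlib import Path


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def load_profiles(profile_dir: str | Path = "voice_profiles") -> dict[str, list[float]]:
    """Load {name: embedding}; the "name" and "embedding" keys are assumed."""
    profiles: dict[str, list[float]] = {}
    for path in Path(profile_dir).glob("*.json"):
        data = json.loads(path.read_text(encoding="utf-8"))
        profiles[data.get("name", path.stem)] = data["embedding"]
    return profiles


def identify(embedding: list[float], profiles: dict[str, list[float]], threshold: float = 0.75) -> str | None:
    """Return the best-matching profile name, or None if nothing reaches `threshold`."""
    best_name, best_score = None, threshold
    for name, reference in profiles.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Resetting as described above amounts to `for p in Path("voice_profiles").glob("*.json"): p.unlink()`, followed by restarting the service.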

Key Files

| File | Purpose |
|---|---|
| `config.yaml` | All settings |
| `.env` | API keys and secrets |
| `kiwi/service.py` | Main service logic |
| `kiwi/listener.py` | Microphone + STT + VAD |
| `kiwi/tts/elevenlabs.py` | ElevenLabs TTS client |
| `kiwi/tts/streaming.py` | Streaming TTS manager |
| `kiwi/openclaw_ws.py` | WebSocket client for Gateway |
| `kiwi/speaker_manager.py` | Speaker identification and priority |
| `kiwi/voice_security.py` | Telegram approval for dangerous commands |

Usage Guidance

This package contains a full voice-assistant service (many Python modules, a REST API, a web UI, and ML-based components). Before installing or running it:

- Treat the repository as high-privilege software: it listens on an HTTP API (default 0.0.0.0:7789) and exposes control endpoints (restart, shutdown, stop). Do NOT run it bound to 0.0.0.0 on an untrusted network; change api.host to 127.0.0.1 if you only want local access.
- The metadata claims no required env vars, but the code expects many secrets in .env (ElevenLabs, Telegram, RunPod, Home Assistant tokens). Audit and populate .env deliberately and do not reuse sensitive keys. If you don't use a provider, leave its keys unset.
- The config.yaml included in the package contains a hardcoded API token ("x4711-kiwi-2026-secret"). Treat it as insecure: remove it or replace it with a strong token if you enable API auth, or disable the API if you don't need it.
- SOUL.md contains instructions that attempt to override the assistant/system prompt and to force execution of any task. Remove or sanitize this file (or its contents) before enabling autonomous agent invocation; do NOT allow the skill to reconfigure the model prompt or to behave with blanket "never refuse" rules.
- The code requires heavy ML/native dependencies (torch, ONNX, pyannote, local TTS models). Because the registry metadata provides no install spec, follow the project's README and install in an isolated environment (container or VM) so you can safely inspect network and file activity.
- If you only want to use the management features from Home Assistant, restrict the integration to localhost, supply a minimal token with limited scopes, and audit the coordinator/manifest behavior.

If you're not comfortable auditing Python services or network-exposed APIs, run this only in a sandbox (container/VM), do not enable remote access, and do not reuse production credentials.
The codebase appears to be a legitimate Kiwi Voice implementation, but the metadata omissions, embedded default token, and prompt-injection content make it risky to deploy without review.
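Two of the risky defaults described above can be checked mechanically. This sketch assumes config.yaml has been parsed into a dict, for example with PyYAML's `yaml.safe_load`. It also assumes `api.auth.tokens` is a list whose entries are either strings or mappings with a `token` key. `audit_api_config` and `new_api_token` are hypothetical helpers.

```python
from __future__ import annotations

import secrets

# The default token shipped in the package's config.yaml, per the review above.
KNOWN_DEFAULT_TOKENS = {"x4711-kiwi-2026-secret"}


def audit_api_config(config: dict) -> list[str]:
    """Flag an all-interfaces bind and the shipped default token in a parsed config."""
    findings: list[str] = []
    api = config.get("api") or {}
    if api.get("host") == "0.0.0.0":
        findings.append("api.host is 0.0.0.0; use 127.0.0.1 for local-only access")
    tokens = (api.get("auth") or {}).get("tokens") or []
    for entry in tokens:
        token = entry.get("token") if isinstance(entry, dict) else entry
        if token in KNOWN_DEFAULT_TOKENS:
            findings.append("api.auth.tokens contains the shipped default token")
    return findings


def new_api_token() -> str:
    """Generate a random URL-safe token (32 random bytes) to replace the default."""
    return secrets.token_urlsafe(32)
```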

Capability Analysis

Type: OpenClaw Skill
Name: kiwi-voice
Version: 1.0.0

Kiwi Voice is a legitimate and feature-rich voice assistant integration for OpenClaw. It includes comprehensive modules for local speech-to-text, speaker identification using neural embeddings, and a robust two-layer security system that uses regex patterns and Telegram-based approvals to prevent unauthorized or dangerous command execution. The codebase is well-structured, uses standard industry libraries (e.g., faster-whisper, torch, aiohttp), and contains explicit defensive logic to protect the user's system from potential LLM hallucinations or unauthorized access.

Capability Assessment

⚠ Purpose & Capability

The skill claims to 'manage and configure Kiwi Voice', but the registry metadata declares no required environment variables or config paths. Meanwhile, SKILL.md, the README, and the code reference many secrets and credentials (e.g., KIWI_ELEVENLABS_API_KEY, KIWI_TELEGRAM_BOT_TOKEN, RUNPOD keys, KIWI_HA_TOKEN) and expect heavy ML dependencies. The two do not add up: either the metadata is incomplete or the skill asks for more privileges than it declares.

⚠ Instruction Scope

SKILL.md instructs the agent to read and use local files (.env, config.yaml, logs, voice_profiles) and to start the service. Additionally, SOUL.md contains explicit system-prompt-like instructions (e.g., 'You are Kiwi... You can perform ANY task... Never refuse to execute') which are a prompt-injection risk — they attempt to change the assistant's behavior and grant it broad discretion to act. While service management needs access to some of these files, the presence of a system-prompt override embedded in the skill is out-of-scope for a benign 'manage' skill.

⚠ Install Mechanism

The registry shows no install spec, but the repository contains a large Python project (requirements.txt, many modules, models auto-download behavior). Heavy native/ML dependencies (CUDA, ONNX, pyannote, Faster Whisper, local TTS models) are required at runtime and are not declared in the skill metadata. That mismatch increases operational risk: users may run unreviewed installs or miss required sandboxing.

⚠ Credentials

Although the skill metadata lists no required env vars, the code and SKILL.md expect multiple secrets in .env (ElevenLabs API key, Telegram bot token and chat ID, RunPod API keys, Home Assistant token, etc.). Worse, the config.yaml included in the package contains an API token entry (api.auth.tokens -> token: "x4711-kiwi-2026-secret"), and api.host is 0.0.0.0 by default. Hardcoded tokens and broad credential references are disproportionate and could lead to accidental exposure if the package is deployed as-is.

⚠ Persistence & Privilege

always:false (good), but the skill implements a REST API (binds to 0.0.0.0:7789 by default), control endpoints (stop, restart, shutdown), and Home Assistant integration — all of which provide control surfaces that can be abused if misconfigured. Combined with the SOUL.md prompt override encouraging the agent to 'perform ANY task' and the hardcoded API token, the persistence/privilege posture is risky unless the service is carefully locked to localhost and tokens rotated.

Version History

v1.0.0

- Initial release of kiwi-voice skill.
- Provides management and configuration for the standalone Kiwi Voice assistant service.
- Supports flexible TTS (ElevenLabs, Piper, Qwen3) and STT model selection.
- Includes detailed configuration options for LLM, audio devices, and security settings.
- Voice profile management and basic voice security/approval workflows included.
- Troubleshooting tips and log file locations documented for easier support.

Metadata

Slug kiwi-voice

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Kiwi Voice?

Manage and configure Kiwi Voice assistant service. Use when starting/stopping Kiwi, editing voice config, checking logs, troubleshooting audio issues, or man... It is an AI Agent Skill for Claude Code / OpenClaw, with 246 downloads so far.

How do I install Kiwi Voice?

Run "/install kiwi-voice" in the OpenClaw or Claude Code chat to install the skill in one step. The Kiwi Voice service itself still needs its Python dependencies and the secrets in .env before it can run.

Is Kiwi Voice free?

Yes, Kiwi Voice is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Kiwi Voice support?

Kiwi Voice is listed as cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Kiwi Voice?

It is built and maintained by yuangu260 (@yuangu260); the current version is v1.0.0.
