Description

Manage and configure Kiwi Voice assistant service. Use when starting/stopping Kiwi, editing voice config, checking logs, troubleshooting audio issues, or man...

Usage (SKILL.md)

Kiwi Voice

Name: Kiwi Voice
Author: yuangu260

Kiwi Voice is a standalone Python service that provides a voice interface to OpenClaw. It connects to the OpenClaw Gateway over WebSocket (session agent:kiwi-voice:kiwi-voice).

Skill directory: ~/.openclaw/workspace/skills/kiwi-voice

Start / Stop

```powershell
# Start (PowerShell)
cd ~/.openclaw/workspace/skills/kiwi-voice
.\start.ps1

# Or directly: activate the virtual environment, then run the package
.\venv\Scripts\Activate.ps1
python -m kiwi
```

Stop: press Ctrl+C in the terminal where Kiwi is running.

Configuration

Main config: config.yaml. Secrets: .env (not committed).

TTS Provider

```yaml
# config.yaml -> tts.provider: elevenlabs | piper | qwen3
tts:
  provider: "elevenlabs"
  elevenlabs:
    voice_id: "aEO01A4wXwd1O8GPgGlF"      # ElevenLabs voice ID
    model_id: "eleven_multilingual_v2"
    stability: 0.45
    similarity_boost: 0.75
    speed: 1.0
```

.env key: KIWI_ELEVENLABS_API_KEY
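A misconfigured provider or a missing key can be caught before the service starts. The sketch below assumes config.yaml has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`) and that the .env values are passed in as a dict. `check_tts_config` is a hypothetical helper and is not part of Kiwi. The 0.0-1.0 range for `stability` and `similarity_boost` is an assumption based on ElevenLabs' voice settings.

```python
import os
from typing import List, Mapping, Optional

# Providers listed in the config.yaml comment: elevenlabs | piper | qwen3
SUPPORTED_TTS_PROVIDERS = {"elevenlabs", "piper", "qwen3"}


def check_tts_config(cfg: dict, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems in the `tts` section (an empty list means OK).

    `cfg` is the whole config.yaml as a dict; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    problems = []
    tts = cfg.get("tts") or {}
    provider = tts.get("provider")
    if provider not in SUPPORTED_TTS_PROVIDERS:
        problems.append(
            f"tts.provider={provider!r} is not one of {sorted(SUPPORTED_TTS_PROVIDERS)}"
        )
    if provider == "elevenlabs":
        if not env.get("KIWI_ELEVENLABS_API_KEY"):
            problems.append("KIWI_ELEVENLABS_API_KEY is missing from .env / the environment")
        voice = tts.get("elevenlabs") or {}
        # Assumption: ElevenLabs expects these two voice settings in 0.0-1.0.
        for key in ("stability", "similarity_boost"):
            value = voice.get(key)
            if value is not None and not 0.0 <= value <= 1.0:
                problems.append(f"tts.elevenlabs.{key}={value} is outside 0.0-1.0")
    return problems


# Example: the values from the snippet above, with a (dummy) API key present.
example = {
    "tts": {
        "provider": "elevenlabs",
        "elevenlabs": {"stability": 0.45, "similarity_boost": 0.75, "speed": 1.0},
    }
}
print(check_tts_config(example, env={"KIWI_ELEVENLABS_API_KEY": "dummy-key"}))  # prints []
```

Returning a list instead of raising on the first error lets one run report every problem at once.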

STT

```yaml
# config.yaml -> stt
stt:
  model: "large"          # tiny | base | small | medium | large
  device: "cuda"          # cuda | cpu
  compute_type: "float16"
  language: "ru"
```
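The `device` and `compute_type` values must suit the machine. The sketch below is a hypothetical pre-flight check that assumes the STT backend is faster-whisper (CTranslate2). That backend generally rejects `float16` on CPU, and `int8` or `float32` are the usual CPU choices. The `cuda_available` flag could come from `torch.cuda.is_available()`. `check_stt_config` itself is illustrative and not part of Kiwi.

```python
from typing import List

# Model sizes listed in the config.yaml comment.
WHISPER_SIZES = {"tiny", "base", "small", "medium", "large"}


def check_stt_config(stt: dict, cuda_available: bool) -> List[str]:
    """Flag stt settings that are unlikely to work on this machine."""
    problems = []
    if stt.get("model") not in WHISPER_SIZES:
        problems.append(f"stt.model={stt.get('model')!r} is not one of {sorted(WHISPER_SIZES)}")
    device = stt.get("device", "cpu")
    if device == "cuda" and not cuda_available:
        problems.append("stt.device is 'cuda' but no CUDA GPU was detected; set it to 'cpu'")
        device = "cpu"  # judge compute_type as if the suggested fallback were applied
    if device == "cpu" and stt.get("compute_type") == "float16":
        # Assumption: faster-whisper/CTranslate2 does not support float16 on CPU.
        problems.append("compute_type 'float16' is not supported on CPU; try 'int8' or 'float32'")
    return problems


# On a machine with PyTorch installed:
#   import torch
#   print(check_stt_config(stt_section, torch.cuda.is_available()))
```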

LLM

```yaml
# config.yaml -> llm
llm:
  model: "openai/gpt-5.2"
  chat_timeout: 120
```

Audio Devices

```yaml
# config.yaml -> audio
audio:
  output_device: null   # null = system default
  input_device: null    # null = system default
```

To list available devices, run `python -c "import sounddevice; print(sounddevice.query_devices())"`.
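Scanning the device list by eye can be tedious, so a small helper can pick a device by part of its name. This sketch works on the list of dicts returned by `sounddevice.query_devices()` (each entry has `name`, `max_input_channels`, and `max_output_channels`). It assumes Kiwi passes a numeric device index in `audio.output_device` / `audio.input_device` straight through to sounddevice. `find_device_index` is a hypothetical helper.

```python
from typing import Mapping, Optional, Sequence


def find_device_index(devices: Sequence[Mapping], name_part: str,
                      kind: str = "output") -> Optional[int]:
    """Return the index of the first device whose name contains `name_part`
    (case-insensitive) and that has at least one channel of the given kind."""
    if kind not in ("input", "output"):
        raise ValueError("kind must be 'input' or 'output'")
    channel_key = f"max_{kind}_channels"
    needle = name_part.lower()
    # query_devices() is indexed by device ID, so the list position is the index.
    for index, device in enumerate(devices):
        if needle in device["name"].lower() and device.get(channel_key, 0) > 0:
            return index
    return None


# Usage with the real device list (requires the sounddevice package):
#   import sounddevice
#   print(find_device_index(sounddevice.query_devices(), "headphones", "output"))
```

Checking the channel count matters because many drivers expose separate input and output entries with the same name.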

Voice Security

```yaml
# config.yaml -> security
security:
  telegram_approval_enabled: true
```

.env keys: KIWI_TELEGRAM_BOT_TOKEN, KIWI_TELEGRAM_CHAT_ID
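When Telegram approval is enabled, both keys must be present in .env. The sketch below uses a deliberately minimal `KEY=VALUE` parser that only handles blank lines, `#` comments, and surrounding quotes, so it is simpler than python-dotenv. It then reports which Telegram keys are still missing. All function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def parse_env_text(text: str) -> Dict[str, str]:
    """Minimal KEY=VALUE parser: skips blank lines and '#' comments and strips
    surrounding quotes. It does not handle multi-line values or 'export'."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def read_env_file(path=".env") -> Dict[str, str]:
    return parse_env_text(Path(path).read_text(encoding="utf-8"))


def missing_telegram_keys(security: dict, env: Dict[str, str]) -> List[str]:
    """List the Telegram keys that approval needs but that are empty or absent."""
    if not security.get("telegram_approval_enabled"):
        return []
    required = ("KIWI_TELEGRAM_BOT_TOKEN", "KIWI_TELEGRAM_CHAT_ID")
    return [key for key in required if not env.get(key)]
```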

Logs and Troubleshooting

All logs are in the logs/ directory (gitignored). Crash logs: logs/kiwi_crash_*.log. Startup log: logs/kiwi_startup.log. Runtime output goes to the terminal; redirect stdout to a file if you need a persistent runtime log.
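After a crash, the newest `kiwi_crash_*.log` is usually the one to read. This sketch, which uses only the file-name pattern given above, picks that file by modification time and prints its last lines. The helper names are hypothetical, and the default of 40 lines is arbitrary.

```python
from pathlib import Path
from typing import List, Optional


def latest_crash_log(log_dir="logs") -> Optional[Path]:
    """Return the most recently modified kiwi_crash_*.log in log_dir, or None."""
    candidates = list(Path(log_dir).glob("kiwi_crash_*.log"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def tail(path: Path, n: int = 40) -> List[str]:
    """Return the last n lines of a text file (like `tail -n`)."""
    if n <= 0:
        return []
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
    return lines[-n:]


if __name__ == "__main__":
    crash = latest_crash_log()
    if crash is None:
        print("No crash logs found in logs/")
    else:
        print(f"== {crash} ==")
        print("\n".join(tail(crash)))
```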

Common Issues

No audio output: check audio.output_device in config.yaml, and run the device-list command from the Audio Devices section to find a valid device.

Slow TTS response: check that tts.elevenlabs.use_streaming_endpoint is true and that optimize_streaming_latency is set to 3 or 4.

STT not recognizing speech: check realtime.min_speech_volume (default 0.015). Lower it if quiet speech is being missed; raise it if background noise is being picked up as speech. Also check stt.model: large is the most accurate but loads more slowly.
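A volume threshold like `min_speech_volume` typically acts as a gate in front of speech detection. The sketch below assumes the threshold is compared with the RMS level of float samples normalised to [-1.0, 1.0]. Kiwi's actual metric (peak level, dB, or something else) is not documented here, so treat `is_speech_candidate` as a hypothetical illustration of why raising the value rejects more noise.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of float samples normalised to [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech_candidate(samples: Sequence[float], min_speech_volume: float = 0.015) -> bool:
    """Hypothetical gate: True if the chunk is loud enough to be treated as possible speech."""
    return rms(samples) >= min_speech_volume


quiet_room = [0.004, -0.003, 0.005, -0.004] * 100   # RMS of about 0.004
normal_voice = [0.05, -0.06, 0.04, -0.05] * 100     # RMS of about 0.05
print(is_speech_candidate(quiet_room), is_speech_candidate(normal_voice))  # prints False True
```

With this kind of gate, a higher threshold lets fewer chunks through and a lower threshold lets quieter ones through.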

WebSocket connection failed: ensure OpenClaw Gateway is running on the configured websocket.host:port (default 127.0.0.1:18789).
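A quick way to tell "Gateway not running" apart from other failures is a plain TCP connection attempt to the configured host and port. This sketch uses only the standard `socket` module. A successful connect only shows that something is listening; it does not perform the WebSocket handshake. `gateway_reachable` is a hypothetical helper.

```python
import socket


def gateway_reachable(host: str = "127.0.0.1", port: int = 18789, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure, ...
        return False


# Usage with the documented default address:
#   print("Gateway reachable:", gateway_reachable())
```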

Voice Profiles

Profiles are stored in the voice_profiles/ directory as JSON files containing speaker embeddings.

The owner profile is created automatically. Friends can be added with the voice command "Kiwi, remember me as [name]".
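Speaker identification from stored embeddings is commonly done by comparing a new embedding with each profile and accepting the best match above a threshold. The sketch below uses cosine similarity for that comparison. The JSON schema (`{"name": ..., "embedding": [...]}`), the 0.75 threshold, and all function names are hypothetical. Kiwi's speaker_manager.py may use a different metric or file layout.

```python
import json
import math
from pathlib import Path
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def load_profiles(profile_dir="voice_profiles") -> Dict[str, List[float]]:
    """HYPOTHETICAL schema: each file holds {"name": str, "embedding": [float, ...]}."""
    profiles = {}
    for path in sorted(Path(profile_dir).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        profiles[data["name"]] = data["embedding"]
    return profiles


def identify(embedding: Sequence[float], profiles: Dict[str, Sequence[float]],
             threshold: float = 0.75) -> Optional[Tuple[str, float]]:
    """Return (name, score) of the best-matching profile, or None if no profile
    reaches `threshold` (0.75 is an illustrative value, not Kiwi's)."""
    best = None
    for name, reference in profiles.items():
        score = cosine_similarity(embedding, reference)
        if score >= threshold and (best is None or score > best[1]):
            best = (name, score)
    return best
```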

To reset all profiles, delete voice_profiles/*.json and restart the service.
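Deleting the profiles cannot be undone, so a cautious reset can move them aside first. This hypothetical helper removes only the top-level `*.json` files, optionally moving them into a timestamped sibling folder. The Kiwi service still has to be restarted afterwards.

```python
import shutil
import time
from pathlib import Path


def reset_voice_profiles(profile_dir="voice_profiles", backup: bool = True) -> int:
    """Remove every *.json profile in profile_dir and return how many were removed.

    With backup=True the files are moved to a sibling folder named
    '<profile_dir>_backup-YYYYmmdd-HHMMSS' instead of being deleted.
    Restart the Kiwi service afterwards.
    """
    root = Path(profile_dir)
    files = sorted(root.glob("*.json"))
    if backup and files:
        stamp = time.strftime("%Y%m%d-%H%M%S")
        dest = root.parent / f"{root.name}_backup-{stamp}"
        dest.mkdir(parents=True, exist_ok=True)
        for path in files:
            shutil.move(str(path), str(dest / path.name))
    else:
        for path in files:
            path.unlink()
    return len(files)
```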

Key Files

| File | Purpose |
|---|---|
| `config.yaml` | All settings |
| `.env` | API keys and secrets |
| `kiwi/service.py` | Main service logic |
| `kiwi/listener.py` | Microphone + STT + VAD |
| `kiwi/tts/elevenlabs.py` | ElevenLabs TTS client |
| `kiwi/tts/streaming.py` | Streaming TTS manager |
| `kiwi/openclaw_ws.py` | WebSocket client for Gateway |
| `kiwi/speaker_manager.py` | Speaker identification and priority |
| `kiwi/voice_security.py` | Telegram approval for dangerous commands |

Security recommendations

This package contains a full voice-assistant service (many Python modules, a REST API, a web UI, and ML-based components). Before installing or running it:

- Treat the repository as high-privilege software: it listens on an HTTP API (default 0.0.0.0:7789) and exposes control endpoints (restart, shutdown, stop). Do NOT run it bound to 0.0.0.0 on an untrusted network. Change api.host to 127.0.0.1 if you only want local access.
- The metadata claims no required env vars, but the code expects many secrets in .env (ElevenLabs, Telegram, RunPod, Home Assistant tokens). Audit and populate .env deliberately; do not reuse sensitive keys. If you don't use a provider, leave its keys unset.
- The config.yaml included in the package contains a hardcoded API token ("x4711-kiwi-2026-secret"). Treat it as insecure: remove it or replace it with a strong token if you enable API auth, or disable the API if you don't need it.
- SOUL.md contains instructions that attempt to override the assistant/system prompt and to force execution of any task. Remove or sanitize this file (or its contents) before enabling autonomous agent invocation; do NOT allow the skill to reconfigure the model prompt or behave with blanket 'never refuse' rules.
- The code requires heavy ML/native dependencies (torch, ONNX, pyannote, local TTS models). Because the registry metadata provides no install spec, follow the project's README and install in an isolated environment (container or VM) so you can safely inspect network and file activity.
- If you want to use only the management features from Home Assistant, restrict the integration to localhost, supply a minimal token with limited scopes, and audit the coordinator/manifest behavior.

If you're not comfortable auditing Python services or network-exposed APIs, run this only in a sandbox (container/VM), and do not enable remote access or reuse production credentials.
The codebase appears to be a legitimate Kiwi Voice implementation, but the metadata omissions, embedded default token, and prompt-injection content make it risky to deploy without review.
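The two configuration risks above, the 0.0.0.0 bind address and the packaged default token, can be checked mechanically. This sketch takes config.yaml already loaded as a dict and uses Python's standard `secrets` module to generate a replacement token. The shape of `api.auth.tokens` (a list of `{"token": ...}` entries) is an assumption inferred from the review text, and the 32-character minimum is an arbitrary choice. Both functions are hypothetical helpers.

```python
import secrets
from typing import List

# Token that the security review found in the packaged config.yaml.
KNOWN_DEFAULT_TOKENS = {"x4711-kiwi-2026-secret"}
LOCAL_HOSTS = {"127.0.0.1", "localhost", "::1"}


def audit_api_config(cfg: dict) -> List[str]:
    """Return findings about the REST API section of a loaded config.yaml."""
    findings = []
    api = cfg.get("api") or {}
    host = api.get("host", "0.0.0.0")  # documented default bind address
    if host not in LOCAL_HOSTS:
        findings.append(f"api.host={host!r} exposes the API beyond this machine; use 127.0.0.1")
    # ASSUMED shape: api.auth.tokens is a list of {"token": "..."} entries.
    for entry in (api.get("auth") or {}).get("tokens") or []:
        token = entry.get("token") if isinstance(entry, dict) else entry
        if token in KNOWN_DEFAULT_TOKENS:
            findings.append("api.auth.tokens contains the packaged default token; replace it")
        elif token and len(token) < 32:
            findings.append("an api.auth token is shorter than 32 characters")
    return findings


def new_api_token() -> str:
    """A fresh random token: 32 random bytes as URL-safe base64 (43 characters)."""
    return secrets.token_urlsafe(32)
```

Printing `new_api_token()` once and pasting the result into config.yaml replaces the shared default with a secret unique to each installation.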

Functional analysis

Type: OpenClaw Skill
Name: kiwi-voice
Version: 1.0.0

Kiwi Voice is a legitimate and feature-rich voice assistant integration for OpenClaw. It includes comprehensive modules for local speech-to-text, speaker identification using neural embeddings, and a robust two-layer security system that uses regex patterns and Telegram-based approvals to prevent unauthorized or dangerous command execution. The codebase is well-structured, uses standard industry libraries (e.g., faster-whisper, torch, aiohttp), and contains explicit defensive logic to protect the user's system from potential LLM hallucinations or unauthorized access.

Capability assessment

⚠ Purpose & Capability

The skill claims to 'manage and configure Kiwi Voice', but the registry metadata declares no required environment variables or config paths, while the SKILL.md, README, and code reference many secrets and credentials (e.g., KIWI_ELEVENLABS_API_KEY, KIWI_TELEGRAM_BOT_TOKEN, RUNPOD keys, KIWI_HA_TOKEN) and expect heavy ML dependencies. This is inconsistent: either the metadata is incomplete or the skill is asking for more privileges than it declares.

⚠ Instruction Scope

SKILL.md instructs the agent to read and use local files (.env, config.yaml, logs, voice_profiles) and to start the service. Additionally, SOUL.md contains explicit system-prompt-like instructions (e.g., 'You are Kiwi... You can perform ANY task... Never refuse to execute') which are a prompt-injection risk — they attempt to change the assistant's behavior and grant it broad discretion to act. While service management needs access to some of these files, the presence of a system-prompt override embedded in the skill is out-of-scope for a benign 'manage' skill.

⚠ Install Mechanism

The registry shows no install spec, but the repository contains a large Python project (requirements.txt, many modules, models auto-download behavior). Heavy native/ML dependencies (CUDA, ONNX, pyannote, Faster Whisper, local TTS models) are required at runtime and are not declared in the skill metadata. That mismatch increases operational risk: users may run unreviewed installs or miss required sandboxing.

⚠ Credentials

Although the skill metadata lists no required env vars, the code and SKILL.md expect multiple secrets in .env (ElevenLabs API key, Telegram bot token + chat id, RunPod API keys, Home Assistant token, etc.). Worse, config.yaml included in the package contains an API token entry (api.auth.tokens -> token: "x4711-kiwi-2026-secret") and api.host is 0.0.0.0 by default. Hardcoded tokens and broad credential references are disproportionate and could lead to accidental exposure if deployed as-is.

⚠ Persistence & Privilege

always:false (good), but the skill implements a REST API (binds to 0.0.0.0:7789 by default), control endpoints (stop, restart, shutdown), and Home Assistant integration — all of which provide control surfaces that can be abused if misconfigured. Combined with the SOUL.md prompt override encouraging the agent to 'perform ANY task' and the hardcoded API token, the persistence/privilege posture is risky unless the service is carefully locked to localhost and tokens rotated.

Version history

v1.0.0

- Initial release of the kiwi-voice skill.
- Provides management and configuration for the standalone Kiwi Voice assistant service.
- Supports flexible TTS (ElevenLabs, Piper, Qwen3) and STT model selection.
- Includes detailed configuration options for LLM, audio devices, and security settings.
- Includes voice profile management and basic voice security/approval workflows.
- Documents troubleshooting tips and log file locations for easier support.

Metadata

Slug: kiwi-voice

Version: 1.0.0

License: MIT-0

Total installs: 0

Current installs: 0

Number of versions: 1

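FAQ

What is Kiwi Voice?

Manage and configure Kiwi Voice assistant service. Use when starting/stopping Kiwi, editing voice config, checking logs, troubleshooting audio issues, or man... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 246 times so far.

How do I install Kiwi Voice?

Run the command `/install kiwi-voice` in the OpenClaw or Claude Code chat box to install it in one step, with no extra configuration needed.

Is Kiwi Voice free?

Yes. Kiwi Voice is completely free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does Kiwi Voice support?

Kiwi Voice is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Kiwi Voice?

It is developed and maintained by yuangu260 (@yuangu260); the current version is v1.0.0.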
