Kesha Voice Kit
/install kesha-voice-kit
kesha-voice-kit
Local voice toolkit: transcribe voice messages to text, synthesize speech, detect language of audio or text. Fully offline after kesha install. No API keys, no per-minute billing.
Trigger keywords for when to use this skill: voice message, voice memo, .ogg, .wav, .mp3, audio file, transcribe, transcription, speech-to-text, STT, text-to-speech, TTS, synthesize speech, say, multilingual voice, multilingual ASR, language detection, offline voice, privacy, Apple Silicon, CoreML.
When to use
- Voice memo arrived (Telegram, WhatsApp, Slack, Signal .ogg/.opus/.m4a): transcribe with kesha --json <path> and branch on the detected language.
- Need to reply with audio: synthesize with kesha say "<text>" > reply.wav. Auto-routes by detected language (Kokoro-82M for English, Piper for Russian). For other languages and ~180 more voices use --voice macos-* on macOS (zero model download).
- Need to detect what language a file is in before choosing a pipeline: kesha --json audio.ogg returns both audio-based and text-based language detection with confidence scores.
STT: transcribe audio
# JSON output with language detection (recommended for automation)
kesha --json voice.ogg
[{
"file": "voice.ogg",
"text": "Привет, как дела?",
"lang": "ru",
"audioLanguage": { "code": "ru", "confidence": 0.98 },
"textLanguage": { "code": "ru", "confidence": 0.99 }
}]
Use lang (or the more detailed audioLanguage/textLanguage) to decide how to respond.
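As a minimal sketch of branching on that result, the Python below parses output shaped like the sample above and groups transcripts by the lang field. The helper name group_by_language is hypothetical, and the sketch assumes every entry carries text and lang keys as in the sample. It does not invoke kesha itself.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Sample copied from the output shown above; real input would be the stdout
# of `kesha --json <path>`.
SAMPLE = '''[{
  "file": "voice.ogg",
  "text": "Привет, как дела?",
  "lang": "ru",
  "audioLanguage": { "code": "ru", "confidence": 0.98 },
  "textLanguage": { "code": "ru", "confidence": 0.99 }
}]'''


def group_by_language(raw_json: str) -> Dict[str, List[str]]:
    """Group transcripts by detected language (hypothetical helper)."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entry in json.loads(raw_json):
        grouped[entry["lang"]].append(entry["text"])
    return dict(grouped)


# The output is a JSON array, so several files can be handled in one pass.
languages = sorted(group_by_language(SAMPLE))
```

A caller could then pick a reply template or a downstream pipeline per key of the returned dictionary.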
Formats: .ogg, .opus, .mp3, .m4a, .wav, .flac, .webm — decoded via symphonia, no ffmpeg required.
Other output modes:
- kesha audio.ogg: plain transcript on stdout
- kesha --format transcript audio.ogg: transcript + [lang: ru, confidence: 0.99] footer
- kesha --verbose audio.ogg: human-readable with language info
- kesha --lang en audio.ogg: warn if detected language differs (useful sanity check)
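If a script consumes the transcript format rather than JSON, the footer can be separated from the text with a small parser. The sketch below assumes the footer is the last line and matches the example [lang: ru, confidence: 0.99] exactly; the function name split_transcript is hypothetical, and the pattern may need adjusting against real output.

```python
import re
from typing import Optional, Tuple

# Assumption: the footer looks exactly like "[lang: ru, confidence: 0.99]".
FOOTER = re.compile(
    r"^\[lang: (?P<lang>[A-Za-z-]+), confidence: (?P<conf>\d+(?:\.\d+)?)\]$"
)


def split_transcript(output: str) -> Tuple[str, Optional[str], Optional[float]]:
    """Return (text, lang, confidence); lang and confidence are None without a footer."""
    lines = output.rstrip("\n").split("\n")
    match = FOOTER.match(lines[-1].strip())
    if match is None:
        return output.strip(), None, None
    text = "\n".join(lines[:-1]).strip()
    return text, match["lang"], float(match["conf"])
```

For automation the JSON mode remains the sturdier choice, since it does not depend on a text layout.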
TTS: synthesize speech
kesha say "Hello, world" > hello.wav # auto-routes en → Kokoro-82M
kesha say "Привет, мир" > privet.wav # auto-routes ru → Piper
kesha say --voice macos-de-DE "Guten Tag" > de.wav # any macOS system voice — German, French, Italian, ...
kesha say --list-voices # Kokoro + Piper + ~180 macos-* voices
Output: WAV mono float32. --out <path> writes to a file instead of stdout.
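When calling the synthesizer from a script, it can help to build the argument list explicitly instead of relying on shell redirection. The sketch below uses only the say subcommand and the --voice and --out flags described above. The helper names say_command and synthesize are hypothetical, and placing --out before the text is an assumption modeled on the --voice example.

```python
import subprocess
from typing import List, Optional


def say_command(text: str, voice: Optional[str] = None,
                out: Optional[str] = None) -> List[str]:
    """Build an argv list for `kesha say` from the documented flags."""
    argv = ["kesha", "say"]
    if voice is not None:
        argv += ["--voice", voice]  # e.g. "macos-de-DE"
    if out is not None:
        argv += ["--out", out]      # write to a file instead of stdout
    argv.append(text)
    return argv


def synthesize(text: str, wav_path: str, voice: Optional[str] = None) -> None:
    """Run kesha and save the WAV; needs kesha and its TTS models installed."""
    subprocess.run(say_command(text, voice=voice, out=wav_path), check=True)
```

Passing a list to subprocess.run avoids shell quoting problems when the text contains quotes or other special characters.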
Language detection standalone
kesha --json audio.ogg includes both audio-based (audioLanguage) and text-based (textLanguage) detection. Use audio detection to identify the language before running language-specific logic.
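One way to use the two detectors together is to trust the language only when they agree and both are confident. The following is a sketch of that idea, not kesha's own logic: the function name trusted_language and the 0.8 threshold are illustrative assumptions, and the field names are taken from the JSON sample shown earlier.

```python
from typing import Optional

MIN_CONFIDENCE = 0.8  # illustrative threshold, not a kesha default


def trusted_language(entry: dict,
                     min_confidence: float = MIN_CONFIDENCE) -> Optional[str]:
    """Return a language code only when both detectors agree and are confident."""
    audio = entry["audioLanguage"]
    text = entry["textLanguage"]
    if audio["code"] != text["code"]:
        return None  # detectors disagree; let the caller decide what to do
    if min(audio["confidence"], text["confidence"]) < min_confidence:
        return None  # agreement, but at least one score is too low
    return audio["code"]
```

A None result can be routed to a fallback, such as asking the sender which language they used.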
Install
bun add --global @drakulavich/kesha-voice-kit # or: npm i -g @drakulavich/kesha-voice-kit
kesha install # downloads engine (~350 MB)
kesha install --tts # adds Kokoro + Piper RU + ONNX G2P (~490 MB more, for TTS)
No system deps — G2P runs as ONNX alongside Kokoro/Piper. macos-* voices need no install either — they use voices already on the Mac.
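Before an agent relies on the CLI, a quick preflight check can confirm that the kesha executable is on PATH. This sketch uses only the Python standard library; the function name find_kesha is hypothetical. It checks for the executable only and cannot tell whether kesha install has already downloaded the engine.

```python
import shutil
from typing import Optional


def find_kesha() -> Optional[str]:
    """Return the full path of the kesha executable, or None if it is not on PATH."""
    return shutil.which("kesha")


if find_kesha() is None:
    # Install command copied from the section above.
    print("kesha not found; install with: npm i -g @drakulavich/kesha-voice-kit")
```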
Supported languages
Speech-to-text (25): Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Ukrainian.
Text-to-speech: English (Kokoro-82M, ~70 voices), Russian (Piper ru-denis), plus any macOS system voice via --voice macos-*.
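For scripts that need to check coverage up front, the lists above can be captured as lookup tables. The sketch below maps the 25 speech-to-text languages to their ISO 639-1 codes and names the TTS path for a given code. It assumes kesha reports ISO 639-1 codes, as the ru and en examples suggest; the names STT_LANGUAGES, TTS_ENGINES, and tts_route are hypothetical.

```python
# ISO 639-1 codes for the 25 speech-to-text languages listed above.
STT_LANGUAGES = {
    "bg": "Bulgarian", "hr": "Croatian", "cs": "Czech", "da": "Danish",
    "nl": "Dutch", "en": "English", "et": "Estonian", "fi": "Finnish",
    "fr": "French", "de": "German", "el": "Greek", "hu": "Hungarian",
    "it": "Italian", "lv": "Latvian", "lt": "Lithuanian", "mt": "Maltese",
    "pl": "Polish", "pt": "Portuguese", "ro": "Romanian", "ru": "Russian",
    "sk": "Slovak", "sl": "Slovenian", "es": "Spanish", "sv": "Swedish",
    "uk": "Ukrainian",
}

# TTS engines named above; any other language needs an explicit macos-* voice.
TTS_ENGINES = {"en": "Kokoro-82M", "ru": "Piper"}


def tts_route(lang: str) -> str:
    """Describe which documented TTS path covers a language code."""
    return TTS_ENGINES.get(lang, "macos-* voice (macOS only)")
```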
Performance
- ASR: ~19× faster than OpenAI Whisper on Apple Silicon (CoreML via FluidAudio), ~2.5× on CPU (ONNX via ort).
- TTS: sub-second latency for short utterances on Apple Silicon.
Why local
No API keys to manage. No per-minute billing. Voice data never leaves the machine — important for regulated industries, personal messaging, and anything that shouldn't be in a third-party log.
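- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install kesha-voice-kit
- Once installation finishes, invoke the skill by name or trigger it with /kesha-voice-kit
- Provide the inputs required by the skill's parameter description to get structured output.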
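What is Kesha Voice Kit?
Offline voice toolkit for speech-to-text, text-to-speech, and language detection supporting 25 languages with no API keys or cloud usage. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 192 downloads to date.
How do I install Kesha Voice Kit?
Run the command "/install kesha-voice-kit" in the OpenClaw or Claude Code chat box for a one-step install. No extra configuration is needed.
Is Kesha Voice Kit free?
Yes. Kesha Voice Kit is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.
Which platforms does Kesha Voice Kit support?
Kesha Voice Kit is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who develops Kesha Voice Kit?
It is developed and maintained by Anton Yakutovich (@drakulavich). The current version is v1.4.4.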