lakshibro

KittenTTS WhatsApp

by ReadY · GitHub ↗ · v1.0.4 · MIT-0
cross-platform ⚠ suspicious
126 Downloads · 0 Stars · 0 Active Installs · 5 Versions
Install in OpenClaw
/install kittentts-whatsapp
Description
Voice-to-voice mode for WhatsApp using KittenTTS + ffmpeg. Transcribe incoming audio with whisper, reply with a TTS voice note converted to WhatsApp-compatib...
README (SKILL.md)

KittenTTS WhatsApp Voice

Generates WhatsApp-compatible voice notes from text using KittenTTS + ffmpeg. Specifically solves the format mismatch that causes silent failures: KittenTTS outputs 24kHz WAV → converted to 16kHz OGG Opus via ffmpeg → sent as WhatsApp voice note.
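The conversion step described above can be sketched as a single ffmpeg call. This is a minimal sketch and not the contents of scripts/tts_walkie.sh: the function name wav_to_voice_note, the FFMPEG_BIN override, the mono downmix, and the 24k bitrate are assumptions made for illustration.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: convert a WAV file to 16kHz OGG Opus.
# FFMPEG_BIN is an assumed override so the command can be inspected
# (for example FFMPEG_BIN=echo) without running ffmpeg.
wav_to_voice_note() {
  local in_wav="$1" out_ogg="$2"
  # -ar 16000   resample to 16kHz
  # -ac 1       downmix to mono (assumption: voice notes are mono)
  # -c:a libopus encode with the Opus encoder bundled with ffmpeg
  # -b:a 24k    illustrative bitrate, not taken from the skill
  "${FFMPEG_BIN:-ffmpeg}" -y -loglevel error -i "$in_wav" \
    -ar 16000 -ac 1 -c:a libopus -b:a 24k "$out_ogg"
}
```

Calling wav_to_voice_note reply.wav reply.ogg would write the OGG next to the WAV; the paths are placeholders.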

⚠️ Read before installing. This skill installs system packages and downloads large ML models. See Setup below.

System Dependencies

| Dependency | Install command | Size | Notes |
|---|---|---|---|
| ffmpeg | apt-get install -y ffmpeg | ~30MB | Available in most distro repos |
| kittentts | pip3 install kittentts --break-system-packages | pulls ~25-80MB from Hugging Face on first run | Python package |
| libopus | bundled with ffmpeg | | OGG encoding support |
| soundfile | pulled by kittentts | | Python package |

Network Calls

  • First run: downloads TTS model (~25-80MB) from huggingface.co/KittenML based on model size chosen
  • No API keys required — fully offline capable after model download
  • Set HF_TOKEN env var to avoid unauthenticated rate limits on model download

Model Options

| Model | Parameters | Size | Hugging Face ID |
|---|---|---|---|
| nano (int8) | 15M | 25MB | KittenML/kitten-tts-nano-0.8-int8 |
| nano | 15M | 56MB | KittenML/kitten-tts-nano-0.8-fp32 |
| micro | 40M | 41MB | KittenML/kitten-tts-micro-0.8 |
| mini | 80M | 80MB | KittenML/kitten-tts-mini-0.8 |

Default: kitten-tts-mini-0.8 (best quality). Change in scripts/tts_walkie.sh.
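If you would rather select a model by short name than edit the ID by hand, a lookup like the following is one option. It is a sketch only: model_id_for and the short names (including nano-int8) are hypothetical, and how scripts/tts_walkie.sh actually stores the model ID is not shown in this README.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: map a short model name to the Hugging Face ID
# listed in the Model Options table. Unknown names are rejected.
model_id_for() {
  case "$1" in
    nano-int8) echo "KittenML/kitten-tts-nano-0.8-int8" ;;
    nano)      echo "KittenML/kitten-tts-nano-0.8-fp32" ;;
    micro)     echo "KittenML/kitten-tts-micro-0.8" ;;
    mini)      echo "KittenML/kitten-tts-mini-0.8" ;;
    *)         echo "unknown model: $1" >&2; return 1 ;;
  esac
}
```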

Setup

Run these manually before the skill is used:

# 1. System package (requires root/privileged)
apt-get install -y ffmpeg

# 2. Python package
pip3 install kittentts --break-system-packages

# 3. Optional: set Hugging Face token to avoid rate limits
# echo 'export HF_TOKEN="hf_your_token_here"' >> ~/.bashrc

Restart OpenClaw after installing dependencies so the new packages are in PATH.

Usage

TTS only (no transcription)

bash scripts/tts_walkie.sh "Your message here" Bella
# Output: /tmp/walkie_reply.ogg (16kHz OGG Opus, WhatsApp-ready)

Transcription only (optional — requires whisper)

# Install whisper (one-time, ~140MB-1.4GB depending on model)
pip3 install openai-whisper --break-system-packages

bash scripts/transcribe.sh /path/to/audio.ogg [model]
# Model: tiny | base | small | medium | large (default: base)

Voices

Available: Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo

Default: Bella

Security Notes

  • Audio files are written to a private /tmp/kittentts-walkie/ directory (mode 700) — only the running user can read them.
  • WAV intermediates are cleaned up immediately after conversion; only the OGG is kept for sending.
  • Set VOICE_SPEED env var to adjust speech rate (default: 1.0).
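The private-directory and cleanup behaviour described above could look roughly like this. It is a sketch under assumptions, not the code in scripts/tts_walkie.sh: make_private_dir and cleanup_wav are hypothetical names, and the rule "delete the WAV only once a non-empty OGG exists" is an illustrative choice.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: create a directory only the current user can access.
# chmod is applied explicitly because mkdir -p leaves the mode of an
# already existing directory unchanged.
make_private_dir() {
  local dir="$1"
  mkdir -p "$dir"
  chmod 700 "$dir"
}

# Hypothetical helper: remove the WAV intermediate once the OGG exists
# and is non-empty, so a failed conversion keeps its input for debugging.
cleanup_wav() {
  local wav="$1" ogg="$2"
  if [ -s "$ogg" ]; then
    rm -f -- "$wav"
  fi
}
```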

Files

kittentts-whatsapp/
├── SKILL.md
└── scripts/
    ├── tts_walkie.sh      # TTS + ffmpeg conversion (honors VOICE_SPEED)
    └── transcribe.sh       # whisper transcription (optional)

⚠️ Privileged Install Warning

The dependency install commands use apt-get install -y, which requires root, and pip3 install --break-system-packages, which overrides pip's externally-managed-environment guard (PEP 668) and can modify system Python packages. Review before running if you are on a managed system.

Troubleshooting

Audio sends but is silent or rejected by WhatsApp:
→ Run ffprobe -v quiet -print_format json -show_streams /tmp/walkie_reply.ogg
→ Must show codec_name: opus and sample_rate: 48000 (or 16000). If not, the ffmpeg chain failed.

TTS generation is slow:
→ Switch to a smaller model (nano instead of mini) in scripts/tts_walkie.sh.

Hugging Face download rate limit:
→ Set HF_TOKEN in your environment. Unauthenticated requests get lower rate limits.
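The ffprobe check from the first troubleshooting item can be scripted so that it fails loudly instead of requiring you to read JSON. This is a sketch: is_voice_note_format and check_voice_note are hypothetical names, and the accepted values are simply the ones stated above (opus, 48000 or 16000).

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: succeed only for the codec and sample rates
# that the troubleshooting step accepts.
is_voice_note_format() {
  local codec="$1" rate="$2"
  [ "$codec" = "opus" ] && { [ "$rate" = "48000" ] || [ "$rate" = "16000" ]; }
}

# Hypothetical wrapper: read codec and sample rate of the first audio
# stream with ffprobe, then apply the check above.
check_voice_note() {
  local file="$1" codec rate
  codec=$(ffprobe -v error -select_streams a:0 \
            -show_entries stream=codec_name \
            -of default=noprint_wrappers=1:nokey=1 "$file")
  rate=$(ffprobe -v error -select_streams a:0 \
            -show_entries stream=sample_rate \
            -of default=noprint_wrappers=1:nokey=1 "$file")
  is_voice_note_format "$codec" "$rate"
}
```

check_voice_note returns a non-zero status when the file is not Opus at one of the accepted rates, so it can gate the send step in a script.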

Usage Guidance
This skill appears to do exactly what it says (generate WhatsApp-ready voice notes and optionally transcribe audio). Before installing, consider:
  1. Do not run the provided apt-get / pip commands on a managed or production machine without approval; pip --break-system-packages can change system Python packages. Prefer using a virtualenv, container, or dedicated machine.
  2. The model download comes from huggingface.co (~25–80MB); set HF_TOKEN only if you trust where you store the token (adding it to ~/.bashrc stores it in plaintext).
  3. Verify ffmpeg and Python dependencies yourself and inspect the two scripts (they are short and straightforward).
  4. The registry metadata and SKILL.md metadata disagree about required binaries; treat SKILL.md as the authoritative source.
If you need lower risk, run this inside a disposable VM/container.
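The virtualenv alternative mentioned in the guidance above might be set up as follows. This is a sketch under assumptions: install_in_venv, the DRY_RUN switch, and the default location ~/.venvs/kittentts are all hypothetical, and the skill's scripts would still need to be run with that environment's Python on PATH.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: install kittentts into a dedicated virtualenv
# instead of the system Python. With DRY_RUN=1 the commands are only printed.
install_in_venv() {
  local venv="${1:-$HOME/.venvs/kittentts}"   # assumed default location
  local run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run="echo"
  fi
  $run python3 -m venv "$venv"
  $run "$venv/bin/pip" install kittentts
}
```

Running DRY_RUN=1 install_in_venv first shows the two commands without touching the system.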
Capability Analysis
Type: OpenClaw Skill
Name: kittentts-whatsapp
Version: 1.0.4
The skill contains critical injection vulnerabilities in scripts/tts_walkie.sh and scripts/transcribe.sh, where user-controlled bash variables are directly embedded into Python heredocs (e.g., text = """$TEXT"""). This allows an attacker to execute arbitrary Python code by including triple quotes in the input. While the stated purpose of generating WhatsApp-compatible audio is plausible and no evidence of intentional data exfiltration or backdoors was found, the unsafe code construction and requirement for privileged system-wide installation pose a significant security risk.
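One common way to avoid this class of injection is to quote the heredoc delimiter, so bash expands nothing inside it, and to hand the text to Python as data through the environment. The sketch below illustrates that pattern only; it is not a patch to the skill's scripts, and echo_text_safely and TTS_TEXT are hypothetical names.

```shell
#!/usr/bin/env bash
set -eu

# Sketch: the quoted delimiter ('PY') disables shell expansion inside the
# heredoc, so user text can never become part of the Python source.
echo_text_safely() {
  TTS_TEXT="$1" python3 - <<'PY'
import os
import sys

text = os.environ["TTS_TEXT"]   # arrives as data, never parsed as code
sys.stdout.write(text)          # a real script would pass `text` to the TTS call
PY
}
```

With this construction, input containing triple quotes is echoed back unchanged rather than terminating a string literal.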
Capability Assessment
Purpose & Capability
The skill's name/description (KittenTTS → WhatsApp OGG) align with the included scripts and instructions (tts_walkie.sh uses KittenTTS and ffmpeg; transcribe.sh uses whisper + ffmpeg). Minor inconsistency: registry metadata listed 'Required env vars: none' and no required binaries, while SKILL.md metadata declares ffmpeg, network access to huggingface.co, and 'privileged: true'. This appears to be an authoring/metadata mismatch, not malicious behavior.
Instruction Scope
Runtime instructions and the two scripts stick to audio generation/transcription and temporary file handling. Scripts create a private /tmp directory, write WAV/OGG files, call ffmpeg, whisper, and KittenTTS; they do not access unrelated system files or send data to external endpoints beyond downloading models from Hugging Face. Note: SKILL.md suggests adding HF_TOKEN to ~/.bashrc (writes a token into shell config) — this is a user-level change you should consider before applying.
Install Mechanism
There is no automated install spec; the docs ask you to run apt-get and pip3 install manually. That is expected for this use case but is intrusive: pip3 install kittentts --break-system-packages and apt-get install -y ffmpeg require root and can alter system Python packages on managed machines. Model downloads (~25–80MB) come from huggingface.co (a known host).
Credentials
The skill does not require unrelated secrets. HF_TOKEN is optional and only suggested to reduce download rate limits; no other credentials or tokens are requested. The scripts do not read other environment variables beyond VOICE_SPEED (documented) and the optional HF_TOKEN.
Persistence & Privilege
The skill is not marked always:true and does not modify other skills or system-wide agent settings. It requires privileged actions only for dependency installation (apt/pip), which is documented in the README; otherwise it runs as the invoking user and stores temporary files under a mode-700 directory.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install kittentts-whatsapp
  3. After installation, invoke the skill by name or use /kittentts-whatsapp
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.4
Fixed /tmp world-readable issue: audio now written to mode-700 /tmp/kittentts-walkie/. Fixed unused speed parameter bug: VOICE_SPEED now passed to tts.generate(). WAV intermediates cleaned up after conversion. Added Security Notes section.
v1.0.3
Added openclaw.metadata block with requires.bins (ffmpeg), requires.packages (kittentts), requires.network (huggingface.co), requires.privileged (true), and requires.warning string, which surfaces warnings at registry/install time, not just in prose.
v1.0.2
Renamed skill from KittenTTS WhatsApp Walkie-Talkie to KittenTTS WhatsApp.
v1.0.1
Added full dependency table, network call disclosure (Hugging Face model download ~25-80MB), model size table, privileged install warnings, and troubleshooting section.
v1.0.0
Initial release: KittenTTS + ffmpeg chain for WhatsApp voice notes. Handles 24kHz WAV → 16kHz OGG Opus conversion that WhatsApp requires.
Metadata
Slug kittentts-whatsapp
Version 1.0.4
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 5
Frequently Asked Questions

What is KittenTTS WhatsApp?

Voice-to-voice mode for WhatsApp using KittenTTS + ffmpeg. Transcribe incoming audio with whisper, reply with a TTS voice note converted to WhatsApp-compatib... It is an AI Agent Skill for Claude Code / OpenClaw, with 126 downloads so far.

How do I install KittenTTS WhatsApp?

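Run "/install kittentts-whatsapp" in the OpenClaw or Claude Code chat to install the skill. The system dependencies listed under Setup (ffmpeg and kittentts) are not installed automatically and must be installed manually first.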

Is KittenTTS WhatsApp free?

Yes, KittenTTS WhatsApp is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does KittenTTS WhatsApp support?

KittenTTS WhatsApp is listed as cross-platform and runs wherever OpenClaw / Claude Code is available. Note that the documented setup commands use apt-get, so they assume a Debian-based Linux system.

Who created KittenTTS WhatsApp?

It is built and maintained by ReadY (@lakshibro); the current version is v1.0.4.
