
KittenTTS WhatsApp

Name: KittenTTS WhatsApp
Author: lakshibro

by ReadY · GitHub ↗ · v1.0.4 · MIT-0

cross-platform ⚠ suspicious

126 Downloads


Install in OpenClaw

/install kittentts-whatsapp

Description

Voice-to-voice mode for WhatsApp using KittenTTS + ffmpeg. Transcribe incoming audio with whisper, reply with a TTS voice note converted to WhatsApp-compatib...

README (SKILL.md)

KittenTTS WhatsApp Voice

Generates WhatsApp-compatible voice notes from text using KittenTTS + ffmpeg. Specifically solves the format mismatch that causes silent failures: KittenTTS outputs 24kHz WAV → converted to 16kHz OGG Opus via ffmpeg → sent as WhatsApp voice note.

⚠️ Read before installing. This skill installs system packages and downloads large ML models. See Setup below.
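
The conversion chain described above (24kHz WAV in, 16kHz OGG Opus out) can be sketched in Python as follows. This is a minimal illustration and not the contents of scripts/tts_walkie.sh: the function names, the mono downmix, and the 24k bitrate are assumptions, while -ar, -ac, -c:a libopus, and -b:a are standard ffmpeg options.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(wav_path: str, ogg_path: str, sample_rate: int = 16000) -> list[str]:
    """Return an ffmpeg argv that re-encodes a WAV file as OGG Opus."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it already exists
        "-i", wav_path,            # input WAV (24kHz from KittenTTS, per the README)
        "-ar", str(sample_rate),   # resample to the target rate (16kHz by default)
        "-ac", "1",                # assumption: downmix to mono
        "-c:a", "libopus",         # encode with the Opus codec
        "-b:a", "24k",             # assumption: a speech-friendly bitrate
        ogg_path,
    ]


def convert_to_voice_note(wav_path: str, ogg_path: str) -> Path:
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_cmd(wav_path, ogg_path), check=True)
    return Path(ogg_path)
```

Passing the command as a list (rather than a shell string) keeps file names with spaces or quotes from being reinterpreted by a shell.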

System Dependencies

Dependency	Install command	Size	Notes
`ffmpeg`	`apt-get install -y ffmpeg`	~30MB	Available in most distro repos
`kittentts`	`pip3 install kittentts --break-system-packages`	pulls ~25-80MB from Hugging Face on first run	Python package
`libopus`	bundled with ffmpeg	—	OGG encoding support
`soundfile`	pulled by kittentts	—	Python package

Network Calls

First run: downloads TTS model (~25-80MB) from huggingface.co/KittenML based on model size chosen
No API keys required — fully offline capable after model download
Set HF_TOKEN env var to avoid unauthenticated rate limits on model download

Model Options

Model	Parameters	Size	Hugging Face ID
nano (int8)	15M	25MB	`KittenML/kitten-tts-nano-0.8-int8`
nano	15M	56MB	`KittenML/kitten-tts-nano-0.8-fp32`
micro	40M	41MB	`KittenML/kitten-tts-micro-0.8`
mini	80M	80MB	`KittenML/kitten-tts-mini-0.8`

Default: kitten-tts-mini-0.8 (best quality). Change in scripts/tts_walkie.sh.
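
As an illustration of how a model choice could be resolved to a Hugging Face ID, here is a small Python sketch. The IDs are copied from the table above; the short keys (such as "nano-int8") and the resolve_model helper are hypothetical and are not taken from scripts/tts_walkie.sh.

```python
from typing import Optional

# Hugging Face IDs copied from the Model Options table; the dict keys are
# hypothetical short names chosen for this example.
MODELS = {
    "nano-int8": "KittenML/kitten-tts-nano-0.8-int8",
    "nano": "KittenML/kitten-tts-nano-0.8-fp32",
    "micro": "KittenML/kitten-tts-micro-0.8",
    "mini": "KittenML/kitten-tts-mini-0.8",
}
DEFAULT_MODEL = "mini"  # the README's stated default


def resolve_model(name: Optional[str] = None) -> str:
    """Map a short model name to its Hugging Face ID, defaulting to mini."""
    key = (name or DEFAULT_MODEL).strip().lower()
    if key not in MODELS:
        raise ValueError(f"unknown model {name!r}; choose one of {sorted(MODELS)}")
    return MODELS[key]
```

Failing loudly on an unknown name avoids silently triggering a download of an unintended model.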

Setup

Run these manually before the skill is used:

# 1. System package (requires root/privileged)
apt-get install -y ffmpeg

# 2. Python package
pip3 install kittentts --break-system-packages

# 3. Optional: set Hugging Face token to avoid rate limits
# echo 'export HF_TOKEN="hf_your_token_here"' >> ~/.bashrc

Restart OpenClaw after installing dependencies so it picks up the newly installed ffmpeg binary and kittentts package.
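
To confirm the setup steps took effect, a preflight check along these lines may help. It is a sketch using only the Python standard library; the missing_dependencies helper and its "binary:" / "module:" labels are made up for this example.

```python
import importlib.util
import shutil


def missing_dependencies(binaries=("ffmpeg",), modules=("kittentts",)):
    """Return labels for every binary or top-level module that cannot be found."""
    missing = []
    for binary in binaries:
        # shutil.which searches PATH the same way a shell would
        if shutil.which(binary) is None:
            missing.append(f"binary:{binary}")
    for module in modules:
        # find_spec locates a top-level module without importing it
        if importlib.util.find_spec(module) is None:
            missing.append(f"module:{module}")
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("all dependencies found" if not problems else f"missing: {problems}")
```

An empty list means both ffmpeg and kittentts are visible to the interpreter that ran the check, which may differ from the one OpenClaw uses.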

Usage

TTS only (no transcription)

bash scripts/tts_walkie.sh "Your message here" Bella
# Output: /tmp/walkie_reply.ogg (16kHz OGG Opus, WhatsApp-ready)

Transcription only (optional — requires whisper)

# Install whisper (one-time, ~140MB-1.4GB depending on model)
pip3 install openai-whisper --break-system-packages

bash scripts/transcribe.sh /path/to/audio.ogg [model]
# Model: tiny | base | small | medium | large (default: base)
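
For reference, the same transcription step can be sketched directly in Python, assuming the OpenAI Whisper package (imported as whisper) is installed. This is not the contents of scripts/transcribe.sh, which may call the whisper CLI instead; pick_model and transcribe are hypothetical helper names.

```python
from typing import Optional

VALID_MODELS = ("tiny", "base", "small", "medium", "large")


def pick_model(name: Optional[str] = None) -> str:
    """Validate the requested model size, falling back to the README default."""
    model = name or "base"
    if model not in VALID_MODELS:
        raise ValueError(f"unknown whisper model {model!r}; expected one of {VALID_MODELS}")
    return model


def transcribe(audio_path: str, model: Optional[str] = None) -> str:
    """Transcribe an audio file and return the recognized text."""
    import whisper  # imported lazily so pick_model works without the package

    loaded = whisper.load_model(pick_model(model))  # downloads weights on first use
    result = loaded.transcribe(audio_path)          # uses ffmpeg to decode the file
    return result["text"].strip()
```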

Voices

Available: Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo

Default: Bella

Security Notes

Audio files are written to a private /tmp/kittentts-walkie/ directory (mode 700) — only the running user can read them.
WAV intermediates are cleaned up immediately after conversion; only the OGG is kept for sending.
Set VOICE_SPEED env var to adjust speech rate (default: 1.0).
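
The behaviour described in these notes can be approximated in Python as shown below. This is a sketch under stated assumptions, not the skill's actual implementation: the helper names are hypothetical, and falling back to 1.0 on an invalid VOICE_SPEED is a design choice made for this example.

```python
import os
from pathlib import Path


def ensure_private_dir(path) -> Path:
    """Create a directory that only the current user can read, write, or enter."""
    directory = Path(path)
    directory.mkdir(mode=0o700, parents=True, exist_ok=True)
    # mkdir's mode is filtered by the umask and ignored for existing dirs,
    # so set the permissions explicitly as well.
    os.chmod(directory, 0o700)
    return directory


def read_voice_speed(env=None, default: float = 1.0) -> float:
    """Parse VOICE_SPEED, returning the default when it is unset or invalid."""
    raw = (os.environ if env is None else env).get("VOICE_SPEED")
    if not raw:
        return default
    try:
        speed = float(raw)
    except ValueError:
        return default
    return speed if speed > 0 else default


def cleanup_wav(wav_path) -> None:
    """Delete the intermediate WAV; do nothing if it is already gone."""
    Path(wav_path).unlink(missing_ok=True)
```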

Files

kittentts-whatsapp/
├── SKILL.md
└── scripts/
    ├── tts_walkie.sh      # TTS + ffmpeg conversion (applies VOICE_SPEED)
    └── transcribe.sh       # whisper transcription (optional)

⚠️ Privileged Install Warning

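The dependency install commands use apt-get install -y and pip3 install --break-system-packages. apt-get requires root privileges, and --break-system-packages tells pip to bypass the externally-managed-environment protection, so the install can modify packages that the system's Python depends on. Review before running if you are on a managed system.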

Troubleshooting

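Audio sends but is silent or rejected by WhatsApp:
→ Run ffprobe -v quiet -print_format json -show_streams /tmp/walkie_reply.ogg
→ The output must show codec_name: opus and sample_rate: 48000 (or 16000). ffprobe commonly reports 48000 for Opus streams even when the encoder was fed 16kHz audio, so either value is acceptable. If it shows anything else, the ffmpeg chain failed.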

TTS generation is slow: → Switch to a smaller model (nano instead of mini) in scripts/tts_walkie.sh.

Hugging Face download rate limit: → Set HF_TOKEN in your environment. Free accounts get lower rate limits.
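
The ffprobe check from the first troubleshooting item can be automated with a short Python sketch. The ffprobe arguments are the ones given above; the helper names and the choice to inspect only the first audio stream are assumptions made for this example.

```python
import json
import subprocess


def probe_streams(path: str) -> str:
    """Run ffprobe on a file and return its JSON stream description."""
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_streams", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def is_whatsapp_ready(ffprobe_json: str, allowed_rates=(48000, 16000)) -> bool:
    """Check that the first audio stream is Opus at an accepted sample rate."""
    streams = json.loads(ffprobe_json).get("streams", [])
    audio = [s for s in streams if s.get("codec_type") == "audio"]
    if not audio:
        return False
    stream = audio[0]
    # ffprobe emits sample_rate as a string, so convert before comparing
    rate = int(stream.get("sample_rate", 0))
    return stream.get("codec_name") == "opus" and rate in allowed_rates
```

A typical call would be is_whatsapp_ready(probe_streams("/tmp/walkie_reply.ogg")), which requires ffprobe to be installed.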

Usage Guidance

This skill appears to do exactly what it says (generate WhatsApp-ready voice notes and optionally transcribe audio). Before installing, consider:

1) Do not run the provided apt-get / pip commands on a managed or production machine without approval: pip --break-system-packages can change system Python packages. Prefer using a virtualenv, container, or dedicated machine.
2) The model download comes from huggingface.co (~25–80MB); set HF_TOKEN only if you trust where you store the token (adding it to ~/.bashrc stores it in plaintext).
3) Verify ffmpeg and Python dependencies yourself and inspect the two scripts (they are short and straightforward).
4) The registry metadata and SKILL.md metadata disagree about required binaries; treat SKILL.md as the authoritative source.

If you need lower risk, run this inside a disposable VM/container.

Capability Analysis

Type: OpenClaw Skill
Name: kittentts-whatsapp
Version: 1.0.4

The skill contains critical injection vulnerabilities in scripts/tts_walkie.sh and scripts/transcribe.sh, where user-controlled bash variables are directly embedded into Python heredocs (e.g., text = """$TEXT"""). This allows an attacker to execute arbitrary Python code by including triple quotes in the input. While the stated purpose of generating WhatsApp-compatible audio is plausible and no evidence of intentional data exfiltration or backdoors was found, the unsafe code construction and requirement for privileged system-wide installation pose a significant security risk.
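
A common way to avoid this class of bug is to hand user text to Python as data (an argument, an environment variable, or stdin) instead of interpolating it into source code. The sketch below illustrates the idea with a child process; it is not a patch for the skill's scripts, and CHILD_SOURCE and run_with_text are hypothetical names.

```python
import subprocess
import sys

# The child program is a fixed string. It never contains user text;
# it only reads the text from argv[1] at run time.
CHILD_SOURCE = "import sys; sys.stdout.write(sys.argv[1])"


def run_with_text(text: str) -> str:
    """Pass `text` to a child Python process as an argument, not as source."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE, text],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Triple quotes in the input stay ordinary characters and are echoed back.
    payload = 'hi"""; import os; os.system("id"); """'
    print(run_with_text(payload) == payload)
```

In a shell script, the equivalent idea is to quote the heredoc delimiter so the shell does not expand variables inside it, and to have the Python code read the text from sys.argv or os.environ.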

Capability Assessment

ℹ Purpose & Capability

The skill's name/description (KittenTTS → WhatsApp OGG) align with the included scripts and instructions (tts_walkie.sh uses KittenTTS and ffmpeg; transcribe.sh uses whisper + ffmpeg). Minor inconsistency: registry metadata listed 'Required env vars: none' and no required binaries, while SKILL.md metadata declares ffmpeg, network access to huggingface.co, and 'privileged: true'. This appears to be an authoring/metadata mismatch, not malicious behavior.

✓ Instruction Scope

Runtime instructions and the two scripts stick to audio generation/transcription and temporary file handling. Scripts create a private /tmp directory, write WAV/OGG files, call ffmpeg, whisper, and KittenTTS; they do not access unrelated system files or send data to external endpoints beyond downloading models from Hugging Face. Note: SKILL.md suggests adding HF_TOKEN to ~/.bashrc (writes a token into shell config) — this is a user-level change you should consider before applying.

ℹ Install Mechanism

There is no automated install spec; the docs ask you to run apt-get and pip3 install manually. That is expected for this use case but is intrusive: pip3 install kittentts --break-system-packages and apt-get install -y ffmpeg require root and can alter system Python packages on managed machines. Model downloads (~25–80MB) come from huggingface.co (a known host).

✓ Credentials

The skill does not require unrelated secrets. HF_TOKEN is optional and only suggested to reduce download rate limits; no other credentials or tokens are requested. The scripts do not read other environment variables beyond VOICE_SPEED (documented) and the optional HF_TOKEN.

✓ Persistence & Privilege

The skill is not marked always:true and does not modify other skills or system-wide agent settings. It requires privileged actions only for dependency installation (apt/pip), which is documented in the README; otherwise it runs as the invoking user and stores temporary files under a mode-700 directory.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install kittentts-whatsapp
After installation, invoke the skill by name or use /kittentts-whatsapp
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.4

Fixed /tmp world-readable issue: audio is now written to mode-700 /tmp/kittentts-walkie/. Fixed unused speed parameter bug: VOICE_SPEED is now passed to tts.generate(). WAV intermediates are cleaned up after conversion. Added Security Notes section.

v1.0.3

Added openclaw.metadata block with requires.bins (ffmpeg), requires.packages (kittentts), requires.network (huggingface.co), requires.privileged (true), and requires.warning string. This surfaces warnings at registry/install time, not just in prose.

v1.0.2

Renamed skill from KittenTTS WhatsApp Walkie-Talkie to KittenTTS WhatsApp

v1.0.1

Added full dependency table, network call disclosure (Hugging Face model download ~25-80MB), model size table, privileged install warnings, and troubleshooting section.

v1.0.0

Initial release: KittenTTS + ffmpeg chain for WhatsApp voice notes. Handles 24kHz WAV → 16kHz OGG Opus conversion that WhatsApp requires.

Metadata

Slug kittentts-whatsapp

Version 1.0.4

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 5

Frequently Asked Questions

What is KittenTTS WhatsApp?

Voice-to-voice mode for WhatsApp using KittenTTS + ffmpeg. Transcribe incoming audio with whisper, reply with a TTS voice note converted to WhatsApp-compatib... It is an AI Agent Skill for Claude Code / OpenClaw, with 126 downloads so far.

How do I install KittenTTS WhatsApp?

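Run "/install kittentts-whatsapp" in the OpenClaw or Claude Code chat to install the skill itself. The system dependencies (ffmpeg and the kittentts Python package) must still be installed manually, as described under Setup.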

Is KittenTTS WhatsApp free?

Yes, KittenTTS WhatsApp is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does KittenTTS WhatsApp support?

KittenTTS WhatsApp is listed as cross-platform and runs wherever OpenClaw / Claude Code is available. Note that the documented setup commands use apt-get, which assumes a Debian- or Ubuntu-based system.

Who created KittenTTS WhatsApp?

It is built and maintained by ReadY (@lakshibro); the current version is v1.0.4.
