
KittenTTS WhatsApp

Name: KittenTTS WhatsApp
Author: lakshibro

By ReadY · GitHub · v1.0.4 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 126

Install in OpenClaw

/install kittentts-whatsapp

Description

Voice-to-voice mode for WhatsApp using KittenTTS + ffmpeg. Transcribe incoming audio with whisper, reply with a TTS voice note converted to WhatsApp-compatib...

Usage instructions (SKILL.md)

KittenTTS WhatsApp Voice

Generates WhatsApp-compatible voice notes from text using KittenTTS + ffmpeg. Specifically solves the format mismatch that causes silent failures: KittenTTS outputs 24kHz WAV → converted to 16kHz OGG Opus via ffmpeg → sent as WhatsApp voice note.
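The conversion step can be sketched as a small shell function. This is a minimal sketch, assuming a standard ffmpeg build that includes the libopus encoder; the function name `wav_to_whatsapp_ogg` and the 32k bitrate are illustrative choices, not taken from scripts/tts_walkie.sh. The flags themselves (`-ac 1` for mono, `-ar 16000` to resample, `-c:a libopus`, `-b:a`) are standard ffmpeg options.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not the shipped scripts/tts_walkie.sh): convert a
# KittenTTS WAV file (24 kHz) into a mono 16 kHz OGG Opus file.
wav_to_whatsapp_ogg() {
  local in_wav="$1" out_ogg="$2"
  ffmpeg -y -hide_banner -loglevel error \
    -i "$in_wav" \
    -ac 1 \
    -ar 16000 \
    -c:a libopus \
    -b:a 32k \
    "$out_ogg"
}
```

For example, `wav_to_whatsapp_ogg reply.wav reply.ogg`. The `.ogg` extension is what makes ffmpeg pick the Ogg container; `-ar 16000` performs the 24 kHz to 16 kHz resampling.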

⚠️ Read before installing. This skill installs system packages and downloads large ML models. See Setup below.

System Dependencies

Dependency	Install command	Size	Notes
`ffmpeg`	`apt-get install -y ffmpeg`	~30MB	Available in most distro repos
`kittentts`	`pip3 install kittentts --break-system-packages`	pulls ~25-80MB from Hugging Face on first run	Python package
`libopus`	bundled with ffmpeg	—	OGG encoding support
`soundfile`	pulled by kittentts	—	Python package
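A quick preflight check for the dependencies in the table above might look like the following. This is a hypothetical script, not part of the skill: the function names are illustrative, and it assumes `python3` is the interpreter the scripts use. Module names are passed to Python as an argument rather than spliced into Python source.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check (not shipped with the skill).

# True if an executable is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# True if python3 can locate a top-level module named by $1.
have_py_module() {
  python3 -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1" 2>/dev/null
}

# True if the installed ffmpeg build lists the libopus encoder.
have_libopus() {
  ffmpeg -hide_banner -encoders 2>/dev/null | grep libopus >/dev/null
}

# Prints one line per missing dependency; the exit status is the count missing.
check_deps() {
  local missing=0
  if have_cmd ffmpeg; then
    have_libopus || { echo "missing: libopus encoder in ffmpeg build"; missing=$((missing + 1)); }
  else
    echo "missing: ffmpeg (apt-get install -y ffmpeg)"
    missing=$((missing + 1))
  fi
  have_py_module kittentts || { echo "missing: Python package kittentts"; missing=$((missing + 1)); }
  have_py_module soundfile || { echo "missing: Python package soundfile"; missing=$((missing + 1)); }
  return "$missing"
}
```

Usage: `check_deps && echo "all dependencies present"`.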

Network Calls

First run: downloads the TTS model (~25-80MB, depending on the model size chosen) from huggingface.co/KittenML
No API keys required — fully offline capable after model download
Set HF_TOKEN env var to avoid unauthenticated rate limits on model download

Model Options

Model	Parameters	Size	Hugging Face ID
nano (int8)	15M	25MB	`KittenML/kitten-tts-nano-0.8-int8`
nano	15M	56MB	`KittenML/kitten-tts-nano-0.8-fp32`
micro	40M	41MB	`KittenML/kitten-tts-micro-0.8`
mini	80M	80MB	`KittenML/kitten-tts-mini-0.8`

Default: kitten-tts-mini-0.8 (best quality). To change it, edit scripts/tts_walkie.sh.
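If you switch models often, a lookup from the short names in the table to the Hugging Face IDs avoids typos. This is a hypothetical helper; `kitten_model_id` and the size names it accepts are illustrative, and how scripts/tts_walkie.sh consumes the model ID is not shown here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a size name from the Model Options table to its
# Hugging Face ID. Defaults to "mini", matching the documented default.
kitten_model_id() {
  case "${1:-mini}" in
    nano-int8) echo "KittenML/kitten-tts-nano-0.8-int8" ;;
    nano)      echo "KittenML/kitten-tts-nano-0.8-fp32" ;;
    micro)     echo "KittenML/kitten-tts-micro-0.8" ;;
    mini)      echo "KittenML/kitten-tts-mini-0.8" ;;
    *)         echo "unknown model size: $1" >&2; return 1 ;;
  esac
}
```

For example, `kitten_model_id nano-int8` prints `KittenML/kitten-tts-nano-0.8-int8`, the ID you would substitute for the default when editing the script.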

Setup

Run these manually before the skill is used:

# 1. System package (requires root/privileged)
apt-get install -y ffmpeg

# 2. Python package
pip3 install kittentts --break-system-packages

# 3. Optional: set Hugging Face token to avoid rate limits
# echo 'export HF_TOKEN="hf_your_token_here"' >> ~/.bashrc

Restart OpenClaw after installing dependencies so the new packages are in PATH.

Usage

TTS only (no transcription)

bash scripts/tts_walkie.sh "Your message here" Bella
# Output: /tmp/walkie_reply.ogg (16kHz OGG Opus, WhatsApp-ready)

Transcription only (optional — requires whisper)

# Install whisper (one-time, ~140MB-1.4GB depending on model)
pip3 install openai-whisper --break-system-packages

bash scripts/transcribe.sh /path/to/audio.ogg [model]
# Model: tiny | base | small | medium | large (default: base)
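A wrapper around the Whisper CLI could validate the model name before doing any work. This is a hypothetical sketch, not the shipped scripts/transcribe.sh: it accepts only the five model names listed above, and it assumes the `whisper` command installed by the openai-whisper package, using its `--model`, `--output_format` and `--output_dir` options.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not the shipped scripts/transcribe.sh).

# Accept only the model names documented above.
is_whisper_model() {
  case "$1" in
    tiny|base|small|medium|large) return 0 ;;
    *) return 1 ;;
  esac
}

# Transcribe $1 with model $2 (default: base) and print the plain-text transcript.
transcribe_to_text() {
  local audio="$1" model="${2:-base}" out_dir
  is_whisper_model "$model" || { echo "unsupported model: $model" >&2; return 1; }
  out_dir="$(mktemp -d)"
  if ! whisper "$audio" --model "$model" --output_format txt --output_dir "$out_dir" >/dev/null; then
    rm -rf -- "$out_dir"
    return 1
  fi
  # The fresh output directory holds exactly one .txt transcript.
  cat "$out_dir"/*.txt
  rm -rf -- "$out_dir"
}
```

Globbing `*.txt` in a fresh temporary directory avoids depending on how a given Whisper version names its output file.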

Voices

Available: Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo

Default: Bella
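Because the voice is passed as a free-form argument, a caller can check it against the list above before invoking the script. This is a hypothetical helper (`pick_voice` is illustrative); whether scripts/tts_walkie.sh validates the voice itself is not documented here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate a voice name against the documented list,
# falling back to Bella when no voice is given. Matching is case-sensitive.
pick_voice() {
  local voice="${1:-Bella}"
  case "$voice" in
    Bella|Jasper|Luna|Bruno|Rosie|Hugo|Kiki|Leo) echo "$voice" ;;
    *)
      echo "unknown voice: $voice (expected one of: Bella Jasper Luna Bruno Rosie Hugo Kiki Leo)" >&2
      return 1
      ;;
  esac
}
```

Usage: `bash scripts/tts_walkie.sh "Your message here" "$(pick_voice Luna)"`.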

Security Notes

Audio files are written to a private /tmp/kittentts-walkie/ directory (mode 700) — only the running user can read them.
WAV intermediates are cleaned up immediately after conversion; only the OGG is kept for sending.
Set VOICE_SPEED env var to adjust speech rate (default: 1.0).
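The private-directory and VOICE_SPEED handling described above can be sketched as two small checks. These are hypothetical helpers, not the shipped scripts. Because /tmp is shared, the sketch also refuses a path that is a symlink or is owned by another user, which a fixed name like /tmp/kittentts-walkie/ could be if someone pre-created it. The speed check accepts only a positive decimal number.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the Security Notes (not the shipped scripts).

# Create or reuse a mode-700 directory; refuse symlinks or directories owned by
# someone else.
ensure_private_dir() {
  local dir="$1"
  mkdir -p -m 700 -- "$dir" || return 1
  if [ -L "$dir" ] || [ ! -O "$dir" ]; then
    echo "refusing to use $dir: not a directory owned by $(id -un)" >&2
    return 1
  fi
  chmod 700 -- "$dir"
}

# Print VOICE_SPEED (default 1.0) if it is a positive decimal number.
voice_speed() {
  local speed="${VOICE_SPEED:-1.0}"
  case "$speed" in
    ''|*[!0-9.]*|*.*.*|.*|*.)
      echo "invalid VOICE_SPEED: $speed" >&2
      return 1
      ;;
  esac
  if awk -v s="$speed" 'BEGIN { exit !(s > 0) }'; then
    echo "$speed"
  else
    echo "invalid VOICE_SPEED: $speed" >&2
    return 1
  fi
}
```

Usage: `ensure_private_dir /tmp/kittentts-walkie && speed="$(voice_speed)"`.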

Files

kittentts-whatsapp/
├── SKILL.md
└── scripts/
    ├── tts_walkie.sh      # TTS + ffmpeg conversion (applies VOICE_SPEED)
    └── transcribe.sh      # whisper transcription (optional)

⚠️ Privileged Install Warning

The dependency install commands use --break-system-packages and apt-get install -y. These require root privileges and modify system packages. Review before running if you are on a managed system.

Troubleshooting

Audio sends but is silent or rejected by WhatsApp: → Run ffprobe -v quiet -print_format json -show_streams /tmp/walkie_reply.ogg → Must show codec_name: opus and sample_rate: 48000 (or 16000). If not, the ffmpeg chain failed.

TTS generation is slow: → Switch to a smaller model (nano instead of mini) in scripts/tts_walkie.sh.

Hugging Face download rate limit: → Set HF_TOKEN in your environment. Free accounts get lower rate limits.
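The first troubleshooting check (Opus codec at 48000 or 16000 Hz) can be scripted. This is a hypothetical sketch: it asks ffprobe for only the two relevant fields in `key=value` form and has awk decide pass or fail, so the parsing can be tested without an audio file.

```bash
#!/usr/bin/env bash
# Hypothetical checker for the troubleshooting step (not shipped with the skill).

# Print codec_name=... and sample_rate=... for the first audio stream.
probe_voice_note() {
  ffprobe -v error -select_streams a:0 \
    -show_entries stream=codec_name,sample_rate \
    -of default=noprint_wrappers=1 "$1"
}

# Read key=value lines on stdin; succeed only for Opus at 48000 or 16000 Hz.
is_whatsapp_voice_note() {
  awk -F= '
    $1 == "codec_name"  { codec = $2 }
    $1 == "sample_rate" { rate = $2 + 0 }
    END { exit !(codec == "opus" && (rate == 48000 || rate == 16000)) }
  '
}
```

Usage: `probe_voice_note /tmp/walkie_reply.ogg | is_whatsapp_voice_note && echo "format OK"`.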

Security recommendations

This skill appears to do exactly what it says (generate WhatsApp-ready voice notes and optionally transcribe audio). Before installing, consider:
1) Do not run the provided apt-get / pip commands on a managed or production machine without approval; pip --break-system-packages can change system Python packages. Prefer a virtualenv, container, or dedicated machine.
2) The model download comes from huggingface.co (~25-80MB). Set HF_TOKEN only if you trust where you store the token; adding it to ~/.bashrc stores it in plaintext.
3) Verify ffmpeg and the Python dependencies yourself and inspect the two scripts (they are short and straightforward).
4) The registry metadata and SKILL.md metadata disagree about required binaries; treat SKILL.md as the authoritative source.
If you need lower risk, run this inside a disposable VM/container.
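The virtualenv option in point 1) can be sketched as follows. This is a hypothetical helper, not an install path the skill documents: the function names and the default location ~/.venvs/kittentts are illustrative. It assumes the scripts invoke whichever `python3` is first on PATH, in which case activating the environment (`source ~/.venvs/kittentts/bin/activate`) before running them would make them use it.

```bash
#!/usr/bin/env bash
# Hypothetical alternative to "pip3 install kittentts --break-system-packages":
# install into a per-user virtual environment so system Python is untouched.

# Create the venv and print the path of its interpreter.
create_kittentts_venv() {
  local venv_dir="${1:-$HOME/.venvs/kittentts}"
  python3 -m venv "$venv_dir" || return 1
  echo "$venv_dir/bin/python"
}

# Install kittentts with the venv's own pip (no root needed).
install_kittentts() {
  local py="$1"
  "$py" -m pip install kittentts
}
```

Usage: `py="$(create_kittentts_venv)" && install_kittentts "$py"`. ffmpeg still comes from the system package manager; only the Python side is isolated.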

Analysis

Type: OpenClaw Skill · Name: kittentts-whatsapp · Version: 1.0.4

The skill contains critical injection vulnerabilities in scripts/tts_walkie.sh and scripts/transcribe.sh, where user-controlled bash variables are directly embedded into Python heredocs (e.g., text = """$TEXT"""). This allows an attacker to execute arbitrary Python code by including triple quotes in the input. While the stated purpose of generating WhatsApp-compatible audio is plausible and no evidence of intentional data exfiltration or backdoors was found, the unsafe code construction and requirement for privileged system-wide installation pose a significant security risk.
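One common way to close this class of injection is to stop splicing input into Python source: quote the heredoc delimiter (`<<'PY'`) so bash performs no expansion inside it, and pass the text as a command-line argument that Python reads from `sys.argv`. The sketch below illustrates the pattern with a hypothetical `speak_text` function; the KittenTTS call is replaced by a placeholder `print`, since the exact calls in scripts/tts_walkie.sh are not reproduced here. The same approach applies to file paths passed into scripts/transcribe.sh.

```bash
#!/usr/bin/env bash
# Hypothetical illustration: pass untrusted text to embedded Python as data,
# never as source code.
speak_text() {
  local text="$1"
  # Quoted delimiter: bash leaves the heredoc body untouched, so the Python
  # program is fixed. "python3 -" reads the program from stdin; the text
  # arrives as sys.argv[1].
  python3 - "$text" <<'PY'
import sys

text = sys.argv[1]  # plain data: triple quotes inside it cannot end a literal
# Placeholder for the real TTS call in scripts/tts_walkie.sh (not shown here).
print(text)
PY
}
```

Input such as `x"""); import os; print("pwned"); ("""` is simply echoed back instead of being executed.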

Capability assessment

ℹ Purpose & Capability

The skill's name/description (KittenTTS → WhatsApp OGG) align with the included scripts and instructions (tts_walkie.sh uses KittenTTS and ffmpeg; transcribe.sh uses whisper + ffmpeg). Minor inconsistency: registry metadata listed 'Required env vars: none' and no required binaries, while SKILL.md metadata declares ffmpeg, network access to huggingface.co, and 'privileged: true'. This appears to be an authoring/metadata mismatch, not malicious behavior.

✓ Instruction Scope

Runtime instructions and the two scripts stick to audio generation/transcription and temporary file handling. Scripts create a private /tmp directory, write WAV/OGG files, call ffmpeg, whisper, and KittenTTS; they do not access unrelated system files or send data to external endpoints beyond downloading models from Hugging Face. Note: SKILL.md suggests adding HF_TOKEN to ~/.bashrc (writes a token into shell config) — this is a user-level change you should consider before applying.

ℹ Install Mechanism

There is no automated install spec; the docs ask you to run apt-get and pip3 install manually. That is expected for this use case but is intrusive: pip3 install kittentts --break-system-packages and apt-get install -y ffmpeg require root and can alter system Python packages on managed machines. Model downloads (~25–80MB) come from huggingface.co (a known host).

✓ Credentials

The skill does not require unrelated secrets. HF_TOKEN is optional and only suggested to reduce download rate limits; no other credentials or tokens are requested. The scripts do not read other environment variables beyond VOICE_SPEED (documented) and the optional HF_TOKEN.

✓ Persistence & Privilege

The skill is not marked always:true and does not modify other skills or system-wide agent settings. It requires privileged actions only for dependency installation (apt/pip), which is documented in the README; otherwise it runs as the invoking user and stores temporary files under a mode-700 directory.

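How to use

Make sure OpenClaw is installed (local or Docker deployment).
Enter the install command in the chat box: /install kittentts-whatsapp
Once installed, call the skill by name or trigger it with /kittentts-whatsapp
Provide the inputs described in the skill's parameter notes to get structured output.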

Version history

v1.0.4

v1.0.4: Fixed /tmp world-readable issue — audio now written to mode-700 /tmp/kittentts-walkie/. Fixed unused speed parameter bug — VOICE_SPEED now passed to tts.generate(). WAV intermediates cleaned up after conversion. Added Security Notes section.

v1.0.3

v1.0.3: Added openclaw.metadata block with requires.bins (ffmpeg), requires.packages (kittentts), requires.network (huggingface.co), requires.privileged (true), and requires.warning string — surfaces warnings at registry/install time not just in prose.

v1.0.2

Renamed skill from KittenTTS WhatsApp Walkie-Talkie to KittenTTS WhatsApp

v1.0.1

v1.0.1: Added full dependency table, network call disclosure (Hugging Face model download ~25-80MB), model size table, privileged install warnings, and troubleshooting section.

v1.0.0

Initial release: KittenTTS + ffmpeg chain for WhatsApp voice notes. Handles 24kHz WAV → 16kHz OGG Opus conversion that WhatsApp requires.

Metadata

Slug kittentts-whatsapp

Version 1.0.4

License MIT-0

Total installs 0

Current installs 0

Version count 5

FAQ