lakshibro

KittenTTS WhatsApp

Author: ReadY · GitHub · v1.0.4 · MIT-0
cross-platform ⚠ suspicious
126 total downloads
0 favorites
0 current installs
5 versions
Install in OpenClaw
/install kittentts-whatsapp
Description
Voice-to-voice mode for WhatsApp using KittenTTS + ffmpeg. Transcribe incoming audio with whisper, reply with a TTS voice note converted to WhatsApp-compatib...
Usage instructions (SKILL.md)

KittenTTS WhatsApp Voice

Generates WhatsApp-compatible voice notes from text using KittenTTS + ffmpeg. Specifically solves the format mismatch that causes silent failures: KittenTTS outputs 24kHz WAV → converted to 16kHz OGG Opus via ffmpeg → sent as WhatsApp voice note.
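The conversion step described above can be sketched in Python as follows. This is a minimal illustration, not the contents of scripts/tts_walkie.sh: the function names are hypothetical, and the mono (`-ac 1`) setting is an assumption about what a voice note needs rather than something the skill documents.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(wav_path: str, ogg_path: str, sample_rate: int = 16000) -> list[str]:
    """Build the ffmpeg argument list for WAV -> OGG Opus (hypothetical helper)."""
    return [
        "ffmpeg", "-y",            # overwrite the output without prompting
        "-i", wav_path,            # 24 kHz WAV produced by KittenTTS
        "-ar", str(sample_rate),   # resample to 16 kHz
        "-ac", "1",                # assumption: mono output
        "-c:a", "libopus",         # Opus audio; the .ogg extension selects the container
        ogg_path,
    ]


def convert(wav_path: str, ogg_path: str) -> Path:
    """Run ffmpeg; check=True raises instead of leaving a silent or missing file."""
    subprocess.run(build_ffmpeg_cmd(wav_path, ogg_path), check=True)
    return Path(ogg_path)
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.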

⚠️ Read before installing. This skill installs system packages and downloads large ML models. See Setup below.

System Dependencies

| Dependency | Install command | Size | Notes |
|---|---|---|---|
| ffmpeg | apt-get install -y ffmpeg | ~30MB | Available in most distro repos |
| kittentts | pip3 install kittentts --break-system-packages | pulls ~25-80MB from Hugging Face on first run | Python package |
| libopus | bundled with ffmpeg | | OGG encoding support |
| soundfile | pulled by kittentts | | Python package |

Network Calls

  • First run: downloads TTS model (~25-80MB) from huggingface.co/KittenML based on model size chosen
  • No API keys required — fully offline capable after model download
  • Set HF_TOKEN env var to avoid unauthenticated rate limits on model download

Model Options

| Model | Parameters | Size | Hugging Face ID |
|---|---|---|---|
| nano (int8) | 15M | 25MB | KittenML/kitten-tts-nano-0.8-int8 |
| nano | 15M | 56MB | KittenML/kitten-tts-nano-0.8-fp32 |
| micro | 40M | 41MB | KittenML/kitten-tts-micro-0.8 |
| mini | 80M | 80MB | KittenML/kitten-tts-mini-0.8 |

Default: kitten-tts-mini-0.8 (best quality). Change in scripts/tts_walkie.sh.
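As an illustration of choosing among these models, the table can be expressed as a small lookup. The dictionary keys and both function names are labels invented for this sketch; only the parameter counts, sizes, and Hugging Face IDs come from the table above.

```python
# Values copied from the Model Options table; the keys are hypothetical labels.
MODELS = {
    "nano-int8": {"params_m": 15, "size_mb": 25, "hf_id": "KittenML/kitten-tts-nano-0.8-int8"},
    "nano": {"params_m": 15, "size_mb": 56, "hf_id": "KittenML/kitten-tts-nano-0.8-fp32"},
    "micro": {"params_m": 40, "size_mb": 41, "hf_id": "KittenML/kitten-tts-micro-0.8"},
    "mini": {"params_m": 80, "size_mb": 80, "hf_id": "KittenML/kitten-tts-mini-0.8"},
}


def resolve_model(label: str = "mini") -> str:
    """Map a short label to its Hugging Face ID; 'mini' mirrors the documented default."""
    try:
        return MODELS[label]["hf_id"]
    except KeyError:
        raise ValueError(f"unknown model {label!r}; choose from {sorted(MODELS)}") from None


def largest_model_within(budget_mb: int) -> str:
    """Pick the label with the most parameters whose download fits the budget.

    Ties on parameter count go to the smaller download.
    """
    fitting = [
        (m["params_m"], -m["size_mb"], label)
        for label, m in MODELS.items()
        if m["size_mb"] <= budget_mb
    ]
    if not fitting:
        raise ValueError(f"no model fits within {budget_mb} MB")
    return max(fitting)[2]
```

For example, `largest_model_within(60)` selects `micro`, since `mini` exceeds the budget.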

Setup

Run these manually before the skill is used:

# 1. System package (requires root/privileged)
apt-get install -y ffmpeg

# 2. Python package
pip3 install kittentts --break-system-packages

# 3. Optional: set Hugging Face token to avoid rate limits
# echo 'export HF_TOKEN="hf_your_token_here"' >> ~/.bashrc

Restart OpenClaw after installing dependencies so the new packages are in PATH.

Usage

TTS only (no transcription)

bash scripts/tts_walkie.sh "Your message here" Bella
# Output: /tmp/walkie_reply.ogg (16kHz OGG Opus, WhatsApp-ready)

Transcription only (optional — requires whisper)

# Install whisper (one-time, ~140MB-1.4GB depending on model)
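# Note: the PyPI package for OpenAI Whisper is openai-whisper; the package named "whisper" is an unrelated project
pip3 install openai-whisper --break-system-packages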

bash scripts/transcribe.sh /path/to/audio.ogg [model]
# Model: tiny | base | small | medium | large (default: base)
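For reference, the same transcription can be sketched directly in Python. This assumes the `openai-whisper` package and its `load_model` / `transcribe` calls; it is not the contents of scripts/transcribe.sh, and the `transcribe` function name here is hypothetical.

```python
# Model names accepted by the script, per the usage comment above.
VALID_MODELS = ("tiny", "base", "small", "medium", "large")


def transcribe(audio_path: str, model_name: str = "base") -> str:
    """Return the transcript of an audio file (requires ffmpeg on PATH)."""
    if model_name not in VALID_MODELS:
        raise ValueError(f"model must be one of {VALID_MODELS}, got {model_name!r}")

    import whisper  # deferred import: assumes openai-whisper is installed

    model = whisper.load_model(model_name)   # downloads the weights on first use
    result = model.transcribe(audio_path)    # dict with a "text" key
    return result["text"].strip()
```

Validating the model name before the import keeps a typo from triggering a large download.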

Voices

Available: Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo

Default: Bella
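A minimal sketch of generating speech with one of these voices follows. The `KittenTTS(...)` constructor and `generate(text, voice=...)` call reflect the upstream kittentts README as I understand it and should be treated as assumptions; `normalize_voice` and `synthesize` are hypothetical names, not functions from the skill.

```python
from typing import Optional

VOICES = ("Bella", "Jasper", "Luna", "Bruno", "Rosie", "Hugo", "Kiki", "Leo")
DEFAULT_VOICE = "Bella"


def normalize_voice(name: Optional[str] = None) -> str:
    """Return the canonical voice name, matching case-insensitively."""
    if not name:
        return DEFAULT_VOICE
    for voice in VOICES:
        if voice.lower() == name.strip().lower():
            return voice
    raise ValueError(f"unknown voice {name!r}; available: {', '.join(VOICES)}")


def synthesize(text: str, wav_path: str, voice: Optional[str] = None,
               model_id: str = "KittenML/kitten-tts-mini-0.8") -> str:
    """Write a 24 kHz WAV file for the given text (API usage is an assumption)."""
    from kittentts import KittenTTS  # deferred: downloads the model on first run
    import soundfile as sf

    tts = KittenTTS(model_id)
    audio = tts.generate(text, voice=normalize_voice(voice))
    sf.write(wav_path, audio, 24000)  # 24 kHz, matching the stated KittenTTS output rate
    return wav_path
```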

Security Notes

  • Audio files are written to a private /tmp/kittentts-walkie/ directory (mode 700) — only the running user can read them.
  • WAV intermediates are cleaned up immediately after conversion; only the OGG is kept for sending.
  • Set VOICE_SPEED env var to adjust speech rate (default: 1.0).
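The behaviors listed above can be sketched as follows on a POSIX system. The helper names are hypothetical, and falling back to 1.0 on an unparsable or non-positive VOICE_SPEED is an assumption; the skill's actual handling is not shown in this document.

```python
import os
import stat
import tempfile


def private_dir(path: str) -> str:
    """Create a directory readable only by the current user (mode 700)."""
    os.makedirs(path, mode=0o700, exist_ok=True)
    os.chmod(path, 0o700)  # enforce the mode even if the directory already existed
    return path


def read_speed(env=os.environ, default: float = 1.0) -> float:
    """Parse VOICE_SPEED; fall back to the default on bad input (assumption)."""
    raw = env.get("VOICE_SPEED")
    if raw is None:
        return default
    try:
        speed = float(raw)
    except ValueError:
        return default
    return speed if speed > 0 else default


def cleanup_wav(wav_path: str) -> None:
    """Remove the intermediate WAV; a missing file is not an error."""
    try:
        os.remove(wav_path)
    except FileNotFoundError:
        pass
```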

Files

kittentts-whatsapp/
├── SKILL.md
└── scripts/
    ├── tts_walkie.sh      # TTS + ffmpeg conversion (speed is now used)
    └── transcribe.sh       # whisper transcription (optional)

⚠️ Privileged Install Warning

The dependency install commands use --break-system-packages and apt-get install -y. These require root privileges and modify system packages. Review before running if you are on a managed system.

Troubleshooting

Audio sends but is silent or rejected by WhatsApp: → Run ffprobe -v quiet -print_format json -show_streams /tmp/walkie_reply.ogg → Must show codec_name: opus and sample_rate: 48000 (or 16000). If not, the ffmpeg chain failed.
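The ffprobe check above can be automated by parsing its JSON output. This is a sketch under the assumption that the file should contain exactly one audio stream; `is_whatsapp_ready` is a hypothetical helper name. Note that ffprobe reports `sample_rate` as a string.

```python
import json


def is_whatsapp_ready(ffprobe_json: str) -> bool:
    """Check the output of: ffprobe -v quiet -print_format json -show_streams FILE"""
    streams = json.loads(ffprobe_json).get("streams", [])
    audio = [s for s in streams if s.get("codec_type") == "audio"]
    if len(audio) != 1:  # assumption: a voice note has a single audio stream
        return False
    stream = audio[0]
    return (
        stream.get("codec_name") == "opus"
        and stream.get("sample_rate") in ("48000", "16000")
    )
```

To use it, capture the ffprobe output with `subprocess.run(..., capture_output=True, text=True)` and pass `stdout` to the function.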

TTS generation is slow: → Switch to a smaller model (nano instead of mini) in scripts/tts_walkie.sh.

Hugging Face download rate limit: → Set HF_TOKEN in your environment. Free accounts get lower rate limits.

Safe usage recommendations
This skill appears to do exactly what it says (generate WhatsApp-ready voice notes and optionally transcribe audio). Before installing, consider:
1) Do not run the provided apt-get / pip commands on a managed or production machine without approval: pip --break-system-packages can change system Python packages. Prefer using a virtualenv, container, or dedicated machine.
2) The model download comes from huggingface.co (~25–80MB); set HF_TOKEN only if you trust where you store the token (adding it to ~/.bashrc stores it in plaintext).
3) Verify ffmpeg and Python dependencies yourself and inspect the two scripts (they are short and straightforward).
4) The registry metadata and SKILL.md metadata disagree about required binaries; treat SKILL.md as the authoritative source.
If you need lower risk, run this inside a disposable VM/container.
Functional analysis
Type: OpenClaw Skill
Name: kittentts-whatsapp
Version: 1.0.4
The skill contains critical injection vulnerabilities in scripts/tts_walkie.sh and scripts/transcribe.sh, where user-controlled bash variables are directly embedded into Python heredocs (e.g., text = """$TEXT"""). This allows an attacker to execute arbitrary Python code by including triple quotes in the input. While the stated purpose of generating WhatsApp-compatible audio is plausible and no evidence of intentional data exfiltration or backdoors was found, the unsafe code construction and requirement for privileged system-wide installation pose a significant security risk.
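To make the injection concrete, the sketch below reproduces the templating pattern in Python and contrasts it with passing the text as a process argument. It only parses the generated source and never executes it. `unsafe_source` imitates the heredoc interpolation and is not the skill's actual code; the argv-based alternative is one possible fix, not a patch the author has published.

```python
import ast
import subprocess
import sys


def unsafe_source(text: str) -> str:
    """Imitate the heredoc pattern: user text pasted inside triple quotes."""
    return f'text = """{text}"""\n'


def contains_injected_import(source: str) -> bool:
    """Parse (never run) the source and report whether it gained an import statement."""
    return any(isinstance(node, ast.Import) for node in ast.parse(source).body)


# Child program that treats its argument purely as data.
CHILD = "import sys; sys.stdout.write(sys.argv[1])"


def echo_via_argv(text: str) -> str:
    """Safer alternative: pass the text as argv, so it is never parsed as code."""
    done = subprocess.run(
        [sys.executable, "-c", CHILD, text],
        check=True, capture_output=True, text=True,
    )
    return done.stdout


hostile = 'hi"""\nimport os\n"""'
print(contains_injected_import(unsafe_source(hostile)))  # the closing quotes escaped the string
print(echo_via_argv(hostile) == hostile)                 # argv round-trips the text unchanged
```

In a bash wrapper the equivalent change is to pass `"$TEXT"` as an argument or environment variable and read it with `sys.argv` or `os.environ` inside a quoted heredoc, instead of interpolating it into the Python source.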
Capability assessment
Purpose & Capability
The skill's name/description (KittenTTS → WhatsApp OGG) align with the included scripts and instructions (tts_walkie.sh uses KittenTTS and ffmpeg; transcribe.sh uses whisper + ffmpeg). Minor inconsistency: registry metadata listed 'Required env vars: none' and no required binaries, while SKILL.md metadata declares ffmpeg, network access to huggingface.co, and 'privileged: true'. This appears to be an authoring/metadata mismatch, not malicious behavior.
Instruction Scope
Runtime instructions and the two scripts stick to audio generation/transcription and temporary file handling. Scripts create a private /tmp directory, write WAV/OGG files, call ffmpeg, whisper, and KittenTTS; they do not access unrelated system files or send data to external endpoints beyond downloading models from Hugging Face. Note: SKILL.md suggests adding HF_TOKEN to ~/.bashrc (writes a token into shell config) — this is a user-level change you should consider before applying.
Install Mechanism
There is no automated install spec; the docs ask you to run apt-get and pip3 install manually. That is expected for this use case but is intrusive: pip3 install kittentts --break-system-packages and apt-get install -y ffmpeg require root and can alter system Python packages on managed machines. Model downloads (~25–80MB) come from huggingface.co (a known host).
Credentials
The skill does not require unrelated secrets. HF_TOKEN is optional and only suggested to reduce download rate limits; no other credentials or tokens are requested. The scripts do not read other environment variables beyond VOICE_SPEED (documented) and the optional HF_TOKEN.
Persistence & Privilege
The skill is not marked always:true and does not modify other skills or system-wide agent settings. It requires privileged actions only for dependency installation (apt/pip), which is documented in the README; otherwise it runs as the invoking user and stores temporary files under a mode-700 directory.
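How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install kittentts-whatsapp
  3. Once installed, invoke the Skill by name or trigger it with /kittentts-whatsapp
  4. Provide the inputs described in the Skill's parameter notes to get structured output.
Version history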
v1.0.4
v1.0.4: Fixed /tmp world-readable issue — audio now written to mode-700 /tmp/kittentts-walkie/. Fixed unused speed parameter bug — VOICE_SPEED now passed to tts.generate(). WAV intermediates cleaned up after conversion. Added Security Notes section.
v1.0.3
v1.0.3: Added openclaw.metadata block with requires.bins (ffmpeg), requires.packages (kittentts), requires.network (huggingface.co), requires.privileged (true), and requires.warning string — surfaces warnings at registry/install time not just in prose.
v1.0.2
Renamed skill from KittenTTS WhatsApp Walkie-Talkie to KittenTTS WhatsApp
v1.0.1
v1.0.1: Added full dependency table, network call disclosure (Hugging Face model download ~25-80MB), model size table, privileged install warnings, and troubleshooting section.
v1.0.0
Initial release: KittenTTS + ffmpeg chain for WhatsApp voice notes. Handles 24kHz WAV → 16kHz OGG Opus conversion that WhatsApp requires.
Metadata
Slug kittentts-whatsapp
Version 1.0.4
License MIT-0
Total installs 0
Current installs 0
Historical versions 5
FAQ

What is KittenTTS WhatsApp?

Voice-to-voice mode for WhatsApp using KittenTTS + ffmpeg. Transcribe incoming audio with whisper, reply with a TTS voice note converted to WhatsApp-compatib... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 126 total downloads to date.

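How do I install KittenTTS WhatsApp?

Run the command "/install kittentts-whatsapp" in the OpenClaw or Claude Code chat box to install the skill in one step. Note that the skill's SKILL.md also asks you to install ffmpeg and kittentts manually before first use.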

Is KittenTTS WhatsApp free?

Yes. KittenTTS WhatsApp is completely free and released under the MIT-0 license; you may download, install, and use it freely.

Which platforms does KittenTTS WhatsApp support?

KittenTTS WhatsApp is listed as cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed KittenTTS WhatsApp?

It is developed and maintained by ReadY (@lakshibro); the current version is v1.0.4.
