zviratko

Text-to-speech using the default macOS "say" command. No need for third-party APIs or models. Supports many languages. Also, Trinoids!

by zviratko · GitHub ↗ · v0.0.2 · MIT-0
Tagged cross-platform, but requires macOS (`say`) · ⚠ suspicious
76 downloads · 0 stars · 0 active installs · 1 version
Install in OpenClaw
/install macos-say
Description
Local text-to-speech using macOS `say` + ffmpeg for Telegram/Matrix voice messages
README (SKILL.md)

Say + FFmpeg TTS Pipeline

Use say (macOS native TTS) + ffmpeg to generate Opus voice messages for Telegram/Matrix.

Why not just say?

  • Telegram/Matrix voice messages use the Opus codec natively
  • say outputs AIFF/m4a; convert to .ogg (Opus) before sending
  • Telegram also accepts OGG/MP3/M4A as voice, but Opus OGG is the native format

Workflow

say -v "<voice>" -o <tmpdir>/<name>.aiff "<text>"
ffmpeg -i <tmpdir>/<name>.aiff -acodec libopus <tmpdir>/<name>.ogg -y
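To keep user-supplied text and voice names away from the shell, the two commands above can be wrapped in a function that quotes every argument and hands the text to `say` on standard input (`-f -`), so it is never parsed as shell syntax or mistaken for a `say` option. This is a minimal sketch; `tts_to_ogg` is a hypothetical helper name, not part of the skill.

```shell
# Hypothetical helper: synthesize TEXT with VOICE and convert it to Opus/Ogg.
# Usage: tts_to_ogg VOICE TEXT BASE   (BASE = output path without extension)
# Prints the path of the resulting .ogg file on success.
tts_to_ogg() {
  local voice="$1" text="$2" base="$3"
  local aiff="${base}.aiff" ogg="${base}.ogg"

  # The text is fed on stdin (-f -), so it is never parsed by the shell
  # and cannot be mistaken for a say option even if it starts with "-".
  printf '%s' "$text" | say -v "$voice" -o "$aiff" -f - || return 1

  # Re-encode to Opus in an Ogg container; -y overwrites an existing file.
  ffmpeg -y -hide_banner -loglevel error -i "$aiff" -acodec libopus "$ogg" || return 1

  rm -f "$aiff"   # the intermediate AIFF is no longer needed
  printf '%s\n' "$ogg"
}
```

Usage: OGG=$(tts_to_ogg "Trinoids" "$TEXT" "$DIR/hello"), then pass "$OGG" as "media" in the message tool call. Deleting the intermediate AIFF is only a choice to keep the workspace directory small.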

Send with the message tool:

{
  "action": "send",
  "channel": "telegram",
  "media": "<tmpdir>/<name>.ogg",
  "asVoice": true,
  "target": "<chat_id>"
}

Recommended workspace directory

~/.openclaw/workspace/tmp/audio/

(Whitelist this path in exec permissions for faster approval)
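If several replies are generated at once, fixed names such as hello.aiff can collide. One option, sketched here under that assumption, is to let mktemp reserve a unique base name inside the recommended directory; `new_audio_base` is a hypothetical helper.

```shell
# Hypothetical helper: reserve a unique base name for one voice reply.
# Defaults to the recommended workspace directory from this skill.
new_audio_base() {
  local dir="${1:-$HOME/.openclaw/workspace/tmp/audio}"
  mkdir -p "$dir" || return 1
  # mktemp creates an empty placeholder file, which reserves the name;
  # append .aiff / .ogg to the printed path for the real audio files.
  mktemp "$dir/reply.XXXXXX"
}
```

The placeholder file stays behind, so remove it together with the audio files once the message has been sent.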

Voice selection

Use say -v '?' to list available voices. Notable ones:

  • Trinoids — robotic/electronic voice (popular for bots)
  • Samantha — warm US female voice
  • Alex — US male voice
  • Fred — neutral US male voice
  • Karen — Australian female voice

Note: pass just the voice name (e.g. "Trinoids"), not the en_US locale code that follows it in the listing.
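Because the voice name ends up on a command line, one way to guard it is to accept only names that actually appear in the `say -v '?'` listing. The sketch below assumes each listing line has the form `Name   locale   # sample sentence`, strips the locale column and everything after it, and compares whole lines; the helper names are hypothetical.

```shell
# Hypothetical helpers for validating a voice name against `say -v '?'`.
# Assumed line format: "Name   locale   # sample sentence".

# Read the listing on stdin and print one voice name per line
# by stripping the locale column and the comment after it.
voice_names() {
  sed -E 's/[[:space:]]+[a-z]{2,3}[-_][A-Za-z0-9]+[[:space:]]+#.*$//'
}

# Succeed only if $1 matches a listed voice name exactly (whole line, no regex).
is_known_voice() {
  voice_names | grep -Fxq -- "$1"
}
```

Usage: say -v '?' | is_known_voice "$VOICE" || echo "unknown voice: $VOICE" >&2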

Example: send a hello voice message

VOICE="Trinoids"
TEXT="Hello!"
DIR="$HOME/.openclaw/workspace/tmp/audio"
mkdir -p "$DIR"

say -v "$VOICE" -o "$DIR/hello.aiff" "$TEXT"
ffmpeg -i "$DIR/hello.aiff" -acodec libopus "$DIR/hello.ogg" -y

# Then send via message tool with asVoice: true

Format notes

  • Input to ffmpeg: AIFF (.aiff) works reliably; avoid .m4a with say
  • Output: Opus in an Ogg container (libopus codec)
  • Telegram sendVoice accepts OGG, MP3 and M4A, but Opus in OGG is the native voice-message format
  • Sample rate: say outputs 24kHz AIFF; ffmpeg re-encodes to Opus at 24kHz
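To confirm that a file really is Opus audio in an Ogg container before sending it with asVoice: true, ffprobe (shipped with ffmpeg) can report the codec and container names. A sketch; `is_opus_ogg` is a hypothetical helper.

```shell
# Hypothetical helper: succeed only if FILE is Opus audio in an Ogg container.
is_opus_ogg() {
  local f="$1" codec fmt
  # Codec of the first audio stream, printed without section wrappers or keys
  codec=$(ffprobe -v error -select_streams a:0 -show_entries stream=codec_name \
            -of default=nw=1:nk=1 "$f") || return 1
  # Short name of the container format as detected by ffprobe
  fmt=$(ffprobe -v error -show_entries format=format_name \
          -of default=nw=1:nk=1 "$f") || return 1
  [ "$codec" = "opus" ] && [ "$fmt" = "ogg" ]
}
```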

Integration with OpenClaw TTS

OpenClaw's built-in messages.tts supports only ElevenLabs, Microsoft Edge, MiniMax and OpenAI.

This say+ffmpeg pipeline is a workaround for local-only TTS without API keys or cloud services. It's not auto-triggered by OpenClaw — call it manually via exec + message tool.

Language Detection → Voice Mapping

When responding to a voice message, detect the language from the STT output (Parakeet auto-detects). Then pick the matching say voice using i18n locale codes.

Finding voices by language:

say -v '?' 2>&1 | grep -E "cs_CZ|en_US|de_DE|fr_FR|it_IT|es_ES"

Language → voice selection priority:

  1. Use <voice> (Premium) if available
  2. Fall back to <voice> (Enhanced) if available
  3. Fall back to base <voice> name
  4. Never use a voice that doesn't match the language

Language      i18n code  Preferred voice
Czech         cs_CZ      Zuzana (Premium)
English (US)  en_US      Trinoids (no Premium/Enhanced available)
German        de_DE      Grandma (Premium), if available
French        fr_FR      Grandma (Premium), if available
Spanish       es_ES      Grandma (Premium), if available
Italian       it_IT      Grandma (Premium), if available

Key: Always use just the voice name (e.g. "Trinoids", "Zuzana"), not the full locale suffix. The locale suffix in say -v '?' output is for grepping/identification only.
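The table can be turned into a lookup from the detected language to the locale code used for grepping `say -v '?'`. This sketch assumes the STT step reports an ISO 639-1 code such as `cs` or a tag such as `de-DE`; adjust the patterns if it uses another format. `lang_to_locale` is a hypothetical helper, and unknown languages return an error, in line with rule 4 of the priority list (never use a voice that doesn't match the language).

```shell
# Hypothetical helper: map a detected language code to the locale in the table.
lang_to_locale() {
  local code
  # Normalize case, then keep only the language part: "de-DE" / "de_DE" -> "de"
  code=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  code="${code%%[-_]*}"
  case "$code" in
    cs) echo cs_CZ ;;
    en) echo en_US ;;
    de) echo de_DE ;;
    fr) echo fr_FR ;;
    es) echo es_ES ;;
    it) echo it_IT ;;
    *)  return 1 ;;   # unknown language: no voice rather than a mismatched one
  esac
}
```

Usage: LOCALE=$(lang_to_locale "$DETECTED") || exit 1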

Example workflow:

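LOCALE="cs_CZ"   # don't call this LANG: that would overwrite the shell's locale setting
# Find the best installed voice for this locale (Premium > Enhanced > base),
# keeping the full display name, e.g. "Zuzana (Premium)", but not the locale column
LIST=$(say -v '?' | grep -E "[[:space:]]${LOCALE}[[:space:]]")
VOICE=$( { printf '%s\n' "$LIST" | grep -F '(Premium)'; printf '%s\n' "$LIST" | grep -F '(Enhanced)'; printf '%s\n' "$LIST"; } | head -n 1 | sed -E "s/[[:space:]]+${LOCALE}[[:space:]].*//")
say -v "$VOICE" -o reply.aiff "Česká odpověď"
ffmpeg -i reply.aiff -acodec libopus reply.ogg -y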

TODOs

  • Detect language from STT transcription and auto-select appropriate say voice
  • Explore integrating into OpenClaw via custom TTS provider plugin
  • Investigate if OpenClaw supports post-processing TTS output via a hook
  • Test Matrix channel voice message format compatibility
Usage Guidance
What to consider before installing:
  1. This only works where `say` exists (macOS); install `ffmpeg` and test the `say` → `ffmpeg` pipeline locally first.
  2. Approve exec permissions only for a narrow workspace path (e.g., ~/.openclaw/workspace/tmp/audio).
  3. Ensure the runtime escapes user-provided TEXT/VOICE variables (avoid passing raw, unsanitized strings to a shell) to prevent command injection.
  4. Confirm how the 'message' tool is authorized to send media, and avoid sending sensitive audio unintentionally.
  5. On non-macOS hosts the skill will fail: the metadata omits an OS restriction, but the required binary `say` is macOS-only.
Capability Analysis
Type: OpenClaw Skill · Name: macos-say · Version: 0.0.2
The skill provides instructions for an AI agent to execute shell commands using `say` and `ffmpeg` for local text-to-speech. While the functionality is aligned with its stated purpose, the use of shell execution and filesystem access (specifically suggesting a workspace directory in `SKILL.md`) constitutes a high-risk capability. Furthermore, the provided bash examples are vulnerable to command injection if the agent processes unsanitized user input within the `$TEXT` or `$VOICE` variables.
Capability Assessment
Purpose & Capability
The name/description match the actual instructions: generating AIFF with macOS `say` then converting to Opus with `ffmpeg` is exactly what's needed for Telegram/Matrix voice messages. One minor inconsistency: skill metadata lists no OS restriction even though `say` is macOS-specific; the declared required binaries (`say`, `ffmpeg`) properly reflect the true platform dependency.
Instruction Scope
SKILL.md stays on‑topic (create AIFF with `say`, transcode with `ffmpeg`, then send with the message tool). No unrelated files, credentials, or external endpoints are referenced. Caution: the examples use shell interpolation (VOICE/TEXT variables). If untrusted input is passed into shell commands without proper escaping, there is a risk of shell/command injection — ensure the agent or runtime invokes `say`/`ffmpeg` with safely escaped arguments or argument lists rather than raw shell interpolation.
Install Mechanism
No install spec (instruction-only) — minimal risk because nothing is downloaded or written by the skill itself. The runtime relies on system-installed `say` and `ffmpeg`.
Credentials
No environment variables, credentials, or config paths are requested. The skill does suggest a workspace path (~/.openclaw/workspace/tmp/audio) for temporary files; this is reasonable but should be whitelisted only if you accept exec access for that limited path.
Persistence & Privilege
`always` is false and the skill is user-invocable; it does not request permanent or elevated presence. The only privilege-related suggestion is to whitelist a dedicated workspace path for exec permissions to speed approvals; keep permissions scoped to that path.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install macos-say
  3. After installation, invoke the skill by name or use /macos-say
  4. Ask the agent for a voice reply; it follows the `say` + `ffmpeg` + message-tool workflow described in SKILL.md
Version History
v0.0.2
  • Add detailed documentation for using macOS `say` with ffmpeg to generate Opus/OGG voice messages for Telegram/Matrix.
  • Explain why converting to Opus format is required and outline the recommended audio workflow.
  • Provide usage examples, tips for selecting voices and mapping languages, and workspace directory suggestions.
  • Discuss current OpenClaw integration limitations and propose potential enhancements.
  • Include TODOs for future language detection, integration, and compatibility testing.
Metadata
Slug macos-say
Version 0.0.2
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is macos-say?

Local text-to-speech using macOS `say` + ffmpeg for Telegram/Matrix voice messages. It is an AI agent skill for Claude Code / OpenClaw, with 76 downloads so far.

How do I install macos-say?

Run "/install macos-say" in the OpenClaw or Claude Code chat. The host also needs `say` (macOS only) and `ffmpeg`.

Is macos-say free?

Yes. It is licensed under MIT-0, so you can download, install and use it at no cost.

Which platforms does macos-say support?

Although the listing tags it as cross-platform, it only works on macOS, because the `say` binary it relies on ships only with macOS.

Who created macos-say?

It is built and maintained by zviratko (@zviratko); the current version is v0.0.2.
