Text to speech using the default macOS "say" command. No need for 3rd party APIs or models. Supports many languages. Also, Trinoids!

Author: zviratko

by zviratko · v0.0.2 · MIT-0

cross-platform ⚠ suspicious

Install in OpenClaw

/install macos-say

Description

Local text-to-speech using macOS `say` + ffmpeg for Telegram/Matrix voice messages

README (SKILL.md)

Say + FFmpeg TTS Pipeline

Use say (macOS native TTS) + ffmpeg to generate Opus voice messages for Telegram/Matrix.

Why not just `say`?

Telegram/Matrix voice messages expect Opus audio in an OGG container
say outputs AIFF (or m4a), so its output must be converted to .ogg (Opus) before sending
Telegram also accepts MP3 and M4A as voice, but Opus OGG is the native format

Workflow

```shell
say -v "<voice>" -o <tmpdir>/<name>.aiff "<text>"
ffmpeg -i <tmpdir>/<name>.aiff -acodec libopus <tmpdir>/<name>.ogg -y
```
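These two commands splice `<text>` directly into a shell command line, which is where the command-injection risk noted under Usage Guidance comes from. Below is a minimal sketch of a safer wrapper. It assumes a shell with `local` (bash, zsh, dash) and relies on `say -f -` reading the text from standard input, as described in say(1). The function name `tts_to_ogg` is hypothetical and not part of the skill.

```shell
# Hypothetical helper (not part of the skill): speak TEXT with VOICE via macOS
# `say`, then transcode the AIFF to Opus/Ogg with ffmpeg.
# Usage: tts_to_ogg VOICE TEXT OUT_PREFIX  (writes OUT_PREFIX.aiff and OUT_PREFIX.ogg)
tts_to_ogg() {
  local voice="$1" text="$2" out="$3"
  # Fail early, with a clear message, on hosts that lack the required binaries.
  command -v say >/dev/null 2>&1 || { echo "tts_to_ogg: 'say' not found (macOS only)" >&2; return 127; }
  command -v ffmpeg >/dev/null 2>&1 || { echo "tts_to_ogg: 'ffmpeg' not found" >&2; return 127; }
  # The text is passed on stdin (-f -). It is never re-parsed by a shell, and a
  # leading "-" cannot be mistaken for a `say` option. All variables are quoted.
  printf '%s' "$text" | say -v "$voice" -o "$out.aiff" -f - || return
  # -y overwrites an existing output file; libopus is ffmpeg's Opus encoder.
  ffmpeg -hide_banner -loglevel error -y -i "$out.aiff" -c:a libopus "$out.ogg"
}
```

For example, `tts_to_ogg "Trinoids" "$TEXT" "$HOME/.openclaw/workspace/tmp/audio/hello"` writes hello.aiff and hello.ogg. The quoting inside the function does not help if the agent builds the exec command by pasting raw text into a string. The arguments still have to reach the shell as properly escaped values.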

Send the .ogg with the message tool:

```json
{
  "action": "send",
  "channel": "telegram",
  "media": "<tmpdir>/<name>.ogg",
  "asVoice": true,
  "target": "<chat_id>"
}
```

Recommended workspace directory

~/.openclaw/workspace/tmp/audio/

(Whitelist this path in exec permissions for faster approval)

Voice selection

Use `say -v '?'` to list the installed voices. Notable ones:

Trinoids — robotic/electronic voice (popular for bots)
Samantha — warm US female voice
Alex — US male voice
Fred — neutral US male voice
Karen — Australian female voice

Note: pass just the voice name (e.g. "Trinoids"), not the locale code (such as en_US) printed next to it in the `say -v '?'` listing.
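Because `say` expects the exact name from the listing, it can help to confirm that a voice is installed before using it. The sketch below assumes the usual `say -v '?'` line layout: the name, then the locale code, then `# sample sentence`. `has_voice` is a hypothetical helper. It reads the listing on stdin, so it can also be tested against saved output.

```shell
# Hypothetical helper: succeed if NAME exactly matches an installed voice.
# Reads the voice listing on stdin, e.g. from `say -v '?'`.
has_voice() {
  local want="$1"
  # A line looks like "Name (Qualifier)   xx_YY    # sample sentence".
  # Strip the locale column and the sample, then compare whole names exactly.
  sed -E 's/[[:space:]]+[^[:space:]]+[[:space:]]+#.*$//' | grep -qxF -- "$want"
}
```

Usage: `say -v '?' | has_voice "Zuzana (Premium)" && VOICE="Zuzana (Premium)"`.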

Example: send a hello voice message

```shell
VOICE="Trinoids"
TEXT="Hello!"
DIR="$HOME/.openclaw/workspace/tmp/audio"
mkdir -p "$DIR"

say -v "$VOICE" -o "$DIR/hello.aiff" "$TEXT"
ffmpeg -i "$DIR/hello.aiff" -acodec libopus "$DIR/hello.ogg" -y

# Then send via message tool with asVoice: true
```

Format notes

Input to ffmpeg: AIFF (.aiff) works reliably; avoid .m4a with say
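Output: Opus in an Ogg container (libopus codec), the native format for Telegram voice messages
Telegram sendVoice also accepts MP3 and M4A, but Opus OGG is the native format
Sample rate: say outputs 24kHz AIFF (the rate can differ between voices). libopus only encodes at 8, 12, 16, 24 or 48 kHz, and ffmpeg resamples automatically when the input rate differs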
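Instead of assuming the sample rate, you can inspect both files with ffprobe, which ships with ffmpeg. `probe_audio` is a hypothetical name. The ffprobe options used are standard ones.

```shell
# Hypothetical helper: print the codec, sample rate and channel count of the
# first audio stream. Use it to check what `say` produced and what ffmpeg wrote.
probe_audio() {
  ffprobe -v error -select_streams a:0 \
    -show_entries stream=codec_name,sample_rate,channels \
    -of default=noprint_wrappers=1 "$1"
}
```

For example, run `probe_audio "$DIR/hello.aiff"` and then `probe_audio "$DIR/hello.ogg"`. The second call should show `codec_name=opus`. ffprobe may report Ogg Opus streams at 48000 Hz whatever the input rate was.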

Integration with OpenClaw TTS

OpenClaw's built-in `messages.tts` supports only these providers: ElevenLabs, Microsoft Edge, MiniMax, OpenAI.

This say+ffmpeg pipeline is a workaround for local-only TTS without API keys or cloud services. It's not auto-triggered by OpenClaw — call it manually via exec + message tool.

Language Detection → Voice Mapping

When responding to a voice message, detect the language from the STT output (Parakeet auto-detects). Then pick the matching say voice using i18n locale codes.

Finding voices by language:

```shell
say -v '?' 2>&1 | grep -E "cs_CZ|en_US|de_DE|fr_FR|it_IT|es_ES"
```

Language → voice selection priority:

Use `<voice> (Premium)` if available
Fall back to `<voice> (Enhanced)` if available
Fall back to the base `<voice>` name
Never use a voice that doesn't match the language

| Language | i18n code | Preferred voice |
|---|---|---|
| Czech | `cs_CZ` | `Zuzana (Premium)` |
| English (US) | `en_US` | `Trinoids` (no Premium/Enhanced available) |
| German | `de_DE` | `Grandma (Premium)` if available |
| French | `fr_FR` | `Grandma (Premium)` if available |
| Spanish | `es_ES` | `Grandma (Premium)` if available |
| Italian | `it_IT` | `Grandma (Premium)` if available |
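The first step of this mapping, from the detected language to a locale code, can be sketched as a lookup. This assumes the STT step reports a two-letter ISO 639-1 code such as `cs`; the source does not say what Parakeet actually emits. `lang_to_locale` is a hypothetical helper that mirrors the table rows.

```shell
# Hypothetical helper: map a two-letter language code (assumed STT output
# format) to the locale code used in the table and in `say -v '?'` listings.
lang_to_locale() {
  case "$1" in
    cs) echo cs_CZ ;;
    en) echo en_US ;;
    de) echo de_DE ;;
    fr) echo fr_FR ;;
    es) echo es_ES ;;
    it) echo it_IT ;;
    *)  return 1 ;;   # unknown language: the caller must not guess a voice
  esac
}
```

Usage: `LOCALE=$(lang_to_locale "$DETECTED_LANG") || exit 1`, where `DETECTED_LANG` is a hypothetical variable holding the STT result. Failing on unknown codes follows the rule of never using a voice that does not match the language.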

Key: Always pass just the voice name as listed (e.g. "Trinoids", "Zuzana", "Zuzana (Premium)"), without the locale code. The locale column in `say -v '?'` output is for grepping/identification only.

Example workflow:

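```shell
LOCALE="cs_CZ"   # not LANG: that is the system locale variable
# Installed voice names for this locale (locale column and sample sentence stripped)
NAMES=$(say -v '?' | sed -nE "s/^(.*[^[:space:]])[[:space:]]+$LOCALE[[:space:]]+#.*$/\1/p")
# Best available voice: Premium > Enhanced > base
VOICE=$(printf '%s\n' "$NAMES" | grep -m1 -F '(Premium)') ||
  VOICE=$(printf '%s\n' "$NAMES" | grep -m1 -F '(Enhanced)') ||
  VOICE=$(printf '%s\n' "$NAMES" | head -n1)
[ -n "$VOICE" ] || { echo "no $LOCALE voice installed" >&2; exit 1; }
say -v "$VOICE" -o reply.aiff "Česká odpověď"
ffmpeg -i reply.aiff -acodec libopus reply.ogg -y
```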

TODOs

Detect language from STT transcription and auto-select appropriate say voice
Explore integrating into OpenClaw via custom TTS provider plugin
Investigate if OpenClaw supports post-processing TTS output via a hook
Test Matrix channel voice message format compatibility

Usage Guidance

What to consider before installing:

1) This only works where `say` exists (macOS); install `ffmpeg` and test the `say` → `ffmpeg` pipeline locally first.
2) Approve exec permissions only for a narrow workspace path (e.g., ~/.openclaw/workspace/tmp/audio).
3) Ensure the runtime escapes user-provided TEXT/VOICE variables (never pass raw, unsanitized strings to a shell) to prevent command injection.
4) Confirm how the 'message' tool is authorized to send media, and avoid sending sensitive audio unintentionally.
5) On non-macOS hosts the skill will fail: the metadata omits an OS restriction, but the required binary `say` is macOS-only.

Capability Analysis

Type: OpenClaw Skill · Name: macos-say · Version: 0.0.2

The skill provides instructions for an AI agent to execute shell commands using `say` and `ffmpeg` for local text-to-speech. While the functionality is aligned with its stated purpose, the use of shell execution and filesystem access (specifically suggesting a workspace directory in `SKILL.md`) constitutes a high-risk capability. Furthermore, the provided bash examples are vulnerable to command injection if the agent processes unsanitized user input within the `$TEXT` or `$VOICE` variables.

Capability Assessment

ℹ Purpose & Capability

The name/description match the actual instructions: generating AIFF with macOS `say` then converting to Opus with `ffmpeg` is exactly what's needed for Telegram/Matrix voice messages. One minor inconsistency: skill metadata lists no OS restriction even though `say` is macOS-specific; the declared required binaries (`say`, `ffmpeg`) properly reflect the true platform dependency.

ℹ Instruction Scope

SKILL.md stays on‑topic (create AIFF with `say`, transcode with `ffmpeg`, then send with the message tool). No unrelated files, credentials, or external endpoints are referenced. Caution: the examples use shell interpolation (VOICE/TEXT variables). If untrusted input is passed into shell commands without proper escaping, there is a risk of shell/command injection — ensure the agent or runtime invokes `say`/`ffmpeg` with safely escaped arguments or argument lists rather than raw shell interpolation.

✓ Install Mechanism

No install spec (instruction-only) — minimal risk because nothing is downloaded or written by the skill itself. The runtime relies on system-installed `say` and `ffmpeg`.

✓ Credentials

No environment variables, credentials, or config paths are requested. The skill does suggest a workspace path (~/.openclaw/workspace/tmp/audio) for temporary files; this is reasonable but should be whitelisted only if you accept exec access for that limited path.

✓ Persistence & Privilege

`always` is false and the skill is user-invocable; it does not request permanent or elevated presence. The only privilege-related suggestion is to whitelist a dedicated workspace path for exec permissions to speed approvals; keep permissions scoped to that path.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install macos-say
After installation, invoke the skill by name or use /macos-say
Ask for a spoken reply; the agent runs the `say` → `ffmpeg` pipeline via exec and sends the resulting .ogg with the message tool

Version History

v0.0.2

- Add detailed documentation for using macOS `say` with ffmpeg to generate Opus/OGG voice messages for Telegram/Matrix.
- Explain why converting to Opus format is required and outline recommended audio workflow.
- Provide usage examples, tips for selecting voices and mapping languages, and workspace directory suggestions.
- Discuss current OpenClaw integration limitations and propose potential enhancements.
- Include TODOs for future language detection, integration, and compatibility testing.

Metadata

Slug macos-say

Version 0.0.2

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is macos-say?

Local text-to-speech using macOS `say` + ffmpeg for Telegram/Matrix voice messages. It is an AI Agent Skill for Claude Code / OpenClaw, with 76 downloads so far.

How do I install macos-say?

Run "/install macos-say" in the OpenClaw or Claude Code chat to install it in one step. The skill itself needs no further setup, but it relies on the macOS `say` command and on an installed `ffmpeg`.

Is macos-say free?

Yes. It is licensed under MIT-0, so you can download, install and use it at no cost.

Which platforms does macos-say support?

Only macOS. Although the listing tags it as cross-platform, it depends on the `say` command, which exists only on macOS, plus `ffmpeg`. On other hosts the pipeline fails.

Who created macos-say?

It is built and maintained by zviratko (@zviratko); the current version is v0.0.2.
