Text to speech using the default macOS "say" command. No need for 3rd party APIs or models. Supports many languages. Also, Trinoids!

Author: zviratko

v0.0.2 · MIT-0

cross-platform ⚠ suspicious

Install in OpenClaw

/install macos-say

Description

Local text-to-speech using macOS `say` + ffmpeg for Telegram/Matrix voice messages

Usage (SKILL.md)

Say + FFmpeg TTS Pipeline

Use say (macOS native TTS) + ffmpeg to generate Opus voice messages for Telegram/Matrix.

Why not just `say`?

Telegram and Matrix voice messages use the Opus codec
say outputs AIFF (or m4a); convert to .ogg (Opus) before sending
Telegram also accepts MP3 and M4A as voice, but Opus in OGG is the native format

Workflow

say -v "<voice>" -o <tmpdir>/<name>.aiff "<text>"
ffmpeg -i <tmpdir>/<name>.aiff -acodec libopus <tmpdir>/<name>.ogg -y

Send with message tool:

{
  "action": "send",
  "channel": "telegram",
  "media": "<tmpdir>/<name>.ogg",
  "asVoice": true,
  "target": "<chat_id>"
}

Recommended workspace directory

~/.openclaw/workspace/tmp/audio/

(Whitelist this path in exec permissions for faster approval)

Voice selection

Use say -v '?' to list available voices. Notable ones:

Trinoids — robotic/electronic voice (popular for bots)
Samantha — warm US female voice
Alex — US male voice
Fred — neutral US male voice
Karen — Australian female voice

Note: pass just the voice name (e.g. "Trinoids"), not the locale code (e.g. en_US) that say -v '?' prints next to it.
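
Before calling say with a voice, it can help to confirm that the voice is actually installed. The sketch below parses the output of say -v '?' from stdin. The function names voice_names and has_voice are hypothetical, and the parsing assumes each line has the shape "name, locale code, # sample sentence".

```shell
# Hypothetical helper: read `say -v '?'` output on stdin and print one
# voice name per line. Assumed line shape: "<name>  <locale>  # <sample>".
voice_names() {
  sed -nE 's/^(.*[^[:space:]])[[:space:]]+[A-Za-z]{2,3}[_-][A-Za-z0-9]+[[:space:]]+#.*$/\1/p'
}

# Hypothetical helper: succeed only if the exact voice name given as $1
# appears in the voice list read from stdin.
has_voice() {
  voice_names | grep -Fxq -- "$1"
}
```

Usage on a Mac would look like: say -v '?' | has_voice "Trinoids" || echo "voice not installed". Names with spaces or parentheses are kept whole because the name is everything before the locale column.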

Example: send a hello voice message

VOICE="Trinoids"
TEXT="Hello!"
DIR="$HOME/.openclaw/workspace/tmp/audio"
mkdir -p "$DIR"

say -v "$VOICE" -o "$DIR/hello.aiff" "$TEXT"
ffmpeg -i "$DIR/hello.aiff" -acodec libopus "$DIR/hello.ogg" -y

# Then send via message tool with asVoice: true
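
The same steps can be wrapped in a function so that voice and text always travel as quoted arguments. This is a minimal sketch, not part of the skill itself: the name tts_to_ogg and the TTS_DIR override are hypothetical, and only the say and ffmpeg invocations mirror the example above.

```shell
# Hypothetical helper: speak TEXT with VOICE and leave an Opus/Ogg file behind.
# Usage: tts_to_ogg <voice> <text> <output-basename>
tts_to_ogg() {
  local voice text name dir
  voice="$1"
  text="$2"
  name="$3"
  # TTS_DIR is an assumed override; the default is the recommended workspace path.
  dir="${TTS_DIR:-$HOME/.openclaw/workspace/tmp/audio}"
  mkdir -p "$dir" || return 1

  # Text is passed as one quoted argument, never through eval or an
  # unquoted expansion, so shell metacharacters in it are not interpreted.
  say -v "$voice" -o "$dir/$name.aiff" "$text" || return 1
  ffmpeg -loglevel error -y -i "$dir/$name.aiff" -acodec libopus "$dir/$name.ogg" || return 1

  # The AIFF is only an intermediate file; remove it and print the final path.
  rm -f "$dir/$name.aiff"
  printf '%s\n' "$dir/$name.ogg"
}
```

A call such as tts_to_ogg "Trinoids" "Hello!" hello prints the path of the .ogg file, which can then go into the "media" field of the message tool call.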

Format notes

Input to ffmpeg: AIFF (.aiff) works reliably; avoid .m4a with say
Output: Opus in an Ogg container (libopus codec), the native format for Telegram voice messages
Telegram sendVoice also accepts MP3 and M4A
Sample rate: say outputs 24kHz AIFF; ffmpeg re-encodes to Opus at 24kHz
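
To confirm that a converted file really contains Opus audio before sending it, ffprobe (shipped with ffmpeg) can report the codec of the first audio stream. A small sketch, with is_opus as a hypothetical function name:

```shell
# Hypothetical check: succeed only if the first audio stream of the file is Opus.
is_opus() {
  local codec
  codec=$(ffprobe -v error -select_streams a:0 \
    -show_entries stream=codec_name \
    -of default=noprint_wrappers=1:nokey=1 "$1") || return 1
  [ "$codec" = "opus" ]
}
```

For example, is_opus "$DIR/hello.ogg" || echo "not Opus, re-run ffmpeg" catches the case where the wrong file or codec ended up in the send step.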

Integration with OpenClaw TTS

OpenClaw's built-in messages.tts only supports: ElevenLabs, Microsoft Edge, MiniMax, OpenAI.

This say+ffmpeg pipeline is a workaround for local-only TTS without API keys or cloud services. It's not auto-triggered by OpenClaw — call it manually via exec + message tool.

Language Detection → Voice Mapping

When responding to a voice message, detect the language from the STT output (Parakeet auto-detects). Then pick the matching say voice using i18n locale codes.

Finding voices by language:

say -v '?' 2>&1 | grep -E "cs_CZ|en_US|de_DE|fr_FR|it_IT|es_ES"

Language → voice selection priority:

Use <voice> (Premium) if available
Fall back to <voice> (Enhanced) if available
Fall back to base <voice> name
Never use a voice that doesn't match the language

Language	i18n code	Preferred Voice
Czech	`cs_CZ`	`Zuzana (Premium)`
English (US)	`en_US`	`Trinoids` (no Premium/Enhanced available)
German	`de_DE`	`Grandma (Premium)` if available
French	`fr_FR`	`Grandma (Premium)` if available
Spanish	`es_ES`	`Grandma (Premium)` if available
Italian	`it_IT`	`Grandma (Premium)` if available

Key: Always pass the voice name as listed, including any (Premium) or (Enhanced) qualifier (e.g. "Trinoids", "Zuzana (Premium)"), but not the locale code. The locale code in say -v '?' output is for grepping/identification only.
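
The language column of the table above can be turned into a lookup. The sketch below assumes the STT step reports a short language tag such as "cs" or "en"; that tag format and the function name locale_for_lang are assumptions, not something the skill defines.

```shell
# Hypothetical mapping from a short language tag (assumed STT output)
# to the locale codes listed in the table above.
locale_for_lang() {
  case "$1" in
    cs) echo "cs_CZ" ;;
    en) echo "en_US" ;;
    de) echo "de_DE" ;;
    fr) echo "fr_FR" ;;
    es) echo "es_ES" ;;
    it) echo "it_IT" ;;
    # Unknown language: fail rather than fall back to a mismatched voice.
    *) return 1 ;;
  esac
}
```

Returning a non-zero status for unmapped languages lets the caller skip the voice reply instead of speaking with a voice that does not match the language.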

Example workflow:

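VOICE_LOCALE="cs_CZ"  # avoid the name LANG: it is the shell's own locale variable
# Voices for this locale, full names kept (they may contain spaces or parentheses)
VOICES=$(say -v '?' 2>&1 | sed -nE "s/^(.*[^[:space:]])[[:space:]]+${VOICE_LOCALE}[[:space:]]+#.*/\1/p")
# Find best available voice for this language (Premium > Enhanced > base)
VOICE=$(printf '%s\n' "$VOICES" | grep -F -m1 '(Premium)') \
  || VOICE=$(printf '%s\n' "$VOICES" | grep -F -m1 '(Enhanced)') \
  || VOICE=$(printf '%s\n' "$VOICES" | head -n 1)
say -v "$VOICE" -o reply.aiff "Česká odpověď"
ffmpeg -i reply.aiff -acodec libopus reply.ogg -y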

TODOs

Detect language from STT transcription and auto-select appropriate say voice
Explore integrating into OpenClaw via custom TTS provider plugin
Investigate if OpenClaw supports post-processing TTS output via a hook
Test Matrix channel voice message format compatibility

Security recommendations

What to consider before installing:
1) This only works when `say` exists (macOS); install `ffmpeg` and test the `say`→`ffmpeg` pipeline locally first.
2) Approve exec permissions only for a narrow workspace path (e.g., ~/.openclaw/workspace/tmp/audio).
3) Ensure the runtime escapes user-provided TEXT/VOICE variables (avoid passing raw, unsanitized strings to a shell) to prevent command injection.
4) Confirm how the 'message' tool is authorized to send media and avoid sending sensitive audio unintentionally.
5) If you run on non-macOS hosts, the skill will fail (metadata omits an OS restriction but the required binary `say` is macOS-only).
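
The platform and binary requirements above can be checked up front. A minimal preflight sketch, with preflight as a hypothetical function name:

```shell
# Hypothetical preflight: fail early when the host cannot run the pipeline.
preflight() {
  local tool
  # say ships only with macOS, whose kernel name is reported as Darwin.
  if [ "$(uname -s)" != "Darwin" ]; then
    echo "macOS required: say is macOS-only" >&2
    return 1
  fi
  for tool in say ffmpeg; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      return 1
    fi
  done
}
```

Running preflight before the first say call turns a confusing "command not found" in the middle of a reply into a clear error at the start.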

Functional analysis

Type: OpenClaw Skill · Name: macos-say · Version: 0.0.2

The skill provides instructions for an AI agent to execute shell commands using `say` and `ffmpeg` for local text-to-speech. While the functionality is aligned with its stated purpose, the use of shell execution and filesystem access (specifically suggesting a workspace directory in `SKILL.md`) constitutes a high-risk capability. Furthermore, the provided bash examples are vulnerable to command injection if the agent processes unsanitized user input within the `$TEXT` or `$VOICE` variables.

Capability assessment

ℹ Purpose & Capability

The name/description match the actual instructions: generating AIFF with macOS `say` then converting to Opus with `ffmpeg` is exactly what's needed for Telegram/Matrix voice messages. One minor inconsistency: skill metadata lists no OS restriction even though `say` is macOS-specific; the declared required binaries (`say`, `ffmpeg`) properly reflect the true platform dependency.

ℹ Instruction Scope

SKILL.md stays on‑topic (create AIFF with `say`, transcode with `ffmpeg`, then send with the message tool). No unrelated files, credentials, or external endpoints are referenced. Caution: the examples use shell interpolation (VOICE/TEXT variables). If untrusted input is passed into shell commands without proper escaping, there is a risk of shell/command injection — ensure the agent or runtime invokes `say`/`ffmpeg` with safely escaped arguments or argument lists rather than raw shell interpolation.

✓ Install Mechanism

No install spec (instruction-only) — minimal risk because nothing is downloaded or written by the skill itself. The runtime relies on system-installed `say` and `ffmpeg`.

✓ Credentials

No environment variables, credentials, or config paths are requested. The skill does suggest a workspace path (~/.openclaw/workspace/tmp/audio) for temporary files; this is reasonable but should be whitelisted only if you accept exec access for that limited path.

✓ Persistence & Privilege

always is false and the skill is user-invocable; it does not request permanent or elevated presence. The only privilege-related suggestion is to whitelist a dedicated workspace path for exec permissions to speed approvals — keep permissions scoped to that path.

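How to use

Make sure OpenClaw is installed (local or Docker deployment)
Enter the install command in the chat box: /install macos-say
Once installed, invoke the skill by name or trigger it with /macos-say
Provide the required inputs according to the skill's parameter description to get structured output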

Version history

v0.0.2

- Add detailed documentation for using macOS `say` with ffmpeg to generate Opus/OGG voice messages for Telegram/Matrix.
- Explain why converting to Opus format is required and outline recommended audio workflow.
- Provide usage examples, tips for selecting voices and mapping languages, and workspace directory suggestions.
- Discuss current OpenClaw integration limitations and propose potential enhancements.
- Include TODOs for future language detection, integration, and compatibility testing.

Metadata

Slug macos-say

Version 0.0.2

License MIT-0

Total installs 0

Current installs 0

Versions 1

FAQ