zviratko

Text to speech using the default macOS "say" command. No need for 3rd party APIs or models. Supports many languages. Also, Trinoids!

By zviratko · GitHub · v0.0.2 · MIT-0
cross-platform ⚠ suspicious
76 total downloads
0 favorites
0 current installs
1 version
Install in OpenClaw
/install macos-say
Description
Local text-to-speech using macOS `say` + ffmpeg for Telegram/Matrix voice messages
Usage (SKILL.md)

Say + FFmpeg TTS Pipeline

Use say (macOS native TTS) + ffmpeg to generate Opus voice messages for Telegram/Matrix.

Why not just say?

  • Telegram and Matrix voice messages use the Opus codec in an Ogg container
  • say outputs AIFF or M4A, so the audio must be converted to .ogg (Opus) before sending
  • Telegram also accepts MP3 and M4A as voice, but Opus in Ogg is the native format

Workflow

say -v "<voice>" -o <tmpdir>/<name>.aiff "<text>"
ffmpeg -i <tmpdir>/<name>.aiff -acodec libopus <tmpdir>/<name>.ogg -y

Send with message tool:

{
  "action": "send",
  "channel": "telegram",
  "media": "<tmpdir>/<name>.ogg",
  "asVoice": true,
  "target": "<chat_id>"
}

Recommended workspace directory

~/.openclaw/workspace/tmp/audio/

(Whitelist this path in exec permissions for faster approval)

Voice selection

Use say -v '?' to list available voices. Notable ones:

  • Trinoids — robotic/electronic voice (popular for bots)
  • Samantha — warm US female voice
  • Alex — US male voice
  • Fred — neutral US male voice
  • Karen — Australian female voice

Note: pass just the voice name (e.g. "Trinoids"), not the full en_US suffix.
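
To confirm that a voice is installed before using it, its name can be matched against the listing. The sketch below assumes each line of `say -v '?'` output has the shape "name, whitespace, locale code, `#`, sample sentence"; `has_voice` is a hypothetical helper name, and it reads the listing from stdin so the matching logic can be tried without macOS.

```shell
#!/bin/sh
# has_voice NAME: succeed if NAME appears as a voice name in the listing on stdin.
# Assumed line shape: "Samantha            en_US    # Hello! My name is Samantha."
has_voice() {
  wanted=$1
  # Strip everything from the locale column onward, then compare whole lines,
  # so "Zuzana" does not match a line that only lists "Zuzana (Premium)".
  sed -E 's/[[:space:]]+[a-z]{2,3}_[A-Za-z0-9]+[[:space:]]+#.*$//' |
    grep -Fxq -- "$wanted"
}

# Possible usage on macOS:
#   say -v '?' | has_voice "Trinoids" && echo "Trinoids is installed"
```

Matching the whole name rather than a substring keeps variants such as "(Premium)" and "(Enhanced)" distinct from the base voice.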

Example: send a hello voice message

VOICE="Trinoids"
TEXT="Hello!"
DIR="$HOME/.openclaw/workspace/tmp/audio"
mkdir -p "$DIR"

say -v "$VOICE" -o "$DIR/hello.aiff" "$TEXT"
ffmpeg -i "$DIR/hello.aiff" -acodec libopus "$DIR/hello.ogg" -y

# Then send via message tool with asVoice: true
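
The two commands above could be wrapped in a small reusable function. This is only a sketch: `tts_to_ogg` is a hypothetical name, the default directory is the workspace path recommended earlier, and removing the intermediate AIFF is an assumption about what callers want.

```shell
#!/bin/sh
# tts_to_ogg VOICE TEXT NAME [DIR]: render TEXT with `say`, convert it to
# Opus in Ogg with ffmpeg, and print the path of the resulting .ogg file.
tts_to_ogg() {
  voice=$1
  text=$2
  name=$3
  dir=${4:-"$HOME/.openclaw/workspace/tmp/audio"}

  # Fail early if either tool is missing (for example on a non-macOS host).
  for tool in say ffmpeg; do
    command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool" >&2; return 1; }
  done

  mkdir -p "$dir" || return 1
  say -v "$voice" -o "$dir/$name.aiff" "$text" || return 1
  # -loglevel error limits ffmpeg's diagnostics to real errors.
  ffmpeg -loglevel error -y -i "$dir/$name.aiff" -acodec libopus "$dir/$name.ogg" || return 1
  rm -f "$dir/$name.aiff"   # the intermediate AIFF is no longer needed
  printf '%s\n' "$dir/$name.ogg"
}

# Possible usage: OGG=$(tts_to_ogg "Trinoids" "Hello!" "hello") && echo "$OGG"
```

Printing the output path on stdout lets the caller pass it straight to the message tool as the media value.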

Format notes

  • Input to ffmpeg: AIFF (.aiff) works reliably; avoid .m4a with say
  • Output: Opus in Ogg container (libopus codec), the native format for Telegram voice messages
  • Telegram sendVoice accepts: OGG, MP3, M4A — but native is Opus OGG
  • Sample rate: say outputs 24kHz AIFF; ffmpeg re-encodes to Opus at 24kHz
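
One way to check these notes against an actual output file is to ask ffprobe for the codec of the first audio stream. A minimal sketch, assuming ffprobe is installed alongside ffmpeg; `is_opus_ogg` is a hypothetical helper name.

```shell
#!/bin/sh
# is_opus_ogg FILE: succeed if the first audio stream of FILE is Opus-encoded.
is_opus_ogg() {
  codec=$(ffprobe -v error -select_streams a:0 \
            -show_entries stream=codec_name \
            -of default=noprint_wrappers=1:nokey=1 "$1") || return 1
  [ "$codec" = "opus" ]
}

# Possible usage:
#   is_opus_ogg "$HOME/.openclaw/workspace/tmp/audio/hello.ogg" || echo "not Opus" >&2
```

Adding `sample_rate` to the `-show_entries stream=...` list would report the sample rate in the same way.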

Integration with OpenClaw TTS

OpenClaw's built-in messages.tts only supports: ElevenLabs, Microsoft Edge, MiniMax, OpenAI.

This say+ffmpeg pipeline is a workaround for local-only TTS without API keys or cloud services. It's not auto-triggered by OpenClaw — call it manually via exec + message tool.

Language Detection → Voice Mapping

When responding to a voice message, detect the language from the STT output (Parakeet auto-detects). Then pick the matching say voice using i18n locale codes.

Finding voices by language:

say -v '?' 2>&1 | grep -E "cs_CZ|en_US|de_DE|fr_FR|it_IT|es_ES"

Language → voice selection priority:

  1. Use <voice> (Premium) if available
  2. Fall back to <voice> (Enhanced) if available
  3. Fall back to base <voice> name
  4. Never use a voice that doesn't match the language

Language     | i18n code | Preferred voice
Czech        | cs_CZ     | Zuzana (Premium)
English (US) | en_US     | Trinoids (no Premium/Enhanced available)
German       | de_DE     | Grandma (Premium) if available
French       | fr_FR     | Grandma (Premium) if available
Spanish      | es_ES     | Grandma (Premium) if available
Italian      | it_IT     | Grandma (Premium) if available

Key: Always use just the voice name (e.g. "Trinoids", "Zuzana"), not the full locale suffix. The locale suffix in say -v '?' output is for grepping/identification only.
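
The table and the priority rules can be combined into a lookup. This is a sketch under two assumptions: that installed variants are listed as "<name> (Premium)" and "<name> (Enhanced)", and that listing lines have the shape "name, whitespace, locale code, `#`, sample". `base_voice_for` and `pick_voice` are hypothetical names; the listing is read from stdin so the logic can be exercised with sample data.

```shell
#!/bin/sh
# base_voice_for LOCALE: base voice names taken from the table above.
base_voice_for() {
  case $1 in
    cs_CZ) echo "Zuzana" ;;
    en_US) echo "Trinoids" ;;
    de_DE|fr_FR|es_ES|it_IT) echo "Grandma" ;;
    *) return 1 ;;   # unknown locale: refuse rather than pick a mismatched voice
  esac
}

# pick_voice LOCALE: read a `say -v '?'` listing on stdin and print the best
# listed variant of the base voice for LOCALE (Premium > Enhanced > base).
pick_voice() {
  locale=$1
  base=$(base_voice_for "$locale") || return 1
  # Keep only voices for this locale, with the locale column and sample removed.
  names=$(grep -E "[[:space:]]${locale}[[:space:]]+#" |
            sed -E "s/[[:space:]]+${locale}[[:space:]]+#.*//")
  for candidate in "$base (Premium)" "$base (Enhanced)" "$base"; do
    if printf '%s\n' "$names" | grep -Fxq -- "$candidate"; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1   # nothing suitable is installed for this locale
}

# Possible usage on macOS: VOICE=$(say -v '?' | pick_voice cs_CZ)
```

Returning a failure status when no matching voice is installed follows rule 4: the caller gets no voice at all rather than one in the wrong language.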

Example workflow:

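LOCALE="cs_CZ"   # not named LANG, which would override the shell's own locale setting
# Voice names for this locale, with the locale column and sample text stripped
VOICES=$(say -v '?' 2>&1 | grep "$LOCALE" | sed -E "s/[[:space:]]+$LOCALE.*//")
# Pick the best available voice for this language (Premium > Enhanced > base)
VOICE=$(printf '%s\n' "$VOICES" | grep -F '(Premium)' | head -1)
[ -z "$VOICE" ] && VOICE=$(printf '%s\n' "$VOICES" | grep -F '(Enhanced)' | head -1)
[ -z "$VOICE" ] && VOICE=$(printf '%s\n' "$VOICES" | head -1)
say -v "$VOICE" -o reply.aiff "Česká odpověď"
ffmpeg -i reply.aiff -acodec libopus reply.ogg -y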

TODOs

  • Detect language from STT transcription and auto-select appropriate say voice
  • Explore integrating into OpenClaw via custom TTS provider plugin
  • Investigate if OpenClaw supports post-processing TTS output via a hook
  • Test Matrix channel voice message format compatibility
Security recommendations
What to consider before installing:
  1. This only works when `say` exists (macOS); install `ffmpeg` and test the `say`→`ffmpeg` pipeline locally first.
  2. Approve exec permissions only for a narrow workspace path (e.g., ~/.openclaw/workspace/tmp/audio).
  3. Ensure the runtime escapes user-provided TEXT/VOICE variables (avoid passing raw, unsanitized strings to a shell) to prevent command injection.
  4. Confirm how the 'message' tool is authorized to send media and avoid sending sensitive audio unintentionally.
  5. If you run on non-macOS hosts, the skill will fail (metadata omits an OS restriction but the required binary `say` is macOS-only).
Functional analysis
Type: OpenClaw Skill
Name: macos-say
Version: 0.0.2
The skill provides instructions for an AI agent to execute shell commands using `say` and `ffmpeg` for local text-to-speech. While the functionality is aligned with its stated purpose, the use of shell execution and filesystem access (specifically suggesting a workspace directory in `SKILL.md`) constitutes a high-risk capability. Furthermore, the provided bash examples are vulnerable to command injection if the agent processes unsanitized user input within the `$TEXT` or `$VOICE` variables.
Capability assessment
Purpose & Capability
The name/description match the actual instructions: generating AIFF with macOS `say` then converting to Opus with `ffmpeg` is exactly what's needed for Telegram/Matrix voice messages. One minor inconsistency: skill metadata lists no OS restriction even though `say` is macOS-specific; the declared required binaries (`say`, `ffmpeg`) properly reflect the true platform dependency.
Instruction Scope
SKILL.md stays on‑topic (create AIFF with `say`, transcode with `ffmpeg`, then send with the message tool). No unrelated files, credentials, or external endpoints are referenced. Caution: the examples use shell interpolation (VOICE/TEXT variables). If untrusted input is passed into shell commands without proper escaping, there is a risk of shell/command injection — ensure the agent or runtime invokes `say`/`ffmpeg` with safely escaped arguments or argument lists rather than raw shell interpolation.
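
One possible way to follow this advice is to validate the voice name against a small character allowlist and hand the text to `say` through a file with its `-f` option, so the text is never parsed as an option or as shell code. The allowlist (letters, digits, spaces, hyphens, parentheses) and the names `is_safe_voice_name` and `say_from_file` are assumptions for illustration, not part of the skill.

```shell
#!/bin/sh
# is_safe_voice_name NAME: accept only letters, digits, spaces, hyphens and
# parentheses, and require a leading letter so NAME cannot look like an option.
is_safe_voice_name() {
  case $1 in
    [A-Za-z]*) ;;
    *) return 1 ;;
  esac
  # Deleting every allowed character must leave zero bytes behind.
  leftover=$(printf '%s' "$1" | LC_ALL=C tr -d 'A-Za-z0-9 ()-' | wc -c | tr -d ' ')
  [ "$leftover" = "0" ]
}

# say_from_file VOICE TEXT OUT: write TEXT to a temp file and let `say` read it
# with -f, keeping TEXT out of the command line entirely.
say_from_file() {
  is_safe_voice_name "$1" || { echo "rejected voice name" >&2; return 1; }
  txt=$(mktemp) || return 1
  printf '%s' "$2" > "$txt"
  say -v "$1" -f "$txt" -o "$3"
  status=$?
  rm -f "$txt"
  return "$status"
}

# Possible usage: say_from_file "Trinoids" "$UNTRUSTED_TEXT" "$DIR/reply.aiff"
```

Quoting every expansion already prevents the shell from re-parsing the values; the file indirection additionally covers text that begins with a hyphen.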
Install Mechanism
No install spec (instruction-only) — minimal risk because nothing is downloaded or written by the skill itself. The runtime relies on system-installed `say` and `ffmpeg`.
Credentials
No environment variables, credentials, or config paths are requested. The skill does suggest a workspace path (~/.openclaw/workspace/tmp/audio) for temporary files; this is reasonable but should be whitelisted only if you accept exec access for that limited path.
Persistence & Privilege
always is false and the skill is user-invocable; it does not request permanent or elevated presence. The only privilege-related suggestion is to whitelist a dedicated workspace path for exec permissions to speed approvals — keep permissions scoped to that path.
How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the dialog: /install macos-say
  3. Once installed, invoke the Skill by name or trigger it with /macos-say
  4. Provide the inputs described in the Skill's parameters to get structured output
Version history
v0.0.2
  • Add detailed documentation for using macOS `say` with ffmpeg to generate Opus/OGG voice messages for Telegram/Matrix.
  • Explain why converting to Opus format is required and outline recommended audio workflow.
  • Provide usage examples, tips for selecting voices and mapping languages, and workspace directory suggestions.
  • Discuss current OpenClaw integration limitations and propose potential enhancements.
  • Include TODOs for future language detection, integration, and compatibility testing.
Metadata
Slug macos-say
Version 0.0.2
License MIT-0
Total installs 0
Current installs 0
Versions 1
FAQ

What is macos-say?

Local text-to-speech using macOS `say` + ffmpeg for Telegram/Matrix voice messages. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 76 total downloads to date.

How do I install macos-say?

Run the command "/install macos-say" in the OpenClaw or Claude Code dialog for one-step installation; no extra configuration is needed.

Is macos-say free?

Yes, macos-say is completely free under the MIT-0 license and may be freely downloaded, installed, and used.

Which platforms does macos-say support?

The listing is tagged cross-platform, so it can be installed wherever OpenClaw / Claude Code is deployed; the required `say` binary, however, is macOS-only.

Who developed macos-say?

Developed and maintained by zviratko (@zviratko); the current version is v0.0.2.
