Description

Local STT and TTS on macOS using native Apple capabilities. Speech-to-text via yap (Apple Speech.framework), text-to-speech via say + ffmpeg. Fully offline, no API keys required. Includes voice quality detection and smart voice selection.

README (SKILL.md)

macOS Local Voice

Name: macOS Local Voice
Author: strrl

Fully local speech-to-text (STT) and text-to-speech (TTS) on macOS. No API keys, no network, no cloud. All processing happens on-device.

Requirements

macOS (Apple Silicon recommended, Intel works too)
yap CLI in PATH — install via brew install finnvoor/tools/yap
ffmpeg in PATH (optional, needed for ogg/opus output) — brew install ffmpeg
say and osascript are macOS built-in

Speech-to-Text (STT)

Transcribe an audio file to text using Apple's on-device speech recognition.

node {baseDir}/scripts/stt.mjs <audio_file> [locale]

audio_file: path to audio (ogg, m4a, mp3, wav, etc.)
locale: optional, e.g. zh_CN, en_US, ja_JP. If omitted, uses system default.
Outputs transcribed text to stdout.

Supported STT locales

Use node {baseDir}/scripts/stt.mjs --locales to list all supported locales.

Key locales: en_US, en_GB, zh_CN, zh_TW, zh_HK, ja_JP, ko_KR, fr_FR, de_DE, es_ES, pt_BR, ru_RU, vi_VN, th_TH.

Language detection tips

If the user's recent messages are in Chinese → use zh_CN
If in English → use en_US
If mixed or unclear → try without locale (system default)
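
The tips above could be turned into a small locale guess along these lines. This is a minimal sketch: the function name `guessLocale` and the 0.3 cutoff are illustrative assumptions, not values taken from the skill's scripts, and the share of Han characters merely stands in for "the messages are in Chinese". Text that is mostly Japanese kanji would be misread as Chinese, and any Latin-script language is treated as English, so the result is only a hint.

```javascript
// Hypothetical helper, not part of the skill: guess an STT locale from text.
// hanThreshold is an assumed cutoff; tune it for your own messages.
function guessLocale(text, hanThreshold = 0.3) {
  // Ignore whitespace so spacing does not dilute the ratio.
  const chars = Array.from(text).filter((ch) => !/\s/u.test(ch));
  if (chars.length === 0) return undefined;

  const han = chars.filter((ch) => /\p{Script=Han}/u.test(ch)).length;
  const latin = chars.filter((ch) => /\p{Script=Latin}/u.test(ch)).length;

  if (han / chars.length >= hanThreshold) return "zh_CN"; // mostly Chinese
  if (han === 0 && latin > 0) return "en_US"; // Latin letters, no Han
  return undefined; // mixed or unclear: omit the locale argument
}
```

A return value of `undefined` means the locale argument should be left off, so that stt.mjs falls back to the system default.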

Text-to-Speech (TTS)

Convert text to an audio file using macOS native TTS.

node {baseDir}/scripts/tts.mjs "<text>" [voice_name] [output_path]

text: the text to speak
voice_name: optional, e.g. Yue (Premium), Tingting, Ava (Premium). If omitted, auto-selects the best available voice based on text language.
output_path: optional, defaults to a timestamped file in ~/.openclaw/media/outbound/
Outputs the generated audio file path to stdout.
If ffmpeg is available, output is ogg/opus (ideal for messaging platforms). Otherwise aiff.
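
To make the say + ffmpeg pipeline concrete, here is a rough sketch of the argument lists such a script might build. The flags tts.mjs actually passes are not shown in this README, so everything below is an assumption: `buildTtsCommands` and its parameters are hypothetical names, `-v` and `-o` are standard `say` options, and `-c:a libopus` only works if the installed ffmpeg was built with libopus.

```javascript
// Hypothetical sketch: build the argument lists for `say` and, optionally,
// `ffmpeg`. The exact flags used by tts.mjs are not shown in this README.
function buildTtsCommands({ text, voice, basePath, hasFfmpeg }) {
  const aiffPath = `${basePath}.aiff`;
  const sayArgs = [];
  if (voice) sayArgs.push("-v", voice); // omit -v to let `say` pick its default voice
  sayArgs.push("-o", aiffPath, text);

  const commands = [{ cmd: "say", args: sayArgs }];
  if (!hasFfmpeg) return { commands, output: aiffPath };

  const oggPath = `${basePath}.ogg`;
  // -y overwrites an existing file; libopus must be compiled into ffmpeg.
  commands.push({
    cmd: "ffmpeg",
    args: ["-y", "-i", aiffPath, "-c:a", "libopus", oggPath],
  });
  return { commands, output: oggPath };
}
```

Returning argument arrays rather than a single command string means they can be handed to an API such as `child_process.execFile` without shell quoting, which matters when the text or voice name contains spaces or parentheses.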

Sending as voice note

After generating the audio file, send it using the message tool:

message action=send media=<path_from_tts.mjs> asVoice=true

Voice Management

List available voices, check readiness, or find the best voice for a language:

node {baseDir}/scripts/voices.mjs list [locale]     # List voices, optionally filter by locale
node {baseDir}/scripts/voices.mjs check "<name>"    # Check if a specific voice is downloaded and ready
node {baseDir}/scripts/voices.mjs best <locale>     # Get the highest quality voice for a locale

Quality levels

1 = compact (low quality, always available)
2 = enhanced (mid quality, may need download)
3 = premium (highest quality, needs download from System Settings)
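
As an illustration of how these levels might drive voice selection, the sketch below picks the highest-quality voice for a locale from a list of voice records. The record shape `{ name, locale, quality }` and the function name `pickBestVoice` are assumptions made for this example; the real output format of voices.mjs is not documented here.

```javascript
// Hypothetical data shape: the real output format of voices.mjs is not shown here.
const QUALITY = { 1: "compact", 2: "enhanced", 3: "premium" };

function pickBestVoice(voices, locale) {
  const candidates = voices.filter((v) => v.locale === locale);
  if (candidates.length === 0) return undefined;
  // Highest quality wins; ties keep the earlier entry.
  const best = candidates.reduce((a, b) => (b.quality > a.quality ? b : a));
  return { ...best, label: QUALITY[best.quality] };
}
```

When no voice matches the locale the function returns `undefined`, which a caller could treat as "let say use its default voice".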

If a voice is not available

Tell the user: "Voice X is not downloaded. Go to System Settings → Accessibility → Spoken Content → System Voice → Manage Voices to download it."

Notes

The say command silently falls back to a default voice if the requested voice is not available (exit code 0, no error). Always use voices.mjs check before calling tts.mjs with a specific voice name.
Premium voices (e.g. Yue (Premium), Ava (Premium)) sound significantly better but must be manually downloaded by the user.
Siri voices are not accessible via the speech synthesis API.
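
Because of the silent fallback, it can help to cross-check a voice name independently of the skill. The skill's own check goes through voices.mjs; the sketch below is a different technique that parses the text printed by `say -v '?'`, which lists the voices available to say. It assumes each line has the form "name, locale, # sample sentence", and the function names are hypothetical.

```javascript
// Parse `say -v '?'` output into { name, locale } entries.
// Assumes each line looks like: "<name>  <locale>  # <sample sentence>".
function parseSayVoices(listing) {
  const voices = [];
  for (const line of listing.split("\n")) {
    // Lazy name match so multi-word names such as "Ava (Premium)" stay intact.
    const match = line.match(/^(.+?)\s+([a-z]{2,3}[_-][A-Za-z0-9]+)\s+#/);
    if (match) voices.push({ name: match[1], locale: match[2] });
  }
  return voices;
}

// True only when the exact voice name appears in the listing.
function isVoiceInstalled(listing, voiceName) {
  return parseSayVoices(listing).some((v) => v.name === voiceName);
}
```

The listing itself would come from running `say -v '?'` and capturing stdout; lines that do not fit the assumed layout are skipped rather than guessed at.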

Usage Guidance

This skill appears coherent and local-only, but review these points before installing:
1) Install yap and ffmpeg from trusted package sources (the Homebrew formulae and taps shown in the README).
2) The scripts invoke local system commands (yap, say, osascript, ffmpeg); make sure you are comfortable granting the skill the ability to execute those on your Mac.
3) Generated audio is saved under ~/.openclaw/media/outbound, and SKILL.md suggests using the agent 'message' tool to send it; verify recipients before sending sensitive audio.
4) The locale detection and voice selection are heuristic (CJK-character ratio, calling voices.mjs); test with your language.
If you need further assurance, inspect the included scripts (they are readable JS) and confirm no network calls are added in your environment.

Capability Analysis

Type: OpenClaw Skill
Name: macos-local-voice
Version: 1.0.0

The skill is classified as suspicious due to a potential arbitrary file write vulnerability in `scripts/tts.mjs`. The `output_path` argument, if provided by the user or agent, is used directly to construct the output file path. This could allow an attacker to specify a path traversal sequence (e.g., `../../../malicious.aiff`), leading to the creation of audio files in unintended file system locations. While the default output path is safe and there is no evidence of intentional malicious behavior, this represents a significant vulnerability.
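
One possible mitigation for the path traversal issue, shown here only as a sketch and not as the author's fix, is to resolve the requested path against an allowed base directory and reject anything that lands outside it. The function name `resolveOutputPath` is hypothetical, and the check does not follow symlinks, so a symlink inside the base directory could still point elsewhere.

```javascript
const path = require("node:path");

// Hypothetical guard: resolve a caller-supplied output path and refuse
// anything that ends up outside the allowed base directory.
function resolveOutputPath(baseDir, requested) {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, requested);
  const rel = path.relative(base, target);

  const escapes =
    rel === "" || // the base directory itself is not a file
    rel === ".." ||
    rel.startsWith(".." + path.sep) ||
    path.isAbsolute(rel); // e.g. a different drive on Windows
  if (escapes) {
    throw new Error(`output path escapes ${base}: ${requested}`);
  }
  return target;
}
```

With this guard, a relative name such as `note.ogg` resolves inside the base directory, while `../../../malicious.aiff` or an absolute path elsewhere throws. Whether absolute output paths should remain allowed is a design decision for the skill's author.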

Capability Assessment

✓ Purpose & Capability

Name/description (local macOS STT/TTS) match the actual requirements and behavior: required binaries are yap, say, and osascript (all relevant), code uses Apple Speech.framework via yap and AVFoundation via osascript, and ffmpeg is optional for format conversion.

✓ Instruction Scope

SKILL.md directs the agent to run the included scripts and to use local system settings for downloading voices. The instructions do not ask the agent to read unrelated files, export environment secrets, or contact remote endpoints. It does reference the agent 'message' tool for sending generated audio (expected for delivering voice notes).

✓ Install Mechanism

This is instruction-only with included scripts (no install spec). The README suggests installing yap/ffmpeg via Homebrew — a standard, traceable source. No downloads from arbitrary URLs or archive extraction are present.

✓ Credentials

No environment variables or credentials are requested. The scripts use HOME to write output under ~/.openclaw/media/outbound (reasonable for generated media). There are no requests for unrelated secrets or config paths.

✓ Persistence & Privilege

Skill is not forced always-on and does not modify other skills or system-wide settings. It only creates per-user media files in a reasonable path and requires user action to download premium voices.

Version History

v1.0.0

Initial release: Node.js rewrite. STT (yap) + TTS (say) + voice detection (JXA/AVFoundation). Fully offline, no API keys.

Metadata

Slug macos-local-voice

Version 1.0.0

License —

All-time Installs 12

Active Installs 12

Total Versions 1

Frequently Asked Questions

What is macOS Local Voice?

Local STT and TTS on macOS using native Apple capabilities. Speech-to-text via yap (Apple Speech.framework), text-to-speech via say + ffmpeg. Fully offline, no API keys required. Includes voice quality detection and smart voice selection. It is an AI Agent Skill for Claude Code / OpenClaw, with 1927 downloads so far.

How do I install macOS Local Voice?

Run "/install macos-local-voice" in the OpenClaw or Claude Code chat to install the skill. The yap CLI (and optionally ffmpeg) must be installed separately via Homebrew, as listed under Requirements.

Is macOS Local Voice free?

Yes, macOS Local Voice is completely free (open-source). You can download, install and use it at no cost.

Which platforms does macOS Local Voice support?

macOS Local Voice runs on macOS only (Apple Silicon recommended, Intel works too). It depends on the built-in say and osascript commands and on Apple's Speech.framework via yap.

Who created macOS Local Voice?

It is built and maintained by STRRL (@strrl); the current version is v1.0.0.
