strrl

macOS Local Voice

by STRRL · GitHub ↗ · v1.0.0
cross-platform ⚠ suspicious
1927 Downloads · 0 Stars · 12 Active Installs · 1 Version
Install in OpenClaw
/install macos-local-voice
Description
Local STT and TTS on macOS using native Apple capabilities. Speech-to-text via yap (Apple Speech.framework), text-to-speech via say + ffmpeg. Fully offline, no API keys required. Includes voice quality detection and smart voice selection.
README (SKILL.md)

macOS Local Voice

Fully local speech-to-text (STT) and text-to-speech (TTS) on macOS. No API keys, no network, no cloud. All processing happens on-device.

Requirements

  • macOS (Apple Silicon recommended, Intel works too)
  • yap CLI in PATH — install via brew install finnvoor/tools/yap
  • ffmpeg in PATH (optional, needed for ogg/opus output) — brew install ffmpeg
  • say and osascript are macOS built-in

Speech-to-Text (STT)

Transcribe an audio file to text using Apple's on-device speech recognition.

node {baseDir}/scripts/stt.mjs <audio_file> [locale]
  • audio_file: path to audio (ogg, m4a, mp3, wav, etc.)
  • locale: optional, e.g. zh_CN, en_US, ja_JP. If omitted, uses system default.
  • Outputs transcribed text to stdout.

Supported STT locales

Use node {baseDir}/scripts/stt.mjs --locales to list all supported locales.

Key locales: en_US, en_GB, zh_CN, zh_TW, zh_HK, ja_JP, ko_KR, fr_FR, de_DE, es_ES, pt_BR, ru_RU, vi_VN, th_TH.

Language detection tips

  • If the user's recent messages are in Chinese → use zh_CN
  • If in English → use en_US
  • If mixed or unclear → try without locale (system default)
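
The tips above can be sketched as a small heuristic. This is an illustration only, not the skill's own detection code: `guessLocale`, `sttArgs`, and the 0.3 threshold are hypothetical, and the Han-script check would also match Japanese kanji, so treat the result as a hint.

```javascript
// Hypothetical helper: guess an STT locale from recent message text.
// The 0.3 threshold and the Han-only check are illustrative assumptions.
function guessLocale(text, threshold = 0.3) {
  // Count only letters, so digits, spaces, and punctuation do not skew the ratio.
  const letters = Array.from(text).filter((ch) => /\p{L}/u.test(ch));
  if (letters.length === 0) return undefined; // nothing to go on: use system default

  const han = letters.filter((ch) => /\p{Script=Han}/u.test(ch)).length;
  const latin = letters.filter((ch) => /\p{Script=Latin}/u.test(ch)).length;

  if (han / letters.length >= threshold) return "zh_CN";
  if (latin === letters.length) return "en_US";
  return undefined; // mixed or unclear: omit the locale argument
}

// Build the argument list for stt.mjs; the locale is appended only when known.
function sttArgs(scriptPath, audioFile, recentText) {
  const locale = guessLocale(recentText);
  return locale ? [scriptPath, audioFile, locale] : [scriptPath, audioFile];
}
```

When `guessLocale` returns `undefined`, the locale argument is simply left off, which matches the "try without locale" advice.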

Text-to-Speech (TTS)

Convert text to an audio file using macOS native TTS.

node {baseDir}/scripts/tts.mjs "<text>" [voice_name] [output_path]
  • text: the text to speak
  • voice_name: optional, e.g. Yue (Premium), Tingting, Ava (Premium). If omitted, auto-selects the best available voice based on text language.
  • output_path: optional, defaults to a timestamped file in ~/.openclaw/media/outbound/
  • Outputs the generated audio file path to stdout.
  • If ffmpeg is available, output is ogg/opus (ideal for messaging platforms); otherwise it is aiff.
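
For orientation, a say + ffmpeg pipeline of the kind described above could be assembled as follows. This is a sketch under assumptions, not the contents of tts.mjs: `buildTtsCommands` and its parameter names are hypothetical, and the `libopus` encoder is assumed to be present in the local ffmpeg build.

```javascript
// Hypothetical helper: build argv arrays for a two-step say + ffmpeg pipeline.
// Returning arrays (rather than one shell string) avoids shell-quoting issues.
function buildTtsCommands({ text, voice, aiffPath, oggPath }) {
  const say = ["say"];
  if (voice) say.push("-v", voice); // omit -v to let say use the system voice
  say.push("-o", aiffPath, text);   // -o writes audio to a file instead of playing it

  // Only convert when an ogg target is given (i.e. ffmpeg is available).
  const ffmpeg = oggPath
    ? ["ffmpeg", "-y", "-i", aiffPath, "-c:a", "libopus", oggPath]
    : null;

  return { say, ffmpeg };
}
```

Each array can be passed to `execFile` from `node:child_process` as `execFile(cmd[0], cmd.slice(1))`. When `ffmpeg` is `null`, the aiff file is the final output.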

Sending as voice note

After generating the audio file, send it using the message tool:

message action=send media=<path_from_tts.mjs> asVoice=true

Voice Management

List available voices, check readiness, or find the best voice for a language:

node {baseDir}/scripts/voices.mjs list [locale]     # List voices, optionally filter by locale
node {baseDir}/scripts/voices.mjs check "<name>"    # Check if a specific voice is downloaded and ready
node {baseDir}/scripts/voices.mjs best <locale>     # Get the highest quality voice for a locale

Quality levels

  • 1 = compact (low quality, always available)
  • 2 = enhanced (mid quality, may need download)
  • 3 = premium (highest quality, needs download from System Settings)
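
To make the quality levels concrete, here is a minimal sketch of choosing the highest-quality voice for a locale. The record shape (`name`, `locale`, `quality`) and the function `pickBestVoice` are assumptions for illustration; the actual output format of voices.mjs is not documented here.

```javascript
// Quality levels as listed above.
const QUALITY_LABELS = { 1: "compact", 2: "enhanced", 3: "premium" };

// Hypothetical helper: pick the highest-quality voice for a locale.
// `voices` is assumed to be an array of { name, locale, quality } records.
function pickBestVoice(voices, locale) {
  const candidates = voices.filter((v) => v.locale === locale);
  if (candidates.length === 0) return null; // no voice installed for this locale

  // Keep the first voice seen at the highest quality level.
  const best = candidates.reduce((a, b) => (b.quality > a.quality ? b : a));
  return { ...best, label: QUALITY_LABELS[best.quality] };
}
```

With a compact and a premium voice installed for the same locale, the premium one is returned; with no match, the result is `null` and the caller can fall back to automatic selection.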

If a voice is not available

Tell the user: "Voice X is not downloaded. Go to System Settings → Accessibility → Spoken Content → System Voice → Manage Voices to download it."

Notes

  • The say command silently falls back to a default voice if the requested voice is not available (exit code 0, no error). Always use voices.mjs check before calling tts.mjs with a specific voice name.
  • Premium voices (e.g. Yue (Premium), Ava (Premium)) sound significantly better but must be manually downloaded by the user.
  • Siri voices are not accessible via the speech synthesis API.
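
Because say falls back silently, a caller may want an explicit guard before passing a voice name to tts.mjs. The sketch below assumes the set of ready voices has already been gathered (for example from voices.mjs check); `resolveVoice` and its return shape are hypothetical.

```javascript
// Hypothetical guard: only pass a voice name on when it is known to be ready.
// `readyVoices` is assumed to be an array of downloaded voice names.
function resolveVoice(requested, readyVoices) {
  if (!requested) {
    return { ok: true, voice: undefined }; // no name given: let tts.mjs auto-select
  }
  if (readyVoices.includes(requested)) {
    return { ok: true, voice: requested };
  }
  // Surface the problem instead of letting say fall back without an error.
  return {
    ok: false,
    message:
      `Voice ${requested} is not downloaded. Go to System Settings → ` +
      "Accessibility → Spoken Content → System Voice → Manage Voices to download it.",
  };
}
```

When `ok` is `false`, the message can be shown to the user as-is rather than generating audio in an unintended voice.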
Usage Guidance
This skill appears coherent and local-only, but review these before installing:
  1. Install yap and ffmpeg from trusted package sources (Homebrew/taps shown in README).
  2. The scripts invoke local system commands (yap, say, osascript, ffmpeg). Ensure you are comfortable granting the skill the ability to execute those on your Mac.
  3. Generated audio is saved under ~/.openclaw/media/outbound and the SKILL suggests using the agent 'message' tool to send it. Verify recipients before sending sensitive audio.
  4. The locale detection and voice selection are heuristic (CJK-character ratio, calling voices.mjs). Test with your language.
If you need further assurance, inspect the included scripts (they are readable JS) and confirm no network calls are added in your environment.
Capability Analysis
Type: OpenClaw Skill · Name: macos-local-voice · Version: 1.0.0
The skill is classified as suspicious due to a potential arbitrary file write vulnerability in `scripts/tts.mjs`. The `output_path` argument, if provided by the user or agent, is used directly to construct the output file path. This could allow an attacker to specify a path traversal sequence (e.g., `../../../malicious.aiff`), leading to the creation of audio files in unintended file system locations. While the default output path is safe and there is no evidence of intentional malicious behavior, this represents a significant vulnerability.
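One possible mitigation is to resolve the requested path against an allowed base directory and reject anything that lands outside it. The sketch below is not part of the skill: `resolveOutputPath` is a hypothetical function, and restricting output to a single directory is a stricter policy than the current script applies.

```javascript
import path from "node:path";

// Hypothetical guard: resolve a caller-supplied output path and reject it
// when the result falls outside the allowed base directory.
function resolveOutputPath(baseDir, requested) {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, requested); // absolute inputs replace base here
  const rel = path.relative(base, target);

  const escapes =
    rel === "" ||                       // target is the base directory itself
    rel === ".." ||
    rel.startsWith(".." + path.sep) ||  // climbs above base
    path.isAbsolute(rel);               // different root (e.g. another drive)

  if (escapes) {
    throw new Error(`output_path must stay inside ${base}: ${requested}`);
  }
  return target;
}
```

Comparing the relative path, rather than checking the raw string for `..`, also covers absolute paths and sequences such as `sub/../../x.aiff`.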
Capability Assessment
Purpose & Capability
Name/description (local macOS STT/TTS) match the actual requirements and behavior: required binaries are yap, say, and osascript (all relevant), code uses Apple Speech.framework via yap and AVFoundation via osascript, and ffmpeg is optional for format conversion.
Instruction Scope
SKILL.md directs the agent to run the included scripts and to use local system settings for downloading voices. The instructions do not ask the agent to read unrelated files, export environment secrets, or contact remote endpoints. It does reference the agent 'message' tool for sending generated audio (expected for delivering voice notes).
Install Mechanism
This is instruction-only with included scripts (no install spec). The README suggests installing yap/ffmpeg via Homebrew — a standard, traceable source. No downloads from arbitrary URLs or archive extraction are present.
Credentials
No environment variables or credentials are requested. The scripts use HOME to write output under ~/.openclaw/media/outbound (reasonable for generated media). There are no requests for unrelated secrets or config paths.
Persistence & Privilege
Skill is not forced always-on and does not modify other skills or system-wide settings. It only creates per-user media files in a reasonable path and requires user action to download premium voices.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install macos-local-voice
  3. After installation, invoke the skill by name or use /macos-local-voice
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release: Node.js rewrite. STT (yap) + TTS (say) + voice detection (JXA/AVFoundation). Fully offline, no API keys.
Metadata
Slug macos-local-voice
Version 1.0.0
License
All-time Installs 12
Active Installs 12
Total Versions 1
Frequently Asked Questions

What is macOS Local Voice?

Local STT and TTS on macOS using native Apple capabilities. Speech-to-text via yap (Apple Speech.framework), text-to-speech via say + ffmpeg. Fully offline, no API keys required. Includes voice quality detection and smart voice selection. It is an AI Agent Skill for Claude Code / OpenClaw, with 1927 downloads so far.

How do I install macOS Local Voice?

Run "/install macos-local-voice" in the OpenClaw or Claude Code chat to install the skill in one step. The skill still needs yap in PATH (and ffmpeg for ogg/opus output), as listed under Requirements.

Is macOS Local Voice free?

Yes, macOS Local Voice is completely free (open-source). You can download, install and use it at no cost.

Which platforms does macOS Local Voice support?

The listing is tagged cross-platform, but the skill itself requires macOS: it relies on the built-in say and osascript commands and on yap, which uses Apple's Speech.framework.

Who created macOS Local Voice?

It is built and maintained by STRRL (@strrl); the current version is v1.0.0.
