Description

Local STT and TTS on macOS using native Apple capabilities. Speech-to-text via yap (Apple Speech.framework), text-to-speech via say + ffmpeg. Fully offline, no API keys required. Includes voice quality detection and smart voice selection.

Usage (SKILL.md)

macOS Local Voice

Name: macOS Local Voice
Author: strrl

Fully local speech-to-text (STT) and text-to-speech (TTS) on macOS. No API keys, no network, no cloud. All processing happens on-device.

Requirements

macOS (Apple Silicon recommended, Intel works too)
yap CLI in PATH — install via brew install finnvoor/tools/yap
ffmpeg in PATH (optional, needed for ogg/opus output) — brew install ffmpeg
say and osascript are macOS built-in

Speech-to-Text (STT)

Transcribe an audio file to text using Apple's on-device speech recognition.

node {baseDir}/scripts/stt.mjs <audio_file> [locale]

audio_file: path to audio (ogg, m4a, mp3, wav, etc.)
locale: optional, e.g. zh_CN, en_US, ja_JP. If omitted, uses system default.
Outputs transcribed text to stdout.

Supported STT locales

Use node {baseDir}/scripts/stt.mjs --locales to list all supported locales.

Key locales: en_US, en_GB, zh_CN, zh_TW, zh_HK, ja_JP, ko_KR, fr_FR, de_DE, es_ES, pt_BR, ru_RU, vi_VN, th_TH.

Language detection tips

If the user's recent messages are in Chinese → use zh_CN
If in English → use en_US
If mixed or unclear → try without locale (system default)
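The tips above can be sketched as a small helper. This is only an illustration: `guessLocale` and both thresholds are hypothetical and are not taken from the skill's scripts. It also counts Japanese kanji as Han characters, so it is only suitable when the choice is between Chinese and English.

```javascript
// Hypothetical helper: guess an STT locale from recent message text.
// The 0.5 and 0.9 thresholds are assumptions, not values from stt.mjs.
function guessLocale(text) {
  // Keep letters and ideographs only, so punctuation and digits do not skew the ratio.
  const letters = [...text].filter((ch) => /\p{L}/u.test(ch));
  if (letters.length === 0) return undefined; // nothing to judge: use system default

  const han = letters.filter((ch) => /\p{Script=Han}/u.test(ch)).length;
  const latin = letters.filter((ch) => /\p{Script=Latin}/u.test(ch)).length;

  if (han / letters.length >= 0.5) return "zh_CN";
  if (latin / letters.length >= 0.9) return "en_US";
  return undefined; // mixed or unclear: call stt.mjs without a locale
}

console.log(guessLocale("你好，今天天气怎么样"));
console.log(guessLocale("Hello, how are you?"));
console.log(guessLocale("Hello 你好"));
```

When the helper returns `undefined`, the locale argument would simply be left off the `stt.mjs` call so that the system default applies.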

Text-to-Speech (TTS)

Convert text to an audio file using macOS native TTS.

node {baseDir}/scripts/tts.mjs "<text>" [voice_name] [output_path]

text: the text to speak
voice_name: optional, e.g. Yue (Premium), Tingting, Ava (Premium). If omitted, auto-selects the best available voice based on text language.
output_path: optional, defaults to a timestamped file in ~/.openclaw/media/outbound/
Outputs the generated audio file path to stdout.
If ffmpeg is available, output is ogg/opus (ideal for messaging platforms). Otherwise aiff.
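For orientation, the say + ffmpeg pipeline described above might look roughly like the sketch below. It is not the contents of `tts.mjs`: `buildTtsCommands`, its option names, and the 32k bitrate are assumptions made for illustration. Only standard `say` flags (`-v`, `-o`) and `ffmpeg` flags (`-y`, `-i`, `-c:a`, `-b:a`) are used.

```javascript
// Hypothetical helper: build argument lists for a say -> ffmpeg pipeline.
// The 32k bitrate is an assumption, not a value read from tts.mjs.
function buildTtsCommands(text, { voice, aiffPath, oggPath, hasFfmpeg }) {
  const sayArgs = [];
  if (voice) sayArgs.push("-v", voice); // without -v, say uses the system voice
  sayArgs.push("-o", aiffPath, text); // -o writes audio to a file instead of playing it

  // Convert to ogg/opus only when ffmpeg is available; otherwise keep the aiff.
  const ffmpegArgs = hasFfmpeg
    ? ["-y", "-i", aiffPath, "-c:a", "libopus", "-b:a", "32k", oggPath]
    : null;

  return { sayArgs, ffmpegArgs, finalPath: hasFfmpeg ? oggPath : aiffPath };
}

const cmds = buildTtsCommands("Hello there", {
  voice: "Ava (Premium)",
  aiffPath: "/tmp/voice.aiff",
  oggPath: "/tmp/voice.ogg",
  hasFfmpeg: true,
});
console.log(cmds.sayArgs, cmds.ffmpegArgs, cmds.finalPath);
```

Each array can be handed to `execFileSync("say", cmds.sayArgs)` or `execFileSync("ffmpeg", cmds.ffmpegArgs)` from `node:child_process`. Passing arguments as an array avoids shell quoting problems with user-supplied text.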

Sending as voice note

After generating the audio file, send it using the message tool:

message action=send media=<path_from_tts.mjs> asVoice=true

Voice Management

List available voices, check readiness, or find the best voice for a language:

node {baseDir}/scripts/voices.mjs list [locale]     # List voices, optionally filter by locale
node {baseDir}/scripts/voices.mjs check "<name>"     # Check if a specific voice is downloaded and ready
node {baseDir}/scripts/voices.mjs best <locale>       # Get the highest quality voice for a locale

Quality levels

1 = compact (low quality, always available)
2 = enhanced (mid quality, may need download)
3 = premium (highest quality, needs download from System Settings)
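To show how the quality levels could drive "best voice" selection, here is a minimal sketch. The `{ name, locale, quality }` record shape, the `bestVoice` function, and the sample data are all hypothetical; the actual output format of `voices.mjs` is not documented here.

```javascript
// Quality levels as listed above.
const QUALITY_LABELS = { 1: "compact", 2: "enhanced", 3: "premium" };

// Hypothetical helper: pick the highest-quality voice for a locale.
// The record shape { name, locale, quality } is an assumption for illustration.
function bestVoice(voices, locale) {
  const candidates = voices.filter((v) => v.locale === locale);
  if (candidates.length === 0) return null; // no voice installed for this locale
  const best = candidates.reduce((a, b) => (b.quality > a.quality ? b : a));
  return { ...best, label: QUALITY_LABELS[best.quality] };
}

// Made-up sample data; real quality values depend on what the user has downloaded.
const sampleVoices = [
  { name: "Tingting", locale: "zh_CN", quality: 1 },
  { name: "Yue (Premium)", locale: "zh_CN", quality: 3 },
  { name: "Ava (Premium)", locale: "en_US", quality: 3 },
];
console.log(bestVoice(sampleVoices, "zh_CN"));
console.log(bestVoice(sampleVoices, "ja_JP"));
```

On a tie the first voice encountered is kept, which makes the result stable for a fixed input order.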

If a voice is not available

Tell the user: "Voice X is not downloaded. Go to System Settings → Accessibility → Spoken Content → System Voice → Manage Voices to download it."

Notes

The say command silently falls back to a default voice if the requested voice is not available (exit code 0, no error). Always use voices.mjs check before calling tts.mjs with a specific voice name.
Premium voices (e.g. Yue (Premium), Ava (Premium)) sound significantly better but must be manually downloaded by the user.
Siri voices are not accessible via the speech synthesis API.

Security recommendations

This skill appears coherent and local-only, but review these points before installing:
1) Install yap and ffmpeg from trusted package sources (Homebrew/taps shown in README).
2) The scripts invoke local system commands (yap, say, osascript, ffmpeg). Ensure you are comfortable granting the skill the ability to execute those on your Mac.
3) Generated audio is saved under ~/.openclaw/media/outbound and the SKILL suggests using the agent 'message' tool to send it. Verify recipients before sending sensitive audio.
4) The locale detection and voice selection are heuristic (CJK-character ratio, calling voices.mjs). Test with your language.
If you need further assurance, inspect the included scripts (they are readable JS) and confirm no network calls are added in your environment.

Functional analysis

Type: OpenClaw Skill
Name: macos-local-voice
Version: 1.0.0

The skill is classified as suspicious due to a potential arbitrary file write vulnerability in `scripts/tts.mjs`. The `output_path` argument, if provided by the user or agent, is used directly to construct the output file path. This could allow an attacker to specify a path traversal sequence (e.g., `../../../malicious.aiff`), leading to the creation of audio files in unintended file system locations. While the default output path is safe and there is no evidence of intentional malicious behavior, this represents a significant vulnerability.
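One possible mitigation for the path traversal issue is to confine `output_path` to the default media directory. The sketch below is an assumption about how that could be done, not a patch taken from the skill: `resolveOutputPath` is a hypothetical name, and the check is lexical only, so it does not follow symlinks.

```javascript
import path from "node:path";

// Hypothetical helper: resolve a caller-supplied output path and reject
// anything that would land outside baseDir. Lexical check only; symlinks
// inside baseDir are not resolved here.
function resolveOutputPath(baseDir, requested) {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, requested); // absolute inputs replace base entirely
  const rel = path.relative(base, target);

  const escapes =
    rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (rel === "" || escapes) {
    throw new Error(`output_path must be a file inside ${base}: ${requested}`);
  }
  return target;
}

const outbound = "/Users/me/.openclaw/media/outbound"; // example location only
console.log(resolveOutputPath(outbound, "notes/hello.ogg"));
try {
  resolveOutputPath(outbound, "../../../malicious.aiff");
} catch (err) {
  console.log("rejected:", err.message);
}
```

Comparing the relative path against `..` plus the separator, rather than a bare `startsWith("..")`, keeps legitimate file names such as `..draft.ogg` usable.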

Capability assessment

✓ Purpose & Capability

Name/description (local macOS STT/TTS) match the actual requirements and behavior: required binaries are yap, say, and osascript (all relevant), code uses Apple Speech.framework via yap and AVFoundation via osascript, and ffmpeg is optional for format conversion.

✓ Instruction Scope

SKILL.md directs the agent to run the included scripts and to use local system settings for downloading voices. The instructions do not ask the agent to read unrelated files, export environment secrets, or contact remote endpoints. It does reference the agent 'message' tool for sending generated audio (expected for delivering voice notes).

✓ Install Mechanism

This is instruction-only with included scripts (no install spec). The README suggests installing yap/ffmpeg via Homebrew — a standard, traceable source. No downloads from arbitrary URLs or archive extraction are present.

✓ Credentials

No environment variables or credentials are requested. The scripts use HOME to write output under ~/.openclaw/media/outbound (reasonable for generated media). There are no requests for unrelated secrets or config paths.

✓ Persistence & Privilege

Skill is not forced always-on and does not modify other skills or system-wide settings. It only creates per-user media files in a reasonable path and requires user action to download premium voices.

Version history

v1.0.0

Initial release: Node.js rewrite. STT (yap) + TTS (say) + voice detection (JXA/AVFoundation). Fully offline, no API keys.

Metadata

Slug macos-local-voice

Version 1.0.0

License —

Total installs 12

Current installs 12

Number of versions 1

FAQ