← 返回 Skills 市场
strrl

macOS Local Voice

作者 STRRL · GitHub ↗ · v1.0.0
cross-platform ⚠ suspicious
1927
总下载
0
收藏
12
当前安装
1
版本数
在 OpenClaw 中安装
/install macos-local-voice
功能描述
Local STT and TTS on macOS using native Apple capabilities. Speech-to-text via yap (Apple Speech.framework), text-to-speech via say + ffmpeg. Fully offline, no API keys required. Includes voice quality detection and smart voice selection.
使用说明 (SKILL.md)

macOS Local Voice

Fully local speech-to-text (STT) and text-to-speech (TTS) on macOS. No API keys, no network, no cloud. All processing happens on-device.

Requirements

  • macOS (Apple Silicon recommended, Intel works too)
  • yap CLI in PATH — install via brew install finnvoor/tools/yap
  • ffmpeg in PATH (optional, needed for ogg/opus output) — brew install ffmpeg
  • say and osascript are macOS built-in

Speech-to-Text (STT)

Transcribe an audio file to text using Apple's on-device speech recognition.

node {baseDir}/scripts/stt.mjs \x3Caudio_file> [locale]
  • audio_file: path to audio (ogg, m4a, mp3, wav, etc.)
  • locale: optional, e.g. zh_CN, en_US, ja_JP. If omitted, uses system default.
  • Outputs transcribed text to stdout.

Supported STT locales

Use node {baseDir}/scripts/stt.mjs --locales to list all supported locales.

Key locales: en_US, en_GB, zh_CN, zh_TW, zh_HK, ja_JP, ko_KR, fr_FR, de_DE, es_ES, pt_BR, ru_RU, vi_VN, th_TH.

Language detection tips

  • If the user's recent messages are in Chinese → use zh_CN
  • If in English → use en_US
  • If mixed or unclear → try without locale (system default)

Text-to-Speech (TTS)

Convert text to an audio file using macOS native TTS.

node {baseDir}/scripts/tts.mjs "\x3Ctext>" [voice_name] [output_path]
  • text: the text to speak
  • voice_name: optional, e.g. Yue (Premium), Tingting, Ava (Premium). If omitted, auto-selects the best available voice based on text language.
  • output_path: optional, defaults to a timestamped file in ~/.openclaw/media/outbound/
  • Outputs the generated audio file path to stdout.
  • If ffmpeg is available, output is ogg/opus (ideal for messaging platforms). Otherwise aiff.

Sending as voice note

After generating the audio file, send it using the message tool:

message action=send media=\x3Cpath_from_tts.sh> asVoice=true

Voice Management

List available voices, check readiness, or find the best voice for a language:

node {baseDir}/scripts/voices.mjs list [locale]     # List voices, optionally filter by locale
node {baseDir}/scripts/voices.mjs check "\x3Cname>"     # Check if a specific voice is downloaded and ready
node {baseDir}/scripts/voices.mjs best \x3Clocale>       # Get the highest quality voice for a locale

Quality levels

  • 1 = compact (low quality, always available)
  • 2 = enhanced (mid quality, may need download)
  • 3 = premium (highest quality, needs download from System Settings)

If a voice is not available

Tell the user: "Voice X is not downloaded. Go to System Settings → Accessibility → Spoken Content → System Voice → Manage Voices to download it."

Notes

  • The say command silently falls back to a default voice if the requested voice is not available (exit code 0, no error). Always use voices.mjs check before calling tts.mjs with a specific voice name.
  • Premium voices (e.g. Yue (Premium), Ava (Premium)) sound significantly better but must be manually downloaded by the user.
  • Siri voices are not accessible via the speech synthesis API.
安全使用建议
This skill appears coherent and local-only, but review these before installing: 1) Install yap and ffmpeg from trusted package sources (Homebrew/taps shown in README). 2) The scripts invoke local system commands (yap, say, osascript, ffmpeg) — ensure you are comfortable granting the skill the ability to execute those on your Mac. 3) Generated audio is saved under ~/.openclaw/media/outbound and the SKILL suggests using the agent 'message' tool to send it — verify recipients before sending sensitive audio. 4) The locale detection and voice selection are heuristic (CJK-character ratio, calling voices.mjs) — test with your language. If you need further assurance, inspect the included scripts (they are readable JS) and confirm no network calls are added in your environment.
功能分析
Type: OpenClaw Skill Name: macos-local-voice Version: 1.0.0 The skill is classified as suspicious due to a potential arbitrary file write vulnerability in `scripts/tts.mjs`. The `output_path` argument, if provided by the user or agent, is used directly to construct the output file path. This could allow an attacker to specify a path traversal sequence (e.g., `../../../malicious.aiff`), leading to the creation of audio files in unintended file system locations. While the default output path is safe and there is no evidence of intentional malicious behavior, this represents a significant vulnerability.
能力评估
Purpose & Capability
Name/description (local macOS STT/TTS) match the actual requirements and behavior: required binaries are yap, say, and osascript (all relevant), code uses Apple Speech.framework via yap and AVFoundation via osascript, and ffmpeg is optional for format conversion.
Instruction Scope
SKILL.md directs the agent to run the included scripts and to use local system settings for downloading voices. The instructions do not ask the agent to read unrelated files, export environment secrets, or contact remote endpoints. It does reference the agent 'message' tool for sending generated audio (expected for delivering voice notes).
Install Mechanism
This is instruction-only with included scripts (no install spec). The README suggests installing yap/ffmpeg via Homebrew — a standard, traceable source. No downloads from arbitrary URLs or archive extraction are present.
Credentials
No environment variables or credentials are requested. The scripts use HOME to write output under ~/.openclaw/media/outbound (reasonable for generated media). There are no requests for unrelated secrets or config paths.
Persistence & Privilege
Skill is not forced always-on and does not modify other skills or system-wide settings. It only creates per-user media files in a reasonable path and requires user action to download premium voices.
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install macos-local-voice
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /macos-local-voice 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v1.0.0
Initial release: Node.js rewrite. STT (yap) + TTS (say) + voice detection (JXA/AVFoundation). Fully offline, no API keys.
元数据
Slug macos-local-voice
版本 1.0.0
许可证
累计安装 12
当前安装数 12
历史版本数 1
常见问题

macOS Local Voice 是什么?

Local STT and TTS on macOS using native Apple capabilities. Speech-to-text via yap (Apple Speech.framework), text-to-speech via say + ffmpeg. Fully offline, no API keys required. Includes voice quality detection and smart voice selection. 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 1927 次。

如何安装 macOS Local Voice?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install macos-local-voice」即可一键安装,无需额外配置。

macOS Local Voice 是免费的吗?

是的,macOS Local Voice 完全免费(开源免费),可自由下载、安装和使用。

macOS Local Voice 支持哪些平台?

macOS Local Voice 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。

谁开发了 macOS Local Voice?

由 STRRL(@strrl)开发并维护,当前版本 v1.0.0。

💬 留言讨论