
Local STT (Nvidia Parakeet + Whisper Support)

Author: araa47 · GitHub · v1.0.0
Tags: cross-platform · ⚠ suspicious
Total downloads: 2704
Favorites: 1
Current installs: 19
Versions: 1
Install in OpenClaw
/install local-stt
Description
Local STT with selectable backends - Parakeet (best accuracy) or Whisper (fastest, multilingual).
Usage instructions (SKILL.md)

Local STT (Parakeet / Whisper)

Unified local speech-to-text using ONNX Runtime with int8 quantization. Choose your backend:

  • Parakeet (default): Best accuracy for English, correctly captures names and filler words
  • Whisper: Fastest inference, supports 99 languages

Usage

# Default: Parakeet v2 (best English accuracy)
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg

# Explicit backend selection
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg -b whisper
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg -b parakeet -m v3

# Quiet mode (suppress progress)
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg --quiet
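
The same invocations can be driven from Python. The following is a minimal sketch, not part of the skill: `build_command` and `transcribe` are hypothetical helper names, and it assumes the script prints the transcript to stdout, which the listing does not state explicitly.

```python
import subprocess
from pathlib import Path

# Script path taken from the usage examples above.
SCRIPT = Path.home() / ".openclaw/skills/local-stt/scripts/local-stt.py"


def build_command(audio, backend=None, model=None, quiet=False):
    """Assemble the argument list using only the documented flags."""
    cmd = [str(SCRIPT), str(audio)]
    if backend:
        cmd += ["-b", backend]
    if model:
        cmd += ["-m", model]
    if quiet:
        cmd.append("--quiet")
    return cmd


def transcribe(audio, **kwargs):
    """Run the script and return stdout (assumed to carry the transcript)."""
    result = subprocess.run(
        build_command(audio, **kwargs),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

For example, `transcribe("audio.ogg", backend="whisper", quiet=True)` would run the second usage line above with progress output suppressed.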

Options

  • -b/--backend: parakeet (default), whisper
  • -m/--model: Model variant (see below)
  • --no-int8: Disable int8 quantization
  • -q/--quiet: Suppress progress
  • --room-id: Matrix room ID for direct message
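
To make the option set concrete, here is a sketch of an `argparse` parser that accepts the same flags. It is an illustration only and not the script's actual source; `make_parser`, `resolve_model`, and `DEFAULT_MODELS` are hypothetical names, with the per-backend defaults taken from the Models section.

```python
import argparse

# Default model per backend, as listed in the Models section.
DEFAULT_MODELS = {"parakeet": "v2", "whisper": "base"}


def make_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="local-stt.py")
    parser.add_argument("audio", help="path to the audio file")
    parser.add_argument("-b", "--backend", choices=["parakeet", "whisper"],
                        default="parakeet")
    parser.add_argument("-m", "--model", default=None,
                        help="model variant; default depends on the backend")
    parser.add_argument("--no-int8", action="store_true",
                        help="disable int8 quantization")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="suppress progress output")
    parser.add_argument("--room-id", default=None,
                        help="Matrix room ID for direct message")
    return parser


def resolve_model(backend, model):
    """Fall back to the backend's documented default when -m is omitted."""
    return model or DEFAULT_MODELS[backend]


args = make_parser().parse_args(["audio.ogg", "-b", "whisper"])
print(args.backend, resolve_model(args.backend, args.model))  # whisper base
```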

Models

Parakeet (default backend)

| Model | Description |
| --- | --- |
| v2 (default) | English only, best accuracy |
| v3 | Multilingual |

Whisper

| Model | Description |
| --- | --- |
| tiny | Fastest, lower accuracy |
| base (default) | Good balance |
| small | Better accuracy |
| large-v3-turbo | Best quality, slower |

Benchmark (24s audio)

| Backend/Model | Time | RTF | Notes |
| --- | --- | --- | --- |
| Whisper Base int8 | 0.43s | 0.018x | Fastest |
| Parakeet v2 int8 | 0.60s | 0.025x | Best accuracy |
| Parakeet v3 int8 | 0.63s | 0.026x | Multilingual |
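
The RTF column is consistent with the usual definition of real-time factor, processing time divided by audio duration. A short check, assuming that definition and the 24-second clip length from the heading:

```python
AUDIO_SECONDS = 24.0  # clip length stated in the benchmark heading

# Processing times copied from the benchmark rows.
timings = {
    "Whisper Base int8": 0.43,
    "Parakeet v2 int8": 0.60,
    "Parakeet v3 int8": 0.63,
}


def real_time_factor(processing_seconds, audio_seconds):
    """RTF below 1.0 means transcription runs faster than real time."""
    return processing_seconds / audio_seconds


for name, seconds in timings.items():
    rtf = real_time_factor(seconds, AUDIO_SECONDS)
    print(f"{name}: RTF = {rtf:.3f}x")
```

Rounded to three decimals, the results match the reported 0.018x, 0.025x, and 0.026x.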

openclaw.json

{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          {
            "type": "cli",
            "command": "~/.openclaw/skills/local-stt/scripts/local-stt.py",
            "args": ["--quiet", "{{MediaPath}}"],
            "timeoutSeconds": 30
          }
        ]
      }
    }
  }
}
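
How OpenClaw expands this entry is not described in the listing. The sketch below shows one plausible reading, assuming `{{MediaPath}}` is replaced with the path of the incoming audio file and `~` is expanded to the home directory; `resolve_cli` is a hypothetical helper, not an OpenClaw API.

```python
import json
import os

# The single "cli" model entry from the configuration above.
MODEL_ENTRY = json.loads("""
{
  "type": "cli",
  "command": "~/.openclaw/skills/local-stt/scripts/local-stt.py",
  "args": ["--quiet", "{{MediaPath}}"],
  "timeoutSeconds": 30
}
""")


def resolve_cli(entry, media_path):
    """Return the argv list this entry would produce for one audio file.

    Assumption: the placeholder is substituted textually in each argument.
    """
    command = os.path.expanduser(entry["command"])
    args = [arg.replace("{{MediaPath}}", media_path) for arg in entry["args"]]
    return [command, *args]


print(resolve_cli(MODEL_ENTRY, "/tmp/voice.ogg"))
```

Under that reading, each audio attachment triggers the quiet-mode invocation shown under Usage, bounded by the 30-second timeout.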
Security recommendations
This skill appears to be a legitimate local STT tool, but you should be cautious before installing or using it as-is:

  • The script will automatically load ~/.openclaw/.env and ~/.env and may pick up sensitive environment variables. Review the contents of those files first or move secrets elsewhere.
  • If you use --room-id (Matrix integration), the script will look for MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN and will send the transcript to the specified homeserver; provide a minimally-privileged token or avoid the feature if you don't trust the destination.
  • The tool uses onnx_asr/huggingface components to load models at runtime; expect network downloads of model weights (possibly large) from external hosts. If you require offline-only operation, ensure required models are pre-provisioned and verify the code's model-loading behavior.
  • The script writes a local log (/tmp/stt_matrix.log) containing attempt metadata (URLs and HTTP status codes). Inspect this file for unexpected behavior.

Recommended actions: ask the skill author to update registry metadata to declare required env vars (MATRIX_HOMESERVER, MATRIX_ACCESS_TOKEN) and to explicitly document network/model downloads; or run the skill in an isolated environment (container or VM) with only the minimal credentials you are willing to expose.
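
One way to act on the first recommendation is to list which variable names the two env files define before running the skill. This is a minimal sketch that assumes simple `KEY=value` lines (optionally prefixed with `export`); `env_keys` is a hypothetical helper, and it deliberately reports names only, never values.

```python
from pathlib import Path

# The two files the review says the script loads automatically.
ENV_FILES = [Path.home() / ".openclaw" / ".env", Path.home() / ".env"]


def env_keys(path):
    """Return the variable names defined in a dotenv-style file."""
    keys = []
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        if name.startswith("export "):
            name = name[len("export "):].strip()
        keys.append(name)
    return keys


if __name__ == "__main__":
    for path in ENV_FILES:
        if path.is_file():
            names = ", ".join(env_keys(path)) or "(no variables)"
            print(f"{path}: {names}")
        else:
            print(f"{path}: not present")
```

Any name in the output other than MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN is a secret the skill does not need and could be moved elsewhere.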
Functional analysis
Type: OpenClaw Skill
Name: local-stt
Version: 1.0.0

The skill is designed for local speech-to-text and includes an optional feature to send transcriptions to a Matrix room. This involves reading `MATRIX_HOMESERVER` and `MATRIX_ACCESS_TOKEN` from environment variables (potentially from `~/.openclaw/.env` or `~/.env`) and making an outbound network request to a Matrix homeserver. This behavior, including the use of `ffmpeg` for audio conversion, is explicitly documented in `SKILL.md` and the `scripts/local-stt.py` docstring, and is aligned with the skill's stated purpose. There is no evidence of intentional harmful behavior, such as exfiltrating unrelated sensitive data, establishing persistence, or malicious prompt injection.
Capability assessment
Purpose & Capability
The code and SKILL.md align with a local STT tool (ffmpeg conversion, ONNX-based Parakeet/Whisper backends). The ability to post transcriptions to a Matrix room matches the documented --room-id option. However, the registry metadata listed no required environment variables while the script clearly expects MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN when the Matrix feature is used; that mismatch is noteworthy.
Instruction Scope
SKILL.md documents the --room-id option but does not mention that the runtime will: (1) attempt to load environment files from ~/.openclaw/.env and ~/.env, (2) read MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN from the environment, (3) write logs to /tmp/stt_matrix.log, and (4) load models via onnx_asr which typically pulls model files from network sources (e.g., huggingface). Reading a user's ~/.env is scope-creep because it can surface unrelated secrets; automatic model downloads are network activity not called out in metadata.
Install Mechanism
There is no install spec (instruction-only), which minimizes installer risk. The script includes a commented dependency list and a nonstandard shebang ('uv run --script') indicating runtime packages will be required; this implies runtime package installation/network activity but no explicit installer URL or archive is used.
Credentials
The skill requests no environment variables in registry metadata, yet the script loads ~/.openclaw/.env and ~/.env and reads MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN if present. Automatically loading a user's .env and using tokens is disproportionate unless clearly documented; it increases the chance of accidental use of unrelated secrets. The Matrix access token, if present, will be used to transmit transcriptions to the specified homeserver.
Persistence & Privilege
The skill is not always-enabled and does not request elevated platform privileges. It writes a local log file (/tmp/stt_matrix.log) and temporarily writes a converted WAV file before deleting it, which is reasonable for this CLI. It does not modify other skills or agent-wide configuration.
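
The convert-then-delete pattern described above can be sketched as follows. This is an illustration and not the script's code: `ffmpeg_command` and `with_converted_wav` are hypothetical names, and the 16 kHz mono target is an assumption (a common input format for ASR models) that the listing does not confirm.

```python
import os
import subprocess
import tempfile


def ffmpeg_command(src, dst, sample_rate=16000):
    """Build an ffmpeg call that converts src to a mono WAV at sample_rate.

    -y overwrites dst, -ac 1 selects one channel, -ar sets the sample rate.
    """
    return ["ffmpeg", "-y", "-loglevel", "error", "-i", str(src),
            "-ac", "1", "-ar", str(sample_rate), str(dst)]


def with_converted_wav(src, consume):
    """Convert src to a temporary WAV, pass its path to consume, then delete it."""
    fd, wav_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # ffmpeg reopens the file by path
    try:
        subprocess.run(ffmpeg_command(src, wav_path), check=True)
        return consume(wav_path)
    finally:
        # Remove the temporary file even if conversion or consume fails.
        if os.path.exists(wav_path):
            os.remove(wav_path)
```

Placing the cleanup in `finally` keeps converted audio from accumulating in the temp directory when transcription raises an error.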
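How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install local-stt
  3. Once installation completes, call the skill by name or trigger it with /local-stt.
  4. Provide the inputs described in the skill's parameter documentation to get structured output.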
Version history
v1.0.0
  • Initial release of unified local speech-to-text with ONNX Runtime and int8 quantization.
  • Supports selectable backends: Parakeet (default, best English accuracy) and Whisper (fastest, multilingual).
  • Easily switch backends and models via command-line options.
  • Includes benchmarking data for model speed and accuracy.
  • Requires ffmpeg for operation.
Metadata
Slug: local-stt
Version: 1.0.0
License: not listed
Total installs: 20
Current installs: 19
Versions: 1
FAQ

What is Local STT (Nvidia Parakeet + Whisper Support)?

Local STT with selectable backends - Parakeet (best accuracy) or Whisper (fastest, multilingual). It is an AI agent skill plugin for Claude Code / OpenClaw, with 2704 total downloads to date.

How do I install Local STT (Nvidia Parakeet + Whisper Support)?

Run the command "/install local-stt" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is required.

Is Local STT (Nvidia Parakeet + Whisper Support) free?

Yes. Local STT (Nvidia Parakeet + Whisper Support) is completely free (free and open source) to download, install, and use.

Which platforms does Local STT (Nvidia Parakeet + Whisper Support) support?

Local STT (Nvidia Parakeet + Whisper Support) is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Local STT (Nvidia Parakeet + Whisper Support)?

It is developed and maintained by araa47 (@araa47); the current version is v1.0.0.
