
Local STT (Nvidia Parakeet + Whisper Support)

Name: Local STT (Nvidia Parakeet + Whisper Support)
Author: araa47

Version: v1.0.0

cross-platform ⚠ suspicious

Total downloads: 2704

Install in OpenClaw

/install local-stt

Description

Local STT with selectable backends - Parakeet (best accuracy) or Whisper (fastest, multilingual).

Usage instructions (SKILL.md)

Local STT (Parakeet / Whisper)

Unified local speech-to-text using ONNX Runtime with int8 quantization. Choose your backend:

Parakeet (default): Best accuracy for English, correctly captures names and filler words
Whisper: Fastest inference, supports 99 languages

Usage

# Default: Parakeet v2 (best English accuracy)
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg

# Explicit backend selection
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg -b whisper
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg -b parakeet -m v3

# Quiet mode (suppress progress)
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg --quiet

Options

-b/--backend: parakeet (default), whisper
-m/--model: Model variant (see below)
--no-int8: Disable int8 quantization
-q/--quiet: Suppress progress
--room-id: Matrix room ID; when set, the transcript is sent to that room (uses MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN from the environment)
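
The options above can be mirrored in a small argparse sketch. This is not the author's parser from scripts/local-stt.py. The names MODELS, DEFAULT_MODEL, build_parser and parse_cli are hypothetical, and checking the model against the backend is an assumption about sensible behavior rather than something the listing documents.

```python
import argparse

# Model variants per backend, copied from the Models tables in this listing.
MODELS = {
    "parakeet": ["v2", "v3"],
    "whisper": ["tiny", "base", "small", "large-v3-turbo"],
}
# Documented defaults: Parakeet v2 and Whisper base.
DEFAULT_MODEL = {"parakeet": "v2", "whisper": "base"}


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that accepts the documented flags."""
    parser = argparse.ArgumentParser(prog="local-stt.py")
    parser.add_argument("audio", help="path to the audio file")
    parser.add_argument("-b", "--backend", choices=sorted(MODELS), default="parakeet")
    parser.add_argument("-m", "--model", default=None, help="model variant")
    parser.add_argument("--no-int8", action="store_true", help="disable int8 quantization")
    parser.add_argument("-q", "--quiet", action="store_true", help="suppress progress")
    parser.add_argument("--room-id", default=None, help="Matrix room ID")
    return parser


def parse_cli(argv):
    """Parse argv and fill in the per-backend default model."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.model is None:
        args.model = DEFAULT_MODEL[args.backend]
    elif args.model not in MODELS[args.backend]:
        # Assumption: a model that does not belong to the backend is rejected.
        parser.error(f"model {args.model!r} is not valid for backend {args.backend!r}")
    return args
```

With this sketch, parse_cli(["audio.ogg", "-b", "whisper"]) selects the base model, and parse_cli(["audio.ogg", "-b", "parakeet", "-m", "v3"]) selects the multilingual Parakeet variant.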

Models

Parakeet (default backend)

Model	Description
v2 (default)	English only, best accuracy
v3	Multilingual

Whisper

Model	Description
tiny	Fastest, lower accuracy
base (default)	Good balance
small	Better accuracy
large-v3-turbo	Best quality, slower

Benchmark (24s audio)

Backend/Model	Time	RTF	Notes
Whisper Base int8	0.43s	0.018x	Fastest
Parakeet v2 int8	0.60s	0.025x	Best accuracy
Parakeet v3 int8	0.63s	0.026x	Multilingual
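
The RTF column is not defined in the listing. Assuming it is the usual real-time factor (processing time divided by audio duration), the figures can be checked with a few lines of Python. The function name real_time_factor is hypothetical, and the timings are the ones from the table above.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


AUDIO_SECONDS = 24.0  # length of the benchmark clip

# Processing times taken from the benchmark table.
TIMINGS = {
    "Whisper Base int8": 0.43,
    "Parakeet v2 int8": 0.60,
    "Parakeet v3 int8": 0.63,
}

for name, seconds in TIMINGS.items():
    rtf = real_time_factor(seconds, AUDIO_SECONDS)
    print(f"{name}: RTF {rtf:.3f}x, roughly {1 / rtf:.0f}x faster than real time")
```

Under that assumption the computed values agree with the RTF column when rounded to three decimals.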

openclaw.json

{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          {
            "type": "cli",
            "command": "~/.openclaw/skills/local-stt/scripts/local-stt.py",
            "args": ["--quiet", "{{MediaPath}}"],
            "timeoutSeconds": 30
          }
        ]
      }
    }
  }
}
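
The listing does not say how OpenClaw turns this entry into a process invocation. As a sketch, assuming {{MediaPath}} is replaced by the path of the incoming audio file and ~ is expanded to the home directory, the resulting command line could be built as follows. The helper name build_argv and the sample path /tmp/voice.ogg are hypothetical.

```python
import json
import os

# The configuration from above, embedded so the example is self-contained.
CONFIG = json.loads("""
{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          {
            "type": "cli",
            "command": "~/.openclaw/skills/local-stt/scripts/local-stt.py",
            "args": ["--quiet", "{{MediaPath}}"],
            "timeoutSeconds": 30
          }
        ]
      }
    }
  }
}
""")


def build_argv(model_entry: dict, media_path: str) -> list[str]:
    """Expand one 'cli' model entry into an argv list (assumed semantics)."""
    command = os.path.expanduser(model_entry["command"])
    # Assumption: the placeholder is a plain string substitution.
    args = [arg.replace("{{MediaPath}}", media_path) for arg in model_entry["args"]]
    return [command, *args]


entry = CONFIG["tools"]["media"]["audio"]["models"][0]
argv = build_argv(entry, "/tmp/voice.ogg")
print(argv, "timeout:", entry["timeoutSeconds"])
```

A caller could pass argv to subprocess.run with timeout=entry["timeoutSeconds"] so that a stalled transcription is cut off after 30 seconds.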

Security recommendations

This skill appears to be a legitimate local STT tool, but you should be cautious before installing or using it as-is:

- The script will automatically load ~/.openclaw/.env and ~/.env and may pick up sensitive environment variables. Review the contents of those files first or move secrets elsewhere.
- If you use --room-id (Matrix integration), the script will look for MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN and will send the transcript to the specified homeserver; provide a minimally-privileged token or avoid the feature if you don't trust the destination.
- The tool uses onnx_asr/huggingface components to load models at runtime; expect network downloads of model weights (possibly large) from external hosts. If you require offline-only operation, ensure required models are pre-provisioned and verify the code's model-loading behavior.
- The script writes a local log (/tmp/stt_matrix.log) containing attempt metadata (URLs and HTTP status codes). Inspect this file for unexpected behavior.

Recommended actions: ask the skill author to update registry metadata to declare required env vars (MATRIX_HOMESERVER, MATRIX_ACCESS_TOKEN) and to explicitly document network/model downloads; or run the skill in an isolated environment (container or VM) with only the minimal credentials you are willing to expose.
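
One way to review the two env files before running the skill is to list the variable names they define without printing any values. The sketch below assumes the common KEY=VALUE dotenv layout with optional "export " prefixes and "#" comments. The function name list_env_keys is hypothetical, and the skill's own loader may parse these files differently.

```python
from pathlib import Path


def list_env_keys(path: str) -> list[str]:
    """Return the variable names defined in a dotenv-style file, never the values."""
    env_file = Path(path).expanduser()
    if not env_file.is_file():
        return []
    keys = []
    for raw_line in env_file.read_text(encoding="utf-8", errors="replace").splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key = line.split("=", 1)[0].strip()
        if key:
            keys.append(key)
    return keys


if __name__ == "__main__":
    # The two files the review says the script loads automatically.
    for candidate in ("~/.openclaw/.env", "~/.env"):
        print(candidate, "->", list_env_keys(candidate))
```

Any name in the output other than MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN is a variable the skill's process would be able to read even though it has no documented use for it.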

Functional analysis

Type: OpenClaw Skill
Name: local-stt
Version: 1.0.0

The skill is designed for local speech-to-text and includes an optional feature to send transcriptions to a Matrix room. This involves reading `MATRIX_HOMESERVER` and `MATRIX_ACCESS_TOKEN` from environment variables (potentially from `~/.openclaw/.env` or `~/.env`) and making an outbound network request to a Matrix homeserver. This behavior, including the use of `ffmpeg` for audio conversion, is explicitly documented in `SKILL.md` and the `scripts/local-stt.py` docstring, and is aligned with the skill's stated purpose. There is no evidence of intentional harmful behavior, such as exfiltrating unrelated sensitive data, establishing persistence, or malicious prompt injection.

Capability assessment

ℹ Purpose & Capability

The code and SKILL.md align with a local STT tool (ffmpeg conversion, ONNX-based Parakeet/Whisper backends). The ability to post transcriptions to a Matrix room matches the documented --room-id option. However, the registry metadata listed no required environment variables while the script clearly expects MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN when the Matrix feature is used; that mismatch is noteworthy.

⚠ Instruction Scope

SKILL.md documents the --room-id option but does not mention that the runtime will: (1) attempt to load environment files from ~/.openclaw/.env and ~/.env, (2) read MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN from the environment, (3) write logs to /tmp/stt_matrix.log, and (4) load models via onnx_asr which typically pulls model files from network sources (e.g., huggingface). Reading a user's ~/.env is scope-creep because it can surface unrelated secrets; automatic model downloads are network activity not called out in metadata.

✓ Install Mechanism

There is no install spec (instruction-only), which minimizes installer risk. The script includes a commented dependency list and a nonstandard shebang ('uv run --script') indicating runtime packages will be required; this implies runtime package installation/network activity but no explicit installer URL or archive is used.

⚠ Credentials

The skill requests no environment variables in registry metadata, yet the script loads ~/.openclaw/.env and ~/.env and reads MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN if present. Automatically loading a user's .env and using tokens is disproportionate unless clearly documented; it increases the chance of accidental use of unrelated secrets. The Matrix access token, if present, will be used to transmit transcriptions to the specified homeserver.

✓ Persistence & Privilege

The skill is not always-enabled and does not request elevated platform privileges. It writes a local log file (/tmp/stt_matrix.log) and temporarily writes a converted WAV file before deleting it, which is reasonable for this CLI. It does not modify other skills or agent-wide configuration.

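How to use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install local-stt
Once installation finishes, invoke the Skill by name or trigger it with /local-stt
Provide the required inputs according to the Skill's parameter documentation to get structured output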

Version history

v1.0.0

- Initial release of unified local speech-to-text with ONNX Runtime and int8 quantization.
- Supports selectable backends: Parakeet (default, best English accuracy) and Whisper (fastest, multilingual).
- Easily switch backends and models via command-line options.
- Includes benchmarking data for model speed and accuracy.
- Requires ffmpeg for operation.

Metadata

Slug local-stt

Version 1.0.0

License not listed

Total installs 20

Current installs 19

Version count 1

FAQ