
Local STT (Nvidia Parakeet + Whisper Support)

by araa47 · GitHub ↗ · v1.0.0

cross-platform ⚠ suspicious

Downloads: 2704

Install in OpenClaw

/install local-stt

Description

Local STT with selectable backends - Parakeet (best accuracy) or Whisper (fastest, multilingual).

README (SKILL.md)

Local STT (Parakeet / Whisper)

Unified local speech-to-text using ONNX Runtime with int8 quantization. Choose your backend:

Parakeet (default): Best accuracy for English, correctly captures names and filler words
Whisper: Fastest inference, supports 99 languages

Usage

# Default: Parakeet v2 (best English accuracy)
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg

# Explicit backend selection
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg -b whisper
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg -b parakeet -m v3

# Quiet mode (suppress progress)
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg --quiet

Options

-b/--backend: parakeet (default), whisper
-m/--model: Model variant (see below)
--no-int8: Disable int8 quantization
-q/--quiet: Suppress progress
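--room-id: Matrix room ID; when set, the transcript is sent to that room (requires MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN in the environment)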
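
The options above can also be assembled programmatically. The following is a minimal sketch, not part of the skill: the helper name `build_stt_command` and its keyword defaults are hypothetical, and it only uses the flags listed under Options.

```python
from pathlib import Path

# Script path as documented in the Usage section.
SCRIPT = Path("~/.openclaw/skills/local-stt/scripts/local-stt.py").expanduser()


def build_stt_command(audio, backend="parakeet", model=None,
                      int8=True, quiet=False, room_id=None):
    """Assemble an argv list using only the documented flags (hypothetical helper)."""
    if backend not in ("parakeet", "whisper"):
        raise ValueError(f"unknown backend: {backend}")
    cmd = [str(SCRIPT), str(audio), "-b", backend]
    if model is not None:
        cmd += ["-m", model]          # e.g. v2/v3 for Parakeet, tiny/base/... for Whisper
    if not int8:
        cmd.append("--no-int8")       # disable int8 quantization
    if quiet:
        cmd.append("--quiet")         # suppress progress output
    if room_id is not None:
        cmd += ["--room-id", room_id]  # sends the transcript to a Matrix room
    return cmd


if __name__ == "__main__":
    print(build_stt_command("audio.ogg", backend="whisper", model="tiny", quiet=True))
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`. Whether the transcript arrives on standard output is an assumption here; the listing does not state it explicitly.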

Models

Parakeet (default backend)

Model	Description
v2 (default)	English only, best accuracy
v3	Multilingual

Whisper

Model	Description
tiny	Fastest, lower accuracy
base (default)	Good balance
small	Better accuracy
large-v3-turbo	Best quality, slower

Benchmark (24s audio)

Backend/Model	Time	RTF	Notes
Whisper Base int8	0.43s	0.018x	Fastest
Parakeet v2 int8	0.60s	0.025x	Best accuracy
Parakeet v3 int8	0.63s	0.026x	Multilingual
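
The RTF column is consistent with the usual definition of real-time factor, processing time divided by audio duration (lower is faster). The listing does not spell out the formula, so treat that definition as an assumption; the sketch below simply recomputes the column from the table's timings.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; lower means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


AUDIO_SECONDS = 24.0  # clip length from the benchmark heading

# Timings copied from the benchmark table.
timings = {
    "Whisper Base int8": 0.43,
    "Parakeet v2 int8": 0.60,
    "Parakeet v3 int8": 0.63,
}

if __name__ == "__main__":
    for name, seconds in timings.items():
        print(f"{name}: RTF {real_time_factor(seconds, AUDIO_SECONDS):.3f}x")
```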

openclaw.json

{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          {
            "type": "cli",
            "command": "~/.openclaw/skills/local-stt/scripts/local-stt.py",
            "args": ["--quiet", "{{MediaPath}}"],
            "timeoutSeconds": 30
          }
        ]
      }
    }
  }
}
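
To make the shape of this entry concrete, here is a rough sketch of how a `cli` model entry could be turned into a command line. How OpenClaw actually expands `{{MediaPath}}` and enforces `timeoutSeconds` is not documented in this listing, so the function `resolve_cli_model` and its substitution logic are assumptions for illustration only.

```python
import json
import os

# The same configuration as shown above, embedded so the sketch is self-contained.
CONFIG = json.loads("""
{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          {
            "type": "cli",
            "command": "~/.openclaw/skills/local-stt/scripts/local-stt.py",
            "args": ["--quiet", "{{MediaPath}}"],
            "timeoutSeconds": 30
          }
        ]
      }
    }
  }
}
""")


def resolve_cli_model(model, media_path):
    """Return (argv, timeout) for a 'cli' model entry (hypothetical expansion)."""
    command = os.path.expanduser(model["command"])  # expand the leading ~
    args = [arg.replace("{{MediaPath}}", media_path) for arg in model["args"]]
    return [command, *args], model.get("timeoutSeconds")


if __name__ == "__main__":
    entry = CONFIG["tools"]["media"]["audio"]["models"][0]
    print(resolve_cli_model(entry, "/tmp/voice.ogg"))
```

Under this reading, the 30-second timeout leaves ample headroom over the sub-second timings in the benchmark, though a first run that downloads model weights may take longer.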

Usage Guidance

This skill appears to be a legitimate local STT tool, but you should be cautious before installing or using it as-is:

- The script will automatically load ~/.openclaw/.env and ~/.env and may pick up sensitive environment variables. Review the contents of those files first or move secrets elsewhere.
- If you use --room-id (Matrix integration), the script will look for MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN and will send the transcript to the specified homeserver; provide a minimally-privileged token or avoid the feature if you don't trust the destination.
- The tool uses onnx_asr/huggingface components to load models at runtime; expect network downloads of model weights (possibly large) from external hosts. If you require offline-only operation, ensure required models are pre-provisioned and verify the code's model-loading behavior.
- The script writes a local log (/tmp/stt_matrix.log) containing attempt metadata (URLs and HTTP status codes). Inspect this file for unexpected behavior.

Recommended actions: ask the skill author to update registry metadata to declare required env vars (MATRIX_HOMESERVER, MATRIX_ACCESS_TOKEN) and to explicitly document network/model downloads; or run the skill in an isolated environment (container or VM) with only the minimal credentials you are willing to expose.
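
One way to act on the advice about reviewing the two .env files is to list the variable names they define without printing any values. The sketch below is an illustration only: the helper names and the list of "sensitive-looking" name fragments are assumptions, and the parsing covers simple `NAME=value` and `export NAME=value` lines rather than every dotenv dialect.

```python
import re
from pathlib import Path

# Name fragments that often indicate secrets; an illustrative, incomplete list.
SENSITIVE_HINTS = ("TOKEN", "SECRET", "KEY", "PASSWORD")

# Matches "NAME=" or "export NAME=" at the start of a line; comment lines do not match.
_ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=")


def env_keys(text):
    """Return the variable names defined in dotenv-style text, never their values."""
    keys = []
    for line in text.splitlines():
        match = _ASSIGNMENT.match(line)
        if match:
            keys.append(match.group(1))
    return keys


def flag_sensitive(keys):
    """Return the subset of names that contain a sensitive-looking fragment."""
    return [k for k in keys if any(hint in k.upper() for hint in SENSITIVE_HINTS)]


if __name__ == "__main__":
    # The two files the review says the script loads.
    for path in (Path.home() / ".openclaw" / ".env", Path.home() / ".env"):
        if path.is_file():
            keys = env_keys(path.read_text(encoding="utf-8", errors="replace"))
            print(path, "defines", keys, "| sensitive-looking:", flag_sensitive(keys))
```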

Capability Analysis

Type: OpenClaw Skill
Name: local-stt
Version: 1.0.0

The skill is designed for local speech-to-text and includes an optional feature to send transcriptions to a Matrix room. This involves reading `MATRIX_HOMESERVER` and `MATRIX_ACCESS_TOKEN` from environment variables (potentially from `~/.openclaw/.env` or `~/.env`) and making an outbound network request to a Matrix homeserver. This behavior, including the use of `ffmpeg` for audio conversion, is explicitly documented in `SKILL.md` and the `scripts/local-stt.py` docstring, and is aligned with the skill's stated purpose. There is no evidence of intentional harmful behavior, such as exfiltrating unrelated sensitive data, establishing persistence, or malicious prompt injection.

Capability Assessment

ℹ Purpose & Capability

The code and SKILL.md align with a local STT tool (ffmpeg conversion, ONNX-based Parakeet/Whisper backends). The ability to post transcriptions to a Matrix room matches the documented --room-id option. However, the registry metadata listed no required environment variables while the script clearly expects MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN when the Matrix feature is used; that mismatch is noteworthy.

⚠ Instruction Scope

SKILL.md documents the --room-id option but does not mention that the runtime will: (1) attempt to load environment files from ~/.openclaw/.env and ~/.env, (2) read MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN from the environment, (3) write logs to /tmp/stt_matrix.log, and (4) load models via onnx_asr, which typically pulls model files from network sources (e.g., Hugging Face). Reading a user's ~/.env is scope creep because it can surface unrelated secrets; automatic model downloads are network activity that the metadata does not call out.

✓ Install Mechanism

There is no install spec (instruction-only), which minimizes installer risk. The script includes a commented dependency list and a nonstandard shebang ('uv run --script') indicating runtime packages will be required; this implies runtime package installation/network activity but no explicit installer URL or archive is used.

⚠ Credentials

The skill requests no environment variables in registry metadata, yet the script loads ~/.openclaw/.env and ~/.env and reads MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN if present. Automatically loading a user's .env and using tokens is disproportionate unless clearly documented; it increases the chance of accidental use of unrelated secrets. The Matrix access token, if present, will be used to transmit transcriptions to the specified homeserver.

✓ Persistence & Privilege

The skill is not always-enabled and does not request elevated platform privileges. It writes a local log file (/tmp/stt_matrix.log) and temporarily writes a converted WAV file before deleting it, which is reasonable for this CLI. It does not modify other skills or agent-wide configuration.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install local-stt
After installation, invoke the skill by name or use /local-stt
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

- Initial release of unified local speech-to-text with ONNX Runtime and int8 quantization.
- Supports selectable backends: Parakeet (default, best English accuracy) and Whisper (fastest, multilingual).
- Easily switch backends and models via command-line options.
- Includes benchmarking data for model speed and accuracy.
- Requires ffmpeg for operation.

Metadata

Slug local-stt

Version 1.0.0

License —

All-time Installs 20

Active Installs 19

Total Versions 1

Frequently Asked Questions

What is Local STT (Nvidia Parakeet + Whisper Support)?

Local STT with selectable backends - Parakeet (best accuracy) or Whisper (fastest, multilingual). It is an AI Agent Skill for Claude Code / OpenClaw, with 2704 downloads so far.

How do I install Local STT (Nvidia Parakeet + Whisper Support)?

Run "/install local-stt" in the OpenClaw or Claude Code chat to install it in one step. Note that the skill requires ffmpeg at runtime.

Is Local STT (Nvidia Parakeet + Whisper Support) free?

Yes, Local STT (Nvidia Parakeet + Whisper Support) is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Local STT (Nvidia Parakeet + Whisper Support) support?

Local STT (Nvidia Parakeet + Whisper Support) is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Local STT (Nvidia Parakeet + Whisper Support)?

It is built and maintained by araa47 (@araa47); the current version is v1.0.0.
