← Back to Skills Marketplace
nikil511

Jetson CUDA Voice Pipeline

linux ⚠ suspicious
579
Downloads
0
Stars
0
Active Installs
2
Versions
Install in OpenClaw
/install jetson-cuda-voice
Description
Fully offline, CUDA-accelerated local voice assistant pipeline for NVIDIA Jetson. Wake word (openWakeWord) → real-time VAD → whisper.cpp GPU STT → LLM → Pipe...
README (SKILL.md)

Jetson CUDA Voice Pipeline

Fully offline, GPU-accelerated local voice assistant for NVIDIA Jetson devices. No cloud for STT or TTS — only the LLM call uses the internet (OpenRouter or any OpenAI-compatible endpoint).

Architecture

ReSpeaker mic (hw:Array,0, S24_3LE, 16kHz)
    ↓ arecord raw stream — never restarted mid-conversation
openWakeWord — "Hey Jarvis" detection (~32ms chunks)
    ↓ wake word triggered → two-tone beep
_measure_ambient() — 480ms median RMS → dynamic VAD thresholds
    ↓
transcribe_stream() — VAD + whisper.cpp CUDA HTTP (~2-4s per utterance)
    ↓
ask_llm() — OpenRouter or local OpenAI-compatible API (~1-2s)
    ↓
Piper TTS — offline neural TTS, hot-loaded at startup → aplay
    ↓
ReSpeaker LEDs: 🔵 blue=listening  🩵 cyan=thinking  ⚫ off=done  🔴 red=error

Total latency: ~5-8 seconds from wake word to first spoken word.

Key Features

  • Zero mic-restart gap — same arecord pipe feeds wake word detection and STT
  • Dynamic ambient calibration — measures room noise floor on every wake word trigger (adapts to fans, AC, time of day)
  • Conversation history — 20-turn rolling context for natural follow-ups
  • Auto language detection — whisper -l auto, works multilingual
  • ReSpeaker LED ring — visual state feedback (silent no-op if device not present)
  • Fully configurable — all paths and thresholds via environment variables

Hardware Requirements

Component Tested Notes
Jetson Xavier NX ARM64, sm_72, 8GB, JetPack 5.1.4
ReSpeaker USB Mic Array v1.0 2886:0007, S24_3LE, 16kHz
Any ALSA speaker tested with Creative MUVO 2c
Other Jetson models change CMAKE_CUDA_ARCHITECTURES

Quick Start

# 1. Install Python deps
pip install openwakeword piper-tts numpy requests pyusb

# 2. Build whisper.cpp with CUDA (see BUILD.md — ~45 min, one-time)
#    Then place binary at ~/.local/bin/whisper-server-gpu

# 3. Download Piper voice model
mkdir -p ~/.local/share/piper/voices && cd ~/.local/share/piper/voices
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json

# 4. Install and start services
export OPENROUTER_API_KEY=your-key-here
bash pipeline/setup.sh
bash pipeline/manage.sh start

# Say "Hey Jarvis" — blue LED = listening

Setup Details

Build whisper.cpp with CUDA

See BUILD.md for full instructions. Critical flag:

cmake .. -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=72 -DCMAKE_BUILD_TYPE=Release
make -j4   # ~45 min — detach with nohup if needed

⚠️ CMAKE_CUDA_ARCHITECTURES=72 (sm_72 = Xavier NX) is critical. Default multi-arch compilation OOMs on 8GB Jetson.

Architecture map:

  • Xavier NX / AGX Xavier → 72
  • Orin → 87
  • TX2 → 62
  • Nano → 53

Piper Voice Models

mkdir -p ~/.local/share/piper/voices && cd "$_"

# English (required)
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json

# Greek (optional — any language from huggingface.co/rhasspy/piper-voices works)
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/el/el_GR/rapunzelina/medium/el_GR-rapunzelina-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/el/el_GR/rapunzelina/medium/el_GR-rapunzelina-medium.onnx.json

Service Install

setup.sh writes and enables the systemd user services automatically:

bash pipeline/setup.sh [/path/to/voice_pipeline.py] [API_KEY]

Or with env var:

OPENROUTER_API_KEY=sk-... bash pipeline/setup.sh

Re-run to update an existing install.

ReSpeaker Mic Gain & USB Autosuspend

# Optimal gain (no clipping, RMS ~180 ambient)
amixer -c 0 set Mic 90

# Prevent USB autosuspend (mic sleeps after 2s idle without this)
sudo tee /etc/udev/rules.d/99-usb-audio-nosuspend.rules \x3C\x3C 'EOF'
ACTION=="add", SUBSYSTEM=="usb", ATTR{idVendor}=="2886", ATTR{idProduct}=="0007", \
  ATTR{power/control}="on", ATTR{power/autosuspend}="-1"
EOF
sudo udevadm control --reload-rules

Management

bash pipeline/manage.sh start     # start both services
bash pipeline/manage.sh stop      # stop both services
bash pipeline/manage.sh restart   # restart both
bash pipeline/manage.sh status    # systemd status
bash pipeline/manage.sh logs      # tail live log
bash pipeline/manage.sh test-mic  # record 4s + play back
bash pipeline/manage.sh test-stt  # record 4s + transcribe
bash pipeline/manage.sh test-tts  # speak a test phrase

Environment Variables

Variable Default Description
OPENROUTER_API_KEY (required) API key for OpenRouter (or any OpenAI-compatible provider)
VOICE_MIC hw:Array,0 ALSA mic device name
VOICE_SPEAKER hw:C2c,0 ALSA speaker device name
VOICE_LLM_URL OpenRouter LLM API endpoint
VOICE_LLM_MODEL anthropic/claude-3.5-haiku Model name
VOICE_WAKE_THRESHOLD 0.5 Wake word confidence (0.0–1.0)
VOICE_SPEECH_RMS 400 Fallback speech RMS threshold
VOICE_SILENCE_RMS 250 Fallback silence RMS threshold
VOICE_UTC_OFFSET 0 Timezone offset hours for LLM context
PIPER_VOICES_DIR ~/.local/share/piper/voices Piper voice models directory
WHISPER_URL http://127.0.0.1:8181/inference whisper-server endpoint
WHISPER_BIN ~/.local/bin/whisper-server-gpu whisper-server binary (used by setup.sh)
WHISPER_MODEL ~/.local/share/whisper/models/ggml-base.bin Whisper model (used by setup.sh)

Troubleshooting

Mic records silence

  • Check gain: amixer -c 0 set Mic 90
  • Use card name not number (hw:Array,0 not hw:0,0) — numbers shift on reboot
  • ReSpeaker requires S24_3LE format, not S16_LE
  • Disable USB autosuspend (see setup above)

Records full 6s timeout, never cuts off

  • Room ambient noise > VOICE_SILENCE_RMS fallback. Dynamic calibration handles this automatically.
  • If still an issue, set VOICE_SILENCE_RMS slightly above your measured ambient floor.

[BEEPING] or (bell dings) in transcript

  • Speaker beep being picked up by mic. The 0.3s drain buffer after beep handles this.
  • Check speaker/mic distance and speaker volume.

Whisper OOM during build

  • Must use -DCMAKE_CUDA_ARCHITECTURES=72 — default multi-arch build exhausts 8GB RAM.
  • Use -j4 not -j6.

LED not lighting up

  • Install pyusb: pip install pyusb
  • Only supported on ReSpeaker USB Mic Array v1.0 (2886:0007)
  • All LED errors are silent — pipeline continues without it.

Wake word triggers constantly (false positives)

  • Lower VOICE_WAKE_THRESHOLD to 0.7 or higher.
  • Ensure no TV/radio playing phrases close to "Hey Jarvis".

File Structure

jetson-cuda-voice/
├── SKILL.md                  ← this file
├── BUILD.md                  ← whisper.cpp CUDA build guide
└── pipeline/
    ├── voice_pipeline.py     ← main pipeline
    ├── led.py                ← ReSpeaker LED control (optional)
    ├── setup.sh              ← one-command service installer
    └── manage.sh             ← start/stop/status/test
Usage Guidance
This skill appears to do what it says (local STT/TTS with a networked LLM). Before installing, consider: (1) Your speech is transcribed locally but the resulting text is sent to whatever LLM endpoint you configure (default openrouter.ai). Only install if you trust that provider or change VOICE_LLM_URL to a local/self-hosted endpoint. (2) setup.sh writes Environment="OPENROUTER_API_KEY=..." into a user systemd unit file (~/.config/systemd/user) — that stores your API key in plain text; consider using a systemd EnvironmentFile with restricted permissions or another secret mechanism instead of embedding the key. (3) The optional udev fix requires sudo (writes /etc/udev/rules.d). (4) Building whisper.cpp on a Jetson is time- and resource-intensive; follow BUILD.md and ensure you have adequate swap/free memory. (5) Inspect the scripts yourself (they're included) before running them. If you want stronger privacy, run a local/air-gapped LLM-compatible server and set VOICE_LLM_URL accordingly or avoid providing an API key.
Capability Analysis
Type: OpenClaw Skill Name: jetson-cuda-voice Version: 1.1.0 The skill is classified as suspicious due to several vulnerabilities and risky operations, though without clear evidence of intentional malice. Key concerns include: 1) Potential for shell injection in `pipeline/manage.sh` and `pipeline/setup.sh` if environment variables or script arguments are manipulated by an attacker (e.g., via prompt injection to the OpenClaw agent). 2) The `OPENROUTER_API_KEY` is stored in plain text within the systemd service file (`~/.config/systemd/user/voice-pipeline.service`), posing an information disclosure risk. 3) The `SKILL.md` and `setup.sh` (as a tip) instruct the user to execute `sudo` commands to modify system-wide udev rules, which is a privileged operation, even if for a stated hardware fix. While the skill performs remote downloads and network calls, these are from legitimate sources (Hugging Face, OpenRouter) and for the stated purpose of the voice assistant.
Capability Assessment
Purpose & Capability
The name/description (Jetson CUDA voice pipeline) match the code and SKILL.md. Required binaries (arecord, aplay, python3) and dependencies (openwakeword, piper-tts, whisper.cpp) are appropriate for the stated functionality. Required env var OPENROUTER_API_KEY is used by the code to call an LLM and is consistent with the stated 'only the LLM uses the internet' claim.
Instruction Scope
Runtime instructions and scripts stick to the stated pipeline. The code captures microphone audio, runs local STT/TTS, and sends transcriptions to the LLM_URL (defaults to openrouter.ai). This is within scope, but it does mean user speech (transcriptions) are transmitted off-device to the configured LLM provider — the SKILL.md does disclose this, but users should be aware of the data flow and privacy implications.
Install Mechanism
No opaque download/install spec in skill registry; build and download steps are explicit in SKILL.md/BUILD.md (git clone github.com/ggerganov/whisper.cpp, wget from huggingface, pip installs). These are standard sources for this workload; no shorteners or personal servers are used. Building whisper.cpp on-device is heavy but expected.
Credentials
Only one required env var (OPENROUTER_API_KEY) is requested and it is justified by the LLM call. However, setup.sh embeds the API key directly into the user systemd unit file (Environment=...), which persists the secret in plain text in ~/.config/systemd/user — a practical security concern to consider (see guidance).
Persistence & Privilege
setup.sh installs and enables user-level systemd services (whisper-server and voice-pipeline) so the pipeline persists for the user session; always:false so it is not force-included. The optional udev rule in instructions requires root to write /etc/udev/rules.d (expected for USB device handling). The service persistence combined with storing the API key in the unit increases the impact of a compromised account or machine.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install jetson-cuda-voice
  3. After installation, invoke the skill by name or use /jetson-cuda-voice
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.1.0
v1.1.0: Add setup.sh one-command installer (embeds systemd services inline). Fix manage.sh hardcoded devices — now uses VOICE_MIC/VOICE_SPEAKER env vars. Remove unused json import. Fix fragile test-tts heredoc. Remove cmake from runtime requires. Clean up SKILL.md: quick start section, fixed file structure, removed missing systemd/ dir reference.
v1.0.0
Initial release — offline wake word + whisper.cpp GPU STT + Piper TTS + ReSpeaker LED feedback + dynamic ambient noise calibration. Tested on Jetson Xavier NX sm_72 JetPack 5.1.4.
Metadata
Slug jetson-cuda-voice
Version 1.1.0
License
All-time Installs 0
Active Installs 0
Total Versions 2
Frequently Asked Questions

What is Jetson CUDA Voice Pipeline?

Fully offline, CUDA-accelerated local voice assistant pipeline for NVIDIA Jetson. Wake word (openWakeWord) → real-time VAD → whisper.cpp GPU STT → LLM → Pipe... It is an AI Agent Skill for Claude Code / OpenClaw, with 579 downloads so far.

How do I install Jetson CUDA Voice Pipeline?

Run "/install jetson-cuda-voice" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Jetson CUDA Voice Pipeline free?

Yes, Jetson CUDA Voice Pipeline is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Jetson CUDA Voice Pipeline support?

Jetson CUDA Voice Pipeline is cross-platform and runs anywhere OpenClaw / Claude Code is available (linux).

Who created Jetson CUDA Voice Pipeline?

It is built and maintained by Manolis Nikiforakis (@nikil511); the current version is v1.1.0.

💬 Comments