Description

Fully offline, CUDA-accelerated local voice assistant pipeline for NVIDIA Jetson. Wake word (openWakeWord) → real-time VAD → whisper.cpp GPU STT → LLM → Pipe...

README (SKILL.md)

Jetson CUDA Voice Pipeline

Name: Jetson CUDA Voice Pipeline
Author: nikil511

Fully offline, GPU-accelerated local voice assistant for NVIDIA Jetson devices. No cloud for STT or TTS — only the LLM call uses the internet (OpenRouter or any OpenAI-compatible endpoint).

Architecture

ReSpeaker mic (hw:Array,0, S24_3LE, 16kHz)
    ↓ arecord raw stream — never restarted mid-conversation
openWakeWord — "Hey Jarvis" detection (~32ms chunks)
    ↓ wake word triggered → two-tone beep
_measure_ambient() — 480ms median RMS → dynamic VAD thresholds
    ↓
transcribe_stream() — VAD + whisper.cpp CUDA HTTP (~2-4s per utterance)
    ↓
ask_llm() — OpenRouter or local OpenAI-compatible API (~1-2s)
    ↓
Piper TTS — offline neural TTS, hot-loaded at startup → aplay
    ↓
ReSpeaker LEDs: 🔵 blue=listening  🩵 cyan=thinking  ⚫ off=done  🔴 red=error

Total latency: ~5-8 seconds from wake word to first spoken word.

Key Features

Zero mic-restart gap — same arecord pipe feeds wake word detection and STT
Dynamic ambient calibration — measures room noise floor on every wake word trigger (adapts to fans, AC, time of day)
Conversation history — 20-turn rolling context for natural follow-ups
Auto language detection — whisper -l auto, works multilingual
ReSpeaker LED ring — visual state feedback (silent no-op if device not present)
Fully configurable — all paths and thresholds via environment variables

Hardware Requirements

Component	Tested	Notes
Jetson Xavier NX	✅	ARM64, sm_72, 8GB, JetPack 5.1.4
ReSpeaker USB Mic Array v1.0	✅	2886:0007, S24_3LE, 16kHz
Any ALSA speaker	✅	tested with Creative MUVO 2c
Other Jetson models	✅	change `CMAKE_CUDA_ARCHITECTURES`

Quick Start

# 1. Install Python deps
pip install openwakeword piper-tts numpy requests pyusb

# 2. Build whisper.cpp with CUDA (see BUILD.md — ~45 min, one-time)
#    Then place binary at ~/.local/bin/whisper-server-gpu

# 3. Download Piper voice model
mkdir -p ~/.local/share/piper/voices && cd ~/.local/share/piper/voices
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json

# 4. Install and start services
export OPENROUTER_API_KEY=your-key-here
bash pipeline/setup.sh
bash pipeline/manage.sh start

# Say "Hey Jarvis" — blue LED = listening

Setup Details

Build whisper.cpp with CUDA

See BUILD.md for full instructions. Critical flag:

cmake .. -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=72 -DCMAKE_BUILD_TYPE=Release
make -j4   # ~45 min — detach with nohup if needed

⚠️ CMAKE_CUDA_ARCHITECTURES=72 (sm_72 = Xavier NX) is critical. Default multi-arch compilation OOMs on 8GB Jetson.

Architecture map:

Xavier NX / AGX Xavier → 72
Orin → 87
TX2 → 62
Nano → 53

Piper Voice Models

mkdir -p ~/.local/share/piper/voices && cd "$_"

# English (required)
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json

# Greek (optional — any language from huggingface.co/rhasspy/piper-voices works)
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/el/el_GR/rapunzelina/medium/el_GR-rapunzelina-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/el/el_GR/rapunzelina/medium/el_GR-rapunzelina-medium.onnx.json

Service Install

setup.sh writes and enables the systemd user services automatically:

bash pipeline/setup.sh [/path/to/voice_pipeline.py] [API_KEY]

Or with env var:

OPENROUTER_API_KEY=sk-... bash pipeline/setup.sh

Re-run to update an existing install.

ReSpeaker Mic Gain & USB Autosuspend

# Optimal gain (no clipping, RMS ~180 ambient)
amixer -c 0 set Mic 90

# Prevent USB autosuspend (mic sleeps after 2s idle without this)
sudo tee /etc/udev/rules.d/99-usb-audio-nosuspend.rules \x3C\x3C 'EOF'
ACTION=="add", SUBSYSTEM=="usb", ATTR{idVendor}=="2886", ATTR{idProduct}=="0007", \
  ATTR{power/control}="on", ATTR{power/autosuspend}="-1"
EOF
sudo udevadm control --reload-rules

Management

bash pipeline/manage.sh start     # start both services
bash pipeline/manage.sh stop      # stop both services
bash pipeline/manage.sh restart   # restart both
bash pipeline/manage.sh status    # systemd status
bash pipeline/manage.sh logs      # tail live log
bash pipeline/manage.sh test-mic  # record 4s + play back
bash pipeline/manage.sh test-stt  # record 4s + transcribe
bash pipeline/manage.sh test-tts  # speak a test phrase

Environment Variables

Variable	Default	Description
`OPENROUTER_API_KEY`	(required)	API key for OpenRouter (or any OpenAI-compatible provider)
`VOICE_MIC`	`hw:Array,0`	ALSA mic device name
`VOICE_SPEAKER`	`hw:C2c,0`	ALSA speaker device name
`VOICE_LLM_URL`	OpenRouter	LLM API endpoint
`VOICE_LLM_MODEL`	`anthropic/claude-3.5-haiku`	Model name
`VOICE_WAKE_THRESHOLD`	`0.5`	Wake word confidence (0.0–1.0)
`VOICE_SPEECH_RMS`	`400`	Fallback speech RMS threshold
`VOICE_SILENCE_RMS`	`250`	Fallback silence RMS threshold
`VOICE_UTC_OFFSET`	`0`	Timezone offset hours for LLM context
`PIPER_VOICES_DIR`	`~/.local/share/piper/voices`	Piper voice models directory
`WHISPER_URL`	`http://127.0.0.1:8181/inference`	whisper-server endpoint
`WHISPER_BIN`	`~/.local/bin/whisper-server-gpu`	whisper-server binary (used by setup.sh)
`WHISPER_MODEL`	`~/.local/share/whisper/models/ggml-base.bin`	Whisper model (used by setup.sh)

Troubleshooting

Mic records silence

Check gain: amixer -c 0 set Mic 90
Use card name not number (hw:Array,0 not hw:0,0) — numbers shift on reboot
ReSpeaker requires S24_3LE format, not S16_LE
Disable USB autosuspend (see setup above)

Records full 6s timeout, never cuts off

Room ambient noise > VOICE_SILENCE_RMS fallback. Dynamic calibration handles this automatically.
If still an issue, set VOICE_SILENCE_RMS slightly above your measured ambient floor.

[BEEPING] or (bell dings) in transcript

Speaker beep being picked up by mic. The 0.3s drain buffer after beep handles this.
Check speaker/mic distance and speaker volume.

Whisper OOM during build

Must use -DCMAKE_CUDA_ARCHITECTURES=72 — default multi-arch build exhausts 8GB RAM.
Use -j4 not -j6.

LED not lighting up

Install pyusb: pip install pyusb
Only supported on ReSpeaker USB Mic Array v1.0 (2886:0007)
All LED errors are silent — pipeline continues without it.

Wake word triggers constantly (false positives)

Lower VOICE_WAKE_THRESHOLD to 0.7 or higher.
Ensure no TV/radio playing phrases close to "Hey Jarvis".

File Structure

jetson-cuda-voice/
├── SKILL.md                  ← this file
├── BUILD.md                  ← whisper.cpp CUDA build guide
└── pipeline/
    ├── voice_pipeline.py     ← main pipeline
    ├── led.py                ← ReSpeaker LED control (optional)
    ├── setup.sh              ← one-command service installer
    └── manage.sh             ← start/stop/status/test

Usage Guidance

This skill appears to do what it says (local STT/TTS with a networked LLM). Before installing, consider: (1) Your speech is transcribed locally but the resulting text is sent to whatever LLM endpoint you configure (default openrouter.ai). Only install if you trust that provider or change VOICE_LLM_URL to a local/self-hosted endpoint. (2) setup.sh writes Environment="OPENROUTER_API_KEY=..." into a user systemd unit file (~/.config/systemd/user) — that stores your API key in plain text; consider using a systemd EnvironmentFile with restricted permissions or another secret mechanism instead of embedding the key. (3) The optional udev fix requires sudo (writes /etc/udev/rules.d). (4) Building whisper.cpp on a Jetson is time- and resource-intensive; follow BUILD.md and ensure you have adequate swap/free memory. (5) Inspect the scripts yourself (they're included) before running them. If you want stronger privacy, run a local/air-gapped LLM-compatible server and set VOICE_LLM_URL accordingly or avoid providing an API key.

Capability Analysis

Type: OpenClaw Skill Name: jetson-cuda-voice Version: 1.1.0 The skill is classified as suspicious due to several vulnerabilities and risky operations, though without clear evidence of intentional malice. Key concerns include: 1) Potential for shell injection in `pipeline/manage.sh` and `pipeline/setup.sh` if environment variables or script arguments are manipulated by an attacker (e.g., via prompt injection to the OpenClaw agent). 2) The `OPENROUTER_API_KEY` is stored in plain text within the systemd service file (`~/.config/systemd/user/voice-pipeline.service`), posing an information disclosure risk. 3) The `SKILL.md` and `setup.sh` (as a tip) instruct the user to execute `sudo` commands to modify system-wide udev rules, which is a privileged operation, even if for a stated hardware fix. While the skill performs remote downloads and network calls, these are from legitimate sources (Hugging Face, OpenRouter) and for the stated purpose of the voice assistant.

Capability Assessment

✓ Purpose & Capability

The name/description (Jetson CUDA voice pipeline) match the code and SKILL.md. Required binaries (arecord, aplay, python3) and dependencies (openwakeword, piper-tts, whisper.cpp) are appropriate for the stated functionality. Required env var OPENROUTER_API_KEY is used by the code to call an LLM and is consistent with the stated 'only the LLM uses the internet' claim.

ℹ Instruction Scope

Runtime instructions and scripts stick to the stated pipeline. The code captures microphone audio, runs local STT/TTS, and sends transcriptions to the LLM_URL (defaults to openrouter.ai). This is within scope, but it does mean user speech (transcriptions) are transmitted off-device to the configured LLM provider — the SKILL.md does disclose this, but users should be aware of the data flow and privacy implications.

✓ Install Mechanism

No opaque download/install spec in skill registry; build and download steps are explicit in SKILL.md/BUILD.md (git clone github.com/ggerganov/whisper.cpp, wget from huggingface, pip installs). These are standard sources for this workload; no shorteners or personal servers are used. Building whisper.cpp on-device is heavy but expected.

ℹ Credentials

Only one required env var (OPENROUTER_API_KEY) is requested and it is justified by the LLM call. However, setup.sh embeds the API key directly into the user systemd unit file (Environment=...), which persists the secret in plain text in ~/.config/systemd/user — a practical security concern to consider (see guidance).

ℹ Persistence & Privilege

setup.sh installs and enables user-level systemd services (whisper-server and voice-pipeline) so the pipeline persists for the user session; always:false so it is not force-included. The optional udev rule in instructions requires root to write /etc/udev/rules.d (expected for USB device handling). The service persistence combined with storing the API key in the unit increases the impact of a compromised account or machine.

Version History

v1.1.0

v1.1.0: Add setup.sh one-command installer (embeds systemd services inline). Fix manage.sh hardcoded devices — now uses VOICE_MIC/VOICE_SPEAKER env vars. Remove unused json import. Fix fragile test-tts heredoc. Remove cmake from runtime requires. Clean up SKILL.md: quick start section, fixed file structure, removed missing systemd/ dir reference.

v1.0.0

Initial release — offline wake word + whisper.cpp GPU STT + Piper TTS + ReSpeaker LED feedback + dynamic ambient noise calibration. Tested on Jetson Xavier NX sm_72 JetPack 5.1.4.

Metadata

Slug jetson-cuda-voice

Version 1.1.0

License —

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is Jetson CUDA Voice Pipeline?

Fully offline, CUDA-accelerated local voice assistant pipeline for NVIDIA Jetson. Wake word (openWakeWord) → real-time VAD → whisper.cpp GPU STT → LLM → Pipe... It is an AI Agent Skill for Claude Code / OpenClaw, with 579 downloads so far.

How do I install Jetson CUDA Voice Pipeline?

Run "/install jetson-cuda-voice" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Jetson CUDA Voice Pipeline free?

Yes, Jetson CUDA Voice Pipeline is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Jetson CUDA Voice Pipeline support?

Jetson CUDA Voice Pipeline is cross-platform and runs anywhere OpenClaw / Claude Code is available (linux).

Who created Jetson CUDA Voice Pipeline?

It is built and maintained by Manolis Nikiforakis (@nikil511); the current version is v1.1.0.

More Skills

Jetson CUDA Voice Pipeline

Jetson CUDA Voice Pipeline

Architecture

Key Features

Hardware Requirements

Quick Start

Setup Details

Build whisper.cpp with CUDA

Piper Voice Models

Service Install

ReSpeaker Mic Gain & USB Autosuspend

Management

Environment Variables

Troubleshooting

File Structure

What is Jetson CUDA Voice Pipeline?

How do I install Jetson CUDA Voice Pipeline?

Is Jetson CUDA Voice Pipeline free?

Which platforms does Jetson CUDA Voice Pipeline support?

Who created Jetson CUDA Voice Pipeline?

💬 Comments