← 返回 Skills 市场
nikil511

Jetson CUDA Voice Pipeline

作者 Manolis Nikiforakis · GitHub ↗ · v1.1.0
linux ⚠ suspicious
579
总下载
0
收藏
0
当前安装
2
版本数
在 OpenClaw 中安装
/install jetson-cuda-voice
功能描述
Fully offline, CUDA-accelerated local voice assistant pipeline for NVIDIA Jetson. Wake word (openWakeWord) → real-time VAD → whisper.cpp GPU STT → LLM → Pipe...
使用说明 (SKILL.md)

Jetson CUDA Voice Pipeline

Fully offline, GPU-accelerated local voice assistant for NVIDIA Jetson devices. No cloud for STT or TTS — only the LLM call uses the internet (OpenRouter or any OpenAI-compatible endpoint).

Architecture

ReSpeaker mic (hw:Array,0, S24_3LE, 16kHz)
    ↓ arecord raw stream — never restarted mid-conversation
openWakeWord — "Hey Jarvis" detection (~32ms chunks)
    ↓ wake word triggered → two-tone beep
_measure_ambient() — 480ms median RMS → dynamic VAD thresholds
    ↓
transcribe_stream() — VAD + whisper.cpp CUDA HTTP (~2-4s per utterance)
    ↓
ask_llm() — OpenRouter or local OpenAI-compatible API (~1-2s)
    ↓
Piper TTS — offline neural TTS, hot-loaded at startup → aplay
    ↓
ReSpeaker LEDs: 🔵 blue=listening  🩵 cyan=thinking  ⚫ off=done  🔴 red=error

Total latency: ~5-8 seconds from wake word to first spoken word.

Key Features

  • Zero mic-restart gap — same arecord pipe feeds wake word detection and STT
  • Dynamic ambient calibration — measures room noise floor on every wake word trigger (adapts to fans, AC, time of day)
  • Conversation history — 20-turn rolling context for natural follow-ups
  • Auto language detection — whisper -l auto, works multilingual
  • ReSpeaker LED ring — visual state feedback (silent no-op if device not present)
  • Fully configurable — all paths and thresholds via environment variables

Hardware Requirements

Component Tested Notes
Jetson Xavier NX ARM64, sm_72, 8GB, JetPack 5.1.4
ReSpeaker USB Mic Array v1.0 2886:0007, S24_3LE, 16kHz
Any ALSA speaker tested with Creative MUVO 2c
Other Jetson models change CMAKE_CUDA_ARCHITECTURES

Quick Start

# 1. Install Python deps
pip install openwakeword piper-tts numpy requests pyusb

# 2. Build whisper.cpp with CUDA (see BUILD.md — ~45 min, one-time)
#    Then place binary at ~/.local/bin/whisper-server-gpu

# 3. Download Piper voice model
mkdir -p ~/.local/share/piper/voices && cd ~/.local/share/piper/voices
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json

# 4. Install and start services
export OPENROUTER_API_KEY=your-key-here
bash pipeline/setup.sh
bash pipeline/manage.sh start

# Say "Hey Jarvis" — blue LED = listening

Setup Details

Build whisper.cpp with CUDA

See BUILD.md for full instructions. Critical flag:

cmake .. -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=72 -DCMAKE_BUILD_TYPE=Release
make -j4   # ~45 min — detach with nohup if needed

⚠️ CMAKE_CUDA_ARCHITECTURES=72 (sm_72 = Xavier NX) is critical. Default multi-arch compilation OOMs on 8GB Jetson.

Architecture map:

  • Xavier NX / AGX Xavier → 72
  • Orin → 87
  • TX2 → 62
  • Nano → 53

Piper Voice Models

mkdir -p ~/.local/share/piper/voices && cd "$_"

# English (required)
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json

# Greek (optional — any language from huggingface.co/rhasspy/piper-voices works)
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/el/el_GR/rapunzelina/medium/el_GR-rapunzelina-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/el/el_GR/rapunzelina/medium/el_GR-rapunzelina-medium.onnx.json

Service Install

setup.sh writes and enables the systemd user services automatically:

bash pipeline/setup.sh [/path/to/voice_pipeline.py] [API_KEY]

Or with env var:

OPENROUTER_API_KEY=sk-... bash pipeline/setup.sh

Re-run to update an existing install.

ReSpeaker Mic Gain & USB Autosuspend

# Optimal gain (no clipping, RMS ~180 ambient)
amixer -c 0 set Mic 90

# Prevent USB autosuspend (mic sleeps after 2s idle without this)
sudo tee /etc/udev/rules.d/99-usb-audio-nosuspend.rules \x3C\x3C 'EOF'
ACTION=="add", SUBSYSTEM=="usb", ATTR{idVendor}=="2886", ATTR{idProduct}=="0007", \
  ATTR{power/control}="on", ATTR{power/autosuspend}="-1"
EOF
sudo udevadm control --reload-rules

Management

bash pipeline/manage.sh start     # start both services
bash pipeline/manage.sh stop      # stop both services
bash pipeline/manage.sh restart   # restart both
bash pipeline/manage.sh status    # systemd status
bash pipeline/manage.sh logs      # tail live log
bash pipeline/manage.sh test-mic  # record 4s + play back
bash pipeline/manage.sh test-stt  # record 4s + transcribe
bash pipeline/manage.sh test-tts  # speak a test phrase

Environment Variables

Variable Default Description
OPENROUTER_API_KEY (required) API key for OpenRouter (or any OpenAI-compatible provider)
VOICE_MIC hw:Array,0 ALSA mic device name
VOICE_SPEAKER hw:C2c,0 ALSA speaker device name
VOICE_LLM_URL OpenRouter LLM API endpoint
VOICE_LLM_MODEL anthropic/claude-3.5-haiku Model name
VOICE_WAKE_THRESHOLD 0.5 Wake word confidence (0.0–1.0)
VOICE_SPEECH_RMS 400 Fallback speech RMS threshold
VOICE_SILENCE_RMS 250 Fallback silence RMS threshold
VOICE_UTC_OFFSET 0 Timezone offset hours for LLM context
PIPER_VOICES_DIR ~/.local/share/piper/voices Piper voice models directory
WHISPER_URL http://127.0.0.1:8181/inference whisper-server endpoint
WHISPER_BIN ~/.local/bin/whisper-server-gpu whisper-server binary (used by setup.sh)
WHISPER_MODEL ~/.local/share/whisper/models/ggml-base.bin Whisper model (used by setup.sh)

Troubleshooting

Mic records silence

  • Check gain: amixer -c 0 set Mic 90
  • Use card name not number (hw:Array,0 not hw:0,0) — numbers shift on reboot
  • ReSpeaker requires S24_3LE format, not S16_LE
  • Disable USB autosuspend (see setup above)

Records full 6s timeout, never cuts off

  • Room ambient noise > VOICE_SILENCE_RMS fallback. Dynamic calibration handles this automatically.
  • If still an issue, set VOICE_SILENCE_RMS slightly above your measured ambient floor.

[BEEPING] or (bell dings) in transcript

  • Speaker beep being picked up by mic. The 0.3s drain buffer after beep handles this.
  • Check speaker/mic distance and speaker volume.

Whisper OOM during build

  • Must use -DCMAKE_CUDA_ARCHITECTURES=72 — default multi-arch build exhausts 8GB RAM.
  • Use -j4 not -j6.

LED not lighting up

  • Install pyusb: pip install pyusb
  • Only supported on ReSpeaker USB Mic Array v1.0 (2886:0007)
  • All LED errors are silent — pipeline continues without it.

Wake word triggers constantly (false positives)

  • Lower VOICE_WAKE_THRESHOLD to 0.7 or higher.
  • Ensure no TV/radio playing phrases close to "Hey Jarvis".

File Structure

jetson-cuda-voice/
├── SKILL.md                  ← this file
├── BUILD.md                  ← whisper.cpp CUDA build guide
└── pipeline/
    ├── voice_pipeline.py     ← main pipeline
    ├── led.py                ← ReSpeaker LED control (optional)
    ├── setup.sh              ← one-command service installer
    └── manage.sh             ← start/stop/status/test
安全使用建议
This skill appears to do what it says (local STT/TTS with a networked LLM). Before installing, consider: (1) Your speech is transcribed locally but the resulting text is sent to whatever LLM endpoint you configure (default openrouter.ai). Only install if you trust that provider or change VOICE_LLM_URL to a local/self-hosted endpoint. (2) setup.sh writes Environment="OPENROUTER_API_KEY=..." into a user systemd unit file (~/.config/systemd/user) — that stores your API key in plain text; consider using a systemd EnvironmentFile with restricted permissions or another secret mechanism instead of embedding the key. (3) The optional udev fix requires sudo (writes /etc/udev/rules.d). (4) Building whisper.cpp on a Jetson is time- and resource-intensive; follow BUILD.md and ensure you have adequate swap/free memory. (5) Inspect the scripts yourself (they're included) before running them. If you want stronger privacy, run a local/air-gapped LLM-compatible server and set VOICE_LLM_URL accordingly or avoid providing an API key.
功能分析
Type: OpenClaw Skill Name: jetson-cuda-voice Version: 1.1.0 The skill is classified as suspicious due to several vulnerabilities and risky operations, though without clear evidence of intentional malice. Key concerns include: 1) Potential for shell injection in `pipeline/manage.sh` and `pipeline/setup.sh` if environment variables or script arguments are manipulated by an attacker (e.g., via prompt injection to the OpenClaw agent). 2) The `OPENROUTER_API_KEY` is stored in plain text within the systemd service file (`~/.config/systemd/user/voice-pipeline.service`), posing an information disclosure risk. 3) The `SKILL.md` and `setup.sh` (as a tip) instruct the user to execute `sudo` commands to modify system-wide udev rules, which is a privileged operation, even if for a stated hardware fix. While the skill performs remote downloads and network calls, these are from legitimate sources (Hugging Face, OpenRouter) and for the stated purpose of the voice assistant.
能力评估
Purpose & Capability
The name/description (Jetson CUDA voice pipeline) match the code and SKILL.md. Required binaries (arecord, aplay, python3) and dependencies (openwakeword, piper-tts, whisper.cpp) are appropriate for the stated functionality. Required env var OPENROUTER_API_KEY is used by the code to call an LLM and is consistent with the stated 'only the LLM uses the internet' claim.
Instruction Scope
Runtime instructions and scripts stick to the stated pipeline. The code captures microphone audio, runs local STT/TTS, and sends transcriptions to the LLM_URL (defaults to openrouter.ai). This is within scope, but it does mean user speech (transcriptions) are transmitted off-device to the configured LLM provider — the SKILL.md does disclose this, but users should be aware of the data flow and privacy implications.
Install Mechanism
No opaque download/install spec in skill registry; build and download steps are explicit in SKILL.md/BUILD.md (git clone github.com/ggerganov/whisper.cpp, wget from huggingface, pip installs). These are standard sources for this workload; no shorteners or personal servers are used. Building whisper.cpp on-device is heavy but expected.
Credentials
Only one required env var (OPENROUTER_API_KEY) is requested and it is justified by the LLM call. However, setup.sh embeds the API key directly into the user systemd unit file (Environment=...), which persists the secret in plain text in ~/.config/systemd/user — a practical security concern to consider (see guidance).
Persistence & Privilege
setup.sh installs and enables user-level systemd services (whisper-server and voice-pipeline) so the pipeline persists for the user session; always:false so it is not force-included. The optional udev rule in instructions requires root to write /etc/udev/rules.d (expected for USB device handling). The service persistence combined with storing the API key in the unit increases the impact of a compromised account or machine.
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install jetson-cuda-voice
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /jetson-cuda-voice 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v1.1.0
v1.1.0: Add setup.sh one-command installer (embeds systemd services inline). Fix manage.sh hardcoded devices — now uses VOICE_MIC/VOICE_SPEAKER env vars. Remove unused json import. Fix fragile test-tts heredoc. Remove cmake from runtime requires. Clean up SKILL.md: quick start section, fixed file structure, removed missing systemd/ dir reference.
v1.0.0
Initial release — offline wake word + whisper.cpp GPU STT + Piper TTS + ReSpeaker LED feedback + dynamic ambient noise calibration. Tested on Jetson Xavier NX sm_72 JetPack 5.1.4.
元数据
Slug jetson-cuda-voice
版本 1.1.0
许可证
累计安装 0
当前安装数 0
历史版本数 2
常见问题

Jetson CUDA Voice Pipeline 是什么?

Fully offline, CUDA-accelerated local voice assistant pipeline for NVIDIA Jetson. Wake word (openWakeWord) → real-time VAD → whisper.cpp GPU STT → LLM → Pipe... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 579 次。

如何安装 Jetson CUDA Voice Pipeline?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install jetson-cuda-voice」即可一键安装,无需额外配置。

Jetson CUDA Voice Pipeline 是免费的吗?

是的,Jetson CUDA Voice Pipeline 完全免费(开源免费),可自由下载、安装和使用。

Jetson CUDA Voice Pipeline 支持哪些平台?

Jetson CUDA Voice Pipeline 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(linux)。

谁开发了 Jetson CUDA Voice Pipeline?

由 Manolis Nikiforakis(@nikil511)开发并维护,当前版本 v1.1.0。

💬 留言讨论