NEXUS Voice Transcriber
/install nexus-voice-transcriber
Setup
On first use, read references/whisper-models.md and references/troubleshooting.md.
Ensure dependencies are installed: ffmpeg, python3, and the required Python packages (openai-whisper for local transcription; requests or the optional deepgram-sdk for Deepgram).
When to Use
- User sends a voice note / audio file / video file that needs transcription.
- Need to archive both the original audio and the text transcript.
- Want speaker detection (if using Deepgram with diarization).
- Quick local transcription without external APIs (Whisper).
Architecture
Memory and saved artifacts live in ~/voice-transcriber/, structured as follows:
~/voice-transcriber/
├── memory.md # Provider preferences, defaults, history
├── transcripts/ # Saved transcripts (txt, json, srt)
├── audio/ # Saved original audio files
└── temp/ # Processing workspace (auto-cleaned)
Quick Reference
| Topic | File |
|---|---|
| Whisper model guide | references/whisper-models.md |
| Troubleshooting | references/troubleshooting.md |
| Main script | scripts/transcribe.py |
Core Rules
1. Detect Input Type
Before transcription:
- Local file path → verify it exists and check the format (mp3, wav, m4a, mp4, etc.)
- URL → download to temp/, then process
- Voice memo → usually a single speaker, short
- Meeting / interview → likely multiple speakers; consider diarization
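The input check can be sketched as a small classifier. This is a minimal sketch: `classify_input` and the exact extension set are hypothetical, because the source only lists "mp3, wav, m4a, mp4, etc.". Downloading URLs into temp/ is left to the caller.

```python
import os
from urllib.parse import urlparse

# Assumed set of accepted extensions; the skill only names mp3, wav, m4a, mp4 "etc."
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".mp4", ".ogg", ".flac", ".webm"}


def classify_input(source: str) -> str:
    """Return "url" or "file" for a transcription input (hypothetical helper).

    Raises ValueError for unsupported formats and FileNotFoundError for
    local paths that do not exist.
    """
    if urlparse(source).scheme in ("http", "https"):
        # Remote input: the caller downloads it into temp/ before processing.
        return "url"
    ext = os.path.splitext(source)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported format: {ext or '(none)'}")
    if not os.path.isfile(source):
        raise FileNotFoundError(source)
    return "file"
```

The format is checked before existence, so a typo such as `notes.txt` is reported as an unsupported format instead of a missing file.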
2. Choose Provider Based on Context
| Scenario | Best Provider | Why |
|---|---|---|
| Privacy, no API keys | Local Whisper | Runs on-device, free |
| High accuracy, speed | Deepgram Nova‑3 | Low latency, good accuracy |
| Speaker identification | Deepgram (with diarization) | Native speaker labels |
| No internet | Local Whisper | Offline capable |
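The table can be read as a small decision function. The sketch below is hypothetical (`choose_provider` and its parameters are not part of the skill). It assumes a tie-break that prefers Deepgram whenever a key is available and uploading is allowed, and it raises an error rather than silently dropping speaker labels when diarization is requested but Deepgram is unavailable.

```python
import os
from typing import Optional


def choose_provider(*, need_speakers: bool = False, private: bool = False,
                    online: bool = True, api_key: Optional[str] = None) -> str:
    """Return "whisper" or "deepgram" (hypothetical helper mirroring the table)."""
    if api_key is None:
        # Fall back to the environment variable the skill documents.
        api_key = os.environ.get("DEEPGRAM_API_KEY")
    # Deepgram is only an option with a key, a connection, and permission to upload.
    deepgram_ok = bool(api_key) and online and not private
    if need_speakers:
        if not deepgram_ok:
            # Whisper has no native diarization, so fail loudly instead of losing labels.
            raise ValueError("speaker labels need Deepgram: set DEEPGRAM_API_KEY, "
                             "be online, and allow the upload")
        return "deepgram"
    # Assumed tie-break: prefer Deepgram for speed/accuracy when it is allowed.
    return "deepgram" if deepgram_ok else "whisper"
```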
3. Handle Long Audio
Files larger than 25 MB or longer than 2 hours:
- Split into chunks with ffmpeg (see scripts/transcribe.py --split)
- Process each chunk
- Merge the transcripts, offsetting each chunk's timestamps by the total length of the chunks before it
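The threshold check and the timestamp merge can be sketched as follows. `needs_split` and `merge_chunks` are hypothetical names, and the sketch assumes that "MB" means MiB. It also assumes each chunk's segments carry `start`/`end` times relative to that chunk. Offsets come from measured chunk durations, such as those reported by ffprobe, rather than from the nominal 600 s, because stream-copied chunks may not be cut at exactly `segment_time`.

```python
from typing import List, Tuple

MAX_BYTES = 25 * 1024 * 1024   # ">25 MB" rule; assumes MB means MiB
MAX_SECONDS = 2 * 60 * 60      # ">2 hours" rule


def needs_split(size_bytes: int, duration_s: float) -> bool:
    """True if the file exceeds either limit from the rule above."""
    return size_bytes > MAX_BYTES or duration_s > MAX_SECONDS


def merge_chunks(chunks: List[Tuple[float, List[dict]]]) -> List[dict]:
    """Merge per-chunk segments into one timeline (hypothetical helper).

    Each chunk is (measured_duration_seconds, segments), with segment times
    relative to the start of that chunk.
    """
    merged = []
    offset = 0.0
    for duration, segments in chunks:
        for seg in segments:
            # Shift this chunk's times by the total length of earlier chunks.
            merged.append({**seg, "start": seg["start"] + offset,
                           "end": seg["end"] + offset})
        offset += duration
    return merged
```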
4. Save Artifacts
After successful transcription:
- Save the transcript to ~/voice-transcriber/transcripts/ with a meaningful name
- Save the original audio to ~/voice-transcriber/audio/ if the user wants it archived
- Update memory.md with the date, file, provider, and duration
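The save step might look like the sketch below. `save_artifacts` is hypothetical, and both the "date_stem" naming scheme and the one-line history format appended to memory.md are assumptions, since the skill does not specify them. Saving the same file twice on the same day overwrites the earlier transcript.

```python
import shutil
from datetime import date
from pathlib import Path


def save_artifacts(transcript: str, audio_path, provider: str, duration_s: float,
                   base: Path = Path.home() / "voice-transcriber",
                   keep_audio: bool = False) -> Path:
    """Save the transcript (and optionally the audio) and log the run (hypothetical)."""
    audio_path = Path(audio_path)
    today = date.today().isoformat()
    # Assumed "meaningful" name: ISO date plus the source file's stem.
    stem = f"{today}_{audio_path.stem}"
    transcripts = base / "transcripts"
    transcripts.mkdir(parents=True, exist_ok=True)
    out = transcripts / f"{stem}.txt"
    out.write_text(transcript, encoding="utf-8")
    if keep_audio:
        audio_dir = base / "audio"
        audio_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(audio_path, audio_dir / f"{stem}{audio_path.suffix}")
    # Append one history line to memory.md: date, file, provider, duration.
    with open(base / "memory.md", "a", encoding="utf-8") as f:
        f.write(f"- {today} | {audio_path.name} | {provider} | {duration_s:.0f}s\n")
    return out
```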
5. Output Formats
Default to plain text (.txt). Offer alternatives:
- .txt: clean text, no timestamps
- .srt / .vtt: subtitles with timing
- .json: structured output with word-level timing (Deepgram) or segment-level timing (Whisper)
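The two subtitle formats differ mainly in the timestamp separator (SRT uses `HH:MM:SS,mmm` and WebVTT uses `HH:MM:SS.mmm`) and in the numbered cues of SRT versus the `WEBVTT` header of VTT. The sketch below assumes segments shaped like `{"start": seconds, "end": seconds, "text": ...}`. The function names are hypothetical, and the sketch is not necessarily how scripts/transcribe.py writes these files.

```python
def _timestamp(seconds: float, sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"


def to_srt(segments) -> str:
    """SubRip: numbered cues, comma before milliseconds, blank line between cues."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        start, end = _timestamp(seg["start"], ","), _timestamp(seg["end"], ",")
        blocks.append(f"{i}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments) -> str:
    """WebVTT: WEBVTT header, dot before milliseconds, cue numbers optional."""
    cues = []
    for seg in segments:
        start, end = _timestamp(seg["start"], "."), _timestamp(seg["end"], ".")
        cues.append(f"{start} --> {end}\n{seg['text'].strip()}\n")
    return "WEBVTT\n\n" + "\n".join(cues)
```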
Common Traps
- Assuming one provider fits all → Whisper lacks diarization; Deepgram needs API key.
- Uploading huge files directly → Timeouts. Split first.
- Ignoring audio quality → Noisy audio may need preprocessing (ffmpeg noise reduction).
- Not checking the language → Whisper auto-detects but can fail on mixed-language content.
- Forgetting to save audio → User may want the original file archived.
Requirements
Required:
- ffmpeg (audio conversion, splitting)
- python3 + pip
- Python packages: openai-whisper (local), requests (for Deepgram, if used)
Optional API keys (only if using Deepgram):
- DEEPGRAM_API_KEY: for Deepgram Nova-3 (speaker diarization available)
Local Whisper works without any API keys.
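A preflight check for these requirements can be sketched with the standard library alone. `check_requirements` is a hypothetical helper. It relies on the openai-whisper package being imported as `whisper`, and it only checks that modules are importable and that ffmpeg is on PATH, not which versions are installed.

```python
import importlib.util
import os
import shutil


def check_requirements(provider: str = "whisper") -> list:
    """Return the missing requirements for a provider; empty means ready (hypothetical)."""
    missing = []
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg (not on PATH)")
    if provider == "whisper" and importlib.util.find_spec("whisper") is None:
        # openai-whisper installs the importable module "whisper".
        missing.append("openai-whisper (pip install openai-whisper)")
    if provider == "deepgram":
        if importlib.util.find_spec("requests") is None:
            missing.append("requests (pip install requests)")
        if not os.environ.get("DEEPGRAM_API_KEY"):
            missing.append("DEEPGRAM_API_KEY environment variable")
    return missing
```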
Provider Quick Reference
Local Whisper (No API Key)
# Install
pip install openai-whisper
# Basic transcription (via script)
python3 scripts/transcribe.py --file audio.wav --provider whisper --model base
# Output formats: txt (default), srt, vtt, json
python3 scripts/transcribe.py --file audio.wav --provider whisper --model medium --format srt
Models: tiny (fastest) → base → small → medium → large (most accurate).
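For use without the script, the openai-whisper Python API offers `whisper.load_model` and `model.transcribe`, which returns a dict with `text`, `segments`, and `language`. The sketch below is hypothetical glue code (`transcribe_local` and `to_segments` are illustrative names). It imports `whisper` lazily so that the segment helper works even where the package is not installed.

```python
def transcribe_local(path: str, model_name: str = "base", language=None) -> dict:
    """Transcribe a file on-device with openai-whisper (hypothetical wrapper)."""
    import whisper  # imported lazily; requires `pip install openai-whisper`
    model = whisper.load_model(model_name)  # tiny / base / small / medium / large
    # language=None lets Whisper auto-detect; pass e.g. "en" for mixed-language audio.
    return model.transcribe(path, language=language)


def to_segments(result: dict) -> list:
    """Keep only start/end/text from Whisper's segment dicts."""
    return [{"start": float(s["start"]), "end": float(s["end"]), "text": s["text"].strip()}
            for s in result["segments"]]
```

Usage: `to_segments(transcribe_local("audio.wav", "medium"))` yields segment-level timing that can be turned into .srt/.vtt output.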
Deepgram Nova‑3 (API Key Required)
# Set environment variable
export DEEPGRAM_API_KEY="your_key_here"
# Transcribe with speaker diarization
python3 scripts/transcribe.py --file audio.wav --provider deepgram --diarize
# Output JSON with speaker labels (Deepgram returns speaker labels only when diarization is on)
python3 scripts/transcribe.py --file audio.wav --provider deepgram --diarize --format json
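A direct call to Deepgram's pre-recorded endpoint can be sketched with only the standard library: POST the raw audio to `https://api.deepgram.com/v1/listen` with `Authorization: Token <key>` and query parameters such as `model=nova-3` and `diarize=true`. This sketch makes several assumptions. The helper names are hypothetical, the default `audio/wav` content type has to match your file, and the response is assumed to list words under `results.channels[0].alternatives[0].words`, each with a `speaker` field when diarization is on.

```python
import json
import os
import urllib.parse
import urllib.request

DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"


def build_request(audio: bytes, mimetype: str = "audio/wav", diarize: bool = True,
                  api_key=None) -> urllib.request.Request:
    """Prepare (but do not send) a Deepgram Nova-3 request (hypothetical helper)."""
    key = api_key or os.environ["DEEPGRAM_API_KEY"]  # KeyError if the key is unset
    params = {"model": "nova-3", "smart_format": "true"}
    if diarize:
        params["diarize"] = "true"
    url = f"{DEEPGRAM_URL}?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(url, data=audio, method="POST",
                                  headers={"Authorization": f"Token {key}",
                                           "Content-Type": mimetype})


def transcribe_deepgram(path: str, diarize: bool = True) -> dict:
    """Send the file and return the parsed JSON response."""
    with open(path, "rb") as f:
        req = build_request(f.read(), diarize=diarize)
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)


def speaker_turns(response: dict) -> list:
    """Group consecutive words by speaker into turns (assumed response shape)."""
    words = response["results"]["channels"][0]["alternatives"][0]["words"]
    turns = []
    for w in words:
        text = w.get("punctuated_word", w["word"])
        spk = w.get("speaker", 0)
        if turns and turns[-1]["speaker"] == spk:
            turns[-1]["text"] += " " + text
            turns[-1]["end"] = w["end"]
        else:
            turns.append({"speaker": spk, "start": w["start"], "end": w["end"], "text": text})
    return turns
```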
Audio Preprocessing
Extract Audio from Video
ffmpeg -i video.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 audio.wav
Reduce Noise
ffmpeg -i noisy.wav -af "afftdn=nf=-25" clean.wav
Split Long Audio (10‑minute chunks)
ffmpeg -i long.mp3 -f segment -segment_time 600 -c copy temp/chunk_%03d.mp3
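When the preprocessing runs from Python, the three commands above can be built as argument lists and passed to `subprocess.run`, which avoids shell quoting issues. The function names are hypothetical. Adding ffmpeg's `-y` flag (overwrite outputs without prompting) is an assumption on top of the original commands, so that a non-interactive run cannot hang on a prompt.

```python
import subprocess
from pathlib import Path


def extract_audio_cmd(video, wav) -> list:
    # Drop video, 16-bit PCM, 16 kHz, mono (same options as the command above).
    return ["ffmpeg", "-y", "-i", str(video), "-vn", "-acodec", "pcm_s16le",
            "-ar", "16000", "-ac", "1", str(wav)]


def denoise_cmd(src, dst, noise_floor_db: int = -25) -> list:
    # afftdn FFT denoiser; nf is the noise floor in dB.
    return ["ffmpeg", "-y", "-i", str(src), "-af", f"afftdn=nf={noise_floor_db}", str(dst)]


def split_cmd(src, temp_dir, seconds: int = 600) -> list:
    # Segment muxer with stream copy; output chunks keep the source extension.
    pattern = Path(temp_dir) / f"chunk_%03d{Path(src).suffix}"
    return ["ffmpeg", "-y", "-i", str(src), "-f", "segment", "-segment_time", str(seconds),
            "-c", "copy", str(pattern)]


def run(cmd: list) -> None:
    # Raises CalledProcessError if ffmpeg exits non-zero.
    subprocess.run(cmd, check=True, capture_output=True)
```

Usage: `run(split_cmd("long.mp3", "temp"))` is equivalent to the 10-minute split above, with `-y` added.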
Security & Privacy
Data that stays local:
- Transcripts in ~/voice-transcriber/transcripts/
- Original audio in ~/voice-transcriber/audio/
- Local Whisper processes entirely on-device
Data that leaves your machine (if using Deepgram):
- Audio file sent to the Deepgram API (api.deepgram.com)
- Transcript returned and stored locally
This skill does NOT:
- Store API keys in plain text (use environment variables)
- Auto‑upload without confirmation
- Retain files on external servers after processing
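The "no upload without confirmation" rule can be enforced with an explicit prompt that defaults to "no". This is a minimal sketch: `confirm_upload` is hypothetical, and the `ask` parameter exists only so that the prompt can be replaced in tests.

```python
from pathlib import Path
from typing import Callable


def confirm_upload(path, ask: Callable[[str], str] = input) -> bool:
    """Ask before sending audio to api.deepgram.com; anything but y/yes declines."""
    p = Path(path)
    size_mb = p.stat().st_size / 1_000_000
    answer = ask(f"Send {p.name} ({size_mb:.1f} MB) to api.deepgram.com "
                 "for transcription? [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```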
External Endpoints
| Endpoint | Data Sent | Purpose |
|---|---|---|
| api.deepgram.com/v1/listen | Audio file | Deepgram transcription |
Called only when the user explicitly chooses the Deepgram provider. Local Whisper sends nothing.
Memory Template
Create ~/voice-transcriber/memory.md with this structure:
# Voice Transcriber Memory
## Status
status: ongoing
version: 1.0.0
last: YYYY-MM-DD
integration: pending
## Context
<!-- Observations about transcription needs, preferred providers, languages, etc. -->
## Notes
<!-- Provider preferences, format preferences, diarization needs -->
---
*Updated: YYYY-MM-DD*
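Creating the memory file from this template, and bumping its dates on later runs, can be sketched as follows. `ensure_memory` is hypothetical. It also creates the transcripts/, audio/, and temp/ folders from the Architecture layout, and it fills the `YYYY-MM-DD` placeholders with an ISO date.

```python
import re
from datetime import date
from pathlib import Path

TEMPLATE = """# Voice Transcriber Memory
## Status
status: ongoing
version: 1.0.0
last: {today}
integration: pending
## Context
<!-- Observations about transcription needs, preferred providers, languages, etc. -->
## Notes
<!-- Provider preferences, format preferences, diarization needs -->
---
*Updated: {today}*
"""


def ensure_memory(base: Path = Path.home() / "voice-transcriber", today=None) -> Path:
    """Create memory.md from the template, or refresh its dates (hypothetical helper)."""
    today = today or date.today().isoformat()
    for sub in ("transcripts", "audio", "temp"):
        (base / sub).mkdir(parents=True, exist_ok=True)
    path = base / "memory.md"
    if not path.exists():
        path.write_text(TEMPLATE.format(today=today), encoding="utf-8")
    else:
        text = path.read_text(encoding="utf-8")
        # Update only the two date fields; user notes are left untouched.
        text = re.sub(r"^last: .*$", f"last: {today}", text, count=1, flags=re.MULTILINE)
        text = re.sub(r"^\*Updated: .*\*$", f"*Updated: {today}*", text,
                      count=1, flags=re.MULTILINE)
        path.write_text(text, encoding="utf-8")
    return path
```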
Related Skills
Install with clawhub install <slug> if the user confirms:
- speech-to-text-transcription: broader audio/video transcription with more providers
- ffmpeg: advanced audio/video processing
- audio: general audio manipulation
Feedback
- If useful: clawhub star voice-transcriber
- Stay updated: clawhub sync
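- Make sure OpenClaw is installed (local or Docker deployment).
- Enter the install command in the chat box: /install nexus-voice-transcriber
- After installation, invoke the skill by name or trigger it with /nexus-voice-transcriber
- Provide the required inputs as described in the skill's parameter documentation to get structured output.
What is NEXUS Voice Transcriber?
Voice note transcription and archival for OpenClaw agents. Powered by Deepgram Nova-3 or local Whisper. Transcribes audio messages, saves both audio files an... It is an AI agent skill plugin for Claude Code / OpenClaw and has been downloaded 43 times so far.
How do I install NEXUS Voice Transcriber?
Run the command "/install nexus-voice-transcriber" in the OpenClaw or Claude Code chat box to install it in one step, with no extra configuration.
Is NEXUS Voice Transcriber free?
Yes. NEXUS Voice Transcriber is completely free under the MIT-0 license and can be freely downloaded, installed, and used.
Which platforms does NEXUS Voice Transcriber support?
NEXUS Voice Transcriber is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed (linux, darwin, win32).
Who developed NEXUS Voice Transcriber?
It is developed and maintained by Matthew00ITA (@matthew00ita); the current version is v1.0.0.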