AI Audio Processing Studio (ai-audio-processing)
/install ai-audio-processing
AI-powered full-stack audio processing skill. Covers ASR, TTS, noise reduction, music analysis, auto-mixing, podcast production, and real-time dubbing.
Core Modules
1. Speech-to-Text (ASR)
- Multi-language transcription (100+ languages via Whisper)
- Speaker diarization (identify who spoke when)
- Timestamp-aligned subtitles (SRT/VTT/ASS)
- Real-time streaming transcription
- Domain-specific vocabulary customization (medical/legal/tech)
- Punctuation and capitalization restoration
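As an illustration of the timestamp-aligned subtitle output listed above, here is a minimal sketch of turning diarized segments into SRT text. The segment layout `(start, end, speaker, text)` and both function names are assumptions made for this example; they are not part of the skill's documented interface.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render (start, end, speaker, text) tuples as numbered SRT cues.

    The tuple layout is a hypothetical stand-in for whatever the
    transcription step actually returns.
    """
    blocks = []
    for index, (start, end, speaker, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n"
            f"[{speaker}] {text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

Calling `segments_to_srt([(0.0, 2.5, "SPEAKER_1", "Hello")])` yields a single cue running from `00:00:00,000` to `00:00:02,500`, with the speaker label prefixed to the text.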
2. Text-to-Speech (TTS)
- Natural voice synthesis (Bark/OpenVoice/CosyVoice)
- Emotion control (happy, sad, angry, neutral, enthusiastic)
- Voice cloning from a 10-second sample
- Multi-speaker dialog generation
- Speed and pitch adjustment
- Audiobook narration pipeline (chapter-aware)
3. Audio Restoration & Enhancement
- Noise reduction (stationary + non-stationary)
- De-click, de-clip, de-ess processing
- Reverb removal and room acoustics correction
- Audio upscaling (8kHz→48kHz via super-resolution)
- Old recording restoration (vinyl crackle, tape hiss)
- Voice isolation from background music
4. Music Information Retrieval (MIR)
- Beat/tempo detection and BPM analysis
- Key and chord recognition
- Instrument separation (vocals/drums/bass/other via Demucs)
- Music structure analysis (verse/chorus/bridge detection)
- Genre classification and mood tagging
- Melody extraction and MIDI transcription
5. Auto-Mixing & Mastering
- Automatic level balancing (LUFS normalization)
- EQ matching to reference tracks
- Dynamic compression optimization
- Stereo width enhancement
- Loudness compliance (streaming: -14 LUFS; broadcast: -23 LUFS; podcast: -16 LUFS)
- Multi-format export (WAV/FLAC/MP3/AAC/OGG)
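The arithmetic behind LUFS normalization is simple enough to sketch. The snippet below assumes the integrated loudness has already been measured elsewhere, and the mapping of the three targets to use cases is an assumption drawn from common practice; the source lists only the values. All names are hypothetical.

```python
# Assumed mapping of the listed targets to use cases (not stated in the source).
TARGETS_LUFS = {"streaming": -14.0, "podcast": -16.0, "broadcast": -23.0}


def normalization_gain_db(measured_lufs: float, target_lufs: float) -> float:
    """Gain in dB that moves the measured integrated loudness to the target."""
    return target_lufs - measured_lufs


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)
```

For a mix measured at -20 LUFS and a streaming target of -14 LUFS, the gain is +6 dB, which is roughly a doubling of sample amplitude. A real mastering chain would also need true-peak limiting after applying the gain, which this sketch leaves out.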
6. Podcast Production Pipeline
Record → Transcribe → Edit by text → Mix & Master → Export
- Text-based audio editing (cut by deleting transcript)
- Intro/outro templating with dynamic content
- Ad-insertion point detection
- Show notes and chapter marker generation
- RSS feed generation for publishing
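Text-based editing can be thought of as mapping deleted transcript words back to time ranges in the audio. The following is a minimal sketch under the assumption that each word carries start and end times in seconds; the function name and the data layout are hypothetical and do not describe the skill's actual implementation.

```python
def keep_ranges(words, deleted, total_duration):
    """Return the (start, end) audio ranges that survive a transcript edit.

    words: list of (text, start, end) tuples with times in seconds.
    deleted: set of indices into `words` that were removed from the transcript.
    total_duration: length of the recording in seconds.
    """
    # Time spans to cut, in chronological order.
    cuts = sorted((words[i][1], words[i][2]) for i in deleted)
    ranges = []
    cursor = 0.0
    for cut_start, cut_end in cuts:
        if cut_start > cursor:
            ranges.append((cursor, cut_start))
        # max() keeps the cursor correct if cuts overlap or touch.
        cursor = max(cursor, cut_end)
    if cursor < total_duration:
        ranges.append((cursor, total_duration))
    return ranges
```

Deleting the filler word in `[("Hello", 0.0, 0.5), ("um", 0.5, 0.9), ("world", 0.9, 1.4)]` leaves the ranges before and after it, which an audio tool could then concatenate, ideally with short crossfades at the joins.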
7. Real-time Translation Dubbing
- Speech→Translate→TTS pipeline
- Lip-sync timing adjustment
- Multi-track dubbing for multilingual content
- Voice preservation across translations (voice cloning)
- Subtitle burn-in with styling
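One simple way to approach the timing adjustment mentioned above is to time-stretch each synthesized line so that it fits the duration of the original utterance. The sketch below only computes the playback-speed factor; the clamping limits and the function name are assumptions chosen for illustration, and the skill may use a different method entirely.

```python
def tempo_factor(source_duration, dubbed_duration,
                 min_factor=0.8, max_factor=1.25):
    """Speed multiplier that fits dubbed audio into the source time slot.

    A factor above 1.0 means the dubbed line must be played faster.
    The result is clamped so the voice does not sound unnaturally
    rushed or slowed; the default limits are illustrative only.
    """
    if source_duration <= 0 or dubbed_duration <= 0:
        raise ValueError("durations must be positive")
    raw = dubbed_duration / source_duration
    return min(max(raw, min_factor), max_factor)
```

When the clamped factor is not enough to close the gap, a pipeline would have to fall back on something else, such as shortening the translated text or borrowing time from adjacent pauses.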
Supported Audio Formats
- Input: WAV, MP3, FLAC, AAC, OGG, M4A, WMA, AIFF, OPUS
- Output: WAV (24-bit/48kHz), FLAC, MP3 (320kbps), AAC, OGG
Usage Examples
# Transcribe meeting recording
action: transcribe
input: meeting_2026-06-13.wav
language: zh
speakers: 4
output: meeting_transcript.srt
diarization: true
# Podcast production
action: podcast_pipeline
input: raw_interview.wav
host_voice: host_profile.json
guest_voice: guest_sample.wav
intro_music: intro.mp3
output: episode_042_final.mp3
chapters: auto
show_notes: true
Installation
- Make sure OpenClaw is installed (local or Docker).
- Run the install command in chat: /install ai-audio-processing
- After installation, invoke the skill by name or use /ai-audio-processing
- Provide the required inputs per the skill's parameter spec to get structured output.
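What is AI Audio Processing Studio (ai-audio-processing)?
An AI-driven full-stack audio processing skill. It covers speech-to-text (multilingual ASR), text-to-speech (TTS with emotion control), audio noise reduction and restoration, music information retrieval (MIR), automatic mixing and mastering, a podcast production pipeline, and real-time translation dubbing. It supports state-of-the-art models such as Whisper, Bark, OpenVoice, and Demucs, and is compatible with DAW workflows (Ableton/Logic/Reaper). It is an AI Agent Skill for Claude Code / OpenClaw, with 47 downloads so far.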
How do I install AI Audio Processing Studio?
Run "/install ai-audio-processing" in the OpenClaw or Claude Code chat to install it in one step; no extra setup is required.
Is AI Audio Processing Studio free?
Yes, AI Audio Processing Studio is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does AI Audio Processing Studio support?
AI Audio Processing Studio is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created AI Audio Processing Studio?
It is built and maintained by ai-gaoqian (@ai-gaoqian); the current version is v1.0.0.