AI Audio Processing Studio (ai-audio-processing)

by ai-gaoqian · GitHub ↗ · v1.0.0 · MIT-0
cross-platform · ✓ Security Clean
Downloads: 47 · Stars: 0 · Active Installs: 1 · Versions: 1
Install in OpenClaw
/install ai-audio-processing
Description
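An AI-driven full-stack audio processing skill. It covers speech-to-text (multilingual ASR), text-to-speech (TTS with emotion control), audio noise reduction and restoration, music information retrieval (MIR), automatic mixing and mastering, a podcast production pipeline, and real-time translation dubbing. It supports state-of-the-art models such as Whisper, Bark, OpenVoice, and Demucs, and is compatible with DAW workflows (Ableton, Logic, Reaper).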
README (SKILL.md)

AI Audio Processing Studio

AI-powered full-stack audio processing skill. Covers ASR, TTS, noise reduction, music analysis, auto-mixing, podcast production, and real-time dubbing.

Core Modules

1. Speech-to-Text (ASR)

  • Multi-language transcription (100+ languages via Whisper)
  • Speaker diarization (identify who spoke when)
  • Timestamp-aligned subtitles (SRT/VTT/ASS)
  • Real-time streaming transcription
  • Domain-specific vocabulary customization (medical/legal/tech)
  • Punctuation and capitalization restoration
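
The subtitle export step can be illustrated with a small sketch. It assumes the transcription stage yields `(start, end, speaker, text)` tuples with times in seconds; that tuple layout and both function names are hypothetical, not part of the skill's documented interface.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Render a time in seconds in the SRT form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Turn (start, end, speaker, text) tuples into SRT text.

    The tuple layout is an assumption made for this example.
    """
    blocks = []
    for index, (start, end, speaker, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n"
            f"[{speaker}] {text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

Prefixing each cue with the speaker label is one simple way to carry diarization results into a subtitle file; VTT and ASS would need their own formatters.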

2. Text-to-Speech (TTS)

  • Natural voice synthesis (Bark/OpenVoice/CosyVoice)
  • Emotion control (happy, sad, angry, neutral, enthusiastic)
  • Voice cloning from a 10-second sample
  • Multi-speaker dialog generation
  • Speed and pitch adjustment
  • Audiobook narration pipeline (chapter-aware)

3. Audio Restoration & Enhancement

  • Noise reduction (stationary + non-stationary)
  • De-click, de-clip, de-ess processing
  • Reverb removal and room acoustics correction
  • Audio upscaling (8kHz→48kHz via super-resolution)
  • Old recording restoration (vinyl crackle, tape hiss)
  • Voice isolation from background music

4. Music Information Retrieval (MIR)

  • Beat/tempo detection and BPM analysis
  • Key and chord recognition
  • Instrument separation (vocals/drums/bass/other via Demucs)
  • Music structure analysis (verse/chorus/bridge detection)
  • Genre classification and mood tagging
  • Melody extraction and MIDI transcription
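
As a rough illustration of the tempo step, BPM can be derived from detected beat timestamps. The sketch below assumes a beat tracker has already produced a list of beat times in seconds; the function name is hypothetical and this is not necessarily how the skill computes tempo.

```python
from statistics import median


def estimate_bpm(beat_times):
    """Estimate tempo from beat timestamps (in seconds).

    Uses the median inter-beat interval so that a few missed or
    spurious beats do not skew the result.
    """
    if len(beat_times) < 2:
        raise ValueError("need at least two beat timestamps")
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return 60.0 / median(intervals)
```

For example, beats at 0.0, 0.5, 1.0, and 1.5 seconds are half a second apart, which corresponds to 120 BPM.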

5. Auto-Mixing & Mastering

  • Automatic level balancing (LUFS normalization)
  • EQ matching to reference tracks
  • Dynamic compression optimization
  • Stereo width enhancement
  • Loudness compliance targets (streaming: -14 LUFS; broadcast: -23 LUFS; podcast: -16 LUFS)
  • Multi-format export (WAV/FLAC/MP3/AAC/OGG)
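
The arithmetic behind LUFS normalization is simple enough to sketch. The example assumes the integrated loudness of the track has already been measured by a loudness meter; both function names are hypothetical.

```python
def gain_to_target_db(measured_lufs: float, target_lufs: float) -> float:
    """Static gain in dB that moves a track from its measured
    integrated loudness to the target loudness."""
    return target_lufs - measured_lufs


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)
```

A track measured at -20 LUFS needs +6 dB to reach a -14 LUFS target, which is an amplitude factor of roughly 2. A real mastering chain would typically also check true peak after applying the gain, since a positive gain can push peaks into clipping.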

6. Podcast Production Pipeline

Record → Transcribe → Edit by text → Mix & Master → Export
  • Text-based audio editing (cut by deleting transcript)
  • Intro/outro templating with dynamic content
  • Ad-insertion point detection
  • Show notes and chapter marker generation
  • RSS feed generation for publishing
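
Text-based editing can be pictured as mapping deleted transcript words back to audio time ranges. The sketch below assumes word-level timestamps as `(start, end, text)` tuples; that layout and the function name are hypothetical.

```python
def keep_ranges(words, deleted_indices):
    """Return merged (start, end) audio ranges to keep.

    words: list of (start, end, text) tuples in playback order.
    deleted_indices: indices of words removed from the transcript.
    Consecutive kept words are merged into one range so the natural
    pauses between them survive the edit.
    """
    deleted = set(deleted_indices)
    ranges = []
    for i, (start, end, _text) in enumerate(words):
        if i in deleted:
            continue
        if ranges and (i - 1) not in deleted:
            # Previous word was kept: extend the current range.
            ranges[-1] = (ranges[-1][0], end)
        else:
            # First kept word, or first word after a cut.
            ranges.append((start, end))
    return ranges
```

The resulting ranges could then be handed to an audio tool to cut and concatenate; short crossfades at the joins usually help hide the edit.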

7. Real-time Translation Dubbing

  • Speech→Translate→TTS pipeline
  • Lip-sync timing adjustment
  • Multi-track dubbing for multilingual content
  • Voice preservation across translations (voice cloning)
  • Subtitle burn-in with styling
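
One way to think about the timing adjustment: translated speech rarely has the same duration as the original line, so the dubbed clip is sped up or slowed down to fit the original slot. The sketch below is a minimal illustration under that assumption; the function name and the clamp limits are hypothetical, chosen only to keep speech sounding natural.

```python
def fit_speed_factor(source_duration, dubbed_duration,
                     min_factor=0.8, max_factor=1.25):
    """Playback speed factor that fits dubbed audio into the source slot.

    A factor above 1.0 means the dubbed clip must be played faster.
    The result is clamped to [min_factor, max_factor]; the default
    limits are illustrative assumptions.
    """
    if source_duration <= 0 or dubbed_duration <= 0:
        raise ValueError("durations must be positive")
    factor = dubbed_duration / source_duration
    return max(min_factor, min(max_factor, factor))
```

When the clamp is hit, the remaining mismatch has to be absorbed some other way, for example by shortening the translation or borrowing time from an adjacent pause.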

Supported Audio Formats

  • Input: WAV, MP3, FLAC, AAC, OGG, M4A, WMA, AIFF, OPUS
  • Output: WAV (24-bit/48kHz), FLAC, MP3 (320kbps), AAC, OGG
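
Since the skill lists ffmpeg as a requirement, the output settings above could plausibly be expressed as ffmpeg arguments. The sketch below only builds the command as a list and does not run it; the mapping table and function name are hypothetical, and whether the skill invokes ffmpeg this way is an assumption.

```python
# Encoder arguments per output extension (illustrative mapping).
EXPORT_ARGS = {
    "wav": ["-c:a", "pcm_s24le", "-ar", "48000"],   # 24-bit PCM at 48 kHz
    "mp3": ["-c:a", "libmp3lame", "-b:a", "320k"],  # 320 kbps MP3
    "flac": ["-c:a", "flac"],                       # lossless FLAC
}


def build_ffmpeg_command(input_path: str, output_path: str) -> list:
    """Build an ffmpeg argument list from the output file extension."""
    ext = output_path.rsplit(".", 1)[-1].lower()
    if ext not in EXPORT_ARGS:
        raise ValueError(f"unsupported output format: {ext}")
    return ["ffmpeg", "-i", input_path, *EXPORT_ARGS[ext], output_path]
```

Returning a list rather than a shell string means the command can be passed to `subprocess.run` without quoting problems for file names that contain spaces.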

Usage Examples

# Transcribe meeting recording
action: transcribe
input: meeting_2026-06-13.wav
language: zh
speakers: 4
output: meeting_transcript.srt
diarization: true

# Podcast production
action: podcast_pipeline
input: raw_interview.wav
host_voice: host_profile.json
guest_voice: guest_sample.wav
intro_music: intro.mp3
output: episode_042_final.mp3
chapters: auto
show_notes: true
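
The requests above are flat `key: value` lines with `#` comments. As a sketch of how such a request could be read into a dictionary before dispatch, assuming that flat form holds (the skill's actual parameter parsing is not documented here, and `parse_request` is a hypothetical name):

```python
def parse_request(text: str) -> dict:
    """Parse flat 'key: value' lines into a dict of strings.

    Blank lines and lines starting with '#' are skipped. Values are
    left as strings; type conversion is up to the caller.
    """
    request = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got: {line!r}")
        request[key.strip()] = value.strip()
    return request
```

A full YAML parser would be the safer choice if requests ever include nested values or lists.
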
Usage Guidance
Installers should treat this as a broad audio-production assistant. Use it only on recordings and voice samples you are authorized to process, review transcripts and generated audio before sharing, and confirm any RSS or publishing output manually.
Capability Tags
crypto
Capability Assessment
Purpose & Capability
The skill advertises ASR, diarization, voice synthesis/cloning, audio restoration, podcast production, and RSS generation; these are coherent with an AI audio studio, though several involve personal or biometric audio data.
Instruction Scope
The runtime instructions are examples for user-directed audio workflows and do not include hidden commands, prompt overrides, automatic execution, or unrelated behavior.
Install Mechanism
The artifact is a single SKILL.md file with dependency requirements for Python, ffmpeg, and torch; no installer scripts or executable payloads are present.
Credentials
Audio libraries and local model tooling are proportionate to the advertised audio-processing purpose. No broad filesystem indexing, credential access, or network exfiltration is described.
Persistence & Privilege
No persistence mechanism, privilege escalation, background worker, account access, or credential/session handling appears in the artifacts.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install ai-audio-processing
  3. After installation, invoke the skill by name or use /ai-audio-processing
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
AI Audio Processing Studio v1.0.0 – Initial Release
  • Introduces a comprehensive AI-powered audio processing toolkit covering ASR, TTS (with emotion control), audio restoration/enhancement, music information retrieval, auto-mixing/mastering, podcast production, and real-time translation dubbing.
  • Supports advanced features like multi-language transcription, speaker diarization, voice cloning, audio upscaling, instrument separation, and genre classification.
  • Compatible with leading models (Whisper, Bark, OpenVoice, Demucs) and DAWs such as Ableton, Logic, and Reaper.
  • Enables streamlined podcast workflow and text-based audio editing.
  • Offers broad audio format compatibility for both input and output.
  • Requires Python ≥3.10, ffmpeg, and torch ≥2.0.
Metadata
Slug ai-audio-processing
Version 1.0.0
License MIT-0
All-time Installs 1
Active Installs 1
Total Versions 1
Frequently Asked Questions

What is AI Audio Processing Studio (ai-audio-processing)?

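It is an AI-driven full-stack audio processing skill covering speech-to-text (multilingual ASR), text-to-speech (TTS with emotion control), audio noise reduction and restoration, music information retrieval (MIR), automatic mixing and mastering, a podcast production pipeline, and real-time translation dubbing. It supports models such as Whisper, Bark, OpenVoice, and Demucs, and is compatible with DAW workflows (Ableton, Logic, Reaper). It is an AI Agent Skill for Claude Code / OpenClaw, with 47 downloads so far.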

How do I install AI Audio Processing Studio (ai-audio-processing)?

Run "/install ai-audio-processing" in the OpenClaw or Claude Code chat to install it in one step. The install step needs no extra setup; the skill's listed requirements are Python ≥3.10, ffmpeg, and torch ≥2.0.

Is AI Audio Processing Studio (ai-audio-processing) free?

Yes, AI Audio Processing Studio is completely free, licensed under MIT-0. You can download, install, and use it at no cost.

Which platforms does AI Audio Processing Studio (ai-audio-processing) support?

AI Audio Processing Studio is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created AI Audio Processing Studio (ai-audio-processing)?

It is built and maintained by ai-gaoqian (@ai-gaoqian); the current version is v1.0.0.
