AI Audio Processing Studio (ai-audio-processing)
Author: ai-gaoqian · v1.0.0 · MIT-0
Total downloads: 47 · Favorites: 0 · Current installs: 1 · Versions: 1
Install in OpenClaw
/install ai-audio-processing
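Description
An AI-driven, full-stack audio processing skill. It covers speech-to-text (multilingual ASR), text-to-speech (TTS with emotion control), audio noise reduction and restoration, music information retrieval (MIR), automatic mixing and mastering, a podcast production pipeline, and real-time translation dubbing. It supports models such as Whisper, Bark, OpenVoice, and Demucs, and is compatible with DAW workflows (Ableton/Logic/Reaper).
Usage instructions (SKILL.md)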
AI Audio Processing Studio
AI-powered full-stack audio processing skill. Covers ASR, TTS, noise reduction, music analysis, auto-mixing, podcast production, and real-time dubbing.
Core Modules
1. Speech-to-Text (ASR)
- Multi-language transcription (100+ languages via Whisper)
- Speaker diarization (identify who spoke when)
- Timestamp-aligned subtitles (SRT/VTT/ASS)
- Real-time streaming transcription
- Domain-specific vocabulary customization (medical/legal/tech)
- Punctuation and capitalization restoration
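The timestamp-aligned subtitle output listed above can be illustrated with a small formatter. This is a minimal sketch, not the skill's actual implementation: `format_srt_timestamp` and `segments_to_srt` are hypothetical names, and the `(start, end, speaker, text)` tuple layout is an assumption about what a transcription plus diarization step might return.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render (start, end, speaker, text) tuples as SRT cues.

    The tuple layout is an assumption for illustration only.
    """
    blocks = []
    for index, (start, end, speaker, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n"
            f"[{speaker}] {text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

Prefixing each cue with the speaker label is one simple way to carry diarization results into a subtitle file; VTT and ASS would need their own formatters.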
2. Text-to-Speech (TTS)
- Natural voice synthesis (Bark/OpenVoice/CosyVoice)
- Emotion control (happy, sad, angry, neutral, enthusiastic)
- Voice cloning from 10-second sample
- Multi-speaker dialog generation
- Speed and pitch adjustment
- Audiobook narration pipeline (chapter-aware)
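A chapter-aware narration pipeline first has to split a manuscript into chapters before any synthesis happens. The sketch below assumes headings of the form `Chapter <number> ...` on their own line; both that heading convention and the `split_chapters` helper are hypothetical, since the skill does not document how it detects chapters.

```python
import re

# Assumed heading convention: a line starting with "Chapter <number>".
CHAPTER_HEADING = re.compile(r"^(Chapter[ \t]+\d+.*)$", re.MULTILINE)


def split_chapters(manuscript: str):
    """Return a list of (title, body) pairs, one per detected chapter."""
    # With a capturing group, re.split yields:
    # [preamble, title1, body1, title2, body2, ...]
    parts = CHAPTER_HEADING.split(manuscript)
    chapters = []
    for i in range(1, len(parts) - 1, 2):
        chapters.append((parts[i].strip(), parts[i + 1].strip()))
    return chapters
```

Each `(title, body)` pair could then be sent to whichever TTS model is in use, producing one audio file per chapter.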
3. Audio Restoration & Enhancement
- Noise reduction (stationary + non-stationary)
- De-click, de-clip, de-ess processing
- Reverb removal and room acoustics correction
- Audio upscaling (8kHz→48kHz via super-resolution)
- Old recording restoration (vinyl crackle, tape hiss)
- Voice isolation from background music
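Before de-clipping, the clipped regions have to be located. A minimal sketch of that detection step, assuming float samples normalized to the range -1.0 to 1.0; `find_clipped_runs`, the `0.99` threshold, and the minimum run length are illustrative assumptions, and the actual repair (interpolation or model-based reconstruction) is not shown.

```python
def find_clipped_runs(samples, threshold=0.99, min_run=3):
    """Return (start, end) index pairs, end exclusive, for runs of at least
    `min_run` consecutive samples whose magnitude is >= `threshold`."""
    runs = []
    run_start = None
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_run:
                runs.append((run_start, i))
            run_start = None
    # Close a run that extends to the end of the signal.
    if run_start is not None and len(samples) - run_start >= min_run:
        runs.append((run_start, len(samples)))
    return runs
```

Requiring several consecutive samples at the ceiling avoids flagging a single legitimate peak as clipping.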
4. Music Information Retrieval (MIR)
- Beat/tempo detection and BPM analysis
- Key and chord recognition
- Instrument separation (vocals/drums/bass/other via Demucs)
- Music structure analysis (verse/chorus/bridge detection)
- Genre classification and mood tagging
- Melody extraction and MIDI transcription
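Tempo detection ultimately reduces to turning beat or onset times into a BPM figure. The sketch below shows only that last step, assuming onset times (in seconds) have already been produced by some detector; `estimate_bpm` is a hypothetical helper and uses the median inter-onset interval, which is a simple heuristic rather than the method any particular model uses.

```python
from statistics import median


def estimate_bpm(onset_times):
    """Estimate tempo in BPM from ascending onset times given in seconds."""
    if len(onset_times) < 2:
        raise ValueError("need at least two onsets")
    intervals = [b - a for a, b in zip(onset_times, onset_times[1:])]
    typical = median(intervals)
    if typical <= 0:
        raise ValueError("onset times must be strictly increasing")
    # One beat every `typical` seconds means 60 / typical beats per minute.
    return 60.0 / typical
```

The median is used instead of the mean so that a few missed or spurious onsets do not skew the estimate.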
5. Auto-Mixing & Mastering
- Automatic level balancing (LUFS normalization)
- EQ matching to reference tracks
- Dynamic compression optimization
- Stereo width enhancement
- Loudness compliance targets: -23 LUFS (broadcast, e.g. EBU R 128), -14 LUFS (common streaming target), -16 LUFS (commonly used for podcasts)
- Multi-format export (WAV/FLAC/MP3/AAC/OGG)
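LUFS normalization comes down to a gain offset between the measured and target loudness. The sketch below assumes the integrated loudness has already been measured by some other tool. `gain_to_target` is a hypothetical helper, and the mapping of the three LUFS values to use cases in `TARGETS` is an assumption based on common practice, not something the skill states.

```python
# Assumed mapping of the listed targets to use cases (not stated by the skill).
TARGETS = {"streaming": -14.0, "podcast": -16.0, "broadcast": -23.0}


def gain_to_target(measured_lufs: float, target_lufs: float):
    """Return (gain_db, linear_factor) needed to reach the target loudness."""
    gain_db = target_lufs - measured_lufs
    # Amplitude scales by 10^(dB / 20).
    linear_factor = 10 ** (gain_db / 20)
    return gain_db, linear_factor
```

For example, a mix measured at -20 LUFS needs +6 dB to reach the -14 LUFS target. A real mastering chain would also need a true-peak limiter after applying positive gain, which this sketch does not cover.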
6. Podcast Production Pipeline
Record → Transcribe → Edit by text → Mix & Master → Export
- Text-based audio editing (cut by deleting transcript)
- Intro/outro templating with dynamic content
- Ad-insertion point detection
- Show notes and chapter marker generation
- RSS feed generation for publishing
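Text-based editing works because every transcript word carries a time range, so deleting words yields a list of audio ranges to keep. A minimal sketch under that assumption; `keep_ranges` and the `(text, start, end)` word layout are hypothetical and not taken from the skill.

```python
def keep_ranges(words, deleted_indices):
    """Given words as (text, start, end) tuples and the indices of deleted
    words, return merged (start, end) time ranges of audio to keep."""
    deleted = set(deleted_indices)
    ranges = []
    for i, (_text, start, end) in enumerate(words):
        if i in deleted:
            continue
        if ranges and i > 0 and (i - 1) not in deleted:
            # The previous word was kept, so extend the current range.
            ranges[-1] = (ranges[-1][0], end)
        else:
            # First kept word, or first word after a deletion: new range.
            ranges.append((start, end))
    return ranges
```

The resulting ranges could then be passed to a cutting tool such as ffmpeg to assemble the edited audio; short crossfades at the joins are usually needed to avoid audible clicks.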
7. Real-time Translation Dubbing
- Speech→Translate→TTS pipeline
- Lip-sync timing adjustment
- Multi-track dubbing for multilingual content
- Voice preservation across translations (voice cloning)
- Subtitle burn-in with styling
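Timing adjustment for dubbing typically means stretching or compressing the synthesized line so it fits the original speaker's time slot. The sketch below computes a playback rate for that purpose; `fit_rate` and the clamp limits of 0.8 and 1.25 are illustrative assumptions, not values documented by the skill.

```python
def fit_rate(original_duration: float, dubbed_duration: float,
             min_rate: float = 0.8, max_rate: float = 1.25) -> float:
    """Return the playback rate to apply to the dubbed clip so that it
    approximately fits the original segment's duration.

    A rate above 1.0 speeds the dub up; below 1.0 slows it down. The rate
    is clamped so the speech stays intelligible.
    """
    if original_duration <= 0 or dubbed_duration <= 0:
        raise ValueError("durations must be positive")
    rate = dubbed_duration / original_duration
    return max(min_rate, min(max_rate, rate))
```

When the clamp is hit, the dub will still overrun or underrun its slot, so a pipeline would have to shorten the translation or pad with silence instead.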
Supported Audio Formats
- Input: WAV, MP3, FLAC, AAC, OGG, M4A, WMA, AIFF, OPUS
- Output: WAV (24-bit/48kHz), FLAC, MP3 (320kbps), AAC, OGG
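The format lists above translate directly into a pre-flight check on file extensions. A minimal sketch; `is_supported` is a hypothetical helper, and checking the extension alone is an assumption (it does not inspect the actual container or codec).

```python
from pathlib import Path

# Extensions taken from the input/output lists above.
INPUT_FORMATS = {"wav", "mp3", "flac", "aac", "ogg", "m4a", "wma", "aiff", "opus"}
OUTPUT_FORMATS = {"wav", "flac", "mp3", "aac", "ogg"}


def is_supported(path: str, formats: set) -> bool:
    """Return True if the file extension (case-insensitive) is in `formats`."""
    return Path(path).suffix.lower().lstrip(".") in formats
```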
Usage Examples
# Transcribe meeting recording
action: transcribe
input: meeting_2026-06-13.wav
language: zh
speakers: 4
output: meeting_transcript.srt
diarization: true
# Podcast production
action: podcast_pipeline
input: raw_interview.wav
host_voice: host_profile.json
guest_voice: guest_sample.wav
intro_music: intro.mp3
output: episode_042_final.mp3
chapters: auto
show_notes: true
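The requests above are plain `key: value` lines with `#` comments. The skill does not document how they are parsed, so the following is only a sketch of one way to read such a request into a dictionary; `parse_request` and its type coercion rules are assumptions.

```python
def parse_request(text: str) -> dict:
    """Parse 'key: value' lines into a dict, skipping blank and '#' lines.

    'true'/'false' become booleans and all-digit values become integers;
    everything else stays a string.
    """
    request = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got: {line!r}")
        key, value = key.strip(), value.strip()
        if value.lower() in ("true", "false"):
            request[key] = value.lower() == "true"
        elif value.isdigit():
            request[key] = int(value)
        else:
            request[key] = value
    return request
```

Parsing the first example this way gives `speakers` as the integer `4` and `diarization` as `True`, while file names such as `meeting_2026-06-13.wav` remain strings.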
Safe usage recommendations
Installers should treat this as a broad audio-production assistant. Use it only on recordings and voice samples you are authorized to process, review transcripts and generated audio before sharing, and confirm any RSS or publishing output manually.
Capability assessment
Purpose & Capability
The skill advertises ASR, diarization, voice synthesis/cloning, audio restoration, podcast production, and RSS generation; these are coherent with an AI audio studio, though several involve personal or biometric audio data.
Instruction Scope
The runtime instructions are examples for user-directed audio workflows and do not include hidden commands, prompt overrides, automatic execution, or unrelated behavior.
Install Mechanism
The artifact is a single SKILL.md file with dependency requirements for Python, ffmpeg, and torch; no installer scripts or executable payloads are present.
Credentials
Audio libraries and local model tooling are proportionate to the advertised audio-processing purpose. No broad filesystem indexing, credential access, or network exfiltration is described.
Persistence & Privilege
No persistence mechanism, privilege escalation, background worker, account access, or credential/session handling appears in the artifacts.
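How to use
- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install ai-audio-processing
- Once installation finishes, invoke the Skill by name or trigger it with /ai-audio-processing.
- Provide the inputs described in the Skill's parameter notes to get structured output.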
Version history
v1.0.0
AI Audio Processing Studio v1.0.0 – Initial Release
- Introduces a comprehensive AI-powered audio processing toolkit covering ASR, TTS (with emotion control), audio restoration/enhancement, music information retrieval, auto-mixing/mastering, podcast production, and real-time translation dubbing.
- Supports advanced features like multi-language transcription, speaker diarization, voice cloning, audio upscaling, instrument separation, and genre classification.
- Compatible with leading models (Whisper, Bark, OpenVoice, Demucs) and DAWs such as Ableton, Logic, and Reaper.
- Enables streamlined podcast workflow and text-based audio editing.
- Offers broad audio format compatibility for both input and output.
- Requires Python ≥3.10, ffmpeg, and torch ≥2.0.
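FAQ
What is AI Audio Processing Studio (ai-audio-processing)?
An AI-driven, full-stack audio processing skill. It covers speech-to-text (multilingual ASR), text-to-speech (TTS with emotion control), audio noise reduction and restoration, music information retrieval (MIR), automatic mixing and mastering, a podcast production pipeline, and real-time translation dubbing. It supports models such as Whisper, Bark, OpenVoice, and Demucs, and is compatible with DAW workflows (Ableton/Logic/Reaper). It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 47 total downloads to date.
How do I install AI Audio Processing Studio?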
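Run the command "/install ai-audio-processing" in the OpenClaw or Claude Code chat box to install it in one step. The install itself needs no extra configuration, though the release notes list Python ≥3.10, ffmpeg, and torch ≥2.0 as requirements.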
Is AI Audio Processing Studio free?
Yes. AI Audio Processing Studio is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.
Which platforms does AI Audio Processing Studio support?
AI Audio Processing Studio is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who develops AI Audio Processing Studio?
It is developed and maintained by ai-gaoqian (@ai-gaoqian); the current version is v1.0.0.