tenequm

Audio Recording Quality Analyzer

Author: Misha Kolesnik · GitHub · v0.1.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 102 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw
/install audio-quality-check
Description
Analyze audio recording quality - echo detection, loudness, speech intelligibility, SNR, spectral analysis. Use when the user wants to check a recording's qu...
Usage (SKILL.md)

Audio Recording Quality Analyzer

Comprehensive audio quality analysis for call recordings. Handles dual-track M4A files (system audio + mic), single-track recordings, and AEC-processed files.

Quick Start

Run the bundled analysis script on a recording directory:

python <skill-path>/scripts/analyze_recording.py "/path/to/recording/directory"

Modes for focused analysis:

python <skill-path>/scripts/analyze_recording.py /path --tracks   # track info only
python <skill-path>/scripts/analyze_recording.py /path --echo     # echo detection only
python <skill-path>/scripts/analyze_recording.py /path --quality  # quality metrics (skip echo)

For Blackbox recordings, the directory is typically: ~/Library/Application Support/Blackbox/Recordings/<timestamp-id>/

Dependencies

System: ffmpeg, ffprobe (brew install ffmpeg)
Python: numpy, soundfile, scipy, pyloudnorm, pesq, pystoi, librosa

Install all Python deps: pip3 install numpy soundfile scipy pyloudnorm pesq pystoi librosa

What Each Metric Tells You

EBU R128 Loudness (pyloudnorm)

  • What: Perceptual loudness in LUFS (Loudness Units relative to Full Scale)
  • Target: -16 to -24 LUFS for speech
  • Watch for: AEC/post-processed tracks being significantly louder than originals (indicates the processing is amplifying without normalizing)
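
A minimal sketch of the loudness check described above, assuming `pyloudnorm` and `soundfile` are installed. The function names `measure_lufs`, `in_speech_target`, and `processing_gain_lu` are hypothetical helpers for illustration and are not taken from the bundled script:

```python
def measure_lufs(path):
    """Integrated loudness (LUFS) of an audio file, measured with pyloudnorm."""
    # Imported lazily so the pure-Python helpers below work without these packages.
    import soundfile as sf
    import pyloudnorm as pyln
    data, rate = sf.read(path)
    return pyln.Meter(rate).integrated_loudness(data)

def in_speech_target(lufs, quietest=-24.0, loudest=-16.0):
    """True when the value falls inside the -24 to -16 LUFS speech target."""
    return quietest <= lufs <= loudest

def processing_gain_lu(original_lufs, processed_lufs):
    """Positive result: the processed track is louder than the original."""
    return processed_lufs - original_lufs
```

For example, `processing_gain_lu(measure_lufs("/tmp/mic.wav"), measure_lufs("/tmp/mic_aec.wav"))` would report how many loudness units the processing added; the second path is a placeholder.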

Echo Detection - Autocorrelation

  • What: Detects delayed copies of the signal within a single track by correlating the signal with itself at various time offsets
  • How to read: Peaks in the 20-100ms range with correlation > 0.3 indicate signal duplication. The lag tells you the delay of the duplicate copy
  • Key insight: If you see a consistent peak at the same lag across multiple time segments, that's a systematic duplication (e.g., a virtual audio processor like Krisp introducing a delayed copy at ~53ms)
  • Normal values: Peaks below 0.15 are typically speech pitch harmonics (harmless). Peaks above 0.3 at consistent lags are echo
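
To make the thresholds above concrete, here is a synthetic sketch using only numpy: white noise plus one delayed copy of itself. For noise-like input, a copy with gain g produces an autocorrelation peak of roughly g / (1 + g²), so a gain of 0.5 lands near 0.4, above the 0.3 threshold. The signal, the 53 ms delay, and the function names are illustrative assumptions, not the bundled script's code:

```python
import numpy as np

def autocorr_fft(x):
    """Normalized autocorrelation for non-negative lags, computed via FFT."""
    x = x - np.mean(x)
    n = len(x)
    nfft = 1 << (2 * n - 1).bit_length()  # zero-pad so the result is not circular
    spec = np.fft.rfft(x, nfft)
    ac = np.fft.irfft(spec * np.conj(spec), nfft)[:n]
    return ac / ac[0]

def strongest_echo(x, sr, min_ms=20.0, max_ms=100.0):
    """Return (lag in ms, correlation) of the largest peak in the echo window."""
    ac = autocorr_fft(x)
    lo = int(min_ms / 1000 * sr)
    hi = int(max_ms / 1000 * sr)
    k = lo + int(np.argmax(ac[lo:hi]))
    return k / sr * 1000.0, float(ac[k])

sr = 16000
rng = np.random.default_rng(0)
dry = rng.standard_normal(2 * sr)   # 2 s of white noise stands in for audio
delay = round(0.053 * sr)           # 53 ms delayed copy
gain = 0.5
wet = dry.copy()
wet[delay:] += gain * dry[:-delay]

lag_ms, r = strongest_echo(wet, sr)
dry_lag_ms, dry_r = strongest_echo(dry, sr)
print(f"with copy:    peak at {lag_ms:.1f} ms, r={r:.2f}")
print(f"without copy: strongest r={dry_r:.2f}")
```

Real speech is not white, so pitch harmonics add small peaks of their own; that is why the text treats values below 0.15 as harmless.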

Cross-Track Correlation

  • What: Measures how much one track's content appears in another (e.g., system audio bleeding into the mic track)
  • How to read: Values near 0 mean no bleed. Values above 0.1 indicate the mic is picking up system audio
  • Coherence: Frequency-domain version of the same test. Voice-band coherence (300-3400Hz) is most relevant for speech echo
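
The time-domain half of this check can be sketched with numpy alone. The example below builds a hypothetical mic track that contains the system track at 30% level, delayed by 30 ms, and measures the peak normalized cross-correlation. The function name, the lag window, and the synthetic signals are assumptions; frequency-domain coherence (for example via `scipy.signal.coherence`) is not covered here:

```python
import numpy as np

def peak_cross_correlation(ref, mic, sr, max_lag_ms=200.0):
    """Peak normalized cross-correlation of mic against ref for lags 0..max_lag_ms."""
    n = min(len(ref), len(mic))
    ref = ref[:n] - np.mean(ref[:n])
    mic = mic[:n] - np.mean(mic[:n])
    nfft = 1 << (2 * n - 1).bit_length()  # zero-pad to avoid circular wrap-around
    # xc[k] = sum over t of ref[t] * mic[t + k]
    xc = np.fft.irfft(np.conj(np.fft.rfft(ref, nfft)) * np.fft.rfft(mic, nfft), nfft)
    max_lag = int(max_lag_ms / 1000 * sr)
    xc = xc[:max_lag + 1] / (np.linalg.norm(ref) * np.linalg.norm(mic) + 1e-12)
    k = int(np.argmax(np.abs(xc)))
    return k / sr * 1000.0, float(abs(xc[k]))

sr = 16000
rng = np.random.default_rng(1)
system = rng.standard_normal(2 * sr)   # stand-in for the system audio track
voice = rng.standard_normal(2 * sr)    # stand-in for the local speaker
d = round(0.030 * sr)                  # 30 ms acoustic path
mic_bleed = voice.copy()
mic_bleed[d:] += 0.3 * system[:-d]     # system audio leaking into the mic

bleed_lag_ms, bleed_r = peak_cross_correlation(system, mic_bleed, sr)
clean_lag_ms, clean_r = peak_cross_correlation(system, voice, sr)
print(f"bleed: lag={bleed_lag_ms:.1f} ms, r={bleed_r:.2f}; clean: r={clean_r:.2f}")
```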

PESQ - Speech Quality (requires reference + degraded)

  • What: ITU-T P.862 standard. Gives a MOS (Mean Opinion Score) comparing a degraded signal against a reference
  • Scale: 1.0 (bad) to 4.5 (excellent). NB = narrowband (phone quality), WB = wideband
  • Use for: Comparing AEC-processed mic vs original mic to see if processing helps or hurts
  • Thresholds: 4.0+ excellent, 3.0+ good, 2.5-3.0 fair, <2.5 poor

STOI - Speech Intelligibility (requires reference + degraded)

  • What: Short-Time Objective Intelligibility. Measures how understandable speech remains after processing
  • Scale: 0.0 to 1.0
  • Thresholds: >0.8 good, >0.6 fair, <0.6 poor
  • Key insight: If STOI drops significantly between original and processed, the processing is degrading intelligibility
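
A hedged sketch of how the PESQ and STOI sections could be applied to a pair of 16 kHz mono arrays, assuming the `pesq` and `pystoi` packages from the dependency list. `score_pair`, `rate_pesq`, and `rate_stoi` are hypothetical names; treating the original mic as the reference and labeling a STOI of exactly 0.6 as poor are assumptions, since the text leaves both open:

```python
import numpy as np

def rate_pesq(mos):
    """Label a PESQ MOS using the thresholds listed above."""
    if mos >= 4.0:
        return "excellent"
    if mos >= 3.0:
        return "good"
    if mos >= 2.5:
        return "fair"
    return "poor"

def rate_stoi(score):
    """Label a STOI score; exactly 0.6 is treated as poor (an assumption)."""
    if score > 0.8:
        return "good"
    if score > 0.6:
        return "fair"
    return "poor"

def score_pair(reference, degraded, sr=16000):
    """Return (wideband PESQ, STOI) for two mono float arrays sampled at sr."""
    # Imported lazily so the rating helpers work without the native packages.
    from pesq import pesq
    from pystoi import stoi
    n = min(len(reference), len(degraded))  # STOI needs equal-length input
    ref = np.asarray(reference[:n], dtype=np.float64)
    deg = np.asarray(degraded[:n], dtype=np.float64)
    return pesq(sr, ref, deg, "wb"), stoi(ref, deg, sr, extended=False)
```

Wideband PESQ in the `pesq` package needs 16 kHz input, which matches the `-ar 16000` extraction used elsewhere in this document. Both metrics assume the two signals are time-aligned.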

Spectral Analysis (librosa)

  • Centroid: Average frequency weighted by amplitude. Higher = brighter/harsher audio
  • Rolloff (85%): Frequency below which 85% of spectral energy sits. Lower = more bass-heavy
  • Zero-crossing rate: How often the signal crosses zero. Higher = noisier signal. Speech is typically 0.05-0.20; values above 0.30 suggest significant noise
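
The three definitions above can be illustrated with a whole-signal numpy approximation. This is a sketch, not the librosa implementation: librosa computes these features per frame, so its numbers will not match exactly. The function name is hypothetical, and the rolloff here accumulates power to follow the "energy" wording:

```python
import numpy as np

def spectral_summary(x, sr, rolloff_pct=0.85):
    """Return (centroid Hz, rolloff Hz, zero-crossing rate) for a mono signal."""
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    # Centroid: amplitude-weighted mean frequency
    centroid = float(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))
    # Rolloff: first frequency at which cumulative power reaches rolloff_pct of the total
    cum = np.cumsum(mag ** 2)
    rolloff = float(freqs[np.searchsorted(cum, rolloff_pct * cum[-1])])
    # Zero-crossing rate: fraction of adjacent sample pairs whose sign differs
    signs = np.signbit(x)
    zcr = float(np.mean(signs[1:] != signs[:-1]))
    return centroid, rolloff, zcr

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t + 0.1)   # 1 kHz test tone
centroid, rolloff, zcr = spectral_summary(tone, sr)
print(f"centroid={centroid:.0f} Hz, rolloff={rolloff:.0f} Hz, zcr={zcr:.3f}")
```

A pure 1 kHz tone crosses zero twice per cycle, so at 16 kHz its rate is about 2000 / 16000 = 0.125, inside the range quoted for speech.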

SNR - Signal-to-Noise Ratio

  • What: Ratio of speech energy to background noise energy (estimated via energy-based VAD)
  • Thresholds: >20dB excellent, >15dB good, >10dB fair, <10dB poor
  • Note: This measures background noise, not echo. A recording can have excellent SNR but still have echo problems
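
One way an energy-based VAD estimate can work is sketched below with numpy. The frame length, the 10th-percentile noise floor, and the 6 dB margin are assumptions chosen for illustration; the bundled script's actual VAD parameters are not shown in this document:

```python
import numpy as np

def estimate_snr_db(x, sr, frame_ms=20.0, margin_db=6.0):
    """Estimate SNR by splitting frames into speech and noise by energy."""
    frame = int(sr * frame_ms / 1000)
    n_frames = len(x) // frame
    frames = x[:n_frames * frame].reshape(n_frames, frame)
    power = np.mean(frames ** 2, axis=1) + 1e-12
    power_db = 10 * np.log10(power)
    floor_db = np.percentile(power_db, 10)        # assumed noise floor
    is_speech = power_db > floor_db + margin_db   # frames clearly above the floor
    if not is_speech.any() or is_speech.all():
        return float("nan")                       # nothing to compare against
    noise_power = np.mean(power[~is_speech])
    # Speech frames contain speech plus noise, so subtract the noise estimate
    speech_power = max(np.mean(power[is_speech]) - noise_power, 1e-12)
    return float(10 * np.log10(speech_power / noise_power))

def rate_snr(snr_db):
    """Label an SNR value using the thresholds listed above."""
    if snr_db > 20:
        return "excellent"
    if snr_db > 15:
        return "good"
    if snr_db > 10:
        return "fair"
    return "poor"

sr = 16000
rng = np.random.default_rng(2)
t = np.arange(4 * sr) / sr
x = 0.01 * rng.standard_normal(len(t))   # noise floor at RMS 0.01
half = len(t) // 2
x[half:] += 0.1 * np.sqrt(2) * np.sin(2 * np.pi * 440 * t[half:])  # "speech" at RMS 0.1
snr = estimate_snr_db(x, sr)
print(f"estimated SNR: {snr:.1f} dB ({rate_snr(snr)})")
```

The synthetic signal is built with a 20 dB ratio between the tone and the noise, so the estimate should land close to that figure.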

Per-Minute Energy

  • What: RMS energy and voice-band energy per minute of recording
  • Use for: Spotting segments that went silent (mic cut out), got unexpectedly loud (clipping risk), or had activity patterns that help identify when speakers were active
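
A small numpy sketch of the per-minute check, covering RMS only (voice-band energy is omitted). The function names and the 20 dB drop threshold are assumptions for illustration:

```python
import numpy as np

def per_minute_rms_db(x, sr):
    """RMS level in dBFS for each full or partial minute of a mono signal."""
    step = 60 * sr
    levels = []
    for start in range(0, len(x), step):
        chunk = x[start:start + step]
        rms = np.sqrt(np.mean(chunk ** 2))
        levels.append(float(20 * np.log10(rms + 1e-10)))
    return levels

def flag_dropouts(levels_db, drop_db=20.0):
    """Indices of minutes that sit more than drop_db below the median minute."""
    median = float(np.median(levels_db))
    return [i for i, lv in enumerate(levels_db) if lv < median - drop_db]

sr = 8000
rng = np.random.default_rng(3)
audio = 0.1 * rng.standard_normal(3 * 60 * sr)   # three minutes of synthetic audio
audio[60 * sr:120 * sr] *= 0.001                 # minute 1 is almost silent
levels = per_minute_rms_db(audio, sr)
dropouts = flag_dropouts(levels)
print([round(lv, 1) for lv in levels], "dropouts at minute index:", dropouts)
```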

Manual Analysis Recipes

When you need analysis beyond what the script provides, these patterns are useful.

Extract individual tracks from dual-track M4A

ffmpeg -y -i audio.m4a -map 0:0 -ac 1 -ar 16000 /tmp/system.wav
ffmpeg -y -i audio.m4a -map 0:1 -ac 1 -ar 16000 /tmp/mic.wav
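
The same two commands can be driven from Python with list-based arguments, which avoids shell quoting problems with paths that contain spaces. This is a sketch under the assumption that track 0 is system audio and track 1 is the mic, as in the commands above; `extract_cmd` and `extract_tracks` are hypothetical helper names:

```python
import subprocess
from pathlib import Path

def extract_cmd(src, track, dst, rate=16000):
    """Build the ffmpeg argument list for one mono track at the given sample rate."""
    return ["ffmpeg", "-y", "-i", str(src), "-map", f"0:{track}",
            "-ac", "1", "-ar", str(rate), str(dst)]

def extract_tracks(src, out_dir, names=("system", "mic")):
    """Extract each track to <out_dir>/<name>.wav and return the output paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for track, name in enumerate(names):
        dst = out_dir / f"{name}.wav"
        # check=True raises CalledProcessError if ffmpeg exits with an error
        subprocess.run(extract_cmd(src, track, dst), check=True, capture_output=True)
        outputs.append(dst)
    return outputs
```

Calling `extract_tracks("audio.m4a", "/tmp")` would write `/tmp/system.wav` and `/tmp/mic.wav`, provided ffmpeg is on the PATH.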

Quick level check with sox (a separate install; reports peak and RMS amplitude statistics, not LUFS)

sox audio.wav -n stat 2>&1

Check specific time range for echo (Python)

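import numpy as np
import soundfile as sf
from scipy import signal

data, sr = sf.read('/tmp/system.wav')
if data.ndim > 1:
    data = data.mean(axis=1)  # downmix if the file is not already mono
# Analyze 5 seconds starting at 2 minutes
start = 120 * sr
seg = data[start:start + 5*sr]
if len(seg) == 0:
    raise SystemExit("Recording is shorter than 2 minutes; pick an earlier start")
seg_norm = seg / (np.max(np.abs(seg)) + 1e-10)
# FFT-based correlation matches np.correlate up to rounding and is much faster on long segments
autocorr = signal.correlate(seg_norm, seg_norm, mode='full', method='fft')
mid = len(seg_norm) - 1  # index of zero lag
autocorr = autocorr / (autocorr[mid] + 1e-10)
# Check 20-100ms range for echo peaks
min_lag = int(0.020 * sr)
max_lag = int(0.100 * sr)
region = autocorr[mid + min_lag:mid + max_lag]
peaks, props = signal.find_peaks(region, height=0.1)
# Report the first five peaks in lag order
for i, p in enumerate(peaks[:5]):
    lag_ms = (p + min_lag) / sr * 1000
    print(f"  Peak at {lag_ms:.1f}ms, r={props['peak_heights'][i]:.3f}")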

Common Issues and What Causes Them

| Symptom | Likely cause | What to check |
| --- | --- | --- |
| Speakers sound slightly doubled/echoed | Virtual audio processor (Krisp) creating delayed copy in system audio | Autocorrelation: consistent peak at 40-60ms |
| Mic track has remote speakers' voices | Acoustic echo (speakers to mic) | Cross-track correlation > 0.1 |
| AEC-processed file sounds worse | DTLN-aec degrading signal quality | PESQ/STOI comparing original vs processed |
| AEC-processed file is too loud | Missing loudness normalization after processing | Loudness: processed > -10 LUFS |
| Recording has hiss/noise | Low SNR, noisy mic, or AGC artifacts | SNR < 15dB, high zero-crossing rate |
| Quiet segments mid-recording | Mic cut out or device changed | Per-minute energy: sudden RMS drop |
Security recommendations
This skill is coherent for local audio analysis. Before running it: (1) review the included script if you have doubts; (2) ensure ffmpeg/ffprobe are installed (SKILL.md requires them even though registry metadata didn't list them); (3) install Python dependencies in a controlled environment (virtualenv/container) because packages like pesq and librosa have native dependencies; (4) only point the script at directories/files you trust—it will read any files in the provided path; and (5) if you need offline assurance, run the script in a sandboxed VM to avoid accidental access to sensitive filesystem locations. There are no signs of network exfiltration or secret access in the files provided.
Functional analysis
Type: OpenClaw Skill · Name: audio-quality-check · Version: 0.1.0
The skill bundle provides a legitimate audio quality analysis tool for call recordings. The primary script, `scripts/analyze_recording.py`, uses standard signal processing libraries (numpy, scipy, librosa) and system utilities (ffmpeg, ffprobe) to detect echo, measure loudness, and estimate speech quality. It follows secure coding practices by using list-based arguments in `subprocess.run` to prevent shell injection and utilizes `tempfile` for safe temporary data handling. No evidence of data exfiltration, persistence, or malicious prompt injection was found.
Capability assessment
Purpose & Capability
Name/description align with the included script and instructions: the code and docs implement echo detection, loudness, PESQ/STOI, spectral measures, and SNR. Minor inconsistency: the SKILL.md lists system dependencies (ffmpeg, ffprobe) but the registry 'Required binaries' field is empty. That is likely an oversight but worth noting.
Instruction Scope
SKILL.md and the script focus on analyzing audio files in a provided recording directory. The runtime instructions direct the agent to run the bundled Python script against a local path; the script reads files in that path and invokes ffprobe/ffmpeg. There are no instructions to collect system-wide data, read unrelated config, or transmit results to external endpoints.
Install Mechanism
No install spec (instruction-only) and one included Python script — lowest install risk. However, SKILL.md requires several third‑party Python packages (numpy, librosa, pesq, pystoi, pyloudnorm, etc.) which the user must pip-install; these are normal but increase attack surface if installed from untrusted environments. No downloads from arbitrary URLs are included.
Credentials
The skill requests no environment variables or credentials and the script does not read secrets or unrelated environment config. The abilities requested (filesystem read of user-specified recording directory and use of ffmpeg/ffprobe) are proportionate to audio analysis.
Persistence & Privilege
The skill does not request persistent or elevated privileges; always:false and no installation steps that modify other skills or global agent config. It runs only when invoked.
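How to use
  1. Make sure OpenClaw is installed (local or Docker deployment)
  2. Enter the install command in the chat box: /install audio-quality-check
  3. Once installed, invoke the skill by name or trigger it with /audio-quality-check
  4. Provide the inputs described in the skill's parameter notes to get structured output
Version history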
v0.1.0
Initial publish of audio-quality-check 0.1.0. Changes: - bootstrap publish of current skill contents
Metadata
Slug audio-quality-check
Version 0.1.0
License MIT-0
Total installs 0
Current installs 0
Versions published 1
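FAQ

What is Audio Recording Quality Analyzer?

Analyze audio recording quality - echo detection, loudness, speech intelligibility, SNR, spectral analysis. Use when the user wants to check a recording's qu... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 102 total downloads so far.

How do I install Audio Recording Quality Analyzer?

Run the command "/install audio-quality-check" in the OpenClaw or Claude Code chat box to install it in one step. The skill itself needs no further configuration, but the dependencies listed in SKILL.md (ffmpeg, ffprobe, and the Python packages) must be installed separately.

Is Audio Recording Quality Analyzer free?

Yes. Audio Recording Quality Analyzer is completely free under the MIT-0 license and can be downloaded, installed, and used without restriction.

Which platforms does Audio Recording Quality Analyzer support?

Audio Recording Quality Analyzer is listed as cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Audio Recording Quality Analyzer?

It is developed and maintained by Misha Kolesnik (@tenequm). The current version is v0.1.0.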
