/install asr-skills
ASR Transcription Skill
Provide local audio/video transcription with speaker diarization, multiple output formats, and progress indication.
Purpose
Enable users to transcribe audio and video files to text with automatic speaker identification, supporting multiple subtitle formats while preserving privacy through local processing.
When to Use
This skill triggers when the user:
- Wants to transcribe an audio file (MP3, WAV, M4A, FLAC)
- Wants to transcribe a video file (MP4, AVI, MKV)
- Needs subtitles or captions generated from media
- Wants to identify different speakers in audio
- Needs timestamped transcription output
Quick Start
Basic Transcription
# Transcribe audio file (outputs TXT by default)
python3 skills/asr/scripts/transcribe.py path/to/audio.mp3
# Transcribe video file
python3 skills/asr/scripts/transcribe.py path/to/video.mp4
Output Formats
python3 skills/asr/scripts/transcribe.py audio.mp3 -f json # Structured JSON with metadata
python3 skills/asr/scripts/transcribe.py audio.mp3 -f srt # SubRip subtitles
python3 skills/asr/scripts/transcribe.py audio.mp3 -f ass # ASS/SSA subtitles with speaker styling
python3 skills/asr/scripts/transcribe.py audio.mp3 -f md # Markdown with speaker sections
Python API
from asr_skill import transcribe
result = transcribe("meeting.mp4", format="srt")
print(f"Output: {result['output_path']}")
print(f"Speakers: {result.get('speakers', [])}")
Asynchronous Execution (Recommended for Long Files)
Avoid timeouts by running transcription in the background:
# Start async task
python3 skills/asr/scripts/transcribe.py long_video.mp4 --async
# Output: {"task_id": "a1b2c3d4", "status": "queued", ...}
# Check status
python3 skills/asr/scripts/transcribe.py --status a1b2c3d4
# Output: {"task_id": "a1b2c3d4", "status": "processing", "progress": 45, ...}
# List recent tasks
python3 skills/asr/scripts/transcribe.py --list
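The status check above can be wrapped in a small polling loop. The following is a minimal sketch, not part of the skill itself. It assumes the CLI prints a single JSON object on stdout, and the terminal status names `completed` and `failed` are hypothetical, since only `queued` and `processing` appear in the examples above.

```python
import json
import subprocess
import time

SCRIPT = "skills/asr/scripts/transcribe.py"  # CLI path used in the examples above
# Assumption: terminal status names are not documented here; adjust as needed.
TERMINAL_STATUSES = {"completed", "failed"}


def query_status(task_id):
    """Run `--status <task_id>` and parse the JSON the CLI is assumed to print."""
    proc = subprocess.run(
        ["python3", SCRIPT, "--status", task_id],
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)


def wait_for_task(task_id, query=query_status, interval=5.0, max_polls=720):
    """Poll until the task reaches a terminal status or max_polls is used up."""
    for _ in range(max_polls):
        info = query(task_id)
        if info.get("status") in TERMINAL_STATUSES:
            return info
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

The `query` parameter is injectable so the loop can be exercised without starting a real transcription.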
Core Features
Speaker Diarization
Automatically identifies and labels different speakers:
- Speaker A, Speaker B, Speaker C, etc.
- Per-segment timestamps
- Overlap detection marked with [OVERLAP]
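To illustrate the labeling scheme, here is a minimal sketch of how segments might be rendered with speaker letters and the [OVERLAP] marker. The tuple layout `(start, end, speaker_index, text)` and the overlap rule (a segment that starts before the previous one ends, with a different speaker) are assumptions made for illustration, not the skill's documented behavior.

```python
import string


def speaker_label(index):
    """Map a 0-based speaker index to 'Speaker A', 'Speaker B', ..."""
    letters = string.ascii_uppercase
    if index < len(letters):
        return f"Speaker {letters[index]}"
    return f"Speaker {index + 1}"  # fallback beyond 26 speakers (assumption)


def render_segments(segments):
    """Render (start, end, speaker_index, text) tuples, marking overlaps.

    A segment is tagged [OVERLAP] when it starts before the furthest-reaching
    earlier segment has ended and a different speaker is talking.
    """
    lines = []
    prev_end, prev_speaker = None, None
    for start, end, speaker, text in sorted(segments):
        overlap = prev_end is not None and start < prev_end and speaker != prev_speaker
        tag = " [OVERLAP]" if overlap else ""
        lines.append(f"[{start:.2f}-{end:.2f}] {speaker_label(speaker)}{tag}: {text}")
        if prev_end is None or end > prev_end:
            prev_end, prev_speaker = end, speaker
    return lines
```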
Hardware Auto-Detection
Detects and uses the best available hardware:
- CUDA GPU (NVIDIA)
- Apple MPS (Apple Silicon)
- CPU fallback with notification
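The detection order above could look roughly like the following sketch. It assumes PyTorch is the backend being probed; the function name `pick_device` is hypothetical and the skill's actual detection code may differ.

```python
def pick_device():
    """Return 'cuda', 'mps', or 'cpu', preferring accelerators in that order."""
    try:
        import torch
    except ImportError:
        print("PyTorch not installed; using CPU")
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is missing on older PyTorch builds, so probe defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    print("No GPU detected; falling back to CPU")
    return "cpu"
```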
Long Audio Support
Handles audio files longer than 1 hour:
- VAD-based intelligent segmentation
- Memory-efficient processing
- Progress indication during transcription
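One way VAD-based segmentation can keep memory bounded is to group detected speech intervals into chunks that are cut only at silences. The sketch below illustrates that idea under stated assumptions: the `(start, end)` interval format in seconds and the 30-second chunk limit are hypothetical, and this is not the skill's actual segmentation code.

```python
def group_speech_intervals(intervals, max_chunk_s=30.0):
    """Group (start, end) speech intervals into chunks of at most max_chunk_s.

    Chunk boundaries always fall in the silence between intervals, so no
    speech interval is cut in half. An interval longer than max_chunk_s
    becomes its own chunk.
    """
    chunks = []
    current = []
    for start, end in sorted(intervals):
        # Close the current chunk if adding this interval would exceed the limit.
        if current and end - current[0][0] > max_chunk_s:
            chunks.append((current[0][0], current[-1][1]))
            current = []
        current.append((start, end))
    if current:
        chunks.append((current[0][0], current[-1][1]))
    return chunks
```

Progress can then be reported as the number of processed chunks over the total.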
Multiple Output Formats
| Format | Extension | Use Case |
|---|---|---|
| txt | .txt | Plain text with timestamps |
| json | .json | Structured data with word-level info |
| srt | .srt | Video subtitles |
| ass | .ass | Styled subtitles |
| md | .md | Documentation with speaker sections |
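As an illustration of the srt row, the sketch below renders segments in the SubRip layout (a sequence number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time line, then the text). The tuple layout and the `Speaker: text` prefix are assumptions; references/output-formats.md is the authority on what the skill actually emits.

```python
def srt_timestamp(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start, end, speaker, text) tuples as an SRT document."""
    blocks = []
    for number, (start, end, speaker, text) in enumerate(segments, start=1):
        blocks.append(
            f"{number}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{speaker}: {text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```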
Implementation Details
Processing Pipeline
- Input validation - Check file exists and format supported
- Hardware detection - Auto-detect GPU/MPS/CPU
- Video extraction - Extract audio from video files via FFmpeg
- Audio preprocessing - Resample to 16kHz mono
- Model loading - Load FunASR models (cached locally)
- Transcription - Run ASR with speaker diarization
- Formatting - Output in requested format
- Cleanup - Remove temporary files
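The first steps of the pipeline can be sketched as follows. This is an illustration under assumptions, not the skill's code: the helper names are hypothetical, the extension sets come from the "When to Use" list, and the FFmpeg flags (`-vn`, `-ac 1`, `-ar 16000`) are the standard options for dropping video and producing 16 kHz mono audio.

```python
from pathlib import Path

AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".flac"}
VIDEO_EXTS = {".mp4", ".avi", ".mkv"}


def classify_input(path):
    """Return 'audio' or 'video', or raise if the file is missing or unsupported."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"input file not found: {p}")
    ext = p.suffix.lower()
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    raise ValueError(f"unsupported format: {ext or '(none)'}")


def ffmpeg_to_wav_cmd(src, dst, ffmpeg="ffmpeg"):
    """Build an FFmpeg command that writes 16 kHz mono audio from src to dst."""
    return [
        ffmpeg, "-y",        # overwrite dst if it exists
        "-i", str(src),
        "-vn",               # drop any video stream
        "-ac", "1",          # mono
        "-ar", "16000",      # 16 kHz sample rate
        str(dst),
    ]
```

When FFmpeg comes from imageio-ffmpeg, the path returned by `imageio_ffmpeg.get_ffmpeg_exe()` can be passed as the `ffmpeg` argument, and the resulting list handed to `subprocess.run`.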
Model Components
- ASR Model: Paraformer-large (Chinese optimized)
- VAD Model: FSMN-VAD (voice activity detection)
- Punctuation: CT-Transformer
- Speaker: CAM++ (speaker diarization)
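For orientation, these four components can be combined through FunASR's `AutoModel`. The sketch below assumes FunASR's short model aliases (`paraformer-zh`, `fsmn-vad`, `ct-punc`, `cam++`); whether the skill uses these exact aliases, pins revisions, or passes other options is not stated here, so treat the values as placeholders.

```python
# Assumption: FunASR short aliases; the skill may pin different IDs or revisions.
MODEL_KWARGS = {
    "model": "paraformer-zh",   # Paraformer-large ASR
    "vad_model": "fsmn-vad",    # FSMN-VAD voice activity detection
    "punc_model": "ct-punc",    # CT-Transformer punctuation
    "spk_model": "cam++",       # CAM++ speaker model
}


def load_pipeline(device="cpu"):
    """Build a combined ASR + VAD + punctuation + speaker pipeline."""
    # Imported lazily: FunASR is heavy, and the first call downloads the models.
    from funasr import AutoModel
    return AutoModel(device=device, **MODEL_KWARGS)
```

A call such as `load_pipeline().generate(input="meeting.wav")` would then run the full chain on one file; the structure of the returned result is best checked against the FunASR documentation.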
File Locations
- Models cached in: ./models/
- Output defaults to: same directory as input
- Temp files: auto-cleaned after processing
Troubleshooting
Common Issues
"FFmpeg not found"
- FFmpeg is installed automatically via imageio-ffmpeg
- Check your internet connection on the first run
"CUDA out of memory"
- System falls back to CPU automatically
- Try shorter audio segments
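The automatic fallback can be pictured as a retry wrapper like the one below. This is a generic sketch rather than the skill's implementation: `transcribe_on` is a hypothetical callable that takes a device name, and the check relies on PyTorch raising a `RuntimeError` (or subclass) whose message contains "out of memory".

```python
def run_with_cpu_fallback(transcribe_on, device):
    """Call transcribe_on(device); retry once on CPU after a GPU out-of-memory error."""
    try:
        return transcribe_on(device)
    except RuntimeError as exc:
        # Re-raise anything that is not an OOM error, or that already ran on CPU.
        if device == "cpu" or "out of memory" not in str(exc).lower():
            raise
        print("GPU out of memory; retrying on CPU")
        return transcribe_on("cpu")
```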
"No speakers detected"
- Speaker diarization requires multi-speaker audio
- Single speaker audio shows "Speaker A" only
Additional Resources
Reference Files
For detailed format specifications:
- references/output-formats.md - Complete format documentation
Scripts
Utility scripts for batch processing:
- scripts/transcribe.py - Batch transcription script
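If the script is invoked once per file, a folder can be processed with a small driver like this sketch. It only uses the CLI form shown in Quick Start (`<file> -f <format>`); whether transcribe.py also has a built-in batch mode is not stated here, and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path

SCRIPT = "skills/asr/scripts/transcribe.py"
MEDIA_EXTS = {".mp3", ".wav", ".m4a", ".flac", ".mp4", ".avi", ".mkv"}


def batch_commands(folder, fmt="srt"):
    """Return one CLI command per supported media file in folder, sorted by name."""
    files = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in MEDIA_EXTS
    )
    return [["python3", SCRIPT, str(p), "-f", fmt] for p in files]


def run_batch(folder, fmt="srt"):
    """Run the commands one at a time; return the files whose run failed."""
    failed = []
    for cmd in batch_commands(folder, fmt):
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[2])
    return failed
```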
Examples
Working examples:
- examples/basic_usage.py - Python API examples
- examples/cli_examples.sh - CLI usage examples
Requirements
- Python >= 3.10
- FunASR (auto-installed)
- FFmpeg (auto-installed via imageio-ffmpeg for video)
Notes
- First run downloads models (~1GB total)
- All processing happens locally for privacy
- v1 is optimized for Chinese
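Installation
- Make sure OpenClaw is installed (local or Docker deployment)
- Enter the install command in the chat box: /install asr-skills
- Once installation finishes, invoke the Skill by name or trigger it with /asr-skills
- Provide the inputs described in the Skill's parameter documentation to get structured output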
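What is asr-skill?
This skill should be used when the user asks to "transcribe audio", "transcribe video", "convert speech to text", "generate subtitles", "create captions", "i... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 144 times to date.
How do I install asr-skill?
Run the command "/install asr-skills" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.
Is asr-skill free?
Yes. asr-skill is completely free and released under the MIT-0 license; it can be downloaded, installed, and used freely.
Which platforms does asr-skill support?
asr-skill is cross-platform and runs in any environment where OpenClaw or Claude Code is deployed.
Who develops asr-skill?
It is developed and maintained by lgwanai (@lgwanai). The current version is v1.0.0.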