lgwanai

asr-skill

Author: lgwanai · GitHub · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
144 total downloads · 0 favorites · 0 current installs · 1 version
Install in OpenClaw
/install asr-skills
Description
This skill should be used when the user asks to "transcribe audio", "transcribe video", "convert speech to text", "generate subtitles", "create captions", "i...
Usage (SKILL.md)

ASR Transcription Skill

Provide local audio/video transcription with speaker diarization, multiple output formats, and progress indication.

Purpose

Enable users to transcribe audio and video files to text with automatic speaker identification, supporting multiple subtitle formats while preserving privacy through local processing.

When to Use

This skill triggers when the user:

  • Wants to transcribe an audio file (MP3, WAV, M4A, FLAC)
  • Wants to transcribe a video file (MP4, AVI, MKV)
  • Needs subtitles or captions generated from media
  • Wants to identify different speakers in audio
  • Needs timestamped transcription output

Quick Start

Basic Transcription

# Transcribe audio file (outputs TXT by default)
python3 skills/asr/scripts/transcribe.py path/to/audio.mp3

# Transcribe video file
python3 skills/asr/scripts/transcribe.py path/to/video.mp4

Output Formats

python3 skills/asr/scripts/transcribe.py audio.mp3 -f json   # Structured JSON with metadata
python3 skills/asr/scripts/transcribe.py audio.mp3 -f srt    # SubRip subtitles
python3 skills/asr/scripts/transcribe.py audio.mp3 -f ass    # ASS/SSA subtitles with speaker styling
python3 skills/asr/scripts/transcribe.py audio.mp3 -f md     # Markdown with speaker sections

Python API

from asr_skill import transcribe

result = transcribe("meeting.mp4", format="srt")
print(f"Output: {result['output_path']}")
print(f"Speakers: {result.get('speakers', [])}")
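To run the API over a whole folder, a loop like the following could work. It is a sketch, not part of the skill: it assumes `transcribe(path, format=...)` returns a dict with an `output_path` key, as in the snippet above, and the helper names `find_media` and `transcribe_folder` are hypothetical. The `transcribe_fn` parameter lets you pass a stand-in when testing without the models.

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Extensions listed under "When to Use"; the real package may accept others.
SUPPORTED_EXTS = {".mp3", ".wav", ".m4a", ".flac", ".mp4", ".avi", ".mkv"}


def find_media(folder: str) -> List[Path]:
    """Return supported audio/video files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTS
    )


def transcribe_folder(
    folder: str,
    fmt: str = "srt",
    transcribe_fn: Optional[Callable[..., Dict]] = None,
) -> Dict[str, str]:
    """Transcribe every supported file and map input path -> output path."""
    if transcribe_fn is None:
        # Assumes the asr_skill package shipped with this skill is importable.
        from asr_skill import transcribe as transcribe_fn
    results: Dict[str, str] = {}
    for media in find_media(folder):
        result = transcribe_fn(str(media), format=fmt)
        results[str(media)] = result["output_path"]
    return results
```

Files are processed one after another, so a single loaded model can be reused if the underlying API caches it.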

Asynchronous Execution (Recommended for Long Files)

Avoid timeouts by running transcription in the background:

# Start async task
python3 skills/asr/scripts/transcribe.py long_video.mp4 --async
# Output: {"task_id": "a1b2c3d4", "status": "queued", ...}

# Check status
python3 skills/asr/scripts/transcribe.py --status a1b2c3d4
# Output: {"task_id": "a1b2c3d4", "status": "processing", "progress": 45, ...}

# List recent tasks
python3 skills/asr/scripts/transcribe.py --list
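The status output suggests a small task registry. The sketch below shows one way such a registry could work; the skill's actual schema is not documented here. It assumes records live in .asr_skill/tasks.json (the file the security review mentions), that IDs are 8 hex characters like the example above, and that "completed" and "failed" are the terminal statuses. Those two status names and all function names are hypothetical.

```python
import json
import time
import uuid
from pathlib import Path
from typing import Callable, Dict

TERMINAL = {"completed", "failed"}  # assumed terminal status names


def _store(root: str) -> Path:
    # Hypothetical location mirroring the .asr_skill/tasks.json file.
    return Path(root) / ".asr_skill" / "tasks.json"


def _load(store: Path) -> Dict[str, dict]:
    return json.loads(store.read_text()) if store.exists() else {}


def _save(store: Path, tasks: Dict[str, dict]) -> None:
    store.parent.mkdir(parents=True, exist_ok=True)
    store.write_text(json.dumps(tasks, indent=2))


def create_task(root: str, media: str) -> str:
    """Register a queued task and return its 8-character id."""
    store = _store(root)
    tasks = _load(store)
    task_id = uuid.uuid4().hex[:8]
    tasks[task_id] = {"task_id": task_id, "status": "queued", "progress": 0, "input": media}
    _save(store, tasks)
    return task_id


def update_task(root: str, task_id: str, **fields) -> None:
    """Merge new fields (status, progress, ...) into an existing record."""
    store = _store(root)
    tasks = _load(store)
    tasks[task_id].update(fields)
    _save(store, tasks)


def get_status(root: str, task_id: str) -> dict:
    return _load(_store(root))[task_id]


def wait_for(root: str, task_id: str, interval: float = 5.0,
             sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the task reaches a terminal status, like repeated --status calls."""
    while True:
        status = get_status(root, task_id)
        if status["status"] in TERMINAL:
            return status
        sleep(interval)
```

The sketch rewrites the whole JSON file on every update and takes no lock, so two workers updating at once could overwrite each other; file locking or an atomic rename would be needed for concurrent use.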

Core Features

Speaker Diarization

Automatically identifies and labels different speakers:

  • Speaker A, Speaker B, Speaker C, etc.
  • Per-segment timestamps
  • Overlap detection marked with [OVERLAP]
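One way to turn raw diarization output into the labels above is sketched below. It assumes each segment carries a start and end time in milliseconds and an integer speaker ID under `spk`; those field names are illustrative, not the skill's documented format. Speakers are lettered in order of first appearance, and a segment is marked [OVERLAP] when its time range intersects a segment from a different speaker.

```python
from string import ascii_uppercase
from typing import Dict, List


def speaker_name(index: int) -> str:
    """0 -> 'Speaker A', 1 -> 'Speaker B', ...; numbers are used after 'Z'."""
    if index < len(ascii_uppercase):
        return f"Speaker {ascii_uppercase[index]}"
    return f"Speaker {index + 1}"


def label_segments(segments: List[Dict]) -> List[Dict]:
    """Attach readable speaker labels and an [OVERLAP] marker to diarized segments.

    Each segment is assumed to look like {"start": ms, "end": ms, "spk": int, "text": str}.
    """
    # Assign letters in order of each speaker's first appearance in time.
    order: Dict[int, int] = {}
    for seg in sorted(segments, key=lambda s: s["start"]):
        order.setdefault(seg["spk"], len(order))

    labeled = []
    for seg in segments:
        # Two intervals intersect when each starts before the other ends.
        overlaps = any(
            other["spk"] != seg["spk"]
            and other["start"] < seg["end"]
            and seg["start"] < other["end"]
            for other in segments
        )
        text = f"[OVERLAP] {seg['text']}" if overlaps else seg["text"]
        labeled.append({**seg, "speaker": speaker_name(order[seg["spk"]]), "text": text})
    return labeled
```

The overlap check compares every pair of segments; that is simple and adequate for typical recordings, while a sweep over start-sorted segments would scale better for very long ones.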

Hardware Auto-Detection

Detects and uses the best available hardware:

  • CUDA GPU (NVIDIA)
  • Apple MPS (Apple Silicon)
  • CPU fallback with notification
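Since FunASR runs on PyTorch, device selection of this kind is commonly done through PyTorch's backend checks. The sketch below assumes PyTorch is the backend and treats a printed message as the "notification"; it is not the skill's own detection code, and `pick_device` is a hypothetical name.

```python
import importlib.util


def pick_device() -> str:
    """Return 'cuda', 'mps' or 'cpu', preferring the fastest available backend."""
    if importlib.util.find_spec("torch") is None:
        print("PyTorch not found; using CPU")
        return "cpu"
    import torch

    if torch.cuda.is_available():  # NVIDIA GPU with a working CUDA runtime
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent in very old PyTorch builds
    if mps is not None and mps.is_available():  # Apple Silicon GPU
        return "mps"
    print("No GPU backend detected; falling back to CPU")
    return "cpu"
```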

Long Audio Support

Handles audio files longer than 1 hour:

  • VAD-based intelligent segmentation
  • Memory-efficient processing
  • Progress indication during transcription
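VAD-based segmentation usually means cutting long audio at silences rather than at fixed offsets. A sketch, assuming the VAD model returns speech intervals as (start_ms, end_ms) pairs and using a hypothetical 60-second chunk limit: consecutive speech runs are merged until adding the next one would exceed the limit, and only a single run longer than the limit is cut mid-speech.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (start_ms, end_ms) of detected speech


def chunk_speech(intervals: List[Interval], max_ms: int = 60_000) -> List[Interval]:
    """Merge consecutive VAD speech intervals into chunks of at most max_ms.

    Chunks break at silence gaps; a single speech run longer than max_ms is
    cut into max_ms pieces plus a shorter remainder.
    """
    chunks: List[Interval] = []
    cur_start: Optional[int] = None
    cur_end = 0
    for start, end in sorted(intervals):
        # Split any single interval that is too long on its own.
        while end - start > max_ms:
            if cur_start is not None:
                chunks.append((cur_start, cur_end))
                cur_start = None
            chunks.append((start, start + max_ms))
            start += max_ms
        if cur_start is None:
            cur_start, cur_end = start, end
        elif end - cur_start <= max_ms:
            cur_end = end  # still fits: extend the current chunk across the gap
        else:
            chunks.append((cur_start, cur_end))
            cur_start, cur_end = start, end
    if cur_start is not None:
        chunks.append((cur_start, cur_end))
    return chunks
```

Each chunk can then be transcribed independently, which keeps peak memory bounded by the chunk length rather than the file length.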

Multiple Output Formats

Format   Extension   Use Case
txt      .txt        Plain text with timestamps
json     .json       Structured data with word-level info
srt      .srt        Video subtitles
ass      .ass        Styled subtitles
md       .md         Documentation with speaker sections
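For the srt format, each cue needs an index, a "start --> end" line in HH:MM:SS,mmm form, and the text, with a blank line between cues. The sketch assumes segments are dicts with millisecond times and an already-assigned speaker label, and it prefixes each line with the speaker name; whether the skill's SRT output includes that prefix is an assumption.

```python
from typing import Dict, List


def srt_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(segments: List[Dict]) -> str:
    """Render segments ({start, end, speaker, text}) as numbered SubRip cues."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['speaker']}: {seg['text']}\n"
        )
    # Each cue ends with a newline, so joining with "\n" leaves one blank line between cues.
    return "\n".join(cues)
```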

Implementation Details

Processing Pipeline

  1. Input validation - Check file exists and format supported
  2. Hardware detection - Auto-detect GPU/MPS/CPU
  3. Video extraction - Extract audio from video files via FFmpeg
  4. Audio preprocessing - Resample to 16kHz mono
  5. Model loading - Load FunASR models (cached locally)
  6. Transcription - Run ASR with speaker diarization
  7. Formatting - Output in requested format
  8. Cleanup - Remove temporary files
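Steps 1, 3 and 4 can be sketched with the standard library plus FFmpeg. The extension whitelist comes from the formats listed under When to Use, and the function names are hypothetical. The exact FFmpeg arguments the skill uses are not documented, so `-vn -ac 1 -ar 16000` (drop video, downmix to mono, resample to 16 kHz) is an assumption chosen to match step 4. `imageio_ffmpeg.get_ffmpeg_exe()` returns the path of the FFmpeg binary bundled with imageio-ffmpeg.

```python
import shutil
from pathlib import Path
from typing import List

AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".flac"}
VIDEO_EXTS = {".mp4", ".avi", ".mkv"}


def validate_input(path: str) -> Path:
    """Step 1: the file must exist and have a supported extension."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(path)
    if p.suffix.lower() not in AUDIO_EXTS | VIDEO_EXTS:
        raise ValueError(f"unsupported format: {p.suffix}")
    return p


def find_ffmpeg() -> str:
    """Prefer an ffmpeg on PATH, else the binary bundled by imageio-ffmpeg."""
    exe = shutil.which("ffmpeg")
    if exe:
        return exe
    import imageio_ffmpeg  # only needed when ffmpeg is not on PATH

    return imageio_ffmpeg.get_ffmpeg_exe()


def extract_cmd(src: Path, dst: Path, ffmpeg: str = "ffmpeg") -> List[str]:
    """Steps 3-4: drop any video stream and write 16 kHz mono WAV to dst."""
    return [ffmpeg, "-y", "-i", str(src), "-vn", "-ac", "1", "-ar", "16000", str(dst)]
```

To run the extraction, pass the list to `subprocess.run(extract_cmd(src, dst, find_ffmpeg()), check=True)`. Building the argument list separately keeps it easy to inspect and avoids shell quoting issues with unusual file names.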

Model Components

  • ASR Model: Paraformer-large (Chinese optimized)
  • VAD Model: FSMN-VAD (voice activity detection)
  • Punctuation: CT-Transformer
  • Speaker: CAM++ (speaker diarization)
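FunASR can chain these four components in a single `AutoModel`. The aliases below are the ones FunASR's README uses for Paraformer (Chinese), FSMN-VAD, CT-Transformer punctuation and CAM++; whether asr_skill loads exactly these aliases, and with which options, is an assumption. The import is deferred because loading FunASR is slow and the first call downloads the models.

```python
from typing import Any, Dict

# FunASR model aliases for the four components listed above.
MODEL_KWARGS: Dict[str, str] = {
    "model": "paraformer-zh",  # Paraformer ASR, Chinese
    "vad_model": "fsmn-vad",   # FSMN voice activity detection
    "punc_model": "ct-punc",   # CT-Transformer punctuation restoration
    "spk_model": "cam++",      # CAM++ speaker model for diarization
}


def load_pipeline(device: str = "cpu") -> Any:
    """Build one FunASR AutoModel that chains ASR, VAD, punctuation and speakers."""
    from funasr import AutoModel  # imported lazily: heavy, downloads models on first use

    return AutoModel(**MODEL_KWARGS, device=device)


def run(path: str, device: str = "cpu") -> list:
    """Transcribe one 16 kHz audio file with the chained pipeline."""
    model = load_pipeline(device)
    # batch_size_s: dynamic batching by total seconds of audio per batch.
    return model.generate(input=path, batch_size_s=300)
```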

File Locations

  • Models cached in: ./models/
  • Output defaults to: same directory as input
  • Temp files: auto-cleaned after processing
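The default output location can be derived from the input path alone. A minimal sketch; the function name and the `output_dir` override are hypothetical, and the extension map follows the format table.

```python
from pathlib import Path
from typing import Optional

# Extension per output format, as in the format table.
EXTENSIONS = {"txt": ".txt", "json": ".json", "srt": ".srt", "ass": ".ass", "md": ".md"}


def default_output_path(input_path: str, fmt: str = "txt",
                        output_dir: Optional[str] = None) -> Path:
    """Place the output next to the input (or in output_dir) with the format's extension."""
    src = Path(input_path)
    target_dir = Path(output_dir) if output_dir else src.parent
    return target_dir / (src.stem + EXTENSIONS[fmt])
```

For example, transcribing media/meeting.mp4 to SRT would write media/meeting.srt.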

Troubleshooting

Common Issues

"FFmpeg not found"

  • FFmpeg is installed automatically via imageio-ffmpeg
  • Make sure you have an internet connection on the first run

"CUDA out of memory"

  • System falls back to CPU automatically
  • Try shorter audio segments
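A CPU fallback on out-of-memory errors can be written as a retry wrapper. This sketch assumes the job is a callable that takes a device string, and it relies on PyTorch raising a RuntimeError whose message contains "out of memory" (`torch.cuda.OutOfMemoryError` is a RuntimeError subclass in recent PyTorch versions). It does not show how the skill itself handles the error; `run_with_cpu_fallback` is a hypothetical name.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_cpu_fallback(job: Callable[[str], T], device: str) -> T:
    """Run job(device); if a GPU run fails with out-of-memory, retry once on CPU."""
    try:
        return job(device)
    except RuntimeError as err:
        # Re-raise anything that is not an OOM, or an OOM that already happened on CPU.
        if device == "cpu" or "out of memory" not in str(err).lower():
            raise
        print(f"{device} ran out of memory; retrying on CPU")
        return job("cpu")
```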

"No speakers detected"

  • Diarization can only distinguish speakers in multi-speaker audio
  • Single-speaker audio is labeled "Speaker A" throughout

Additional Resources

Reference Files

For detailed format specifications:

  • references/output-formats.md - Complete format documentation

Scripts

Utility scripts for batch processing:

  • scripts/transcribe.py - Batch transcription script

Examples

Working examples:

  • examples/basic_usage.py - Python API examples
  • examples/cli_examples.sh - CLI usage examples

Requirements

  • Python >= 3.10
  • FunASR (auto-installed)
  • FFmpeg (auto-installed via imageio-ffmpeg for video)
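To see which of these prerequisites are already present before the first run, a quick check could look like this. It only inspects the interpreter version, whether the `funasr` and `imageio_ffmpeg` modules are importable, and whether `ffmpeg` is on PATH; it installs nothing, and `check_requirements` is a hypothetical name.

```python
import importlib.util
import shutil
import sys
from typing import Dict


def check_requirements() -> Dict[str, bool]:
    """Report which of the listed prerequisites are already available."""
    return {
        "python>=3.10": sys.version_info >= (3, 10),
        "funasr": importlib.util.find_spec("funasr") is not None,
        # Either a system ffmpeg or the imageio-ffmpeg bundled binary is enough.
        "ffmpeg": shutil.which("ffmpeg") is not None
        or importlib.util.find_spec("imageio_ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{'ok     ' if ok else 'missing'}  {name}")
```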

Notes

  • First run downloads models (~1GB total)
  • All processing happens locally for privacy
  • v1 is optimized for Chinese
Security Recommendations
This skill appears to implement what it claims (local transcription with diarization) and does not request secrets. Before installing or running it:

  1. Inspect the asr_skill package (not included in the reviewed files) to see where, and from which hosts, models are downloaded and whether any other network calls are made.
  2. Be aware that the script creates a hidden .asr_skill/tasks.json in the project root and spawns detached background worker processes whose stdout/stderr are suppressed; run it in an isolated environment if you need to restrict resource use or process creation.
  3. Ensure dependencies (FunASR, imageio-ffmpeg/FFmpeg, librosa, etc.) are installed from trusted sources or pinned to known versions.
  4. If privacy is critical, confirm that model downloads come from trusted hosts and consider auditing network activity on the first run.

For higher assurance, ask the publisher for the full asr_skill package source, or run the skill in a sandboxed environment first.
Functional Analysis
Type: OpenClaw Skill
Name: asr-skills
Version: 1.0.0
The skill provides local audio and video transcription using the FunASR library. The provided scripts (scripts/transcribe.py) and documentation (SKILL.md) describe a legitimate tool for media processing, including a task management system for background execution via subprocess. No indicators of data exfiltration, malicious execution, or harmful prompt injection were found.
Capability Assessment
Purpose & Capability
Name, description, SKILL.md, examples, and scripts consistently describe a local ASR transcription tool with speaker diarization and multiple output formats. The code present (transcribe.py and examples) implements task management, async/background execution, duration estimation, and calls into an asr_skill Python API for the heavy work, which fits the stated purpose.
Instruction Scope
Instructions and code operate on local files (audio/video), extract audio with ffprobe/ffmpeg, resample, load models, and write outputs. The script creates a .asr_skill directory and tasks.json in the project root to track async jobs and spawns a detached background worker (subprocess.Popen with start_new_session=True and stdout/stderr redirected to DEVNULL). There is no evidence in the provided files of external network exfiltration, but the SKILL.md notes first-run model downloads (~1GB) and those download routines are not present in the reviewed files (they likely live in the missing asr_skill package), so you should verify where models are fetched from if local-only processing and privacy are essential.
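The detached-worker pattern described above looks roughly like this in Python. This is a sketch of the pattern, not the skill's code: `spawn_worker` and its `log_path` option are hypothetical. The optional log file is there for auditing, since a worker whose stdout and stderr go to DEVNULL leaves no trace of what it did.

```python
import subprocess
import sys
from typing import List, Optional


def spawn_worker(args: List[str], log_path: Optional[str] = None) -> subprocess.Popen:
    """Start a Python worker in its own session so it outlives the caller."""
    if log_path is None:
        # As reviewed: output is discarded entirely.
        return subprocess.Popen(
            [sys.executable, *args],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            start_new_session=True,
        )
    # Audit variant: append the worker's combined output to a log file instead.
    with open(log_path, "ab") as log:
        return subprocess.Popen(
            [sys.executable, *args],
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,
        )
```

On POSIX systems, `start_new_session=True` runs setsid() in the child, so the worker is not tied to the parent's terminal session or its signals. The parent can close its copy of the log file right after launch because the child holds its own handle.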
Install Mechanism
There is no install spec (instruction-only), which minimizes installer risk. However, SKILL.md claims auto-install of FunASR and FFmpeg via imageio-ffmpeg and that models are downloaded on first run; those actions are not in the shown script and would occur in imported modules (not included here). Because downloads/extracts are plausible but not visible, verify the code that performs automatic installs/downloads before running.
Credentials
The skill declares no required environment variables, credentials, or config paths and the provided code does not attempt to read secrets or unrelated system configuration. It uses local filesystem storage for tasks and relies on available binaries (ffprobe) or Python libraries (librosa), which is proportional to the stated functionality.
Persistence & Privilege
The skill does not set always:true and does not modify other skills. It does persist local state by creating a .asr_skill directory and tasks.json under the project root and spawns detached background worker processes for async jobs. These behaviors are expected for async transcription but can create persistent files and background processes that you may want to control or audit.
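How to Use
  1. Make sure OpenClaw is installed (local or Docker deployment)
  2. Enter the install command in the chat box: /install asr-skills
  3. After installation, invoke the skill by name or trigger it with /asr-skills
  4. Provide the inputs described in the skill's parameter documentation to get structured output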
Version History
v1.0.0
Initial release of the local ASR transcription skill with speaker diarization.
  • Transcribes audio and video files with automatic speaker identification using FunASR.
  • Supports multiple output formats: TXT, JSON, SRT, ASS/SSA, Markdown.
  • Provides hardware auto-detection for GPU/MPS/CPU and handles long files efficiently.
  • Offers both CLI and Python API usage, with asynchronous execution for long tasks.
  • Ensures privacy by processing all files locally; models are downloaded on first use.
Metadata
Slug: asr-skills
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 1
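FAQ

What is asr-skill?

This skill should be used when the user asks to "transcribe audio", "transcribe video", "convert speech to text", "generate subtitles", "create captions", "i... It is an AI agent skill plugin for Claude Code / OpenClaw and has been downloaded 144 times so far.

How do I install asr-skill?

Run the command "/install asr-skills" in an OpenClaw or Claude Code chat to install it in one step; no additional configuration is needed.

Is asr-skill free?

Yes. asr-skill is completely free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does asr-skill support?

asr-skill is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed asr-skill?

It is developed and maintained by lgwanai (@lgwanai); the current version is v1.0.0.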
