Whisper ASR — Speech-to-Text
by vincentlau2046-sudo · GitHub · v1.2.0 · MIT-0
73 Downloads · 0 Stars · 0 Active Installs · 3 Versions
Install in OpenClaw: `/install asr-funasr`
Description
Automatic Speech Recognition using OpenAI Whisper (local GPU). Supports Chinese, English, and 90+ languages. Auto-detects language.
README (SKILL.md)
ASR — Speech-to-Text (FunASR + Whisper)
Two engines for different scenarios:
| Engine | Best For | Chinese Quality | Speed |
|---|---|---|---|
| FunASR SenseVoice (default) | Chinese, Japanese, Korean | ⭐⭐⭐ (Simplified characters) | Fast (real-time factor 0.03) |
| OpenAI Whisper | Multilingual, translation | ⭐⭐ (outputs Traditional characters) | Slower |
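The table suggests a simple rule for picking an engine. The sketch below encodes that rule; `choose_engine` is a hypothetical helper, not part of the skill (which simply defaults to FunASR), and routing every language SenseVoice does not list to Whisper is an assumption.

```python
from typing import Optional

# Languages the README lists for SenseVoice-Small (yue = Cantonese).
FUNASR_LANGUAGES = {"zh", "en", "ja", "ko", "yue"}


def choose_engine(language: Optional[str] = None, task: str = "transcribe") -> str:
    """Return an --engine value ("funasr" or "whisper") following the table above.

    Hypothetical helper: the skill itself just defaults to funasr.
    """
    if task == "translate":
        # Translation is a Whisper-only feature.
        return "whisper"
    if language is None or language == "auto":
        # Language unknown up front: keep the skill's default engine.
        return "funasr"
    # Assumption: any language SenseVoice does not list goes to Whisper.
    return "funasr" if language in FUNASR_LANGUAGES else "whisper"


if __name__ == "__main__":
    print(choose_engine("zh"))
    print(choose_engine("fr"))
    print(choose_engine("zh", "translate"))
```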
Quick Start
```bash
# Default: FunASR SenseVoice (best Chinese)
{baseDir}/scripts/asr.py --input audio.mp3
# Whisper for multilingual / translation
{baseDir}/scripts/asr.py --input audio.mp3 --engine whisper
```
Options
| Option | Default | Description |
|---|---|---|
| `--input` | (required) | Input audio file (mp3, wav, m4a, etc.) |
| `--engine` | `funasr` | ASR engine: `funasr` (SenseVoice) or `whisper` |
| `--language` | `auto` | Language code: `zh`, `en`, `ja`, `ko`, etc. (auto-detected if omitted) |
| `--model` | `base` | Whisper model size: tiny/base/small/medium/large (Whisper only) |
| `--task` | `transcribe` | `transcribe` or `translate` (Whisper only) |
| `--output` | stdout | Write the transcript to this file |
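To make the option surface concrete, here is an `argparse` parser that mirrors the table. It is a reconstruction from the documentation, not the skill's actual source; in particular, rejecting `--task translate` when FunASR is selected is an assumption (the real script may simply ignore the flag).

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented options (a reconstruction, not the skill's source)."""
    p = argparse.ArgumentParser(description="Speech-to-text with FunASR SenseVoice or Whisper")
    p.add_argument("--input", required=True, help="input audio file (mp3, wav, m4a, ...)")
    p.add_argument("--engine", choices=["funasr", "whisper"], default="funasr")
    p.add_argument("--language", default="auto",
                   help="language code such as zh, en, ja, ko; auto-detect by default")
    p.add_argument("--model", choices=["tiny", "base", "small", "medium", "large"],
                   default="base", help="Whisper model size (Whisper only)")
    p.add_argument("--task", choices=["transcribe", "translate"], default="transcribe",
                   help="Whisper only")
    p.add_argument("--output", default=None, help="transcript file; stdout when omitted")
    return p


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: reject translation with FunASR instead of silently ignoring it.
    if args.task == "translate" and args.engine != "whisper":
        parser.error("--task translate requires --engine whisper")
    return args


if __name__ == "__main__":
    print(parse_args(["--input", "audio.mp3"]))
```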
Engine Details
FunASR SenseVoice-Small (Default)
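- Model: `iic/SenseVoiceSmall` (893MB, auto-downloaded from ModelScope)
- Strengths: best Simplified Chinese accuracy, emotion recognition, audio event detection, very fast
- Output: Simplified Chinese, with special tokens stripped automatically
- Languages: zh, en, ja, ko, yue (Cantonese)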
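SenseVoice prefixes its raw output with tokens for language, emotion, audio event and inverse text normalization, e.g. `<|en|><|NEUTRAL|><|Speech|><|woitn|>`. The sketch below shows one way to separate those tokens from the text. `split_sensevoice` and `transcribe_funasr` are hypothetical names; the `AutoModel`/`generate` call follows the SenseVoice README as we understand it and should be checked against your installed `funasr`. FunASR also ships its own post-processor, `rich_transcription_postprocess` in `funasr.utils.postprocess_utils`, which the skill may use instead.

```python
import re
from typing import Dict, List

# Matches tokens such as <|zh|>, <|NEUTRAL|>, <|Speech|>, <|woitn|>.
TAG_RE = re.compile(r"<\|([^|]*)\|>")


def split_sensevoice(raw: str) -> Dict[str, object]:
    """Separate SenseVoice's special tokens from the transcript text."""
    tags: List[str] = TAG_RE.findall(raw)
    text = TAG_RE.sub("", raw).strip()
    return {"tags": tags, "text": text}


def transcribe_funasr(path: str, language: str = "auto", device: str = "cuda:0") -> str:
    """Sketch based on the SenseVoice README; verify against your funasr version."""
    from funasr import AutoModel  # lazy import so split_sensevoice works without funasr

    model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad", device=device)
    res = model.generate(input=path, language=language, use_itn=True)
    return split_sensevoice(res[0]["text"])["text"]


if __name__ == "__main__":
    print(split_sensevoice("<|en|><|NEUTRAL|><|Speech|><|woitn|>hello world"))
```

Keeping the tag list, rather than discarding it, is what would let a caller surface the emotion and event labels the engine detects.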
OpenAI Whisper
- Model: base (139MB, auto-downloaded)
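- Strengths: 90+ languages, translation mode, mixed-language audio
- Output: Chinese is transcribed in Traditional characters (known issue; switching to the small model helps)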
- Whisper model sizes:
| Model | VRAM | Speed | Accuracy |
|---|---|---|---|
| tiny | ~1GB | Fastest | Low |
| base | ~1GB | Fast | OK |
| small | ~2GB | Medium | Good |
| medium | ~5GB | Slow | Better |
| large | ~10GB | Slowest | Best |
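The Whisper notes above can be combined into a small sketch: choose the most accurate model that fits the available VRAM (using the approximate figures from the table), and build the keyword arguments for `openai-whisper`'s `model.transcribe()`. `language`, `task` and `initial_prompt` are real `transcribe()` parameters; passing a Simplified-Chinese `initial_prompt` to steer output away from Traditional characters is a common community workaround, not something this skill is documented to do. The helper names are hypothetical.

```python
from typing import Any, Dict, Optional

# Approximate VRAM needs (GB) from the table above, least to most accurate.
WHISPER_VRAM_GB = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large", 10)]

# Prompt meaning "The following are Mandarin sentences." in Simplified characters;
# it nudges Whisper toward Simplified output but does not guarantee it.
SIMPLIFIED_ZH_PROMPT = "以下是普通话的句子。"


def largest_model_for(vram_gb: float) -> Optional[str]:
    """Most accurate model whose approximate VRAM need fits, or None if none fits."""
    best = None
    for name, need in WHISPER_VRAM_GB:
        if need <= vram_gb:
            best = name
    return best


def whisper_options(language: Optional[str] = None, task: str = "transcribe") -> Dict[str, Any]:
    """Keyword arguments for whisper's model.transcribe()."""
    opts: Dict[str, Any] = {"task": task}
    if language and language != "auto":
        opts["language"] = language  # omitted -> Whisper auto-detects the language
    if language == "zh" and task == "transcribe":
        opts["initial_prompt"] = SIMPLIFIED_ZH_PROMPT
    return opts


def transcribe_whisper(path: str, model_size: str = "base",
                       language: Optional[str] = None, task: str = "transcribe") -> str:
    import whisper  # openai-whisper; lazy import so the helpers above work without it

    model = whisper.load_model(model_size)
    result = model.transcribe(path, **whisper_options(language, task))
    return result["text"]
```

Free GPU memory can be read with `torch.cuda.mem_get_info()`, which returns `(free, total)` in bytes; divide by `1024**3` before passing it to `largest_model_for`.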
Examples
```bash
# Chinese audio → FunASR (default, best quality)
{baseDir}/scripts/asr.py --input meeting.mp3

# Force Chinese instead of auto-detecting
{baseDir}/scripts/asr.py --input podcast.wav --language zh

# Multilingual audio → Whisper
{baseDir}/scripts/asr.py --input mixed.wav --engine whisper

# Whisper with a larger, more accurate model
{baseDir}/scripts/asr.py --input lecture.mp3 --engine whisper --model small

# Translate Chinese speech to English text
{baseDir}/scripts/asr.py --input speech.mp3 --engine whisper --language zh --task translate

# Save the transcript to a file
{baseDir}/scripts/asr.py --input audio.wav --output transcript.txt
```
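For a folder of recordings, the single-file commands above can be looped. This is a hypothetical wrapper, not part of the skill: it assumes the script is executable and invoked directly as in the examples, and the set of file extensions it picks up is an assumption.

```python
import subprocess
from pathlib import Path
from typing import List

# Hypothetical extensions to pick up; the README lists mp3, wav, m4a, "etc."
AUDIO_SUFFIXES = {".mp3", ".wav", ".m4a", ".flac"}


def build_command(script: str, audio: Path, out_dir: Path, *extra: str) -> List[str]:
    """argv for one run, writing <out_dir>/<audio stem>.txt (meeting.mp3 -> meeting.txt)."""
    out = out_dir / (audio.stem + ".txt")
    return [script, "--input", str(audio), "--output", str(out), *extra]


def transcribe_folder(script: str, folder: Path, out_dir: Path, *extra: str) -> None:
    """Run the script once per audio file in folder (non-recursive)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for audio in sorted(folder.iterdir()):
        if audio.suffix.lower() in AUDIO_SUFFIXES:
            # check=True stops the batch at the first file that fails.
            subprocess.run(build_command(script, audio, out_dir, *extra), check=True)


# Example (placeholder paths):
# transcribe_folder("/path/to/scripts/asr.py", Path("recordings"), Path("transcripts"),
#                   "--engine", "whisper", "--model", "small")
```

Loading the model once per file is slow for large batches; it keeps the sketch decoupled from the script's internals at the cost of speed.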
Dependencies
- `funasr` + `modelscope` (FunASR engine)
- `openai-whisper` (Whisper engine)
- `imageio-ffmpeg` (bundled ffmpeg binary)
- The first run downloads model weights (cached automatically in `~/.cache/`)
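A quick way to confirm the dependencies are importable before the first run is sketched below. It only checks the packages listed above; the helper names are hypothetical, while `importlib.util.find_spec` and `imageio_ffmpeg.get_ffmpeg_exe()` are real functions.

```python
import importlib.util
from typing import Dict, Optional

# pip package name -> import name (they differ for two of them).
REQUIRED_MODULES = {
    "funasr": "funasr",
    "modelscope": "modelscope",
    "openai-whisper": "whisper",
    "imageio-ffmpeg": "imageio_ffmpeg",
}


def check_dependencies() -> Dict[str, bool]:
    """Map each package to whether it is importable, without importing it."""
    return {pkg: importlib.util.find_spec(mod) is not None
            for pkg, mod in REQUIRED_MODULES.items()}


def bundled_ffmpeg() -> Optional[str]:
    """Path of the ffmpeg binary shipped with imageio-ffmpeg, or None if not installed."""
    if importlib.util.find_spec("imageio_ffmpeg") is None:
        return None
    import imageio_ffmpeg
    return imageio_ffmpeg.get_ffmpeg_exe()


if __name__ == "__main__":
    for pkg, ok in check_dependencies().items():
        print(f"{pkg:15} {'ok' if ok else 'missing'}")
    print("ffmpeg:", bundled_ffmpeg())
```

Note that Whisper's audio loader runs the `ffmpeg` command by name, while the imageio-ffmpeg binary is usually not named plain `ffmpeg`, so it has to be made reachable under that name on `PATH`; the README does not say how the skill arranges this.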
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: `/install asr-funasr`
- After installation, invoke the skill by name or use `/asr-funasr`
- Provide the required inputs per the skill's parameter spec and get structured output
Version History
v1.2.0
Add FunASR SenseVoice engine (default), better Chinese support (Simplified characters), emotion/speech event detection
v1.1.0
Add imageio-ffmpeg bundled binary, add comfyui-venv path for whisper
v1.0.0
Whisper ASR skill: local GPU transcription, 90+ languages, auto-detect
Frequently Asked Questions
What is Whisper ASR — Speech-to-Text?
Automatic Speech Recognition using OpenAI Whisper (local GPU). Supports Chinese, English, and 90+ languages. Auto-detects language. It is an AI Agent Skill for Claude Code / OpenClaw, with 73 downloads so far.
How do I install Whisper ASR — Speech-to-Text?
Run "/install asr-funasr" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Whisper ASR — Speech-to-Text free?
Yes, Whisper ASR — Speech-to-Text is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Whisper ASR — Speech-to-Text support?
Whisper ASR — Speech-to-Text runs anywhere OpenClaw / Claude Code is available. Transcription runs locally on a GPU; see the Whisper model-size table for approximate VRAM requirements.
Who created Whisper ASR — Speech-to-Text?
It is built and maintained by vincentlau2046-sudo (@vincentlau2046-sudo); the current version is v1.2.0.