Whisper ASR — Speech-to-Text

Author: vincentlau2046-sudo · v1.2.0 · MIT-0
Platform: cross-platform (pending)
Total downloads: 73 · Favorites: 0 · Current installs: 0 · Versions: 3
Install in OpenClaw
/install asr-funasr
Description
Automatic Speech Recognition using OpenAI Whisper (local GPU). Supports Chinese, English, and 90+ languages. Auto-detects language.
Usage (SKILL.md)

ASR — Speech-to-Text (FunASR + Whisper)

Two engines for different scenarios:

| Engine | Best For | Chinese Quality | Speed |
|---|---|---|---|
| FunASR SenseVoice (default) | Chinese, Japanese, Korean | ⭐⭐⭐ (Simplified) | Fast (0.03 RTF) |
| OpenAI Whisper | Multilingual, translation | ⭐⭐ (Traditional) | Slower |
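
The real-time factor (RTF) quoted in the table is processing time divided by audio duration, so lower is faster. A minimal Python sketch of that arithmetic follows, assuming the 0.03 figure holds on your hardware; the function name is illustrative and not part of the skill.

```python
def estimated_processing_seconds(audio_seconds: float, rtf: float = 0.03) -> float:
    """Estimate transcription time from the real-time factor (RTF).

    RTF = processing_time / audio_duration, so
    processing_time = audio_duration * RTF.
    The 0.03 default is the figure quoted for FunASR SenseVoice; real
    throughput depends on hardware, so treat it as an assumption.
    """
    if audio_seconds < 0 or rtf < 0:
        raise ValueError("audio_seconds and rtf must be non-negative")
    return audio_seconds * rtf


if __name__ == "__main__":
    one_hour = 60 * 60
    seconds = estimated_processing_seconds(one_hour)
    print(f"Roughly {seconds:.0f} s to transcribe one hour of audio")
```

At an RTF of 0.03, a one-hour recording works out to about 108 seconds of processing.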

Quick Start

# Default: FunASR SenseVoice (best Chinese)
{baseDir}/scripts/asr.py --input audio.mp3

# Whisper for multilingual / translation
{baseDir}/scripts/asr.py --input audio.mp3 --engine whisper

Options

| Option | Default | Description |
|---|---|---|
| --input | (required) | Input audio file (mp3, wav, m4a, etc.) |
| --engine | funasr | ASR engine: funasr (SenseVoice) or whisper |
| --language | auto | Language code: zh, en, ja, ko, etc. (auto-detect if omitted) |
| --model | base | Whisper model size: tiny/base/small/medium/large (whisper only) |
| --task | transcribe | transcribe or translate (whisper only) |
| --output | stdout | Write transcript to file |
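
The option table can be read as a command-line schema. The following is a hypothetical argparse sketch that mirrors it. The actual parser inside asr.py is not shown in this document and may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the real asr.py)."""
    parser = argparse.ArgumentParser(description="Speech-to-text (FunASR + Whisper)")
    parser.add_argument("--input", required=True,
                        help="Input audio file (mp3, wav, m4a, ...)")
    parser.add_argument("--engine", default="funasr", choices=["funasr", "whisper"],
                        help="ASR engine")
    parser.add_argument("--language", default="auto",
                        help="Language code such as zh, en, ja, ko, or auto")
    parser.add_argument("--model", default="base",
                        choices=["tiny", "base", "small", "medium", "large"],
                        help="Whisper model size (whisper only)")
    parser.add_argument("--task", default="transcribe",
                        choices=["transcribe", "translate"],
                        help="Whisper task (whisper only)")
    parser.add_argument("--output", default=None,
                        help="Transcript file; stdout when omitted")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--input", "audio.mp3", "--engine", "whisper"])
    print(args)
```

Declaring the allowed values with choices makes a typo such as --engine wisper fail at parse time instead of after the model has started loading.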

Engine Details

FunASR SenseVoice-Small (Default)

  • Model: iic/SenseVoiceSmall (893MB, auto-downloaded from ModelScope)
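  • Strengths: best Simplified Chinese accuracy, emotion recognition, audio event detection, very fast
  • Output: Simplified Chinese, with special tokens stripped automatically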
  • Languages: zh, en, ja, ko, yue (Cantonese)
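
SenseVoice typically prefixes its raw output with tags that carry the detected language, emotion, and audio event. How the skill removes them is not shown here. The sketch below assumes tags of the form <|...|> and strips them with a regular expression. The sample string is made up for illustration, and the helper names are hypothetical.

```python
import re

# Assumed tag shape: "<|", then any characters except "|", then "|>".
_TAG = re.compile(r"<\|([^|]*)\|>")


def strip_special_tokens(raw: str) -> str:
    """Remove <|...|> markers and trim surrounding whitespace."""
    return _TAG.sub("", raw).strip()


def extract_tags(raw: str) -> list:
    """Return the tag names in the order they appear."""
    return _TAG.findall(raw)


if __name__ == "__main__":
    sample = "<|zh|><|NEUTRAL|><|Speech|>大家好"  # illustrative, not real output
    print(strip_special_tokens(sample))
    print(extract_tags(sample))
```

Keeping the tags available through a separate helper preserves the emotion and event labels for callers that want them, while the transcript itself stays clean.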

OpenAI Whisper

  • Model: base (139MB, auto-downloaded)
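  • Strengths: 90+ languages, translation mode, multilingual audio
  • Output: Chinese is transcribed in Traditional characters (known issue; switching to the small model may improve this)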
  • Whisper model sizes:
| Model | VRAM | Speed | Accuracy |
|---|---|---|---|
| tiny | ~1GB | Fastest | Low |
| base | ~1GB | Fast | OK |
| small | ~2GB | Medium | Good |
| medium | ~5GB | Slow | Better |
| large | ~10GB | Slowest | Best |
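
The VRAM column can drive model selection. The sketch below assumes the approximate figures in the table above. The helper name is hypothetical and not part of the skill.

```python
from typing import Optional

# Approximate VRAM needs in GB, copied from the table above, smallest first.
WHISPER_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def largest_model_that_fits(free_vram_gb: float) -> Optional[str]:
    """Return the most accurate model whose VRAM estimate fits, or None."""
    best = None
    for name, needed_gb in WHISPER_VRAM_GB:
        if needed_gb <= free_vram_gb:
            best = name  # later entries are larger and more accurate
    return best


if __name__ == "__main__":
    for budget in (0.5, 4, 12):
        print(budget, "GB ->", largest_model_that_fits(budget))
```

The result can be passed straight to --model. Since the table values are approximate, leaving some headroom below the card's total memory is a reasonable precaution.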

Examples

# Chinese audio → FunASR (default, best quality)
{baseDir}/scripts/asr.py --input meeting.mp3

# Force Chinese language
{baseDir}/scripts/asr.py --input podcast.wav --language zh

# Multilingual audio → Whisper
{baseDir}/scripts/asr.py --input mixed.wav --engine whisper

# Whisper with better model
{baseDir}/scripts/asr.py --input lecture.mp3 --engine whisper --model small

# Translate Chinese speech to English text
{baseDir}/scripts/asr.py --input speech.mp3 --engine whisper --language zh --task translate

# Save transcript to file
{baseDir}/scripts/asr.py --input audio.wav --output transcript.txt
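
The same invocations can be scripted. The sketch below builds the argument list for asr.py from Python using only the flags documented above. SCRIPT is a placeholder for wherever {baseDir}/scripts/asr.py resolves on your system, and the helper names are hypothetical.

```python
import subprocess
from typing import List, Optional

SCRIPT = "scripts/asr.py"  # placeholder: substitute the real {baseDir} path


def build_command(audio: str, engine: str = "funasr",
                  language: Optional[str] = None, model: Optional[str] = None,
                  task: Optional[str] = None,
                  output: Optional[str] = None) -> List[str]:
    """Assemble an argv list using only the documented flags."""
    if engine != "whisper" and (model or task):
        raise ValueError("--model and --task apply to the whisper engine only")
    # The script is invoked directly, as in the examples above.
    cmd = [SCRIPT, "--input", audio, "--engine", engine]
    for flag, value in (("--language", language), ("--model", model),
                        ("--task", task), ("--output", output)):
        if value:
            cmd += [flag, value]
    return cmd


def transcribe(audio: str, **options) -> str:
    """Run the script and return whatever it prints to stdout."""
    result = subprocess.run(build_command(audio, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(build_command("speech.mp3", engine="whisper",
                        language="zh", task="translate"))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.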

Dependencies

  • funasr + modelscope (FunASR engine)
  • openai-whisper (Whisper engine)
  • imageio-ffmpeg (bundled ffmpeg binary)
  • First run downloads model weights (auto-cached in ~/.cache/)
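
Before the first run it can help to confirm that the Python packages behind each engine are importable. The sketch below uses only the standard library. The import names are the modules those packages install (openai-whisper installs whisper, imageio-ffmpeg installs imageio_ffmpeg), and the grouping by engine is this sketch's own assumption based on the list above.

```python
import importlib.util
from typing import Dict, List

# Import names for the packages listed above, grouped by what needs them.
ENGINE_MODULES: Dict[str, List[str]] = {
    "funasr": ["funasr", "modelscope"],
    "whisper": ["whisper"],
    "ffmpeg": ["imageio_ffmpeg"],
}


def missing_modules() -> Dict[str, List[str]]:
    """Map each group to the modules that cannot be found (nothing is imported)."""
    report = {}
    for group, modules in ENGINE_MODULES.items():
        report[group] = [m for m in modules
                         if importlib.util.find_spec(m) is None]
    return report


if __name__ == "__main__":
    for group, missing in missing_modules().items():
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{group}: {status}")
```

Using find_spec rather than a real import keeps the check fast, since importing the ASR libraries can take several seconds.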
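How to Use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install asr-funasr
  3. Once installed, invoke the skill by name or trigger it with /asr-funasr.
  4. Provide the required inputs listed in the skill's options to get structured output.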
Version History
v1.2.0
Add FunASR SenseVoice engine (default), better Chinese support (Simplified), emotion/speech event detection
v1.1.0
Add imageio-ffmpeg bundled binary, add comfyui-venv path for whisper
v1.0.0
Whisper ASR skill: local GPU transcription, 90+ languages, auto-detect
Metadata
Slug: asr-funasr
Version: 1.2.0
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 3
FAQ

What is Whisper ASR?

Automatic Speech Recognition using OpenAI Whisper (local GPU). Supports Chinese, English, and 90+ languages. Auto-detects language. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 73 total downloads to date.

How do I install Whisper ASR?

Run the command "/install asr-funasr" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is Whisper ASR free?

Yes. Whisper ASR is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Whisper ASR support?

Whisper ASR is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who develops Whisper ASR?

It is developed and maintained by vincentlau2046-sudo (@vincentlau2046-sudo); the current version is v1.2.0.
