ruitao

Funasr Asr

by ruitao · GitHub ↗ · v2.0.0 · MIT-0
cross-platform ⚠ suspicious
182
Downloads
1
Stars
0
Active Installs
5
Versions
Install in OpenClaw
/install funasr-asr
Description
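Local Chinese speech recognition using Alibaba DAMO Academy's FunASR. Trigger scenarios: (1) the user sends a voice message that needs transcribing, (2) an audio or video file needs transcribing, (3) a video needs to be downloaded from a web page and transcribed, (4) Chinese speech-to-text tasks. Supports a low-memory mode (~500MB) and a large-model mode (~2GB), with automatic memory management and a task queue that prevents parallel runs.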
README (SKILL.md)

FunASR Local Speech Recognition

Open-source ASR from Alibaba DAMO Academy, deployed entirely locally; memory-optimized version.

Quick Start

Command Line

# Default small mode (SenseVoiceSmall, ~500MB)
python3 scripts/transcribe.py /path/to/audio.wav

# Large-model mode (Paraformer-Large, ~2GB, very high accuracy for Chinese)
python3 scripts/transcribe.py /path/to/audio.wav --mode large

# Custom segment length (default 600 seconds = 10 minutes)
python3 scripts/transcribe.py /path/to/audio.wav --segment 300

# JSON output
python3 scripts/transcribe.py /path/to/audio.wav --format json

# Transcribe a video file
python3 scripts/video-transcribe.py --audio /path/to/video.mp4

Node.js Usage

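const funasr = require('./index.js');

// CommonJS modules do not allow top-level await, so wrap the call.
async function main() {
  const text = await funasr.transcribe('/path/to/audio.wav', {
    mode: 'small',    // 'small' or 'large'
    format: 'text'    // 'text' or 'json'
  });
  console.log(text);
}

main().catch(console.error);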

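Core Optimizations (v2.0)

| Optimization | Old approach | New approach |
| Process model | New process per segment | Single process; model loaded once |
| Default model | paraformer (~2GB) | SenseVoiceSmall (~500MB) |
| Peak memory | ~2GB × N times | ~500MB resident |
| Release between segments | Process exit | gc.collect() |
| Task lock | ✅ | Retained |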
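
The single-process design in the table above can be sketched roughly as follows. This is an illustration, not the skill's actual `scripts/transcribe.py`: the names `segment_bounds`, `transcribe_in_segments`, and the `transcribe_segment` callback are hypothetical, and a stand-in function takes the place of a loaded FunASR model so the example runs without any model download.

```python
import gc
from typing import Callable, List, Tuple


def segment_bounds(total_seconds: float, segment_seconds: float = 600.0) -> List[Tuple[float, float]]:
    """Split [0, total_seconds) into consecutive (start, end) windows."""
    if segment_seconds <= 0:
        raise ValueError("segment_seconds must be positive")
    total = float(total_seconds)
    bounds = []
    start = 0.0
    while start < total:
        end = min(start + segment_seconds, total)
        bounds.append((start, end))
        start = end
    return bounds


def transcribe_in_segments(
    total_seconds: float,
    transcribe_segment: Callable[[float, float], str],
    segment_seconds: float = 600.0,
) -> str:
    """Run one already-loaded model over each window, collecting garbage in between."""
    parts = []
    for start, end in segment_bounds(total_seconds, segment_seconds):
        parts.append(transcribe_segment(start, end))
        gc.collect()  # drop per-segment temporaries before the next window
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical stand-in for a real model call; it only labels its window.
    def fake_model(start: float, end: float) -> str:
        return f"[{start:.0f}-{end:.0f}]"

    print(transcribe_in_segments(1500, fake_model))  # [0-600][600-1200][1200-1500]
```

The point of the pattern is that the expensive object (the model) is created once outside the loop, so peak memory is governed by one model plus one segment rather than by a fresh process per segment.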

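Mode Comparison

| Feature | small (default) | large |
| Model | SenseVoiceSmall | Paraformer-Large |
| Memory | ~500MB | ~2GB |
| Languages | Chinese, English, Japanese, Korean, Cantonese | Chinese |
| Accuracy | | Very high |
| Speed | Fast (~0.1 RTF) | Slower (~0.3 RTF) |
| Use case | Everyday transcription | Professional Chinese-language scenarios |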
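
To make the RTF figures concrete: the real-time factor is processing time divided by audio duration, so a lower value means faster transcription. The short calculation below applies the approximate values from the table to one hour of audio; the function name is illustrative, and actual speed will depend on hardware.

```python
def estimated_processing_seconds(audio_seconds: float, rtf: float) -> float:
    """RTF (real-time factor) = processing time / audio duration."""
    return audio_seconds * rtf


if __name__ == "__main__":
    one_hour = 3600.0
    # Approximate RTF values quoted in the mode comparison.
    for mode, rtf in (("small", 0.1), ("large", 0.3)):
        minutes = estimated_processing_seconds(one_hour, rtf) / 60
        print(f"{mode}: about {minutes:.0f} min for one hour of audio")
```

With these figures, an hour of audio takes roughly 6 minutes in small mode and roughly 18 minutes in large mode.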

Installation

# Python dependencies
pip install funasr onnxruntime psutil

# Video processing (optional)
pip install yt-dlp
apt install ffmpeg

# Models are downloaded automatically on first run
# small: ~500MB | large: ~2GB

Supported Formats

  • Audio: WAV, MP3, FLAC, M4A (automatically converted to 16kHz mono)
  • Video: MP4, WebM, AVI, MOV
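
A conversion to 16kHz mono could look like the sketch below. It is not taken from the skill's scripts: the function names are hypothetical, the `-ar 16000 -ac 1` flags mirror the ffmpeg command shown under troubleshooting, and `-y` (overwrite the output without prompting) is an addition made here for non-interactive use.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_command(src: str, dst: str) -> List[str]:
    """Return an ffmpeg argument list that resamples to 16 kHz mono."""
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst]


def convert_to_16k_mono(src: str) -> str:
    """Write a sibling '<name>_16k.wav' file and return its path (needs ffmpeg on PATH)."""
    source = Path(src)
    dst = str(source.with_name(source.stem + "_16k.wav"))
    # A list of arguments bypasses the shell, so odd characters in src stay literal.
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
    return dst
```

Calling `convert_to_16k_mono("talk.mp3")` would produce `talk_16k.wav` next to the input, provided ffmpeg is installed.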

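Troubleshooting

| Problem | Solution |
| Model download fails | pip install modelscope && modelscope download --model iic/SenseVoiceSmall |
| Out of memory (OOM) | Use --mode small --segment 300 to shorten the segments |
| Audio format error | ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav |
| Odd markers in the transcript | The script strips them automatically (SenseVoice special tags beginning with `<`) |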
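
For readers who post-process raw output themselves, tag cleanup can be sketched with a regular expression. The tag shape assumed here (`<|label|>`, for example `<|zh|>`) is an assumption about SenseVoice output, not a description of what the skill's script matches, and `strip_special_tags` is a hypothetical name.

```python
import re

# Assumed tag shape: "<|" + label + "|>", e.g. "<|zh|>" or "<|NEUTRAL|>".
_TAG = re.compile(r"<\|[^|<>]*\|>")


def strip_special_tags(text: str) -> str:
    """Remove every <|...|> marker and trim surrounding whitespace."""
    return _TAG.sub("", text).strip()


if __name__ == "__main__":
    raw = "<|zh|><|NEUTRAL|><|Speech|>你好,世界"
    print(strip_special_tags(raw))  # 你好,世界
```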

License

MIT License

Usage Guidance
This skill appears to be what it claims: a local FunASR-based transcriber. Before installing, consider the following: (1) it requires Python and pip and will install funasr and onnxruntime (PyPI packages), which is standard but subject to supply-chain risks; (2) the first run will download model files (~500MB to ~2GB) into your model cache (~/.cache/modelscope/), so ensure you have enough disk space and bandwidth; (3) video transcription will fetch arbitrary web URLs (yt-dlp/requests/ffmpeg), so avoid supplying internal or private URLs you don't want the runtime to contact; (4) the scripts spawn subprocesses (ffmpeg, yt-dlp, python) and write temporary files under /tmp and to a model cache, so run them in an environment with appropriate resource limits or isolation if you are concerned; (5) review or pin the exact Python packages you install if you need stronger supply-chain guarantees. If you want, ask for a list of the exact network endpoints used for model downloads or for an explanation of the expected files the skill will create and where (model cache, temp dirs, lock files).
Capability Analysis
Type: OpenClaw Skill Name: funasr-asr Version: 2.0.0 The skill provides local speech-to-text functionality using the FunASR library, but it contains a shell injection vulnerability in `index.js`. The `audioPath` parameter is concatenated into a command string executed via `child_process.exec` with only double-quote wrapping, which can be bypassed to execute arbitrary commands. Additionally, `scripts/video-transcribe.py` includes capabilities to download and process content from arbitrary URLs using `yt-dlp` and `ffmpeg`. While these features align with the stated purpose of transcribing videos, the lack of robust input sanitization for file paths and URLs poses a significant security risk.
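
The class of bug described above can be reproduced harmlessly. The sketch below is a Python analogue, not the code in `index.js`: it uses `echo` as a stand-in command and shows that wrapping a path in double quotes inside a shell string does not contain a value that itself includes a double quote, whereas passing the path as a separate argument does. In Node.js the corresponding fix would be `child_process.execFile` or `spawn` with an argument array instead of `exec` with a concatenated string.

```python
import subprocess


def run_unsafe(path: str) -> str:
    """Anti-pattern: interpolate the path into a shell string inside double quotes."""
    result = subprocess.run(f'echo "{path}"', shell=True, capture_output=True, text=True)
    return result.stdout


def run_safe(path: str) -> str:
    """Pass the path as one argv entry; no shell ever parses it."""
    result = subprocess.run(["echo", path], capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # A "file name" that closes the quote and smuggles in a second command.
    hostile = 'a.wav"; echo INJECTED; echo "'
    print(run_unsafe(hostile))  # the shell runs three separate echo commands
    print(run_safe(hostile))    # the whole string is printed as a single argument
```

Until the skill is patched, callers may want to validate that `audioPath` refers to an existing file and contains no shell metacharacters before handing it to `transcribe`.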
Capability Tags
crypto
Capability Assessment
Purpose & Capability
The name/description (local FunASR ASR) match the code and docs: Python scripts load FunASR models, transcribe audio, extract audio from video, and optionally download videos. Dependencies (funasr, onnxruntime, yt-dlp/ffmpeg) and model downloads are proportional to the stated purpose.
Instruction Scope
SKILL.md and code instruct running local Python scripts, installing Python packages, and optionally downloading videos via yt-dlp or requests. The scripts read predictable local paths (/tmp locks, ~/.cache/modelscope), inspect memory (/proc/meminfo or psutil), and do not attempt to read arbitrary unrelated system configuration or secrets. Note: processing a video URL will fetch external web content (user-supplied) and invoke yt-dlp/ffmpeg, which is expected for the feature.
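A memory check of the kind mentioned above might look like the following sketch. It parses the `MemAvailable` field from `/proc/meminfo`-style text and falls back to the small model when the large one would not fit. The function names and the 2048MB threshold are assumptions made for illustration (the threshold is taken from the README's ~2GB figure); the skill's real policy may differ.

```python
import re
from typing import Optional


def parse_mem_available_mb(meminfo_text: str) -> Optional[int]:
    """Extract MemAvailable (reported in kB) and convert it to whole MB."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) // 1024 if match else None


def pick_mode(requested: str, available_mb: Optional[int], large_needs_mb: int = 2048) -> str:
    """Honor a request for 'large' only when enough memory is known to be free."""
    if requested == "large" and available_mb is not None and available_mb >= large_needs_mb:
        return "large"
    return "small"


if __name__ == "__main__":
    sample = "MemTotal:        8000000 kB\nMemAvailable:    3145728 kB\n"
    free_mb = parse_mem_available_mb(sample)
    print(free_mb, pick_mode("large", free_mb))  # 3072 large
```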
Install Mechanism
No registry install spec was provided, but package.json runs scripts/install.sh, which pip-installs funasr and onnxruntime. There are no downloads from untrusted shorteners or personal IP addresses; model acquisition is expected to occur via ModelScope tooling or the funasr library. This is a standard pip-based install (traceable but subject to normal PyPI supply-chain risks).
Credentials
The skill declares no required environment variables or credentials. Documentation mentions optional MODELSCOPE_CACHE and common OS tools but does not require unrelated secrets. File reads/writes are limited to model cache and temporary working directories.
Persistence & Privilege
The skill does not request always:true, does not modify other skills, and only creates per-skill temporary files and locks under /tmp; it does not try to persist or escalate privileges beyond its own runtime files.
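A per-skill lock file that prevents parallel runs can be sketched with an advisory `flock`. This is one common way to do it on Unix-like systems, not necessarily how the skill implements its lock; the `task_lock` name and the lock file name are hypothetical.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def task_lock(lock_path: str):
    """Hold an exclusive, non-blocking advisory lock for the duration of the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # Raises BlockingIOError immediately if another holder has the lock.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "funasr-example.lock")
    with task_lock(path):
        print("lock held; a transcription job would run here")
```

Because the kernel drops the lock when the file descriptor is closed, a crashed job does not leave the lock stuck, which is an advantage over checking for the mere existence of a file.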
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install funasr-asr
  3. After installation, invoke the skill by name or use /funasr-asr
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v2.0.0
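v2.0.0: Memory-optimized release with single-process segmented transcription and SenseVoiceSmall as the default model. Core changes:
- Single-process segmented transcription: the model is loaded only once (~500MB) and segments are processed one at a time
- Default model changed to SenseVoiceSmall (~500MB vs. ~2GB for paraformer)
- Added the --segment option to control segment length
- gc.collect() between segments to release temporary memory
- Automatic cleanup of SenseVoice special tags
- Fixed the agent-browser eval argument-passing bug
- Fixed the audio sample rate (8000→16000)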
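v1.2.2
Description changed to a bilingual Chinese-English format
v1.2.1
Display name and description changed to a bilingual Chinese-English format
v1.2.0
Bilingual Chinese-English documentation; added multilingual support information (Chinese, English, Japanese, Korean, Cantonese)
v1.1.0
Restructured the directory layout to follow ClawHub conventions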
Metadata
Slug funasr-asr
Version 2.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 5
Frequently Asked Questions

What is Funasr Asr?

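Funasr Asr provides local Chinese speech recognition using Alibaba DAMO Academy's FunASR. Trigger scenarios: (1) the user sends a voice message that needs transcribing, (2) an audio or video file needs transcribing, (3) a video needs to be downloaded from a web page and transcribed, (4) Chinese speech-to-text tasks. It supports a low-memory mode (~500MB) and a large-model mode (~2GB), with automatic memory management and a task queue that prevents parallel runs. It is an AI Agent Skill for Claude Code / OpenClaw, with 182 downloads so far.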

How do I install Funasr Asr?

Run "/install funasr-asr" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Funasr Asr free?

Yes, Funasr Asr is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Funasr Asr support?

Funasr Asr is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Funasr Asr?

It is built and maintained by ruitao (@ruitao); the current version is v2.0.0.
