Asr Skill
Author: Shuai YUAN · GitHub · v1.2.0 · MIT-0
Total downloads: 308
Favorites: 0
Current installs: 3
Versions: 4
Install in OpenClaw
/install asr-skill
Description
A speech-to-text Skill based on Qwen3-ASR-0.6B. It recognizes 22 Chinese dialects and multiple languages, so you can talk to OpenClaw in your own dialect.
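Usage instructions (SKILL.md)
Qwen Dialect Speech Recognition Skill
A speech-to-text service based on the Tongyi Qianwen Qwen3-ASR-0.6B model. It recognizes 22 Chinese dialects and 30 languages, so users can talk to OpenClaw directly in their dialect.
✨ Features
- 🎤 Multi-dialect support: recognizes 22 Chinese dialects
- 🌐 Multilingual: supports 30 languages
- 💻 CPU-friendly: no GPU required; runs on an ordinary server
- 🔍 Automatic detection: identifies the language and dialect automatically
- ⚡ Low latency: optimized CPU inference with near real-time response
- 🎯 High accuracy: average dialect recognition accuracy above 90%
- 🔌 Plug and play: fits the OpenClaw ecosystem
🗣️ Supported Chinese dialects
Anhui, Northeastern (Dongbei), Fujian, Gansu, Guizhou, Hebei, Henan, Hubei, Hunan, Jiangxi, Ningxia, Shandong, Shaanxi, Shanxi, Sichuan, Tianjin, Yunnan, Zhejiang, Cantonese (Hong Kong accent), Cantonese (Guangdong accent), Wu, and Minnan (Southern Min).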
🚀 Quick start
Installation
Search for "Qwen方言语音识别" (Qwen Dialect Speech Recognition) in OpenClaw, then click one-click install.
Manual installation
# Clone the project
git clone <repository-url>
cd qwen-asr-skill
# Install dependencies
npm install
pip install -r requirements.txt
# Start the service
npm start
Environment variables
| Variable | Default | Description |
|---|---|---|
| PORT | 3000 | Service port |
| HOST | 0.0.0.0 | Listen address |
| MODEL_NAME | Qwen/Qwen3-ASR-0.6B | ASR model name |
| DEVICE | cpu | Device to run on (cpu/cuda) |
| DTYPE | float32 | Data type |
| BATCH_SIZE | 4 | Batch size |
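The table above could be read into a typed configuration object as in the following minimal sketch. The `AsrConfig` class and `load_config` function are hypothetical names used for illustration; they are not taken from the skill's source, and only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AsrConfig:
    """Hypothetical container for the settings listed in the table."""
    port: int
    host: str
    model_name: str
    device: str
    dtype: str
    batch_size: int


def load_config(env=None) -> AsrConfig:
    """Read settings from a mapping (os.environ by default), using the table's defaults."""
    env = os.environ if env is None else env
    device = env.get("DEVICE", "cpu")
    # The table lists only cpu and cuda as valid devices.
    if device not in ("cpu", "cuda"):
        raise ValueError(f"DEVICE must be 'cpu' or 'cuda', got {device!r}")
    return AsrConfig(
        port=int(env.get("PORT", "3000")),
        host=env.get("HOST", "0.0.0.0"),
        model_name=env.get("MODEL_NAME", "Qwen/Qwen3-ASR-0.6B"),
        device=device,
        dtype=env.get("DTYPE", "float32"),
        batch_size=int(env.get("BATCH_SIZE", "4")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check without touching the real environment.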
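🔧 Usage
Once the Skill is installed and enabled, just send a voice message in OpenClaw. The system automatically:
- Receives the voice input
- Calls this Skill to convert speech to text
- Passes the recognized text to the large language model
- Returns a spoken answer to the user
You can speak in dialect directly; the system detects it automatically, with no need to switch languages manually.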
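The four steps above can be pictured as a small pipeline. This is only an illustrative sketch: `handle_voice_message` and its three callables (`transcribe`, `ask_model`, `synthesize`) are hypothetical stand-ins, and how OpenClaw actually wires these stages together is not described in the source.

```python
from typing import Callable


def handle_voice_message(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    ask_model: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run one voice message through the speech -> text -> model -> speech flow."""
    # Step 2: speech to text (the role this Skill plays).
    text = transcribe(audio)
    if not text.strip():
        raise ValueError("no speech recognized")
    # Step 3: hand the recognized text to the language model.
    answer = ask_model(text)
    # Step 4: turn the answer back into audio for the user.
    return synthesize(answer)
```

Injecting the stages as callables keeps the sketch independent of any particular ASR, model, or speech synthesis backend.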
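📡 API
POST /transcribe
Audio-to-text endpoint
Request parameters:
- audio: audio file or base64-encoded audio data (required)
- language: language or dialect to use (optional, e.g. "四川话" for Sichuanese, "粤语" for Cantonese)
- timestamps: whether to return timestamps (optional, default false)
Response example: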
{
  "success": true,
  "data": {
    "text": "你好,我是四川人,今天吃火锅。",
    "language": "Sichuan",
    "confidence": 0.98,
    "duration": 1.23
  }
}
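A client for this endpoint might look like the sketch below. It assumes the base64 variant is sent as a JSON body whose fields match the parameter names above, and that the service listens on the default port 3000; the source does not specify the request encoding, so treat both as assumptions. The helper names are hypothetical.

```python
import base64
import json
import urllib.request


def build_payload(audio_bytes, language=None, timestamps=False):
    """Build the request body; sending base64 audio as JSON is an assumption."""
    payload = {
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        "timestamps": timestamps,
    }
    if language is not None:
        payload["language"] = language  # optional: omit to use auto-detection
    return payload


def parse_response(body):
    """Return the `data` object from a response shaped like the example above."""
    result = json.loads(body)
    if not result.get("success"):
        raise RuntimeError(f"transcription failed: {result}")
    return result["data"]


def transcribe(audio_bytes, base_url="http://localhost:3000", **options):
    """POST audio to /transcribe and return the parsed result."""
    request = urllib.request.Request(
        f"{base_url}/transcribe",
        data=json.dumps(build_payload(audio_bytes, **options)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_response(response.read().decode("utf-8"))
```

With a running service, `transcribe(open("clip.wav", "rb").read(), language="四川话")` would return a dictionary with the `text`, `language`, `confidence`, and `duration` fields shown in the response example.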
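📊 Performance
- Inference speed: 1.5-2x real-time audio speed (8-core CPU)
- Memory usage: 6-8 GB at runtime
- Supported audio length: up to 5 minutes
- Dialect recognition WER: <16% (average)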
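To make the speed figure concrete, the sketch below estimates how long a clip would take to process. It reads "1.5-2x real-time" as processing audio 1.5 to 2 times faster than its playback length on the quoted 8-core CPU, which is an interpretation rather than something the source spells out. The function name is hypothetical.

```python
MAX_AUDIO_SECONDS = 5 * 60   # documented limit: 5 minutes
SPEED_RANGE = (1.5, 2.0)     # documented speed relative to real time


def estimate_processing_seconds(audio_seconds):
    """Return (best_case, worst_case) processing time for a clip, in seconds."""
    if audio_seconds <= 0:
        raise ValueError("audio length must be positive")
    if audio_seconds > MAX_AUDIO_SECONDS:
        raise ValueError("audio exceeds the documented 5-minute limit")
    slowest, fastest = SPEED_RANGE
    # Faster speed gives the shorter (best-case) processing time.
    return audio_seconds / fastest, audio_seconds / slowest
```

Under this reading, a clip at the 5-minute limit would take roughly 150 to 200 seconds to transcribe.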
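🔒 Privacy
- All speech processing happens locally; nothing is uploaded to third-party servers
- Audio files are deleted automatically after processing and are not stored
- No user speech data or recognition results are collected
🤝 Contributing
Issues and Pull Requests that improve this Skill are welcome!
📄 License
Apache-2.0 License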
Security recommendations
This skill appears to do what it claims (local ASR using Qwen3-ASR). Before installing:
1. Restrict network access or run behind a firewall if you do not want automatic model downloads or remote access; the first run may download ~6GB from Hugging Face.
2. Do not expose the HTTP endpoints (/transcribe, /align, /webhook) to the public internet without adding authentication and rate limits; the webhook and APIs in the source have no auth.
3. Verify MODEL_NAME and use a model source you trust (avoid untrusted mirrors).
4. Monitor disk/memory usage (models are large) and adjust BATCH_SIZE, MAX_NEW_TOKENS, and thread params as needed.
5. If you require stricter privacy guarantees, confirm network egress and any telemetry from dependencies (e.g., huggingface_hub) are acceptable.
Installing/running on an isolated machine or behind an API gateway is recommended.
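The advice on authentication and rate limits could be implemented in a gateway placed in front of the service. The following is a minimal sketch, not part of the skill: the bearer-token scheme, `is_authorized`, and `SlidingWindowLimiter` are all assumptions chosen for illustration.

```python
import hmac
import time
from collections import defaultdict, deque


def is_authorized(headers, expected_token):
    """Check an `Authorization: Bearer <token>` header in constant time."""
    supplied = headers.get("Authorization", "")
    expected = f"Bearer {expected_token}"
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._hits = defaultdict(deque)  # client id -> timestamps of recent requests

    def allow(self, client_id):
        now = self._clock()
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._max:
            return False
        hits.append(now)
        return True
```

A gateway would call `is_authorized` first and `allow` second, rejecting the request before any audio reaches the inference process. The in-memory limiter only covers a single process; a shared store would be needed across several instances.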
Functional analysis
Type: OpenClaw Skill
Name: asr-skill
Version: 1.2.0
The skill bundle provides a legitimate implementation of a speech-to-text service using the Qwen3-ASR-0.6B model. The architecture consists of a Node.js Express server (index.js) that acts as an API gateway, interfacing with a Python inference script (asr.py) via the python-shell library. The code includes robust handling for audio inputs (supporting both file uploads via multer and base64 strings), implements automatic file cleanup after processing, and provides CPU-specific performance optimizations (cpu-optimization.py). No evidence of data exfiltration, unauthorized command execution, or malicious prompt injection was found in the code or documentation.
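The base64 handling and automatic cleanup described above can be illustrated with a short sketch. It is not the skill's code (the source implements this in index.js with multer); `with_temp_audio` and its `process` callback are hypothetical names, and the `.wav` suffix is an assumption.

```python
import base64
import os
import tempfile


def with_temp_audio(audio_b64, process, suffix=".wav"):
    """Decode base64 audio to a temp file, run `process(path)`, then delete the file."""
    # validate=True rejects input containing non-base64 characters.
    audio_bytes = base64.b64decode(audio_b64, validate=True)
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(audio_bytes)
        return process(path)
    finally:
        # Runs on success and on failure, so no audio is left on disk.
        if os.path.exists(path):
            os.remove(path)
```

Putting the deletion in `finally` is what ensures the file is removed even when inference raises an error.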
Capability assessment
Purpose & Capability
Name/description (Qwen ASR dialect recognizer) match the code and files: an Express server invoking a Python ASR backend using a Qwen3-ASR model. Required binaries (node, python3) and env vars (MODEL_NAME, DEVICE, DTYPE, PORT, HOST) are appropriate for this functionality.
Instruction Scope
SKILL.md and code limit behavior to receiving audio, running local inference, returning text, and deleting temporary files. Two operational notes: the /webhook and HTTP APIs are implemented without authentication in the provided source (no token/verification), and some environment variables referenced in code/docs (PYTHON_PATH, CACHE_DIR, ENABLE_FORCED_ALIGNER, MAX_NEW_TOKENS, BATCH_SIZE) are not listed in the 'required' metadata — callers/operator must configure them. Also the server may download model weights from Hugging Face at first run (network activity).
Install Mechanism
There is no opaque remote install URL; dependencies are standard (npm, pip). Model artifacts are fetched from Hugging Face (or a mirror if configured) which is expected for model-based skills. No extract-from-arbitrary-URL installers or shorteners are present.
Credentials
Declared required env vars are minimal and appropriate. The code and docs also reference additional optional envs (PYTHON_PATH, CACHE_DIR, HF_ENDPOINT, ENABLE_FORCED_ALIGNER, etc.) and runtime config (MAX_NEW_TOKENS, BATCH_SIZE). No secrets or third‑party API tokens are required by the skill itself, which is proportionate.
Persistence & Privilege
Skill is not forced-always or otherwise privileged. It does not modify other skills or global agent settings. It runs as a standalone service and cleans uploaded files after processing.
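How to use
- Make sure OpenClaw is installed (local or Docker deployment)
- Enter the install command in the chat box: /install asr-skill
- After installation, call the Skill by name or trigger it with /asr-skill
- Provide the required inputs according to the Skill's parameter description to get structured output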
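Version history
v1.2.0
v1.3.0: Minimal release. 0.6B model only, no forced alignment, lower memory usage and fewer dependencies
v1.1.1
Fixed dependency version: qwen-asr changed from 0.1.0 to 0.0.6 (the latest version on PyPI)
v1.1.0
v1.2.0: Fixed code errors. Removed duplicate code, corrected logic errors, improved the privacy statement
v1.0.0
Qwen Dialect Speech Recognition Skill 1.0.0: initial release
- Based on Qwen3-ASR-0.6B; speech-to-text recognition for 22 Chinese dialects and 30 languages
- Real-time, accurate, CPU-friendly speech recognition with no GPU required
- Automatic detection of the spoken dialect/language, with accuracy up to 90%
- Plug and play; supports the OpenClaw ecosystem and exposes an API
- Emphasis on local privacy: all processing happens locally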
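FAQ
What is Asr Skill?
A speech-to-text Skill based on Qwen3-ASR-0.6B. It recognizes 22 Chinese dialects and multiple languages, so you can talk to OpenClaw in your own dialect. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 308 downloads to date.
How do I install Asr Skill?
Run the command "/install asr-skill" in the OpenClaw or Claude Code chat box for a one-click install; no extra configuration is needed.
Is Asr Skill free?
Yes. Asr Skill is completely free and released under the MIT-0 license; you can download, install, and use it freely.
Which platforms does Asr Skill support?
Asr Skill is cross-platform and works in any environment where OpenClaw / Claude Code is deployed.
Who develops Asr Skill?
It is developed and maintained by Shuai YUAN (@yszheda); the current version is v1.2.0.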