Asr Skill

Name: Asr Skill
Author: yszheda

Author: Shuai YUAN · GitHub · v1.2.0 · MIT-0

cross-platform ✓ Security check passed

Total downloads: 308

Install in OpenClaw

/install asr-skill

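Description

A speech-to-text Skill based on Qwen3-ASR-0.6B. It supports 22 Chinese dialects and multilingual recognition, so you can talk to OpenClaw in your own dialect.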

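Usage guide (SKILL.md)

Qwen Dialect Speech Recognition Skill

A speech-to-text service built on the Tongyi Qianwen Qwen3-ASR-0.6B model. It recognizes 22 Chinese dialects and 30 languages, so users can talk to OpenClaw directly in their dialect.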

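✨ Features

🎤 Multi-dialect support: recognizes 22 Chinese dialects
🌐 Multilingual: supports 30 languages
💻 CPU-friendly: no GPU required; runs on an ordinary server
🔍 Auto-detection: automatically identifies the language and dialect
⚡ Low latency: optimized CPU inference with near-real-time response
🎯 High accuracy: average dialect recognition accuracy above 90%
🔌 Plug and play: fits into the OpenClaw ecosystem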

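🗣️ Supported Chinese dialects

Anhui, Dongbei (Northeastern), Fujian, Gansu, Guizhou, Hebei, Henan, Hubei, Hunan, Jiangxi, Ningxia, Shandong, Shaanxi, Shanxi, Sichuan, Tianjin, Yunnan, Zhejiang, Cantonese (Hong Kong accent), Cantonese (Guangdong accent), Wu, and Minnan (Southern Min).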

🚀 Quick start

Installation

In OpenClaw, search for "Qwen方言语音识别" (Qwen Dialect Speech Recognition) and click one-click install.

Manual installation

# Clone the project
git clone <repository-url>
cd qwen-asr-skill

# Install dependencies
npm install
pip install -r requirements.txt

# Start the service
npm start

Environment variables

Variable	Default	Description
PORT	3000	Service port
HOST	0.0.0.0	Listen address
MODEL_NAME	Qwen/Qwen3-ASR-0.6B	ASR model name
DEVICE	cpu	Device to run on (cpu/cuda)
DTYPE	float32	Data type
BATCH_SIZE	4	Batch size
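
As an illustration of how the variables in the table might be consumed, here is a minimal Python sketch that reads them with the documented defaults. The `ServiceConfig` class and `load_config` function are hypothetical names introduced for this example; the Skill's own configuration code is not shown in this document and may differ.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfig:
    """Hypothetical container for the documented environment variables."""
    port: int
    host: str
    model_name: str
    device: str
    dtype: str
    batch_size: int


def load_config(env=None):
    """Build a ServiceConfig, falling back to the defaults listed in the table."""
    env = os.environ if env is None else env
    device = env.get("DEVICE", "cpu")
    # The table documents only two device values, so reject anything else early.
    if device not in ("cpu", "cuda"):
        raise ValueError(f"DEVICE must be 'cpu' or 'cuda', got {device!r}")
    return ServiceConfig(
        port=int(env.get("PORT", "3000")),
        host=env.get("HOST", "0.0.0.0"),
        model_name=env.get("MODEL_NAME", "Qwen/Qwen3-ASR-0.6B"),
        device=device,
        dtype=env.get("DTYPE", "float32"),
        batch_size=int(env.get("BATCH_SIZE", "4")),
    )
```

Calling `load_config()` with no argument reads the real process environment; passing a plain dict makes the function easy to exercise in isolation.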

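🔧 Usage

After installing and enabling the Skill, just send a voice message in OpenClaw. The system automatically:

Receives the voice input
Calls this Skill to convert speech to text
Passes the recognized text to the large language model
Returns the voice reply to the user

You can speak in your dialect directly; the system detects it automatically, with no need to switch languages manually.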

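📡 API

POST /transcribe

Audio-to-text endpoint

Request parameters:

audio: audio file or base64-encoded audio data (required)
language: language or dialect to use (optional, e.g. "四川话" (Sichuanese), "粤语" (Cantonese))
timestamps: whether to return timestamps (optional, default false)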
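
The parameters above could be sent from a client roughly as follows. This is a sketch under stated assumptions: the document does not specify the request encoding, so a JSON body carrying the base64 form of `audio` is assumed here, and `build_transcribe_request` is a hypothetical helper. The base URL uses the documented default `PORT` of 3000.

```python
import base64
import json
import urllib.request


def build_transcribe_request(audio_bytes, base_url="http://localhost:3000",
                             language=None, timestamps=False):
    """Build (but do not send) a POST /transcribe request.

    Assumption: the server accepts a JSON body with base64-encoded audio.
    """
    payload = {
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        "timestamps": timestamps,
    }
    # `language` is optional; omit it to rely on automatic detection.
    if language is not None:
        payload["language"] = language
    return urllib.request.Request(
        url=f"{base_url}/transcribe",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send it, pass the returned object to `urllib.request.urlopen` and decode the JSON response body.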

Response example:

{
  "success": true,
  "data": {
    "text": "你好，我是四川人，今天吃火锅。",
    "language": "Sichuan",
    "confidence": 0.98,
    "duration": 1.23
  }
}
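
A client might turn a response of this shape into a typed value as sketched below. The `Transcription` class and `parse_transcribe_response` function are hypothetical; only the success shape is documented, so treating any response without `"success": true` as an error is an assumption.

```python
import json
from dataclasses import dataclass


@dataclass
class Transcription:
    """Fields mirror the `data` object in the response example."""
    text: str
    language: str
    confidence: float
    duration: float


def parse_transcribe_response(body):
    """Parse a /transcribe response body (str or bytes) into a Transcription."""
    doc = json.loads(body)
    # Assumption: failures are signalled by a falsy or missing "success" field.
    if not doc.get("success"):
        raise RuntimeError(f"transcription failed: {doc}")
    data = doc["data"]
    return Transcription(
        text=data["text"],
        language=data["language"],
        confidence=float(data["confidence"]),
        duration=float(data["duration"]),
    )
```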

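📊 Performance

Inference speed: 1.5-2x real-time audio speed (8-core CPU)
Memory usage: 6-8 GB at runtime
Supported audio length: up to 5 minutes
Dialect recognition WER: <16% (average)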
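
For readers who want to check a WER figure on their own recordings, the standard definition is (substitutions + deletions + insertions) divided by the reference length. The sketch below computes it with an edit-distance table. The document does not say how the reported figure was measured or whether it is word-level or character-level, so the tokenization is left to the caller; `word_error_rate` is a name chosen for this example.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / len(reference).

    Both arguments are sequences of tokens: words, or single characters
    for a character-level rate on Chinese text.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference token missing
                curr[j - 1] + 1,     # insertion: extra hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)
```

For example, `word_error_rate(list("今天吃火锅"), list("今天吃火过"))` compares the two strings character by character and counts one substitution out of five reference characters.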

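🔒 Privacy

All speech processing happens locally; nothing is uploaded to third-party servers
Processed audio files are deleted automatically and are not stored
No user speech data or recognition results are collected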

🤝 Contributing

Issues and Pull Requests to improve this Skill are welcome!

📄 License

Apache-2.0 License

Security recommendations

This skill appears to do what it claims (local ASR using Qwen3-ASR). Before installing: 1) Restrict network access or run behind a firewall if you do not want automatic model downloads or remote access; the first run may download ~6GB from Hugging Face. 2) Do not expose the HTTP endpoints (/transcribe, /align, /webhook) to the public internet without adding authentication and rate limits; the webhook and APIs in the source have no auth. 3) Verify that MODEL_NAME points to a model source you trust (avoid untrusted mirrors). 4) Monitor disk and memory usage (models are large) and adjust BATCH_SIZE, MAX_NEW_TOKENS, and thread parameters as needed. 5) If you require stricter privacy guarantees, confirm that network egress and any telemetry from dependencies (e.g., huggingface_hub) are acceptable. Installing and running on an isolated machine or behind an API gateway is recommended.
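
As one way to act on recommendation 2, a gateway placed in front of the service could check a shared bearer token before forwarding a request. The sketch below is illustrative only: the Skill itself ships without any such check, and `is_authorized` and the token scheme are assumptions introduced for this example.

```python
import hmac


def is_authorized(headers, expected_token):
    """Return True only if the Authorization header carries the expected bearer token."""
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(token.encode("utf-8"),
                               expected_token.encode("utf-8"))
```

A gateway would reject requests where this returns False (for example with HTTP 401) before they reach /transcribe, /align, or /webhook. Rate limiting would still need to be added separately.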

Functional analysis

Type: OpenClaw Skill
Name: asr-skill
Version: 1.2.0

The skill bundle provides a legitimate implementation of a speech-to-text service using the Qwen3-ASR-0.6B model. The architecture consists of a Node.js Express server (index.js) that acts as an API gateway, interfacing with a Python inference script (asr.py) via the python-shell library. The code includes robust handling for audio inputs (supporting both file uploads via multer and base64 strings), implements automatic file cleanup after processing, and provides CPU-specific performance optimizations (cpu-optimization.py). No evidence of data exfiltration, unauthorized command execution, or malicious prompt injection was found in the code or documentation.

Capability assessment

✓ Purpose & Capability

Name/description (Qwen ASR dialect recognizer) match the code and files: an Express server invoking a Python ASR backend using a Qwen3-ASR model. Required binaries (node, python3) and env vars (MODEL_NAME, DEVICE, DTYPE, PORT, HOST) are appropriate for this functionality.

ℹ Instruction Scope

SKILL.md and code limit behavior to receiving audio, running local inference, returning text, and deleting temporary files. Two operational notes: the /webhook and HTTP APIs are implemented without authentication in the provided source (no token/verification), and some environment variables referenced in code/docs (PYTHON_PATH, CACHE_DIR, ENABLE_FORCED_ALIGNER, MAX_NEW_TOKENS, BATCH_SIZE) are not listed in the 'required' metadata — callers/operator must configure them. Also the server may download model weights from Hugging Face at first run (network activity).

✓ Install Mechanism

There is no opaque remote install URL; dependencies are standard (npm, pip). Model artifacts are fetched from Hugging Face (or a mirror if configured) which is expected for model-based skills. No extract-from-arbitrary-URL installers or shorteners are present.

ℹ Credentials

Declared required env vars are minimal and appropriate. The code and docs also reference additional optional envs (PYTHON_PATH, CACHE_DIR, HF_ENDPOINT, ENABLE_FORCED_ALIGNER, etc.) and runtime config (MAX_NEW_TOKENS, BATCH_SIZE). No secrets or third‑party API tokens are required by the skill itself, which is proportionate.

✓ Persistence & Privilege

Skill is not forced-always or otherwise privileged. It does not modify other skills or global agent settings. It runs as a standalone service and cleans uploaded files after processing.

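How to use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install asr-skill
Once installed, invoke the Skill by name or trigger it with /asr-skill
Provide the inputs described in the Skill's parameter documentation to get structured output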

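Version history

v1.2.0

v1.3.0: Minimal release. 0.6B model only, no forced alignment, reduced memory footprint and dependencies

v1.1.1

Fixed dependency version: qwen-asr changed from 0.1.0 to 0.0.6 (the latest version on PyPI)

v1.1.0

v1.2.0: Fixed code errors. Removed duplicate code, corrected logic errors, and improved the privacy statement

v1.0.0

Qwen Dialect Speech Recognition Skill 1.0.0, initial release:
Based on Qwen3-ASR-0.6B; speech-to-text recognition for 22 Chinese dialects and 30 languages
Real-time, accurate, CPU-friendly speech recognition with no GPU required
Automatic detection of the spoken dialect or language, with accuracy up to 90%
Plug and play, with OpenClaw ecosystem support and an open API
Emphasis on local privacy: all processing is done locally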

Metadata

Slug: asr-skill

Version: 1.2.0

License: MIT-0

Total installs: 3

Current installs: 3

Number of versions: 4

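FAQ

What is Asr Skill?

A speech-to-text Skill based on Qwen3-ASR-0.6B. It supports 22 Chinese dialects and multilingual recognition, so you can talk to OpenClaw in your own dialect. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 308 times to date.

How do I install Asr Skill?

Run the command "/install asr-skill" in the OpenClaw or Claude Code chat box for a one-click install; no extra configuration is needed.

Is Asr Skill free?

Yes. Asr Skill is completely free and released under the MIT-0 license; you may download, install, and use it freely.

Which platforms does Asr Skill support?

Asr Skill is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Asr Skill?

It is developed and maintained by Shuai YUAN (@yszheda). The current version is v1.2.0.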