
Qwen Asr Skill

Name: Qwen Asr Skill
Author: yszheda

by Shuai YUAN · v1.3.0 · MIT-0

cross-platform ✓ Security Clean

355 Downloads

Install in OpenClaw

/install qwen-asr-skill

Description

Provides high-accuracy speech-to-text conversion supporting 22 Chinese dialects and 30 languages with automatic language detection, running on CPU.

Usage Guidance

This skill appears coherent with its stated purpose, but review and precautions are recommended before installing:

1) The skill will download ~6GB of model weights from Hugging Face on first run; ensure you have enough disk space, CPU capacity, and bandwidth.
2) It runs a local web server that accepts file uploads. Uploaded audio is stored temporarily in an uploads/ directory and then deleted. Run it in an isolated environment or container if you are cautious.
3) The code uses dotenv and standard environment variables. Check any local .env before starting to avoid unintentionally exposing secrets or changing behavior, and be aware that a private HF model would require an HF token that is not declared in SKILL.md.
4) The repository origin is listed as a placeholder in SKILL.md; verify the source URL and review the code before deployment.

If you want higher assurance, run the skill inside a sandbox (container/VM) and inspect network traffic during the first model download.
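The first and third precautions can be checked before starting the server. The following is a minimal pre-flight sketch, not part of the skill itself: the function names, the 6 GiB threshold (taken from the ~6GB figure above), and the list of "secret-looking" substrings are all assumptions chosen for illustration.

```python
import shutil

# Assumption: threshold based on the ~6GB download size quoted in the guidance.
REQUIRED_BYTES = 6 * 1024**3

# Hypothetical substrings that suggest a variable may hold a secret.
SECRET_HINTS = ("TOKEN", "SECRET", "PASSWORD", "KEY")


def has_enough_disk(path: str, required: int = REQUIRED_BYTES) -> bool:
    """Return True if the filesystem holding `path` has `required` bytes free."""
    return shutil.disk_usage(path).free >= required


def suspicious_env_keys(env_text: str) -> list[str]:
    """List variable names in .env-style text that look like secrets."""
    keys = []
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines that are not KEY=VALUE pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        if name.startswith("export "):
            name = name[len("export "):].strip()
        if any(hint in name.upper() for hint in SECRET_HINTS):
            keys.append(name)
    return keys
```

Calling `has_enough_disk(".")` from the directory where the model cache will live, and passing the contents of a local .env file to `suspicious_env_keys`, gives a quick list of things to review by hand. It does not replace reading the code.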

Capability Analysis

Type: OpenClaw Skill
Name: qwen-asr-skill
Version: 1.3.0

The skill bundle is a legitimate implementation of an Automatic Speech Recognition (ASR) service using the Qwen3-ASR-0.6B model. The code consists of an Express.js server (index.js) that interfaces with a Python inference script (asr.py) via the python-shell library. It includes standard features such as file upload handling via multer, CPU performance optimizations (cpu-optimization.py), and dialect mapping (dialect-map.js). No evidence of data exfiltration, malicious execution, backdoors, or prompt injection was found; the logic is consistent with the stated purpose of providing local speech-to-text capabilities.

Capability Assessment

✓ Purpose & Capability

Name/description (Qwen ASR, CPU-side dialect support) matches the code: a Node.js HTTP wrapper that invokes a Python asr.py using qwen-asr and torch. Declared package dependencies and Python requirements align with running a local ASR model. No unrelated cloud credentials or surprising binaries are requested.

ℹ Instruction Scope

SKILL.md instructions (git clone, npm install, pip install, npm start) map to the provided Node + Python code. The skill runs a local web server, accepts uploaded audio or base64, invokes the Python script, and deletes uploaded files. SKILL.md states that model weights are downloaded from Hugging Face on first run, and the code calls from_pretrained, which performs that download. There is no instruction to read unrelated user files, but the code does load environment variables (via dotenv) if a .env file is present.
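The accept, transcribe, delete lifecycle described here can be sketched as follows. This is an illustration in Python rather than the skill's actual Node.js handler: `transcribe_base64`, the `run_asr` callback, and the `.wav` suffix are hypothetical names and choices, and only the uploads/ directory idea comes from the review above.

```python
import base64
import os
import tempfile
from typing import Callable


def transcribe_base64(audio_b64: str, run_asr: Callable[[str], str], upload_dir: str) -> str:
    """Decode base64 audio to a temp file, pass its path to `run_asr`, then delete it."""
    # validate=True rejects input containing non-base64 characters.
    data = base64.b64decode(audio_b64, validate=True)
    os.makedirs(upload_dir, exist_ok=True)
    fd, path = tempfile.mkstemp(dir=upload_dir, suffix=".wav")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        return run_asr(path)
    finally:
        # Remove the upload whether or not transcription succeeded.
        if os.path.exists(path):
            os.remove(path)
```

Putting the deletion in a `finally` block means a failed transcription does not leave audio behind in the upload directory, which is the property a reviewer would want to confirm in the real handler.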

✓ Install Mechanism

There is no automatic download/install spec in the registry; the README asks the user to run npm and pip installs. Model weights are retrieved at runtime via Hugging Face from_pretrained (a common, known source). No obscure download URLs or archive extraction from untrusted hosts were found in the code.

ℹ Credentials

The skill declares no required env vars or credentials. It reads standard process.env values (MODEL_NAME, DEVICE, CACHE_DIR, PYTHON_PATH, etc.) and uses dotenv if a .env file exists — reasonable for configuration but means a local .env could alter behavior. If the chosen model is private, Hugging Face authentication (HF_TOKEN) might be needed even though it isn't declared. No requests for unrelated service credentials were found.
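To make concrete how a local .env could alter behavior, here is a small sketch of environment-driven configuration in Python. The variable names MODEL_NAME, DEVICE, CACHE_DIR, and PYTHON_PATH come from the review above; the `AsrConfig` class, both helper functions, and every default value are assumptions, since the skill's real defaults are not stated here.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variable names reported by the review; the full list in the code may be longer.
KNOWN_KEYS = ("MODEL_NAME", "DEVICE", "CACHE_DIR", "PYTHON_PATH")


@dataclass(frozen=True)
class AsrConfig:
    model_name: str
    device: str
    cache_dir: str
    python_path: str


def load_config(env: Mapping[str, str] = os.environ) -> AsrConfig:
    """Build a config from environment variables; all defaults are placeholders."""
    return AsrConfig(
        # The exact Hugging Face repo id is not given in the listing.
        model_name=env.get("MODEL_NAME", "Qwen3-ASR-0.6B"),
        device=env.get("DEVICE", "cpu"),
        cache_dir=env.get("CACHE_DIR", "./models"),
        python_path=env.get("PYTHON_PATH", "python3"),
    )


def overridden_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return which known settings are set in `env` and would replace defaults."""
    return [key for key in KNOWN_KEYS if key in env]
```

Logging the result of `overridden_keys()` at startup would show which settings came from the environment or a .env file rather than from built-in defaults.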

✓ Persistence & Privilege

always:false and user-invocable:true. The skill runs as a local web service and does not request persistent platform-wide privileges or modify other skills. It executes a local Python script (asr.py) via python-shell, which is expected for this design.
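The parent-process-calls-script pattern can be illustrated with Python's standard subprocess module. This is a stand-in for what python-shell does on the Node.js side, not the skill's code. It assumes the child script takes the audio path as its only argument and prints a single JSON document to stdout; the listing does not describe asr.py's actual interface, and `run_script` is a hypothetical name.

```python
import json
import subprocess
import sys


def run_script(script_path: str, audio_path: str, timeout: float = 600.0) -> dict:
    """Run a child Python script and parse the JSON it prints to stdout."""
    result = subprocess.run(
        # A list of arguments (no shell) keeps file names from being
        # interpreted as shell syntax.
        [sys.executable, script_path, audio_path],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

The child runs with the same user privileges as the parent, which matches the assessment that no additional platform-wide privileges are involved.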

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install qwen-asr-skill
After installation, invoke the skill by name or use /qwen-asr-skill
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.3.0

v1.3.0: Minimal release - 0.6B model only, no forced-alignment feature, reduced memory footprint and fewer dependencies

v1.2.1

Fixed dependency version: qwen-asr changed from 0.1.0 to 0.0.6 (the latest version on PyPI)

v1.2.0

v1.2.0: Fixed code errors - removed duplicate code, corrected logic errors, improved the privacy statement

v1.1.0

v1.1.0: Fixed security issue - removed hard-coded credentials

v1.0.0

Initial release of Qwen 方言语音识别 (Qwen Dialect Speech Recognition):
- Provides speech-to-text conversion using the Qwen3-ASR-0.6B model.
- Supports 22 major Chinese dialects and 30 international languages.
- Runs on CPU without GPU requirements.
- Features automatic language detection and low-latency performance.
- Offers an API endpoint for audio transcription with support for language selection and timestamps.

Metadata

Slug qwen-asr-skill

Version 1.3.0

License MIT-0

All-time Installs 2

Active Installs 2

Total Versions 5

Frequently Asked Questions

What is Qwen Asr Skill?

Provides high-accuracy speech-to-text conversion supporting 22 Chinese dialects and 30 languages with automatic language detection, running on CPU. It is an AI Agent Skill for Claude Code / OpenClaw, with 355 downloads so far.

How do I install Qwen Asr Skill?

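Run "/install qwen-asr-skill" in the OpenClaw or Claude Code chat to install it. Note that the skill's SKILL.md also lists git clone, npm install, pip install, and npm start, and that about 6GB of model weights are downloaded on first run.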

Is Qwen Asr Skill free?

Yes, Qwen Asr Skill is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Qwen Asr Skill support?

Qwen Asr Skill is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Qwen Asr Skill?

It is built and maintained by Shuai YUAN (@yszheda); the current version is v1.3.0.
