/install local-qwen3-asr-aipc
\r \r
Local Speech Recognition (Windows · Qwen3-ASR · OpenVINO)\r
\r
Model: snake7gun/Qwen3-ASR-0.6B-fp16-ov (ModelScope FP16) \r
SKILL_VERSION: 'v1.0.3'\r
\r
First time? Before using this skill, run these two scripts once in a terminal:\r
python setup.py # [SYSTEM PYTHON OK] creates venv, installs deps (~5 min)\r python download_model.py # [SYSTEM PYTHON OK] downloads the model (~2 GB, resumable)\r ```\r Both scripts are in the skill directory alongside this SKILL.md.\r
\r ---\r \r
Agent Routing\r
\r
Always use acoustic_pipeline.py as the entry point, called with VENV_PY (obtained from check_env.py output). It handles all cases:\r
\r
# VENV_PY = value from check_env.py output, e.g. C:\intel_openvino\venv\Scripts\python.exe\r
\r
# Single file\r
& "\x3CVENV_PY>" "\x3Cskill_dir>\acoustic_pipeline.py" --file "\x3CFILE_PATH>" --language auto\r
\r
# Single file + save transcript\r
& "\x3CVENV_PY>" "\x3Cskill_dir>\acoustic_pipeline.py" --file "\x3CFILE_PATH>" --language auto --archive json\r
\r
# Watch folder\r
& "\x3CVENV_PY>" "\x3Cskill_dir>\acoustic_pipeline.py" --watch "\x3CDIR_PATH>" --language auto --archive both\r
\r
# Batch folder\r
& "\x3CVENV_PY>" "\x3Cskill_dir>\acoustic_pipeline.py" --batch "\x3CDIR_PATH>" --language auto --archive json\r
```\r
\r
> **Never run `acoustic_pipeline.py` with system `python`.** It imports model packages (`openvino`, `qwen_asr`) that are only installed in the venv.\r
\r
Use `transcribe.py` directly only when called internally by `acoustic_pipeline.py` — do not invoke it as a standalone entry point.\r
\r
---\r
\r
## Skill Contract (Input / Output)\r
\r
### Accepted Inputs\r
\r
Any agent should treat this skill as a local audio/video transcription skill.\r
\r
1. Single file path\r
* Audio: .wav, .mp3, .flac, .m4a, .ogg, .aac, .wma, .opus\r
* Video: .mp4, .mkv, .webm, .flv, .mov, .avi, .mts, .m2ts, .ts, .m3u8\r
2. Folder path\r
* Watch mode: continuously process new files in folder\r
* Batch mode: process existing files in folder recursively\r
3. Runtime options\r
* language: auto or explicit language name\r
* archive: none | txt | json | both\r
* archive_dir: optional output folder for transcript files\r
* auto_bootstrap: initialize ASR automatically when environment is missing\r
\r
### Output On Success\r
\r
The result should be returned as a JSON object (or equivalent dictionary) with:\r
\r
* text: transcription content\r
* language: detected or requested language\r
* source_file: original input path\r
* source_format: source extension\r
* confidence: optional confidence value (if available)\r
* archive_files: optional object containing txt/json output paths\r
\r
Example shape:\r
\r
```json\r
{\r
"text": "...",\r
"language": "Chinese",\r
"source_file": "C:\\demo\\meeting.mp4",\r
"source_format": ".mp4",\r
"confidence": null,\r
"archive_files": {\r
"json": "C:\\demo\ ranscripts\\meeting_20260326_120000.json"\r
}\r
}\r
```\r
\r
### Output On Failure\r
\r
The agent should return a short structured error summary including:\r
\r
* error: human-readable failure reason\r
* stage: bootstrap | extract_audio | transcribe | archive\r
* source_file: input path (if known)\r
* recoverable: true if retry is reasonable\r
\r
---\r
\r
## ⚠️ Agent instructions\r
\r
1. **Windows / PowerShell only.** Never use Linux commands (`ls`, `rm`, `cat`). Never use `&&` or `call`.\r
2. **Every step reads `state.json` itself** — do not pass paths between steps manually.\r
3. **Use `VENV_PY` from state.json for inference calls** — never use system python for inference. `check_env.py` and `setup.py` are the only scripts intentionally run with system python (they create or validate the venv, so they must not depend on it). `check_env.py` validates the *venv's* packages, not system Python's — `PACKAGES_MISSING` from check_env.py means the venv is incomplete, not the system installation.\r
4. `transcribe.py` is automatically deployed to `ASR_DIR` when `setup.py` runs. If it is missing, re-run `setup.py`.\r
5. **If `transcribe.py` fails at runtime**, do NOT edit it manually. Re-run `setup.py` to redeploy the original from the repository.\r
6. **Never diagnose "model issues" without running Step 1's check script first.** The model layout may use a `thinker/` subdirectory — the scripts handle this automatically.\r
7. **Do not generate helper scripts in agent workspace directories.** Use repository-managed `check_env.py` instead.\r
8. **Goal**: transcribe the audio file and send the result to the conversation.\r
\r
**Auto-recovery policy — try before asking user:**\r
⚠️ Network handling (proxy-aware)\r
\r
When running setup.py or download_model.py:\r
\r
1. Try to read system proxy settings:\r
\r
* Environment variables: HTTP_PROXY / HTTPS_PROXY\r
* Windows WinHTTP proxy (netsh winhttp show proxy)\r
2. If proxy is detected:\r
\r
* Automatically apply it\r
* Continue installation / download\r
3. If no proxy is detected:\r
\r
* Continue normally (direct connection)\r
4. If download/setup fails:\r
\r
* Inform the user whether a proxy was detected\r
* If no proxy was detected, suggest configuring one\r
* Then retry\r
IMPORTANT:\r
* Many networks (corporate / China mainland / campus) require proxy\r
* Download supports resume — safe to retry after fixing network\r
* If `STATE=MISSING`, `VENV_PY=BROKEN`, or `PACKAGES_MISSING`: automatically run `setup.py` (up to 3 attempts). Only ask user to run it manually if all 3 attempts fail. `setup.py` is idempotent — it skips any step already completed (venv, packages, repo clone, asr_engine.py, transcribe.py). Re-running a working environment is safe and fast.\r
* If `MODEL_STATUS=MISSING`: automatically run `download_model.py` (up to 3 attempts). Only ask user to run it manually if all 3 attempts fail or if a single attempt runs longer than 8 minutes without completing (likely a slow connection — download_model.py supports resume so partial progress is not lost). `download_model.py` exits immediately if the model is already complete — re-running the script on an already-downloaded model is safe.\r
* Always announce what you are doing before each attempt: "⚙️ Automatically setting up the environment (attempt N/3)…"\r
\r
\r
## Pre-flight: Verify Runtime Requirements (Required on First Use)\r
\r
> 🔍 Pre-flight: Checking Python and git...\r
\r
### Check Python Version\r
\r
```powershell\r
python --version\r
```\r
\r
**Interpretation:**\r
\r
| Output | Action |\r
|--------|--------|\r
| `Python 3.10.x` or higher | `PYTHON_OK` — set `PYTHON_EXE=python`, continue to git check |\r
| `Python 3.8 / 3.9` | Version too low; upgrade required (see below) |\r
| `'python' is not recognized...` | Python missing; install required (see below) |\r
| `Python was not found; run without arguments...` | **Windows Store alias** — run `py --version` instead (see below) |\r
\r
**If output contains "run without arguments to install from the Microsoft Store"**, the Windows Store App Execution Alias is shadowing the real Python. Do NOT ask the user to change settings, and do NOT write helper scripts. Use `where.exe` (native Windows tool, works in both PowerShell and cmd) to find the real Python:\r
\r
> ⚠️ **Do NOT write any helper .ps1 scripts. Run the command below directly in the terminal.**\r
\r
```powershell\r
where.exe python 2>$null | Where-Object { $_ -notlike "*WindowsApps*" } | Select-Object -First 1\r
```\r
\r
* A path is printed → **record this literal string as `SYSTEM_PYTHON`** (this is the system-level Python, used only to run `setup.py`, `check_env.py`, and `download_model.py`). For every command marked `[SYSTEM PYTHON]`, substitute the full literal path for `python`. Example: `python "\x3Cskill_dir>\check_env.py"` becomes `"C:\Users\intel\AppData\Local\Programs\Python\Python312\python.exe" "\x3Cskill_dir>\check_env.py"`. Do NOT use this path for inference — inference must always use `VENV_PY` (the venv Python path printed by `check_env.py`). Do NOT rely on a `$variable` across tool calls — each call is a new shell process; always embed the literal path directly.\r
* Nothing printed → Python is not installed — install it (see below).\r
**If Python is missing or outdated**, run this one-command silent installer in PowerShell (recommended, no admin required):\r
\r
**```powershell\r
$f = "$env:TEMP\\python-installer.exe"\r
Invoke-WebRequest "https://www.python.org/ftp/python/3.12.10/python-3.12.10-amd64.exe" -OutFile $f\r
Start-Process $f -ArgumentList "/quiet InstallAllUsers=0 PrependPath=1 Include_pip=1" -Wait\r
Remove-Item $f\r
```\r
\r
> `PrependPath=1` adds Python to PATH automatically; `Include_pip=1` installs pip; `InstallAllUsers=0` avoids requiring administrator privileges.\r
\r
After installation, **restart the terminal**, then run `python --version` and confirm it reports `Python 3.12.x`.\r
\r
If you prefer manual installation: download **https://www.python.org/ftp/python/3.12.10/python-3.12.10-amd64.exe** and make sure to check **"Add python.exe to PATH"** during setup.\r
\r
### Check git\r
\r
```bat\r
git --version\r
```\r
\r
**Interpretation:**\r
\r
**| Output | Action |**\r
**|------|------|**\r
| `git version 2.x.x` | ✅ `GIT_OK`, Pre-flight passed |\r
| `'git' is not recognized as an internal or external command` | git is not installed; install is required (see below) |\r
\r
**If git is missing**, run this one-command silent installer in PowerShell:\r
\r
```powershell\r
$f = "$env:TEMP\\git-installer.exe"\r
Invoke-WebRequest "https://github.com/git-for-windows/git/releases/download/v2.49.0.windows.1/Git-2.49.0-64-bit.exe" -OutFile $f\r
Start-Process $f -ArgumentList "/VERYSILENT /NORESTART /NOCANCEL /SP- /CLOSEAPPLICATIONS /RESTARTAPPLICATIONS /COMPONENTS=icons,ext\\reg\\shellhere,assoc,assoc_sh" -Wait\r
Remove-Item $f\r
```\r
\r
After installation, **restart the terminal**, then run `git --version` to confirm.\r
\r
If you prefer manual installation: open **https://git-scm.com/download/win**, download the installer, and proceed with default options.\r
\r
> git is required for the `git+https://` dependency in `requirements_imagegen.txt`; without git, `pip install` will fail with `git: command not found`.\r
\r
**Pre-flight pass criteria**: `python --version` is >= 3.10 and `git --version` returns a valid version string.\r
\r
Status message: `✅ Python and git are ready. Starting main workflow.`\r
\r
**Pipeline — follow exactly in order, no skipping:**\r
\r
```\r
Step 0: parse request → AUDIO_PATH, LANGUAGE, TOPIC\r
Step 1: verify environment → run check_env.py → record VENV_PY and ASR_DIR\r
↳ if STATE=MISSING or VENV_BROKEN or PACKAGES_MISSING: auto-run setup.py (3 attempts)\r
↳ if SCRIPTS_STALE=...: auto-run setup.py to redeploy runtime scripts\r
↳ if MODEL_STATUS=MISSING: auto-run download_model.py (3 attempts)\r
Step 2: transcribe + send → run acoustic_pipeline.py using VENV_PY from Step 1\r
↳ fallback: run transcribe.py directly using VENV_PY and ASR_DIR from Step 1\r
```\r
\r
\---\r
\r
## Step 0: parse request (LLM only — no tools)\r
\r
Extract from the user's message:\r
\r
|Field|Default|Notes|\r
|-|-|-|\r
|`AUDIO_PATH`|required|Absolute path to audio/video file (wav/mp3/flac/m4a/ogg/aac/wma/opus/mp4/mkv/webm/flv/mov/avi/mts/m2ts/ts/m3u8)|\r
|`LANGUAGE`|auto-detect|Optional: `Chinese`, `English`, `Japanese`, etc.|\r
|`TOPIC`|English snake_case from context|Used for output filename|\r
\r
If no audio file provided, ask the user before continuing.\r
\r
---\r
\r
## Step 1: verify environment and model\r
\r
> Step 1/3: checking environment and model...\r
\r
```powershell\r
# [SYSTEM PYTHON] check_env.py creates/validates the venv — must NOT use venv python here\r
python "\x3Cskill_dir>\check_env.py"\r
```\r
\r
> `check_env.py` and `setup.py` intentionally run with system Python — they create and validate the venv, so they cannot depend on it.\r
\r
**On success**: record `VENV_PY` and `ASR_DIR` from output, proceed to Step 2.\r
\r
> `check_env.py` also prints `SCRIPTS_STALE=\x3Cold>->\x3Cnew>` when the deployed `transcribe.py` is outdated.\r
> If this line appears, treat it as a mandatory **auto-update** — run `setup.py` before Step 2 (see below).\r
\r
**On failure — auto-recovery (try before asking user):**\r
\r
### If SCRIPTS_STALE=... → auto-run setup.py to redeploy runtime scripts\r
\r
`SCRIPTS_STALE` means the venv and model are both OK, but the deployed `transcribe.py` (and/or `asr_engine.py`) in `ASR_DIR` is an older version. `setup.py` is idempotent — it skips the venv, package, and model steps and only redeploys the outdated files.\r
\r
```\r
⚙️ Runtime scripts are outdated. Redeploying (attempt 1/3)...\r
```\r
\r
```powershell\r
# [SYSTEM PYTHON] — redeploys transcribe.py and asr_engine.py to ASR_DIR\r
python "\x3Cskill_dir>\setup.py"\r
```\r
After running, re-run `check_env.py` to confirm `SCRIPTS_STALE` no longer appears, then proceed to Step 2.\r
\r
### If STATE=MISSING or VENV_PY=BROKEN or PACKAGES_MISSING → auto-run setup.py\r
\r
`PACKAGES_MISSING` means the venv exists but required packages (e.g. `openvino`, `qwen_asr`) are not installed. Re-running `setup.py` re-installs only what is missing; it does not recreate the venv or re-clone the repo.\r
\r
Announce and run (up to 3 attempts):\r
\r
```\r
⚙️ Environment is not initialized. Running automatic setup (attempt 1/3)...\r
```\r
\r
```powershell\r
# [SYSTEM PYTHON] setup.py creates the venv — must NOT use venv python here\r
python "\x3Cskill_dir>\setup.py"\r
```\r
After each attempt, re-run `check_env.py` to verify. If all 3 attempts fail, show manual fallback below.\r
\r
### If MODEL_STATUS=MISSING → auto-run download_model.py\r
\r
Announce and run (up to 3 attempts, stop if a single attempt exceeds 8 minutes):\r
\r
```\r
📥 Model not found. Starting automatic download (attempt 1/3)...\r
Estimated time: ~3 minutes at 100 Mbps, ~5 minutes at 50 Mbps\r
Download supports resume; rerun safely after interruption.\r
```\r
\r
```powershell\r
# [SYSTEM PYTHON] download_model.py runs before venv — must NOT use venv python here\r
python "\x3Cskill_dir>\download_model.py"\r
```\r
After each attempt, re-run `check_env.py` to verify. If all 3 attempts fail, show manual fallback below.\r
\r
### Manual fallback (only if all 3 auto-attempts fail)\r
\r
Show user this message:\r
\r
```\r
⚠️ Automatic setup failed. Manual steps are required.\r
\r
Open a Windows terminal (PowerShell or Command Prompt) and run the following in order:\r
\r
1) Install environment (if not installed yet):\r
python "\x3Cskill_dir>\setup.py" # [SYSTEM PYTHON OK]\r
Expected duration: about 5 minutes, fully automated.\r
\r
2) Download model (about 2 GB):\r
python "\x3Cskill_dir>\download_model.py" # [SYSTEM PYTHON OK]\r
Download supports resume; rerun safely after interruption.\r
Estimated time: ~3 minutes at 100 Mbps, ~5 minutes at 50 Mbps.\r
\r
After completion, return here and resend your request.\r
```\r
\r
---\r
\r
## Step 2: transcribe and send result\r
\r
> Step 2/2: transcribing...\r
\r
**Preferred path** — run `acoustic_pipeline.py` with the **venv Python** obtained from Step 1:\r
\r
```powershell\r
# [VENV PYTHON] VENV_PY = value printed by check_env.py, e.g. C:\intel_openvino\venv\Scripts\python.exe\r
# Never use system python here — openvino/qwen_asr are only in the venv\r
& "\x3CVENV_PY>" "\x3Cskill_dir>\acoustic_pipeline.py" --file "AUDIO_PATH" --language auto --archive json\r
```\r
\r
If the user specified a language:\r
\r
```powershell\r
# [VENV PYTHON]\r
& "\x3CVENV_PY>" "\x3Cskill_dir>\acoustic_pipeline.py" --file "AUDIO_PATH" --language "LANGUAGE" --archive json\r
```\r
\r
Return the JSON result directly to the conversation and include any `archive_files` paths.\r
\r
**Fallback** — only if `acoustic_pipeline.py` exits with a non-zero code:\r
\r
```powershell\r
# [VENV PYTHON] fallback — both VENV_PY and ASR_DIR come from check_env.py (Step 1)\r
& "\x3CVENV_PY>" "\x3CASR_DIR> ranscribe.py" --audio "AUDIO_PATH" --language "LANGUAGE"\r
```\r
\r
> Never use system python here.\r
\r
**Pass**: `$LASTEXITCODE -eq 0`. Stdout is a single line of JSON with this shape:\r
\r
```json\r
{"text": "...", "language": "Chinese", "time_elapsed": 12.3, "audio_path": "C:\\audio\\file.wav"}\r
```\r
\r
Parse with `$result = $stdout | ConvertFrom-Json`. Record `TRANSCRIPT` from `$result.text` and `LANG` from `$result.language`.\r
\r
**Fail**: if `$LASTEXITCODE -ne 0`, do not attempt to parse stdout. Show the stderr output to the user.\r
\r
Send via `message` tool:\r
\r
```\r
action: "send" message: "✅ LANG\
\
TRANSCRIPT"\r
```\r
\r
---\r
\r
## Troubleshooting\r
\r
|Error|Fix|\r
|-|-|\r
|`Python was not found; run without arguments...`|Windows Store alias blocking `python`. Run: `where.exe python 2>$null | Where-Object { $_ -notlike "*WindowsApps*" } | Select-Object -First 1`. Record the printed literal path and substitute it for `python` in every subsequent command (do NOT use a `$variable` — each tool call is a new shell).|\r
|`STATE=MISSING`|Run `python "\x3Cskill_dir>\setup.py"`|\r
|`VENV_PY=BROKEN`|Re-run `python "\x3Cskill_dir>\setup.py"` — it will rebuild the venv|\r
|`PACKAGES_MISSING: ...`|Re-run `python "\x3Cskill_dir>\setup.py"` — re-installs missing venv packages only; skips steps already done|\r
|`MODEL_STATUS=MISSING`|Run `python "\x3Cskill_dir>\download_model.py"` — exits immediately if model is already complete|\r
|`[ERROR] Audio not found`|Verify the file path is correct and the file exists|\r
|`[ERROR] Model incomplete`|Re-run `python "\x3Cskill_dir>\download_model.py"` — supports resume|\r
|`[ERROR] state.json not found`|Re-run Step 1 (`check_env.py`)|\r
|`SCRIPTS_STALE=v1.0.1->v1.0.2`|Deployed runtime scripts are outdated. Run `python "\x3Cskill_dir>\setup.py"` — it redeploys only the changed files, skipping venv/packages/model (fast).|\r
|`RuntimeError` on GPU|Run `check_env.py` first to confirm model is READY. If model is OK, try adding `--language auto` to remove language mismatch. If still failing, re-run `setup.py` to upgrade OpenVINO and redeploy runtime files.|\r
\r
---\r
\r
## LLM API Usage\r
\r
For agents that prefer a Python import interface over shell commands, run this inside a subprocess using **VENV_PY**:\r
\r
```powershell\r
# Must be run with VENV_PY, not system python\r
& "\x3CVENV_PY>" -c "\r
import sys; sys.path.insert(0, r'\x3Cskill_dir>')\r
from acoustic_pipeline import AcousticPipeline\r
pipeline = AcousticPipeline()\r
result = pipeline.transcribe(r'C:\\meeting.mp4', language='auto', archive_mode='json')\r
print(result['text'])\r
"\r
```\r
\r
Or if your agent framework already runs inside the venv Python process:\r
\r
```python\r
from acoustic_pipeline import AcousticPipeline\r
\r
pipeline = AcousticPipeline()\r
result = pipeline.transcribe("C:\\meeting.mp4", language="auto", archive_mode="json")\r
print(result["text"])\r
print(result.get("archive_files"))\r
```\r
\r
---\r
\r
## Workspace Hygiene\r
\r
* Use repository-managed `check_env.py` — it is versioned, auditable, and repeatable.\r
* If the execution environment forces a temporary file, treat it as disposable and remove it after the command completes.\r
\r
\r
\r
- 确保已安装 OpenClaw(本地或 Docker 部署)
- 在对话框中输入安装命令:
/install local-qwen3-asr-aipc - 安装完成后,直接呼叫该 Skill 的名称或使用
/local-qwen3-asr-aipc触发 - 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
Local speech to text Qwen3-ASR w/ OpenVINO (no API key) 是什么?
Local offline ASR on Windows — no cloud, no API cost, full privacy. Qwen3-ASR 0.6B + Intel OpenVINO, GPU-accelerated inference. NETWORK: required for first-t... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 202 次。
如何安装 Local speech to text Qwen3-ASR w/ OpenVINO (no API key)?
在 OpenClaw 或 Claude Code 对话框中运行命令「/install local-qwen3-asr-aipc」即可一键安装,无需额外配置。
Local speech to text Qwen3-ASR w/ OpenVINO (no API key) 是免费的吗?
是的,Local speech to text Qwen3-ASR w/ OpenVINO (no API key) 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。
Local speech to text Qwen3-ASR w/ OpenVINO (no API key) 支持哪些平台?
Local speech to text Qwen3-ASR w/ OpenVINO (no API key) 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。
谁开发了 Local speech to text Qwen3-ASR w/ OpenVINO (no API key)?
由 juan-OY(@juan-oy)开发并维护,当前版本 v1.0.3。