ListenHub Asr

Name: ListenHub Asr
Author: 0xfango

Description

Transcribe audio files to text using local speech recognition. Triggers on: "转录", "transcribe", "语音转文字", "ASR", "识别音频", "把这段音频转成文字".

Usage (SKILL.md)

When to Use

User wants to transcribe an audio file to text
User provides an audio file path and asks for transcription
User says "转录", "识别", "transcribe", "语音转文字"

When NOT to Use

User wants to synthesize speech from text (use /tts)
User wants to create a podcast or explainer (use /podcast or /explainer)

Purpose

Transcribe audio files to text using coli asr, which runs on local speech recognition models and works offline once the models have been downloaded. No API key is required. The sensevoice model supports Chinese, English, Japanese, Korean, and Cantonese; the whisper model is English-only.

Run coli asr --help for current CLI options and supported flags.

Hard Constraints

No shell scripts. Use direct commands only.
Always read config following shared/config-pattern.md before any interaction
Follow shared/common-patterns.md for interaction patterns
Never ask more than one question at a time

<HARD-GATE> Use the AskUserQuestion tool for every multiple-choice step; do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding. After all parameters are collected, summarize and ask the user to confirm before running any transcription.

</HARD-GATE>

Interaction Flow

Step 0: Prerequisites Check

Before config setup, silently check the environment:

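# Capture only "yes" or "no"; the resolved path is discarded.
COLI_OK=$(command -v coli >/dev/null 2>&1 && echo yes || echo no)
FFMPEG_OK=$(command -v ffmpeg >/dev/null 2>&1 && echo yes || echo no)
# Models count as present when the models directory contains a sherpa entry.
MODELS_DIR="$HOME/.coli/models"
MODELS_OK=$([ -d "$MODELS_DIR" ] && ls "$MODELS_DIR" | grep -q sherpa && echo yes || echo no)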

Issue	Action
`coli` not found	Block. Tell user to run `npm install -g @marswave/coli` first
`ffmpeg` not found	Warn (WAV files still work). Suggest `brew install ffmpeg` / `sudo apt install ffmpeg`
Models not downloaded	Inform user: first transcription will auto-download models (~60MB) to `~/.coli/models/`

If coli is missing, stop here and do not proceed.
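
The decision table above can be sketched as a small gate. This is an illustration of the logic only, since the skill itself issues direct commands; the function names `have_tool` and `prereq_gate` are hypothetical and belong neither to the skill nor to coli.

```shell
# Hypothetical helpers for illustration (not part of coli or the skill).

# Print "yes" if the command exists on PATH, "no" otherwise.
have_tool() {
  if command -v "$1" >/dev/null 2>&1; then echo yes; else echo no; fi
}

# Map the two tool checks to the actions in the table:
#   coli missing   -> "block"
#   ffmpeg missing -> "warn" (WAV input still works)
#   otherwise      -> "ok"
prereq_gate() {
  coli_ok=$1
  ffmpeg_ok=$2
  if [ "$coli_ok" != yes ]; then
    echo block
  elif [ "$ffmpeg_ok" != yes ]; then
    echo warn
  else
    echo ok
  fi
}

# The printed result depends on what is installed on this machine.
prereq_gate "$(have_tool coli)" "$(have_tool ffmpeg)"
```

Checking coli first mirrors the rule that a missing coli stops the flow, while a missing ffmpeg only produces a warning.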

Step 0: Config Setup

Follow shared/config-pattern.md Step 0.

Initial defaults:

# Current directory:
mkdir -p ".listenhub/asr"
echo '{"model":"sensevoice","polish":true}' > ".listenhub/asr/config.json"
CONFIG_PATH=".listenhub/asr/config.json"

# Global:
mkdir -p "$HOME/.listenhub/asr"
echo '{"model":"sensevoice","polish":true}' > "$HOME/.listenhub/asr/config.json"
CONFIG_PATH="$HOME/.listenhub/asr/config.json"

Config summary display:

当前配置 (asr)：
  模型：sensevoice / whisper-tiny.en
  润色：开启 / 关闭

Setup Flow (first run or reconfigure)

Ask in order:

model: "默认使用哪个语音识别模型？"
- "sensevoice（推荐）" — 支持中英日韩粤，可检测语言、情绪、音频事件
- "whisper-tiny.en" — 仅英文
polish: "转录后由 AI 润色文本？（修正标点、去语气词、提升可读性）"
- "是（推荐）" → polish: true
- "否，保留原始转录" → polish: false

Save all answers at once after collecting them.
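
Saving both answers in a single write might look like the sketch below. It assumes the config file holds only the two keys shown in the initial defaults; `save_config` is a hypothetical helper, and shared/config-pattern.md may prescribe something different.

```shell
# Hypothetical helper: write the collected answers to a config file in one step.
# $1 = config path, $2 = model name, $3 = polish flag (true or false).
save_config() {
  config_path=$1
  model=$2
  polish=$3
  # Reject anything that would not be a valid JSON boolean.
  case "$polish" in
    true|false) ;;
    *) echo "polish must be true or false" >&2; return 1 ;;
  esac
  mkdir -p "$(dirname "$config_path")"
  printf '{"model":"%s","polish":%s}\n' "$model" "$polish" > "$config_path"
}
```

For example, `save_config ".listenhub/asr/config.json" whisper-tiny.en false` would record the English-only model with polishing turned off.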

Step 1: Get Audio File

If the user hasn't provided a file path, ask:

"请提供要转录的音频文件路径。"

Verify the file exists before proceeding.
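
A minimal sketch of this check follows. It also flags whether the ffmpeg warning from the prerequisites table is relevant, assuming that only files with a .wav extension can be handled without ffmpeg. The helper name `classify_audio` is hypothetical.

```shell
# Hypothetical helper: classify an audio path before transcription.
# Prints "missing" (and returns 1) if the file does not exist,
# "wav" if it exists and has a .wav extension,
# "other" for any other existing file (ffmpeg is assumed to be needed).
classify_audio() {
  file=$1
  if [ ! -f "$file" ]; then
    echo missing
    return 1
  fi
  case "$file" in
    *.wav|*.WAV) echo wav ;;
    *) echo other ;;
  esac
}
```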

Step 2: Confirm

准备转录：

  文件：{filename}
  模型：{model}
  润色：{是 / 否}

继续？

Step 3: Transcribe

Run coli asr with JSON output (to get metadata):

coli asr -j --model {model} "{file}"

On first run, coli will automatically download the required model. This may take a moment — inform the user if models haven't been downloaded yet.

Parse the JSON result to extract text, lang, emotion, event, duration.
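
One way to pull out those fields is sketched below. It assumes that jq is installed and that the JSON is a single object with top-level keys named text, lang, emotion, event, and duration; the actual output shape of coli is not documented here, so check it against a real run. The helper `asr_field` and the sample values are invented for illustration.

```shell
# Assumption: jq is installed, and the JSON is one object whose top-level
# keys are text, lang, emotion, event, and duration.

# Hypothetical helper: read JSON on stdin and print one field.
# Prints nothing when the key is absent or null.
asr_field() {
  jq -r --arg key "$1" '.[$key] // empty'
}

# Invented sample for illustration; this is not real coli output.
sample='{"text":"hello world","lang":"en","emotion":"neutral","event":"speech","duration":1.5}'

# Usage:
#   printf '%s' "$sample" | asr_field text
#   coli asr -j --model sensevoice "meeting.m4a" | asr_field lang
```

Reading from stdin keeps the helper independent of how the JSON was produced, so the same call works on a saved result or directly on the command's output.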

Step 4: Polish (if enabled)

If polish is true, take the raw text from the transcription result and rewrite it to fix punctuation, remove filler words, and improve readability. Preserve the original meaning and speaker intent. Do not summarize or paraphrase.

Step 5: Present Result

Display the transcript directly in the conversation:

转录完成

{transcript text}

─────────────────
语言：{lang} · 情绪：{emotion} · 时长：{duration}s

If polished, show the polished version with a note that it was AI-refined. Offer to show the raw original on request.

Step 6: Export as Markdown (optional)

After presenting the result, ask:

Question: "保存为 Markdown 文件到当前目录？"
Options:
  - "是" — save to current directory
  - "否" — done

If yes, write {audio-filename}-transcript.md to the current working directory (where the user is running Claude Code). The file should contain the transcript text (polished version if polish was enabled), with a front-matter header:

---
source: {original audio filename}
date: {YYYY-MM-DD}
model: {model used}
duration: {duration}s
lang: {detected language}
---

{transcript text}
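
Writing that file could be sketched as follows. It assumes that {audio-filename} means the audio file name without its extension, which the template does not state; `write_transcript` is a hypothetical helper.

```shell
# Hypothetical helper: write a transcript file with a front-matter header
# into the current working directory and print the file name.
# Arguments: audio path, model, duration in seconds, language, transcript text.
write_transcript() {
  audio=$1; model=$2; duration=$3; lang=$4; text=$5
  name=${audio##*/}                 # strip any leading directories
  out="${name%.*}-transcript.md"    # drop the extension (assumption), add suffix
  {
    printf -- '---\n'
    printf 'source: %s\n' "$name"
    printf 'date: %s\n' "$(date +%Y-%m-%d)"
    printf 'model: %s\n' "$model"
    printf 'duration: %ss\n' "$duration"
    printf 'lang: %s\n' "$lang"
    printf -- '---\n\n'
    printf '%s\n' "$text"
  } > "$out"
  echo "$out"
}
```

Under that naming assumption, an input path ending in meeting.m4a produces meeting-transcript.md.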

Composability

Invoked by: future skills that need to transcribe recorded audio
Invokes: nothing

Examples

"帮我转录这个文件 meeting.m4a"

Check prerequisites
Read config
Confirm: meeting.m4a, sensevoice, polish on
Run coli asr -j --model sensevoice "meeting.m4a"
Polish the raw text
Display inline

"transcribe interview.wav, no polish"

Check prerequisites
Read config
Override polish to false for this session
Run coli asr -j --model sensevoice "interview.wav"
Display raw transcript inline

Security Recommendations

This skill appears to do what it says: local transcription via the coli CLI. Before installing or using it, be aware that:
- It will create small config files in the current directory and in $HOME (~/.listenhub/asr).
- The coli CLI may auto-download speech models (~60MB) into ~/.coli/models; this involves network download and disk usage.
- If coli is missing, the skill will recommend `npm install -g @marswave/coli`; review that npm package and its source before installing.
- Transcripts can be saved to your current working directory; ensure you are comfortable with files being written there.
- The skill does not request secrets or external credentials.

If you want to avoid downloads or file writes, do not run the transcription, or run it in a controlled environment.

Functional Analysis

Type: OpenClaw Skill
Name: marswave-asr
Version: 0.1.0

The skill bundle provides a legitimate audio transcription service using the 'coli' CLI tool and local models (SenseVoice/Whisper). It features robust environment checks in SKILL.md for dependencies like ffmpeg and sherpa-based models, manages configuration in ~/.listenhub, and includes AI-driven text polishing. The behavior is strictly aligned with its stated purpose, and no malicious indicators, data exfiltration attempts, or suspicious shell execution patterns were found.

Capability Assessment

✓ Purpose & Capability

The skill's purpose (local ASR via the coli CLI) matches its instructions: it checks for coli and ffmpeg, runs `coli asr`, and parses JSON output. Nothing in the metadata or SKILL.md requests unrelated services or credentials.

ℹ Instruction Scope

Instructions perform local environment checks (looking up `coli` and `ffmpeg` on the PATH), read and write small config files in the current directory and $HOME, run `coli asr` (which may auto-download models), and may write transcript Markdown files to the current working directory. These are within scope, but they are persistent file operations and involve network downloads initiated by the coli tool.

✓ Install Mechanism

This is an instruction-only skill with no install spec. The SKILL.md suggests installing `@marswave/coli` via npm if missing, but the skill itself does not fetch or execute remote archives. Risk from installs is therefore limited to user-initiated npm/brew/apt commands.

✓ Credentials

The skill declares no required environment variables or credentials. It only references local paths (config dirs and ~/.coli/models) appropriate to running a local ASR CLI. No unrelated secrets are requested.

ℹ Persistence & Privilege

The skill writes config to .listenhub/asr in the current directory and $HOME/.listenhub/asr, and `coli` may persist models under ~/.coli/models (~60MB). The `always` setting is false, so the skill is not force-enabled, but it does create files and download models when run.

Version History

v0.1.0

Initial release of the Marswave ASR skill for local audio transcription.
- Transcribes audio files to text using local speech recognition (no API needed)
- Supports Chinese, English, Japanese, Korean, and Cantonese (sensevoice) or English-only (whisper) models
- Interactive, step-by-step workflow with confirmation and config, following strict user interaction gates
- Optional AI-powered transcript polishing for readability
- Markdown export option for transcripts with metadata
- Checks for prerequisites (`coli`, `ffmpeg`, and speech models) before proceeding

Metadata

Slug marswave-asr

Version 0.1.0

License MIT-0

Total installs 0

Current installs 0

Versions released 1

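FAQ

What is ListenHub Asr?

Transcribe audio files to text using local speech recognition. Triggers on: "转录", "transcribe", "语音转文字", "ASR", "识别音频", "把这段音频转成文字". It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 245 cumulative downloads to date.

How do I install ListenHub Asr?

Run the command "/install marswave-asr" in an OpenClaw or Claude Code conversation to install it in one step; no additional configuration is required.

Is ListenHub Asr free?

Yes. ListenHub Asr is completely free and is released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does ListenHub Asr support?

ListenHub Asr is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops ListenHub Asr?

It is developed and maintained by 0xFango (@0xfango). The current version is v0.1.0.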