zktufo

Local Speech Recognition

by lllleo · GitHub ↗ · v1.0.2 · MIT-0
linux · darwin · win32 · ✓ Security Clean
352 Downloads · 1 Star · 0 Active Installs · 3 Versions
Install in OpenClaw
/install localspeechrecognition
Description
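Local Speech-to-Text. Runs Whisper models locally with faster-whisper: no API fees, completely free. Incoming voice messages (.ogg .m4a .mp3) trigger transcription automatically; supports Chinese, English, Japanese, and auto-detection. | Free local STT/TTS alternati...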
README (SKILL.md)

Local Speech Recognition

Runs Whisper models locally with faster-whisper. No API keys, no costs, fully offline and private.


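Features

  • 🎙️ Fully automatic transcription: triggered when a voice message arrives, no manual invocation needed
  • 💰 Completely free: no API key, no fees
  • 🔒 Private and secure: all processing happens locally, and audio never leaves your device
  • 🌐 Multi-language support: Chinese / English / Japanese / auto-detect
  • Fast response: VAD silence filtering, model cached in memory
  • 📦 Common formats: .ogg .m4a .mp3 .wav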

Usage

When a voice message arrives, OpenClaw automatically calls the transcription script and injects the result into the conversation.

Command:

python3 ~/.openclaw/workspace/skills/speech-recognition-local/scripts/transcribe.py <audio_file> [language]

Parameters:

Parameter | Default | Description
audio_file | (none) | Audio file path
language | zh | Language: zh / en / ja / auto
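The command-line interface described above could be parsed roughly as follows. This is a minimal sketch, not the actual contents of transcribe.py: the `parse_args` helper and the `SUPPORTED_LANGUAGES` constant are hypothetical names, and only the two positional arguments and the `zh` default come from the parameter list.

```python
import argparse

# Language codes listed in the parameter table; the constant name is illustrative.
SUPPORTED_LANGUAGES = ("zh", "en", "ja", "auto")


def parse_args(argv):
    """Parse `<audio_file> [language]` the way the usage line describes it."""
    parser = argparse.ArgumentParser(description="Transcribe an audio file locally.")
    parser.add_argument("audio_file", help="Path to the audio file")
    parser.add_argument(
        "language",
        nargs="?",            # optional positional argument
        default="zh",         # documented default
        choices=SUPPORTED_LANGUAGES,
        help="Language: zh / en / ja / auto",
    )
    return parser.parse_args(argv)


args = parse_args(["voice.ogg"])
print(args.audio_file, args.language)
```

Passing a second argument such as `en` overrides the default, and any value outside the four listed codes is rejected by argparse.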

Model Info

  • Default model: base (balances accuracy and speed)
  • Downloaded automatically on first use
  • VAD silence filtering enabled
  • Model cached in memory
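For orientation, a transcription call with these settings might look like the sketch below. It uses the public faster-whisper API (`WhisperModel` and its `transcribe` method with `vad_filter=True`). The function names, the `cpu` device and the `int8` compute type are assumptions drawn from the version notes rather than from the script itself.

```python
def whisper_language(language):
    """Map the skill's language argument to faster-whisper's `language` parameter."""
    # faster-whisper detects the language itself when `language` is None.
    return None if language == "auto" else language


def transcribe(audio_path, language="zh", model_size="base"):
    """Transcribe one file and return the joined text (illustrative sketch)."""
    # Imported lazily so the helper above works without faster-whisper installed.
    from faster_whisper import WhisperModel

    # Assumed settings: CPU inference with int8 quantization.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(
        audio_path,
        language=whisper_language(language),
        vad_filter=True,  # skip silent stretches before decoding
    )
    # `segments` is a generator; decoding happens while it is consumed.
    return "".join(segment.text for segment in segments).strip()
```

The first call with a given model size triggers the weight download mentioned above, so it is noticeably slower than later calls.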

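Use Cases

Scenario | Description
Voice message to text | Convert WeChat / Feishu / Telegram voice messages into readable text
Meeting notes | Quickly transcribe and archive recorded audio
Podcast subtitles | Batch-convert audio files into transcripts
Privacy-sensitive scenarios | When audio data should not be uploaded to third parties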

Limitations

  • Supported formats: .ogg .m4a .mp3 .wav
  • Max file size: 25MB
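A pre-flight check for these limits could be sketched as below. The `validate_audio` helper and both constants are hypothetical, and reading "25MB" as 25 × 1024 × 1024 bytes is an assumption; the script's actual checks may differ.

```python
import os
import tempfile

SUPPORTED_EXTENSIONS = {".ogg", ".m4a", ".mp3", ".wav"}
MAX_SIZE_BYTES = 25 * 1024 * 1024  # assumption: "25MB" interpreted as 25 MiB


def validate_audio(path):
    """Return None when the file passes the checks, else an error message."""
    if not os.path.isfile(path):
        return f"File not found: {path}"
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        return f"Unsupported format: {ext or '(no extension)'}"
    if os.path.getsize(path) > MAX_SIZE_BYTES:
        return "File exceeds the 25MB limit"
    return None


# Small demonstration with a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "voice.ogg")
    with open(sample, "wb") as handle:
        handle.write(b"\0" * 16)
    print(validate_audio(sample))
```

Returning a message instead of raising keeps the check easy to reuse before the model is loaded.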

Requirements

  • Python 3.8+
  • faster-whisper (auto-installed on first use)
Usage Guidance
This skill appears to do what it says: local transcription with faster-whisper. Before installing/using it, note: (1) the script assumes the faster-whisper Python package is present — the SKILL.md claim that the package will be auto-installed is not implemented in the script, so you may need to install it yourself (pip install faster-whisper and its dependencies). (2) On first run the Whisper model weights will be downloaded automatically by the library; expect network activity and substantial disk usage (models can be large). (3) The script runs locally and does not exfiltrate credentials, but verify you are comfortable with the package source (faster-whisper) and that you have enough disk, bandwidth, and optional GPU drivers. If you want tighter control, run the script in a virtual environment or container, and inspect/confirm faster-whisper and its upstream download sources (e.g., Hugging Face) before first use.
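Since the package is not installed automatically, it may help to confirm it is importable before the first transcription. The sketch below uses only the standard library; `is_installed` is a hypothetical helper name, and `faster_whisper` is the import name of the faster-whisper package.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("faster_whisper"):
    # Report the problem and the manual install command instead of failing later.
    print(
        "faster-whisper is not installed; try: pip install faster-whisper",
        file=sys.stderr,
    )
```

Running this inside the same virtual environment or container that OpenClaw uses gives the most reliable answer.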
Capability Analysis
Type: OpenClaw Skill
Name: localspeechrecognition
Version: 1.0.2
The skill bundle provides a legitimate local speech-to-text utility using the 'faster-whisper' library. The Python script (scripts/transcribe.py) implements standard file checks and model inference logic without any signs of data exfiltration, malicious execution, or prompt injection.
Capability Assessment
Purpose & Capability
Name/description, SKILL.md usage, and the included transcribe.py all align: the skill runs faster-whisper locally to transcribe audio files. The supported formats, size checks, and language options are implemented in the script.
Instruction Scope
SKILL.md only instructs the agent to run the included script on incoming audio; the script only reads the provided audio file and prints/returns transcribed text. It does not access other files, env vars, or external endpoints in its code, but model loading (WhisperModel) will cause the runtime/library to download model weights from remote hosts on first use (this is expected for the stated purpose).
Install Mechanism
There is no install spec (instruction-only plus a script). SKILL.md claims that faster-whisper is installed automatically on first use, but transcribe.py does not implement package installation: it imports faster_whisper directly and will raise if the package is missing. Separately, WhisperModel will auto-download model weights on first use; those network downloads are normal for this kind of tool but happen implicitly and may require bandwidth and disk space.
Credentials
The skill requests no environment variables, no credentials, and accesses only the audio file path provided as an argument. This is proportionate to local transcription.
Persistence & Privilege
The skill does not request always:true and is user-invocable; it does not modify other skills or system configs. It caches the model in a global variable during the process lifetime (expected behavior).
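The process-lifetime caching mentioned here can be illustrated with a module-level cache. This is a sketch of the general pattern, not the script's code: `_MODEL_CACHE`, `get_model` and the `loader` parameter are hypothetical, and a dictionary keyed by model size stands in for the single global variable the review describes.

```python
# Module-level cache: lives as long as the Python process does.
_MODEL_CACHE = {}


def get_model(model_size="base", loader=None):
    """Return a cached model, loading it only on the first request."""
    if model_size not in _MODEL_CACHE:
        if loader is None:
            # Default loader (assumed settings); imported lazily on first use.
            from faster_whisper import WhisperModel

            def loader(size):
                return WhisperModel(size, device="cpu", compute_type="int8")

        _MODEL_CACHE[model_size] = loader(model_size)
    return _MODEL_CACHE[model_size]
```

The cache is empty again whenever a new process starts, so nothing persists on disk beyond the downloaded model weights.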
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install localspeechrecognition
  3. After installation, invoke the skill by name or use /localspeechrecognition
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.2
Optimized description with bilingual SEO content, added use cases and feature highlights
v1.0.1
1.0.1: Fixed directory structure; added model caching, error handling, and multi-language support
v1.0.0
Uses faster-whisper to run the Whisper model locally.
Development Thought Process
  1. Pain Point
     • Previous speech recognition relied on the OpenAI API and had costs
     • User had to wait for API calls after sending voice
  2. Tech Choice
     • faster-whisper: 2-4x faster than original whisper
     • int8 quantization: low memory usage, runs on CPU
     • VAD filter: auto-removes silent segments
  3. Model Selection
     • Initially used small (75MB)
     • Optimized to base (also 75MB), 2x faster
     • Memory-friendly, runs smoothly on MacBook
Metadata
Slug localspeechrecognition
Version 1.0.2
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 3
Frequently Asked Questions

What is Local Speech Recognition?

Local Speech-to-Text. Runs Whisper models locally with faster-whisper: no API fees, completely free. Incoming voice messages (.ogg .m4a .mp3) trigger transcription automatically; supports Chinese, English, Japanese, and auto-detection. | Free local STT/TTS alternati... It is an AI Agent Skill for Claude Code / OpenClaw, with 352 downloads so far.

How do I install Local Speech Recognition?

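Run "/install localspeechrecognition" in the OpenClaw or Claude Code chat to install it in one step. The faster-whisper Python package may need to be installed separately (pip install faster-whisper), since the script does not install it.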

Is Local Speech Recognition free?

Yes, Local Speech Recognition is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Local Speech Recognition support?

Local Speech Recognition is cross-platform and runs anywhere OpenClaw / Claude Code is available (linux, darwin, win32).

Who created Local Speech Recognition?

It is built and maintained by lllleo (@zktufo); the current version is v1.0.2.
