
Local Speech Recognition

Name: Local Speech Recognition
Author: zktufo

by lllleo · v1.0.2 · MIT-0

Platforms: linux, darwin, win32 · ✓ Security Clean

Downloads: 352

Install in OpenClaw

/install localspeechrecognition

Description

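Local Speech-to-Text. Runs the Whisper model locally with faster-whisper: no API fees, completely free. Incoming voice messages (.ogg .m4a .mp3) trigger transcription automatically; supports Chinese / English / Japanese / auto-detect. | Free local STT/TTS alternati...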

README (SKILL.md)

Local Speech Recognition

Runs a Whisper model locally with faster-whisper: no API keys, no costs, fully offline and private.

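Features

🎙️ Fully automatic transcription: triggered when a voice message arrives, no manual invocation needed
💰 Completely free: no API key and no fees
🔒 Private and secure: all processing happens locally, and audio never leaves your device
🌐 Multi-language support: Chinese / English / Japanese / auto-detect
⚡ Fast response: VAD silence filtering and in-memory model caching
📦 Common formats: .ogg .m4a .mp3 .wav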

Usage

When a voice message arrives, OpenClaw automatically calls the transcription script and injects the result into the conversation.

Command:

python3 ~/.openclaw/workspace/skills/speech-recognition-local/scripts/transcribe.py <audio_file> [language]

Parameters:

Parameter	Default	Description
`audio_file`	(required)	Audio file path
`language`	`zh`	Language: zh / en / ja / auto
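
For callers that prefer to launch the script from Python rather than a shell, the command line above could be assembled roughly as follows. This is a minimal sketch: `build_transcribe_command` and `ALLOWED_LANGUAGES` are hypothetical names, the script path is copied from the README command, and the sketch only builds the argument list because the script's output and exit-code conventions are not documented here.

```python
import sys
from pathlib import Path

# Script location as given in the README command; adjust if your install differs.
SCRIPT = Path.home() / ".openclaw/workspace/skills/speech-recognition-local/scripts/transcribe.py"

# Language codes listed in the parameter table.
ALLOWED_LANGUAGES = ("zh", "en", "ja", "auto")


def build_transcribe_command(audio_file, language="zh"):
    """Return the argv list for running transcribe.py on one audio file."""
    if language not in ALLOWED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    # sys.executable keeps the call inside the current interpreter/virtualenv.
    return [sys.executable, str(SCRIPT), str(audio_file), language]
```

The returned list can be handed to `subprocess.run(cmd, capture_output=True, text=True)`; passing a list instead of a shell string avoids quoting problems with file names that contain spaces.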

Model Info

Default model: base (balances accuracy and speed)
Model is downloaded automatically on first use
VAD silence filtering is enabled
Model is cached in memory
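
The points above (base model, VAD filtering, in-memory caching) could look roughly like the sketch below. It is not the code from transcribe.py, which is not reproduced on this page: `get_model`, `transcribe`, and the `loader` parameter are hypothetical names, and the CPU/int8 settings are an assumption taken from the v1.0.0 notes. The `faster_whisper` import is deferred so the module loads even when the package is absent.

```python
# Process-wide cache: one loaded model per size name.
_MODEL_CACHE = {}


def get_model(size="base", loader=None):
    """Load the model once per process and reuse it on later calls."""
    if size not in _MODEL_CACHE:
        if loader is None:
            # Deferred import; first use may download weights over the network.
            from faster_whisper import WhisperModel

            def loader(name):
                # Assumed settings: CPU inference with int8 quantization.
                return WhisperModel(name, device="cpu", compute_type="int8")

        _MODEL_CACHE[size] = loader(size)
    return _MODEL_CACHE[size]


def transcribe(path, language="zh", model=None):
    """Transcribe one file; 'auto' lets the model detect the language."""
    if model is None:
        model = get_model()
    lang = None if language == "auto" else language
    # vad_filter=True skips silent stretches before decoding.
    segments, _info = model.transcribe(path, language=lang, vad_filter=True)
    return "".join(segment.text for segment in segments).strip()
```

The cache only lives as long as the Python process, so a script that is launched fresh for every voice message would still reload the model each time.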

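Use Cases

Scenario	Description
Voice message to text	Convert WeChat / Feishu / Telegram voice messages into readable text
Meeting notes	Quickly transcribe recorded audio for archiving
Podcast subtitles	Batch-convert audio files into transcripts
Privacy-sensitive scenarios	When audio data should not be uploaded to third parties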

Limitations

Supported formats: .ogg .m4a .mp3 .wav
Max file size: 25MB
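
A pre-flight check against these two limits might be sketched as below. The function name `check_audio_file` is hypothetical, and reading "25MB" as 25 × 1024 × 1024 bytes is an assumption; the script may draw the line elsewhere.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".ogg", ".m4a", ".mp3", ".wav"}
MAX_BYTES = 25 * 1024 * 1024  # "25MB" interpreted as 25 MiB (assumption)


def check_audio_file(path):
    """Return None if the file passes the listed limits, else a reason string."""
    p = Path(path)
    if not p.is_file():
        return "file not found"
    # Compare extensions case-insensitively so "VOICE.OGG" is accepted.
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        return f"unsupported format: {p.suffix or '(none)'}"
    if p.stat().st_size > MAX_BYTES:
        return "file larger than 25MB"
    return None
```

Returning a reason string instead of raising keeps the check easy to use in a chat flow, where the message can be shown to the user directly.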

Requirements

Python 3.8+
faster-whisper (automatically installed on first use)

Usage Guidance

This skill appears to do what it says: local transcription with faster-whisper. Before installing/using it, note: (1) the script assumes the faster-whisper Python package is present — the SKILL.md claim that the package will be auto-installed is not implemented in the script, so you may need to install it yourself (pip install faster-whisper and its dependencies). (2) On first run the Whisper model weights will be downloaded automatically by the library; expect network activity and substantial disk usage (models can be large). (3) The script runs locally and does not exfiltrate credentials, but verify you are comfortable with the package source (faster-whisper) and that you have enough disk, bandwidth, and optional GPU drivers. If you want tighter control, run the script in a virtual environment or container, and inspect/confirm faster-whisper and its upstream download sources (e.g., Hugging Face) before first use.
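
Since the package may have to be installed by hand, a quick check before the first run can save a confusing import error. The sketch below uses only the standard library; `is_installed` and `install_hint` are hypothetical helper names, not part of the skill.

```python
import importlib.util


def is_installed(module_name="faster_whisper"):
    """Report whether a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def install_hint(module_name="faster_whisper", package="faster-whisper"):
    """Return None if the module is present, else a one-line install hint."""
    if is_installed(module_name):
        return None
    return f"Missing dependency: run `pip install {package}` first."
```

Running `print(install_hint())` inside the virtual environment or container intended for the skill shows whether `pip install faster-whisper` is still needed there.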

Capability Analysis

Type: OpenClaw Skill
Name: localspeechrecognition
Version: 1.0.2

The skill bundle provides a legitimate local speech-to-text utility using the 'faster-whisper' library. The Python script (scripts/transcribe.py) implements standard file checks and model inference logic without any signs of data exfiltration, malicious execution, or prompt injection.

Capability Assessment

✓ Purpose & Capability

Name/description, SKILL.md usage, and the included transcribe.py all align: the skill runs faster-whisper locally to transcribe audio files. The supported formats, size checks, and language options are implemented in the script.

ℹ Instruction Scope

SKILL.md only instructs the agent to run the included script on incoming audio; the script only reads the provided audio file and prints/returns transcribed text. It does not access other files, env vars, or external endpoints in its code, but model loading (WhisperModel) will cause the runtime/library to download model weights from remote hosts on first use (this is expected for the stated purpose).

ℹ Install Mechanism

There is no install spec (instruction-only plus a script). SKILL.md claims that faster-whisper is installed automatically on first use, but transcribe.py does not implement package installation: it imports faster_whisper directly and will raise if the package is missing. Separately, WhisperModel will auto-download model weights on first use; those network downloads are normal for this kind of tool but happen implicitly and may require bandwidth/disk.

✓ Credentials

The skill requests no environment variables, no credentials, and accesses only the audio file path provided as an argument. This is proportionate to local transcription.

✓ Persistence & Privilege

The skill does not request always:true and is user-invocable; it does not modify other skills or system configs. It caches the model in a global variable during the process lifetime (expected behavior).

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install localspeechrecognition
After installation, invoke the skill by name or use /localspeechrecognition
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.2

Optimized description with bilingual SEO content, added use cases and feature highlights

v1.0.1

Fixed directory structure, plus model caching, error handling, and multi-language support

v1.0.0

Uses faster-whisper to run Whisper model locally.

Development Thought Process

1. Pain Point
Previous speech recognition relied on OpenAI API, had costs
User had to wait for API calls after sending voice

2. Tech Choice
faster-whisper: 2-4x faster than original whisper
int8 quantization: Low memory usage, runs on CPU
VAD filter: Auto-removes silent segments

3. Model Selection
Initially used small (75MB)
Optimized to base (also 75MB), 2x faster
Memory-friendly, runs smoothly on MacBook

Metadata

Slug localspeechrecognition

Version 1.0.2

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 3

Frequently Asked Questions

What is Local Speech Recognition?

Local Speech-to-Text. Runs the Whisper model locally with faster-whisper: no API fees, completely free. Incoming voice messages (.ogg .m4a .mp3) trigger transcription automatically; supports Chinese / English / Japanese / auto-detect. | Free local STT/TTS alternati... It is an AI Agent Skill for Claude Code / OpenClaw, with 352 downloads so far.

How do I install Local Speech Recognition?

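Run "/install localspeechrecognition" in the OpenClaw or Claude Code chat to install the skill in one step. The faster-whisper Python package may still need to be installed separately (pip install faster-whisper), because the script does not install it.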

Is Local Speech Recognition free?

Yes, Local Speech Recognition is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Local Speech Recognition support?

Local Speech Recognition is cross-platform and runs anywhere OpenClaw / Claude Code is available (linux, darwin, win32).

Who created Local Speech Recognition?

It is built and maintained by lllleo (@zktufo); the current version is v1.0.2.
