turbo-whisper-local-stt

Name: turbo-whisper-local-stt
Author: wangminrui2022

Description

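Triggers automatically when the user wants **audio-to-text**, **speech-to-text**, **transcription of a recording**, **subtitle generation**, **meeting recording transcription**, **voice note transcription**, or **local audio transcription**. Uses local Faster-Whisper (large-v3-ct2 and other models) for high-performance, Chinese-first audio transcription that is fully offline and privacy-safe, with support for wav/mp...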

Usage Guidance

Before installing or running:

- Expect initial network activity: the scripts will pip install packages (including torch) from PyPI and download models from Hugging Face. If you need fully offline operation, pre-download the model and pass --model_path.
- Check Python version: the code enforces Python 3.10–3.12 and will exit otherwise.
- Disk usage: model files and a virtualenv can be multi-GB; ensure you have space and choose an appropriate output/model path.
- Sandbox if possible: because the skill runs pip installs and executes subprocesses, run it in an isolated environment (VM/container) or review and approve the code before giving it access to important systems.
- Paths and defaults: the scripts create a venv in a parent-level venv directory and default model_path to a Windows D:/ path; adjust both for your environment.
- No credentials requested: the skill does not ask for API keys or tokens; it uses public Hugging Face downloads. If you intend to use private models, do not supply credentials unless you trust the code.

Overall this package appears coherent and appropriate for local STT, but it performs network downloads and installs software automatically. Treat those side effects as part of the installation risk and proceed accordingly.
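The version and disk-space checks in the guidance above can be run by hand before installing. The following is a minimal sketch, not the skill's own code: the function names are hypothetical, and the version bounds are simply the 3.10–3.12 range quoted above.

```python
import shutil
import sys

# Bounds taken from the guidance above (Python 3.10 through 3.12).
MIN_PYTHON = (3, 10)
MAX_PYTHON = (3, 12)


def python_version_ok(version=None):
    """Return True if the (major, minor) version lies within the supported range."""
    major, minor = tuple(version or sys.version_info)[:2]
    return MIN_PYTHON <= (major, minor) <= MAX_PYTHON


def free_gib(path="."):
    """Return the free disk space at `path`, in GiB."""
    return shutil.disk_usage(path).free / 1024**3
```

Calling `python_version_ok()` with no argument checks the running interpreter, and `free_gib("/path/to/models")` gives a rough idea of whether a multi-GB model and virtualenv will fit.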

Capability Analysis

Type: OpenClaw Skill
Name: turbo-whisper-local-stt
Version: 1.0.6

The skill bundle is a legitimate tool for local audio-to-text transcription using the Faster-Whisper library. It contains robust environment management scripts (env_manager.py, ensure_package.py) that automate the setup of a Python virtual environment, detect NVIDIA GPUs via nvidia-smi, and install heavy dependencies like PyTorch and audio-processing libraries. The main logic in transcribe.py handles model downloads from Hugging Face (huggingface.co/wangminrui2022) and performs batch or single-file transcription. While the scripts use high-privilege subprocess calls for package management and environment bootstrapping, these actions are transparently documented and strictly aligned with the stated purpose of providing a high-performance, offline STT service.

Capability Assessment

✓ Purpose & Capability

Name/description (local Faster-Whisper offline STT) matches the code and runtime behavior: scripts create a venv, install required Python packages, download Faster‑Whisper models from Hugging Face, and transcribe audio files. Required binary (python) and file writes (models, logs, outputs) are appropriate for transcription.

ℹ Instruction Scope

SKILL.md and scripts limit actions to audio transcription, path handling, venv creation, dependency installation, model download, and GPU detection (nvidia-smi). This stays within expected scope, but the runtime will: (1) create a virtualenv in a parent-level venv directory, (2) run many pip installs, and (3) download large model files from Hugging Face if not provided locally — all of which are side effects the user should expect. There is no evidence the skill reads unrelated secrets or exfiltrates data.
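For readers who want to see what GPU detection through nvidia-smi typically looks like, here is a minimal sketch. It is an illustration under assumptions, not the code in env_manager.py: the function names are hypothetical, and only the standard `--query-gpu=name --format=csv,noheader` options of nvidia-smi are used.

```python
import shutil
import subprocess


def detect_nvidia_gpus(timeout=10):
    """Return the GPU names reported by nvidia-smi, or [] if it is unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []  # no NVIDIA driver tools on PATH
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return []  # tool present but failed, timed out, or could not start
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def pick_device(gpus):
    """Map the detection result to a device string ("cuda" or "cpu")."""
    return "cuda" if gpus else "cpu"
```

The probe only reads GPU names and falls back to CPU on any failure, which matches the limited scope described above.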

ℹ Install Mechanism

There is no packaged install spec, but the code performs runtime installation via pip and uses huggingface_hub.snapshot_download to fetch models. Installing torch (and other audio libs) and downloading wheels from download.pytorch.org and PyPI/Tsinghua mirror is expected but involves network activity and large downloads. The installer supports git+/.zip/.whl fallbacks (arbitrary package sources), which is powerful but also increases the potential blast radius if a malicious spec were introduced later.
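To avoid the runtime download, a model can be fetched once and reused. The sketch below uses `huggingface_hub.snapshot_download`, which the skill is reported to call, but everything else is an assumption: the helper names are hypothetical, the repository ID must be supplied by the reader (the listing does not name one), and the `model.bin` check is only a heuristic for CTranslate2-format Whisper models.

```python
from pathlib import Path


def model_is_present(model_dir):
    """Heuristic: a CTranslate2 Whisper model directory contains model.bin."""
    return (Path(model_dir) / "model.bin").is_file()


def ensure_model(repo_id, model_dir):
    """Download the model snapshot into model_dir unless it is already there."""
    model_dir = Path(model_dir)
    if model_is_present(model_dir):
        return model_dir  # nothing to fetch, so no network access
    # Imported lazily so the presence check works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=repo_id, local_dir=str(model_dir))
    return model_dir
```

The directory returned by `ensure_model` is the kind of path that would then be passed via --model_path.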

✓ Credentials

The skill does not request credentials or environment variables. It probes system GPU info (nvidia-smi) and writes logs, model caches, and virtualenv files to disk. These accesses are proportionate to GPU-aware local transcription and model caching behavior.

ℹ Persistence & Privilege

The skill does not request 'always' privilege and is user-invocable. It persists by creating a virtualenv (VENV_DIR) and caching downloaded models and logs under the skill root (or parent venv path). That persistent storage is normal for this use case but can consume significant disk space and may be shared across runs.
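Because the virtualenv and model cache persist between runs, it can be useful to measure how much space they occupy. A minimal sketch follows; the function name is hypothetical and the directories to measure depend on where the skill was installed.

```python
import os


def dir_size_bytes(root):
    """Return the total size of regular files under `root`, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # avoid counting link targets
            try:
                total += os.path.getsize(path)
            except OSError:
                pass  # file vanished or is unreadable; ignore it
    return total
```

Dividing the result by `1024**3` gives GiB, which can be compared against the multi-GB estimate mentioned in the usage guidance.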

Version History

v1.0.6

No changes detected in this version.

v1.0.5

Version 1.0.5 of turbo-whisper-local-stt:

- No file changes detected; the skill remains functionally identical to the previous version.
- All triggers, model options, and usage guidelines are unchanged.
- No updates to commands, parameters, or documentation content.

v1.0.4

No changes detected in this version.

v1.0.3

- Expanded description with clear trigger scenarios and example user phrases to improve discoverability and usability.
- Explicitly stated that only audio files or folders are supported; non-audio files will not trigger the skill.
- Added detailed trigger instructions and parameter extraction guide for handling user input.
- Clarified functionality, supported models, and execution steps for easier reference.
- Metadata now includes user-invocable attribute.

v1.0.2

- Updated the default script from scripts/audio_to_text.py to scripts/transcribe.py in command examples.
- No other functional or descriptive changes detected.

v1.0.1

- Version bump to 1.0.1 with no code or documentation changes.
- No file modifications detected; functionality and usage remain unchanged.

v1.0.0

- Initial release of turbo-whisper-local-stt.
- Offline, high-performance audio-to-text transcription using Faster-Whisper large-v3-ct2 with Chinese language priority.
- Supports long audio with VAD segmentation, GPU acceleration (int8_float16), and ensures privacy with full local processing.
- Suitable for Chinese meeting recordings, voice notes, and subtitle generation.
- Outputs structured results (full text, segmented info, language detection).
- Compatible with single files or folders; auto-detects environment and GPU, and supports multiple models for varying needs.
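The release notes above mention VAD segmentation, int8_float16 GPU inference, and structured output. A minimal sketch of how those pieces fit together with the faster-whisper library is shown here. It is not the code in transcribe.py: the function names and result layout are hypothetical, and the defaults (`language="zh"`, `device="cuda"`) are assumptions drawn from the description.

```python
def collect_result(segments, language):
    """Build a structured result (full text, per-segment info, language)."""
    segs = list(segments)  # faster-whisper yields segments lazily
    return {
        "language": language,
        "text": "".join(s.text for s in segs).strip(),
        "segments": [
            {"start": s.start, "end": s.end, "text": s.text.strip()} for s in segs
        ],
    }


def transcribe_file(audio_path, model_path, device="cuda",
                    compute_type="int8_float16", language="zh"):
    """Transcribe one file with a local model directory, using VAD filtering."""
    # Imported lazily so collect_result can be used without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_path, device=device, compute_type=compute_type)
    segments, info = model.transcribe(audio_path, language=language, vad_filter=True)
    return collect_result(segments, info.language)
```

On a machine without a GPU, passing `device="cpu"` with `compute_type="int8"` is a common fallback; whether the skill does the same is not stated in this listing.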

Metadata

Slug: turbo-whisper-local-stt
Version: 1.0.6
License: MIT-0
All-time Installs: 0
Active Installs: 0
Total Versions: 7

Frequently Asked Questions

What is turbo-whisper-local-stt?

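It triggers automatically when the user wants **audio-to-text**, **speech-to-text**, **transcription of a recording**, **subtitle generation**, **meeting recording transcription**, **voice note transcription**, or **local audio transcription**. It uses local Faster-Whisper (large-v3-ct2 and other models) for high-performance, Chinese-first audio transcription that is fully offline and privacy-safe, with support for wav/mp... It is an AI Agent Skill for Claude Code / OpenClaw, with 252 downloads so far.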

How do I install turbo-whisper-local-stt?

Run "/install turbo-whisper-local-stt" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is turbo-whisper-local-stt free?

Yes, turbo-whisper-local-stt is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does turbo-whisper-local-stt support?

turbo-whisper-local-stt is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created turbo-whisper-local-stt?

It is built and maintained by 顶尖王牌程序员 (@wangminrui2022); the current version is v1.0.6.
