turbo-whisper-local-stt

Name: turbo-whisper-local-stt
Author: wangminrui2022

Description

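Triggers automatically when the user wants audio-to-text, speech-to-text, recording transcription, subtitle generation, meeting-recording transcription, voice-note transcription, or local audio transcription. Uses local Faster-Whisper (large-v3-ct2 and other models) for high-performance, Chinese-first audio transcription; fully offline and privacy-preserving; supports wav/mp...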
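
A minimal sketch of how an input path could be expanded into the audio files to transcribe, assuming the skill accepts either a single file or a folder as described. The function name collect_audio_files and the recursive scan are hypothetical choices. The extension set is also hypothetical: the original format list is truncated ("wav/mp..."), so the suffixes below are placeholders, not the skill's actual list.

```python
from pathlib import Path

# Hypothetical extension set: the skill's real format list is truncated in the
# description, so these suffixes are an assumption for illustration only.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac"}


def collect_audio_files(target, extensions=AUDIO_EXTENSIONS):
    """Return a sorted list of audio files for a single file or a folder.

    Non-audio or missing inputs yield an empty list, mirroring the
    "audio files or folders only" rule described for the skill.
    """
    path = Path(target)
    if path.is_file():
        # A single file counts only if its (case-insensitive) suffix matches.
        return [path] if path.suffix.lower() in extensions else []
    if path.is_dir():
        # rglob("*") also walks subfolders; iterdir() would give a flat scan.
        return sorted(
            p for p in path.rglob("*")
            if p.is_file() and p.suffix.lower() in extensions
        )
    return []
```

An empty result is a natural signal for the caller to skip transcription rather than fail.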

Security Recommendations

Before installing or running:
- Expect initial network activity: the scripts pip-install packages (including torch) from PyPI and download models from Hugging Face. If you need fully offline operation, pre-download the model and pass --model_path.
- Check the Python version: the code enforces Python 3.10–3.12 and exits otherwise.
- Disk usage: model files and a virtualenv can take several GB; make sure you have enough space and choose an appropriate output/model path.
- Sandbox if possible: because the skill runs pip installs and executes subprocesses, run it in an isolated environment (VM/container), or review and approve the code before giving it access to important systems.
- Paths and defaults: the scripts create a venv in a parent-level venv directory and set the default model_path to a Windows D:/ path; adjust these paths for your environment.
- No credentials requested: the skill does not ask for API keys or tokens; it uses public Hugging Face downloads. If you intend to use private models, do not supply credentials unless you trust the code.

Overall, this package appears coherent and appropriate for local STT, but it downloads files and installs software automatically; treat those side effects as part of the installation risk and proceed accordingly.
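
The Python version check mentioned above amounts to a tuple comparison. The sketch below is a hypothetical helper, not the skill's own code; only the 3.10–3.12 range comes from the description.

```python
import sys

# Supported range taken from the description (inclusive on both ends).
MIN_VERSION = (3, 10)
MAX_VERSION = (3, 12)


def python_version_supported(version_info=None):
    """Return True if (major, minor) lies within MIN_VERSION..MAX_VERSION."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


def require_supported_python():
    """Exit with a non-zero status on an unsupported interpreter (hypothetical helper)."""
    if not python_version_supported():
        found = ".".join(map(str, sys.version_info[:3]))
        sys.exit(f"Python 3.10-3.12 required, found {found}")
```

Calling require_supported_python() at the top of an entry script, before any heavy imports, makes the failure fast and explicit.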

Functional Analysis

Type: OpenClaw Skill
Name: turbo-whisper-local-stt
Version: 1.0.6

The skill bundle is a legitimate tool for local audio-to-text transcription using the Faster-Whisper library. It contains robust environment management scripts (env_manager.py, ensure_package.py) that automate the setup of a Python virtual environment, detect NVIDIA GPUs via nvidia-smi, and install heavy dependencies like PyTorch and audio-processing libraries. The main logic in transcribe.py handles model downloads from Hugging Face (huggingface.co/wangminrui2022) and performs batch or single-file transcription. While the scripts use high-privilege subprocess calls for package management and environment bootstrapping, these actions are transparently documented and strictly aligned with the stated purpose of providing a high-performance, offline STT service.
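
The GPU probe via nvidia-smi can be sketched as follows. The --query-gpu and --format options are standard nvidia-smi flags, but the function names and the mapping to a Faster-Whisper device and compute type are assumptions: int8_float16 on GPU matches the v1.0.0 notes, while the CPU fallback of int8 is a hypothetical default.

```python
import shutil
import subprocess


def detect_nvidia_gpus(timeout=10):
    """Return GPU names reported by nvidia-smi, or [] if none can be detected."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []  # NVIDIA driver tools not installed or not on PATH
    try:
        out = subprocess.run(
            [exe, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=timeout, check=True,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        return []  # nvidia-smi exists but failed, timed out, or could not start
    return [line.strip() for line in out.splitlines() if line.strip()]


def choose_device(gpu_names):
    """Map detection results to a (device, compute_type) pair (hypothetical policy)."""
    if gpu_names:
        return "cuda", "int8_float16"
    return "cpu", "int8"
```

Treating every failure as "no GPU" keeps the CPU path usable on machines without NVIDIA drivers.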

Capability Assessment

✓ Purpose & Capability

Name/description (local Faster-Whisper offline STT) matches the code and runtime behavior: scripts create a venv, install required Python packages, download Faster‑Whisper models from Hugging Face, and transcribe audio files. Required binary (python) and file writes (models, logs, outputs) are appropriate for transcription.

ℹ Instruction Scope

SKILL.md and scripts limit actions to audio transcription, path handling, venv creation, dependency installation, model download, and GPU detection (nvidia-smi). This stays within expected scope, but the runtime will: (1) create a virtualenv in a parent-level venv directory, (2) run many pip installs, and (3) download large model files from Hugging Face if not provided locally — all of which are side effects the user should expect. There is no evidence the skill reads unrelated secrets or exfiltrates data.
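
Creating the parent-level virtualenv can be sketched with the standard-library venv module. This assumes the directory is named venv and sits next to the skill root; ensure_venv and venv_python are hypothetical names, not functions from the skill's env_manager.py.

```python
import os
import venv
from pathlib import Path


def venv_python(venv_dir):
    """Path to the interpreter inside a virtualenv (layout differs on Windows)."""
    venv_dir = Path(venv_dir)
    if os.name == "nt":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def ensure_venv(skill_root, with_pip=True):
    """Create <parent of skill_root>/venv if missing and return its interpreter path.

    The parent-level location follows the description; the exact directory
    name "venv" is an assumption.
    """
    venv_dir = Path(skill_root).resolve().parent / "venv"
    python = venv_python(venv_dir)
    if not python.exists():
        # with_pip=True bootstraps pip into the new environment via ensurepip.
        venv.EnvBuilder(with_pip=with_pip).create(venv_dir)
    return python
```

Because the check is "interpreter exists", repeated runs reuse the same environment, which is also why it persists on disk between invocations.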

ℹ Install Mechanism

There is no packaged install spec; instead, the code installs dependencies at runtime via pip and uses huggingface_hub.snapshot_download to fetch models. Installing torch (and other audio libraries) by downloading wheels from download.pytorch.org and from PyPI or the Tsinghua mirror is expected, but involves network activity and large downloads. The installer supports git+/.zip/.whl fallbacks (arbitrary package sources), which is powerful but also increases the potential blast radius if a malicious spec were introduced later.
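
To make the git+/.zip/.whl fallback concrete, here is a hypothetical sketch that classifies a package spec by source and builds (without running) the matching pip command. The Tsinghua mirror URL and the PyTorch wheel index are real public indexes, but the cu121 CUDA tag and both helper names are assumptions; the skill's ensure_package.py may work differently.

```python
import sys

# Public indexes named in the analysis; the "cu121" CUDA tag is an assumption.
TSINGHUA_INDEX = "https://pypi.tuna.tsinghua.edu.cn/simple"
TORCH_INDEX = "https://download.pytorch.org/whl/cu121"


def classify_spec(spec):
    """Label a package spec by source: git, zip, wheel, or a plain PyPI name."""
    lowered = spec.lower()
    if lowered.startswith("git+"):
        return "git"
    if lowered.endswith(".zip"):
        return "zip"
    if lowered.endswith(".whl"):
        return "wheel"
    return "pypi"


def pip_install_cmd(spec, python=sys.executable, index_url=None):
    """Build (but do not run) a pip install command for one spec."""
    cmd = [python, "-m", "pip", "install", spec]
    # Only index-resolved names use a custom index; git/zip/wheel specs
    # already point at their own source.
    if index_url and classify_spec(spec) == "pypi":
        cmd += ["--index-url", index_url]
    return cmd
```

Running the command is then a matter of subprocess.run(cmd, check=True). Inspecting classify_spec(spec) before execution is one way to restrict which source types are allowed and so narrow the blast radius noted above.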

✓ Credentials

The skill does not request credentials or environment variables. It probes system GPU info (nvidia-smi) and writes logs, model caches, and virtualenv files to disk. These accesses are proportionate to GPU-aware local transcription and model caching behavior.

ℹ Persistence & Privilege

The skill does not request 'always' privilege and is user-invocable. It persists by creating a virtualenv (VENV_DIR) and caching downloaded models and logs under the skill root (or parent venv path). That persistent storage is normal for this use case but can consume significant disk space and may be shared across runs.
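
The model-caching behavior, together with the --model_path escape hatch for offline use, can be sketched as "use a local directory if present, otherwise download once". huggingface_hub.snapshot_download(repo_id=..., local_dir=...) is a real API; the function name resolve_model, its arguments, and the cache layout are hypothetical.

```python
from pathlib import Path


def resolve_model(model_path=None, repo_id=None, cache_dir="models"):
    """Return a local model directory, downloading from Hugging Face only if needed.

    - An existing model_path directory is used as-is (no network access).
    - Otherwise repo_id is fetched with snapshot_download into cache_dir.
    """
    if model_path and Path(model_path).is_dir():
        return str(Path(model_path))
    if not repo_id:
        raise ValueError("no local model found and no repo_id given for download")
    # Imported lazily so the offline path works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # Hypothetical cache layout: one subfolder per repo, "/" replaced by "--".
    target = Path(cache_dir) / repo_id.replace("/", "--")
    return snapshot_download(repo_id=repo_id, local_dir=str(target))
```

Because the download lands under cache_dir, later runs can pass that directory as model_path and skip the network entirely.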

Version History

v1.0.6

No changes detected in this version.

v1.0.5

Version 1.0.5 of turbo-whisper-local-stt
- No file changes detected; the skill remains functionally identical to the previous version.
- All triggers, model options, and usage guidelines are unchanged.
- No updates to commands, parameters, or documentation content.

v1.0.4

No changes detected in this version.

v1.0.3

- Expanded description with clear trigger scenarios and example user phrases to improve discoverability and usability.
- Explicitly stated that only audio files or folders are supported; non-audio files will not trigger the skill.
- Added detailed trigger instructions and a parameter extraction guide for handling user input.
- Clarified functionality, supported models, and execution steps for easier reference.
- Metadata now includes the user-invocable attribute.

v1.0.2

- Updated the default script from scripts/audio_to_text.py to scripts/transcribe.py in command examples.
- No other functional or descriptive changes detected.

v1.0.1

- Version bump to 1.0.1 with no code or documentation changes.
- No file modifications detected; functionality and usage remain unchanged.

v1.0.0

- Initial release of turbo-whisper-local-stt.
- Offline, high-performance audio-to-text transcription using Faster-Whisper large-v3-ct2 with Chinese language priority.
- Supports long audio with VAD segmentation and GPU acceleration (int8_float16), and preserves privacy through fully local processing.
- Suitable for Chinese meeting recordings, voice notes, and subtitle generation.
- Outputs structured results (full text, segment information, language detection).
- Works with single files or folders; auto-detects the environment and GPU, and supports multiple models for different needs.
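
The transcription step in these release notes can be sketched with the public faster-whisper API: WhisperModel(path, device=..., compute_type=...) and model.transcribe(..., vad_filter=True), which returns a lazy segment generator plus an info object carrying language and language_probability. Pinning language="zh" is an assumption about how "Chinese priority" is implemented (the skill may rely on auto-detection instead). build_result, srt_timestamp, and to_srt are hypothetical helpers showing one way to produce the structured output and subtitles.

```python
def build_result(segments, info):
    """Collect segments into full text, per-segment data, and detected language."""
    segments = list(segments)  # faster-whisper yields a lazy generator; consume it once
    return {
        # English segment text typically carries a leading space while Chinese
        # needs none, so join raw text and strip only the outer whitespace.
        "text": "".join(s.text for s in segments).strip(),
        "segments": [{"start": s.start, "end": s.end, "text": s.text.strip()}
                     for s in segments],
        "language": info.language,
        "language_probability": info.language_probability,
    }


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(result):
    """Render build_result output as SRT subtitle blocks separated by blank lines."""
    blocks = []
    for i, seg in enumerate(result["segments"], start=1):
        span = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{i}\n{span}\n{seg['text']}\n")
    return "\n".join(blocks)


def transcribe_file(audio_path, model_dir, device="cuda", compute_type="int8_float16"):
    """Transcribe one file; requires the faster-whisper package to be installed."""
    from faster_whisper import WhisperModel  # lazy import of the heavy dependency

    model = WhisperModel(model_dir, device=device, compute_type=compute_type)
    # vad_filter=True applies Silero VAD so long recordings are split on silence.
    segments, info = model.transcribe(str(audio_path), language="zh", vad_filter=True)
    return build_result(segments, info)
```

Keeping build_result separate from the model call lets the output format be checked without loading a multi-GB model.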

Metadata

Slug turbo-whisper-local-stt

Version 1.0.6

License MIT-0

Total installs 0

Current installs 0

Historical versions 7

FAQ

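What is turbo-whisper-local-stt?

Triggers automatically when the user wants audio-to-text, speech-to-text, recording transcription, subtitle generation, meeting-recording transcription, voice-note transcription, or local audio transcription. Uses local Faster-Whisper (large-v3-ct2 and other models) for high-performance, Chinese-first audio transcription; fully offline and privacy-preserving; supports wav/mp... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 252 times so far.

How do I install turbo-whisper-local-stt?

Run the command "/install turbo-whisper-local-stt" in the OpenClaw or Claude Code chat to install it in one step; no extra configuration is required.

Is turbo-whisper-local-stt free?

Yes. turbo-whisper-local-stt is completely free, released under the MIT-0 license, and can be freely downloaded, installed, and used.

Which platforms does turbo-whisper-local-stt support?

turbo-whisper-local-stt is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed turbo-whisper-local-stt?

It is developed and maintained by 顶尖王牌程序员 (@wangminrui2022); the current version is v1.0.6.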
