felipeoff

Faster Whisper Gpu

Author: Felipe Oliveira · GitHub ↗ · v0.1.0
cross-platform ⚠ suspicious
Total downloads: 565
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install faster-whisper-gpu
Description
High-performance local speech-to-text transcription using Faster Whisper with NVIDIA GPU acceleration. Transcribe audio files locally without sending data to...
Usage instructions (SKILL.md)

🎙️ Faster Whisper GPU

High-performance local speech-to-text transcription using Faster Whisper with NVIDIA GPU acceleration.

✨ Features

  • 🚀 GPU Accelerated: Uses NVIDIA CUDA for blazing-fast transcription
  • 🔒 100% Local: No data leaves your machine. Complete privacy.
  • 💰 Free Forever: No API costs. Run unlimited transcriptions.
  • 🌍 Multilingual: Supports 99 languages with automatic detection
  • 📁 Multiple Formats: Input: MP3, WAV, FLAC, OGG, M4A. Output: TXT, SRT, JSON
  • 🎯 Multiple Models: From tiny (fast) to large-v3 (most accurate)
  • 🎬 Subtitle Generation: Create SRT files with word-level timestamps

📋 Requirements

Hardware

  • NVIDIA GPU with CUDA support (recommended: 4GB+ VRAM)
  • Or CPU-only mode (slower but works on any machine)

Software

  • Python 3.8+
  • NVIDIA drivers (for GPU support)
  • CUDA Toolkit 11.8+ or 12.x

🚀 Quick Start

Installation

# Install dependencies
pip install faster-whisper torch

# Verify GPU is available
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

Basic Usage

# Transcribe an audio file (auto-detects GPU)
python transcribe.py audio.mp3

# Specify language explicitly
python transcribe.py audio.mp3 --language pt

# Output as SRT subtitles
python transcribe.py audio.mp3 --format srt --output subtitles.srt

# Use larger model for better accuracy
python transcribe.py audio.mp3 --model large-v3

🔧 Advanced Usage

Command Line Options

python transcribe.py <audio_file> [options]

Options:
  --model {tiny,base,small,medium,large-v1,large-v2,large-v3}
                        Model size to use (default: base)
  --language LANG       Language code (e.g., 'pt', 'en', 'es'). Auto-detect if not specified.
  --format {txt,srt,json,vtt}
                        Output format (default: txt)
  --output FILE         Output file path (default: stdout)
  --device {cuda,cpu}   Device to use (default: cuda if available)
  --compute_type {int8,int8_float16,int16,float16,float32}
                        Computation precision (default: float16)
  --task {transcribe,translate}
                        Task: transcribe or translate to English (default: transcribe)
  --vad_filter          Enable voice activity detection filter
  --vad_parameters MIN_DURATION_ON,MIN_DURATION_OFF
                        VAD parameters as comma-separated values
  --condition_on_previous_text
                        Condition on previous text (default: True)
  --initial_prompt PROMPT
                        Initial prompt to guide transcription
  --word_timestamps     Include word-level timestamps (for SRT/JSON)
  --hotwords WORDS      Comma-separated hotwords to boost recognition
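
For readers who want to see how such options are typically declared, here is a minimal sketch using Python's standard argparse module. It is not the source of transcribe.py, which is not shown here. The function name build_parser is hypothetical, and only a subset of the options listed above is mirrored, with the defaults stated in the help text.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring a subset of the documented options."""
    parser = argparse.ArgumentParser(prog="transcribe.py")
    parser.add_argument("audio_file", help="Path to the input audio file")
    parser.add_argument(
        "--model",
        choices=["tiny", "base", "small", "medium", "large-v1", "large-v2", "large-v3"],
        default="base",
    )
    parser.add_argument("--language", default=None, help="Auto-detect when omitted")
    parser.add_argument("--format", choices=["txt", "srt", "json", "vtt"], default="txt")
    parser.add_argument("--output", default=None, help="Write to stdout when omitted")
    parser.add_argument("--device", choices=["cuda", "cpu"], default=None)
    parser.add_argument("--task", choices=["transcribe", "translate"], default="transcribe")
    parser.add_argument("--vad_filter", action="store_true")
    parser.add_argument("--word_timestamps", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["meeting.mp3", "--language", "pt", "--format", "srt"])
print(args.model, args.language, args.format)
```

Options that are not passed fall back to the documented defaults, so args.model is "base" in this call.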

Examples

Portuguese Transcription with SRT Output

python transcribe.py meeting.mp3 --language pt --format srt --output meeting.srt

English Translation from Any Language

python transcribe.py japanese_audio.mp3 --task translate --format txt

High-Accuracy Mode with Large Model

python transcribe.py podcast.mp3 --model large-v3 --vad_filter --word_timestamps

CPU-Only Mode (no GPU)

python transcribe.py audio.mp3 --device cpu --compute_type int8
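
The device and precision choice can also be made programmatically. The sketch below is an illustration, not the skill's actual logic. The helper names cuda_available and pick_device are hypothetical. The pairing of cuda with float16 and cpu with int8 follows the defaults and the CPU-only example in this document.

```python
import importlib.util
from typing import Tuple


def cuda_available() -> bool:
    """Return True only if torch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the sketch also runs without torch

    return torch.cuda.is_available()


def pick_device(has_cuda: bool) -> Tuple[str, str]:
    """Map GPU availability to a (device, compute_type) pair.

    Assumption: float16 on GPU and int8 on CPU, as in the examples above.
    """
    return ("cuda", "float16") if has_cuda else ("cpu", "int8")


device, compute_type = pick_device(cuda_available())
print(device, compute_type)
```

The resulting pair can be passed to --device and --compute_type, or to the WhisperModel constructor in the Python API.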

🐍 Python API

from faster_whisper import WhisperModel

# Load model
model = WhisperModel("base", device="cuda", compute_type="float16")

# Transcribe
segments, info = model.transcribe("audio.mp3", language="pt")

print(f"Detected language: {info.language} (probability: {info.language_probability:.2f})")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
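
Since each segment exposes start, end and text, turning the result into subtitles is mostly a formatting exercise. The following sketch shows one way to do it. Seg is a hypothetical stand-in so the example runs without a model, and srt_timestamp and to_srt are illustrative helper names, not part of faster-whisper.

```python
from typing import Iterable, NamedTuple


class Seg(NamedTuple):
    """Stand-in for a transcription segment; only start, end and text are used."""

    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the timestamp form used by SRT files."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable) -> str:
    """Build numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        blocks.append(f"{index}\n{timing}\n{seg.text.strip()}\n")
    return "\n".join(blocks)


print(to_srt([Seg(0.0, 2.5, " Hello there."), Seg(2.5, 5.0, " Let's begin.")]))
```

Any object with the same three attributes can be passed to to_srt, including the segments yielded by model.transcribe.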

📊 Model Sizes & VRAM Requirements

Model      Parameters   VRAM Required   Relative Speed   Accuracy
tiny       39 M         ~1 GB           ~32x             Basic
base       74 M         ~1 GB           ~16x             Good
small      244 M        ~2 GB           ~6x              Better
medium     769 M        ~5 GB           ~2x              Great
large-v3   1550 M       ~10 GB          1x               Best

Benchmarks measured on NVIDIA RTX 4090
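
As a rough aid for choosing a model, the VRAM figures can be turned into a lookup. This is a sketch only. The numbers are the approximate values from this section, and largest_model_for is a hypothetical helper. Real memory use also depends on the compute type and the audio being processed.

```python
from typing import Optional

# Approximate VRAM needs in GB, copied from this section, smallest model first.
VRAM_GB = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large-v3", 10)]


def largest_model_for(vram_gb: float) -> Optional[str]:
    """Return the largest listed model whose approximate VRAM need fits, or None."""
    fitting = [name for name, need in VRAM_GB if need <= vram_gb]
    return fitting[-1] if fitting else None


for available in (0.5, 4, 8, 12):
    print(available, "GB ->", largest_model_for(available))
```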

🔍 Supported Languages

Faster Whisper supports 99 languages including:

  • Portuguese (pt)
  • English (en)
  • Spanish (es)
  • French (fr)
  • German (de)
  • Italian (it)
  • Japanese (ja)
  • Chinese (zh)
  • Russian (ru)
  • And 90+ more...

🛠️ Troubleshooting

CUDA Out of Memory

# Use smaller model
python transcribe.py audio.mp3 --model tiny

# Or use CPU
python transcribe.py audio.mp3 --device cpu

# Or reduce precision
python transcribe.py audio.mp3 --compute_type int8
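
The same fallbacks can be chained in code. The sketch below tries a list of configurations in order and keeps the first that loads. It assumes an out-of-memory failure surfaces as a RuntimeError, which may not hold in every environment. load_with_fallback, FALLBACKS and fake_loader are hypothetical names, and the fake loader stands in for a real model constructor so the example runs without a GPU.

```python
from typing import Callable, Sequence, Tuple

Config = Tuple[str, str, str]  # (model size, device, compute type)

# Order is an illustration: lower precision first, then a smaller model, then CPU.
FALLBACKS: Sequence[Config] = [
    ("base", "cuda", "float16"),
    ("base", "cuda", "int8"),
    ("tiny", "cuda", "int8"),
    ("tiny", "cpu", "int8"),
]


def load_with_fallback(loader: Callable[[str, str, str], object],
                       configs: Sequence[Config]):
    """Return (config, model) for the first configuration the loader accepts."""
    last_error = None
    for config in configs:
        try:
            return config, loader(*config)
        except RuntimeError as exc:  # assumption: OOM is reported as RuntimeError
            last_error = exc
    raise RuntimeError("no configuration could be loaded") from last_error


def fake_loader(size: str, device: str, compute_type: str) -> str:
    """Simulate a machine on which every CUDA load runs out of memory."""
    if device == "cuda":
        raise RuntimeError("CUDA failed with error out of memory")
    return f"{size}/{device}/{compute_type}"


config, model = load_with_fallback(fake_loader, FALLBACKS)
print(config, model)
```

With faster-whisper, the loader could be a small wrapper such as lambda size, device, ct: WhisperModel(size, device=device, compute_type=ct). Note that memory can also run out during transcription rather than at load time, which this sketch does not cover.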

Model Download Issues

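Models are automatically downloaded on first use to ~/.cache/huggingface/hub/. To store them elsewhere (for example, when the default cache location is not writable or lacks space), point HF_HOME at a custom cache directory: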

export HF_HOME=/path/to/custom/cache
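
To check where downloads will land after changing HF_HOME, the lookup can be approximated in a few lines. This is a simplified sketch rather than the Hugging Face implementation. It considers only HF_HUB_CACHE and HF_HOME and ignores other settings such as XDG_CACHE_HOME, and hub_cache_dir is a hypothetical helper name.

```python
import os
from pathlib import Path
from typing import Mapping


def hub_cache_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Approximate the directory where hub downloads are cached."""
    if env.get("HF_HUB_CACHE"):
        # An explicit hub cache setting takes precedence over HF_HOME.
        return Path(env["HF_HUB_CACHE"]).expanduser()
    hf_home = env.get("HF_HOME") or "~/.cache/huggingface"
    return Path(hf_home).expanduser() / "hub"


print(hub_cache_dir())                        # based on the current environment
print(hub_cache_dir({"HF_HOME": "/data/hf"}))  # with a custom HF_HOME
```

When setting HF_HOME from Python instead of the shell, set it before importing any library that reads it.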

Slow Transcription

  • Ensure GPU is being used: check nvidia-smi during transcription
  • Use smaller model for faster results
  • Enable VAD filter to skip silent parts

🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request

📜 License

MIT License - See LICENSE for details.

Faster Whisper is developed by SYSTRAN and based on OpenAI's Whisper.

🙏 Acknowledgments


Made with ❤️ for the OpenClaw community

Safe usage recommendations
This skill appears to be what it says: a local Faster-Whisper transcription tool. Before installing, consider the following: (1) pip installing torch can be large and may require specific GPU/CUDA builds — prefer installing inside a virtualenv or conda environment. (2) Model weights are downloaded from the Hugging Face hub on first use (network access and several GBs of disk may be required for large models); if you require full offline operation, pre-download and place models in the HF cache or set HF_HOME. (3) Review the GitHub homepage/repo and the transcribe.py file if you need additional assurance about code behavior and licensing. (4) The skill does not request credentials or attempt to send your audio elsewhere, but network activity for model downloads is expected. If those behaviors are acceptable, the skill is coherent and reasonable to use.
Functional analysis
Type: OpenClaw Skill
Name: faster-whisper-gpu
Version: 0.1.0

The skill bundle is classified as suspicious because of a potential path traversal vulnerability in `transcribe.py`. The script saves transcription results with `Path(args.output).write_text()`, where `args.output` is taken directly from user input without sanitization. An attacker who controls that argument can specify an arbitrary file path (e.g., `../../../../tmp/malicious.txt`) and write content outside the intended directory. This is a risky capability, although nothing indicates malicious intent from the developer. All other files and instructions appear benign and aligned with the stated purpose of local speech-to-text transcription.
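
One possible mitigation for the path issue described above is to resolve the requested output path against an allowed base directory and reject anything that lands outside it. The sketch below is an illustration, not the skill's code. resolve_output and the transcripts directory name are hypothetical.

```python
from pathlib import Path


def resolve_output(base_dir: str, user_path: str) -> Path:
    """Resolve user_path under base_dir and reject anything that escapes it."""
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()
    # After resolution, a safe target must sit somewhere below the base directory.
    if base not in candidate.parents:
        raise ValueError(f"output path escapes {base}: {user_path}")
    return candidate


print(resolve_output("transcripts", "meeting.srt"))
try:
    resolve_output("transcripts", "../../../../tmp/malicious.txt")
except ValueError as exc:
    print("rejected:", exc)
```

Resolving before comparing matters because a purely textual check would miss `..` components and symlinks. Absolute paths supplied by the user are rejected as well, since they replace the base when joined.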
Capability assessment
Purpose & Capability
Name/description, SKILL.md, requirements.txt and transcribe.py all align: the skill implements local Faster-Whisper transcription with optional GPU support. Required binary is only python3; listed Python packages match the stated capability.
Instruction Scope
SKILL.md and transcribe.py limit actions to local transcription, writing outputs to stdout or local files. The only external activity is the expected download of model weights from the Hugging Face hub (to ~/.cache/huggingface/hub) on first use; this is documented in SKILL.md. No instructions read unrelated system credentials or post audio to third-party endpoints.
Install Mechanism
There is no registry install spec; the skill is instruction-only and instructs users to pip install faster-whisper and torch. Using pip is expected but may pull large binary wheels (torch) and GPU-specific builds. Model weights are downloaded at runtime from Hugging Face — an expected but noteworthy network activity and storage use.
Credentials
The skill requests no environment variables or credentials. SKILL.md mentions HF_HOME as an optional way to change the Hugging Face cache directory (documented as a troubleshooting tip) — reasonable and proportional.
Persistence & Privilege
The skill does not set always: true and does not request persistent system privileges. It does not modify other skills or agent-wide configs. It writes outputs to the paths the user specifies and caches model files in the user's home cache directory (standard behavior).
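How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install faster-whisper-gpu
  3. Once installed, call the Skill by name or trigger it with /faster-whisper-gpu
  4. Provide the inputs described in the Skill's parameter documentation to get structured output
Version history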
v0.1.0
Initial release of faster-whisper-gpu.
  • Local speech-to-text transcription powered by Faster Whisper with NVIDIA GPU acceleration.
  • Transcribe audio into text, subtitles (SRT/VTT), or JSON with support for 99 languages.
  • Features multiple model sizes, word-level timestamps, hotword boosting, and voice activity detection.
  • All data processing is 100% local for maximum privacy.
  • Includes a command-line interface and Python API for flexible usage.
Metadata
Slug faster-whisper-gpu
Version 0.1.0
License
Total installs 0
Current installs 0
Version count 1
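FAQ

What is Faster Whisper Gpu?

High-performance local speech-to-text transcription using Faster Whisper with NVIDIA GPU acceleration. Transcribe audio files locally without sending data to... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 565 total downloads to date.

How do I install Faster Whisper Gpu?

Run the command /install faster-whisper-gpu in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is Faster Whisper Gpu free?

Yes. Faster Whisper Gpu is completely free (free and open source) and can be downloaded, installed, and used without restriction.

Which platforms does Faster Whisper Gpu support?

Faster Whisper Gpu is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Faster Whisper Gpu?

It is developed and maintained by Felipe Oliveira (@felipeoff); the current version is v0.1.0.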
