felipeoff

Faster Whisper Gpu

by Felipe Oliveira · GitHub ↗ · v0.1.0
cross-platform ⚠ suspicious
565
Downloads
0
Stars
0
Active Installs
1
Versions
Install in OpenClaw
/install faster-whisper-gpu
Description
High-performance local speech-to-text transcription using Faster Whisper with NVIDIA GPU acceleration. Transcribe audio files locally without sending data to...
README (SKILL.md)

🎙️ Faster Whisper GPU

High-performance local speech-to-text transcription using Faster Whisper with NVIDIA GPU acceleration.

✨ Features

  • 🚀 GPU Accelerated: Uses NVIDIA CUDA for blazing-fast transcription
  • 🔒 100% Local: No data leaves your machine. Complete privacy.
  • 💰 Free Forever: No API costs. Run unlimited transcriptions.
  • 🌍 Multilingual: Supports 99 languages with automatic detection
  • 📁 Multiple Formats: Input: MP3, WAV, FLAC, OGG, M4A. Output: TXT, SRT, JSON
  • 🎯 Multiple Models: From tiny (fast) to large-v3 (most accurate)
  • 🎬 Subtitle Generation: Create SRT files with word-level timestamps

📋 Requirements

Hardware

  • NVIDIA GPU with CUDA support (recommended: 4GB+ VRAM)
  • Or CPU-only mode (slower but works on any machine)

Software

  • Python 3.8+
  • NVIDIA drivers (for GPU support)
  • CUDA Toolkit 11.8+ or 12.x

🚀 Quick Start

Installation

# Install dependencies
pip install faster-whisper torch

# Verify GPU is available
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

Basic Usage

# Transcribe an audio file (auto-detects GPU)
python transcribe.py audio.mp3

# Specify language explicitly
python transcribe.py audio.mp3 --language pt

# Output as SRT subtitles
python transcribe.py audio.mp3 --format srt --output subtitles.srt

# Use larger model for better accuracy
python transcribe.py audio.mp3 --model large-v3

🔧 Advanced Usage

Command Line Options

python transcribe.py <audio_file> [options]

Options:
  --model {tiny,base,small,medium,large-v1,large-v2,large-v3}
                        Model size to use (default: base)
  --language LANG       Language code (e.g., 'pt', 'en', 'es'). Auto-detect if not specified.
  --format {txt,srt,json,vtt}
                        Output format (default: txt)
  --output FILE         Output file path (default: stdout)
  --device {cuda,cpu}   Device to use (default: cuda if available)
  --compute_type {int8,int8_float16,int16,float16,float32}
                        Computation precision (default: float16)
  --task {transcribe,translate}
                        Task: transcribe or translate to English (default: transcribe)
  --vad_filter          Enable voice activity detection filter
  --vad_parameters MIN_DURATION_ON,MIN_DURATION_OFF
                        VAD parameters as comma-separated values
  --condition_on_previous_text
                        Condition on previous text (default: True)
  --initial_prompt PROMPT
                        Initial prompt to guide transcription
  --word_timestamps     Include word-level timestamps (for SRT/JSON)
  --hotwords WORDS      Comma-separated hotwords to boost recognition
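
For readers who want to see how such an interface is typically wired up, here is a minimal sketch of an `argparse` parser covering a subset of the options listed above. It is an assumption about how transcribe.py could declare them, not a copy of the actual script; `build_parser` and the `MODELS` constant are hypothetical names.

```python
import argparse

# Model names taken from the --model choices documented above.
MODELS = ["tiny", "base", "small", "medium", "large-v1", "large-v2", "large-v3"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring a subset of the documented options."""
    parser = argparse.ArgumentParser(prog="transcribe.py")
    parser.add_argument("audio_file", help="Path to the input audio file")
    parser.add_argument("--model", choices=MODELS, default="base")
    parser.add_argument("--language", default=None,
                        help="Language code; auto-detect when omitted")
    parser.add_argument("--format", choices=["txt", "srt", "json", "vtt"],
                        default="txt")
    parser.add_argument("--output", default=None,
                        help="Output file path (default: stdout)")
    parser.add_argument("--device", choices=["cuda", "cpu"], default=None,
                        help="None means: use cuda if available")
    parser.add_argument("--task", choices=["transcribe", "translate"],
                        default="transcribe")
    parser.add_argument("--vad_filter", action="store_true")
    parser.add_argument("--word_timestamps", action="store_true")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list instead of sys.argv for demonstration.
    args = build_parser().parse_args(
        ["meeting.mp3", "--language", "pt", "--format", "srt"]
    )
    print(args.model, args.language, args.format)
```

Using `choices=` lets argparse reject unknown model names or formats before any model is loaded.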

Examples

Portuguese Transcription with SRT Output

python transcribe.py meeting.mp3 --language pt --format srt --output meeting.srt

English Translation from Any Language

python transcribe.py japanese_audio.mp3 --task translate --format txt

High-Accuracy Mode with Large Model

python transcribe.py podcast.mp3 --model large-v3 --vad_filter --word_timestamps

CPU-Only Mode (no GPU)

python transcribe.py audio.mp3 --device cpu --compute_type int8
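
The device and precision pairings used in this README (float16 on CUDA, int8 on CPU) can be chosen programmatically. The sketch below is one way to do it, assuming torch is used only for GPU detection; `cuda_available` and `choose_runtime` are hypothetical helper names, not part of transcribe.py.

```python
from typing import Tuple


def cuda_available() -> bool:
    """Return True if torch is installed and reports a usable CUDA device."""
    try:
        import torch  # optional dependency, only needed for detection
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def choose_runtime(has_cuda: bool) -> Tuple[str, str]:
    """Map GPU availability to the (device, compute_type) pairs used above."""
    if has_cuda:
        return "cuda", "float16"
    return "cpu", "int8"


if __name__ == "__main__":
    device, compute_type = choose_runtime(cuda_available())
    print(f"device={device} compute_type={compute_type}")
```

The resulting pair can be passed as the `device` and `compute_type` arguments of `WhisperModel`.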

🐍 Python API

from faster_whisper import WhisperModel

# Load model
model = WhisperModel("base", device="cuda", compute_type="float16")

# Transcribe (segments is a lazy generator; the work happens as you iterate)
segments, info = model.transcribe("audio.mp3", language="pt")

print(f"Detected language: {info.language} (probability: {info.language_probability:.2f})")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
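
Segments like the ones printed above carry everything needed for subtitles. As an illustration, the following sketch formats them as SRT. The `Segment` tuple is a stand-in so the example runs without a model; any object with `start`, `end`, and `text` attributes should work the same way. `srt_timestamp` and `to_srt` are hypothetical helpers and are not taken from transcribe.py.

```python
from typing import Iterable, NamedTuple


class Segment(NamedTuple):
    """Stand-in for a transcription segment (times in seconds)."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([Segment(0.0, 2.5, " Hello"), Segment(2.5, 4.0, " world")]))
```

Rounding to whole milliseconds before splitting avoids cues such as `00:00:59,1000` that naive per-field rounding can produce.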

📊 Model Sizes & VRAM Requirements

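| Model    | Parameters | VRAM Required | Relative Speed | Accuracy |
|----------|------------|---------------|----------------|----------|
| tiny     | 39 M       | ~1 GB         | ~32x           | Basic    |
| base     | 74 M       | ~1 GB         | ~16x           | Good     |
| small    | 244 M      | ~2 GB         | ~6x            | Better   |
| medium   | 769 M      | ~5 GB         | ~2x            | Great    |
| large-v3 | 1550 M     | ~10 GB        | 1x             | Best     |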

Benchmarks measured on NVIDIA RTX 4090
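
One way to apply the table is to pick the most accurate model whose approximate VRAM requirement fits the GPU. The sketch below hard-codes the table's figures; `pick_model` is a hypothetical helper, and the numbers are the approximate values quoted above rather than measured limits.

```python
from typing import Optional

# Approximate VRAM needs in GB, copied from the table, least accurate first.
VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large-v3", 10),
]


def pick_model(available_vram_gb: float) -> Optional[str]:
    """Return the most accurate model that fits, or None if none does."""
    best = None
    for name, needed in VRAM_GB:
        if needed <= available_vram_gb:
            best = name  # later entries are more accurate
    return best


if __name__ == "__main__":
    for vram in (0.5, 4, 8, 12):
        print(vram, "GB ->", pick_model(vram))
```

A return value of `None` suggests falling back to CPU-only mode.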

🔍 Supported Languages

Faster Whisper supports 99 languages including:

  • Portuguese (pt)
  • English (en)
  • Spanish (es)
  • French (fr)
  • German (de)
  • Italian (it)
  • Japanese (ja)
  • Chinese (zh)
  • Russian (ru)
  • And 90+ more...

🛠️ Troubleshooting

CUDA Out of Memory

# Use smaller model
python transcribe.py audio.mp3 --model tiny

# Or use CPU
python transcribe.py audio.mp3 --device cpu

# Or reduce precision
python transcribe.py audio.mp3 --compute_type int8
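
The three remedies above can also be tried automatically, in order. The sketch below is one possible fallback ladder. It assumes the loader signals an out-of-memory condition by raising `RuntimeError`, which should be verified against the library version in use; `load_with_fallback` and `FALLBACKS` are hypothetical names.

```python
from typing import Any, Callable, Sequence, Tuple

Config = Tuple[str, str, str]  # (model, device, compute_type)

# One possible order: lower precision first, then a smaller model, then CPU.
FALLBACKS: Sequence[Config] = (
    ("large-v3", "cuda", "float16"),
    ("large-v3", "cuda", "int8"),
    ("tiny", "cuda", "int8"),
    ("tiny", "cpu", "int8"),
)


def load_with_fallback(
    load: Callable[[str, str, str], Any],
    configs: Sequence[Config] = FALLBACKS,
) -> Tuple[Config, Any]:
    """Try each config in order; return the first that loads."""
    last_error = None
    for config in configs:
        try:
            return config, load(*config)
        except RuntimeError as err:  # assumption: OOM surfaces as RuntimeError
            last_error = err
    raise RuntimeError("no configuration could be loaded") from last_error
```

With faster-whisper installed, `load` could be `lambda m, d, c: WhisperModel(m, device=d, compute_type=c)`. Memory can also run out later, during transcription, so this only covers failures at load time.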

Model Download Issues

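Models are automatically downloaded on first use to ~/.cache/huggingface/hub/. HF_HOME changes where that cache lives; it does not configure a proxy. If you are behind a proxy, the download will generally also need the standard HTTPS_PROXY variable. To use a custom cache location, set: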

export HF_HOME=/path/to/custom/cache
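
To see where models will actually land, it helps to know how the cache directory is derived. The sketch below mirrors the precedence documented for the Hugging Face hub as I understand it (HF_HUB_CACHE, then HF_HOME plus `hub`, then the XDG or home default). It is an illustration and does not call the library; `hub_cache_dir` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Mapping


def hub_cache_dir(env: Mapping[str, str], home: str) -> Path:
    """Resolve the hub cache directory from environment variables."""
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"])
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]) / "hub"
    cache_root = env.get("XDG_CACHE_HOME") or str(Path(home) / ".cache")
    return Path(cache_root) / "huggingface" / "hub"


if __name__ == "__main__":
    # Show the directory implied by the current environment.
    print(hub_cache_dir(os.environ, str(Path.home())))
```

Note that with `HF_HOME=/path/to/custom/cache`, models are stored under `/path/to/custom/cache/hub`, not directly in the directory itself.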

Slow Transcription

  • Ensure GPU is being used: check nvidia-smi during transcription
  • Use smaller model for faster results
  • Enable VAD filter to skip silent parts

🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request

📜 License

MIT License - See LICENSE for details.

Faster Whisper is developed by SYSTRAN and based on OpenAI's Whisper.

🙏 Acknowledgments


Made with ❤️ for the OpenClaw community

Usage Guidance
This skill appears to be what it says: a local Faster-Whisper transcription tool. Before installing, consider the following:
  1. Installing torch with pip can be a large download and may require a specific GPU/CUDA build; prefer installing inside a virtualenv or conda environment.
  2. Model weights are downloaded from the Hugging Face hub on first use (network access and several GB of disk may be required for large models). If you require full offline operation, pre-download the models and place them in the HF cache, or set HF_HOME.
  3. Review the GitHub homepage/repo and the transcribe.py file if you need additional assurance about code behavior and licensing.
  4. The skill does not request credentials or attempt to send your audio elsewhere, but network activity for model downloads is expected.
If those behaviors are acceptable, the skill is coherent and reasonable to use.
Capability Analysis
Type: OpenClaw Skill
Name: faster-whisper-gpu
Version: 0.1.0

The skill bundle is classified as suspicious due to a potential path traversal vulnerability in `transcribe.py`. The script uses `Path(args.output).write_text()` to save transcription results, where `args.output` is directly taken from user input without sanitization. This allows an attacker to specify an arbitrary file path (e.g., `../../../../tmp/malicious.txt`) to write content outside the intended directory, which is a risky capability without clear malicious intent from the developer. All other files and instructions appear benign and aligned with the stated purpose of local speech-to-text transcription.
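
A common mitigation for this kind of issue is to resolve the requested path and reject it if it falls outside an allowed base directory. The sketch below illustrates the idea; it is not the skill's code, and `resolve_output_path` and the notion of a fixed `base_dir` are assumptions made for the example. It uses `relative_to` rather than `is_relative_to` so it also runs on Python 3.8.

```python
from pathlib import Path


def resolve_output_path(user_path: str, base_dir: str) -> Path:
    """Return the resolved output path, or raise if it escapes base_dir."""
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base, so the check
    # below rejects absolute paths outside base as well.
    candidate = (base / user_path).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"output path escapes {base}: {user_path!r}") from None
    return candidate


if __name__ == "__main__":
    print(resolve_output_path("subs/meeting.srt", "."))
    try:
        resolve_output_path("../../../../tmp/malicious.txt", ".")
    except ValueError as err:
        print("rejected:", err)
```

Whether such a restriction is appropriate depends on how the script is invoked: for a CLI run directly by its user, writing to arbitrary paths may be intended, while an agent passing untrusted input benefits from the check.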
Capability Assessment
Purpose & Capability
Name/description, SKILL.md, requirements.txt and transcribe.py all align: the skill implements local Faster-Whisper transcription with optional GPU support. Required binary is only python3; listed Python packages match the stated capability.
Instruction Scope
SKILL.md and transcribe.py limit actions to local transcription, writing outputs to stdout or local files. The only external activity is the expected download of model weights from the Hugging Face hub (cached under ~/.cache/huggingface/hub) on first use; this is documented in SKILL.md. No instructions read unrelated system credentials or post audio to third-party endpoints.
Install Mechanism
There is no registry install spec; the skill is instruction-only and instructs users to pip install faster-whisper and torch. Using pip is expected but may pull large binary wheels (torch) and GPU-specific builds. Model weights are downloaded at runtime from Hugging Face — an expected but noteworthy network activity and storage use.
Credentials
The skill requests no environment variables or credentials. SKILL.md mentions HF_HOME as an optional way to change the Hugging Face cache directory (documented as a troubleshooting tip) — reasonable and proportional.
Persistence & Privilege
The skill does not set always: true and does not request persistent system privileges. It does not modify other skills or agent-wide configs. It writes outputs and caches model files to the user's home cache directory (standard behavior).
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install faster-whisper-gpu
  3. After installation, invoke the skill by name or use /faster-whisper-gpu
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.1.0
Initial release of faster-whisper-gpu.
  • Local speech-to-text transcription powered by Faster Whisper with NVIDIA GPU acceleration.
  • Transcribe audio into text, subtitles (SRT/VTT), or JSON with support for 99 languages.
  • Features multiple model sizes, word-level timestamps, hotword boosting, and voice activity detection.
  • All data processing is 100% local for maximum privacy.
  • Includes a command-line interface and Python API for flexible usage.
Metadata
Slug faster-whisper-gpu
Version 0.1.0
License
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Faster Whisper Gpu?

High-performance local speech-to-text transcription using Faster Whisper with NVIDIA GPU acceleration. Transcribe audio files locally without sending data to... It is an AI Agent Skill for Claude Code / OpenClaw, with 565 downloads so far.

How do I install Faster Whisper Gpu?

Run "/install faster-whisper-gpu" in the OpenClaw or Claude Code chat to install the skill in one step. The Python dependencies listed in the README (faster-whisper and torch) are installed separately with pip.

Is Faster Whisper Gpu free?

Yes, Faster Whisper Gpu is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Faster Whisper Gpu support?

Faster Whisper Gpu is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Faster Whisper Gpu?

It is built and maintained by Felipe Oliveira (@felipeoff); the current version is v0.1.0.
