
Faster Whisper Gpu

Name: Faster Whisper Gpu
Author: felipeoff

By Felipe Oliveira · GitHub · v0.1.0

cross-platform ⚠ suspicious

Total downloads: 565

Install in OpenClaw

/install faster-whisper-gpu

Description

High-performance local speech-to-text transcription using Faster Whisper with NVIDIA GPU acceleration. Transcribe audio files locally without sending data to...

Usage (SKILL.md)

🎙️ Faster Whisper GPU

High-performance local speech-to-text transcription using Faster Whisper with NVIDIA GPU acceleration.

✨ Features

🚀 GPU Accelerated: Uses NVIDIA CUDA for blazing-fast transcription
🔒 100% Local: No data leaves your machine. Complete privacy.
💰 Free Forever: No API costs. Run unlimited transcriptions.
🌍 Multilingual: Supports 99 languages with automatic detection
📁 Multiple Formats: Input: MP3, WAV, FLAC, OGG, M4A. Output: TXT, SRT, VTT, JSON
🎯 Multiple Models: From tiny (fast) to large-v3 (most accurate)
🎬 Subtitle Generation: Create SRT files with word-level timestamps

📋 Requirements

Hardware

NVIDIA GPU with CUDA support (recommended: 4GB+ VRAM)
Or CPU-only mode (slower but works on any machine)

Software

Python 3.8+
NVIDIA drivers (for GPU support)
CUDA Toolkit 11.8+ or 12.x

🚀 Quick Start

Installation

# Install dependencies
pip install faster-whisper torch

# Verify GPU is available
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

Basic Usage

# Transcribe an audio file (auto-detects GPU)
python transcribe.py audio.mp3

# Specify language explicitly
python transcribe.py audio.mp3 --language pt

# Output as SRT subtitles
python transcribe.py audio.mp3 --format srt --output subtitles.srt

# Use larger model for better accuracy
python transcribe.py audio.mp3 --model large-v3

🔧 Advanced Usage

Command Line Options

python transcribe.py <audio_file> [options]

Options:
  --model {tiny,base,small,medium,large-v1,large-v2,large-v3}
                        Model size to use (default: base)
  --language LANG       Language code (e.g., 'pt', 'en', 'es'). Auto-detect if not specified.
  --format {txt,srt,json,vtt}
                        Output format (default: txt)
  --output FILE         Output file path (default: stdout)
  --device {cuda,cpu}   Device to use (default: cuda if available)
  --compute_type {int8,int8_float16,int16,float16,float32}
                        Computation precision (default: float16)
  --task {transcribe,translate}
                        Task: transcribe or translate to English (default: transcribe)
  --vad_filter          Enable voice activity detection filter
  --vad_parameters MIN_DURATION_ON,MIN_DURATION_OFF
                        VAD parameters as comma-separated values
  --condition_on_previous_text
                        Condition on previous text (default: True)
  --initial_prompt PROMPT
                        Initial prompt to guide transcription
  --word_timestamps     Include word-level timestamps (for SRT/JSON)
  --hotwords WORDS      Comma-separated hotwords to boost recognition
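
The option list above could be declared with argparse roughly as follows. This is a minimal sketch, not the actual transcribe.py: build_parser is a hypothetical name, only a subset of the options is shown, and the defaults are copied from the help text. The --device default of None is an assumption standing in for "cuda if available".

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring a subset of the documented options."""
    parser = argparse.ArgumentParser(prog="transcribe.py")
    parser.add_argument("audio_file", help="Path to the audio file")
    parser.add_argument(
        "--model",
        choices=["tiny", "base", "small", "medium", "large-v1", "large-v2", "large-v3"],
        default="base",
    )
    parser.add_argument("--language", default=None, help="Auto-detect if omitted")
    parser.add_argument("--format", choices=["txt", "srt", "json", "vtt"], default="txt")
    parser.add_argument("--output", default=None, help="Write to stdout if omitted")
    # None means "decide at runtime": cuda if available, otherwise cpu (assumed).
    parser.add_argument("--device", choices=["cuda", "cpu"], default=None)
    parser.add_argument(
        "--compute_type",
        choices=["int8", "int8_float16", "int16", "float16", "float32"],
        default="float16",
    )
    parser.add_argument("--task", choices=["transcribe", "translate"], default="transcribe")
    parser.add_argument("--vad_filter", action="store_true")
    parser.add_argument("--word_timestamps", action="store_true")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["meeting.mp3", "--language", "pt", "--format", "srt"])
print(args.model, args.language, args.format)
```

Using choices lets argparse reject unsupported model names or formats before any model is loaded.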

Examples

Portuguese Transcription with SRT Output

python transcribe.py meeting.mp3 --language pt --format srt --output meeting.srt

English Translation from Any Language

python transcribe.py japanese_audio.mp3 --task translate --format txt

High-Accuracy Mode with Large Model

python transcribe.py podcast.mp3 --model large-v3 --vad_filter --word_timestamps

CPU-Only Mode (no GPU)

python transcribe.py audio.mp3 --device cpu --compute_type int8

🐍 Python API

from faster_whisper import WhisperModel

# Load model
model = WhisperModel("base", device="cuda", compute_type="float16")

# Transcribe
segments, info = model.transcribe("audio.mp3", language="pt")

print(f"Detected language: {info.language} (probability: {info.language_probability:.2f})")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
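
To turn segments like the ones printed above into subtitles, each segment needs an index, an SRT time range, and its text. The sketch below is one way to do that. format_srt_time and segments_to_srt are hypothetical helpers, not part of faster-whisper, and they rely only on the start, end, and text attributes used in the loop above.

```python
from types import SimpleNamespace
from typing import Iterable


def format_srt_time(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable) -> str:
    """Build SRT text from objects exposing .start, .end and .text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{format_srt_time(seg.start)} --> {format_srt_time(seg.end)}"
        blocks.append(f"{index}\n{time_range}\n{seg.text.strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


# Stand-in segments so the example runs without loading a model.
demo = [
    SimpleNamespace(start=0.0, end=2.5, text=" Olá, mundo."),
    SimpleNamespace(start=2.5, end=5.0, text=" Até logo."),
]
print(segments_to_srt(demo))
```

In real use, the segments iterable returned by model.transcribe could be passed to segments_to_srt in place of demo.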

📊 Model Sizes & VRAM Requirements

Model	Parameters	VRAM Required	Relative Speed	Accuracy
tiny	39 M	~1 GB	~32x	Basic
base	74 M	~1 GB	~16x	Good
small	244 M	~2 GB	~6x	Better
medium	769 M	~5 GB	~2x	Great
large-v3	1550 M	~10 GB	1x	Best

Benchmarks measured on NVIDIA RTX 4090
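
The table can be read as a lookup: pick the largest model whose approximate VRAM figure fits the card. A small sketch follows, with pick_model as a hypothetical helper and the thresholds copied from the table. They are rough figures, not guarantees, and actual usage also depends on compute type and audio length.

```python
# (model, approximate VRAM in GB) from the table above, smallest first
MODEL_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large-v3", 10),
]


def pick_model(available_vram_gb: float) -> str:
    """Return the largest listed model that fits; fall back to 'tiny'."""
    choice = "tiny"
    for name, needed_gb in MODEL_VRAM_GB:
        if needed_gb <= available_vram_gb:
            choice = name
    return choice


print(pick_model(4))   # small
print(pick_model(12))  # large-v3
```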

🔍 Supported Languages

Faster Whisper supports 99 languages including:

Portuguese (pt)
English (en)
Spanish (es)
French (fr)
German (de)
Italian (it)
Japanese (ja)
Chinese (zh)
Russian (ru)
And 90+ more...

🛠️ Troubleshooting

CUDA Out of Memory

# Use smaller model
python transcribe.py audio.mp3 --model tiny

# Or use CPU
python transcribe.py audio.mp3 --device cpu

# Or reduce precision
python transcribe.py audio.mp3 --compute_type int8
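
The three remedies above can also be applied automatically by trying configurations in order of decreasing memory use. The sketch below assumes that a failed load raises RuntimeError (an assumption; the exact exception depends on the backend, and memory can also run out later, during transcription). load_with_fallback and the candidate list are hypothetical. The loader is passed in so the logic can be tried without a GPU; in real use it could be something like lambda m, d, c: WhisperModel(m, device=d, compute_type=c).

```python
from typing import Callable, Iterable, Tuple

Config = Tuple[str, str, str]  # (model size, device, compute type)

# Ordered from most to least memory-hungry; adjust to taste.
CANDIDATES = [
    ("large-v3", "cuda", "float16"),
    ("large-v3", "cuda", "int8"),
    ("small", "cuda", "int8"),
    ("small", "cpu", "int8"),
]


def load_with_fallback(loader: Callable[[str, str, str], object],
                       candidates: Iterable[Config] = CANDIDATES):
    """Return (model, config) for the first configuration that loads."""
    last_error = None
    for config in candidates:
        try:
            return loader(*config), config
        except RuntimeError as err:  # assumed failure type, e.g. out of memory
            last_error = err
    raise RuntimeError("no configuration could be loaded") from last_error


def fake_loader(model, device, compute_type):
    """Stand-in loader that pretends every CUDA load runs out of memory."""
    if device == "cuda":
        raise RuntimeError("CUDA out of memory")
    return f"{model}-on-{device}"


print(load_with_fallback(fake_loader))
```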

Model Download Issues

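Models are downloaded automatically on first use to ~/.cache/huggingface/hub/. To store them elsewhere, for example in a pre-populated cache on a machine without direct network access, point HF_HOME at a custom cache directory. Note that HF_HOME only relocates the cache and does not configure a proxy: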

export HF_HOME=/path/to/custom/cache

Slow Transcription

Ensure GPU is being used: check nvidia-smi during transcription
Use smaller model for faster results
Enable VAD filter to skip silent parts

🤝 Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch
Submit a pull request

📜 License

MIT License - See LICENSE for details.

Faster Whisper is developed by SYSTRAN and based on OpenAI's Whisper.

🙏 Acknowledgments

OpenAI Whisper - Original model
Faster Whisper - Optimized implementation
CTranslate2 - Fast inference engine

Made with ❤️ for the OpenClaw community

Security Recommendations

This skill appears to be what it says: a local Faster-Whisper transcription tool. Before installing, consider the following:

(1) The torch package is a large download and may require a build that matches your GPU and CUDA version, so prefer installing inside a virtualenv or conda environment.
(2) Model weights are downloaded from the Hugging Face hub on first use (network access and several GB of disk space may be required for large models). If you require fully offline operation, pre-download the models and place them in the HF cache, or set HF_HOME.
(3) Review the GitHub homepage/repo and the transcribe.py file if you need additional assurance about code behavior and licensing.
(4) The skill does not request credentials or attempt to send your audio elsewhere, but network activity for model downloads is expected.

If those behaviors are acceptable, the skill is coherent and reasonable to use.

Functional Analysis

Type: OpenClaw Skill
Name: faster-whisper-gpu
Version: 0.1.0

The skill bundle is classified as suspicious because of a potential path traversal vulnerability in `transcribe.py`. The script saves transcription results with `Path(args.output).write_text()`, where `args.output` is taken directly from user input without sanitization. An attacker who controls that argument can specify an arbitrary file path (e.g., `../../../../tmp/malicious.txt`) and write content outside the intended directory. This is a risky capability, although nothing indicates malicious intent from the developer. All other files and instructions appear benign and aligned with the stated purpose of local speech-to-text transcription.
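
One common mitigation for the issue described above is to resolve the requested output path and reject anything that lands outside an allowed directory. The sketch below is not taken from transcribe.py: safe_output_path and base_dir are hypothetical names, and using an "outputs" directory as the allowed root is an assumption made for illustration.

```python
from pathlib import Path


def safe_output_path(user_path: str, base_dir: Path) -> Path:
    """Resolve user_path under base_dir; raise ValueError if it escapes."""
    base = base_dir.resolve()
    candidate = (base / user_path).resolve()
    try:
        # relative_to raises ValueError when candidate is not inside base
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"output path escapes {base}: {user_path}") from None
    return candidate


base = Path("outputs")  # assumed allowed root, relative to the working directory
print(safe_output_path("subtitles.srt", base))
try:
    safe_output_path("../../../../tmp/malicious.txt", base)
except ValueError as err:
    print("rejected:", err)
```

The check uses relative_to rather than Path.is_relative_to so that it also runs on Python 3.8, the minimum version listed in the requirements. Absolute paths supplied by the user are rejected as well, because joining them onto the base replaces the base entirely.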

Capability Assessment

✓ Purpose & Capability

Name/description, SKILL.md, requirements.txt and transcribe.py all align: the skill implements local Faster-Whisper transcription with optional GPU support. Required binary is only python3; listed Python packages match the stated capability.

ℹ Instruction Scope

SKILL.md and transcribe.py limit actions to local transcription, writing outputs to stdout or local files. The only external activity is the expected download of model weights from the Hugging Face hub (cached under ~/.cache/huggingface/hub) on first use; this is documented in SKILL.md. No instructions read unrelated system credentials or post audio to third-party endpoints.

ℹ Install Mechanism

There is no registry install spec; the skill is instruction-only and instructs users to pip install faster-whisper and torch. Using pip is expected but may pull large binary wheels (torch) and GPU-specific builds. Model weights are downloaded at runtime from Hugging Face — an expected but noteworthy network activity and storage use.

✓ Credentials

The skill requests no environment variables or credentials. SKILL.md mentions HF_HOME as an optional way to change the Hugging Face cache directory (documented as a troubleshooting tip) — reasonable and proportional.

✓ Persistence & Privilege

The skill does not set always: true and does not request persistent system privileges. It does not modify other skills or agent-wide configs. It writes outputs and caches model files to the user's home cache directory (standard behavior).

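How to Use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install faster-whisper-gpu
Once installed, invoke the skill by name or trigger it with /faster-whisper-gpu
Provide the inputs described in the skill's parameter documentation to get structured output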

Version History

v0.1.0

Initial release of faster-whisper-gpu.

- Local speech-to-text transcription powered by Faster Whisper with NVIDIA GPU acceleration.
- Transcribe audio into text, subtitles (SRT/VTT), or JSON with support for 99 languages.
- Features multiple model sizes, word-level timestamps, hotword boosting, and voice activity detection.
- All data processing is 100% local for maximum privacy.
- Includes a command-line interface and Python API for flexible usage.

Metadata

Slug: faster-whisper-gpu

Version: 0.1.0

License: not listed

Total installs: 0

Current installs: 0

Number of versions: 1

FAQ