
Faster Whisper

Name: Faster Whisper
Author: theplasmak

by Sarah Mak · GitHub ↗ · v1.5.1

cross-platform ✓ Security Clean

Downloads: 7592

Install in OpenClaw

/install faster-whisper

Description

Local speech-to-text using faster-whisper. 4-6x faster than OpenAI Whisper with identical accuracy; GPU acceleration enables ~20x realtime transcription. SRT...

Usage Guidance

Install only if you are comfortable with a local ML tool that installs Python dependencies and may download models. Use URL/RSS transcription only for media you intend to fetch, avoid pasting Hugging Face tokens into shared logs, choose output paths carefully because files can be overwritten, and avoid opening HTML transcript reports generated from untrusted audio or filenames until the HTML escaping issue is fixed.
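The HTML escaping concern can be illustrated with a short sketch. This is not the skill's code: the helper name `render_transcript_html` and the report layout are hypothetical. It only shows how Python's standard `html.escape` neutralizes markup in a filename or transcript line before the text is written into a report.

```python
import html


def render_transcript_html(filename: str, lines: list[str]) -> str:
    """Build a minimal HTML report, escaping every untrusted string.

    Hypothetical helper for illustration; not taken from the skill.
    """
    # Escape the filename: it may come from a downloaded or user-supplied file.
    title = html.escape(filename, quote=True)
    # Escape each transcript line: recognized speech is untrusted text too.
    body = "\n".join(f"<p>{html.escape(line, quote=True)}</p>" for line in lines)
    return f"<html><head><title>{title}</title></head><body>\n{body}\n</body></html>"


report = render_transcript_html(
    "<script>alert(1)</script>.mp3",
    ["Hello & welcome", "<img src=x onerror=alert(2)>"],
)
print(report)
```

With escaping applied, the script tag in the filename and the image tag in the transcript appear in the report as literal text rather than as active markup.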

Capability Analysis

Type: OpenClaw Skill
Name: faster-whisper
Version: 1.5.1

This skill is classified as suspicious due to its broad capabilities, which include downloading content from arbitrary URLs via `yt-dlp`, executing `ffmpeg` for audio/video processing and subtitle burning, and performing self-updates of its core dependency. While these actions are plausibly aligned with the stated purpose of a comprehensive transcription tool, they grant significant access to the network and local file system. The `SKILL.md` agent guidance does not contain any malicious prompt injection attempts, and the `setup.sh` and `scripts/transcribe.py` files implement these powerful features using `subprocess.run()` with argument lists, which mitigates direct shell injection vulnerabilities. However, the inherent power of these operations, even when used for legitimate purposes, elevates the risk profile beyond a purely benign classification.
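The point about `subprocess.run()` with argument lists can be shown with a small sketch. It does not reproduce the skill's scripts: the filename is made up, and `sys.executable` stands in for `ffmpeg` or `yt-dlp` so the example runs without external tools. It demonstrates that a list of arguments is never parsed by a shell, so metacharacters in a filename stay inside a single argument.

```python
import subprocess
import sys

# A filename containing shell metacharacters, as might arrive from a URL or feed.
untrusted_name = "talk; echo INJECTED.mp3"

# Argument-list form: each element reaches the child process as one argv entry,
# and no shell ever parses the string. The child simply echoes its first argument.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted_name],
    capture_output=True,
    text=True,
    check=True,
)
received = result.stdout.strip()
print(received)
```

The child receives the whole string, semicolon included, as one argument. Had the same string been interpolated into a command line run with `shell=True`, the shell would have treated `; echo INJECTED.mp3` as a second command.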

Capability Assessment

ℹ Purpose & Capability

The advertised transcription, subtitles, diarization, URL/RSS download, batch processing, and export features match the included setup script and Python CLI.

ℹ Instruction Scope

The agent guidance generally tells agents to add higher-impact flags only when the user asks, though the trigger list is broad and HTML output is not documented as needing sanitization caution.

ℹ Install Mechanism

Setup creates a local virtual environment and installs Python ML packages, and update flags can upgrade faster-whisper in that environment; this is disclosed and user-invoked, not hidden or automatic.

ℹ Credentials

Network access through yt-dlp/RSS, ffmpeg processing, local file reads/writes, model cache use, and optional Hugging Face token access are proportionate to transcription and diarization.

ℹ Persistence & Privilege

Persistence is limited to the skill virtual environment, model/dependency caches, temporary downloads, and user-specified output files; no background service, privilege escalation, or unrelated persistence was found.

How to Use

1. Make sure OpenClaw is installed (local or Docker).
2. Run the install command in chat: /install faster-whisper
3. After installation, invoke the skill by name or use /faster-whisper
4. Provide the required inputs per the skill's parameter spec and get structured output.

Version History

v1.5.1

- Fixed --skip-existing in multi-format mode to check ALL format outputs before skipping
- Fixed --no-timestamps conflict check missing lrc, ass, ttml formats
- Fixed --speaker-names silently doing nothing without --diarize; now prints a warning
- Batch summary now shows skipped file count when --skip-existing is active
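The --skip-existing fix in multi-format mode amounts to an "all outputs present" check. The sketch below is a guess at that logic, not the skill's implementation: the function name `should_skip` and the `<stem>.<format>` output naming are assumptions made for illustration.

```python
from pathlib import Path
import tempfile


def should_skip(audio: Path, out_dir: Path, formats: list[str]) -> bool:
    """Return True only when an output file exists for every requested format."""
    # An empty format list never counts as "already done".
    return bool(formats) and all(
        (out_dir / f"{audio.stem}.{fmt}").exists() for fmt in formats
    )


with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    audio = Path("interview.mp3")
    (out_dir / "interview.srt").write_text("")  # only one of two outputs exists
    partial = should_skip(audio, out_dir, ["srt", "vtt"])
    (out_dir / "interview.vtt").write_text("")  # now both outputs exist
    complete = should_skip(audio, out_dir, ["srt", "vtt"])

print(partial, complete)
```

Checking only the first format would skip a file whose other outputs were never written, which is the kind of gap the changelog entry describes.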

v1.5.0

- docs: update default model from distil-large-v3 to distil-large-v3.5
- fix: setup.sh --check hangfix + skill.json ffmpeg optional
- fix(transcribe): clean-filler word list, fuzzy search tokens, URL temp cleanup
- fix(multi-format): create output dir in single-file mode
- feat: add CSV output, language-map, batch ETA estimate
- feat: add TTML output, transcript search, chapter detection, speaker audio export
- feat: distil auto-condition, log-level, ffmpeg clarification
- fix: rename --without-timestamps to --no-timestamps
- feat: add 17 new features: upstream params + LRC/detect-language/merge-sentences/stats/stdin/template

v1.4.5

- Fix author field to match GitHub username (ThePlasmak)

v1.4.4

- Declare yt-dlp and HuggingFace token as optional dependencies in skill.json
- Sync SKILL.md frontmatter version and author with skill.json

v1.4.3

- Auto-run wav2vec2 alignment whenever word timestamps are computed
- Remove --precise flag (alignment is automatic, flag kept as hidden compat alias)
- Alignment triggers for --word-timestamps, --diarize, --min-confidence
- No overhead for basic transcription (fast path unchanged)

v1.4.1

- Add --precise flag for wav2vec2 forced alignment (~10ms word accuracy)
- Uses torchaudio MMS model (multilingual, cached for batch processing)
- Runs before diarization when combined (improves speaker assignment)
- Install torchaudio alongside torch in setup.sh

v1.3.0

- Add SRT and VTT subtitle output formats (--format srt/vtt)
- Add speaker diarization via pyannote.audio (--diarize) with word-level accuracy
- Add URL/YouTube input with auto yt-dlp download
- Add batch processing with glob patterns, directories, and --skip-existing
- Add initial prompt support for domain terminology (--initial-prompt)
- Add confidence-based segment filtering (--min-confidence)
- Add performance stats after each transcription (duration, realtime factor)
- Unify output under --format flag (text/json/srt/vtt), keep --json for backward compat
- Add agent guidance for minimal invocation (don't load unused features)
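For readers unfamiliar with the SRT format added in this release, the sketch below renders transcript segments as SRT cues. It is an independent illustration, not the skill's exporter: the function names and the `(start, end, text)` tuple shape are assumptions. The cue layout itself (a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, the text, then a blank line) is the standard SRT convention.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) tuples as numbered SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


srt = to_srt([(0.0, 2.5, " Hello there."), (2.5, 3661.042, "Goodbye.")])
print(srt)
```

VTT output differs mainly in its `WEBVTT` header line and in using a period instead of a comma before the milliseconds.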

v1.2.0

- Default model changed to distil-large-v3.5 (lower WER: 7.08 vs 7.53, same speed as v3)
- Trained on 4x more data (98k hours) with improved robustness

v1.1.0

- Use BatchedInferencePipeline by default (~3x faster; 69s → 23s on 21-min file with distil-large-v3)
- VAD enabled by default in batched mode
- Add --batch-size option (default: 8; reduce if OOM)
- Add --no-batch flag to fall back to standard WhisperModel
- Add --hotwords support for boosting recognition of specific terms
- Bump tested version: faster-whisper 1.2.1
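A rough sketch of what batched versus standard inference looks like with the faster-whisper Python API follows. It assumes a recent faster-whisper release that provides `BatchedInferencePipeline` is installed; the wrapper name `transcribe` and its parameters are hypothetical and only mirror the flags listed above, so it should not be read as the code in `scripts/transcribe.py`.

```python
def transcribe(audio_path, model_name="distil-large-v3", batch_size=8,
               use_batch=True, hotwords=None):
    """Transcribe one file, batched by default (hypothetical wrapper)."""
    # Imported lazily so this module loads even where faster-whisper is absent.
    from faster_whisper import BatchedInferencePipeline, WhisperModel

    # device="auto" lets the library choose between GPU and CPU.
    model = WhisperModel(model_name, device="auto", compute_type="default")
    if use_batch:
        # Batched path: analogous to the default behaviour described above.
        pipeline = BatchedInferencePipeline(model=model)
        segments, info = pipeline.transcribe(
            audio_path, batch_size=batch_size, hotwords=hotwords
        )
    else:
        # Standard path: analogous to --no-batch.
        segments, info = model.transcribe(audio_path, hotwords=hotwords)
    # segments is a generator; consuming it is what runs the transcription.
    return [(seg.start, seg.end, seg.text) for seg in segments], info
```

Calling `transcribe("talk.mp3", batch_size=4)` would be one way to retry after an out-of-memory error, in the spirit of the "reduce if OOM" note; the file name here is only a placeholder.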

v1.0.12

- Fix skill title display on ClawdHub

v1.0.11

- Prefer distil-large-v3 over large-v3-turbo as the recommended model

v1.0.9

- docs: rebrand from Moltbot/MoltHub to OpenClaw/ClawHub

v1.0.7

- Removed Windows-native references from SKILL.md (setup.ps1, transcribe.cmd, winget) since ClawHub cannot distribute .ps1/.cmd files
- Windows users should use WSL2 or get Windows scripts from the GitHub repo directly

v1.0.6

- Added .clawdhubignore to exclude README.md, CHANGELOG.md, LICENSE from published package
- Fixed requires.bins in skill.json (python3, ffmpeg)
- Added platforms field to skill.json
- Updated metadata key from moltbot to openclaw in SKILL.md

v1.0.5

Fix metadata: add requires.bins to skill.json, add platforms, update moltbot to openclaw in SKILL.md

v1.0.4

- Fixed skill title and metadata
- Removed development files from published package

v1.0.3

Fix skill title (was 'Faster Whisper Clean' due to temp file naming)

v1.0.2

- Improve skill discovery, error handling, some copyediting
- Add skill.json
- Edit README to reduce confusion as Moltbot may refer to the service

v1.0.1

Remove install metadata (as ClawdHub's install section is confusing); add python3 to required binaries

v1.0.0

Initial public release of faster-whisper.
- Local speech-to-text using faster-whisper (CTranslate2 backend), ~4-6x faster than OpenAI Whisper, with identical accuracy.
- Supports GPU acceleration for ~20x realtime transcription; automatic hardware detection and setup for Windows.
- Offers both standard and distilled models, with selectable accuracy/speed tradeoffs and word-level timestamps.
- Cross-platform: Windows (including WSL2), Linux, and macOS (Apple Silicon supported).
- Setup scripts provided for all platforms, including automatic installation of dependencies and GPU support where possible.
- Includes extensive usage documentation, quick-start commands, model selection guide, and troubleshooting tips.

Metadata

Slug: faster-whisper

Version: 1.5.1

License: not listed

All-time Installs: 254

Active Installs: 44

Total Versions: 20

Frequently Asked Questions

What is Faster Whisper?

Local speech-to-text using faster-whisper. 4-6x faster than OpenAI Whisper with identical accuracy; GPU acceleration enables ~20x realtime transcription. SRT... It is an AI Agent Skill for Claude Code / OpenClaw, with 7592 downloads so far.

How do I install Faster Whisper?

Run "/install faster-whisper" in the OpenClaw or Claude Code chat to install it in one step, with no extra setup required.

Is Faster Whisper free?

Yes, Faster Whisper is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Faster Whisper support?

Faster Whisper is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Faster Whisper?

It is built and maintained by Sarah Mak (@theplasmak); the current version is v1.5.1.
