theplasmak

Super-Transcribe — Unified Speech-to-Text

by Sarah Mak · GitHub ↗ · v1.0.2
cross-platform ✓ Security Clean
551 Downloads · 0 Stars · 1 Active Installs · 3 Versions
Install in OpenClaw
/install super-transcribe
Description
Unified speech-to-text skill. Use when the user asks to transcribe audio or video, generate subtitles, identify speakers, translate speech, search transcript...
Usage Guidance
This skill appears coherent with its stated purpose, but note the following before installing or running:
(1) It will create per-backend virtual environments and perform pip installs and model downloads on first use; expect large (GB-scale) downloads for Parakeet/NeMo and PyTorch with CUDA.
(2) The shared helper can run pip using the current Python interpreter; to avoid installing packages system-wide, run the tool inside an isolated environment (container, dedicated user, or a directory where venv creation is allowed) or review and run the setup scripts manually.
(3) Speaker diarization may require a HuggingFace token at ~/.cache/huggingface/token; the skill will check for that file but does not require it unless you request diarization.
(4) The tool uses ffmpeg and yt-dlp for media conversion and URL downloads; only install those if you need URL input or non-WAV formats.
Recommended steps: run ./scripts/transcribe --check (or the provided setup.sh --check) to preview what would be installed, use the lean-install option if bandwidth is limited, and inspect the setup.sh and auto-install helper if you prefer to perform installations manually or inside a controlled environment.
Capability Analysis
Type: OpenClaw Skill
Name: super-transcribe
Version: 1.0.2
The OpenClaw AgentSkills skill bundle 'super-transcribe' is classified as benign. All code and documentation align with its stated purpose of providing unified speech-to-text transcription. The `SKILL.md` provides clear, responsible instructions for the AI agent, emphasizing cautious handling of user requests and avoiding unnecessary actions. Python scripts (`transcribe.py` in both backends) and setup scripts (`setup.sh`) utilize standard tools like `ffmpeg`, `yt-dlp`, `pip`, and `uv` for audio processing, URL downloads, and dependency management. While these tools involve system command execution and network access, they are used for legitimate functions, and arguments are passed in a way that mitigates direct shell injection risks. The mention of `~/.cache/huggingface/token` is for legitimate authentication with HuggingFace models (e.g., for diarization) and does not indicate credential theft. No evidence of data exfiltration, persistence mechanisms, obfuscation, or other malicious intent was found.
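The point about argument passing can be illustrated with a minimal sketch. This is not the skill's actual code: the helper names `build_ffmpeg_command` and `convert`, and the chosen conversion settings (16 kHz mono WAV), are assumptions for illustration. The idea is that passing an argument list to `subprocess.run` without `shell=True` hands each element to the program as one literal argument, so shell metacharacters in a filename are never interpreted by a shell.

```python
import subprocess


def build_ffmpeg_command(src: str, dst: str) -> list[str]:
    """Build an argument list for converting media to 16 kHz mono WAV.

    Hypothetical helper: the skill's real conversion options are not
    shown in this listing, so these flags are illustrative only.
    """
    return [
        "ffmpeg",
        "-nostdin",       # never wait for interactive input
        "-y",             # overwrite the output file if it exists
        "-i", src,        # input path, passed as a single argv entry
        "-ar", "16000",   # resample to 16 kHz
        "-ac", "1",       # downmix to mono
        dst,
    ]


def convert(src: str, dst: str) -> None:
    # No shell=True: the list goes straight to the OS as argv,
    # so "a; rm -rf ~.mp3" is just an odd filename, not a command.
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

Note that this guards against shell injection only; a filename beginning with `-` could still be read as an option by the called program.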
Capability Assessment
Purpose & Capability
Name/description (unified speech-to-text) align with contents: two bundled backends (faster-whisper and Parakeet/NeMo), CLI entrypoints, and many audio processing utilities. Requested binaries (python3, optional ffmpeg/yt-dlp) are appropriate for the described features.
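To see in advance which of the requested binaries are present, a small check along these lines may help. It is a sketch independent of the skill's own `--check` option; the function name `find_tools` is hypothetical.

```python
import shutil


def find_tools(names):
    """Map each tool name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # python3 is required; ffmpeg and yt-dlp are optional per the listing.
    for name, path in find_tools(["python3", "ffmpeg", "yt-dlp"]).items():
        print(f"{name}: {path or 'not found'}")
```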
Instruction Scope
SKILL.md and included scripts instruct the agent/user to run the bundled CLI and setup scripts which: (a) probe system state (GPU, ffmpeg, huggingface token), (b) create per-backend virtualenvs and install packages, (c) run ffmpeg/yt-dlp for conversion / URL downloads, and (d) download ML model weights on first use. These actions are within transcription scope, but the shared lib contains an auto_install_package helper that will run pip (or uv) using the current Python interpreter — if the CLI is invoked outside the created venv, that could install into the system environment. Consider running the skill in an isolated environment if you want to avoid system-wide pip operations.
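The system-environment concern comes down to which interpreter runs pip. The following sketch is not the skill's `auto_install_package` helper, and the function names are hypothetical. It shows how a script can tell whether it is running inside a virtual environment, and why `sys.executable -m pip` installs into whatever environment that interpreter belongs to.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv/virtualenv."""
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def pip_install_command(package: str) -> list[str]:
    """Command that installs into the environment owning sys.executable."""
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    if in_virtualenv():
        print("Isolated environment:", sys.prefix)
    else:
        print("System interpreter: pip would install outside a venv")
```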
Install Mechanism
There is no registry install spec (instruction-only), but the skill includes setup.sh scripts that will pip-install dependencies and trigger large downloads (PyTorch with CUDA, CTranslate2, NeMo, model weights via HuggingFace/pip). These are expected for this functionality; the downloads are from package managers/HuggingFace flows rather than an arbitrary short URL. Expect multi-gigabyte downloads for the Parakeet backend and model weights.
Credentials
The skill declares no required environment secrets. It optionally reads $HOME/.cache/huggingface/token when diarization is used (documented). No unrelated credentials or config paths are required. Network access is required for pip/yt-dlp/model downloads — appropriate for the purpose but worth noting.
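A presence check for the token file can be done without reading its contents. This is a sketch under that assumption; the function names are hypothetical and the skill's own check may differ.

```python
from pathlib import Path


def huggingface_token_path(home=None) -> Path:
    """Return the token location named in the listing, under the given home."""
    home_dir = Path(home) if home is not None else Path.home()
    return home_dir / ".cache" / "huggingface" / "token"


def has_huggingface_token(home=None) -> bool:
    """True if the token file exists and is non-empty; the token is never read."""
    path = huggingface_token_path(home)
    return path.is_file() and path.stat().st_size > 0


if __name__ == "__main__":
    print("HuggingFace token present:", has_huggingface_token())
```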
Persistence & Privilege
always:false (not force-included). The skill writes venvs and caches under its scripts/backends directories and will cache downloaded models; it does not claim to modify other skills or global agent config. Autonomous invocation is allowed by default, which is normal; keep the lazy-install behavior in mind if you want to control when downloads happen, since an autonomous first run can trigger them.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install super-transcribe
  3. After installation, invoke the skill by name or use /super-transcribe
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.2
- Fixed setup.sh missing from published skill package (was excluded by .clawdhubignore)
- Fixed broken ./setup.sh --update references in SKILL.md to point to correct backend-specific scripts
v1.0.1
- Fixed skill display name
v1.0.0
- Unified speech-to-text skill with automatic backend routing (faster-whisper and parakeet)
- Smart quickstart wizard and health checks for backend setup
- 12 output formats including SRT, VTT, ASS, TTML, and agent-friendly JSON
- Speaker diarization, filler-word removal, paragraph splitting, and chapter detection
- RSS/podcast feed transcription support
- Lazy dependency installation: backends only install what they need on first use
- Comprehensive test suite covering all shared library modules
Metadata
Slug super-transcribe
Version 1.0.2
License
All-time Installs 1
Active Installs 1
Total Versions 3
Frequently Asked Questions

What is Super-Transcribe — Unified Speech-to-Text?

Unified speech-to-text skill. Use when the user asks to transcribe audio or video, generate subtitles, identify speakers, translate speech, search transcript... It is an AI Agent Skill for Claude Code / OpenClaw, with 551 downloads so far.

How do I install Super-Transcribe — Unified Speech-to-Text?

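Run "/install super-transcribe" in the OpenClaw or Claude Code chat to install it in one step. Backend dependencies and model weights are downloaded on first use, so expect additional setup time and bandwidth at that point.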

Is Super-Transcribe — Unified Speech-to-Text free?

Yes, Super-Transcribe — Unified Speech-to-Text is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Super-Transcribe — Unified Speech-to-Text support?

It is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Super-Transcribe — Unified Speech-to-Text?

It is built and maintained by Sarah Mak (@theplasmak); the current version is v1.0.2.
