transcription
by djismgaming · GitHub ↗ · v1.0.1 · MIT-0
482 Downloads · 0 Stars · 1 Active Installs · 2 Versions
Install in OpenClaw
/install transcription
Description
Transcribe audio and video files using OpenAI Whisper API. Use when user wants to transcribe audio/video files, extract speech from media, or get text from r...
Usage Guidance
This skill will run included Python scripts and invoke ffmpeg, then upload whatever audio/video you provide to the hardcoded endpoint http://192.168.0.11:8080/v1. Before installing or using it:
1) Confirm that 192.168.0.11 is a trusted, intended Whisper service (or modify the scripts to point to a trusted endpoint or to read the endpoint from a config/env var).
2) Ensure ffmpeg is installed and the Python 'requests' package is available (the skill metadata does not declare these).
3) Do not send sensitive audio to the skill until you verify the receiving host and its retention/privacy policies.
4) If you need to use an official OpenAI-hosted API, update the endpoint/auth appropriately rather than relying on the hardcoded address.
5) For extra caution, run the scripts on a sandbox machine and test with non-sensitive files first.
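For the "test with non-sensitive files first" step, one option is to generate a synthetic audio file instead of using a real recording. The following is a minimal sketch using only the Python standard library; the function name `write_silent_wav` and the 16 kHz mono format are illustrative assumptions, not part of the skill.

```python
import wave


def write_silent_wav(path: str, seconds: float = 1.0, sample_rate: int = 16000) -> int:
    """Write a mono 16-bit WAV file containing only silence.

    Hypothetical helper for producing a non-sensitive test input.
    Returns the number of audio frames written.
    """
    n_frames = int(seconds * sample_rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)  # all-zero samples = silence
    return n_frames


if __name__ == "__main__":
    frames = write_silent_wav("silence_test.wav")
    print(f"wrote {frames} frames to silence_test.wav")
```

A silent file carries no speech, so it only exercises the upload path and confirms which host receives the data; it says nothing about transcription quality.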
Capability Analysis
Type: OpenClaw Skill
Name: transcription
Version: 1.0.1
The transcription skill bundle is designed to transcribe audio and video files using a local OpenAI Whisper API endpoint (192.168.0.11). It uses ffmpeg for audio extraction from video files, together with Python's requests (third-party) and subprocess (standard library) modules for processing. The code logic is transparent, aligns with the stated purpose in SKILL.md, and contains no evidence of malicious intent, data exfiltration, or prompt injection. A minor unreachable code block in scripts/transcribe_audio.py appears to be a harmless copy-paste error.
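The audio-extraction step described above can be sketched roughly as follows. This is not the skill's actual script: the function names are hypothetical, and the choice of a mono 16 kHz WAV output is an assumption made for illustration. Only standard ffmpeg options (`-y`, `-i`, `-vn`, `-ac`, `-ar`) are used.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_command(video_path: str, audio_path: str) -> List[str]:
    """Build an ffmpeg argument list that extracts the audio track of a video.

    Hypothetical helper; the output settings are assumptions, not the skill's.
    """
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-i", video_path,  # input file
        "-vn",             # drop the video stream, keep audio only
        "-ac", "1",        # downmix to mono (assumed)
        "-ar", "16000",    # resample to 16 kHz (assumed)
        audio_path,
    ]


def extract_audio(video_path: str) -> str:
    """Run ffmpeg and return the path of the extracted .wav file."""
    audio_path = str(Path(video_path).with_suffix(".wav"))
    # check=True raises CalledProcessError if ffmpeg exits with a non-zero status.
    subprocess.run(
        build_ffmpeg_command(video_path, audio_path),
        check=True,
        capture_output=True,
    )
    return audio_path
```

Passing the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.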
Capability Assessment
Purpose & Capability
Name/description say 'OpenAI Whisper API' and the code calls a Whisper-compatible endpoint; this is coherent. Minor mismatch: the SKILL.md and scripts point to a hardcoded local endpoint (http://192.168.0.11:8080/v1) rather than the public OpenAI cloud API. This is plausible (self-hosted Whisper) but should be made explicit to users.
Instruction Scope
Runtime instructions and scripts will upload user-supplied audio/video files to a fixed HTTP endpoint (192.168.0.11:8080) and call ffmpeg locally. That means any file you provide will be transmitted to that host; ensure that host is trusted and network access is intended. The instructions do not provide an option to override the endpoint via an environment variable or configuration.
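One way the missing override could be added is to read the endpoint from an environment variable and fall back to the hardcoded address. The sketch below is an assumption about how such a change might look: the variable name `WHISPER_API_BASE` and both function names are hypothetical, and the `/audio/transcriptions` path assumes the server follows the OpenAI-compatible route layout, which this listing does not confirm.

```python
import os

# Address hardcoded in the skill, as reported in this listing.
DEFAULT_ENDPOINT = "http://192.168.0.11:8080/v1"


def resolve_endpoint(env=None) -> str:
    """Return the API base URL, preferring an environment override.

    WHISPER_API_BASE is a hypothetical variable name, not one the skill defines.
    """
    env = os.environ if env is None else env
    endpoint = env.get("WHISPER_API_BASE", "").strip() or DEFAULT_ENDPOINT
    return endpoint.rstrip("/")  # normalize so paths can be appended safely


def transcription_url(env=None) -> str:
    """Build the full transcription URL (OpenAI-compatible path, assumed)."""
    return resolve_endpoint(env) + "/audio/transcriptions"


if __name__ == "__main__":
    print(transcription_url())
```

Accepting the environment as a parameter keeps the lookup easy to test without mutating `os.environ`.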
Install Mechanism
There is no install spec (instruction-only), which limits disk writes, but the skill ships Python scripts that require external runtime dependencies. The SKILL metadata does not declare required binaries or Python deps even though scripts call ffmpeg and import the requests library.
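Because these dependencies are undeclared, it may help to check for them before running the scripts. A minimal sketch using only the standard library follows; the function name `missing_dependencies` and the `binary:` / `module:` label format are illustrative assumptions.

```python
import importlib.util
import shutil


def missing_dependencies(binaries=("ffmpeg",), modules=("requests",)):
    """Return labels for required executables and Python modules that are absent.

    Hypothetical pre-flight check; the defaults mirror the two undeclared
    dependencies named in this listing.
    """
    missing = []
    for name in binaries:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(name) is None:
            missing.append(f"binary:{name}")
    for name in modules:
        # find_spec returns None when a top-level module cannot be imported.
        if importlib.util.find_spec(name) is None:
            missing.append(f"module:{name}")
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("ffmpeg and requests are available")
```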
Credentials
The skill requests no credentials or environment variables (good), but it hardcodes an HTTP endpoint and model name. Because no auth is declared, the endpoint is assumed unauthenticated; ensure this is correct for your environment. Also the skill fails to declare that it requires ffmpeg and Python requests.
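If the target endpoint does require authentication (for example, an OpenAI-hosted API, which uses bearer tokens), the request headers could be built from an environment variable instead of being hardcoded. This is a sketch under that assumption; the variable name `WHISPER_API_KEY` and the function name are hypothetical and do not appear in the skill.

```python
import os


def build_headers(env=None) -> dict:
    """Return HTTP headers, adding a bearer token only when a key is configured.

    WHISPER_API_KEY is a hypothetical variable name. With no key set, the
    result is an empty dict, matching the skill's current unauthenticated calls.
    """
    env = os.environ if env is None else env
    api_key = env.get("WHISPER_API_KEY", "").strip()
    if not api_key:
        return {}
    return {"Authorization": f"Bearer {api_key}"}


if __name__ == "__main__":
    # Print only the header names so a real key is never echoed to the terminal.
    print(sorted(build_headers().keys()))
```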
Persistence & Privilege
always is false and the skill doesn't request persistent platform privileges or modify other skills. It runs as-invoked and does not declare any elevated host privileges.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install transcription
- After installation, invoke the skill by name or use /transcription
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.1
- Added a new script: scripts/transcribe.sh.
- Updated documentation to include manual script usage instructions for direct command-line transcription.
- Removed language selection and auto-detect language features from feature list.
- Simplified usage examples to focus on running scripts via command line.
v1.0.0
Transcription skill initial release:
- Transcribe audio and video files using a local OpenAI Whisper API.
- Supports a wide range of audio (mp3, wav, ogg, etc.) and video (mp4, mov, mkv, etc.) formats.
- Automatic language detection or specify language for transcription.
- Extract timestamps, choose output formats (text, JSON, SRT, VTT).
- Batch processing: send multiple files for simultaneous transcription.
- Automatically extracts audio from video files before transcription.
Frequently Asked Questions
What is transcription?
Transcribe audio and video files using OpenAI Whisper API. Use when user wants to transcribe audio/video files, extract speech from media, or get text from r... It is an AI Agent Skill for Claude Code / OpenClaw, with 482 downloads so far.
How do I install transcription?
Run "/install transcription" in the OpenClaw or Claude Code chat to install it in one step. The scripts additionally need ffmpeg and the Python requests package, which the skill metadata does not declare.
Is transcription free?
Yes, transcription is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does transcription support?
transcription is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created transcription?
It is built and maintained by djismgaming (@djismgaming); the current version is v1.0.1.