
Speech to Text Transcription

Name: Speech to Text Transcription
Author: ivangdavila

by Iván · v1.0.0

Platforms: linux, darwin, win32 · ✓ Security Clean

Downloads: 806

Install in OpenClaw

/install speech-to-text-transcription

Description

Transcribe audio and video files to text with speaker detection, timestamps, and format conversion.

Usage Guidance

This skill appears coherent for transcription, but consider the following before installing:
(1) It creates ~/speech-to-text-transcription/, stores transcripts there, and keeps a memory.md file with your preferences. Delete or encrypt that directory if you transcribe sensitive audio.
(2) Cloud transcription (OpenAI, AssemblyAI, Deepgram) uploads audio to third parties, but only if you choose those providers and set their API keys. For sensitive material, prefer local Whisper.
(3) The skill may download URLs and write temp files during processing, so make sure you trust the source URLs.
(4) The skill is instruction-only (no install script), but it suggests optionally running pip install openai-whisper to enable local transcription.
(5) Confirm that the agent prompts you before uploading any audio, and review or clean the stored transcripts and memory file if privacy is a concern.

Capability Analysis

Type: OpenClaw Skill
Name: speech-to-text-transcription
Version: 1.0.0

The OpenClaw AgentSkills bundle for 'speech-to-text-transcription' is benign. All instructions and code snippets, including the use of `ffmpeg`, `curl` for cloud APIs (OpenAI, AssemblyAI), and `pip install openai-whisper`, are directly aligned with the stated purpose of transcribing audio and video files. The `SKILL.md` file provides clear security and privacy disclosures, explicitly stating what data leaves the machine and what stays local, and disclaims malicious behaviors like storing API keys in plain text or auto-uploading without confirmation. There is no evidence of prompt injection for malicious purposes, data exfiltration, persistence, or unauthorized remote control.

Capability Assessment

✓ Purpose & Capability

Name/description match the behavior: audio/video processing with ffmpeg, local Whisper or cloud providers (OpenAI, AssemblyAI, Deepgram). No unrelated binaries, env vars, or endpoints are requested.

ℹ Instruction Scope

Instructions cover verifying local files, downloading URLs to a temp folder, splitting/processing with ffmpeg, calling cloud transcription APIs only when chosen, and saving transcripts. A potential privacy note: the skill explicitly stores transcripts and a 'memory' file (provider preferences, usage patterns) under ~/speech-to-text-transcription/, and the guidance to 'learn from what they transcribe' could lead to persistent storage of sensitive metadata unless the user is explicit; otherwise the runtime instructions stay within the stated purpose.
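The listing does not show the skill's actual ffmpeg commands, so the following is only a minimal sketch of how the splitting step might look. It builds an argument list for ffmpeg's segment muxer; the function name, the default chunk length, and the output naming pattern are all assumptions made for illustration.

```python
from pathlib import Path


def build_split_command(source, out_dir, chunk_seconds=600):
    """Build an ffmpeg argument list that cuts `source` into fixed-length chunks.

    Hypothetical helper: the name, defaults, and output pattern are not
    taken from the skill itself.
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    source = Path(source)
    # %03d is expanded by ffmpeg into 000, 001, 002, ...
    pattern = Path(out_dir) / f"{source.stem}_%03d{source.suffix}"
    return [
        "ffmpeg",
        "-i", str(source),
        "-f", "segment",                      # segment muxer
        "-segment_time", str(chunk_seconds),  # target chunk length in seconds
        "-c", "copy",                         # stream copy, no re-encoding
        str(pattern),
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) once ffmpeg is on the PATH and the output directory exists. Because the streams are copied rather than re-encoded, chunk boundaries are approximate rather than exact.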

✓ Install Mechanism

This is instruction-only: there is no install script or arbitrary download. The SKILL.md suggests installing local Python Whisper via pip if desired, which is proportional and optional. No remote archives or opaque installs are invoked by the skill itself.

✓ Credentials

No required environment variables. Optional API keys (OPENAI_API_KEY, ASSEMBLYAI_API_KEY, DEEPGRAM_API_KEY) are appropriate and proportional to enabling cloud providers; they are optional and documented.
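As an illustration of how these optional keys could gate provider choice, here is a minimal sketch that reports which providers are usable in the current environment. The three variable names come from the listing; the provider labels, the mapping, and the function name are hypothetical and do not describe the skill's own logic.

```python
import os

# Variable names are from the listing; the labels on the left are assumed.
CLOUD_KEYS = {
    "openai": "OPENAI_API_KEY",
    "assemblyai": "ASSEMBLYAI_API_KEY",
    "deepgram": "DEEPGRAM_API_KEY",
}


def available_providers(env=None):
    """Return usable providers, with the local option always listed first."""
    env = os.environ if env is None else env
    providers = ["local-whisper"]  # needs no API key
    for name, var in CLOUD_KEYS.items():
        # Treat unset and blank values the same way.
        if env.get(var, "").strip():
            providers.append(name)
    return providers
```

Calling available_providers() with no arguments inspects the real environment; passing a plain dict makes the check easy to test. Listing the local option first reflects the guidance to prefer local Whisper for sensitive material.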

ℹ Persistence & Privilege

The skill writes to its own directory in the user's home and persists transcripts and a memory.md file. It does not request system-wide privileges, nor is always:true set. Users should be aware of on-disk persistence of possibly sensitive content.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install speech-to-text-transcription
After installation, invoke the skill by name or use /speech-to-text-transcription
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release with multi-provider support and batch processing.

Metadata

Slug speech-to-text-transcription

Version 1.0.0

License —

All-time Installs 3

Active Installs 3

Total Versions 1

Frequently Asked Questions

What is Speech to Text Transcription?

Transcribe audio and video files to text with speaker detection, timestamps, and format conversion. It is an AI Agent Skill for Claude Code / OpenClaw, with 806 downloads so far.

How do I install Speech to Text Transcription?

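Run "/install speech-to-text-transcription" in the OpenClaw or Claude Code chat to install it in one step. The skill itself has no install script; local transcription optionally requires pip install openai-whisper, and cloud providers require their API keys.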

Is Speech to Text Transcription free?

Yes, Speech to Text Transcription is free to download, install, and use. Note that the listing does not specify a license.

Which platforms does Speech to Text Transcription support?

Speech to Text Transcription is cross-platform and runs anywhere OpenClaw / Claude Code is available (linux, darwin, win32).

Who created Speech to Text Transcription?

It is built and maintained by Iván (@ivangdavila); the current version is v1.0.0.
