Downloads: 108
Stars: 0
Active Installs: 0
Versions: 1
Install in OpenClaw
/install speech2text
Description
Automatically converts speech messages in ogg/wav/mp3/m4a formats to text using offline Faster-Whisper with ffmpeg format conversion.
Usage Guidance
This skill appears to do what it says (convert audio to text using faster-whisper + ffmpeg) but has a few important caveats to consider before installing:
- Offline claim: The SKILL.md says 'offline' but the code calls WhisperModel(MODEL_SIZE) without bundled weights; faster-whisper will typically fetch model weights from the network if they are not already available locally. If you must avoid network/model downloads, preinstall model files and verify the model is loaded offline.
- Local file scanning: If no attachment is provided the skill will scan ~/.openclaw/media/inbound and pick the newest .ogg file. If you have sensitive audio in that location, the skill may read it. If you do not want that behavior, either avoid allowing automatic triggers or modify the code to require explicit attachments.
- Platform assumptions: The code only checks Windows ffmpeg paths (ffmpeg.exe) and SKILL.md shows a Windows installation path. On Linux/macOS the skill may not find ffmpeg without adjustments.
- Dependencies: You must install faster-whisper and pydub with pip (pip install faster-whisper pydub) and have ffmpeg available. Model downloads may consume bandwidth and disk space.
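The attachment fallback described in these caveats can be sketched as follows. This is a minimal illustration, not the skill's actual code: the function name resolve_audio_path and the allow_scan switch are hypothetical, and the sketch assumes the fallback picks the .ogg file with the newest modification time. Making the scan opt-in is one way to implement the "require explicit attachments" patch suggested below.

```python
from pathlib import Path
from typing import Optional

# Directory the skill reportedly scans when no attachment is given.
INBOUND_DIR = Path.home() / ".openclaw" / "media" / "inbound"


def resolve_audio_path(
    attachment: Optional[str],
    inbound_dir: Path = INBOUND_DIR,
    allow_scan: bool = False,  # hypothetical switch; the original always scans
) -> Path:
    """Return the audio file to transcribe.

    An explicit attachment always wins. The directory scan runs only when
    allow_scan is True, so unrelated local audio is not read by default.
    """
    if attachment:
        return Path(attachment)
    if not allow_scan:
        raise ValueError("no attachment provided and directory scanning is disabled")
    # Newest .ogg by modification time, mirroring the described fallback.
    candidates = list(inbound_dir.glob("*.ogg"))
    if not candidates:
        raise FileNotFoundError(f"no .ogg files in {inbound_dir}")
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

With allow_scan left at False, a call without an attachment fails loudly instead of silently reading whatever voice note arrived last.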
Recommendations:
- Review the code (included) and, if you need true offline operation, predownload/install the chosen Whisper model and test model loading without network access.
- Run the skill in a sandbox or environment where reading ~/.openclaw/media/inbound is acceptable, or patch the code to require explicit attachments only.
- Verify ffmpeg is installed on your OS and adapt the ffmpeg path logic for non-Windows systems.
- If unsure, treat this as potentially privacy-sensitive and avoid enabling automatic triggers until you validate its behavior.
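One way to test offline model loading, as recommended above, is to point faster-whisper at a local model directory instead of a model name. This is a sketch under assumptions: load_model_offline is a hypothetical helper, the model.bin check assumes a CTranslate2-converted model directory, and local_files_only assumes a faster-whisper version whose WhisperModel accepts that parameter.

```python
import os
from pathlib import Path


def load_model_offline(model_dir: str, device: str = "cpu", compute_type: str = "int8"):
    """Load a pre-downloaded faster-whisper model without touching the network.

    model_dir is assumed to be a local CTranslate2 model directory, e.g. one
    copied from a machine that already downloaded the weights.
    """
    path = Path(model_dir)
    # Assumption: a converted model directory contains model.bin. Fail fast
    # instead of letting the library fall back to a download.
    if not (path / "model.bin").is_file():
        raise FileNotFoundError(f"{path} does not look like a local model directory")
    # Ask huggingface_hub to avoid network calls; this only takes effect if
    # set before huggingface_hub is first imported in the process.
    os.environ.setdefault("HF_HUB_OFFLINE", "1")
    from faster_whisper import WhisperModel  # imported lazily on purpose

    # A directory path is loaded directly; local_files_only=True additionally
    # forbids downloads if a model name is passed by mistake.
    return WhisperModel(str(path), device=device, compute_type=compute_type, local_files_only=True)
```

Running this once with networking disabled, then transcribing a short clip, is a reasonable check that the model really is local.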
Capability Analysis
Type: OpenClaw Skill
Name: speech2text
Version: 1.0.0
The speech2text skill implements local audio transcription using the faster-whisper library and ffmpeg. The code in __init__.py safely handles audio conversion via subprocess.run using argument lists (preventing shell injection) and restricts file access to the expected OpenClaw media directory (~/.openclaw/media/inbound). No evidence of data exfiltration, malicious command execution, or prompt injection was found; the skill's behavior aligns strictly with its documented purpose.
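The argument-list pattern credited above can be illustrated like this. It is a sketch, not the skill's code: build_ffmpeg_cmd and convert_to_wav are hypothetical names, and converting to 16 kHz mono WAV is an assumption (a common choice for Whisper input, not confirmed by this page). The point is that each file name travels as a single argv item, so shell metacharacters in it are never interpreted.

```python
import subprocess
from pathlib import Path

SUPPORTED = {".ogg", ".wav", ".mp3", ".m4a"}  # formats listed on this page


def build_ffmpeg_cmd(src: str, dst: str, ffmpeg: str = "ffmpeg") -> list:
    """Build an ffmpeg argv list; no shell command string is assembled."""
    if Path(src).suffix.lower() not in SUPPORTED:
        raise ValueError(f"unsupported format: {src}")
    return [
        ffmpeg,
        "-y",                  # overwrite dst if it already exists
        "-loglevel", "error",  # keep ffmpeg output quiet
        "-i", src,             # one argv item, even with spaces or ';'
        "-ar", "16000",        # resample to 16 kHz (assumed target rate)
        "-ac", "1",            # downmix to mono
        dst,
    ]


def convert_to_wav(src: str, dst: str, ffmpeg: str = "ffmpeg") -> None:
    # shell=False is the default: the list is passed to the OS as-is, so file
    # names cannot inject extra commands. check=True raises on ffmpeg failure.
    subprocess.run(build_ffmpeg_cmd(src, dst, ffmpeg), check=True)
```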
Capability Assessment
Purpose & Capability
The name and description (speech-to-text using faster-whisper + ffmpeg) align with the code. However, SKILL.md and the description emphasize 'offline' Faster-Whisper, while the code instantiates WhisperModel(MODEL_SIZE) without bundling model files; if the weights are not present locally, this will typically trigger a download from the network (e.g., Hugging Face), contradicting the 'offline' claim. In addition, SKILL.md and the config list only Windows ffmpeg paths, yet the skill sets no OS restriction, which is an inconsistency.
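A platform-neutral ffmpeg lookup would remove the Windows-only assumption. The sketch below checks PATH first via shutil.which and only then tries explicit candidates. find_ffmpeg is a hypothetical helper, and the Windows fallback path is a placeholder: the skill's real hardcoded paths are not reproduced on this page.

```python
import os
import shutil
import sys
from typing import Iterable, Optional

# Placeholder Windows fallback; not the skill's actual configured path.
WINDOWS_FALLBACKS = [r"C:\ffmpeg\bin\ffmpeg.exe"]


def find_ffmpeg(extra_candidates: Iterable[str] = ()) -> Optional[str]:
    """Locate ffmpeg on any OS: PATH first, then explicit candidates."""
    # shutil.which honours PATH (and PATHEXT on Windows), which covers
    # Linux, macOS and Windows installs that are already on PATH.
    found = shutil.which("ffmpeg")
    if found:
        return found
    candidates = list(extra_candidates)
    if sys.platform == "win32":
        candidates += WINDOWS_FALLBACKS
    for candidate in candidates:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None  # caller decides how to report a missing ffmpeg
```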
Instruction Scope
SKILL.md describes converting provided audio attachments; the code also automatically looks for the most recent .ogg in a hardcoded user directory (~/.openclaw/media/inbound) when no attachment is supplied. This automatic local-file scanning is not clearly described and could read unrelated user audio files. The code uses subprocess.run to call ffmpeg (expected) but will modify the subprocess PATH to include Windows ffmpeg locations.
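Extending PATH for the ffmpeg subprocess only, without touching the parent process, can be done by passing a copied environment to subprocess.run. This is a hedged sketch with a hypothetical helper name, run_with_extra_path. One caveat: on Windows the executable itself may still be resolved against the parent's PATH, so an absolute ffmpeg path (for example from a lookup like find_ffmpeg) is the more reliable choice there.

```python
import os
import subprocess
from typing import Iterable


def run_with_extra_path(cmd: list, extra_dirs: Iterable[str]) -> subprocess.CompletedProcess:
    """Run cmd with extra_dirs prepended to PATH for the child process only."""
    env = os.environ.copy()  # a copy, so the parent's environment is untouched
    env["PATH"] = os.pathsep.join([*extra_dirs, env.get("PATH", "")])
    # capture_output/text return decoded stdout and stderr; check=True raises
    # CalledProcessError if the command exits with a non-zero status.
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
```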
Install Mechanism
No install spec (instruction-only), so nothing is fetched/installed by the platform. The code depends on external packages (faster-whisper, pydub) and on model weights—these are not provided and are likely downloaded by the faster-whisper/Hugging Face machinery at runtime, which is network activity not documented in SKILL.md's 'offline' claim.
Credentials
The skill requests no environment variables or credentials and does not require unusual system config access. It does expect ffmpeg to be installed and accessible (and tries Windows-specific paths). It temporarily adjusts PATH for the subprocess but does not persist credentials or require secrets.
Persistence & Privilege
The always flag is false, and the skill does not modify other skills or system-wide configs. It can be invoked autonomously (the platform default), and SKILL.md suggests automatic triggering on voice messages. Combined with its automatic local media scanning, this increases the chance that it will read local audio without an explicit attachment, but it does not request elevated or persistent privileges.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install speech2text
- After installation, invoke the skill by name or use /speech2text
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release of speech-to-text skill.
- Automatically transcribes voice messages to text in Chinese.
- Supports multiple audio formats (ogg, wav, mp3, m4a) with automatic conversion via ffmpeg.
- Works offline using the Faster-Whisper model.
- Simple installation with Python packages: faster-whisper, pydub (requires ffmpeg in system PATH).
- Configurable model size and default recognition language.
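The configurable model size and default language could feed a transcription call like the one below. It is a sketch, not the skill's code: MODEL_SIZE appears in the code quoted earlier on this page, but its value here ("small") and the LANGUAGE constant are assumptions, and transcribe is a hypothetical wrapper. It relies on faster-whisper's WhisperModel.transcribe returning a (segments, info) pair whose segments carry a .text attribute.

```python
from typing import Any

MODEL_SIZE = "small"  # assumed value; the skill's default is not stated here
LANGUAGE = "zh"       # assumed name; Chinese is the documented default language


def transcribe(model: Any, wav_path: str, language: str = LANGUAGE) -> str:
    """Join faster-whisper segments into one string.

    model is expected to behave like faster_whisper.WhisperModel, whose
    transcribe() returns (segments, info). segments is a lazy generator, so
    decoding actually happens while it is iterated here.
    """
    segments, _info = model.transcribe(wav_path, language=language, beam_size=5)
    # Joined without separators, which suits Chinese text.
    return "".join(segment.text for segment in segments).strip()
```

With a real model this would be called as transcribe(WhisperModel(MODEL_SIZE), "voice.wav"); as noted under Usage Guidance, that constructor may download weights if they are not already present.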
Frequently Asked Questions
What is speech2text?
Automatically converts speech messages in ogg/wav/mp3/m4a formats to text using offline Faster-Whisper with ffmpeg format conversion. It is an AI Agent Skill for Claude Code / OpenClaw, with 108 downloads so far.
How do I install speech2text?
Run "/install speech2text" in the OpenClaw or Claude Code chat to install it in one step. The skill itself still needs faster-whisper, pydub, and ffmpeg available on the system, as described under Usage Guidance.
Is speech2text free?
Yes, speech2text is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does speech2text support?
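speech2text is listed as cross-platform and should run wherever OpenClaw / Claude Code is available. However, its ffmpeg lookup only checks Windows paths, so Linux and macOS users may need to adjust it.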
Who created speech2text?
It is built and maintained by lqwall26 (@lqwall26); the current version is v1.0.0.