/install hf-whisper-speech-to-text
Speech to Text
Use this skill to turn local audio files into text with a public Whisper-based endpoint.
Quick start
Run:
python3 scripts/transcribe.py /path/to/file.ogg
Return the transcript as plain text. By default, the script also applies lightweight Chinese punctuation and sentence-breaking cleanup.
For machine-readable output:
python3 scripts/transcribe.py /path/to/file.ogg --json
To disable cleanup and keep the raw model text:
python3 scripts/transcribe.py /path/to/file.ogg --format raw
To force Chinese punctuation cleanup:
python3 scripts/transcribe.py /path/to/file.ogg --format zh
For English translation instead of same-language transcription:
python3 scripts/transcribe.py /path/to/file.ogg --task translate
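When the script is called from another Python program, the flags shown above can be assembled into an argument list. This is a minimal sketch that assumes the flags behave as described in this section; `build_command` and its parameter names are hypothetical and not part of the skill.

```python
import sys
from typing import List, Optional


def build_command(
    audio_path: str,
    as_json: bool = False,
    fmt: Optional[str] = None,    # "raw" or "zh", as listed in the quick start
    task: Optional[str] = None,   # "translate" for English translation
    script: str = "scripts/transcribe.py",
) -> List[str]:
    """Assemble an argv list for the transcription script (hypothetical helper)."""
    cmd = [sys.executable, script, audio_path]
    if as_json:
        cmd.append("--json")
    if fmt is not None:
        cmd += ["--format", fmt]
    if task is not None:
        cmd += ["--task", task]
    return cmd
```

For example, `build_command("/path/to/file.ogg", fmt="raw")` yields the interpreter path followed by `scripts/transcribe.py /path/to/file.ogg --format raw`, ready to pass to `subprocess.run`.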
Workflow
- Confirm the input is a local audio file.
- Run scripts/transcribe.py on it.
- If the transcript looks imperfect, tell the user it came from a public Whisper endpoint and may need cleanup.
- If helpful, post-process into:
  - cleaned transcript
  - summary
  - action items
  - bilingual output
What the script does
The script:
- uploads the local file to a public Gradio-backed Hugging Face Space
- submits a Whisper transcription job
- waits for completion via the Gradio event stream
- prints the resulting text
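The "waits for completion" step can be pictured as reading a server-sent event stream until a terminal event arrives. The sketch below parses such a stream from plain text lines. It assumes the `event:` / `data:` line format and the `complete` and `error` event names used by Gradio's HTTP API; `read_final_result` is a hypothetical helper and is not taken from scripts/transcribe.py.

```python
import json
from typing import Iterable, Optional


def read_final_result(lines: Iterable[str]) -> Optional[list]:
    """Return the JSON payload of the first 'complete' event.

    Returns None if the stream ends without one; raises RuntimeError on 'error'.
    """
    event = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if event == "complete":
                return json.loads(payload)
            if event == "error":
                raise RuntimeError(f"transcription job failed: {payload}")
        # Blank lines and other events (for example heartbeats) are skipped.
    return None


sample = [
    "event: heartbeat",
    "data: null",
    "",
    "event: complete",
    'data: ["hello world"]',
]
print(read_final_result(sample)[0])  # prints: hello world
```

Parsing from an iterable of lines keeps the logic independent of the HTTP client, so the same function could be fed from a streaming response or from a saved log.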
Default endpoint:
https://hf-audio-whisper-large-v3-turbo.hf.space
Override it with:
python3 scripts/transcribe.py input.ogg --space https://your-space.hf.space
or set:
export HF_WHISPER_SPACE=https://your-space.hf.space
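One plausible way to combine the two overrides is sketched below. The precedence (the `--space` value first, then `HF_WHISPER_SPACE`, then the default) is an assumption, since this section does not say which one wins; `resolve_space` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

DEFAULT_SPACE = "https://hf-audio-whisper-large-v3-turbo.hf.space"


def resolve_space(flag_value: Optional[str] = None,
                  env: Mapping[str, str] = os.environ) -> str:
    """Pick the endpoint: --space value, else HF_WHISPER_SPACE, else the default."""
    chosen = flag_value or env.get("HF_WHISPER_SPACE") or DEFAULT_SPACE
    # Drop a trailing slash so request paths can be appended consistently.
    return chosen.rstrip("/")


print(resolve_space("https://your-space.hf.space/"))  # prints: https://your-space.hf.space
```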
Guardrails
- Treat this as a best-effort public/free path, not a privacy-grade path.
- Do not use for highly sensitive audio unless the user explicitly accepts public third-party processing.
- Expect rate limits, queueing, and occasional outages.
- If the public endpoint fails, explain that the free backend is unavailable and offer alternatives.
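For callers that wrap the script, the last guardrail could look roughly like this: run the command, and if it fails or times out, return a user-facing message instead of a traceback. The sketch assumes the script exits with a non-zero status on failure, which this section does not confirm; `transcribe_or_explain`, the timeout value, and the message wording are all hypothetical.

```python
import subprocess
import sys
from typing import List, Tuple

FALLBACK_MESSAGE = (
    "The free public Whisper backend is unavailable right now. "
    "Options: retry later, point --space at another Space, "
    "or use a different transcription service."
)


def transcribe_or_explain(cmd: List[str], timeout_s: float = 300.0) -> Tuple[bool, str]:
    """Run a transcription command; return (ok, transcript or fallback message)."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        # Hung job or missing interpreter/script: treat as an unavailable backend.
        return False, FALLBACK_MESSAGE
    if done.returncode != 0:
        return False, FALLBACK_MESSAGE
    return True, done.stdout.strip()
```

Returning a flag alongside the text lets the caller decide whether to show a transcript or offer alternatives, without parsing the message.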
Output handling
Prefer to return:
- the raw transcript when the user asked for speech-to-text or dictation ("转文字/听写")
- a cleaned version when punctuation is poor
- a short note about uncertainty if names, numbers, or jargon may be wrong
Script
scripts/transcribe.py: public Whisper transcription helper
- Make sure OpenClaw is installed (local or Docker).
- Run the install command in chat: /install hf-whisper-speech-to-text
- After installation, invoke the skill by name or use /hf-whisper-speech-to-text
- Provide the required inputs per the skill's parameter spec and get structured output.
What is Speech to Text?
Transcribe or translate audio files to text using a public Hugging Face Whisper Space over Gradio. Use when the user sends voice notes, audio attachments, me... It is an AI Agent Skill for Claude Code / OpenClaw, with 297 downloads so far.
How do I install Speech to Text?
Run "/install hf-whisper-speech-to-text" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Speech to Text free?
Yes, Speech to Text is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Speech to Text support?
Speech to Text is cross-platform: it runs anywhere OpenClaw or Claude Code is available.
Who created Speech to Text?
It is built and maintained by shu-hari (@shu-hari); the current version is v1.0.0.