/install hf-whisper-speech-to-text
Speech to Text
Use this skill to turn local audio files into text with a public Whisper-based endpoint.
Quick start
Run:
python3 scripts/transcribe.py /path/to/file.ogg
Return the transcript as plain text. By default, the script also applies lightweight Chinese punctuation and sentence-breaking cleanup.
For machine-readable output:
python3 scripts/transcribe.py /path/to/file.ogg --json
To disable cleanup and keep the raw model text:
python3 scripts/transcribe.py /path/to/file.ogg --format raw
To force Chinese punctuation cleanup:
python3 scripts/transcribe.py /path/to/file.ogg --format zh
For English translation instead of same-language transcription:
python3 scripts/transcribe.py /path/to/file.ogg --task translate
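The commands above can also be driven from Python. The following is a minimal sketch, not part of the skill itself: `build_command` and `transcribe` are hypothetical helper names, and it assumes the script is run from the skill's root directory, prints the transcript to stdout, and exits non-zero on failure. Only flags shown in this document are used; whether they can all be combined in one call is not confirmed here.

```python
import subprocess
import sys
from typing import List, Optional


def build_command(audio_path: str, *, as_json: bool = False,
                  fmt: Optional[str] = None, task: Optional[str] = None,
                  space: Optional[str] = None) -> List[str]:
    """Assemble the argv for scripts/transcribe.py from the documented flags."""
    cmd = [sys.executable, "scripts/transcribe.py", audio_path]
    if as_json:
        cmd.append("--json")
    if fmt is not None:
        cmd += ["--format", fmt]      # "raw" or "zh", per the examples above
    if task is not None:
        cmd += ["--task", task]       # "translate" for English translation
    if space is not None:
        cmd += ["--space", space]     # optional endpoint override
    return cmd


def transcribe(audio_path: str, **options) -> str:
    """Run the script and return its stdout (assumed to be the transcript)."""
    result = subprocess.run(build_command(audio_path, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example, `transcribe("/path/to/file.ogg", fmt="raw")` corresponds to the raw-text command shown above.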
Workflow
- Confirm the input is a local audio file.
- Run scripts/transcribe.py on it.
- If the transcript looks imperfect, tell the user it came from a public Whisper endpoint and may need cleanup.
- If helpful, post-process into:
- cleaned transcript
- summary
- action items
- bilingual output
What the script does
The script:
- uploads the local file to a public Gradio-backed Hugging Face Space
- submits a Whisper transcription job
- waits for completion via the Gradio event stream
- prints the resulting text
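The "waits for completion" step can be illustrated with a small parser for a server-sent event stream. This is a sketch under assumptions, not the script's actual code: `read_completed_payload` is a hypothetical name, the event names (`heartbeat`, `complete`, `error`) follow the way Gradio's HTTP API usually labels its events, and the payload shape (a JSON list whose first item is the transcript) is assumed for illustration.

```python
import json
from typing import Any, Iterable, Optional


def read_completed_payload(lines: Iterable[str]) -> Optional[Any]:
    """Return the JSON payload of the first `complete` event, or None."""
    event = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = line[len("data:"):].strip()
            if event == "error":
                raise RuntimeError(f"Space reported an error: {data}")
            if event == "complete":
                return json.loads(data)
        elif not line:
            event = None  # a blank line ends one event-stream message
    return None


# Hand-written sample stream (not captured from a real Space).
sample_stream = [
    "event: heartbeat",
    "data: null",
    "",
    "event: complete",
    'data: ["hello world"]',
    "",
]
print(read_completed_payload(sample_stream)[0])  # prints: hello world
```

In a real client the `lines` iterable would come from a streaming HTTP response rather than a list.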
Default endpoint:
https://hf-audio-whisper-large-v3-turbo.hf.space
Override it with:
python3 scripts/transcribe.py input.ogg --space https://your-space.hf.space
or set:
export HF_WHISPER_SPACE=https://your-space.hf.space
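One plausible way to combine the two overrides is sketched below. The precedence (the `--space` value first, then `HF_WHISPER_SPACE`, then the default) is an assumption, since the document does not say which wins, and `resolve_space` is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional

DEFAULT_SPACE = "https://hf-audio-whisper-large-v3-turbo.hf.space"


def resolve_space(cli_value: Optional[str] = None,
                  env: Mapping[str, str] = os.environ) -> str:
    """Pick the endpoint: CLI value, then HF_WHISPER_SPACE, then the default.

    The ordering is assumed, not confirmed by the script's documentation.
    """
    chosen = cli_value or env.get("HF_WHISPER_SPACE") or DEFAULT_SPACE
    return chosen.rstrip("/")  # normalise so paths can be appended safely


print(resolve_space(env={}))  # prints: https://hf-audio-whisper-large-v3-turbo.hf.space
```

Passing `env` explicitly keeps the function easy to test without touching the real environment.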
Guardrails
- Treat this as a best-effort public/free path, not a privacy-grade path.
- Do not use for highly sensitive audio unless the user explicitly accepts public third-party processing.
- Expect rate limits, queueing, and occasional outages.
- If the public endpoint fails, explain that the free backend is unavailable and offer alternatives.
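The last two guardrails could be handled with a small retry wrapper like the one below. It is a sketch only: `transcribe_with_retries` is a hypothetical name, the retry count and delays are arbitrary illustrative values, and `run` stands for whatever callable performs the transcription.

```python
import time
from typing import Callable, Optional, Tuple


def transcribe_with_retries(
    run: Callable[[], str],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (transcript, None) on success, or (None, message) on failure."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return run(), None
        except Exception as exc:  # rate limit, queue timeout, outage, ...
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # exponential backoff
    message = (
        "The free public Whisper backend is unavailable right now "
        f"({last_error}). You can retry later or point --space at another Space."
    )
    return None, message
```

Returning a message instead of raising lets the caller relay the failure to the user and offer alternatives in its own words.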
Output handling
Prefer to return:
- the raw transcript when the user asked to "转文字/听写" (convert to text / take dictation)
- a cleaned version when punctuation is poor
- a short note about uncertainty if names, numbers, or jargon may be wrong
Script
scripts/transcribe.py: public Whisper transcription helper
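- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install hf-whisper-speech-to-text
- Once installation finishes, call the Skill by name or trigger it with /hf-whisper-speech-to-text
- Provide the inputs described in the Skill's parameter notes to get structured output.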
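What is Speech to Text?
Transcribe or translate audio files to text using a public Hugging Face Whisper Space over Gradio. Use when the user sends voice notes, audio attachments, me... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 297 downloads to date.
How do I install Speech to Text?
Run the command "/install hf-whisper-speech-to-text" in the OpenClaw or Claude Code chat box to install it in one step. No extra configuration is needed.
Is Speech to Text free?
Yes. Speech to Text is completely free and released under the MIT-0 license, so you can download, install, and use it freely.
Which platforms does Speech to Text support?
Speech to Text is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who develops Speech to Text?
It is developed and maintained by shu-hari (@shu-hari). The current version is v1.0.0.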