shu-hari

Speech to Text

by shu-hari · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Downloads: 297 · Stars: 0 · Active Installs: 0 · Versions: 1
Install in OpenClaw
/install hf-whisper-speech-to-text
Description
Transcribe or translate audio files to text using a public Hugging Face Whisper Space over Gradio. Use when the user sends voice notes, audio attachments, me...
README (SKILL.md)

Speech to Text

Use this skill to turn local audio files into text with a public Whisper-based endpoint.

Quick start

Run:

python3 scripts/transcribe.py /path/to/file.ogg

Return the transcript as plain text. By default, the script also applies lightweight Chinese punctuation and sentence-breaking cleanup.
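The exact cleanup rules used by scripts/transcribe.py are not documented here. As a rough illustration of what a "lightweight" Chinese punctuation and sentence-breaking pass can look like, here is a minimal sketch. The function name zh_cleanup and every rule in it are assumptions, not the script's actual behavior.

```python
import re

# Hypothetical mapping of ASCII punctuation to full-width Chinese equivalents.
_FULLWIDTH = {",": "，", ".": "。", "?": "？", "!": "！", ":": "：", ";": "；"}
# CJK Unified Ideographs (basic block only, for simplicity).
_CJK = r"[\u4e00-\u9fff]"


def zh_cleanup(text: str) -> str:
    """Illustrative Chinese punctuation cleanup; NOT the skill's real rules."""
    # Collapse runs of whitespace into a single space and trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    # Convert ASCII punctuation that follows a CJK character to full-width.
    text = re.sub(
        rf"({_CJK})\s*([,.?!:;])",
        lambda m: m.group(1) + _FULLWIDTH[m.group(2)],
        text,
    )
    # Drop spaces left behind after full-width punctuation.
    text = re.sub(r"([，。？！：；])\s+", r"\1", text)
    # Treat a space between two CJK characters as a pause: make it a comma.
    text = re.sub(rf"({_CJK}) (?={_CJK})", r"\1，", text)
    # Close the final sentence if it ends on a bare CJK character.
    if re.search(rf"{_CJK}$", text):
        text += "。"
    return text
```

For example, zh_cleanup("今天 天气 很好") yields "今天，天气，很好。", while purely Latin text such as "hello world" passes through unchanged. A rule set this small is deliberately conservative: it only touches punctuation adjacent to CJK characters, which is why a separate raw mode is still useful.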

For machine-readable output:

python3 scripts/transcribe.py /path/to/file.ogg --json

To disable cleanup and keep the raw model text:

python3 scripts/transcribe.py /path/to/file.ogg --format raw

To force Chinese punctuation cleanup:

python3 scripts/transcribe.py /path/to/file.ogg --format zh

For English translation instead of same-language transcription:

python3 scripts/transcribe.py /path/to/file.ogg --task translate
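The flags above can be summarized as an argparse parser. This is a hypothetical reconstruction from the documented commands, not the script's source. In particular, the "auto" default for --format is an assumption standing in for "cleanup by default", and the real script may name or validate things differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented transcribe.py flags."""
    parser = argparse.ArgumentParser(
        description="Transcribe audio via a public Whisper Space."
    )
    parser.add_argument("audio", help="Path to a local audio file, e.g. file.ogg")
    parser.add_argument("--json", action="store_true",
                        help="Emit machine-readable JSON instead of plain text")
    # "auto" is an ASSUMED name for the default cleanup behavior.
    parser.add_argument("--format", choices=["auto", "raw", "zh"], default="auto",
                        help="raw = keep model text; zh = force Chinese cleanup")
    parser.add_argument("--task", choices=["transcribe", "translate"],
                        default="transcribe",
                        help="translate = English translation of the audio")
    parser.add_argument("--space", default=None,
                        help="Override the Gradio Space URL")
    return parser
```

Using choices means argparse itself rejects a typo such as --format zhh with a usage error, before any audio is uploaded.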

Workflow

  1. Confirm the input is a local audio file.
  2. Run scripts/transcribe.py on it.
  3. If the transcript looks imperfect, tell the user it came from a public Whisper endpoint and may need cleanup.
  4. If helpful, post-process into:
    • cleaned transcript
    • summary
    • action items
    • bilingual output
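Steps 1 and 2 of the workflow can be sketched as a small helper that refuses anything that is not an existing local file and then builds the documented command line. The helper name transcribe_command is hypothetical, and the relative path scripts/transcribe.py assumes the working directory is the skill's folder.

```python
import sys
from pathlib import Path


def transcribe_command(audio_path: str, task: str = "transcribe",
                       as_json: bool = False) -> list[str]:
    """Hypothetical helper: validate the input, then return argv for transcribe.py."""
    path = Path(audio_path)
    # Step 1: confirm the input is an existing local file (not a URL or directory).
    if not path.is_file():
        raise FileNotFoundError(f"not a local file: {audio_path}")
    # Step 2: build the documented command (assumes cwd is the skill directory).
    argv = [sys.executable, "scripts/transcribe.py", str(path)]
    if task == "translate":
        argv += ["--task", "translate"]
    if as_json:
        argv.append("--json")
    return argv
```

The returned list can be passed to subprocess.run(argv, capture_output=True, text=True, check=True), whose stdout holds the transcript. Passing a list rather than a shell string avoids quoting problems with file names that contain spaces.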

What the script does

The script:

  • uploads the local file to a public Gradio-backed Hugging Face Space
  • submits a Whisper transcription job
  • waits for completion via the Gradio event stream
  • prints the resulting text
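The "waits for completion via the Gradio event stream" step can be illustrated offline. In recent Gradio versions, Gradio's HTTP API works like this: a POST to /call/&lt;api_name&gt; (prefixed with /gradio_api in newer releases) returns an event_id, and a GET on /call/&lt;api_name&gt;/&lt;event_id&gt; streams server-sent events such as heartbeat, generating, complete, or error. Which endpoint and path the skill's script actually uses is not stated here. The sketch below only parses such a stream once it has been received; the name parse_gradio_stream is hypothetical.

```python
import json
from typing import Iterable


def parse_gradio_stream(lines: Iterable[str]):
    """Return the payload of the 'complete' event from Gradio SSE text lines."""
    event = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event is not None:
            payload = line[len("data:"):].strip()
            if event == "complete":
                # Outputs arrive as a JSON list, one item per output component.
                return json.loads(payload)
            if event == "error":
                raise RuntimeError(f"Space reported an error: {payload}")
            # heartbeat / generating events are intermediate: keep waiting.
    raise RuntimeError("stream ended without a 'complete' event")
```

Keeping the parser separate from the network code makes it easy to test against captured streams. It also means a dropped connection surfaces as an explicit error instead of an empty transcript.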

Default endpoint:

  • https://hf-audio-whisper-large-v3-turbo.hf.space

Override it with:

python3 scripts/transcribe.py input.ogg --space https://your-space.hf.space

or set:

export HF_WHISPER_SPACE=https://your-space.hf.space
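The README does not say which override wins when both --space and HF_WHISPER_SPACE are set. A minimal sketch, assuming the common convention of command-line flag first, then environment variable, then the built-in default (resolve_space is a hypothetical name):

```python
import os
from typing import Mapping, Optional

DEFAULT_SPACE = "https://hf-audio-whisper-large-v3-turbo.hf.space"


def resolve_space(cli_value: Optional[str] = None,
                  env: Optional[Mapping[str, str]] = None) -> str:
    """ASSUMED precedence: --space, then HF_WHISPER_SPACE, then the default."""
    env = os.environ if env is None else env
    url = cli_value or env.get("HF_WHISPER_SPACE") or DEFAULT_SPACE
    # Strip a trailing slash so later path joins do not produce '//'.
    return url.rstrip("/")
```

Passing env explicitly keeps the function testable without mutating the real process environment.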

Guardrails

  • Treat this as a best-effort public/free path, not a privacy-grade path.
  • Do not use for highly sensitive audio unless the user explicitly accepts public third-party processing.
  • Expect rate limits, queueing, and occasional outages.
  • If the public endpoint fails, explain that the free backend is unavailable and offer alternatives.
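One way to live with rate limits and outages is to retry a few times with exponential backoff and then fail with a message the agent can relay to the user. This is a generic sketch, not part of the published script. The names with_retries and BackendUnavailable, and the retry counts and delays, are illustrative assumptions.

```python
import time


class BackendUnavailable(RuntimeError):
    """Raised when the free public endpoint keeps failing."""


def with_retries(call, attempts: int = 3, base_delay: float = 2.0,
                 sleep=time.sleep):
    """Run call() with exponential backoff; give up with a user-facing error."""
    last_error = None
    for attempt in range(attempts):
        try:
            return call()
        except (OSError, RuntimeError) as exc:  # network errors, queue failures
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 2 s, 4 s, ... with defaults
    raise BackendUnavailable(
        "The free public Whisper backend is unavailable right now; "
        "try again later or point --space at a self-hosted or trusted Space."
    ) from last_error
```

urllib's URLError and HTTPError are subclasses of OSError, so they are caught here. The sleep parameter lets tests record delays instead of actually waiting.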

Output handling

Prefer to return:

  • the raw transcript when the user asks for a plain transcription ("转文字/听写", i.e. "convert to text" / "dictation")
  • a cleaned version when punctuation is poor
  • a short note about uncertainty if names, numbers, or jargon may be wrong
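Deciding when to add the uncertainty note can be approximated with a crude heuristic. This sketch flags digits, and Latin-script words embedded in non-ASCII (for example Chinese) text, since both are common Whisper error spots. The function name and both rules are assumptions, not part of the skill.

```python
import re
from typing import Optional


def uncertainty_note(transcript: str) -> Optional[str]:
    """Hypothetical heuristic: return a caveat if risky tokens appear, else None."""
    risky = []
    if re.search(r"\d", transcript):
        risky.append("numbers")
    # Latin words inside otherwise non-ASCII text are often names or jargon.
    if re.search(r"[A-Za-z]{2,}", transcript) and re.search(r"[^\x00-\x7f]", transcript):
        risky.append("names or technical terms")
    if not risky:
        return None
    return ("Note: " + " and ".join(risky)
            + " in this transcript may be misheard; please double-check.")
```

A heuristic like this will over-flag some transcripts, which is usually the safer failure mode for a short caveat.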

Script

  • scripts/transcribe.py — public Whisper transcription helper
Usage Guidance
This skill works as advertised: it reads a local audio file and uploads it to a public Hugging Face Gradio Space for Whisper-based transcription. Before installing or using it, consider privacy and trust: the default endpoint is a third-party public space (hf-audio-whisper-large-v3-turbo.hf.space), so do not send highly sensitive audio unless you accept third-party processing. You can override the space with HF_WHISPER_SPACE or --space to point to a self-hosted or trusted endpoint. Verify the space URL you use is trustworthy, and be aware of rate limits, queueing, and potential outages. The script makes outbound HTTP requests and prints results; review or audit the target space if you need confidentiality guarantees.
Capability Analysis
Type: OpenClaw Skill · Name: hf-whisper-speech-to-text · Version: 1.0.0
The skill transcribes audio by uploading local files to a public Hugging Face Space (hf-audio-whisper-large-v3-turbo.hf.space). Although the functionality is clearly documented and SKILL.md includes privacy guardrails, the script (scripts/transcribe.py) can read any local file and send its contents to a third-party endpoint, because it does not strictly validate the file type. This is a potential data-exfiltration risk: if the AI agent is manipulated through prompt injection, it could upload sensitive files (e.g., credentials or configuration) instead of audio.
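One mitigation for the missing file-type validation, which is not part of the published script, is to check a file's leading "magic" bytes before uploading it. This keeps a text file such as a credentials file from being sent as "audio". The sketch below covers a few common containers and is not exhaustive. The function name sniff_audio is hypothetical.

```python
# Leading bytes of some common audio containers (illustrative, not exhaustive).
_SIGNATURES = {
    b"OggS": "ogg",               # Ogg (Opus/Vorbis voice notes)
    b"fLaC": "flac",
    b"ID3": "mp3",                # MP3 with an ID3v2 tag
    b"\xff\xfb": "mp3",           # one common MPEG-1 Layer III frame-sync pattern
    b"\x1a\x45\xdf\xa3": "webm",  # EBML header (WebM / Matroska)
}


def sniff_audio(path: str):
    """Return a container name if the file starts like audio, else None."""
    with open(path, "rb") as fh:
        head = fh.read(12)
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    if head[4:8] == b"ftyp":  # MP4/M4A family; may also be video
        return "mp4"
    for magic, name in _SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None
```

A check like this only limits what gets uploaded by accident or through injection. It does not make the public endpoint private, so the guardrails about sensitive audio still apply.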
Capability Assessment
Purpose & Capability
The name and description say the skill uses a public Hugging Face Whisper Space, and the included script and SKILL.md implement exactly that (upload to a Gradio Space, call predict, wait for the result). No unrelated binaries, environment variables, or services are requested.
Instruction Scope
Instructions explicitly tell the agent to read a local audio file and upload it to a public Gradio/Hugging Face Space. This is expected for the stated purpose but has privacy implications (documented in guardrails). The skill does not attempt to read other files or arbitrary system state.
Install Mechanism
No install spec: the skill consists of instructions plus a small Python script, and it performs no external downloads or package installs itself.
Credentials
No credentials or sensitive environment variables are required. The only optional environment variable (HF_WHISPER_SPACE) is used to override the target space URL and is justified by the purpose.
Persistence & Privilege
Skill is not declared always:true and does not request persistent system privileges. It runs as an on-demand script and does not modify other skills or global agent settings.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install hf-whisper-speech-to-text
  3. After installation, invoke the skill by name or use /hf-whisper-speech-to-text
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release: public Whisper Space transcription with lightweight Chinese punctuation cleanup.
Metadata
Slug: hf-whisper-speech-to-text
Version: 1.0.0
License: MIT-0
All-time Installs: 0
Active Installs: 0
Total Versions: 1
Frequently Asked Questions

What is Speech to Text?

Transcribe or translate audio files to text using a public Hugging Face Whisper Space over Gradio. Use when the user sends voice notes, audio attachments, me... It is an AI Agent Skill for Claude Code / OpenClaw, with 297 downloads so far.

How do I install Speech to Text?

Run "/install hf-whisper-speech-to-text" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Speech to Text free?

Yes, Speech to Text is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Speech to Text support?

Speech to Text is cross-platform: it runs anywhere OpenClaw or Claude Code is available, provided python3 is installed to run scripts/transcribe.py.

Who created Speech to Text?

It is built and maintained by shu-hari (@shu-hari); the current version is v1.0.0.
