shu-hari

Speech to Text

Author: shu-hari · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 297
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install hf-whisper-speech-to-text
Description
Transcribe or translate audio files to text using a public Hugging Face Whisper Space over Gradio. Use when the user sends voice notes, audio attachments, me...
Usage (SKILL.md)

Speech to Text

Use this skill to turn local audio files into text with a public Whisper-based endpoint.

Quick start

Run:

python3 scripts/transcribe.py /path/to/file.ogg

Return the transcript as plain text. By default, the script also applies lightweight Chinese punctuation and sentence-breaking cleanup.

For machine-readable output:

python3 scripts/transcribe.py /path/to/file.ogg --json
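
When another program consumes the transcript, the `--json` mode can be wrapped in a small helper. The sketch below is one possible wrapper, not part of the skill: `build_command` and `transcribe_json` are hypothetical names, the JSON schema is not documented here so the decoded object is returned as-is, and combining `--json` with `--task` or `--format` is assumed to be allowed.

```python
import json
import subprocess
import sys


def build_command(audio_path, task=None, fmt=None, script="scripts/transcribe.py"):
    """Assemble the argv list for transcribe.py using only the documented flags."""
    cmd = [sys.executable, script, audio_path, "--json"]
    if task:
        cmd += ["--task", task]      # e.g. "translate"
    if fmt:
        cmd += ["--format", fmt]     # e.g. "raw" or "zh"
    return cmd


def transcribe_json(audio_path, **kwargs):
    """Run the script and decode whatever JSON it prints on stdout.

    Raises subprocess.CalledProcessError if the script exits non-zero,
    which is one way a public-endpoint failure might surface (assumption).
    """
    result = subprocess.run(
        build_command(audio_path, **kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

A caller would use `transcribe_json("/path/to/file.ogg", task="translate")` and inspect the returned object to see which fields the script actually emits.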

To disable cleanup and keep the raw model text:

python3 scripts/transcribe.py /path/to/file.ogg --format raw

To force Chinese punctuation cleanup:

python3 scripts/transcribe.py /path/to/file.ogg --format zh
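
The exact cleanup rules inside scripts/transcribe.py are not described on this page. As a rough illustration of what "lightweight Chinese punctuation cleanup" could mean, the hypothetical `cleanup_zh` below turns gaps between Chinese characters into full-width commas, converts half-width punctuation that follows a Chinese character, and closes the text with a full stop. These three rules are assumptions chosen for the sketch, not the script's actual behavior.

```python
import re

# Basic CJK Unified Ideographs range; an assumption that is enough for a sketch.
CJK = r"\u4e00-\u9fff"
HALF_TO_FULL = {",": "，", "?": "？", "!": "！", ":": "：", ";": "；"}


def cleanup_zh(text):
    """Apply three illustrative cleanup rules to Whisper-style Chinese output."""
    text = text.strip()
    # Whitespace between two Chinese characters becomes a full-width comma.
    text = re.sub(rf"(?<=[{CJK}])\s+(?=[{CJK}])", "，", text)
    # Half-width punctuation right after a Chinese character becomes full-width.
    text = re.sub(rf"(?<=[{CJK}])[,?!:;]", lambda m: HALF_TO_FULL[m.group(0)], text)
    # Close the text with a full stop if it ends on a Chinese character.
    if text and re.match(rf"[{CJK}]", text[-1]):
        text += "。"
    return text
```

Text without Chinese characters passes through unchanged, which keeps the sketch safe to apply to mixed-language transcripts.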

For English translation instead of same-language transcription:

python3 scripts/transcribe.py /path/to/file.ogg --task translate

Workflow

  1. Confirm the input is a local audio file.
  2. Run scripts/transcribe.py on it.
  3. If the transcript looks imperfect, tell the user it came from a public Whisper endpoint and may need cleanup.
  4. If helpful, post-process into:
    • cleaned transcript
    • summary
    • action items
    • bilingual output

What the script does

The script:

  • uploads the local file to a public Gradio-backed Hugging Face Space
  • submits a Whisper transcription job
  • waits for completion via the Gradio event stream
  • prints the resulting text
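
The "waits for completion via the Gradio event stream" step can be pictured as reading server-sent events until a final result arrives. The sketch below parses such a stream from any iterable of text lines. The `event:`/`data:` layout and the event names `heartbeat`, `complete`, and `error` follow Gradio's curl-style API as generally documented; treating the first element of the `complete` payload as the transcript is an assumption about this Space, and `parse_gradio_events` and `final_text` are hypothetical helpers rather than names taken from scripts/transcribe.py.

```python
import json


def parse_gradio_events(lines):
    """Yield (event_name, payload) pairs from an iterable of SSE text lines."""
    event = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event is not None:
            body = line[len("data:"):].strip()
            try:
                payload = json.loads(body)
            except json.JSONDecodeError:
                payload = body  # keep non-JSON data as plain text
            yield event, payload
            event = None


def final_text(lines):
    """Return the first output of the 'complete' event, or raise on failure."""
    for event, payload in parse_gradio_events(lines):
        if event == "complete":
            # Assumption: the transcript is the first item of the output list.
            if isinstance(payload, list) and payload:
                return payload[0]
            return payload
        if event == "error":
            raise RuntimeError(f"Space reported an error: {payload}")
    raise RuntimeError("stream ended without a 'complete' event")
```

In practice the lines would come from a streaming HTTP response; feeding the function a plain list of strings is enough to exercise the logic offline.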

Default endpoint:

  • https://hf-audio-whisper-large-v3-turbo.hf.space

Override it with:

python3 scripts/transcribe.py input.ogg --space https://your-space.hf.space

or set:

export HF_WHISPER_SPACE=https://your-space.hf.space
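
One plausible way to combine the two override mechanisms with the default is shown below. The precedence (the `--space` flag first, then `HF_WHISPER_SPACE`, then the default) is an assumption based on common CLI conventions, and `resolve_space` is a hypothetical helper, not a function known to exist in the script.

```python
import os

DEFAULT_SPACE = "https://hf-audio-whisper-large-v3-turbo.hf.space"


def resolve_space(cli_value=None, env=None):
    """Pick the Space URL: CLI flag, then HF_WHISPER_SPACE, then the default."""
    env = os.environ if env is None else env
    candidate = cli_value or env.get("HF_WHISPER_SPACE") or DEFAULT_SPACE
    # Strip trailing slashes so later path joins do not produce "//".
    return candidate.rstrip("/")
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.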

Guardrails

  • Treat this as a best-effort public/free path, not a privacy-grade path.
  • Do not use for highly sensitive audio unless the user explicitly accepts public third-party processing.
  • Expect rate limits, queueing, and occasional outages.
  • If the public endpoint fails, explain that the free backend is unavailable and offer alternatives.

Output handling

Prefer to return:

  • the raw transcript when the user asked to "转文字/听写" (convert to text / take dictation)
  • a cleaned version when punctuation is poor
  • a short note about uncertainty if names, numbers, or jargon may be wrong

Script

  • scripts/transcribe.py — public Whisper transcription helper
Security recommendations
This skill works as advertised: it reads a local audio file and uploads it to a public Hugging Face Gradio Space for Whisper-based transcription. Before installing or using it, consider privacy and trust: the default endpoint is a third-party public space (hf-audio-whisper-large-v3-turbo.hf.space), so do not send highly sensitive audio unless you accept third-party processing. You can override the space with HF_WHISPER_SPACE or --space to point to a self-hosted or trusted endpoint. Verify the space URL you use is trustworthy, and be aware of rate limits, queueing, and potential outages. The script makes outbound HTTP requests and prints results; review or audit the target space if you need confidentiality guarantees.
Functional analysis
Type: OpenClaw Skill
Name: hf-whisper-speech-to-text
Version: 1.0.0
The skill transcribes audio by uploading local files to a public Hugging Face Space (hf-audio-whisper-large-v3-turbo.hf.space). While the functionality is clearly documented and includes privacy guardrails in SKILL.md, the script (scripts/transcribe.py) possesses the capability to read and transmit any local file content to a third-party endpoint without strict file-type validation. This represents a potential data exfiltration risk if the AI agent is manipulated via prompt injection to process sensitive files (e.g., credentials or configuration) instead of audio files.
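
A pre-upload check is one way to narrow the exfiltration risk described above. The sketch below is a hypothetical guard, not something the script is known to do: it accepts a file only if its extension is on an allowlist and its first bytes match a common audio container signature. The extension list and the set of signatures are illustrative assumptions and are not exhaustive.

```python
from pathlib import Path

# Illustrative allowlist; extend it to match the formats you actually expect.
AUDIO_EXTENSIONS = {".ogg", ".oga", ".opus", ".mp3", ".wav", ".flac", ".m4a", ".webm"}


def looks_like_audio(path):
    """Return True only if both the extension and the leading bytes look like audio."""
    p = Path(path)
    if not p.is_file() or p.suffix.lower() not in AUDIO_EXTENSIONS:
        return False
    with p.open("rb") as fh:
        head = fh.read(12)
    return (
        head.startswith(b"OggS")                              # Ogg (Vorbis/Opus)
        or head.startswith(b"fLaC")                           # FLAC
        or head.startswith(b"ID3")                            # MP3 with ID3 tag
        or (len(head) >= 2 and head[0] == 0xFF
            and (head[1] & 0xE0) == 0xE0)                     # MPEG audio frame sync
        or (head[:4] == b"RIFF" and head[8:12] == b"WAVE")    # WAV
        or head[4:8] == b"ftyp"                               # MP4/M4A container
        or head.startswith(b"\x1a\x45\xdf\xa3")               # EBML (WebM/Matroska)
    )
```

A check like this does not prove a file is harmless to share, but it would reject a renamed key or config file before any bytes leave the machine.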
Capability assessment
Purpose & Capability
The name and description claim use of a public Hugging Face Whisper Space; the included script and SKILL.md both implement exactly that (upload to a Gradio Space, call predict, wait for result). No unrelated binaries, env vars, or services are requested.
Instruction Scope
Instructions explicitly tell the agent to read a local audio file and upload it to a public Gradio/Hugging Face Space. This is expected for the stated purpose but has privacy implications (documented in guardrails). The skill does not attempt to read other files or arbitrary system state.
Install Mechanism
No install spec; skill is instruction + a small Python script. No external downloads or package installs are performed by the skill itself.
Credentials
No credentials or sensitive environment variables are required. The only optional environment variable (HF_WHISPER_SPACE) is used to override the target space URL and is justified by the purpose.
Persistence & Privilege
Skill is not declared always:true and does not request persistent system privileges. It runs as an on-demand script and does not modify other skills or global agent settings.
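How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install hf-whisper-speech-to-text
  3. Once installed, invoke the Skill by name or trigger it with /hf-whisper-speech-to-text
  4. Provide the inputs described in the Skill's parameter documentation to get structured output.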
Version history
v1.0.0
Initial release: public Whisper Space transcription with lightweight Chinese punctuation cleanup.
Metadata
Slug hf-whisper-speech-to-text
Version 1.0.0
License MIT-0
Total installs 0
Current installs 0
Versions 1
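FAQ

What is Speech to Text?

Transcribe or translate audio files to text using a public Hugging Face Whisper Space over Gradio. Use when the user sends voice notes, audio attachments, me... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 297 total downloads so far.

How do I install Speech to Text?

Run the command "/install hf-whisper-speech-to-text" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.

Is Speech to Text free?

Yes. Speech to Text is completely free and released under the MIT-0 license; you may download, install, and use it freely.

Which platforms does Speech to Text support?

Speech to Text is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who develops Speech to Text?

It is developed and maintained by shu-hari (@shu-hari); the current version is v1.0.0.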
