speech2text

Name: speech2text
Author: lqwall26

By lqwall26 · GitHub · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 108

Current installs: 0

Versions: 1

Install in OpenClaw

/install speech2text

Description

Automatically converts speech messages in ogg/wav/mp3/m4a formats to text using offline Faster-Whisper with ffmpeg format conversion.

Safe usage recommendations

This skill appears to do what it says (convert audio to text using faster-whisper + ffmpeg), but it has a few important caveats to consider before installing:

- Offline claim: The SKILL.md says 'offline', but the code calls WhisperModel(MODEL_SIZE) without bundled weights; faster-whisper will typically fetch model weights from the network if they are not already available locally. If you must avoid network/model downloads, preinstall the model files and verify the model is loaded offline.
- Local file scanning: If no attachment is provided, the skill will scan ~/.openclaw/media/inbound and pick the newest .ogg file. If you have sensitive audio in that location, the skill may read it. If you do not want that behavior, either avoid allowing automatic triggers or modify the code to require explicit attachments.
- Platform assumptions: The code only checks Windows ffmpeg paths (ffmpeg.exe), and SKILL.md shows a Windows installation path. On Linux/macOS the skill may not find ffmpeg without adjustments.
- Dependencies: You must pip install faster-whisper and pydub and have ffmpeg available. Model downloads may consume bandwidth and disk space.

Recommendations:

- Review the code (included) and, if you need true offline operation, predownload/install the chosen Whisper model and test model loading without network access.
- Run the skill in a sandbox or environment where reading ~/.openclaw/media/inbound is acceptable, or patch the code to require explicit attachments only.
- Verify ffmpeg is installed on your OS and adapt the ffmpeg path logic for non-Windows systems.
- If unsure, treat this as potentially privacy-sensitive and avoid enabling automatic triggers until you validate its behavior.
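As a minimal sketch of the "verify the model is loaded offline" advice above, the snippet below refuses to load unless weights are already on disk. It is not the skill's actual code. MODEL_DIR, is_local_model, and load_model_offline are hypothetical names, and the directory layout assumes a CTranslate2-converted model folder containing model.bin.

```python
import os
from pathlib import Path

# Hypothetical location of a pre-downloaded, CTranslate2-converted Whisper model.
MODEL_DIR = Path.home() / "models" / "faster-whisper-small"


def is_local_model(model_dir: Path) -> bool:
    """Return True if model_dir looks like a converted model (contains model.bin)."""
    return (model_dir / "model.bin").is_file()


def load_model_offline(model_dir: Path = MODEL_DIR):
    """Load the model from disk only; raise instead of falling back to a download."""
    if not is_local_model(model_dir):
        raise FileNotFoundError(f"No local model weights found in {model_dir}")
    # Ask huggingface_hub to stay offline. This only takes effect if it is set
    # before huggingface_hub is first imported in the process.
    os.environ["HF_HUB_OFFLINE"] = "1"
    # Imported lazily so the check above works even without the package installed.
    from faster_whisper import WhisperModel

    # Passing a directory path loads the weights from that folder.
    return WhisperModel(str(model_dir), local_files_only=True)
```

Running load_model_offline() once with the network disabled is a simple way to confirm that no download is attempted.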

Functional analysis

Type: OpenClaw Skill
Name: speech2text
Version: 1.0.0

The speech2text skill implements local audio transcription using the faster-whisper library and ffmpeg. The code in __init__.py safely handles audio conversion via subprocess.run using argument lists (preventing shell injection) and restricts file access to the expected OpenClaw media directory (~/.openclaw/media/inbound). No evidence of data exfiltration, malicious command execution, or prompt injection was found; the skill's behavior aligns strictly with its documented purpose.
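To illustrate why an argument list avoids shell injection, here is a small sketch of an ffmpeg call made that way. It is not the skill's own code: build_ffmpeg_cmd and convert_to_wav are hypothetical helpers, and the 16 kHz mono WAV target is an assumption chosen for the example.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(src: Path, dst: Path, ffmpeg: str = "ffmpeg") -> list[str]:
    """Build an ffmpeg command that converts src to 16 kHz mono audio at dst."""
    return [ffmpeg, "-y", "-i", str(src), "-ar", "16000", "-ac", "1", str(dst)]


def convert_to_wav(src: Path, dst: Path) -> None:
    """Run the conversion without a shell; raise CalledProcessError on failure."""
    # No shell=True: each list element reaches ffmpeg as one literal argument,
    # so a file name such as "voice; rm -rf x.ogg" is never parsed by a shell.
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True, capture_output=True)
```

By contrast, a single command string passed with shell=True would let characters such as ; or && in a file name start a second command.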

Capability assessment

⚠ Purpose & Capability

Name/description (speech→text using faster-whisper + ffmpeg) aligns with the code. However, SKILL.md and description emphasize 'offline' Faster-Whisper, while the code instantiates WhisperModel(MODEL_SIZE) without bundling model files — that will typically trigger model downloads from the network (e.g., Hugging Face) if weights are not present, contradicting the 'offline' claim. Also SKILL.md and config list only Windows ffmpeg paths; the skill has no OS restriction set, which is an inconsistency.
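One way the Windows-only path check could be made portable is to let the standard library search PATH first and fall back to extra directories afterwards. The sketch below assumes that approach. find_ffmpeg and extra_dirs are hypothetical names and do not appear in the skill.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional


def find_ffmpeg(extra_dirs: Iterable[Path] = ()) -> Optional[str]:
    """Return a path to an ffmpeg executable, or None if none can be found."""
    # shutil.which searches PATH and applies the .exe suffix rules on Windows.
    found = shutil.which("ffmpeg")
    if found:
        return found
    # Fall back to caller-supplied directories (for example a vendored build).
    for directory in extra_dirs:
        found = shutil.which("ffmpeg", path=str(directory))
        if found:
            return found
    return None
```

A caller can treat a None result as a clear "ffmpeg is not installed" error instead of failing later inside the conversion step.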

⚠ Instruction Scope

SKILL.md describes converting provided audio attachments; the code also automatically looks for the most recent .ogg in a hardcoded user directory (~/.openclaw/media/inbound) when no attachment is supplied. This automatic local-file scanning is not clearly described and could read unrelated user audio files. The code uses subprocess.run to call ffmpeg (expected) but will modify the subprocess PATH to include Windows ffmpeg locations.
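The fallback described above, and a stricter alternative that requires an explicit attachment, can be sketched as follows. This is a reconstruction from the description, not the skill's code. newest_ogg, resolve_audio, and the allow_scan flag are hypothetical.

```python
from pathlib import Path
from typing import Optional

INBOUND_DIR = Path.home() / ".openclaw" / "media" / "inbound"


def newest_ogg(directory: Path = INBOUND_DIR) -> Optional[Path]:
    """Roughly what the fallback is described as doing: pick the newest .ogg."""
    candidates = list(directory.glob("*.ogg")) if directory.is_dir() else []
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def resolve_audio(attachment: Optional[str], allow_scan: bool = False) -> Path:
    """Use the attachment when given; scan the inbound folder only on opt-in."""
    if attachment:
        path = Path(attachment)
        if not path.is_file():
            raise FileNotFoundError(f"Attachment not found: {path}")
        return path
    if allow_scan:
        found = newest_ogg()
        if found is not None:
            return found
    raise ValueError("No audio attachment supplied; refusing to scan local files")
```

With allow_scan left at its default of False, a trigger without an attachment fails loudly instead of silently reading whatever audio was saved last.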

ℹ Install Mechanism

No install spec (instruction-only), so nothing is fetched/installed by the platform. The code depends on external packages (faster-whisper, pydub) and on model weights—these are not provided and are likely downloaded by the faster-whisper/Hugging Face machinery at runtime, which is network activity not documented in SKILL.md's 'offline' claim.

✓ Credentials

The skill requests no environment variables or credentials and does not require unusual system config access. It does expect ffmpeg to be installed and accessible (and tries Windows-specific paths). It temporarily adjusts PATH for the subprocess but does not persist credentials or require secrets.
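A temporary PATH adjustment that affects only the child process can be sketched as below: the environment is copied, modified, and passed to the subprocess, so the parent's PATH never changes. env_with_extra_path is a hypothetical helper, and the directory in the usage note is an illustrative assumption.

```python
import os
from pathlib import Path


def env_with_extra_path(extra_dir: Path) -> dict[str, str]:
    """Return a copy of the environment with extra_dir prepended to PATH."""
    env = os.environ.copy()  # the parent's os.environ is left untouched
    parts = [str(extra_dir)]
    if env.get("PATH"):
        parts.append(env["PATH"])
    env["PATH"] = os.pathsep.join(parts)
    return env
```

The result would be passed as subprocess.run([...], env=env_with_extra_path(Path("/opt/ffmpeg/bin"))), which scopes the change to that single call.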

✓ Persistence & Privilege

The always setting is false, and the skill does not modify other skills or system-wide configs. It can be invoked autonomously (platform default), and SKILL.md suggests automatic triggering on voice messages. Combined with its automatic local media scanning, this increases the chance that it will read local audio without an explicit attachment, but it does not request elevated or persistent privileges.

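How to use

Make sure OpenClaw is installed (locally or via Docker).
In the chat box, enter the install command: /install speech2text
Once installation completes, invoke the Skill by name or trigger it with /speech2text.
Provide the required inputs as described in the Skill's parameter documentation to get structured output.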

Version history

v1.0.0

Initial release of the speech-to-text skill.

- Automatically transcribes voice messages to text in Chinese.
- Supports multiple audio formats (ogg, wav, mp3, m4a) with automatic conversion via ffmpeg.
- Works offline using the Faster-Whisper model.
- Simple installation with Python packages: faster-whisper, pydub (requires ffmpeg in system PATH).
- Configurable model size and default recognition language.

Metadata

Slug speech2text

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Version count 1

FAQ

What is speech2text?

speech2text automatically converts speech messages in ogg/wav/mp3/m4a formats to text using offline Faster-Whisper with ffmpeg format conversion. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 108 times to date.

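How do I install speech2text?

Run the command "/install speech2text" in the OpenClaw or Claude Code chat box to install it in one step. The listing states that no extra configuration is needed, but the skill's Python packages (faster-whisper, pydub) and ffmpeg must be installed separately.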

Is speech2text free?

Yes. speech2text is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

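Which platforms does speech2text support?

speech2text is listed as cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed. Note that the code only checks Windows ffmpeg paths, so Linux and macOS may need adjustments before ffmpeg is found.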

Who developed speech2text?

It is developed and maintained by lqwall26 (@lqwall26). The current version is v1.0.0.