
video-stt

Name: video-stt
Author: damiencronw

By damienCronw · GitHub · v1.0.0

cross-platform ⚠ suspicious

Total downloads: 366

Install in OpenClaw

/install video-stt

Description

Extract audio from video URLs and transcribe using STT (Speech-to-Text). Supports local Whisper or cloud APIs. Use when: user provides a video URL and wants...

Security recommendations

This skill largely does what it claims (downloads video audio and runs local Whisper transcription), but there are several red flags you should consider before using it:

- The skill advertises cloud API support and shows environment variable names (OPENAI_API_KEY, etc.) in the README, but the provided scripts do not implement cloud transcription; the shell script will exit on --api. Do not export API keys solely because the docs mention them unless you inspect and trust updated code.
- The scripts will attempt to install dependencies at runtime: they call Homebrew ('brew install') and install Python packages via 'uv pip install'. On non-macOS systems 'brew' may not exist and automatic installation may fail or be undesirable. Review and run installs manually in a controlled environment.
- The bash script builds a python -c one-liner embedding variables (MODEL, FORMAT, OUTPUT_FILE, AUDIO_PATH) without robust escaping. If you (or an agent) pass untrusted values into those CLI arguments, there's a risk of shell/command injection. Prefer running the Python code from a file with properly passed arguments or sanitize inputs.
- The skill will download arbitrary URLs you give it (via yt-dlp) and write audio/transcript files to the skill folder; only run it on content and URLs you trust and in a sandbox if possible.

Recommendations:

- Inspect the scripts locally, remove or modify the inline python -c usage to a safer invocation, and remove automatic 'brew install' calls or gate them with an explicit prompt.
- If you need cloud transcription, either implement the API flow securely (and only then provide API keys) or avoid setting API keys.
- Run this skill in a controlled environment (container or VM) the first time so you can observe its behavior and confirm it doesn't attempt unexpected network access or installs.

Because of the documentation/code mismatches and the unsafe variable embedding, treat this skill with caution: it is useful but not turnkey-safe without review.
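One way to follow the advice about running the Python code from a file with properly passed arguments is sketched below. It is not the skill's actual code: the file name `transcribe.py`, the flag names (`--model`, `--format`, `--output`), and the `txt`/`srt`/`vtt`/`json` identifiers are assumptions chosen for illustration, and the transcription step itself is left out.

```python
import argparse

# Assumed allowlists: common Whisper model sizes and the four output
# formats named in the release notes. The real scripts may differ.
MODELS = ("tiny", "base", "small", "medium", "large")
FORMATS = ("txt", "srt", "vtt", "json")


def build_parser() -> argparse.ArgumentParser:
    """CLI for a hypothetical transcribe.py that replaces the inline one-liner."""
    parser = argparse.ArgumentParser(prog="transcribe.py")
    parser.add_argument("audio_path")
    parser.add_argument("--model", choices=MODELS, default="base")
    parser.add_argument("--format", choices=FORMATS, default="txt")
    parser.add_argument("--output", default=None)
    return parser


def parse_args(argv):
    # Values arrive as data in argv; nothing is spliced into source code,
    # and argparse rejects any model or format outside the allowlists.
    return build_parser().parse_args(argv)
```

With a parser like this, the shell script could call `python3 transcribe.py "$AUDIO_PATH" --model "$MODEL" --format "$FORMAT"`, so a crafted value is at worst rejected as an invalid choice rather than interpreted as Python.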

Analysis

Type: OpenClaw Skill
Name: video-stt
Version: 1.0.0

The skill bundle provides video-to-text transcription using yt-dlp and Whisper, but contains a significant command injection vulnerability in `scripts/stt.sh`. The script interpolates shell variables (like $MODEL and $FORMAT) directly into a Python one-liner (`python3 -c`), which could allow arbitrary code execution if the input is crafted. Additionally, `scripts/stt.py` aggressively attempts to install system dependencies using `brew install` without user confirmation. While these appear to be poor coding practices rather than intentional malware, they represent high-risk behaviors.
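To see why interpolating a value into `python3 -c` source is risky, the following sketch mimics the pattern and inspects the result with the `ast` module instead of executing it. The one-liner template and the function names are illustrative assumptions, not a copy of `scripts/stt.sh`.

```python
import ast


def render_one_liner(model: str) -> str:
    """Mimic splicing a shell variable into Python source text.

    Illustrative only: the template is an assumption, not the skill's code.
    """
    return f"import whisper; m = whisper.load_model('{model}')"


def called_names(source: str) -> set:
    """Parse (never execute) the source and collect names of called functions."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                names.add(func.id)
            elif isinstance(func, ast.Attribute):
                names.add(func.attr)
    return names


benign = render_one_liner("base")
# A value containing a quote closes the string literal and appends a statement.
crafted = render_one_liner("base'); print('injected")
```

Here `called_names(benign)` contains only `load_model`, while `called_names(crafted)` also contains `print`: the crafted value has become an extra statement in the program. In a real attack that statement could be anything the interpreter can run.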

Capability assessment

⚠ Purpose & Capability

The skill's name and description match the included scripts (download audio, then transcribe). However, the registry metadata declares no required binaries or environment variables, while SKILL.md and the scripts clearly require yt-dlp, ffmpeg, Python/uv, and optionally cloud API keys. SKILL.md advertises cloud APIs (OpenAI/Azure/Google), but the provided scripts implement only local Whisper; the shell script exits with 'Cloud API mode not implemented' if --api is used. This mismatch between the documentation and the actual code is an inconsistency.

⚠ Instruction Scope

Runtime instructions tell the agent/user to run the bundled shell script which will download arbitrary URLs and run local transcription. The scripts will attempt to install missing tools (see check_dependencies -> brew install in stt.py, and uv pip install whisper in both scripts). The bash script builds and injects shell variables directly into a python -c one-liner (MODEL, FORMAT, OUTPUT_FILE, AUDIO_PATH) without escaping; that can lead to command/argument injection if untrusted values are passed. The scripts do not exfiltrate data to external endpoints, but they do download remote video content and may call out to PyPI/brew to install packages.
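Because the scripts download whatever URL they are handed, a caller may want to screen URLs before passing them on. The check below is a minimal sketch of that idea under stated assumptions: the function name, the http/https restriction, and the optional host allowlist are hypothetical and not part of the skill.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_acceptable_url(url: str, allowed_hosts=None) -> bool:
    """Return True if url is http(s), has a host, and passes the optional allowlist."""
    # A leading dash could be mistaken for a command-line option downstream.
    if url.startswith("-"):
        return False
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    if allowed_hosts is not None and parts.hostname not in allowed_hosts:
        return False
    return True
```

A check like this narrows what gets downloaded but does not make the content itself trustworthy, so running in a sandbox remains worthwhile.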

ℹ Install Mechanism

There is no formal install spec (instruction-only), which lowers systemic install risk. The code will, however, trigger package installs at runtime: the Python script may call 'brew install' for missing system binaries, and both scripts install Python packages via 'uv pip install whisper'. These are standard package installs (Homebrew/PyPI) — not downloads from arbitrary URLs — but invoking 'brew' without platform checks or consent is fragile and potentially disruptive on non-macOS systems.
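The platform check and consent step that the review finds missing could be sketched as follows. This is an assumption-laden illustration, not the logic in `stt.py`: `plan_install` and its status strings are hypothetical, the function only returns a decision and never runs a command, and limiting Homebrew to macOS is a simplification.

```python
import platform
import shutil


def plan_install(tool, consent, system=None, which=shutil.which):
    """Decide what to do about a missing tool without running anything.

    Returns a (status, command) pair; command is an argv list or None.
    """
    system = system or platform.system()
    if which(tool) is not None:
        return ("present", None)
    # Assumption: only offer Homebrew on macOS, and only if brew is on PATH.
    if system != "Darwin" or which("brew") is None:
        return ("manual", None)
    if not consent:
        return ("declined", None)
    return ("install", ["brew", "install", tool])
```

The caller would show the returned command to the user and run it only on an "install" result, for example with `subprocess.run(command, check=True)`.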

⚠ Credentials

SKILL.md documents optional environment variables (OPENAI_API_KEY, SILICONFLOW_API_KEY) for cloud usage, but the included scripts do not implement cloud API flows (the shell script refuses --api). The registry metadata declares no required env vars; the docs requesting API keys are therefore inconsistent. Asking users to set API keys in docs without code that uses them is confusing and could lead to accidental credential exposure if users set secrets expecting cloud support.

✓ Persistence & Privilege

The skill does not request persistent or platform-wide privileges (always:false). It creates local directories under the skill script directory (audio/ and output/) and may create a local virtualenv (.venv). It doesn't modify other skills or global agent settings.

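How to use

Make sure OpenClaw is installed (locally or via Docker).
Enter the install command in the chat box: /install video-stt
Once installed, invoke the Skill by name or trigger it with /video-stt
Provide the required inputs according to the Skill's parameter description to get structured output.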

Version history

v1.0.0

- Initial release of the video-stt skill.
- Extracts audio from video URLs and transcribes speech to text.
- Supports both local Whisper models and multiple cloud APIs (OpenAI, Azure, Google).
- Offers output in plain text, SRT, VTT, or JSON formats.
- Includes command-line and Python usage instructions with environment setup guidance.

Metadata

Slug: video-stt

Version: 1.0.0

License: not specified

Total installs: 1

Current installs: 1

Versions: 1

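FAQ

What is video-stt?

Extract audio from video URLs and transcribe using STT (Speech-to-Text). Supports local Whisper or cloud APIs. Use when: user provides a video URL and wants... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 366 total downloads to date.

How do I install video-stt?

Run the command "/install video-stt" in the OpenClaw or Claude Code chat box for a one-click install. The install step itself needs no extra configuration, but SKILL.md and the scripts require yt-dlp, ffmpeg, and Python/uv at runtime.

Is video-stt free?

Yes. The listing describes video-stt as completely free (free and open source) to download, install, and use, although the metadata lists no license.

Which platforms does video-stt support?

video-stt is listed as cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed. Note that its runtime 'brew install' calls may fail on non-macOS systems.

Who developed video-stt?

It is developed and maintained by damienCronw (@damiencronw). The current version is v1.0.0.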