
Jarvis-Video-STT

Name: Jarvis-Video-STT
Author: chongjie-ran

by chongjie-ran · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ suspicious


Install in OpenClaw

/install jarvis-video-stt

Description

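Jarvis-Video-STT is a batch video speech-to-text tool built on Faster-Whisper, with multi-process parallelism, progress bars, and summary reports.

Trigger scenarios:
- The user needs to convert speech in videos to text or subtitles
- Multiple videos need to be processed in a batch
- SRT subtitles or plain text need to be generated
- A processing report is needed to review result statistics

Usage:
1. Confirm that ...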

README (SKILL.md)

Jarvis-Video-STT Skill

Quick Start

1. Install dependencies

pip install faster-whisper tqdm
Make sure ffmpeg is installed (brew install ffmpeg on macOS).

2. Basic usage

medium mode (high accuracy, recommended):
python ~/.openclaw/workspace-researcher/tools/jarvis-video-stt/batch_whisper.py -i videos/*.mp4 -o results -m medium

small mode (fast):
python ~/.openclaw/workspace-researcher/tools/jarvis-video-stt/batch_whisper.py -i videos/*.mp4 -o results -m small

Specify the language (slightly faster):
python batch_whisper.py -i videos/ -o results -m medium -l zh

Adjust the number of parallel workers:
python batch_whisper.py -i videos/ -o results -w 4

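3. Parameters

Parameter	Short	Description	Default
--input	-i	Video path / folder / wildcard	required
--output	-o	Output directory	output
--model	-m	small/medium	medium
--language	-l	Language code; None = auto-detect	None
--workers	-w	Number of parallel processes	3
--cpu	-	Force CPU	False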
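For illustration, the table above could map onto an argparse parser roughly as follows. This is a minimal sketch based only on the documented flags and defaults, not the actual parser in batch_whisper.py; the function name `build_parser` and the use of `nargs="+"` (so that a shell-expanded `videos/*.mp4` is accepted as several paths) are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="batch_whisper.py")
    # Assumption: one or more paths, since the shell expands wildcards
    # such as videos/*.mp4 into several arguments.
    parser.add_argument("-i", "--input", nargs="+", required=True,
                        help="video path(s), folder, or wildcard")
    parser.add_argument("-o", "--output", default="output",
                        help="output directory")
    parser.add_argument("-m", "--model", default="medium",
                        help="model size, e.g. small or medium")
    parser.add_argument("-l", "--language", default=None,
                        help="language code; omit for auto-detection")
    parser.add_argument("-w", "--workers", type=int, default=3,
                        help="number of parallel processes")
    parser.add_argument("--cpu", action="store_true",
                        help="force CPU inference")
    return parser


args = build_parser().parse_args(["-i", "videos/", "-w", "4"])
print(args.model, args.workers, args.cpu)  # medium 4 False
```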

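4. Output files

Generated for each video:

<video name>.srt - subtitles with timestamps
<video name>.txt - plain text

Generated for the whole run:

report.json - JSON summary report
report.md - Markdown summary report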
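To make the two per-video formats concrete, here is a minimal sketch of turning transcription segments into SRT and plain text. It assumes segments are available as `(start_seconds, end_seconds, text)` tuples; the helper names `srt_timestamp`, `to_srt`, and `to_plain_text` are hypothetical, and the script's own output code may differ.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start, end, text) tuples as numbered SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


def to_plain_text(segments):
    """Render only the text of each segment, one segment per line."""
    return "\n".join(text.strip() for _, _, text in segments)


segments = [(0.0, 2.5, " Hello"), (2.5, 5.0, " world")]
print(to_srt(segments))
print(to_plain_text(segments))
```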

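Performance Reference

Model	One-hour video (single process)	Recommended parallelism
small	~2 minutes	4 processes
medium	~5 minutes	3 processes
large-v3	~8 minutes	2 processes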

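Use Cases

Transcribing course videos
Generating subtitles for films and documentaries
Transcribing podcasts and interviews
Analyzing short-video content
Preprocessing for video content retrieval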

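Troubleshooting

Q: Error saying faster-whisper cannot be found?
pip install faster-whisper

Q: Error saying ffmpeg cannot be found?
brew install ffmpeg (macOS)
apt install ffmpeg (Ubuntu)

Q: Running out of GPU memory on a Mac?
Reduce the number of parallel workers: -w 2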
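The first two problems can be detected before a long batch starts. The following is a minimal sketch of such a pre-flight check; the function name `missing_dependencies` is hypothetical and nothing like it is known to exist in batch_whisper.py. The lookup functions are parameters only so the check can be exercised without the tools installed.

```python
import importlib.util
import shutil


def missing_dependencies(which=shutil.which,
                         find_spec=importlib.util.find_spec):
    """Return the names of required tools and packages that are not found."""
    missing = []
    # ffmpeg is an external binary, so look it up on PATH.
    if which("ffmpeg") is None:
        missing.append("ffmpeg")
    # The Python packages are importable as faster_whisper and tqdm.
    for module in ("faster_whisper", "tqdm"):
        if find_spec(module) is None:
            missing.append(module)
    return missing


print(missing_dependencies() or "all dependencies found")
```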

Usage Guidance

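This is an internally consistent batch video transcription tool, but note the following before installing or using it:
- faster-whisper downloads model weights from a remote repository the first time a model is loaded (network traffic and substantial disk usage). If you use a private model, you may need to provide HUGGINGFACE_HUB_TOKEN or a similar credential (the skill itself does not declare or transmit any credentials).
- Transcription can be very resource-intensive (CPU/GPU, memory, disk). Test first on a small batch or on non-sensitive data, and adjust -w (parallelism) and -m (model size) to avoid OOM errors or long-running resource usage. Without a GPU, use --cpu or reduce parallelism.
- The script extracts the audio track with ffmpeg (via os.system) and writes temporary files and result files to the output directory. Make sure the output directory is where you want files written, and back up important data.
- The implementation has potential stability problems (for example, passing an already loaded model object to multiprocessing child processes may cause pickling or cross-process issues). If you hit multiprocessing errors, use -w 1 or modify the code so that each child process loads its own model.

Overall: if you accept the consequences of model downloads and resource consumption, the tool is usable. To run it in a controlled environment (no internet access, limited disk or bandwidth, or sensitive videos), first review and modify the script to meet your policies (for example, download the model manually in advance and point to a local path).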
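The suggestion that each child process load its own model can be sketched with a process-pool initializer. This is an illustration under assumptions, not the script's code: the names `load_model`, `init_worker`, `transcribe_one`, and `transcribe_all` are hypothetical, and the `loader` parameter exists only so the worker logic can be tried without faster-whisper installed. `WhisperModel(...)` and `model.transcribe(...)`, which returns a segment iterator plus an info object, are the faster-whisper calls used.

```python
from concurrent.futures import ProcessPoolExecutor

_MODEL = None  # one model instance per worker process, set by init_worker


def load_model(model_size="medium", device="cpu"):
    """Load a faster-whisper model (imported lazily, inside the worker)."""
    from faster_whisper import WhisperModel
    return WhisperModel(model_size, device=device)


def init_worker(model_size, device, loader=load_model):
    """Runs once in each worker, so no model object is ever pickled."""
    global _MODEL
    _MODEL = loader(model_size, device)


def transcribe_one(audio_path):
    """Transcribe one file with this worker's own model."""
    segments, _info = _MODEL.transcribe(audio_path)
    # Consume the segment iterator here; only plain tuples cross processes.
    return audio_path, [(s.start, s.end, s.text) for s in segments]


def transcribe_all(audio_paths, workers=3, model_size="medium", device="cpu"):
    """Fan the files out over a pool whose workers each load a model."""
    with ProcessPoolExecutor(
        max_workers=workers,
        initializer=init_worker,
        initargs=(model_size, device),
    ) as pool:
        return list(pool.map(transcribe_one, audio_paths))
```

On platforms that start workers with spawn (macOS and Windows by default), `transcribe_all` should be called from under an `if __name__ == "__main__":` guard. Each worker holds its own copy of the model, so memory use grows with the worker count, which is consistent with the advice to lower -w when memory is tight.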

Capability Analysis

Type: OpenClaw Skill
Name: jarvis-video-stt
Version: 1.0.0

The script batch_whisper.py contains a shell injection vulnerability in the extract_audio function, where os.system is used to execute ffmpeg commands with unvalidated file paths. While the tool's functionality for batch video-to-text transcription using faster-whisper is consistent with its documentation in SKILL.md, the use of f-strings to construct shell commands without sanitization allows for potential arbitrary command execution via malicious filenames. No evidence of intentional malice or data exfiltration was detected.
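One common way to avoid this class of problem is to pass ffmpeg an argument list through subprocess instead of building a shell string. The sketch below shows the idea; it is not a patch to the actual extract_audio function, whose signature is not shown here. The helper name `build_ffmpeg_command` and the mono 16 kHz audio settings are assumptions.

```python
import subprocess


def build_ffmpeg_command(video_path, wav_path):
    """Build an argv list; each path stays a single argument, never shell text."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", str(video_path),   # input video
        "-vn",                   # drop the video stream
        "-ac", "1",              # assumption: mono audio
        "-ar", "16000",          # assumption: 16 kHz sample rate
        str(wav_path),
    ]


def extract_audio(video_path, wav_path):
    """Run ffmpeg without a shell; raises CalledProcessError on failure."""
    subprocess.run(
        build_ffmpeg_command(video_path, wav_path),
        check=True,
        capture_output=True,
    )


# A hostile filename is passed through verbatim as one argument.
cmd = build_ffmpeg_command('videos/a"; rm -rf ~; ".mp4', "tmp/a.wav")
print(cmd[3])
```

Because no shell parses the command, quotes and semicolons in a filename have no special meaning.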

Capability Assessment

✓ Purpose & Capability

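The skill name, description, SKILL.md, and the behavior of batch_whisper.py are consistent: extract the audio track from videos (ffmpeg), transcribe it with faster-whisper, and output SRT/TXT files and reports. The skill requests no credentials, system paths, or binary dependencies unrelated to transcription.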

ℹ Instruction Scope

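The run instructions only require pip install faster-whisper tqdm and an ffmpeg installation, and the example commands match the script. Note that faster-whisper downloads model weights from a remote host (such as Hugging Face) the first time a model is loaded, which SKILL.md does not state explicitly. The download and model loading involve network traffic, significant disk usage, and heavy computation.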

✓ Install Mechanism

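There is no install spec (only descriptive dependency installation), and installing faster-whisper and tqdm with pip is common practice. Nothing is downloaded from untrusted URLs and no executables are unpacked. The main risk comes from the automatic download of model weights from a public model hosting service.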

✓ Credentials

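The skill requires no environment variables or credentials, and the code does not access sensitive environment variables. The only relevant external access is the model weight download: public models need no credentials, and a Hugging Face token is needed only if you use a private model.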

✓ Persistence & Privilege

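The skill does not set always and does not modify other skills or system configuration. Its file I/O is limited to the specified output directory (including a temporary subdirectory), and the privileges it needs are proportionate to its function.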

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install jarvis-video-stt
After installation, invoke the skill by name or use /jarvis-video-stt
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release of jarvis-video-stt:
- Batch video-to-text transcription tool based on Faster-Whisper.
- Supports multi-process parallelism, progress bar, and summary report generation.
- Outputs subtitles (.srt), plain text (.txt), and both machine-readable (.json) and human-readable (.md) summary reports.
- Supports common video formats: MP4, MKV, AVI, MOV.
- Includes command-line options for model size, language selection, and parallelism.
- Designed for use cases like subtitle generation, transcription, and content preprocessing.

Metadata

Slug jarvis-video-stt

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Jarvis-Video-STT?

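Jarvis-Video-STT is a batch video speech-to-text tool built on Faster-Whisper, with multi-process parallelism, progress bars, and summary reports. Trigger scenarios: the user needs to convert speech in videos to text or subtitles; multiple videos need to be processed in a batch; SRT subtitles or plain text need to be generated; a processing report is needed to review result statistics. Usage: 1. Confirm that ... It is an AI Agent Skill for Claude Code / OpenClaw, with 92 downloads so far.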

How do I install Jarvis-Video-STT?

Run "/install jarvis-video-stt" in the OpenClaw or Claude Code chat to install the skill in one step. Its dependencies (faster-whisper, tqdm, and ffmpeg) are installed separately, as described in the README.

Is Jarvis-Video-STT free?

Yes, Jarvis-Video-STT is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Jarvis-Video-STT support?

Jarvis-Video-STT is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Jarvis-Video-STT?

It is built and maintained by chongjie-ran (@chongjie-ran); the current version is v1.0.0.
