
Tom Video Understanding

Name: Tom Video Understanding
Author: tomuiv

Author: TOMUIV · GitHub · v1.0.0 · MIT-0

Cross-platform · ✓ Passed security check

Total downloads: 178

Current installs: 0

Versions: 1

Install in OpenClaw

/install tom-video-understanding

Description

A local video-comprehension skill: ffmpeg extracts the audio track and frames, FunASR performs speech recognition, and qwen3-vl (served locally through Ollama) handles image understanding.
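The extraction step can be sketched as follows. This is a minimal illustration, not the skill's actual SKILL.md commands: the function name `build_ffmpeg_commands`, the output file names, the 1 frame-per-second default, and the 16 kHz mono audio settings are assumptions (16 kHz mono is a common input format for Chinese ASR models; check the model card of the FunASR model you actually use). The function only builds argument lists, so it runs without ffmpeg installed.

```python
from pathlib import Path
from typing import Dict, List


def build_ffmpeg_commands(video: str, out_dir: str, fps: float = 1.0) -> Dict[str, List[str]]:
    """Build (but do not run) ffmpeg argument lists for audio and frame extraction."""
    out = Path(out_dir)
    audio_cmd = [
        "ffmpeg", "-y",      # -y: overwrite existing outputs without prompting
        "-i", video,
        "-vn",               # drop the video stream
        "-ac", "1",          # downmix to one channel (mono)
        "-ar", "16000",      # resample to 16 kHz
        str(out / "audio.wav"),
    ]
    frames_cmd = [
        "ffmpeg", "-y",
        "-i", video,
        "-vf", f"fps={fps}",          # keep `fps` frames per second of video
        str(out / "frame_%04d.jpg"),  # numbered JPEGs: frame_0001.jpg, frame_0002.jpg, ...
    ]
    return {"audio": audio_cmd, "frames": frames_cmd}
```

To execute the steps, create the output directory and pass each list to `subprocess.run(cmd, check=True)`. Argument lists avoid the shell-quoting problems that a single command string has with file names containing spaces.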

Security recommendations

This skill appears to do what it claims: it extracts audio and frames and runs local models. Before using it:
(1) make sure ffmpeg, conda, and Ollama are installed, and that you trust the sources the models will be downloaded from (model downloads use the network and can be large);
(2) if you plan to use the optional cloud LLM step, confirm which endpoint and credentials you will use, and avoid sending sensitive video or audio to unknown cloud services;
(3) adapt the Windows-specific ModelScope cache path and any example mirrors to your environment (the OLLAMA_BASE_URL example in the README is a placeholder);
(4) verify the provenance of the Ollama model (qwen3-vl) and the FunASR model identifiers before pulling them.
If the skill ever asks to read unrelated system files, requests unrelated credentials, or includes opaque download URLs or install scripts, treat it as suspicious and do not run it without a deeper review.
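Points (2) and (3) can be partly automated. The sketch below assumes the skill reads its Ollama endpoint from an `OLLAMA_BASE_URL` variable, as the README example suggests (the Ollama server itself is configured with `OLLAMA_HOST`). It falls back to Ollama's default local address and flags endpoints that are not on the loopback interface. Both function names are hypothetical.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse

# Hosts this sketch treats as "local", i.e. data never leaves the machine.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}
DEFAULT_BASE_URL = "http://localhost:11434"  # Ollama's default listen address


def resolve_ollama_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Read OLLAMA_BASE_URL (assumed variable name) or fall back to the local default."""
    env = os.environ if env is None else env
    return env.get("OLLAMA_BASE_URL", DEFAULT_BASE_URL).rstrip("/")


def is_local_endpoint(base_url: str) -> bool:
    """Return True only if the URL's host is a loopback address."""
    return urlparse(base_url).hostname in LOCAL_HOSTS
```

A caller could refuse to send frames or audio when `is_local_endpoint(resolve_ollama_base_url())` is false. The check is deliberately conservative: a trusted Ollama server elsewhere on your LAN is flagged too.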

Analysis

Type: OpenClaw Skill
Name: tom-video-understanding
Version: 1.0.0

The skill bundle provides legitimate instructions for local video processing using ffmpeg, FunASR, and Ollama. Although SKILL.md hardcodes environment paths (e.g., C:/Users/TOM/.cache/modelscope) and directs the agent to run shell commands for media extraction, these actions align directly with the stated purpose of video understanding and show no indicators of malicious intent or data exfiltration.

Capability assessment

✓ Purpose & Capability

The name/description (local video comprehension) matches the instructions: ffmpeg for audio/frames extraction, FunASR for Chinese ASR, and qwen3-vl via Ollama for image understanding. None of the required actions or tools appear unrelated to the stated purpose.
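For the image-understanding step, a request to Ollama's `/api/generate` endpoint can carry base64-encoded images in an `images` field. The sketch below assumes a local Ollama server on the default port and a model pulled under the name `qwen3-vl` (the exact tag depends on what you pulled, for example a size-specific variant). The helper names and the prompt are hypothetical, and only the standard library is used.

```python
import base64
import json
import urllib.request
from pathlib import Path


def build_frame_request(image_bytes: bytes, prompt: str, model: str = "qwen3-vl") -> dict:
    """Build a non-streaming /api/generate payload carrying one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one JSON object instead of a stream of chunks
    }


def describe_frame(frame_path: str, prompt: str,
                   base_url: str = "http://localhost:11434") -> str:
    """POST one frame to a local Ollama server and return the model's text answer."""
    payload = build_frame_request(Path(frame_path).read_bytes(), prompt)
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read())["response"]
```

Separating payload construction from the network call keeps the request format testable without a running server.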

ℹ Instruction Scope

SKILL.md confines actions to extracting audio/frames, running FunASR in a conda environment, and querying a local Ollama model. It does reference a specific ModelScope cache path (C:/Users/TOM/.cache/modelscope) and suggests copying files whose paths contain Chinese characters; both details are Windows- and user-specific and may need adjustment. The doc also allows an optional "Summary/Analysis → Cloud LLM API (if needed)" step, which would send derived data off the machine if used; that step falls outside the local-only flow and should be evaluated separately.
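The Chinese-character workaround mentioned in SKILL.md could look like the sketch below. It assumes the problem is any non-ASCII character in the resolved path; `ascii_safe_copy` and the `vu_` prefix are hypothetical. On Windows the default temp directory sits under the user profile, so if the user name itself contains Chinese characters, pass an ASCII `work_dir` (for example a hypothetical `C:/tmp`).

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def ascii_safe_copy(src: str, work_dir: Optional[str] = None) -> Path:
    """Return `src` if its full path is ASCII; otherwise copy it to an ASCII-named temp file."""
    path = Path(src)
    if str(path.resolve()).isascii():
        return path  # nothing to do: tools can open this path as-is
    # Keep the extension only if it is ASCII itself (e.g. ".mp4").
    suffix = path.suffix if path.suffix.isascii() else ""
    tmp_dir = Path(tempfile.mkdtemp(prefix="vu_", dir=work_dir))
    dest = tmp_dir / ("input" + suffix)
    shutil.copy2(path, dest)  # copies file contents plus metadata such as timestamps
    return dest
```

The caller owns the temporary copy and should delete its directory once ffmpeg and FunASR have finished with it.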

✓ Install Mechanism

This is instruction-only with no install spec or packaged downloads in the skill itself. That reduces risk. Note: models (FunASR/ModelScope and qwen3-vl) and Ollama are expected to be pulled/downloaded at runtime by the user, which involves network activity and large binary downloads but is not performed by the skill bundle itself.
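To illustrate when the runtime download happens: with FunASR's `AutoModel` interface, models are fetched (by default from ModelScope) the first time the pipeline is constructed. The sketch assumes the model aliases from the FunASR README (`paraformer-zh`, `fsmn-vad`, `ct-punc`); the skill's SKILL.md may name different models. The import is deferred and the model cached, so the download and load cost is paid once per process and this module loads even without FunASR installed.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def load_asr_model():
    """Build the FunASR pipeline once; the first call may download the models."""
    from funasr import AutoModel  # deferred import: only needed when ASR actually runs

    return AutoModel(
        model="paraformer-zh",  # Chinese speech-recognition model
        vad_model="fsmn-vad",   # voice-activity detection to segment long audio
        punc_model="ct-punc",   # punctuation restoration for the transcript
    )


def transcribe(wav_path: str) -> str:
    """Run ASR on an extracted WAV file and return the recognized text."""
    result = load_asr_model().generate(input=wav_path)
    return result[0]["text"]  # generate() returns a list of dicts, one per input
```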

ℹ Credentials

The skill declares no required environment variables or credentials. The instructions set MODELSCOPE_CACHE to a specific, user-named path (C:/Users/TOM/...); this is an implementation detail rather than a request for secrets, but it assumes, and reveals, a specific user environment. The skill also mentions optionally using a cloud LLM for summaries, which would require credentials and configuration supplied by the user; the skill itself does not request them.
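A portable alternative to the hardcoded `C:/Users/TOM/...` value is to derive the cache location from the current user's home directory, which reproduces the same `.cache/modelscope` layout on any OS. In this hypothetical helper, a value the user has already exported takes precedence. To be safe, call it before importing FunASR or ModelScope.

```python
import os
from pathlib import Path
from typing import MutableMapping, Optional


def configure_modelscope_cache(
    cache_dir: Optional[str] = None,
    env: Optional[MutableMapping[str, str]] = None,
) -> str:
    """Set MODELSCOPE_CACHE to a per-user default unless it is already set."""
    env = os.environ if env is None else env
    if "MODELSCOPE_CACHE" not in env:
        # Same layout as the skill's path, but under whoever runs the process.
        default = Path.home() / ".cache" / "modelscope"
        env["MODELSCOPE_CACHE"] = str(Path(cache_dir) if cache_dir else default)
    return env["MODELSCOPE_CACHE"]
```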

✓ Persistence & Privilege

The skill does not request always-on presence and makes no claims to modify other skills or system-wide configs. It is user-invocable and can be run locally as-needed.

How to use

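Make sure OpenClaw is installed (local or Docker deployment).
Enter the install command in the chat box: /install tom-video-understanding
After installation, invoke the skill by name or trigger it with /tom-video-understanding.
Provide the inputs described in the skill's parameter documentation to get structured output.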

Version history

v1.0.0

Initial release of the video-understanding skill.
- Enables local video content comprehension using ffmpeg, FunASR, and qwen3-vl.
- Extracts audio and key frames from videos via ffmpeg commands.
- Performs local Chinese speech recognition with FunASR.
- Provides detailed image understanding for video frames using qwen3-vl through Ollama.
- Outlines a step-by-step workflow and key prerequisites for setup and usage.

Metadata

Slug: tom-video-understanding
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 1

FAQ

What is Tom Video Understanding?

A local video-comprehension skill that uses ffmpeg to extract audio and frames, FunASR for speech recognition, and qwen3-vl for image understanding. It is an AI agent skill plugin for Claude Code / OpenClaw and has been downloaded 178 times to date.

How do I install Tom Video Understanding?

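Run the command "/install tom-video-understanding" in the OpenClaw or Claude Code chat box to install it in one step. The install itself needs no extra configuration, but the workflow still expects ffmpeg, conda, and Ollama to be installed locally.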

Is Tom Video Understanding free?

Yes. Tom Video Understanding is free and released under the MIT-0 license, so you can download, install, and use it freely.

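Which platforms does Tom Video Understanding support?

Tom Video Understanding is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed. Note, however, that SKILL.md hardcodes a Windows-specific ModelScope cache path (C:/Users/TOM/.cache/modelscope), which needs adjusting on other systems.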

Who develops Tom Video Understanding?

It is developed and maintained by TOMUIV (@tomuiv); the current version is v1.0.0.