← 返回 Skills 市场

Universal Video Analyzer Zh

Name: Universal Video Analyzer Zh
Author: lantianbaicai

作者 lantianbaicai · GitHub ↗ · v1.1.0 · MIT-0

cross-platform ✓ 安全检测通过

121

总下载

当前安装

版本数

在 OpenClaw 中安装

/install universal-video-analyzer-zh

功能描述

通用视频分析器（中文版）- 使用多模态大模型分析视频内容，支持画面识别和语音转文字，生成结构化中文报告。支持豆包、智谱、通义千问等多种模型，用户自行配置 API Key。

使用说明 (SKILL.md)

安装前置依赖

# 1. 安装 Python 依赖
pip install requests openai-whisper torch tenacity Pillow python-dotenv

# 2. 安装 ffmpeg（必需）
# Windows: winget install Gyan.FFmpeg
# macOS: brew install ffmpeg
# Linux: sudo apt install ffmpeg

# 3. 首次运行会自动下载 Whisper 模型（约150MB），请确保网络畅通

⚠️ 数据隐私说明

本技能会将视频关键帧图片（base64编码）和语音转写文本发送到你配置的多模态模型API。请确认：

你接受将视频内容发送到所选模型服务商
敏感/私密视频请谨慎使用，或选择私有化部署的模型
分析结果仅保存在本地，不会上传到其他地方

触发条件

当用户发送视频文件（.mp4, .mov等）并希望分析内容时，自动触发此技能。

执行命令

python doubao_video_analyzer.py "{{video_path}}"

配置说明

必需配置

设置环境变量 VIDEO_ANALYZER_API_KEY 为你的多模态模型 API Key。

可选配置

环境变量	说明	默认值
`VIDEO_ANALYZER_MODEL`	使用的模型名称	`doubao-seed-2-0-pro-260215`
`VIDEO_ANALYZER_BASE_URL`	API 基础地址	`https://ark.cn-beijing.volces.com/api/v3`
`WHISPER_MODEL_DIR`	Whisper 模型本地路径	自动下载

支持的模型示例

模型提供商	MODEL 值	BASE_URL
豆包	`doubao-seed-2-0-pro-260215`	`https://ark.cn-beijing.volces.com/api/v3`
智谱 GLM-4V	`glm-4v-plus`	`https://open.bigmodel.cn/api/paas/v4`
通义千问 VL	`qwen-vl-plus`	`https://dashscope.aliyuncs.com/compatible-mode/v1`

功能特点

✅ 双轨分析：同时分析视频画面 + 语音转文字，生成完整报告 ✅ 模型无关：支持多种多模态模型，用户自由选择 ✅ 结构化输出：自动生成场景、核心信息、亮点等结构化内容 ✅ HTML可视化报告：自动生成精美排版的HTML报告，含关键帧展示、分析结果、语音文字稿 ✅ 国内可用：支持豆包、智谱、通义等国内模型，无需翻墙 ✅ 容错完善：ffmpeg错误检查、API超时保护、跨平台路径兼容

输出文件

每次分析会自动生成以下文件：

{视频名}_分析报告.md — Markdown格式报告
{视频名}_分析报告.html — HTML可视化报告（可直接用浏览器打开）
{视频名}_frames/ — 提取的关键帧图片

安全使用建议

This skill will extract video keyframes and audio text and send them (images as base64 data URIs and the transcript) to whatever model API you configure with VIDEO_ANALYZER_API_KEY and VIDEO_ANALYZER_BASE_URL. If your videos contain sensitive or private content, don't use a third-party hosted API key (or use a self-hosted/private model endpoint). Note the repository's registry metadata omitted the required API key even though SKILL.md and the script require it — double-check you set VIDEO_ANALYZER_API_KEY correctly. Installing dependencies (torch, openai-whisper) can download large packages and Whisper models; ensure you have disk space and bandwidth. Finally, verify the BASE_URL and provider you point the API key to (default is a Volcengine endpoint) before sending sensitive data, and prefer temporary or scoped API keys where possible.

功能分析

Type: OpenClaw Skill Name: universal-video-analyzer-zh Version: 1.1.0 The skill is a legitimate video analysis tool that extracts frames and audio using ffmpeg and the Whisper library, then sends the data to a user-configured AI model (such as Doubao, GLM, or Qwen) for summarization. The Python script (doubao_video_analyzer.py) implements safe subprocess handling, uses temporary directories for intermediate files, and provides clear privacy disclosures in SKILL.md. No evidence of data exfiltration, malicious persistence, or prompt injection was found.

能力评估

ℹ Purpose & Capability

The name/description (video analyzer using multimodal models) matches the code and SKILL.md: the script extracts frames, transcribes audio with Whisper, and calls a multimodal /chat/completions endpoint. One inconsistency: the registry metadata reported "Required env vars: none", but both SKILL.md and doubao_video_analyzer.py require VIDEO_ANALYZER_API_KEY (and allow optional vars). This mismatch in registry metadata should be corrected but does not change the skill's purpose.

ℹ Instruction Scope

The runtime instructions are narrowly scoped to: install dependencies, install ffmpeg, run the script on a video file. The script reads environment variables (VIDEO_ANALYZER_API_KEY, MODEL, BASE_URL, WHISPER-related vars, FRAME_COUNT, etc.) and will upload base64-encoded keyframe images and the transcribed audio text to the configured model API. SKILL.md explicitly warns about this data flow. There is no evidence of hidden or unrelated data collection, but users should note that video frames and transcripts are transmitted to the external model endpoint.

ℹ Install Mechanism

There is no automated install spec (instruction-only for OpenClaw) and the code is included. The SKILL.md instructs installing Python packages (torch, openai-whisper, etc.) and ffmpeg; these are expected for local transcription and image processing. Whisper model downloads (~150MB) are performed at runtime—this is large but expected. No suspicious or remote arbitrary binary downloads were found in the package itself.

✓ Credentials

The only required secret is an API key for the multimodal model (VIDEO_ANALYZER_API_KEY), which is appropriate for a skill that calls third-party model APIs. Optional environment variables control model name, base URL, and local whisper model location. The skill does not request unrelated credentials.

✓ Persistence & Privilege

The skill does not request always:true, does not modify other skills or system-wide settings, and does not try to persist elevated privileges. It does recommend ways to store environment variables (including persistent environment variables), which is a convenience suggestion but not a dangerous privilege escalation.

如何使用

确保已安装 OpenClaw（本地或 Docker 部署）
在对话框中输入安装命令：/install universal-video-analyzer-zh
安装完成后，直接呼叫该 Skill 的名称或使用 /universal-video-analyzer-zh 触发
根据 Skill 的参数说明提供必要输入，即可获得结构化输出

版本历史

v1.1.0

修复：添加必需环境变量声明、安装前置依赖说明、数据隐私提示

v1.0.0

universal-video-analyzer-zh 1.0.0 首发版发布 - 支持多模态大模型视频内容分析，自动识别画面与语音并生成结构化中文报告 - 用户可自定义并切换豆包、智谱、通义千问等不同模型，配置 API Key 即可 - 自动处理视频（mp4, mov等），识别场景、核心信息与亮点 - 内建 API 超时保护、ffmpeg错误检测和跨平台兼容 - 国内模型支持，无需翻墙

元数据

Slug universal-video-analyzer-zh

版本 1.1.0

许可证 MIT-0

累计安装 0

当前安装数 0

历史版本数 2

常见问题