Video Reader
Author: Qianke Meng · GitHub · v4.1.1 · MIT-0
Total downloads: 122
Favorites: 1
Current installs: 0
Versions: 2
Install in OpenClaw
/install video-reader
Description
Tool-driven video question answering with frame extraction, sub-agent analysis, and audio transcription
Safe-use recommendations
This skill appears to implement a real video QA system, but there are important mismatches and operational risks you should consider before installing or running it:
- Credentials & env vars: The skill bundle and docs mention both local Whisper (faster-whisper) and remote transcription APIs, but videoarm_audio.py currently requires WHISPER_API_KEY (and will call a remote transcription endpoint) unless you run a local whisper server. Do NOT paste your OpenAI/Anthropic/Groq API keys into environment variables for this skill until you confirm whether the skill will use a local model or an external API. Prefer testing first with a non-sensitive account or an isolated environment.
- Missing declared requirements: The manifest declares no required binaries or env vars, yet the code needs ffmpeg (required), optionally yt-dlp (for downloads), and Python packages (opencv, faster-whisper). Install and test in a sandbox or VM and run videoarm-doctor to verify dependencies before giving the skill access to important files.
- Local file writes & cleanup: The skill creates ~/.videoarm and writes logs and cached videos; its cleaner can delete ~/.openclaw/workspace/tmp — that could remove other OpenClaw workspace files. If you care about other workspace data, avoid running the cleaner with broad flags or inspect the cleaner code first.
- Data exposure via sub-agents: The orchestrator spawns sub-agents and writes frame-grid images to workspace tmp for those sub-agents to read. If the video contains sensitive content you do not want shared with remote models, ensure the sub-agent/image tools operate locally and that no remote vision/transcription endpoints are configured.
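The dependency gap described above can be checked by hand before installing anything. The following is a minimal sketch, independent of `videoarm-doctor` (whose exact checks are not documented here). The binary names come from the review text; the module names `cv2` and `faster_whisper` are the usual import names for the opencv and faster-whisper packages and are assumed to be what the skill imports.

```python
import importlib.util
import shutil

# Names taken from the review above; which Python modules the skill
# actually imports is an assumption.
REQUIRED_BINARIES = ["ffmpeg"]
OPTIONAL_BINARIES = ["yt-dlp"]
PYTHON_MODULES = ["cv2", "faster_whisper"]


def check_dependencies() -> dict[str, bool]:
    """Return a mapping of dependency name -> whether it was found locally."""
    found = {}
    for name in REQUIRED_BINARIES + OPTIONAL_BINARIES:
        # shutil.which searches PATH without executing the binary.
        found[name] = shutil.which(name) is not None
    for module in PYTHON_MODULES:
        # find_spec locates a top-level module without importing it.
        found[module] = importlib.util.find_spec(module) is not None
    return found


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Nothing here runs or imports the skill's code, so it is safe to use outside a sandbox.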
What to do next:
1. Inspect videoarm_audio.py and videoarm_local_whisper to confirm whether transcription runs locally or requires an API key in your deployment.
2. Run videoarm-doctor in a safe environment to see what dependencies are missing.
3. If you must provide API keys, create scoped/test keys and run in an isolated account.
4. If you want to use only local models, confirm the local server path and disable WHISPER_API_KEY/BASE_URL.
5. Consider running the skill inside a disposable container/VM to validate behavior and filesystem changes before using on your regular workstation.
Functional analysis
Type: OpenClaw Skill
Name: video-reader
Version: 4.1.1
The VideoARM skill implements a sophisticated video analysis pipeline that utilizes several high-risk capabilities, including spawning sub-agents via `sessions_spawn` for image analysis and executing shell commands through `subprocess` for video processing with `ffmpeg` and `yt-dlp` (e.g., in `videoarm_cli/videoarm_audio.py` and `videoarm_cli/videoarm_download.py`). Notably, the package includes a local Whisper transcription service (`videoarm_local_whisper/server.py`) and a setup script (`videoarm_local_whisper/setup.py`) that establishes persistence on macOS by installing a `launchd` agent. While these features are plausibly necessary for the stated purpose of tool-driven video QA and are documented, the combination of persistence, local server execution, and broad sub-agent orchestration constitutes a significant security footprint without explicit safeguards against common risks like argument injection or unauthorized persistence.
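To illustrate the argument-injection risk mentioned above, here is a hedged sketch of one common safeguard when shelling out to `ffmpeg`. It is not the skill's code; the function name `build_ffmpeg_frame_cmd` is hypothetical. The idea is to pass arguments as a list (never through a shell) and to resolve file paths to absolute form so that a file name beginning with `-` cannot be parsed as an option.

```python
from pathlib import Path


def build_ffmpeg_frame_cmd(video: str, timestamp_s: float, out_png: str) -> list[str]:
    """Build an argv list that extracts one frame from a local video file.

    Hypothetical helper for illustration; not part of the skill.
    """
    # Absolute paths never start with "-", so a name like "-y.mp4"
    # cannot be mistaken for an ffmpeg option.
    src = str(Path(video).resolve())
    dst = str(Path(out_png).resolve())
    return [
        "ffmpeg",
        "-nostdin",                   # do not read interactive input
        "-ss", f"{timestamp_s:.3f}",  # seek before decoding
        "-i", src,
        "-frames:v", "1",             # write a single video frame
        dst,
    ]
```

The resulting list would be passed to `subprocess.run(cmd, check=True)` without `shell=True`, so no part of a file name is interpreted by a shell.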
Capability assessment
Purpose & Capability
The name/description (video question answering with frame extraction and transcription) matches the code and runtime instructions: tools for download, metadata, frame extraction, and audio transcription are present. However the skill metadata declares no required binaries or environment variables while the code clearly expects external binaries (ffmpeg, optional yt-dlp) and reads multiple environment variables (WHISPER_API_KEY, WHISPER_BASE_URL, WHISPER_MODEL, VISION_API_KEY/OPENAI_API_KEY, ANTHROPIC_API_KEY). That mismatch between declared requirements and actual dependencies is unexpected and should be resolved before trusting the skill.
Instruction Scope
SKILL.md confines runtime actions to video download/inspect/extract/transcribe and spawning sub-agents to analyze image grids; it instructs the agent to use /tmp/videoarm_memory.json as single source-of-truth and to spawn isolated sub-agents via sessions_spawn. It does not instruct reading arbitrary system files or exfiltration endpoints. The memory file usage and sub-agent dispatch are explicit and scoped to the skill's purpose.
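The single-file memory pattern described above can be sketched as a small read/modify/write helper. This is an illustration under assumptions: the path is the one named in SKILL.md according to the review, but the function names and the JSON layout shown in the usage note are hypothetical, since the real memory structure is not reproduced on this page.

```python
import json
import os
import tempfile
from pathlib import Path

# Path as reported by the review; everything else here is illustrative.
MEMORY_PATH = Path("/tmp/videoarm_memory.json")


def load_memory(path: Path = MEMORY_PATH) -> dict:
    """Read the memory file, returning an empty dict if it does not exist."""
    if not path.exists():
        return {}
    with path.open("r", encoding="utf-8") as fh:
        return json.load(fh)


def save_memory(memory: dict, path: Path = MEMORY_PATH) -> None:
    """Write the memory file atomically via a temp file in the same directory."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(memory, fh, ensure_ascii=False, indent=2)
    # os.replace swaps the file in one step, so readers never see a partial write.
    os.replace(tmp_name, path)
```

A turn would then look like `memory = load_memory()`, an update such as `memory.setdefault("observations", []).append(...)` (key name assumed), and `save_memory(memory)`. Note that a world-readable file in `/tmp` is visible to other local users, which is worth keeping in mind for sensitive videos.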
Install Mechanism
There is no install spec in the skill manifest (instruction-only), but the bundle includes a full Python package (pyproject.toml, CLI scripts, requirements). That means the package will not be auto-installed by the platform; manual installation is required to get dependencies (opencv, faster-whisper, ffmpeg, yt-dlp). This is reasonable but increases the chance users will miss required system binaries or optional components. No suspicious remote download URLs or archive extraction were found in the install artifacts.
Credentials
The manifest lists no required environment variables or primary credential, yet the code and docs read/expect multiple credential-like env vars (WHISPER_API_KEY, WHISPER_BASE_URL, VISION_API_KEY/OPENAI_API_KEY, ANTHROPIC_API_KEY, HTTPS_PROXY, VIDEOARM_SESSION_ID). In particular, videoarm_audio.py currently requires WHISPER_API_KEY and will return an error if it is not set, contradicting README statements about local faster-whisper working without API keys. Asking for API keys or base URLs (and implicitly supporting OpenAI/Anthropic/Groq endpoints) is reasonable for optional cloud transcription/vision backends, but the skill's manifest does not declare these needs and the code will attempt network API calls when an API key/base URL is supplied — so do not provide secrets until you confirm which backend (local vs remote) will be used.
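Before running the skill, it may help to see which of these variables are already set in the current shell, without printing their values. A minimal sketch follows; the variable names are those listed in the paragraph above, and the helper name `audit_env` is hypothetical.

```python
import os

# Variable names as listed in the review above.
CREDENTIAL_VARS = [
    "WHISPER_API_KEY",
    "WHISPER_BASE_URL",
    "VISION_API_KEY",
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "HTTPS_PROXY",
    "VIDEOARM_SESSION_ID",
]


def audit_env(environ=None) -> dict[str, bool]:
    """Report which variables are set to a non-empty value, never the values."""
    env = os.environ if environ is None else environ
    return {name: bool(env.get(name)) for name in CREDENTIAL_VARS}


if __name__ == "__main__":
    for name, is_set in audit_env().items():
        print(f"{name}: {'set' if is_set else 'not set'}")
```

If a key such as `OPENAI_API_KEY` shows as set because of unrelated tooling, consider launching the skill from a shell where it has been unset.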
Persistence & Privilege
The skill writes logs and cache under ~/.videoarm and creates files under ~/.openclaw/workspace/tmp and /tmp/videoarm_memory.json. The provided cleaning tool (videoarm-clean) can delete files in ~/.openclaw/workspace/tmp and the VideoARM memory file; that may remove other workspace artifacts if run with broad arguments. The skill does not set always:true and does not modify other skills' configs, but its file I/O footprint in user home and OpenClaw workspace is significant and could affect other local agent state if cleaning tools are used carelessly.
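Since the cleaner's exact flags are not documented on this page, one cautious option is to list what lives in the affected locations before running it. The sketch below only reads the filesystem and deletes nothing; the function name `preview_cleanup` is hypothetical, and the paths are the ones named in the paragraph above.

```python
from pathlib import Path


def preview_cleanup(targets: list[Path]) -> list[Path]:
    """List every file under the given targets without deleting anything."""
    files: list[Path] = []
    for target in targets:
        if target.is_file():
            files.append(target)
        elif target.is_dir():
            # rglob walks the directory tree recursively.
            files.extend(p for p in sorted(target.rglob("*")) if p.is_file())
    return files


if __name__ == "__main__":
    home = Path.home()
    candidates = [
        home / ".openclaw" / "workspace" / "tmp",
        Path("/tmp/videoarm_memory.json"),
    ]
    for path in preview_cleanup(candidates):
        print(path)
```

Any file in the output that does not belong to VideoARM is a reason to back it up or to inspect the cleaner code first.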
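How to use
- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install video-reader
- Once installed, invoke the Skill by name or trigger it with /video-reader.
- Supply the inputs described in the Skill's parameter documentation to get structured output.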
Version history
v4.1.1
Fixed 3x2 grid layout with frame numbers, support multiple grids for 30/60+ frames
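As a rough illustration of what "multiple grids" could mean, a 3x2 layout holds six frames, so a longer frame list has to be split across several grid images. The sketch below shows only that arithmetic; the function name is hypothetical, and how the skill actually orders or renders frames is not stated in the changelog.

```python
GRID_CELLS = 3 * 2  # cells per grid, from the "3x2" in the changelog


def split_into_grids(frame_numbers: list[int], cells: int = GRID_CELLS) -> list[list[int]]:
    """Split frame numbers into consecutive groups of at most `cells` frames."""
    return [frame_numbers[i:i + cells] for i in range(0, len(frame_numbers), cells)]
```

Under this assumption, 30 frames would produce 5 grids and 60 frames would produce 10, with a shorter final grid whenever the count is not a multiple of six.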
v4.1.0
Summary: Major redesign—introduces a tool-driven orchestrator architecture for video question answering with strict memory management and sub-agent dispatch.
- Rebranded as "videoarm": a video QA orchestrator that never analyzes images directly, but dispatches sub-agents for all frame and audio analysis.
- Enforces strict read/write of a single memory file (`/tmp/videoarm_memory.json`) as the source of truth on each turn; forbids reliance on prior tool outputs in conversation history.
- Sub-agent pattern standardized: main agent extracts frames or audio, spawns stateless sub-agents for analysis (scene captions, targeted questions), and writes all results to memory.
- Improved reproducibility and architecture: tool outputs go only to memory, allowing context to be fully rebuilt each turn and enabling parallel or isolated sub-agent analysis.
- Clarified use of toolset: defined pipelines for downloading, metadata, frame extraction, audio transcription, and sub-agent dispatch for both visual and audio QA.
- Documented recommended strategies, decision-making, and detailed memory file structure for consistent, traceable workflows.
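FAQ
What is Video Reader?
Tool-driven video question answering with frame extraction, sub-agent analysis, and audio transcription. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 122 total downloads to date.
How do I install Video Reader?
Run the command "/install video-reader" in the OpenClaw or Claude Code chat box for a one-step install, with no additional configuration required.
Is Video Reader free?
Yes. Video Reader is completely free and released under the MIT-0 license; you can download, install, and use it freely.
Which platforms does Video Reader support?
Video Reader is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who develops Video Reader?
It is developed and maintained by Qianke Meng (@qiankemeng); the current version is v4.1.1.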