see-video
Extract frames from a video and inject them as a grid image + XML timestamps into LLM context.
Setup (first time only)
cd <skill directory>
npm install
Usage
node {baseDir}/scripts/inject.mjs <video_path> [--mode uniform|highlight] [--start N] [--end N]
On success, outputs JSON to stdout:
{
  "gridPath": "/tmp/video_llm-frames.jpg",
  "description": "<video_frames>...</video_frames>",
  "duration": 1326,
  "frameCount": 28,
  "layout": { "cols": 4, "rows": 7, "cellW": 384, "cellH": 216 },
  "videoWidth": 854,
  "videoHeight": 480,
  "inputSizeMb": 42.3
}
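As a rough illustration of consuming this output, the sketch below parses the stdout JSON and pulls out the two fields the injection procedure relies on. `parseInjectOutput` is a hypothetical helper written for this example, not something the skill ships, and it assumes stdout contains only the JSON object.

```javascript
// Hypothetical helper (not part of the skill): parse the JSON that
// inject.mjs prints on success and keep the fields needed for injection.
function parseInjectOutput(stdout) {
  const result = JSON.parse(stdout); // throws SyntaxError on non-JSON output
  for (const key of ["gridPath", "description"]) {
    if (typeof result[key] !== "string" || result[key].length === 0) {
      throw new Error(`inject.mjs output is missing "${key}"`);
    }
  }
  return {
    gridPath: result.gridPath,
    description: result.description,
    hint: result.hint ?? null, // only present for long uniform runs
  };
}

// Example with a trimmed copy of the sample output above.
const sampleStdout = JSON.stringify({
  gridPath: "/tmp/video_llm-frames.jpg",
  description: "<video_frames>...</video_frames>",
  duration: 1326,
  frameCount: 28,
});
console.log(parseInjectOutput(sampleStdout).gridPath);
// prints "/tmp/video_llm-frames.jpg"
```

Failing early on a missing `gridPath` or `description` keeps a malformed run from reaching the read step.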
If the video exceeds 10 minutes and uniform mode was used without --start/--end, a hint field is included:
{
"hint": "Video is 30 minutes long. This is a uniform overview. For better scene coverage re-run with --mode highlight, or use --start/--end to zoom into a specific section."
}
Recommended workflow for long videos:
- First run with `--mode highlight`, which shows key scene changes across the whole video.
- If the user wants detail on a specific section, re-run with `--start N --end N`.
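The follow-up decision described above can be sketched as a small function. This is only an illustration: `followUpArgs` and the `section` object are hypothetical names, and the logic assumes the previous run's parsed JSON is available as `result`.

```javascript
// Hypothetical helper: choose the extra CLI arguments for a re-run,
// based on the previous run's parsed output and an optional section
// (in seconds) that the user asked about.
function followUpArgs(result, section) {
  if (section) {
    // The user wants detail on a specific part: zoom in with --start/--end.
    return ["--start", String(section.start), "--end", String(section.end)];
  }
  if (result.hint) {
    // A long uniform overview was produced: switch to highlight mode.
    return ["--mode", "highlight"];
  }
  return null; // no re-run needed
}

console.log(followUpArgs({ hint: "Video is 30 minutes long. ..." }));
// prints [ '--mode', 'highlight' ]
```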
On error, writes `ERROR: <message>` + `Hint: <diagnosis>` to stderr and exits 1.
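A minimal sketch of splitting that stderr text into its two parts follows. `parseInjectError` is a hypothetical helper, and because the exact layout is not specified here, it accepts the hint either on the same line as the error or on the next line.

```javascript
// Hypothetical helper: extract the ERROR message and the Hint diagnosis
// from inject.mjs stderr. The exact layout is an assumption, so both
// same-line and next-line hints are handled.
function parseInjectError(stderr) {
  const message = /ERROR:\s*(.*?)(?=\s*(?:\+\s*)?Hint:|\r?\n|$)/.exec(stderr);
  const hint = /Hint:\s*(.*)/.exec(stderr);
  return {
    message: message ? message[1].trim() : null,
    hint: hint ? hint[1].trim() : null,
  };
}

const parsedError = parseInjectError(
  "ERROR: ffmpeg not found\nHint: Check ffmpeg installation\n"
);
console.log(parsedError.message); // prints "ffmpeg not found"
console.log(parsedError.hint);    // prints "Check ffmpeg installation"
```

Only the `hint` value would normally be rephrased for the user; the raw message is more useful for logging.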
Injection procedure
Step 1 — Run the script (bash tool):
node {baseDir}/scripts/inject.mjs "/path/to/video.mp4"
Step 2 — Parse JSON:
Extract gridPath and description.
Step 3 — Inject image (read tool):
read <gridPath>
The read tool injects the jpg as a native multimodal image block into context.
After viewing the grid, use the description XML timestamps to reference frames:
"Look at the grid image above. Use the timestamps in the description XML to analyze the video. The number in the top-left of each cell is the frame index."
On error:
- Translate the `Hint:` message into natural language for the user. Do not paste raw error output.
- If `read <gridPath>` fails: `/tmp/` files are ephemeral. Re-run the script and read immediately.
Options
| Option | Default | Description |
|---|---|---|
| `--mode uniform` | ✅ | Evenly spaced frames |
| `--mode highlight` | | Scene-change biased sampling |
| `--start N` | `0` | Segment start (seconds) |
| `--end N` | end of video | Segment end (seconds) |
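For callers that assemble the command programmatically, the options above could be turned into an argument array along these lines. `buildInjectArgs` is a hypothetical helper; the check that `end` is greater than `start` is a sanity check added for this sketch, not documented behaviour of the script.

```javascript
// Hypothetical helper: build the argument list for inject.mjs from the
// documented options. Omitted start/end fall back to the script defaults
// (0 and end of video).
function buildInjectArgs(videoPath, { mode = "uniform", start, end } = {}) {
  if (!["uniform", "highlight"].includes(mode)) {
    throw new Error(`unknown mode: ${mode}`);
  }
  // Assumption: a segment must have positive length.
  if (start !== undefined && end !== undefined && end <= start) {
    throw new Error("--end must be greater than --start");
  }
  const args = [videoPath, "--mode", mode];
  if (start !== undefined) args.push("--start", String(start));
  if (end !== undefined) args.push("--end", String(end));
  return args;
}

console.log(buildInjectArgs("/path/to/video.mp4", { mode: "highlight", start: 60, end: 120 }));
// prints [ '/path/to/video.mp4', '--mode', 'highlight', '--start', '60', '--end', '120' ]
```

Passing the result as an array (for example to `child_process.execFile`) rather than joining it into a shell string keeps paths with spaces intact.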
Diagnostics
| Error | Cause | Action |
|---|---|---|
| `Input file not found` | File missing or dropped by channel media size limit | Ask the user to share the file path directly as text |
| `corrupt, incomplete, or unsupported format` | Damaged file, interrupted transfer, or unsupported codec | Try a different file, or use `--start`/`--end` to skip problematic sections |
| `moov atom not found` | Incomplete mp4 (streaming not finished) | Retry with a complete file |
| `ffmpeg not found` | ffmpeg not installed | Check ffmpeg installation |
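The table above can also be expressed as a lookup, which may be handy when choosing what to tell the user. This is a sketch only: `suggestAction` is a hypothetical helper, and matching by case-insensitive substring is an assumption about how the error text appears in stderr.

```javascript
// Diagnostics table as data: error text fragment -> recommended action.
const DIAGNOSTICS = [
  { match: "Input file not found", action: "Ask the user to share the file path directly as text" },
  { match: "corrupt, incomplete, or unsupported format", action: "Try a different file, or use --start/--end to skip problematic sections" },
  { match: "moov atom not found", action: "Retry with a complete file" },
  { match: "ffmpeg not found", action: "Check ffmpeg installation" },
];

// Hypothetical helper: return the action for the first matching row,
// or null when the error is not in the table.
function suggestAction(stderr) {
  const text = stderr.toLowerCase();
  const hit = DIAGNOSTICS.find((d) => text.includes(d.match.toLowerCase()));
  return hit ? hit.action : null;
}

console.log(suggestAction("ERROR: moov atom not found"));
// prints "Retry with a complete file"
```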
Notes
- Frame count and cell size are determined automatically from video duration and aspect ratio
- Grid is ~1500×1500px, cell long side 384–512px
- Timestamps are in the `description` XML only, not overlaid on the image
- Portrait and landscape videos both supported
- Telegram users: if a video file is not attached to the message, check `channels.telegram.mediaMaxMb` in the OpenClaw config, because the file may have been dropped at the channel level before reaching the agent
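To see how the grid-size note relates to the `layout` field in the JSON output, the arithmetic can be checked with a short sketch. `gridSize` is a hypothetical helper, and it assumes the grid is exactly `cols × rows` cells with no padding or borders.

```javascript
// Hypothetical helper: derive overall grid dimensions from the layout
// object. Assumes cells are tiled edge to edge with no padding.
function gridSize({ cols, rows, cellW, cellH }) {
  return { width: cols * cellW, height: rows * cellH, cells: cols * rows };
}

// Layout values from the sample output in the Usage section.
console.log(gridSize({ cols: 4, rows: 7, cellW: 384, cellH: 216 }));
// prints { width: 1536, height: 1512, cells: 28 }
```

For the sample output this gives a 1536×1512 grid with 28 cells, in line with the ~1500×1500px target and the reported `frameCount` of 28.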
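- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: `/install see-video`
- Once installation finishes, invoke the Skill by name or trigger it with `/see-video`
- Provide the required inputs according to the Skill's parameter description to get structured output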
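What is see-video?
Use when the user sends a video file or asks about video content. Extracts frames and injects them as an image grid directly into the LLM context, no proxy... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 105 downloads to date.
How do I install see-video?
Run the command `/install see-video` in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is needed.
Is see-video free?
Yes. see-video is completely free under the MIT-0 license and can be freely downloaded, installed, and used.
Which platforms does see-video support?
see-video is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who developed see-video?
It is developed and maintained by john-ver (@john-ver); the current version is v1.0.0.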