Description

Generate videos from text descriptions using the ZhipuAI CogVideoX-3 model. Supports text-to-video, image-to-video, and first/last frame-to-video generation. Aut...

README (SKILL.md)

LLM Video Generator

Name: llm-video-generator
Author: baokui

Generate videos via ZhipuAI CogVideoX-3. Each API call produces ~5s of video. For longer videos, chain multiple calls using last-frame continuation, then concatenate.

Scripts

All scripts use /opt/anaconda3/bin/python3. Resolve <skill-dir> to this skill's directory.

| Script | Purpose |
| --- | --- |
| `scripts/video_gen.py` | Core generation (3 modes: text2video, image2video, frames2video) |
| `scripts/extract_last_frame.py` | Extract last frame from a video (for continuation) |
| `scripts/concat_videos.py` | Concatenate multiple video segments into one |

Workflow

Step 1: Assess Request & Clarify

Clear request → proceed to Step 2. A request is clear when:

Video content/scene is described with enough detail
Style or visual tone is specified or implied
Duration is stated (default: 5s if not specified)

Vague request → propose a plan first:

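Based on your request, I have drafted the following video plan:

📹 **Video content**: [detailed scene description with key moments]
🎨 **Video style**: [e.g., realistic / animated / cinematic / heartwarming...]
⏱️ **Video duration**: [Xs, note: will be generated in 5s segments]
🔊 **Background music**: yes/no
📐 **Resolution**: 1920x1080
🎞️ **Frame rate**: 30fps

Does this plan work for you? Which parts need adjusting?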

Iterate with the user until confirmed.

Step 2: Estimate Time & Notify User

Before starting generation, calculate and report the estimated time:

Time estimation formula:

Base: 1 minute per second of video (e.g., 20s video ≈ 20 minutes)
High-definition (4K or 60fps): add +30% (e.g., 20s 4K video ≈ 26 minutes)
Additional overhead: ~2 minutes for frame extraction, concatenation, and compression
Segments: ceil(target_duration / 5)

MUST send this message to the user before starting generation:

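⏳ **Video generation estimate**

📊 Segment plan: {N} segments (about 5 seconds each)
⏱️ Estimated total time: about {estimated_minutes} minutes
📐 Resolution: {resolution}

Video generation takes a while, so please be patient. I will report progress in real time as each segment completes.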

Example for a 30s 1080P video:

6 segments, base time = 30 minutes, +2 min overhead → ~32 minutes
Message: "Estimated total time: about 32 minutes"

Example for a 20s 4K video:

4 segments, base time = 20 * 1.3 = 26 min, +2 min → ~28 minutes
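The estimation rules above can be sketched as a small helper. This is only an illustration: the function name `estimate` and the constants are hypothetical, and the code assumes the +30% surcharge is applied once whether the video is 4K, 60fps, or both.

```python
import math

SEGMENT_SECONDS = 5      # each API call yields roughly 5 s of video
OVERHEAD_MINUTES = 2     # frame extraction, concatenation, compression
HD_SURCHARGE = 0.30      # assumed to apply once for 4K or 60 fps

def estimate(target_seconds, size="1920x1080", fps=30):
    """Return (segment_count, estimated_minutes) for a target duration."""
    segments = math.ceil(target_seconds / SEGMENT_SECONDS)
    minutes = float(target_seconds)          # base: 1 minute per second of video
    if size == "3840x2160" or fps == 60:
        minutes *= 1 + HD_SURCHARGE
    return segments, round(minutes + OVERHEAD_MINUTES)

print(estimate(30))                    # 30 s at 1080p -> (6, 32)
print(estimate(20, size="3840x2160"))  # 20 s at 4K    -> (4, 28)
```

Both printed results match the two worked examples above.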

Step 3: Plan Generation Segments

Each API call produces ~5 seconds. Calculate segments: ceil(target_duration / 5)

For multi-segment videos, plan how the content evolves across segments. Write a prompt for each segment describing what happens in that 5-second window, maintaining visual continuity.
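One way to hold the plan is a list with one entry per 5-second window. The sketch below is an assumption about how an agent might organize it, not part of the skill's scripts: `Segment` and `plan_segments` are hypothetical names. It assigns text2video to the first segment and image2video to the rest, matching the continuation approach described in Step 4.

```python
import math
from dataclasses import dataclass

@dataclass
class Segment:
    index: int    # 1-based segment number
    mode: str     # video_gen.py mode used for this segment
    prompt: str   # what happens in this 5-second window

def plan_segments(target_seconds, prompts):
    """Pair one prompt with each 5-second segment of the target duration."""
    needed = math.ceil(target_seconds / 5)
    if len(prompts) != needed:
        raise ValueError(f"expected {needed} prompts, got {len(prompts)}")
    return [
        Segment(i, "text2video" if i == 1 else "image2video", prompt)
        for i, prompt in enumerate(prompts, start=1)
    ]
```

For example, a 12-second target needs three prompts, and `plan_segments(12, [p1, p2, p3])` returns three segments whose modes are text2video, image2video, image2video.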

Step 4: Execute Generation with Progress Reports

CRITICAL: After each segment completes, IMMEDIATELY send a progress message to the user before starting the next segment. Do not wait until all segments are done.

Progress message format (send via message tool or inline reply after each segment):

✅ Progress: {completed}/{total} segments complete (segment {N} generated)
📝 Content: {brief segment description}
⏱️ Time for this segment: {minutes} minutes
📊 Estimated remaining: about {remaining_minutes} minutes
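A minimal sketch of filling in this template follows. The function name `progress_message` is hypothetical, the message text is an English rendering of the template, and the remaining-time figure assumes later segments take about as long as the average of those already finished, which the skill does not specify.

```python
def progress_message(completed, total, segment_index, description,
                     segment_minutes, elapsed_minutes):
    """Fill the progress template; remaining time assumes a steady pace."""
    average = elapsed_minutes / completed            # mean minutes per finished segment
    remaining = round(average * (total - completed))
    return "\n".join([
        f"✅ Progress: {completed}/{total} segments complete (segment {segment_index} generated)",
        f"📝 Content: {description}",
        f"⏱️ Time for this segment: {segment_minutes} minutes",
        f"📊 Estimated remaining: about {remaining} minutes",
    ])
```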

Generation process:

Segment 1 — Text-to-Video:

/opt/anaconda3/bin/python3 <skill-dir>/scripts/video_gen.py text2video \
  --prompt "<segment_1_prompt>" \
  --quality quality --audio true --size 1920x1080 --fps 30 \
  --output-dir <output-dir> --max-wait 900

→ Send progress message to user

Segments 2+ — Image-to-Video (last-frame continuation):

For each subsequent segment:

Extract last frame from the previous segment's video:

/opt/anaconda3/bin/python3 <skill-dir>/scripts/extract_last_frame.py \
  <previous_video.mp4> --output <output-dir>/frame_segN.png

Generate next segment using the last frame as input:

/opt/anaconda3/bin/python3 <skill-dir>/scripts/video_gen.py image2video \
  --prompt "<segment_N_prompt>" \
  --image-url <output-dir>/frame_segN.png \
  --quality quality --audio true --size 1920x1080 --fps 30 \
  --output-dir <output-dir> --max-wait 900

→ Send progress message to user

Repeat for all segments.
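The commands above can be assembled as argument lists, which avoids shell quoting problems when a prompt contains spaces or quotes. This is a sketch only: the helper names are hypothetical, and because the file name that video_gen.py writes is not documented here, the caller supplies `previous_video` explicitly.

```python
PYTHON = "/opt/anaconda3/bin/python3"
COMMON = ["--quality", "quality", "--audio", "true",
          "--size", "1920x1080", "--fps", "30"]

def first_segment_command(skill_dir, output_dir, prompt):
    """Argument list for segment 1 (text2video)."""
    return [PYTHON, f"{skill_dir}/scripts/video_gen.py", "text2video",
            "--prompt", prompt, *COMMON,
            "--output-dir", output_dir, "--max-wait", "900"]

def continuation_commands(skill_dir, output_dir, previous_video, n, prompt):
    """Argument lists for segment n: extract the last frame, then image2video."""
    frame = f"{output_dir}/frame_seg{n}.png"
    extract = [PYTHON, f"{skill_dir}/scripts/extract_last_frame.py",
               previous_video, "--output", frame]
    generate = [PYTHON, f"{skill_dir}/scripts/video_gen.py", "image2video",
                "--prompt", prompt, "--image-url", frame, *COMMON,
                "--output-dir", output_dir, "--max-wait", "900"]
    return extract, generate
```

Each list can be passed to `subprocess.run(cmd, check=True)`, with the progress message sent after every generation command returns.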

Alternative — Frames-to-Video mode:

If you have both a starting and ending image for a segment:

/opt/anaconda3/bin/python3 <skill-dir>/scripts/video_gen.py frames2video \
  --prompt "<description>" \
  --first-frame <first.png> --last-frame <last.png> \
  --quality quality --audio true --size 1920x1080 --fps 30 \
  --output-dir <output-dir>

Step 5: Concatenate Segments

After all segments are generated, combine them:

/opt/anaconda3/bin/python3 <skill-dir>/scripts/concat_videos.py \
  --inputs <seg1.mp4> <seg2.mp4> ... \
  --output <output-dir>/final_video.mp4

If the final file exceeds 25MB (Feishu upload limit), compress with ffmpeg:

ffmpeg -i <input> -c:v libx264 -crf 32 -c:a aac -b:a 96k -vf "scale=1280:720" -y <output>
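The size check can be automated with a short helper. The sketch below is illustrative: `compression_command` is a hypothetical name, and it assumes the 25MB limit means 25 × 1024 × 1024 bytes. It returns the ffmpeg argument list shown above only when the file is over the limit.

```python
import os

FEISHU_LIMIT_BYTES = 25 * 1024 * 1024   # assumption: "25MB" read as 25 MiB

def compression_command(video_path, compressed_path, limit=FEISHU_LIMIT_BYTES):
    """Return an ffmpeg argument list if the file exceeds the limit, else None."""
    if os.path.getsize(video_path) <= limit:
        return None
    return ["ffmpeg", "-i", video_path,
            "-c:v", "libx264", "-crf", "32",
            "-c:a", "aac", "-b:a", "96k",
            "-vf", "scale=1280:720", "-y", compressed_path]
```

A `None` result means the file can be delivered as is. Otherwise, run the returned list with `subprocess.run(cmd, check=True)`.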

Step 6: Deliver

Share the final video file with the user
For Feishu delivery: use feishu-send-file skill to send the .mp4 file
Final report:

🎬 **Video generation complete!**

⏱️ Total duration: {duration} seconds
📦 File size: {size} MB
📊 {N} segments in total, total time {total_minutes} minutes

Prompt Tips

Use English prompts for best quality (translate Chinese descriptions)
Be specific: scene, camera angle, lighting, motion, atmosphere
Include style keywords: cinematic, realistic, cartoon, watercolor, etc.
For continuation segments, describe the action progression, not the full scene from scratch
Keep each segment prompt concise (1-3 sentences)

Parameters Reference

| Parameter | Flag | Default | Options |
| --- | --- | --- | --- |
| Prompt | `--prompt` | (required) | Descriptive text |
| Quality | `--quality` | `quality` | `quality` / `speed` |
| Audio | `--audio` | `true` | `true` / `false` |
| Resolution | `--size` | `1920x1080` | `1280x720`, `1920x1080`, `3840x2160` |
| Frame rate | `--fps` | `30` | `30` / `60` |
| Output dir | `--output-dir` | `.` | Any writable path |
| Poll interval | `--poll-interval` | `10` | Seconds |
| Max wait | `--max-wait` | `900` | Seconds (default raised for reliability) |
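To validate options before invoking the script, the table can be mirrored in an argparse parser. This is a hypothetical reconstruction from the table alone, not the parser that video_gen.py actually defines. Mode-specific flags such as `--image-url`, `--first-frame`, and `--last-frame` are left out.

```python
import argparse

def build_parser():
    """Hypothetical mirror of the shared options listed in the table."""
    parser = argparse.ArgumentParser(description="Validate video_gen.py options")
    parser.add_argument("mode", choices=["text2video", "image2video", "frames2video"])
    parser.add_argument("--prompt", required=True)
    parser.add_argument("--quality", default="quality", choices=["quality", "speed"])
    parser.add_argument("--audio", default="true", choices=["true", "false"])
    parser.add_argument("--size", default="1920x1080",
                        choices=["1280x720", "1920x1080", "3840x2160"])
    parser.add_argument("--fps", type=int, default=30, choices=[30, 60])
    parser.add_argument("--output-dir", default=".")
    parser.add_argument("--poll-interval", type=int, default=10)   # seconds
    parser.add_argument("--max-wait", type=int, default=900)       # seconds
    return parser
```

Calling `build_parser().parse_args(["text2video", "--prompt", "a quiet harbor"])` yields the defaults from the table, and an unsupported value such as `--fps 24` is rejected before any API call is made.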

Error Handling

Missing ZHIPU_API_KEY: Ask user to set environment variable
Missing zai-sdk: run pip install zai-sdk in the Anaconda environment the scripts use (/opt/anaconda3)
Missing ffmpeg: Required for frame extraction and concatenation
Task timeout: Increase --max-wait or retry; check task status manually via API
Task failed: Simplify the prompt and retry
File too large for Feishu: Compress with ffmpeg (reduce resolution or increase CRF)
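The first three conditions can be checked before any generation starts. The sketch below is an assumption about how such a preflight might look: `preflight` is a hypothetical helper, the import name `zai` is taken from the Usage Guidance section, and ffprobe is included because that section says the scripts call it.

```python
import importlib.util
import os
import shutil

def preflight(env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("ZHIPU_API_KEY"):
        problems.append("ZHIPU_API_KEY is not set")
    if importlib.util.find_spec("zai") is None:      # module provided by zai-sdk
        problems.append("zai-sdk is not installed (pip install zai-sdk)")
    for tool in ("ffmpeg", "ffprobe"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

If the returned list is not empty, report the problems to the user instead of starting Step 4.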

Usage Guidance

This skill looks like a legitimate video-generation wrapper, but several important details are missing or inconsistent:

(1) The code requires ZHIPU_API_KEY, but the skill metadata does not declare any required credentials. Confirm you must provide a Zhipu API key, and only provide it to trusted skills/environments.
(2) The scripts call ffmpeg and ffprobe and use a hard-coded Python interpreter (/opt/anaconda3/bin/python3). Ensure those binaries exist or edit the scripts to point to your environment.
(3) Any local image path you give will be read, base64-encoded, and uploaded to Zhipu's API. Do not pass sensitive files.
(4) The skill depends on a 'zai' Zhipu client library that is not declared. Install and review that dependency before running.

If you plan to use this skill, run it in a sandbox or isolated environment first, verify/patch the shebangs and paths, and ask the publisher to update the manifest to declare ZHIPU_API_KEY and required binaries/dependencies. If you cannot verify those points, treat the skill as untrusted.

Capability Analysis

Type: OpenClaw Skill
Name: llm-video-generator
Version: 1.0.1

The skill bundle provides a legitimate and well-structured workflow for generating videos using the ZhipuAI CogVideoX-3 API. It includes Python scripts for video generation (video_gen.py), frame extraction (extract_last_frame.py), and video concatenation (concat_videos.py), all of which use safe subprocess handling with argument lists to prevent shell injection. The instructions in SKILL.md are focused on task execution, progress reporting, and error handling, with no evidence of malicious intent, data exfiltration, or prompt injection attacks.

Capability Assessment

⚠ Purpose & Capability

The skill claims to be an instruction-only text/image->video generator, which is coherent with the provided scripts, but the declared registry metadata shows no required environment variables or binaries while the code clearly requires a ZHIPU_API_KEY, network access, and external tools (ffmpeg/ffprobe). The SKILL.md and manifest omit these required credentials/tools — an inconsistent declaration.

⚠ Instruction Scope

Runtime instructions are narrowly focused on generating segments, reporting progress, and concatenating segments. However, SKILL.md and the scripts mandate using /opt/anaconda3/bin/python3 (no fallback) and call ffmpeg/ffprobe; the instructions also instruct converting local image files to base64 and sending them to the Zhipu API. That means if the agent is given a local file path, its contents will be read and uploaded to a remote service — this is expected for image inputs but should be explicit in the manifest/instructions.
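To make the upload behavior concrete, the sketch below shows what "read and base64-encoded" means for a local path. It is illustrative only: `image_argument` is a hypothetical name, passing http(s) URLs through unchanged is an assumption, and the exact payload format video_gen.py sends to the API is not documented here.

```python
import base64

def image_argument(value):
    """Return remote URLs unchanged; read and base64-encode local files."""
    if value.startswith(("http://", "https://")):
        return value                       # assumed: remote URLs are passed through
    with open(value, "rb") as handle:      # the whole file is read from disk
        return base64.b64encode(handle.read()).decode("ascii")
```

Every byte of the local file ends up in the encoded string, so whatever path is supplied leaves the machine in full.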

ℹ Install Mechanism

There is no install spec (instruction-only), which reduces install-time risk. The package includes runnable scripts but does not declare dependencies (Python package 'zai' / Zhipu client) or system binaries. No remote download/install steps are present in the manifest.

⚠ Credentials

The code requires a ZHIPU_API_KEY (checked at runtime) and network access to the Zhipu service, but the registry metadata lists no required environment variables or primary credential. This omission is disproportionate: an API key is a high-sensitivity secret and should be declared and justified. Additionally, local files passed as image inputs are read, base64-encoded, and sent to the remote API — users must not pass sensitive local file paths unless they understand the upload.

✓ Persistence & Privilege

The skill does not request persistent/always-on privileges, doesn't modify other skills or global agent configuration, and only writes task/result files in the configured output directory. Autonomous invocation is allowed (platform default) but not combined with excessive privileges.

Version History

v1.0.1

**This update introduces user progress notifications and time estimates for multi-segment video generation.**

- Added required time estimation and user notification before starting video generation, with detailed guidelines.
- Introduced progress messages after each segment completes, including completion count, segment description, elapsed and remaining time.
- Increased default timeout for segment generation from 600s to 900s for improved reliability.
- Included new steps for file size handling: recommend compressing final video if it exceeds messaging platform limits.
- Improved instructions for user communication and error handling throughout the workflow.

v1.0.0

- Initial release of llm-video-generator skill.
- Supports generating videos from text descriptions, images, or specified first/last frames using ZhipuAI CogVideoX-3.
- Handles long videos (over 5 seconds) by chaining multiple generation calls with automatic last-frame continuation and video concatenation.
- Allows configuration of video content, style, resolution (up to 4K), frame rate (30/60fps), audio, and duration.
- Provides recommended workflow: clarify user request, plan segments for longer videos, execute generation, concatenate results, and deliver.
- Includes error handling and prompt-writing tips for best results.

Metadata

Slug llm-video-generator

Version 1.0.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is llm-video-generator?

Generate videos from text descriptions using ZhipuAI CogVideoX-3 model. Supports text-to-video, image-to-video, and first/last frame-to-video generation. Aut... It is an AI Agent Skill for Claude Code / OpenClaw, with 304 downloads so far.

How do I install llm-video-generator?

Run "/install llm-video-generator" in the OpenClaw or Claude Code chat to install it in one step. The scripts additionally need a ZHIPU_API_KEY environment variable, the zai-sdk package, and ffmpeg to run.

Is llm-video-generator free?

Yes, llm-video-generator is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does llm-video-generator support?

llm-video-generator is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created llm-video-generator?

It is built and maintained by baokui (@baokui); the current version is v1.0.1.
