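Description

An AI video generation skill that works from mainland China. It creates videos from text: it generates the script, images, and voiceover, then merges them into an MP4. No duration limit, full control. Keywords: video generation, short-video production.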

Usage (SKILL.md)

China Video Gen: AI video generation for mainland China

Name: china-video-gen
Author: tobewin

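Turns a text description into a complete video: storyboard script → image sequence → voiceover → merged MP4, all generated automatically. No duration limit, fully controllable, and directly reachable from mainland China without a VPN.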

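When to trigger

"Make me a 30-second promo video for [product]"
"Generate a short video introducing [topic]"
"Make an ad video for [brand]"
"Turn this text into a video"
"Generate a video suitable for posting on Xiaohongshu/Douyin"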

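Step 0: Environment check

Check dependencies before every run. If anything is missing, ask the user to install it manually.

Check ffmpeg

Check whether ffmpeg is installed. If it is missing, install it with:
- macOS:   brew install ffmpeg
- Ubuntu:  sudo apt install ffmpeg
- Windows: download from https://ffmpeg.org/download.html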
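
The ffmpeg check can be sketched in Python as follows. This is a minimal illustration, not part of the skill itself: the helper names `find_ffmpeg` and `ffmpeg_report` are hypothetical, and the install hints simply repeat the list above.

```python
import shutil

# Install hints copied from the list above (illustrative constant).
INSTALL_HINTS = {
    "macOS": "brew install ffmpeg",
    "Ubuntu": "sudo apt install ffmpeg",
    "Windows": "download from https://ffmpeg.org/download.html",
}


def find_ffmpeg(which=shutil.which):
    """Return the path of the ffmpeg executable, or None if it is not on PATH."""
    return which("ffmpeg")


def ffmpeg_report(which=shutil.which):
    """Build a message for the user; never installs anything automatically."""
    path = find_ffmpeg(which)
    if path:
        return f"ffmpeg found at {path}"
    hints = "\n".join(f"- {name}: {cmd}" for name, cmd in INSTALL_HINTS.items())
    return "ffmpeg not found. Please install it manually:\n" + hints
```

Calling `print(ffmpeg_report())` prints either the detected path or the manual install hints. The `which` parameter exists only so the lookup can be replaced in tests.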

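Check dependent skills

The following skills must be installed:
- china-image-gen: text-to-image skill
- china-tts: text-to-speech skill

Install each one with clawhub install, for example: clawhub install china-image-gen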

Check the API key

SILICONFLOW_API_KEY must be configured:
1. Register at cloud.siliconflow.cn
2. Open the "API Keys" (API密钥) page and create a key
3. export SILICONFLOW_API_KEY='sk-xxxxxxxx'
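
A minimal sketch of the key check, assuming the key is read from the process environment. The helpers `get_api_key` and `mask` are hypothetical names introduced for illustration; masking keeps the key out of logs.

```python
import os


def get_api_key(env=None):
    """Return SILICONFLOW_API_KEY from the environment or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("SILICONFLOW_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SILICONFLOW_API_KEY is not set; create a key at cloud.siliconflow.cn "
            "and export it before running."
        )
    return key


def mask(key):
    """Show only the first three characters so the key is safe to print."""
    return key[:3] + "*" * max(len(key) - 3, 0)
```

For example, `print(mask(get_api_key()))` confirms that a key is present without revealing it.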

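Step 1: Understand the user's requirements

Extract the key information from the user's description:

Video topic: product promotion / educational explainer / brand story / tutorial demo / other
Target duration: 15 s / 30 s / 60 s / longer (no limit)
Visual style: realistic / illustration / high-tech / warm / business
Voice: see the china-tts voice list
Target platform: Xiaohongshu (1:1 or 3:4) / Douyin (9:16) / Bilibili or YouTube (16:9) / generic (16:9)
Language: Chinese / English / mixed Chinese and English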

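Step 2: Generate the storyboard script

Design a storyboard from the user's requirements. Each shot contains:

Shot N:
  Duration: X seconds
  Visual description (English prompt, used for FLUX text-to-image)
  Narration (Chinese, used for TTS voiceover)
  Camera motion: static / Ken Burns zoom / pan
  Transition: fade / wipe / none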
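
One way to hold this per-shot structure in code is a small dataclass. This is a sketch under assumptions: the class name `Shot`, its field names, and the lowercase option strings are hypothetical and not defined by the skill.

```python
from dataclasses import dataclass

# Allowed values mirror the options listed in the shot template (names assumed).
MOTIONS = {"static", "ken_burns", "pan"}
TRANSITIONS = {"fade", "wipe", "none"}


@dataclass
class Shot:
    index: int
    duration: float            # seconds
    prompt: str                # English prompt for text-to-image
    narration: str             # narration text for TTS
    motion: str = "static"
    transition: str = "none"

    def __post_init__(self):
        # Reject values the storyboard template does not list.
        if self.duration <= 0:
            raise ValueError("duration must be positive")
        if self.motion not in MOTIONS:
            raise ValueError(f"unknown motion: {self.motion}")
        if self.transition not in TRANSITIONS:
            raise ValueError(f"unknown transition: {self.transition}")


def total_duration(shots):
    """Sum of all shot durations in seconds."""
    return sum(shot.duration for shot in shots)
```

Validating in `__post_init__` catches a malformed storyboard before any image or audio request is made.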

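Duration allocation guidelines

Total duration 30 s, suggested shot count: 5-8
  Opening: 2-3 s (logo / topic / attention hook)
  Body: 3-5 s per shot
  Ending: 2-3 s (CTA / contact details / brand)

Total duration 60 s, suggested shot count: 10-15
  Pacing: the first 10 s matter most and must capture attention

Character count vs. duration (TTS reads about 4 Chinese characters per second):
  3 s ≈ 12 characters
  5 s ≈ 20 characters
  10 s ≈ 40 characters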
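
The character budget follows directly from the reading speed of about 4 characters per second. A small sketch, where the function names and the choice to ignore punctuation and whitespace are assumptions made for illustration:

```python
CHARS_PER_SECOND = 4  # approximate TTS reading speed stated above


def max_chars(seconds):
    """Largest narration length that fits in a shot of the given duration."""
    return int(seconds * CHARS_PER_SECOND)


def estimate_seconds(narration):
    """Estimate spoken duration, counting only letters, digits, and CJK characters."""
    spoken = sum(1 for ch in narration if ch.isalnum())
    return spoken / CHARS_PER_SECOND
```

`max_chars(3)`, `max_chars(5)`, and `max_chars(10)` reproduce the 12, 20, and 40 character figures in the table. Actual TTS timing varies with voice and punctuation, so the estimate is only a planning aid.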

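Step 3: Generate the image sequence

Call the china-image-gen skill to generate one image for each shot.

Resolution and aspect ratio

Xiaohongshu (1:1): 1024x1024
Xiaohongshu (3:4): 768x1024
Douyin / vertical (9:16): 720x1280
Bilibili / horizontal (16:9): 1280x720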
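
The resolution table can be expressed as a lookup. In this sketch the dictionary keys and helper names are hypothetical; the 16:9 fallback mirrors the "generic (16:9)" option from Step 1.

```python
from math import gcd

# Width x height per target platform, copied from the table above.
RESOLUTIONS = {
    "xiaohongshu_1_1": (1024, 1024),
    "xiaohongshu_3_4": (768, 1024),
    "douyin_9_16": (720, 1280),
    "bilibili_16_9": (1280, 720),
}


def resolution_for(platform):
    """Return (width, height); unknown platforms fall back to generic 16:9."""
    return RESOLUTIONS.get(platform, RESOLUTIONS["bilibili_16_9"])


def aspect_ratio(width, height):
    """Reduce a resolution to its simplest ratio, e.g. (720, 1280) -> (9, 16)."""
    divisor = gcd(width, height)
    return width // divisor, height // divisor
```

`aspect_ratio` is a quick sanity check that each resolution really matches the ratio named for its platform.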

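Image generation

For each shot:

Generate the image with china-image-gen
Save it to the frames directory in the workspace
Image URLs expire after 1 hour, so download each image immediately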
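
Because the URLs expire, each image should be saved as soon as its URL is returned. A minimal sketch using only the standard library; the function name `download_frame`, the `shot_NN.png` naming scheme, and the `.png` extension are assumptions, since the skill does not specify them.

```python
import shutil
import urllib.request
from pathlib import Path


def download_frame(url, frames_dir, index, timeout=30):
    """Download one generated image into frames_dir and return the saved path."""
    frames_dir = Path(frames_dir)
    frames_dir.mkdir(parents=True, exist_ok=True)
    # Zero-padded names keep the frames sorted in shot order (assumed scheme).
    target = frames_dir / f"shot_{index:02d}.png"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(target, "wb") as out:
            shutil.copyfileobj(response, out)
    return target
```

Calling this right after each generation request, rather than after the whole batch, keeps every download well inside the one-hour window.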

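Step 4: Generate the voiceover audio

Call the china-tts skill to turn all narration into a single audio file.

Concatenate the narration of all shots
Call TTS to generate an MP3
Save it to the audio directory in the workspace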

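Step 5: Compose the video

Use ffmpeg to combine the image sequence and the audio into an MP4 video.

Option A: simple composition (still images + audio)

Use ffmpeg's concat feature
Show each image for its specified duration
Mux in the audio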
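
Option A can be sketched with ffmpeg's concat demuxer, which reads a text file of `file` and `duration` entries. The helper names, the 25 fps output rate, and the codec choices below are assumptions for illustration; the skill's own templates live in references/ffmpeg.md and may differ. File paths are assumed to contain no single quotes.

```python
def build_concat_file(frames, durations):
    """Return the contents of concat.txt for the ffmpeg concat demuxer."""
    lines = []
    for frame, seconds in zip(frames, durations):
        lines.append(f"file '{frame}'")
        lines.append(f"duration {seconds}")
    # The concat demuxer ignores the last duration unless the final file
    # is listed once more, so repeat it.
    lines.append(f"file '{frames[-1]}'")
    return "\n".join(lines) + "\n"


def build_ffmpeg_command(concat_path, audio_path, output_path):
    """Argument list for muxing the slideshow with the voiceover."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", concat_path,
        "-i", audio_path,
        "-vf", "fps=25,format=yuv420p",   # constant frame rate, widely playable pixel format
        "-c:v", "libx264", "-c:a", "aac",
        "-shortest",                      # stop at the shorter of video and audio
        output_path,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` after writing the concat text to `concat.txt`.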

Option B: Ken Burns effect (recommended)

Add a slow zoom to each image
This simulates a camera push-in
The result looks more polished
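
A slow push-in like this is commonly built with ffmpeg's zoompan filter. The sketch below only constructs the filter string and a per-image command; the function names, the 1.2 maximum zoom, and the 25 fps rate are illustrative assumptions rather than values taken from the skill.

```python
def ken_burns_filter(seconds, width, height, fps=25, max_zoom=1.2):
    """Build a zoompan filter that zooms from 1.0 to max_zoom, centered."""
    frames = max(1, int(round(seconds * fps)))
    step = (max_zoom - 1.0) / frames  # zoom increment per output frame
    return (
        f"zoompan=z='min(zoom+{step:.5f},{max_zoom})'"
        f":x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)'"
        f":d={frames}:s={width}x{height}:fps={fps}"
    )


def ken_burns_command(image, output, seconds, width, height):
    """Argument list that renders one still image into a zooming clip."""
    return [
        "ffmpeg", "-y", "-loop", "1", "-i", image,
        "-vf", ken_burns_filter(seconds, width, height),
        "-t", str(seconds),
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        output,
    ]
```

Each shot is rendered to its own clip this way, and the clips are then joined. Larger source images generally give smoother motion, at the cost of the longer processing time noted later in this document.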

Option C: fade transitions

Add a crossfade between two consecutive images
Use the xfade filter
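
When several clips are chained with xfade, each transition needs an offset measured on the already-merged timeline: the k-th crossfade starts at the sum of the first k clip durations minus k times the fade length. A sketch of that arithmetic, with hypothetical helper names and an assumed 0.5 s fade:

```python
def xfade_offsets(durations, fade=0.5):
    """Start time of each crossfade on the merged timeline, in seconds."""
    offsets = []
    elapsed = 0.0
    for seconds in durations[:-1]:
        elapsed += seconds
        # Every earlier crossfade shortened the timeline by one fade length.
        offsets.append(round(elapsed - fade * (len(offsets) + 1), 3))
    return offsets


def xfade_filtergraph(durations, fade=0.5):
    """Return (filter_complex string, label of the final video stream)."""
    parts = []
    previous = "[0:v]"
    for i, offset in enumerate(xfade_offsets(durations, fade), start=1):
        label = f"[v{i}]"
        parts.append(
            f"{previous}[{i}:v]xfade=transition=fade"
            f":duration={fade}:offset={offset}{label}"
        )
        previous = label
    return ";".join(parts), previous
```

The string goes to ffmpeg's `-filter_complex` option and the returned label to `-map`. All input clips are assumed to share resolution and frame rate, and each must be longer than the fade.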

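Step 6: Output the result

Video generation complete
━━━━━━━━━━━━━━━━━━━━
Video file: {workspace}/output.mp4
Total duration: about XX seconds
Shots: X
Aspect ratio: 16:9 (1280x720)

File structure:
  video_xxx/
  ├── output.mp4          ← final video
  ├── frames/             ← image for each shot
  ├── audio/
  │   └── voiceover.mp3  ← voiceover file
  └── concat.txt          ← concat configuration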
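
The layout above can be created up front so every later step has a known place to write. A minimal sketch; the function name `create_workspace` and the returned dictionary keys are hypothetical.

```python
from pathlib import Path


def create_workspace(root, name):
    """Create the per-video directory tree and return the key paths."""
    base = Path(root) / name
    frames = base / "frames"
    audio = base / "audio"
    frames.mkdir(parents=True, exist_ok=True)
    audio.mkdir(parents=True, exist_ok=True)
    return {
        "base": base,
        "frames": frames,
        "voiceover": audio / "voiceover.mp3",
        "concat": base / "concat.txt",
        "output": base / "output.mp4",
    }
```

Only the directories are created; the files are written later by the image, TTS, and ffmpeg steps.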

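Video type presets

Product promotion (30 s, 16:9)

Shots: 6
Image model: FLUX.1-dev (high quality)
Voice: alex (steady male voice) or claire (gentle female voice)
Effect: Ken Burns
Transition: fade

Educational explainer (60 s, 16:9)

Shots: 12
Image model: FLUX.1-schnell (fast)
Voice: anna (steady female voice)
Effect: still images
Transition: none

Xiaohongshu vertical (30 s, 3:4)

Resolution: 768x1024
Shots: 6
Image model: Kolors (best understanding of Chinese)
Voice: diana (cheerful female voice)
Effect: Ken Burns

Douyin vertical (15 s, 9:16)

Resolution: 720x1280
Shots: 4 (fast pacing)
Image model: FLUX.1-schnell
Voice: bella (passionate female voice)
Effect: Ken Burns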

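Notes

Image URLs are valid for only 1 hour; download each image as soon as it is generated
The Ken Burns effect is slow to render, roughly 10-30 seconds per image
Video files are saved to the OpenClaw workspace and kept long term
Preview quickly with FLUX.1-schnell first, then switch to FLUX.1-dev for the high-quality version once you are satisfied
Avoid sending large batches of requests in a short period, so that you do not trigger API rate limits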

Security recommendations

This skill appears to do what it says: it uses a single SiliconFlow API key to generate images and TTS, and uses ffmpeg locally to compose the MP4. Before installing or using it:
1. Confirm that you trust cloud.siliconflow.cn and are comfortable giving it the SILICONFLOW_API_KEY.
2. Install ffmpeg locally (SKILL.md requires it, but the metadata omits it) and check whether python3 is actually needed; ask the author why python3 is declared.
3. Be cautious about storing API keys in ~/.openclaw/.env (plain text); prefer ephemeral environment variables if you are concerned about leakage.
4. Note that installing china-image-gen and china-tts may bring additional requirements or permissions; review those skills too.
5. If you need higher assurance, ask the publisher for a brief justification of the python3 requirement and for ffmpeg to be declared as a dependency in the metadata.

Functional analysis

Type: OpenClaw Skill
Name: china-video-gen
Version: 1.1.0

The china-video-gen skill is a legitimate orchestrator designed to automate video production by coordinating storyboard creation, image generation, and text-to-speech via external skills and local ffmpeg execution. The instructions in SKILL.md and the technical references in references/ffmpeg.md provide standard command-line templates for video synthesis and do not contain evidence of data exfiltration, unauthorized access, or malicious prompt injection.

Capability tags

crypto
can-make-purchases
requires-sensitive-credentials

Capability assessment

ℹ Purpose & Capability

The skill claims to generate images, TTS, and merge into MP4 using a single SILICONFLOW_API_KEY which matches the described use of a single provider (cloud.siliconflow.cn). It also depends on two related OpenClaw skills (china-image-gen and china-tts) which is coherent. Minor inconsistency: the metadata declares python3 as a required binary, but the runtime instructions emphasise ffmpeg (required locally) and do not describe why python3 is mandatory. ffmpeg is referenced throughout but is not declared in the skill metadata.

✓ Instruction Scope

SKILL.md gives concrete step-by-step instructions limited to: extracting requirements from user text, calling china-image-gen and china-tts, downloading generated image URLs, running ffmpeg locally to compose video, and saving output to the OpenClaw work directory. It does not instruct reading unrelated local files or exfiltrating data to unexpected endpoints. It does recommend persisting the API key in ~/.openclaw/.env, which has privacy implications (see Credentials below).

✓ Install Mechanism

This is an instruction-only skill (no install spec, no code files). That minimizes direct install risk. It instructs the user/agent to run clawhub install for two related skills and to install ffmpeg manually; these are expected and are not inherently risky. No external arbitrary download URLs or archives are embedded in the skill files.

ℹ Credentials

Only one credential is required (SILICONFLOW_API_KEY), and that key is used for the provider (image + TTS) described in the skill — this is proportionate. Caveat: the skill suggests writing the API key into ~/.openclaw/.env (plain-text environment file) for permanence, which increases the risk of credential leakage if the filesystem or backups are accessible. Also verify that the china-image-gen and china-tts skills do not request additional unrelated credentials when installed.

✓ Persistence & Privilege

The always flag is false, and the skill does not request system-wide config changes or other skills' credentials. It does suggest persisting its own API key in the agent's env file, which is normal for ease of use but should be considered a privacy decision by the user.

Version history

v1.1.0

Security hardening: removed the curl dependency and all code examples; the skill is now purely instruction-based.

v1.0.1

Fixed security-scan findings:
1. Changed the metadata format to single-line JSON and declared SILICONFLOW_API_KEY
2. Removed the curl|bash and sudo auto-install logic; the skill now prompts for manual installation

v1.0.0

Initial release: generate complete videos from text descriptions from within mainland China, with no time limit and full control over pacing and content.
- Supports automatic dependency check and install (ffmpeg, china-image-gen, china-tts).
- Extracts user needs to create storyboard scripts, generates images for each shot, and uses TTS for voiceovers.
- Combines images and audio into MP4 output via ffmpeg.
- Works without a VPN, optimized for domestic networks.
- Provides clear step-by-step instructions and sample scripts for dependencies, storyboard, and video generation flow.

Metadata

Slug: china-video-gen

Version: 1.1.0

License: MIT-0

Total installs: 1

Current installs: 0

Number of versions: 3

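FAQ

What is china-video-gen?

An AI video generation skill that works from mainland China. It creates videos from text: it generates the script, images, and voiceover, then merges them into an MP4, with no duration limit and full control. It is an AI agent skill plugin for Claude Code / OpenClaw, with 337 cumulative downloads to date.

How do I install china-video-gen?

Run the command "/install china-video-gen" in an OpenClaw or Claude Code chat to install it in one step. The install itself needs no further setup, but ffmpeg, the china-image-gen and china-tts skills, and SILICONFLOW_API_KEY must be in place before use (see Step 0).

Is china-video-gen free?

Yes. china-video-gen is completely free under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does china-video-gen support?

china-video-gen is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who develops china-video-gen?

It is developed and maintained by ToBeWin (@tobewin). The current version is v1.1.0.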
