Description

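Calls Seedance 2.0 through the Volcano Engine Ark API (火山方舟) to generate first-person product-promotion (带货) short videos. v2 adds script-driven voice-over (edge-tts / Volcano TTS), automatic background-music mixing, and burned-in subtitles, with 8 built-in persona templates (traditional lady, fashion host, TCM doctor, kitchen mom, beauty blogger, fitness coach, outdoor explorer, tech blogger). Trigger words: 生成视频 (generate video), 带货视频 (product-promotion video), 产品视频 (product video), 拍视频 (shoot a video), 剧本 (script)...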

README (SKILL.md)

Huo15 AI Influencer Video Generation Skill v2

Name: Huo15 Influencer Video Skill
Author: zhaobod1

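Product image + script → a first-person product-promotion short video with voice-over, background music, and subtitles. Eight persona templates are built in; naming the product category is enough for a template to be selected automatically.

⚠️ Safety rules

- Before every generation, tell the user the estimated cost (the Seedance video portion; for TTS, edge-tts is free and Volcano TTS costs ≤ ¥0.0006 per character)
- Submit the task only after the user confirms
- The maximum duration per run is 15 seconds; by default the duration is derived from the voice-over length
- Prefer the configuration that uses the fewest tokens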

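1. Capability overview

Module	v1	v2 (this release)
Video generation	Seedance 2.0 ✅	Seedance 2.0 ✅
Voice-over	❌	edge-tts (default, free) / Volcano TTS (optional) ✅
Background music	❌	ffmpeg automatic loop + volume reduction + fade-out ✅
Subtitles	❌	Timeline split in proportion to each line's character count + burn-in ✅
Persona templates	1 (traditional lady)	8 (recommended automatically by category) ✅
Script-driven	❌	JSON script, one step end to end ✅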

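2. File layout

huo15-influencer-video-skill/
├── SKILL.md                           # this document
├── _meta.json
├── scripts/
│   ├── templates.py                   # configuration for the 8 persona templates
│   ├── tts.py                         # voice-over engines (edge-tts + Volcano TTS)
│   ├── bgm.py                         # BGM library + mixing + video/audio muxing
│   └── pipeline.py                    # end-to-end pipeline (recommended entry point)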
└── examples/
    ├── script_traditional_lady.json
    ├── script_fashion_host.json
    └── script_auto_template.json

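3. Dependencies and credentials

# Required
brew install ffmpeg
pip install edge-tts requests

# Video generation
export ARK_API_KEY=ak-xxxxx                # obtain from the Ark console

# Optional: Volcano TTS (if unset, falls back to edge-tts, which is free but slightly lower in audio quality)
export VOLC_TTS_APP_ID=xxxxx
export VOLC_TTS_TOKEN=xxxxx
export VOLC_TTS_CLUSTER=volcano_tts

# Optional: BGM library directory (default ~/Music/huo15-bgm/)
export HUO15_BGM_DIR=~/Music/huo15-bgm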
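
The settings above imply a simple start-up check: ffmpeg and ARK_API_KEY are mandatory, the Volcano TTS variables only select the engine, and the BGM directory has a default. The following is a minimal sketch of such a check, not the skill's actual preflight code; the function name `check_environment` and its return shape are hypothetical.

```python
import os
import shutil


def check_environment(env=None, which=shutil.which):
    """Return (problems, tts_engine, bgm_dir) for the settings listed above."""
    env = os.environ if env is None else env
    problems = []
    # ffmpeg is required for mixing, muxing, and subtitle burning.
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    # Video generation cannot start without the Ark key.
    if not env.get("ARK_API_KEY"):
        problems.append("ARK_API_KEY is not set")
    # Volcano TTS is chosen only when its credentials are present;
    # otherwise the free edge-tts engine is used.
    has_volc = bool(env.get("VOLC_TTS_APP_ID") and env.get("VOLC_TTS_TOKEN"))
    tts_engine = "volc" if has_volc else "edge-tts"
    # Fall back to the documented default BGM directory.
    bgm_dir = os.path.expanduser(env.get("HUO15_BGM_DIR", "~/Music/huo15-bgm"))
    return problems, tts_engine, bgm_dir
```

Calling `check_environment()` with no arguments inspects the real environment; passing a plain dict as `env` makes the logic easy to try out without touching any shell variables.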

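Preparing BGM files (one-time)

Put 5 royalty-free tracks in ~/Music/huo15-bgm/ (any missing track is simply skipped; video generation is unaffected):

Filename	Style	Suggested use	Search keywords
`warm.mp3`	Warm piano	Wellness / food / handicrafts	warm piano background
`energetic.mp3`	Upbeat electronic	Beauty / apparel / livestream	upbeat electronic
`asian.mp3`	Chinese-style guzheng	Chinese herbal medicine / tea / traditional style	chinese guzheng
`soft.mp3`	Soft ambient	Digital devices / skincare	soft ambient pad
`cinematic.mp3`	Cinematic strings	Outdoor / local specialties	cinematic strings

Download sources: Pixabay Music (CC0), Freesound, Incompetech (attribution required).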

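4. The 8 preset templates

key	Persona	Recommended voice	Recommended BGM	Suitable categories
`traditional_lady`	Traditional middle-aged woman (default)	晓秋 Xiaoqiu (calm)	warm	Wellness / tea / handicrafts / traditional-recipe food
`fashion_host`	Fashionable female host	晓晓 Xiaoxiao (lively)	energetic	Beauty / apparel / accessories / digital accessories
`tcm_doctor`	Veteran TCM doctor	云健 Yunjian (calm male)	asian	Chinese herbal medicine / health supplements / herbal pastes / moxibustion
`kitchen_mom`	Kitchen homemaker	晓涵 Xiaohan (warm)	warm	Seasonings / ingredients / kitchenware / instant food
`beauty_blogger`	Beauty blogger	晓梦 Xiaomeng (lively)	soft	Skincare / makeup / perfume / beauty devices
`fitness_coach`	Fitness coach	云皓 Yunhao (energetic male)	energetic	Protein powder / sports equipment / supplements
`outdoor_explorer`	Outdoor explorer	云夏 Yunxia (brisk male)	cinematic	Local specialties / outdoor gear / folk customs
`tech_geek`	Tech blogger	云扬 Yunyang (professional male)	soft	Phones / headphones / smart home / computers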

Automatic template selection

{ "template": "auto", "category": "蛋白粉", ... }

→ 蛋白粉 (protein powder) matches fitness_coach. When nothing matches, the fallback is traditional_lady.
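
A minimal sketch of how category-based selection with a default fallback could work. The keyword lists are copied from the "suitable categories" column of the preset table; the substring-matching rule is an assumption, and the real templates.suggest_template may match differently.

```python
# Category keywords per template key, taken from the preset table.
# The matching rule below (keyword contained in the category string)
# is an illustrative assumption.
TEMPLATE_KEYWORDS = {
    "traditional_lady": ["养生", "茶叶", "手工", "古法食品"],
    "fashion_host": ["美妆", "服装", "饰品", "数码配件"],
    "tcm_doctor": ["中药", "保健品", "膏方", "艾灸"],
    "kitchen_mom": ["调味料", "食材", "厨具", "速食"],
    "beauty_blogger": ["护肤", "彩妆", "香水", "美容仪"],
    "fitness_coach": ["蛋白粉", "运动器材", "补剂"],
    "outdoor_explorer": ["地方特产", "户外装备", "民俗"],
    "tech_geek": ["手机", "耳机", "智能家居", "电脑"],
}
DEFAULT_TEMPLATE = "traditional_lady"


def suggest_template(category):
    """Return the first template whose keyword occurs in the category."""
    for key, keywords in TEMPLATE_KEYWORDS.items():
        if any(keyword in category for keyword in keywords):
            return key
    # No keyword matched: use the documented default persona.
    return DEFAULT_TEMPLATE
```

With this rule, a longer category such as "乳清蛋白粉" (whey protein powder) still resolves to fitness_coach because it contains the keyword "蛋白粉".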

5. Script format

{
  "template": "traditional_lady",
  "image": "/path/to/product.jpg",
  "lines": [
    {"text": "姐妹们，今天给大家推一款好东西", "action": "举起产品给镜头"},
    {"text": "古法配方，纯手工制作",            "action": "微笑展示产品细节"},
    {"text": "用过的姐妹都说好",                "action": "点头肯定"}
  ],
  "bgm": "warm",
  "bgm_volume": 0.18,
  "subtitle": true,
  "voice_override": null,
  "rate_override": null,
  "output": "/tmp/huo15/final.mp4"
}

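Field	Required	Description
`template`	✅	Template key, or `"auto"` (used together with `category`)
`category`	Required when template=auto	Category keyword used to pick the template automatically
`image`	✅	Local path to the product image
`lines`	✅	Array of `{text, action}` entries. The first entry's action is written into the Seedance prompt
`bgm`	❌	BGM key (warm/energetic/...) or an absolute path; null = no BGM
`bgm_volume`	❌	0~1, overrides the template default (0.18~0.25 works well)
`subtitle`	❌	Defaults to true; burns subtitles into the video
`voice_override`	❌	Forces a different voice, e.g. `"zh-CN-XiaoxiaoNeural"`
`rate_override`	❌	Forces a different speaking rate, e.g. `"+10%"`
`output`	❌	Path of the finished video

Line length limit

The full voice-over must be ≤ 14.5 seconds (Seedance generates at most 15 s per run), which is roughly 50~70 Chinese characters. A script that runs longer raises an error asking for it to be shortened; it is never truncated silently.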
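
The field rules and the 14.5-second cap can be checked before any paid call is made. This is a sketch under the rules stated in the table, not the pipeline's own validation; `validate_script` and `check_voice_duration` are hypothetical names.

```python
MAX_VOICE_SECONDS = 14.5
REQUIRED_FIELDS = ("template", "image", "lines")


def validate_script(script):
    """Return a list of problems found in a script dict (empty if none)."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not script.get(field):
            errors.append(f"missing required field: {field}")
    # "auto" needs a category keyword to choose a template from.
    if script.get("template") == "auto" and not script.get("category"):
        errors.append("category is required when template is 'auto'")
    for i, line in enumerate(script.get("lines") or []):
        if not isinstance(line, dict) or not line.get("text"):
            errors.append(f"lines[{i}] needs a non-empty 'text'")
    volume = script.get("bgm_volume")
    if volume is not None and not 0 <= volume <= 1:
        errors.append("bgm_volume must be between 0 and 1")
    return errors


def check_voice_duration(seconds):
    """Raise instead of truncating when the voice-over is too long."""
    if seconds > MAX_VOICE_SECONDS:
        raise ValueError(
            f"voice-over is {seconds:.1f}s, limit is {MAX_VOICE_SECONDS}s; "
            "shorten the script"
        )
```

Raising rather than trimming mirrors the documented behaviour: an over-long script is reported back to the author, who decides what to cut.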

6. Usage

6.1 Command line (most direct)

cd huo15-influencer-video-skill

# Self-check: do this before the first run
python3 scripts/pipeline.py preflight

# List the 8 persona templates / the 8 recommended voices
python3 scripts/pipeline.py templates
python3 scripts/pipeline.py voices

# dry-run: runs TTS + subtitles only, without calling Seedance (saves money)
# Use it to check script pacing, TTS voice, and subtitle line breaks first
python3 scripts/pipeline.py dry-run examples/script_traditional_lady.json

# Full end-to-end run
python3 scripts/pipeline.py render examples/script_traditional_lady.json

6.2 Calling from Python

import sys
sys.path.insert(0, "scripts")  # makes scripts/pipeline.py importable when run from the skill root
from pipeline import render

result = render({
    "template": "auto",
    "category": "蛋白粉",
    "image": "/path/to/protein.jpg",
    "lines": [
        {"text": "兄弟们练完这一组",   "action": "拿起产品"},
        {"text": "蛋白吸收率高得离谱", "action": "展示成分"},
    ],
    "subtitle": True,
})
print(result)
# {'output': '/tmp/huo15_video/final.mp4', 'template': 'fitness_coach',
#  'voice_duration': 6.2, 'video_duration': 8, 'tokens': 172800,
#  'cost_yuan': 7.95, 'size_mb': 3.1}

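6.3 Silent video only (compatible with v1 usage)

If the old interface is needed, call pipeline._generate_silent_video directly; it skips TTS and BGM.

6.4 Dry-run (strongly recommended first)

result = render(script, dry_run=True)
# Returns: voice_path / srt_path / estimated tokens / cost_yuan / prompt
# Does not call Seedance, so no video charge; TTS is free with the default edge-tts

Purpose: tune script pacing, audition voices, and check subtitle line breaks, then run render(dry_run=False) only once everything looks right.

When a user first provides a script, the agent should do a dry-run by default and give the user the path of the generated voice.mp3 to listen to.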

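7. Agent workflow

When the user says something like "use this image and this script to shoot a product-promotion video", follow these steps:

1. Collect the inputs
   - Product image path
   - Script (several lines of dialogue); if none is given, offer to draft one
   - Category keyword (used for automatic template selection)
2. Pick a template
   - If the user did not name one, call templates.suggest_template(category)
   - Show the user the plan: "I am going to use the XX template (XX persona + XX voice + XX BGM)"
3. Confirm the budget
   - Estimate the voice-over duration (80 characters ≈ 9~10 s)
   - Compute the video cost (estimate_cost) and tell the user
4. Execute
   - render(script), end to end
5. Deliver
   - File path, actual duration, Seedance tokens, cost in ¥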
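
The budget-confirmation and execution steps can be sketched as a small gate that refuses to render until the user agrees to the quoted price. All three callables are injected and hypothetical; the real estimate_cost and render may take different arguments.

```python
def run_with_confirmation(script, estimate_cost, ask_user, render):
    """Quote the Seedance cost and render only after an explicit yes.

    estimate_cost(script) -> float (yuan), ask_user(message) -> bool and
    render(script) -> dict are stand-ins supplied by the caller.
    """
    quote = estimate_cost(script)
    message = f"Estimated Seedance cost: ¥{quote:.2f}. Generate?"
    if not ask_user(message):
        # The user declined: nothing is submitted, nothing is billed.
        return None
    return render(script)
```

Keeping the gate separate from render makes the safety rule hard to bypass by accident: the only path to a paid call goes through the confirmation.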

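Sample conversation

User: Use product.jpg to shoot a protein powder video, 3 lines, write the lines yourself
Agent: I'll draft the script for you with the fitness coach template (Yunhao male voice + energetic BGM):
        1) 兄弟们，练完这一组 (Bros, once you finish this set)
        2) 蛋白吸收率高得离谱 (The protein absorption rate is off the charts)
        3) 练大一年，从这罐开始 (A year of gains starts with this tub)
       Estimated video duration 8 s ≈ ¥7.95; voice-over is free. Confirm generation?
User: OK
Agent: [render] → /tmp/huo15/protein.mp4 (3.1 MB), actual cost ¥7.95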

8. Seedance API quick reference (carried over from v1)

Item	Value
Model	`doubao-seedance-2-0-260128`
Endpoint	`https://ark.cn-beijing.volces.com/api/v3`
Billing	`Token = seconds × 720 × 1280 × 24 / 1024`, `¥ = Token × 46 / 1e6`
Aspect ratio	`9:16` vertical, for product-promotion video
Duration	4 ~ 15 seconds

Duration	Tokens	Cost
4s	~86,400	¥3.97
5s	~108,000	¥4.97
10s	~216,000	¥9.94
15s	~324,000	¥14.90
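
The billing formula is easy to reproduce, which helps when quoting a price before generation. A minimal sketch of the arithmetic; the function names are illustrative, and reading the factor 24 as frames per second is an assumption.

```python
MIN_SECONDS, MAX_SECONDS = 4, 15
YUAN_PER_MILLION_TOKENS = 46


def estimate_tokens(seconds, width=720, height=1280, fps=24):
    """Token = seconds × 720 × 1280 × 24 / 1024 (fps meaning is assumed)."""
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError("Seedance duration must be between 4 and 15 seconds")
    return seconds * width * height * fps // 1024


def estimate_cost_yuan(seconds):
    """¥ = Token × 46 / 1e6, rounded to two decimals."""
    return round(estimate_tokens(seconds) * YUAN_PER_MILLION_TOKENS / 1_000_000, 2)


for s in (4, 5, 10, 15):
    print(f"{s}s  {estimate_tokens(s):,} tokens  ¥{estimate_cost_yuan(s):.2f}")
```

Each second costs 21,600 tokens at these settings, so an 8-second clip comes to 172,800 tokens and ¥7.95, matching the sample output in section 6.2.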

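Roles inside content:

role	Purpose
`reference_image`	Default; the AI uses it as a reference for the product's appearance
`first_frame`	The video must start from this image
`last_frame`	Specifies the ending frame
`reference_video`	Motion-style reference
`reference_audio`	Audio-style reference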
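
To show where a role sits in a request, here is a sketch that builds a request body with a base64-encoded product image. Only the model name, the role values, and the base64 upload come from this document; the field names `type`, `text`, and `image_url` are assumptions and should be checked against the Ark API reference before use.

```python
import base64

MODEL = "doubao-seedance-2-0-260128"
IMAGE_ROLES = {"reference_image", "first_frame", "last_frame"}


def build_payload(prompt, image_bytes, role="reference_image", mime="image/jpeg"):
    """Build a request body; the field layout is assumed, not confirmed."""
    if role not in IMAGE_ROLES:
        raise ValueError(f"unsupported image role: {role}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": MODEL,
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                # The image travels inline as a data URL.
                "image_url": {"url": f"data:{mime};base64,{encoded}"},
                "role": role,
            },
        ],
    }
```

Using `first_frame` instead of the default pins the opening shot to the product photo, which is useful when the packaging must be shown exactly as photographed.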

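9. Mixing parameters (bgm.py)

voice (main track) ──┐
                     ├─ amix(duration=first) ─ afade(out, 0.5s) ─→ mixed.mp3
BGM (side track) ────┘    (BGM is looped with aloop to the voice length, volume=bgm_volume)

- bgm_volume defaults to 0.18~0.25. The templates are already tuned per persona, so it rarely needs changing.
- voice_volume defaults to 1.0; lower it to 0.85 to bring the BGM forward.
- A 0.5-second fade-out is always applied at the end to avoid a hard cut.

Final mux: mux_video_audio(silent.mp4, mixed.mp3, final.mp4) runs with -c:v copy -c:a aac -shortest, so the video is not re-encoded and the result is ready almost instantly.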
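
The mixing chain and the final mux can be written as two ffmpeg argument lists. This is a sketch, not the contents of bgm.py: it loops the BGM with the input option -stream_loop rather than the aloop filter, and the explicit -map options in the mux command are an addition for clarity. Function names are hypothetical.

```python
def build_mix_cmd(voice, bgm, out, voice_seconds,
                  bgm_volume=0.18, voice_volume=1.0, fade=0.5):
    """ffmpeg arguments: voice + looped, attenuated BGM, faded at the end."""
    fade_start = max(voice_seconds - fade, 0.0)
    graph = (
        f"[0:a]volume={voice_volume}[v];"
        f"[1:a]volume={bgm_volume}[b];"
        # duration=first ends the mix when the voice track ends.
        f"[v][b]amix=inputs=2:duration=first,"
        f"afade=t=out:st={fade_start:.3f}:d={fade}[out]"
    )
    return ["ffmpeg", "-y", "-i", voice,
            "-stream_loop", "-1", "-i", bgm,  # loop the BGM input indefinitely
            "-filter_complex", graph, "-map", "[out]", out]


def build_mux_cmd(video, audio, out):
    """ffmpeg arguments: copy the video stream, encode the audio as AAC."""
    return ["ffmpeg", "-y", "-i", video, "-i", audio,
            "-map", "0:v:0", "-map", "1:a:0",
            "-c:v", "copy", "-c:a", "aac", "-shortest", out]
```

Both functions return lists suitable for `subprocess.run(cmd, check=True)`, which avoids passing file names through a shell.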

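10. Subtitles (pipeline._build_srt + _burn_subtitles)

- An SRT file is generated by splitting the timeline in proportion to each line's character count
- The ffmpeg subtitles filter burns it into the video
- Defaults: PingFang SC font, size 14, white text with a black outline, 80px from the bottom
- To turn subtitles off: set "subtitle": false in the script

If the font is unavailable, a Fontconfig error is reported; switch to font: "Heiti SC" or install one with brew install --cask font-noto-sans-cjk-sc.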

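11. Fallback strategy (one failed step should not sink the whole run)

Failure	Fallback behaviour
Volcano TTS credentials missing	edge-tts is used automatically
edge-tts also fails	Raises an error, suggesting `pip install edge-tts` and a network check
BGM file not found	BGM is skipped; voice only, with the 0.5 s fade-out
Font not found	Rerun with `subtitle: false`, or install a Chinese font
Seedance timeout	Polls for 20 minutes by default, then raises TimeoutError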
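
The timeout row describes a poll-until-done loop with a deadline. A generic sketch follows; `fetch_status` and the status strings "succeeded" and "failed" are placeholders, since the actual Ark task states are not listed in this document.

```python
import time


def poll_until_done(fetch_status, timeout_s=20 * 60, interval_s=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports completion or time runs out."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status == "succeeded":  # placeholder status name
            return status
        if status == "failed":     # placeholder status name
            raise RuntimeError("generation task failed")
        if clock() >= deadline:
            raise TimeoutError(f"task not finished after {timeout_s} seconds")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without waiting; in normal use the defaults apply and the loop gives up after 20 minutes.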

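12. FAQ

Q: The voice-over mispronounces a word or a polyphonic character. What can I do?

A: Substitute a homophone character in lines.text. edge-tts does not support the SSML phoneme tag.

Q: The BGM is too loud and drowns out the voice?

A: Set "bgm_volume": 0.12 in the script to push it lower, or switch to soft.mp3.

Q: I want voice only, with no BGM?

A: Set "bgm": null in the script.

Q: I want a longer video (>15 seconds)?

A: The current version does not support it. Split the script into segments, generate each segment (≤15 s) separately, then join them with ffmpeg concat.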
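
One way to join separately generated segments is ffmpeg's concat demuxer, which reads a list file and can copy streams without re-encoding when all segments share the same codec settings. A sketch with hypothetical helper names:

```python
def concat_list_text(segments):
    """Return the list-file body expected by ffmpeg's concat demuxer."""
    lines = []
    for path in segments:
        # Single quotes inside a path are closed, escaped, and reopened.
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'\n")
    return "".join(lines)


def build_concat_cmd(list_path, out):
    """ffmpeg arguments that join the listed segments by stream copy."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", out]
```

Write the text from `concat_list_text` to a file such as segments.txt, then run the command from `build_concat_cmd("segments.txt", "long.mp4")`. Segments produced by the same pipeline settings should be compatible; if they are not, re-encoding instead of `-c copy` is the fallback.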

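Q: The footage does not match the script?

A: Make the first line's action more specific ("举起产品贴近镜头", lift the product close to the camera, is more precise than "展示", show). The action fields of later lines serve only as a reference for subtitle pacing and are not written into the Seedance prompt.

Q: I want to keep v1's video-only generation?

A: Call pipeline._generate_silent_video(image, prompt, duration, output) directly.

13. Version history

v2.0.0 (2026-04-27): script-driven + voice-over + BGM + subtitles + 8 templates + dry-run + preflight
v1.2.x: single traditional-lady template + silent video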

Usage Guidance

This skill appears to implement what it claims: a local pipeline that synthesizes speech from text, calls Seedance (Ark) to generate a silent video from a provided image, mixes BGM with voice via ffmpeg, and burns subtitles. Before installing or using it:

1) You must provide ARK_API_KEY (mandatory) and optionally Volc TTS credentials; the registry metadata omits these, so set them manually if you want full functionality.
2) Any product image path you supply will be base64-encoded and uploaded to Ark's API (https://ark.cn-beijing.volces.com); don't upload sensitive images.
3) The skill runs ffmpeg/ffprobe and subprocesses locally. Review it and run it in an environment you control; run the provided dry-run first to audit the generated voice and SRT without calling Seedance.
4) Check the Volc TTS setup and credentials if you plan to enable it. The tts_volc implementation sets an Authorization header containing a semicolon, which looks unusual and may be a bug; verify before sending real tokens.
5) Prefer API keys with least privilege and monitor usage (the code prints token and ¥ estimates).

If you need the skill to be accepted by an automated policy, ask the publisher to update the registry metadata to declare ARK_API_KEY and the optional Volc variables explicitly so deployment systems can surface the requirement.

Capability Analysis

Type: OpenClaw Skill
Name: huo15-influencer-video-skill
Version: 2.0.0

The skill bundle is a legitimate tool for generating AI influencer videos using the Seedance 2.0 API and TTS engines. It includes well-structured Python scripts (pipeline.py, tts.py, bgm.py) that use ffmpeg for media processing and requests for API interaction, following standard practices such as using argument lists in subprocess calls to prevent shell injection. The SKILL.md file provides clear operational instructions and safety rules, including mandatory cost estimation and user confirmation before execution, with no evidence of malicious prompt injection or unauthorized data exfiltration.

Capability Tags

requires-sensitive-credentials

Capability Assessment

⚠ Purpose & Capability

The skill's name, description, SKILL.md and included scripts consistently implement Seedance (Ark) video generation, edge-tts/火山 TTS, ffmpeg mixing, and local BGM handling — these requirements are coherent with the stated purpose. However, the registry metadata claims no required environment variables or primary credential while SKILL.md and the code clearly require ARK_API_KEY (mandatory) and optionally VOLC_TTS_APP_ID / VOLC_TTS_TOKEN for Volc TTS. That mismatch is an inconsistency in declared capabilities/requirements.

ℹ Instruction Scope

Runtime instructions and code stay within the stated scope: they require a product image path, build a prompt, call the Seedance API (uploading the image as base64), synthesize speech (through edge-tts, which sends the text to Microsoft's online speech service, or through the remote Volc TTS API), mix BGM, and burn subtitles via ffmpeg. User-supplied local files (the image and the optional BGM directory) are therefore read, the image is transmitted to Ark, and the script text is transmitted to whichever TTS service is in use. That behaviour is expected but important to note: any image you supply is uploaded to a remote service.

✓ Install Mechanism

No install spec is included (instruction-only install). The SKILL.md asks to install ffmpeg via brew and pip packages (edge-tts, requests). This is proportionate to the task; nothing is downloaded from unknown URLs or extracted to disk by an automated installer. Risk is typical for running third‑party Python scripts that invoke ffmpeg and make network requests.

⚠ Credentials

The environment variables the code uses (ARK_API_KEY mandatory; optional VOLC_TTS_APP_ID / VOLC_TTS_TOKEN / VOLC_TTS_CLUSTER; HUO15_BGM_DIR defaulting to ~/Music/huo15-bgm) are reasonable and proportional to the skill's features. The concern is that the skill registry metadata did not declare these required/env vars, so a user or deployment policy that relies on registry declarations may miss the need to provide API keys. No unrelated credentials are requested.

✓ Persistence & Privilege

The skill does not request always:true, does not modify other skills, and only writes outputs under /tmp or user-specified paths. It does not require persistent system-wide privileges. It does read files from user-supplied paths (image, optional BGM directory) and will write temporary and output files locally.

Version History

v2.0.0

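v2 major release: script-driven flow + edge-tts / Volcano TTS voice-over + automatic BGM mixing with ffmpeg + burned-in subtitles + 8 persona templates (traditional lady / fashion host / TCM doctor / kitchen mom / beauty blogger / fitness coach / outdoor explorer / tech blogger, selected automatically by category) + dry-run for testing scripts cheaply + preflight self-check + CLI subcommands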

v1.2.2

No file changes detected in version 1.2.2.
- No updates or modifications were made in this release.
- All features and documentation remain unchanged from version 1.2.1.

v1.2.1

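v1.2.1 syncs the local working state to clawhub (the local version number had fallen behind clawhub)

v1.2.0

Security fix: all API keys moved to environment variables

v1.1.0

Redaction fix: API key replaced with an environment-variable reference

v1.0.0

Initial release of the Huo15 AI Influencer Video Generation Skill:
- Calls Seedance 2.0 through the Volcano Engine Ark API to turn a product image into a product-promotion short video in one step.
- Built-in safety and cost-disclosure rules for the whole flow: the generation cost is calculated automatically and requires user confirmation.
- Custom and automated prompt-assembly logic, with flexible adjustment of character, action, environment, and other elements.
- Standard API call examples (including image, audio, and video references) with clearly documented parameters.
- Complete Python utility functions for one-step video generation and self-service integration.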

Metadata

Slug huo15-influencer-video-skill

Version 2.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 6

Frequently Asked Questions

What is Huo15 Influencer Video Skill?

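It calls Seedance 2.0 through the Volcano Engine Ark API (火山方舟) to generate first-person product-promotion (带货) short videos. v2 adds script-driven voice-over (edge-tts / Volcano TTS), automatic background-music mixing, and burned-in subtitles, with 8 built-in persona templates (traditional lady, fashion host, TCM doctor, kitchen mom, beauty blogger, fitness coach, outdoor explorer, tech blogger). Trigger words: 生成视频, 带货视频, 产品视频, 拍视频, 剧本... It is an AI Agent Skill for Claude Code / OpenClaw, with 211 downloads so far.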

How do I install Huo15 Influencer Video Skill?

Run "/install huo15-influencer-video-skill" in the OpenClaw or Claude Code chat to install it in one step. Before the first run, install ffmpeg and the edge-tts and requests Python packages, and set ARK_API_KEY.

Is Huo15 Influencer Video Skill free?

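The skill itself is free: it is licensed under MIT-0 and can be downloaded, installed, and used at no cost. Video generation through the Seedance API is billed per token (about ¥3.97 for a 4-second clip), while voice-over with the default edge-tts is free.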

Which platforms does Huo15 Influencer Video Skill support?

Huo15 Influencer Video Skill is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Huo15 Influencer Video Skill?

It is built and maintained by Job Zhao (@zhaobod1); the current version is v2.0.0.
