
Conversation Video

Name: Conversation Video
Author: pratyushchauhan

Author: Pratyush Chauhan · GitHub · v1.0.0 · MIT-0

cross-platform ✓ Security check passed

Install in OpenClaw

/install conversation-video

Description

Generate animated conversation videos with multi-voice TTS audio and timed text overlays. Use when the user needs to (1) turn a transcript or dialogue into a...

Usage (SKILL.md)

Conversation Video

Generate multi-voice conversation videos from text transcripts. There are two rendering paths: a quick ffmpeg path (no Node.js required) or a richer Remotion path (React animations).

Prerequisites

Tool	Path / Notes
ffmpeg	System install or Jellyfin ffmpeg at `/usr/lib/jellyfin-ffmpeg/ffmpeg`
supertonic-tts	Python package for multi-voice TTS (see scripts/generate_audio.py for load logic)
Node.js + npm	Only needed for Remotion path
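
Before running the workflow, it can help to confirm that an ffmpeg binary is reachable. The following is a minimal sketch, not part of the skill itself: the helper name `find_ffmpeg` is hypothetical, and it only assumes the two locations named in the table above (a system install on `PATH`, or the Jellyfin path).

```python
import os
import shutil

# Fallback location taken from the prerequisites table.
JELLYFIN_FFMPEG = "/usr/lib/jellyfin-ffmpeg/ffmpeg"


def find_ffmpeg():
    """Return a path to an ffmpeg executable, or None if none is found.

    Hypothetical helper: checks PATH first, then the Jellyfin location.
    """
    found = shutil.which("ffmpeg")
    if found:
        return found
    if os.path.isfile(JELLYFIN_FFMPEG) and os.access(JELLYFIN_FFMPEG, os.X_OK):
        return JELLYFIN_FFMPEG
    return None


if __name__ == "__main__":
    path = find_ffmpeg()
    print(path if path else "ffmpeg not found; install it before rendering")
```

The same `shutil.which` check can be applied to `node` and `npm` when the Remotion path is planned.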

Workflow

1. Build a transcript manifest

Create a JSON file with your conversation:

[
  {"speaker": "NARRATOR", "text": "Customer Discovery Interview", "voice": "M1", "speed": 1.0, "align": "center"},
  {"speaker": "INTERVIEWER", "text": "Walk me through when you first realized...", "voice": "M5", "speed": 0.95, "align": "left"},
  {"speaker": "CUSTOMER", "text": "I was looking for a marketer agent.", "voice": "M2", "speed": 1.0, "align": "right"}
]

Fields: speaker (label), text (spoken text), voice (supertonic voice name e.g. M1-M5, F1-F2), speed (optional playback speed), align (left/right/center for video placement).
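
A quick check of the manifest before generating audio can catch typos early. The sketch below is illustrative only: `validate_manifest` is a hypothetical helper, and treating `speaker`, `text`, and `voice` as required while defaulting `speed` to 1.0 and `align` to `center` are assumptions, since the field list above only marks `speed` as optional.

```python
import json

# Allowed placements, taken from the field description above.
ALLOWED_ALIGN = ("left", "right", "center")


def validate_manifest(entries):
    """Return a list of problems found; an empty list means no issues."""
    if not isinstance(entries, list):
        return ["manifest must be a JSON array"]
    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: not an object")
            continue
        # Assumption: these three fields are required and non-empty.
        for key in ("speaker", "text", "voice"):
            value = entry.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {i}: missing or empty '{key}'")
        # Assumption: speed defaults to 1.0 and must be a positive number.
        speed = entry.get("speed", 1.0)
        if isinstance(speed, bool) or not isinstance(speed, (int, float)) or speed <= 0:
            problems.append(f"entry {i}: 'speed' must be a positive number")
        # Assumption: align defaults to center when omitted.
        if entry.get("align", "center") not in ALLOWED_ALIGN:
            problems.append(f"entry {i}: 'align' must be left, right, or center")
    return problems


if __name__ == "__main__":
    sample = json.loads('[{"speaker": "NARRATOR", "text": "Hello", "voice": "M1"}]')
    print(validate_manifest(sample) or "manifest looks OK")
```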

2. Generate audio + timing manifest

python scripts/generate_audio.py manifest.json output.wav

Outputs:

output.wav — concatenated multi-voice audio
output_timings.json — per-segment start/end times for video sync

3. Render video (choose path)

Path A: ffmpeg — fast, no Node.js needed

python scripts/ffmpeg_render.py output_timings.json output.wav video.mp4

Options: --width, --height, --font-size, --bg, --font, --crf
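
When the render step is scripted, the command can be assembled from Python. This is a rough sketch under stated assumptions: `build_render_command` is a hypothetical helper, the flag names come from the options list above, and the example values (1280, 720, 23) are placeholders because the accepted value formats are not documented here.

```python
import subprocess

# Flags listed in the options line above, keyed by a Python-friendly name.
KNOWN_OPTIONS = {
    "width": "--width",
    "height": "--height",
    "font_size": "--font-size",
    "bg": "--bg",
    "font": "--font",
    "crf": "--crf",
}


def build_render_command(timings, audio, output, **options):
    """Build the argv list for scripts/ffmpeg_render.py (hypothetical helper)."""
    cmd = ["python", "scripts/ffmpeg_render.py", timings, audio, output]
    for name, value in options.items():
        if name not in KNOWN_OPTIONS:
            raise ValueError(f"unknown option: {name}")
        cmd += [KNOWN_OPTIONS[name], str(value)]
    return cmd


if __name__ == "__main__":
    # Placeholder values; check the script's own help for valid ranges.
    cmd = build_render_command(
        "output_timings.json", "output.wav", "video.mp4",
        width=1280, height=720, crf=23,
    )
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.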

Path B: Remotion — richer animations, React-based

Copy the boilerplate:

cp -r assets/remotion-boilerplate ./my-video
cd my-video
npm install

Edit src/Conversation.tsx:

Replace conversation array with your lines (duration in frames, 30fps)
Set SpeakerConfig colors/alignment
Uncomment <Audio src={staticFile("audio.wav")} /> and place audio in public/
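
Since the conversation array expects durations in frames at 30fps, the per-segment times from the timing manifest need converting. The sketch below shows one way to do that; the `start` and `end` keys (in seconds) and the `from`/`duration` output keys are assumptions for illustration, so check the actual timing JSON and `src/Conversation.tsx` for the real field names.

```python
FPS = 30  # frame rate stated in the Remotion instructions above


def segments_to_frames(segments, fps=FPS):
    """Convert start/end times in seconds to frame offsets and durations.

    Assumes each segment is a dict with 'start' and 'end' in seconds.
    """
    result = []
    for seg in segments:
        # Round both boundaries, then subtract, so adjacent segments
        # stay contiguous instead of accumulating rounding drift.
        start_frame = round(seg["start"] * fps)
        end_frame = round(seg["end"] * fps)
        result.append({
            "from": start_frame,
            "duration": max(1, end_frame - start_frame),
        })
    return result


if __name__ == "__main__":
    demo = [{"start": 0.0, "end": 1.5}, {"start": 1.5, "end": 4.0}]
    for item in segments_to_frames(demo):
        print(item)
```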

Render:

npx remotion render src/index.ts Conversation out/video.mp4

Speaker Customization

Default color/alignment map (edit in either ffmpeg or Remotion):

Speaker	Color	Align
NARRATOR	#cbd5e1	center
INTERVIEWER	#60a5fa	left
CUSTOMER	#34d399	right

Add more by extending the config map in the respective renderer.
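
As an illustration of extending the map, the sketch below mirrors the default table as a Python dictionary and adds one extra speaker. The names `SPEAKER_STYLES`, `DEFAULT_STYLE`, and `style_for`, the `SUPPORT` entry, and its color are all hypothetical; the real structure in `scripts/ffmpeg_render.py` or the Remotion template may differ.

```python
# Defaults copied from the table above.
SPEAKER_STYLES = {
    "NARRATOR": {"color": "#cbd5e1", "align": "center"},
    "INTERVIEWER": {"color": "#60a5fa", "align": "left"},
    "CUSTOMER": {"color": "#34d399", "align": "right"},
}

# Hypothetical extra speaker added for illustration.
SPEAKER_STYLES["SUPPORT"] = {"color": "#fbbf24", "align": "left"}

# Assumption: unknown speakers fall back to a neutral centered style.
DEFAULT_STYLE = {"color": "#ffffff", "align": "center"}


def style_for(speaker):
    """Look up a speaker's style, ignoring case, with a fallback."""
    return SPEAKER_STYLES.get(speaker.upper(), DEFAULT_STYLE)


if __name__ == "__main__":
    for name in ("Customer", "SUPPORT", "Moderator"):
        print(name, style_for(name))
```

A fallback style keeps rendering from failing when a transcript introduces a speaker label that has not been configured yet.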

Resources

scripts/generate_audio.py — Multi-voice TTS with timing export
scripts/ffmpeg_render.py — ffmpeg drawtext video renderer
assets/remotion-boilerplate/ — Copyable Remotion project template
references/remotion-patterns.md — Advanced Remotion techniques (JSON data loading, word-by-word reveal, audio sync)
references/ffmpeg-guide.md — ffmpeg drawtext syntax and timing reference

Safety recommendations

Install only if you are comfortable running local media-generation commands and npm install for the optional Remotion template. Review transcript contents before use, because generated audio, timing JSON, terminal logs, and temporary WAV files may contain the spoken text.

Capability assessment

✓ Purpose & Capability

The artifacts consistently support the stated purpose: generating conversation videos from transcript manifests using TTS, ffmpeg, and optional Remotion animation templates.

✓ Instruction Scope

The runtime steps are explicit and user-directed; the skill shows concrete commands for generating audio, rendering video, copying a boilerplate project, and optionally rendering with Remotion.

ℹ Install Mechanism

The Remotion path requires npm install for declared public packages, and the TTS script depends on supertonic-tts with possible model download behavior; these are disclosed as prerequisites and fit the purpose.

ℹ Credentials

The skill uses local Python scripts, ffmpeg subprocesses, temporary WAV files, and output media files, which are proportionate for video rendering but can process potentially sensitive transcript content locally.

ℹ Persistence & Privilege

No background persistence, credential access, privilege escalation, or hidden startup behavior was found; generated audio/video outputs and temporary audio work files are expected side effects.

How to use

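Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install conversation-video
Once installation completes, invoke the Skill by name or trigger it with /conversation-video
Provide the required inputs according to the Skill's parameter description to get structured output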

Version history

v1.0.0

Initial release: multi-voice TTS audio + timed text overlay video via ffmpeg or Remotion

Metadata

Slug conversation-video

Version 1.0.0

License MIT-0

Total installs 1

Current installs 1

Number of versions 1

FAQ