pratyushchauhan

Conversation Video

By Pratyush Chauhan · GitHub · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 37 · Favorites: 0 · Current installs: 1 · Versions: 1
Install in OpenClaw
/install conversation-video
Description
Generate animated conversation videos with multi-voice TTS audio and timed text overlays. Use when the user needs to (1) turn a transcript or dialogue into a...
Usage (SKILL.md)

Conversation Video

Generate multi-voice conversation videos from text transcripts. There are two rendering paths: a quick ffmpeg path (no Node.js needed) or a richer Remotion path (React animations).

Prerequisites

Tool | Path / Notes
ffmpeg | System install, or Jellyfin ffmpeg at /usr/lib/jellyfin-ffmpeg/ffmpeg
supertonic-tts | Python package for multi-voice TTS (see scripts/generate_audio.py for load logic)
Node.js + npm | Only needed for the Remotion path
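
The ffmpeg lookup implied by the table can be sketched as follows. This is a minimal illustration, not the logic the skill's scripts actually use: the function name find_ffmpeg is hypothetical, and it assumes the system binary on PATH should be preferred over the Jellyfin build.

```python
import os
import shutil

# Fallback location taken from the prerequisites table.
JELLYFIN_FFMPEG = "/usr/lib/jellyfin-ffmpeg/ffmpeg"


def find_ffmpeg(fallback=JELLYFIN_FFMPEG):
    """Return a path to an ffmpeg binary, or None if none is found.

    Hypothetical helper: prefers an ffmpeg found on PATH, then the fallback.
    """
    on_path = shutil.which("ffmpeg")
    if on_path:
        return on_path
    if os.path.isfile(fallback) and os.access(fallback, os.X_OK):
        return fallback
    return None
```

Calling find_ffmpeg() before rendering gives an early, readable failure if neither binary is installed.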

Workflow

1. Build a transcript manifest

Create a JSON file with your conversation:

[
  {"speaker": "NARRATOR",   "text": "Customer Discovery Interview", "voice": "M1", "speed": 1.0, "align": "center"},
  {"speaker": "INTERVIEWER","text": "Walk me through when you first realized...", "voice": "M5", "speed": 0.95, "align": "left"},
  {"speaker": "CUSTOMER",   "text": "I was looking for a marketer agent.", "voice": "M2", "speed": 1.0, "align": "right"}
]

Fields: speaker (label), text (spoken text), voice (supertonic voice name e.g. M1-M5, F1-F2), speed (optional playback speed), align (left/right/center for video placement).
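
A manifest can be checked before any audio is generated. The sketch below is illustrative only: validate_manifest is a hypothetical helper, and it assumes that speaker, text, and voice are required while speed and align are optional (the source only states that speed is optional).

```python
REQUIRED = ("speaker", "text", "voice")  # assumption: these three are mandatory
ALIGNMENTS = {"left", "right", "center"}


def validate_manifest(entries):
    """Return a list of human-readable problems; an empty list means OK."""
    if not isinstance(entries, list):
        return ["manifest must be a JSON array"]
    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: expected an object")
            continue
        for key in REQUIRED:
            value = entry.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {i}: missing or empty '{key}'")
        # speed is optional; when present it must be a positive number
        speed = entry.get("speed", 1.0)
        if (isinstance(speed, bool) or not isinstance(speed, (int, float))
                or speed <= 0):
            problems.append(f"entry {i}: 'speed' must be a positive number")
        # align is treated as optional here, defaulting to center (assumption)
        align = entry.get("align", "center")
        if not isinstance(align, str) or align not in ALIGNMENTS:
            problems.append(f"entry {i}: 'align' must be left, right, or center")
    return problems
```

Load the file with json.load and pass the result to validate_manifest; print the returned problems and stop if the list is non-empty.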

2. Generate audio + timing manifest

python scripts/generate_audio.py manifest.json output.wav

Outputs:

  • output.wav — concatenated multi-voice audio
  • output_timings.json — per-segment start/end times for video sync
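
The schema of output_timings.json is not shown here, so the following is only a sketch of how per-segment start/end times could be derived from segment durations. The function build_timings, the key names start and end, and the gap parameter are all assumptions for illustration.

```python
def build_timings(entries, durations, gap=0.0):
    """Pair each manifest entry with cumulative start/end times in seconds.

    entries:   manifest entries (dicts), in playback order
    durations: length of each synthesized segment in seconds
    gap:       optional silence between segments (hypothetical parameter)
    """
    if len(entries) != len(durations):
        raise ValueError("need exactly one duration per entry")
    timings = []
    cursor = 0.0
    for entry, duration in zip(entries, durations):
        start = cursor
        end = start + duration
        timings.append({**entry, "start": round(start, 3), "end": round(end, 3)})
        cursor = end + gap
    return timings
```

Because each start is the previous end plus the gap, the overlay for a segment appears exactly when its audio begins in the concatenated file.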

3. Render video (choose path)

Path A: ffmpeg — fast, no Node.js needed

python scripts/ffmpeg_render.py output_timings.json output.wav video.mp4

Options: --width, --height, --font-size, --bg, --font, --crf
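
For orientation, a timed overlay with ffmpeg's drawtext filter generally combines a position expression with an enable='between(t,start,end)' window. The sketch below shows one way to build such a filter string; it is not the code in scripts/ffmpeg_render.py. The function name, the margin parameter, and the use of textfile (chosen to sidestep text escaping) are assumptions.

```python
# x-position expressions per alignment; w and text_w are drawtext variables.
X_EXPR = {
    "left": "{margin}",
    "center": "(w-text_w)/2",
    "right": "w-text_w-{margin}",
}


def drawtext_filter(textfile, start, end, align="center",
                    font_size=48, color="#ffffff", margin=80):
    """Build one drawtext filter that is visible only between start and end."""
    # Escaping is deliberately out of scope for this sketch.
    if any(ch in textfile for ch in "':\\"):
        raise ValueError("this sketch does not escape special characters in paths")
    x = X_EXPR[align].format(margin=margin)
    fontcolor = "0x" + color.lstrip("#")  # ffmpeg accepts 0xRRGGBB colors
    return (
        f"drawtext=textfile='{textfile}'"
        f":fontsize={font_size}:fontcolor={fontcolor}"
        f":x={x}:y=(h-text_h)/2"
        f":enable='between(t,{start},{end})'"
    )
```

One filter per segment, joined with commas, forms a single filter chain that can be passed to ffmpeg's -vf option.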

Path B: Remotion — richer animations, React-based

Copy the boilerplate:

cp -r assets/remotion-boilerplate ./my-video
cd my-video
npm install

Edit src/Conversation.tsx:

  1. Replace conversation array with your lines (duration in frames, 30fps)
  2. Set SpeakerConfig colors/alignment
  3. Uncomment <Audio src={staticFile("audio.wav")} /> and place audio in public/
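
Since step 1 expects durations in frames at 30fps, timings in seconds have to be converted first. The sketch below assumes each timing entry carries speaker, text, start, and end keys (a hypothetical schema) and derives durations from rounded frame boundaries so that rounding errors do not accumulate across lines.

```python
FPS = 30  # frame rate stated in step 1


def conversation_lines(timings, fps=FPS):
    """Convert start/end times in seconds into frame-based durations."""
    lines = []
    for seg in timings:
        # Round the boundaries, not the durations, to avoid cumulative drift.
        first_frame = round(seg["start"] * fps)
        last_frame = round(seg["end"] * fps)
        lines.append({
            "speaker": seg["speaker"],
            "text": seg["text"],
            "duration": max(1, last_frame - first_frame),
        })
    return lines
```

The resulting list can be serialized with json.dumps and pasted into the conversation array, adjusting key names to whatever the boilerplate expects.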

Render:

npx remotion render src/index.ts Conversation out/video.mp4

Speaker Customization

Default color/alignment map (edit it in whichever renderer you use, ffmpeg or Remotion):

Speaker | Color | Align
NARRATOR | #cbd5e1 | center
INTERVIEWER | #60a5fa | left
CUSTOMER | #34d399 | right

Add more by extending the config map in the respective renderer.
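
As an illustration of extending the map, the defaults above can be held in a dictionary and merged with extra speakers. The names DEFAULT_SPEAKERS, FALLBACK, and speaker_style are hypothetical, and the white, centered fallback for unknown speakers is an assumption; the actual renderers may structure this differently.

```python
# Defaults copied from the table above.
DEFAULT_SPEAKERS = {
    "NARRATOR": {"color": "#cbd5e1", "align": "center"},
    "INTERVIEWER": {"color": "#60a5fa", "align": "left"},
    "CUSTOMER": {"color": "#34d399", "align": "right"},
}

# Assumption: unknown speakers fall back to centered white text.
FALLBACK = {"color": "#ffffff", "align": "center"}


def speaker_style(speaker, overrides=None):
    """Look up a speaker's style, letting overrides extend the defaults."""
    config = {**DEFAULT_SPEAKERS, **(overrides or {})}
    return config.get(speaker.upper(), FALLBACK)
```

For example, speaker_style("EXPERT", {"EXPERT": {"color": "#f472b6", "align": "left"}}) adds a fourth speaker without touching the defaults.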

Resources

  • scripts/generate_audio.py — Multi-voice TTS with timing export
  • scripts/ffmpeg_render.py — ffmpeg drawtext video renderer
  • assets/remotion-boilerplate/ — Copyable Remotion project template
  • references/remotion-patterns.md — Advanced Remotion techniques (JSON data loading, word-by-word reveal, audio sync)
  • references/ffmpeg-guide.md — ffmpeg drawtext syntax and timing reference
Safe usage advice
Install only if you are comfortable running local media-generation commands and npm install for the optional Remotion template. Review transcript contents before use, because generated audio, timing JSON, terminal logs, and temporary WAV files may contain the spoken text.
Capability assessment
Purpose & Capability
The artifacts consistently support the stated purpose: generating conversation videos from transcript manifests using TTS, ffmpeg, and optional Remotion animation templates.
Instruction Scope
The runtime steps are explicit and user-directed; the skill shows concrete commands for generating audio, rendering video, copying a boilerplate project, and optionally rendering with Remotion.
Install Mechanism
The Remotion path requires npm install for declared public packages, and the TTS script depends on supertonic-tts with possible model download behavior; these are disclosed as prerequisites and fit the purpose.
Credentials
The skill uses local Python scripts, ffmpeg subprocesses, temporary WAV files, and output media files, which are proportionate for video rendering but can process potentially sensitive transcript content locally.
Persistence & Privilege
No background persistence, credential access, privilege escalation, or hidden startup behavior was found; generated audio/video outputs and temporary audio work files are expected side effects.
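How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install conversation-video
  3. After installation, invoke the Skill by name or trigger it with /conversation-video
  4. Provide the inputs described in the Skill's parameter notes to get structured output
Version history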
v1.0.0
Initial release: multi-voice TTS audio + timed text overlay video via ffmpeg or Remotion
Metadata
Slug conversation-video
Version 1.0.0
License MIT-0
Total installs 1
Current installs 1
Versions 1
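FAQ

What is Conversation Video?

Generate animated conversation videos with multi-voice TTS audio and timed text overlays. Use when the user needs to (1) turn a transcript or dialogue into a... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 37 total downloads to date.

How do I install Conversation Video?

Run the command "/install conversation-video" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is required.

Is Conversation Video free?

Yes. Conversation Video is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.

Which platforms does Conversation Video support?

Conversation Video is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Conversation Video?

It is developed and maintained by Pratyush Chauhan (@pratyushchauhan); the current version is v1.0.0.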
