
Conversation Video

by Pratyush Chauhan · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security Clean
Downloads: 37 · Stars: 0 · Active Installs: 1 · Versions: 1
Install in OpenClaw
/install conversation-video
Description
Generate animated conversation videos with multi-voice TTS audio and timed text overlays. Use when the user needs to (1) turn a transcript or dialogue into a...
README (SKILL.md)

Conversation Video

Generate multi-voice conversation videos from text transcripts. There are two rendering paths: a quick ffmpeg path (no Node.js needed) and a richer Remotion path (React animations).

Prerequisites

Tool | Path / Notes
ffmpeg | System install or Jellyfin ffmpeg at /usr/lib/jellyfin-ffmpeg/ffmpeg
supertonic-tts | Python package for multi-voice TTS (see scripts/generate_audio.py for load logic)
Node.js + npm | Only needed for Remotion path

Workflow

1. Build a transcript manifest

Create a JSON file with your conversation:

[
  {"speaker": "NARRATOR",   "text": "Customer Discovery Interview", "voice": "M1", "speed": 1.0, "align": "center"},
  {"speaker": "INTERVIEWER","text": "Walk me through when you first realized...", "voice": "M5", "speed": 0.95, "align": "left"},
  {"speaker": "CUSTOMER",   "text": "I was looking for a marketer agent.", "voice": "M2", "speed": 1.0, "align": "right"}
]

Fields: speaker (label), text (spoken text), voice (supertonic voice name e.g. M1-M5, F1-F2), speed (optional playback speed), align (left/right/center for video placement).
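
Before generating audio, it can help to check the manifest for missing or malformed fields. The following is a minimal sketch, not part of the skill itself: `validate_manifest` is a hypothetical helper, and it assumes every field except speed is required, which the actual script may not enforce.

```python
ALLOWED_ALIGN = {"left", "right", "center"}


def validate_manifest(entries):
    """Return a list of problems; an empty list means the manifest looks usable."""
    problems = []
    for i, entry in enumerate(entries):
        # Assumption: speaker, text, and voice must be non-empty strings.
        for key in ("speaker", "text", "voice"):
            value = entry.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {i}: missing or empty '{key}'")
        # speed is optional; when present it should be a positive number.
        speed = entry.get("speed", 1.0)
        if not isinstance(speed, (int, float)) or speed <= 0:
            problems.append(f"entry {i}: 'speed' must be a positive number")
        # Assumption: align must be one of the three documented values.
        if entry.get("align") not in ALLOWED_ALIGN:
            problems.append(f"entry {i}: 'align' must be left, right, or center")
    return problems
```

Load the manifest with `json.load` and pass the resulting list to `validate_manifest`; if the returned list is non-empty, fix the manifest before running the audio step.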

2. Generate audio + timing manifest

python scripts/generate_audio.py manifest.json output.wav

Outputs:

  • output.wav — concatenated multi-voice audio
  • output_timings.json — per-segment start/end times for video sync
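
To illustrate what per-segment start/end times mean for concatenated audio, here is a rough sketch of how such timings could be derived from segment durations. The field names (`start`, `end`, and the copied `speaker`, `text`, `align`) are assumptions for illustration; the real schema is whatever scripts/generate_audio.py writes.

```python
def build_timings(entries, durations):
    """Pair each manifest entry with cumulative start/end times in seconds.

    durations holds the length in seconds of each synthesized segment,
    in the same order as entries.
    """
    timings = []
    cursor = 0.0  # running position in the concatenated audio
    for entry, duration in zip(entries, durations):
        timings.append({
            "speaker": entry["speaker"],
            "text": entry["text"],
            "align": entry.get("align", "center"),
            "start": round(cursor, 3),
            "end": round(cursor + duration, 3),
        })
        cursor += duration
    return timings
```

Each segment starts exactly where the previous one ends, so the text overlays can be switched at the same instants the speaker changes in the audio.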

3. Render video (choose path)

Path A: ffmpeg — fast, no Node.js needed

python scripts/ffmpeg_render.py output_timings.json output.wav video.mp4

Options: --width, --height, --font-size, --bg, --font, --crf
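
For readers curious how a timed text overlay can be expressed in ffmpeg, the sketch below builds one drawtext filter per segment and shows it only during that segment's time window via `enable='between(t,start,end)'`. It is an illustration under assumptions, not the logic of scripts/ffmpeg_render.py: `drawtext_filter` is a hypothetical helper, the segment keys follow the assumed timing layout, and text containing characters that need drawtext escaping is rejected rather than escaped.

```python
# Horizontal position expressions using drawtext's w and text_w variables.
ALIGN_X = {
    "left": "{m}",
    "center": "(w-text_w)/2",
    "right": "w-text_w-{m}",
}


def drawtext_filter(seg, color="#ffffff", font_size=48, margin=80):
    """Build a drawtext filter string that is visible only during one segment."""
    text = seg["text"]
    # This sketch does not implement drawtext escaping, so refuse risky text.
    if set("\\':%") & set(text):
        raise ValueError("text needs drawtext escaping, which this sketch does not handle")
    x_expr = ALIGN_X[seg.get("align", "center")].format(m=margin)
    return (
        f"drawtext=text='{text}'"
        f":fontcolor=0x{color.lstrip('#')}"
        f":fontsize={font_size}"
        f":x={x_expr}:y=(h-text_h)/2"
        f":enable='between(t,{seg['start']},{seg['end']})'"
    )
```

Joining the per-segment strings with commas gives a filter chain that can be passed to ffmpeg through `-vf`.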

Path B: Remotion — richer animations, React-based

Copy the boilerplate:

cp -r assets/remotion-boilerplate ./my-video
cd my-video
npm install

Edit src/Conversation.tsx:

  1. Replace the conversation array with your lines (durations are in frames at 30 fps)
  2. Set the SpeakerConfig colors and alignment
  3. Uncomment <Audio src={staticFile("audio.wav")} /> and place the audio file in public/
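
Because the Remotion template expects durations in frames at 30 fps, timings in seconds have to be converted. The sketch below shows one way to do that; the input keys (`start`, `end`) and output keys (`from`, `duration`) are assumptions for illustration and may not match the boilerplate's actual types.

```python
FPS = 30  # frame rate stated for the Remotion template


def to_frame_durations(timings, fps=FPS):
    """Convert start/end times in seconds to a start frame and a frame count."""
    lines = []
    for seg in timings:
        # Round the boundaries, not the durations, so rounding errors
        # do not accumulate across a long conversation.
        start_frame = round(seg["start"] * fps)
        end_frame = round(seg["end"] * fps)
        lines.append({
            "speaker": seg["speaker"],
            "text": seg["text"],
            "from": start_frame,
            "duration": max(1, end_frame - start_frame),  # at least one frame
        })
    return lines
```

A 1.5-second line becomes 45 frames at 30 fps; the resulting values can then be copied into the conversation array by hand or exported as JSON.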

Render:

npx remotion render src/index.ts Conversation out/video.mp4

Speaker Customization

Default color/alignment map (edit it in whichever renderer you use, ffmpeg or Remotion):

Speaker | Color | Align
NARRATOR | #cbd5e1 | center
INTERVIEWER | #60a5fa | left
CUSTOMER | #34d399 | right

Add more by extending the config map in the respective renderer.
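
As a rough illustration of extending the map, the sketch below expresses the defaults as a Python dictionary with a fallback for unknown speakers. The names `SPEAKER_STYLES`, `DEFAULT_STYLE`, and `style_for`, the fallback values, and the added MODERATOR entry are all hypothetical; the renderers may structure their config differently.

```python
# Defaults taken from the table above.
SPEAKER_STYLES = {
    "NARRATOR": {"color": "#cbd5e1", "align": "center"},
    "INTERVIEWER": {"color": "#60a5fa", "align": "left"},
    "CUSTOMER": {"color": "#34d399", "align": "right"},
}

# Hypothetical extra speaker, added the same way as the defaults.
SPEAKER_STYLES["MODERATOR"] = {"color": "#fbbf24", "align": "left"}

# Assumed fallback for speakers that have no entry in the map.
DEFAULT_STYLE = {"color": "#ffffff", "align": "center"}


def style_for(speaker):
    """Look up a speaker's style, ignoring case, with a neutral fallback."""
    return SPEAKER_STYLES.get(speaker.upper(), DEFAULT_STYLE)
```

A fallback keeps rendering from failing when a transcript introduces a speaker label that has not been configured yet.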

Resources

  • scripts/generate_audio.py — Multi-voice TTS with timing export
  • scripts/ffmpeg_render.py — ffmpeg drawtext video renderer
  • assets/remotion-boilerplate/ — Copyable Remotion project template
  • references/remotion-patterns.md — Advanced Remotion techniques (JSON data loading, word-by-word reveal, audio sync)
  • references/ffmpeg-guide.md — ffmpeg drawtext syntax and timing reference
Usage Guidance
Install only if you are comfortable running local media-generation commands and npm install for the optional Remotion template. Review transcript contents before use, because generated audio, timing JSON, terminal logs, and temporary WAV files may contain the spoken text.
Capability Assessment
Purpose & Capability
The artifacts consistently support the stated purpose: generating conversation videos from transcript manifests using TTS, ffmpeg, and optional Remotion animation templates.
Instruction Scope
The runtime steps are explicit and user-directed; the skill shows concrete commands for generating audio, rendering video, copying a boilerplate project, and optionally rendering with Remotion.
Install Mechanism
The Remotion path requires npm install for declared public packages, and the TTS script depends on supertonic-tts with possible model download behavior; these are disclosed as prerequisites and fit the purpose.
Credentials
The skill uses local Python scripts, ffmpeg subprocesses, temporary WAV files, and output media files, which are proportionate for video rendering but can process potentially sensitive transcript content locally.
Persistence & Privilege
No background persistence, credential access, privilege escalation, or hidden startup behavior was found; generated audio/video outputs and temporary audio work files are expected side effects.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install conversation-video
  3. After installation, invoke the skill by name or use /conversation-video
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release: multi-voice TTS audio + timed text overlay video via ffmpeg or Remotion
Metadata
Slug conversation-video
Version 1.0.0
License MIT-0
All-time Installs 1
Active Installs 1
Total Versions 1
Frequently Asked Questions

What is Conversation Video?

Generate animated conversation videos with multi-voice TTS audio and timed text overlays. Use when the user needs to (1) turn a transcript or dialogue into a... It is an AI Agent Skill for Claude Code / OpenClaw, with 37 downloads so far.

How do I install Conversation Video?

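Run "/install conversation-video" in the OpenClaw or Claude Code chat to install it in one step. The skill's own prerequisites (ffmpeg, supertonic-tts, and Node.js + npm for the Remotion path) still need to be available on your machine.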

Is Conversation Video free?

Yes, Conversation Video is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Conversation Video support?

Conversation Video is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Conversation Video?

It is built and maintained by Pratyush Chauhan (@pratyushchauhan); the current version is v1.0.0.
