Ai Media
/install ai-media
ai-media - AI Media Generation
Full-stack AI media generation powered by a GPU server (RTX 3090/3080/2070S).
Capabilities
- Image Generation — Photorealistic images via ComfyUI (z-image, Juggernaut XL)
- Video Generation — Video synthesis via ComfyUI (AnimateDiff, LTX-2)
- Talking Heads — Animated talking faces via SadTalker
- Voice Synthesis — Natural TTS via Voxtral (whisper.cpp)
GPU Server
- Host: ${GPU_USER}@${GPU_HOST}
- SSH Key: ~/.ssh/id_ed25519_gpu
- ComfyUI: /data/ai-stack/comfyui/ComfyUI/ (port 8188)
- SadTalker: /data/ai-stack/sadtalker/
- Voxtral: /data/ai-stack/whisper/
- Output: /data/ai-stack/output/
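The connection details above can be wrapped in a small helper so that every script builds its SSH command the same way. This is a minimal sketch, not the skill's actual code: the function names `gpu_target` and `gpu_ssh` and the `GPU_SSH_KEY` override are hypothetical, and only `GPU_USER`, `GPU_HOST`, and the key path come from this page.

```shell
# Hypothetical helpers (names are illustrative, not part of the skill).

# Print the SSH target built from GPU_USER and GPU_HOST.
# Returns 1 if either variable is unset or empty.
gpu_target() {
  if [ -z "${GPU_USER:-}" ] || [ -z "${GPU_HOST:-}" ]; then
    echo "GPU_USER and GPU_HOST must be set" >&2
    return 1
  fi
  printf '%s@%s\n' "$GPU_USER" "$GPU_HOST"
}

# Run a command on the GPU server using the key listed above.
# GPU_SSH_KEY is an assumed override; the default is the documented key path.
gpu_ssh() {
  target="$(gpu_target)" || return 1
  ssh -i "${GPU_SSH_KEY:-$HOME/.ssh/id_ed25519_gpu}" \
      -o BatchMode=yes -o ConnectTimeout=5 \
      "$target" "$@"
}
```

With both variables exported, `gpu_ssh ls /data/ai-stack/output/` would list the generated files. `BatchMode=yes` makes the call fail rather than prompt for a password, which suits unattended scripts.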
Usage
Generate Image
./scripts/image.sh "lady on beach at sunset" realistic
./scripts/image.sh "cyberpunk cityscape" artistic
Arguments:
- $1: Prompt text
- $2: Style (realistic|artistic), optional, default: realistic
Output: Path to generated image (e.g., /data/ai-stack/output/image_001.png)
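The optional style argument and its default can be handled with ordinary shell parameter expansion. The sketch below shows one way a wrapper might do this. `resolve_style` is a hypothetical name, and the real `image.sh` may validate its arguments differently.

```shell
# Hypothetical sketch: apply the documented default and reject unknown styles.
# $1: style as passed by the caller (may be empty or missing).
resolve_style() {
  style="${1:-realistic}"   # fall back to the documented default
  case "$style" in
    realistic|artistic)
      printf '%s\n' "$style"
      ;;
    *)
      echo "unknown style: $style (expected realistic|artistic)" >&2
      return 1
      ;;
  esac
}
```

A caller could then write `style="$(resolve_style "$2")" || exit 1` before submitting the prompt. The same pattern would apply to the model, voice style, language, and gender arguments of the other scripts.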
Generate Video
./scripts/video.sh "waves crashing on shore" animatediff 4
./scripts/video.sh "city traffic timelapse" ltx2 8
Arguments:
- $1: Prompt text
- $2: Model (animatediff|ltx2), optional, default: animatediff
- $3: Duration in seconds, optional, default: 4
Output: Path to generated video (e.g., /data/ai-stack/output/video_001.mp4)
Generate Talking Head
./scripts/talking-head.sh "Hello, I'm Agent" gentle input.jpg
./scripts/talking-head.sh "Welcome to the future" neutral photo.png
Arguments:
- $1: Speech text
- $2: Voice style (gentle|neutral|energetic), optional, default: gentle
- $3: Avatar image path, optional; a default avatar is generated if not provided
Output: Path to talking head video (e.g., /data/ai-stack/output/talking_001.mp4)
Generate Audio
./scripts/audio.sh "This is a test message" en male
./scripts/audio.sh "Bonjour le monde" fr female
Arguments:
- $1: Text to speak
- $2: Language code (en, fr, es, etc.), optional, default: en
- $3: Voice gender (male|female), optional, default: male
Output: Path to audio file (e.g., /data/ai-stack/output/audio_001.wav)
Models Available
Image Models
- z-image — 6B params, S3-DiT, photorealistic (downloading, 43% complete)
- Juggernaut XL v9 — SDXL-based, versatile (7.1GB, ready)
Video Models
- AnimateDiff — SD 1.5 motion module (512x512, working ✅)
- LTX-2 — 19B params, high quality (14GB checkpoint ready, Gemma encoder ready)
Talking Head Models
- SadTalker — Audio-driven head animation (working ✅)
Voice Models
- Voxtral — whisper.cpp-based TTS (installed)
Dependencies
All dependencies are pre-installed on the GPU server:
- ComfyUI with custom nodes (AnimateDiff-Evolved, VideoHelperSuite)
- SadTalker with face enhancer
- Voxtral with whisper.cpp
- FFmpeg for video encoding
Error Handling
Scripts will:
- Check SSH connectivity before execution
- Validate that the GPU server is running
- Return meaningful error messages
- Clean up failed generations automatically
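The connectivity check and the automatic cleanup listed above could look roughly like this. It is an illustrative sketch under stated assumptions, not the skill's implementation: `check_ssh`, `run_with_cleanup`, and the `SSH_BIN` override are hypothetical names, and the cleanup works on a local path, whereas the real scripts would presumably remove files under /data/ai-stack/output/ on the server.

```shell
# Hypothetical sketch of the checks listed above; not the skill's actual code.

# SSH_BIN is an assumed override so the check can be exercised without a server.
SSH_BIN="${SSH_BIN:-ssh}"

# Return 0 if the GPU server accepts a non-interactive SSH connection.
check_ssh() {
  if "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 \
       "${GPU_USER}@${GPU_HOST}" true 2>/dev/null; then
    return 0
  fi
  echo "error: cannot reach ${GPU_USER}@${GPU_HOST} over SSH" >&2
  return 1
}

# Run a generation command and delete its partial output if it fails.
# $1: path of the output file; remaining arguments: the command to run.
run_with_cleanup() {
  out="$1"
  shift
  status=0
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    rm -f -- "$out"
    echo "error: generation failed (exit $status); removed $out" >&2
  fi
  return "$status"
}
```

A script would call `check_ssh || exit 1` first, then wrap the generation step in `run_with_cleanup` so that a failed run leaves no half-written file behind.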
Performance
- Image: ~10-20s for 1024x1024
- Video (AnimateDiff): ~20-30s for 512x512, 16 frames
- Video (LTX-2): ~60-90s for 768x512, 4s @ 24fps
- Talking Head: ~30-40s for 10s video
- Audio: ~2-5s for 30s speech
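To compare the two video models, it helps to convert the figures above into time per frame. The helpers below are a hypothetical illustration (`frame_count` and `ms_per_frame` are not part of the skill) and use only the numbers quoted in this list.

```shell
# Number of frames in a clip: duration in seconds times frames per second.
frame_count() {
  echo $(( $1 * $2 ))
}

# Average milliseconds per frame, given total seconds and a frame count.
# Integer division, so the result is truncated.
ms_per_frame() {
  echo $(( $1 * 1000 / $2 ))
}

frames="$(frame_count 4 24)"   # LTX-2: 4 s at 24 fps
echo "LTX-2 frames: $frames"
echo "LTX-2 ms/frame: $(ms_per_frame 60 "$frames") to $(ms_per_frame 90 "$frames")"
echo "AnimateDiff ms/frame: $(ms_per_frame 20 16) to $(ms_per_frame 30 16)"
```

By these figures, a 4 s LTX-2 clip has 96 frames and takes about 625 to 937 ms per frame, while AnimateDiff takes about 1250 to 1875 ms per frame for its 16 frames. The resolutions differ, so this is a rough comparison only.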
Future Enhancements
- Batch generation support
- Style transfer capabilities
- Video upscaling (spatial + temporal)
- Multi-language voice cloning
- Real-time preview streaming
Status: Active development
Maintainer: Agent
GPU Server: ${GPU_USER}@${GPU_HOST}
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install ai-media
- After installation, invoke the skill by name or use /ai-media
- Provide required inputs per the skill's parameter spec and get structured output
What is Ai Media?
Generate photorealistic images, videos, talking heads, and natural TTS audio using GPU-accelerated AI models and scripts on a remote server. It is an AI Agent Skill for Claude Code / OpenClaw, with 789 downloads so far.
How do I install Ai Media?
Run "/install ai-media" in the OpenClaw or Claude Code chat to install it in one step. The skill itself needs no extra setup, but its scripts expect the GPU server to be reachable over SSH.
Is Ai Media free?
Yes, Ai Media is completely free (open-source). You can download, install and use it at no cost.
Which platforms does Ai Media support?
Ai Media is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Ai Media?
It is built and maintained by bowen31337 (@bowen31337); the current version is v1.0.1.