Description

Create AI avatar and talking head videos with OmniHuman, Fabric, PixVerse via inference.sh CLI. Models: OmniHuman 1.5, OmniHuman 1.0, Fabric 1.0, PixVerse Li...

README (SKILL.md)

AI Avatar & Talking Head Videos

Name: Ai Avatar Video
Author: okaris

Create AI avatars and talking head videos via inference.sh CLI.

Quick Start

curl -fsSL https://cli.inference.sh | sh && infsh login

# Create avatar video from image + audio
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'

Install note: The install script only detects your OS/architecture, downloads the matching binary from dist.inference.sh, and verifies its SHA-256 checksum. No elevated permissions or background processes. Manual install & verification available.
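The manual verification mentioned above can be sketched as a small shell helper. This is a minimal sketch, not the installer's own logic: `verify_sha256` is a hypothetical helper name, and the file name and expected hash in the usage comment are placeholders that would come from the published checksums file.

```shell
# Hypothetical helper: compare a file's SHA-256 digest with an expected hex string.
# Returns 0 on match, 1 on mismatch, 2 if no hashing tool is found.
verify_sha256() {
  file=$1
  # Normalize the expected digest to lowercase so either case is accepted.
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum "$file" | awk '{print $1}')
  elif command -v shasum >/dev/null 2>&1; then
    actual=$(shasum -a 256 "$file" | awk '{print $1}')
  else
    echo "no SHA-256 tool found" >&2
    return 2
  fi
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    return 1
  fi
}

# Usage (placeholders, not real values):
#   verify_sha256 ./downloaded-binary "<expected-hash-from-checksums-file>"
```

Running the check before executing the downloaded binary keeps the verification step under your control rather than relying on the install script to do it.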

Available Models

Model	App ID	Best For
OmniHuman 1.5	`bytedance/omnihuman-1-5`	Multi-character, best quality
OmniHuman 1.0	`bytedance/omnihuman-1-0`	Single character
Fabric 1.0	`falai/fabric-1-0`	Image talks with lipsync
PixVerse Lipsync	`falai/pixverse-lipsync`	Highly realistic

Search Avatar Apps

infsh app list --search "omnihuman"
infsh app list --search "lipsync"
infsh app list --search "fabric"

Examples

OmniHuman 1.5 (Multi-Character)

infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'

Supports specifying which character to drive in multi-person images.

Fabric 1.0 (Image Talks)

infsh app run falai/fabric-1-0 --input '{
  "image_url": "https://face.jpg",
  "audio_url": "https://audio.mp3"
}'

PixVerse Lipsync

infsh app run falai/pixverse-lipsync --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'

Generates highly realistic lipsync from any audio.
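The three examples above use the same two input fields, so comparing models on one portrait and one audio clip can be scripted. The sketch below assumes all three apps accept exactly the `image_url` and `audio_url` fields shown in the examples and that the URLs contain no double quotes or backslashes; `compare_avatar_models` is a hypothetical helper name.

```shell
# Hypothetical helper: run the same portrait and audio through each avatar app
# shown in the examples above. It does not stop if one app fails.
compare_avatar_models() {
  image_url=$1
  audio_url=$2
  for app in bytedance/omnihuman-1-5 falai/fabric-1-0 falai/pixverse-lipsync; do
    # Progress marker goes to stderr so stdout holds only the CLI output.
    echo "== $app ==" >&2
    infsh app run "$app" --input "{\"image_url\": \"$image_url\", \"audio_url\": \"$audio_url\"}"
  done
}

# Usage:
#   compare_avatar_models "https://portrait.jpg" "https://speech.mp3"
```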

Full Workflow: TTS + Avatar

# 1. Generate speech from text
infsh app run infsh/kokoro-tts --input '{
  "text": "Welcome to our product demo. Today I will show you..."
}' > speech.json

# 2. Create avatar video with the speech
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://presenter-photo.jpg",
  "audio_url": "<audio-url-from-step-1>"
}'
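The workflow above leaves the hand-off between the two steps manual. One way to automate it, without relying on a particular output schema, is to pull the first URL out of `speech.json`. This is only a sketch: it assumes the saved output contains the audio URL as a quoted, unescaped `https://` string and that it is the first such URL in the file. Inspect `speech.json` to confirm this before relying on it. `first_url` is a hypothetical helper name.

```shell
# Hypothetical helper: print the first https:// URL found in a file.
# Matches up to the next double quote, so it expects URLs inside JSON strings.
first_url() {
  grep -oE 'https://[^"]+' "$1" | head -n 1
}

# Usage, following the two steps above:
#   audio_url=$(first_url speech.json)
#   infsh app run bytedance/omnihuman-1-5 --input "{
#     \"image_url\": \"https://presenter-photo.jpg\",
#     \"audio_url\": \"$audio_url\"
#   }"
```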

Full Workflow: Dub Video in Another Language

# 1. Transcribe original video
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://video.mp4"}' > transcript.json

# 2. Translate text (manually or with an LLM)

# 3. Generate speech in new language
infsh app run infsh/kokoro-tts --input '{"text": "<translated-text>"}' > new_speech.json

# 4. Lipsync the original video with new audio
infsh app run infsh/latentsync-1-6 --input '{
  "video_url": "https://original-video.mp4",
  "audio_url": "<new-audio-url>"
}'

Use Cases

Marketing: Product demos with AI presenter
Education: Course videos, explainers
Localization: Dub content in multiple languages
Social Media: Consistent virtual influencer
Corporate: Training videos, announcements

Tips

Use high-quality portrait photos (front-facing, good lighting)
Audio should be clear with minimal background noise
OmniHuman 1.5 supports multiple people in one image
LatentSync is best for syncing existing videos to new audio

Related Skills

# Full platform skill (all 150+ apps)
npx skills add inference-sh/skills@inference-sh

# Text-to-speech (generate audio for avatars)
npx skills add inference-sh/skills@text-to-speech

# Speech-to-text (transcribe for dubbing)
npx skills add inference-sh/skills@speech-to-text

# Video generation
npx skills add inference-sh/skills@ai-video-generation

# Image generation (create avatar images)
npx skills add inference-sh/skills@ai-image-generation

Browse all video apps: infsh app list --category video

Documentation

Running Apps - How to run apps via CLI
Content Pipeline Example - Building media workflows
Streaming Results - Real-time progress updates

Usage Guidance

This skill appears to be what it says (it uses the inference.sh CLI to run avatar models), but exercise caution before installing:

(1) Avoid blindly running `curl | sh`. Instead, download the installer, verify the SHA-256 checksums from the linked checksums.txt, and inspect the installer script if possible.
(2) Understand that `infsh login` will create/use credentials or tokens (not listed in the skill). Review the CLI's auth/storage behavior and revoke tokens you don't trust.
(3) Media you provide (images/audio/video URLs) will be sent to the provider's servers. Check their privacy/terms if data sensitivity matters.
(4) If you want lower risk, prefer running these tools in an isolated sandbox or VM and review the CLI source/release artifacts on the provider's site before executing.

Capability Analysis

Type: OpenClaw Skill
Name: ai-avatar-video
Version: 0.1.5

The skill bundle is classified as suspicious due to the high-risk installation method specified in `SKILL.md`. The `curl -fsSL https://cli.inference.sh | sh` command directly downloads and executes a shell script from a remote server. While the `allowed-tools` specifies `Bash(infsh *)`, an AI agent could be prompted to execute this initial setup command, leading to a critical Remote Code Execution (RCE) vulnerability and supply chain risk if `cli.inference.sh` were compromised. There is no clear evidence of intentional malicious behavior by the skill author, but the method itself introduces a significant security flaw.

Capability Assessment

✓ Purpose & Capability

The name/description (AI avatar & talking head videos) align with the runtime instructions: all examples invoke the inference.sh CLI (infsh) to run named avatar apps. Listed models and workflows are consistent with the stated capability.

⚠ Instruction Scope

The SKILL.md tells users to run `curl -fsSL https://cli.inference.sh | sh && infsh login` and then `infsh app run ...` with image/audio URLs. This stays within the avatar/video generation scope, but the instructions implicitly cause: (1) execution of a remote install script, (2) an interactive login flow that will obtain credentials/tokens (not declared), and (3) uploading or referencing user media (images/audio) to a third-party service. The file/network actions and credential acquisition are not surfaced in requires.env and should be made explicit.

⚠ Install Mechanism

There is no packaged install spec in the registry entry; the README advocates piping a remote shell script from https://cli.inference.sh (download-and-exec). While the doc claims the installer verifies SHA-256 checksums hosted at dist.inference.sh, the provided one-liner does not show any local checksum verification prior to execution. Download-and-exec from an external URL raises a higher risk profile unless the user manually verifies the binary and checksums beforehand.

ℹ Credentials

The skill declares no required environment variables or primary credential, which is plausible for an instruction-only wrapper. However, it instructs `infsh login` (implying account credentials or API tokens) and uses remote services to process user media; those credential/access implications are not declared. Users should assume credentials/tokens will be created/used and that media will be transmitted to inference.sh backend services.

✓ Persistence & Privilege

The skill does not request always:true, does not include install scripts in the package, and does not claim to modify other skills or system-wide settings. Its persistence footprint depends on whether the user runs the installer; that action is initiated by the user, not forced by the skill.

Version History

v0.1.5

- Added detailed documentation for creating AI avatar and talking head videos using inference.sh CLI.
- Listed supported models (OmniHuman 1.5/1.0, Fabric 1.0, PixVerse Lipsync) and best use cases for each.
- Provided step-by-step workflow examples for avatar generation and video dubbing.
- Included installation notes, usage tips, and related skill recommendations.
- Expanded trigger keywords and clarified typical use cases (marketing, education, localization, social media, corporate).

v0.1.0

ai-avatar-video 0.1.0
- Initial release of the skill.
- Create AI avatar and talking head videos using OmniHuman, Fabric, and PixVerse models via the inference.sh CLI.
- Supports audio-driven avatars, lipsync videos, talking head generation, and virtual presenters.
- Detailed usage instructions, model options, and example workflows included in documentation.
- Lists related skills for TTS, transcription, and more content generation needs.

Metadata

Slug ai-avatar-video

Version 0.1.5

License —

All-time Installs 5

Active Installs 5

Total Versions 2

Frequently Asked Questions

What is Ai Avatar Video?

Create AI avatar and talking head videos with OmniHuman, Fabric, PixVerse via inference.sh CLI. Models: OmniHuman 1.5, OmniHuman 1.0, Fabric 1.0, PixVerse Li... It is an AI Agent Skill for Claude Code / OpenClaw, with 1254 downloads so far.

How do I install Ai Avatar Video?

Run "/install ai-avatar-video" in the OpenClaw or Claude Code chat to install the skill in one step. Running the examples still requires the inference.sh CLI (`infsh`) and a login, as described in Quick Start.

Is Ai Avatar Video free?

Yes, Ai Avatar Video is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Ai Avatar Video support?

Ai Avatar Video is cross-platform: it runs anywhere OpenClaw or Claude Code is available.

Who created Ai Avatar Video?

It is built and maintained by Ömer Karışman (@okaris); the current version is v0.1.5.
