Description

Generate videos via LTX-2.3 API (ltx.video). Supports text-to-video, image-to-video, audio-to-video (lip-sync from audio + image), extend, and retake. Use wh...

Usage (SKILL.md)

LTX-2.3 Video API

Name: LTX-2.3 Video API
Author: pauldelavallaz

API Reference

Base URL: https://api.ltx.video/v1
Auth: Authorization: Bearer <API_KEY>
Response: MP4 binary (direct download, no polling)

Endpoints

Endpoint	Input	Use
`/v1/text-to-video`	prompt	Generate video from text
`/v1/image-to-video`	image_uri + prompt	Animate a still image
`/v1/audio-to-video`	audio_uri + image_uri + prompt	Lip-sync video from audio + image
`/v1/extend`	video_uri + prompt	Extend a video at start or end
`/v1/retake`	video_uri + time range	Regenerate a section of a video

Models

Model	Speed	Quality
`ltx-2-3-fast`	~17s	Good (use for tests)
`ltx-2-3-pro`	~30-60s	Best (use for final)

Supported Resolutions

1920x1080 (landscape 16:9)
1080x1920 (portrait 9:16 — native vertical, trained on vertical data)
1440x1080, 4096x2160 (text-to-video only)

audio-to-video only supports: 1920x1080 or 1080x1920
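
The resolution rules above can be checked locally before a request is sent. The following is a minimal sketch, not part of the LTX API: `is_supported_resolution` is a hypothetical helper, and it assumes that every endpoint other than text-to-video accepts only the two common resolutions (the source states this explicitly only for audio-to-video).

```python
# Resolutions listed for all endpoints.
COMMON_RESOLUTIONS = {"1920x1080", "1080x1920"}
# Resolutions the document marks as text-to-video only.
TEXT_TO_VIDEO_ONLY = {"1440x1080", "4096x2160"}


def is_supported_resolution(endpoint: str, resolution: str) -> bool:
    """Return True if `resolution` is listed for `endpoint` (e.g. "audio-to-video")."""
    allowed = set(COMMON_RESOLUTIONS)
    if endpoint == "text-to-video":
        allowed |= TEXT_TO_VIDEO_ONLY
    return resolution in allowed


if __name__ == "__main__":
    print(is_supported_resolution("text-to-video", "4096x2160"))
    print(is_supported_resolution("audio-to-video", "1440x1080"))
```

Running the check before the call avoids spending a generation request on a combination the API would reject.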

Quick Examples

Text to Video

curl -X POST "https://api.ltx.video/v1/text-to-video" \
  -H "Authorization: Bearer $LTX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A man in a navy blue suit sits at a luxury restaurant table...",
    "model": "ltx-2-3-pro",
    "duration": 8,
    "resolution": "1920x1080"
  }' -o output.mp4

Audio to Video (Lip-sync)

curl -X POST "https://api.ltx.video/v1/audio-to-video" \
  -H "Authorization: Bearer $LTX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_uri": "https://example.com/voice.mp3",
    "image_uri": "https://example.com/portrait.jpg",
    "prompt": "A man speaks directly to camera...",
    "model": "ltx-2-3-pro",
    "resolution": "1920x1080"
  }' -o output.mp4

Python Wrapper

import requests

def ltx_audio_to_video(audio_url, image_url, prompt, api_key,
                        model="ltx-2-3-pro", resolution="1920x1080",
                        output_path="output.mp4"):
    """Generate a lip-synced video and save the MP4 response to output_path."""
    r = requests.post(
        "https://api.ltx.video/v1/audio-to-video",
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        json={"audio_uri": audio_url, "image_uri": image_url,
              "prompt": prompt, "model": model, "resolution": resolution},
        timeout=300, stream=True  # the response is the video itself, so stream it
    )
    if r.status_code != 200:
        raise RuntimeError(f"LTX error {r.status_code}: {r.text}")
    # Write the MP4 body to disk in 8 KiB chunks.
    with open(output_path, "wb") as f:
        for chunk in r.iter_content(8192):
            f.write(chunk)
    return output_path
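
The text-to-video curl example can be mirrored in Python in the same way. The sketch below uses only the standard library so that it runs without extra packages; the function names `build_text_to_video_request` and `text_to_video` are illustrative, and the request fields are taken from the curl example above rather than from an official schema.

```python
import json
import shutil
import urllib.request

API_BASE = "https://api.ltx.video/v1"


def build_text_to_video_request(prompt, api_key, model="ltx-2-3-fast",
                                duration=8, resolution="1920x1080"):
    """Build (but do not send) a POST request for /v1/text-to-video."""
    payload = {"prompt": prompt, "model": model,
               "duration": duration, "resolution": resolution}
    return urllib.request.Request(
        f"{API_BASE}/text-to-video",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def text_to_video(prompt, api_key, output_path="output.mp4", **options):
    """Send the request and stream the MP4 response to output_path."""
    req = build_text_to_video_request(prompt, api_key, **options)
    # urlopen raises urllib.error.HTTPError on non-2xx responses.
    with urllib.request.urlopen(req, timeout=300) as resp, \
            open(output_path, "wb") as f:
        shutil.copyfileobj(resp, f)
    return output_path
```

Separating request construction from sending makes the payload easy to inspect or log before any generation time is spent. The default model here is `ltx-2-3-fast`, following the table's suggestion to use it for tests.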

⚠️ Critical Rules (learned from experience)

File Hosting

URLs must be HTTPS — HTTP is rejected
Files must return correct MIME type (not application/octet-stream)
uguu.se works: upload with curl -F "files[]=@audio.mp3" https://uguu.se/upload
Audio: upload as MP3 (not WAV) → uguu returns audio/mpeg ✅
4K images fail → resize to 1920x1080 before uploading

# Upload MP3 to uguu.se (audio.mp3 is a placeholder for your local file)
AUDIO_URL=$(curl -s -F "files[]=@audio.mp3" "https://uguu.se/upload" | \
  python3 -c "import sys,json; print(json.load(sys.stdin)['files'][0]['url'])")

# Upload image (image.jpg is a placeholder for your local file)
IMAGE_URL=$(curl -s -F "files[]=@image.jpg" "https://uguu.se/upload" | \
  python3 -c "import sys,json; print(json.load(sys.stdin)['files'][0]['url'])")
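
The hosting rules above (HTTPS only, a specific MIME type rather than application/octet-stream) can be verified before the URL is handed to the API. This is a minimal sketch under the assumption that the caller has already obtained the `Content-Type` header the host returns, for example from a HEAD request; `check_media_url` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def check_media_url(url, content_type):
    """Return a list of problems with a hosted media URL (empty if none found).

    `content_type` is the Content-Type header value returned by the file host.
    """
    problems = []
    if urlparse(url).scheme != "https":
        problems.append("URL must use HTTPS")
    # Drop parameters such as "; charset=..." before comparing.
    mime = (content_type or "").split(";")[0].strip().lower()
    if not mime or mime == "application/octet-stream":
        problems.append("host must return a specific MIME type")
    return problems


if __name__ == "__main__":
    print(check_media_url("https://example.com/voice.mp3", "audio/mpeg"))
    print(check_media_url("http://example.com/voice.mp3", "application/octet-stream"))
```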

Image Size Limit

# Resize large images before upload
ffmpeg -y -i input_4k.png -vf "scale=1920:1080" output_1080.jpg
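
The ffmpeg command above scales to exactly 1920x1080 pixels regardless of the source aspect ratio. If the proportions of the input should be preserved, the target size can be computed first. The sketch below is one way to do that arithmetic; `fit_within` is a hypothetical helper, and whether the API accepts input images that are not exactly 1920x1080 is not stated in the source.

```python
def fit_within(width, height, max_width=1920, max_height=1080):
    """Return (w, h) scaled down to fit the bounding box, keeping aspect ratio.

    Images already inside the box are returned unchanged. Integer arithmetic
    avoids floating-point rounding surprises.
    """
    if width <= max_width and height <= max_height:
        return width, height
    if width * max_height >= height * max_width:
        # The image is relatively wider than the box: width is the limit.
        return max_width, height * max_width // width
    # Otherwise height is the limit.
    return width * max_height // height, max_height


if __name__ == "__main__":
    print(fit_within(3840, 2160))
    print(fit_within(4096, 2160))
```

The resulting pair can be passed to ffmpeg as `scale=W:H` in place of the fixed values.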

Face Consistency

Avoid prompts where the character looks down — breaks face consistency
Keep head level and gaze forward throughout
Place objects already in frame instead of having character reach below frame

Last Frame

LTX does not support first+last frame natively
Workaround: generate clip A, generate clip B, then use /v1/extend to chain them

Prompting Guide (LTX-2.3)

LTX-2.3 has a much stronger text connector. Specificity wins.

1. Use Verbs, Not Nouns

❌ "A dramatic portrait of a man standing"
✅ "A man stands on a rooftop. His coat flaps in the wind. He adjusts his collar and steps forward as the camera tracks right."

2. Block the Scene Like a Director

Specify left vs right, foreground vs background
Describe who moves, what moves, how they move, what the camera does
Spatial relationships are now respected

3. Describe Audio Explicitly (for text-to-video)

Name the type of sound: dialogue, ambient, music
Specify tone and intensity
Example: "His voice is clear and warm. Restaurant ambient sound softly in the background."

4. Avoid Static Photo-Like Prompts

If the prompt reads like a still image → the output behaves like one
Add wind, motion, breathing, gestures, camera movement

5. Describe Texture and Material

Hair, fabric, surface finish, lighting fall-off
"Individual hair strands visible in the backlight" → now renders correctly

6. Portrait (9:16) Native

resolution: "1080x1920" → trained on vertical data
Frame for vertical intentionally, don't treat as cropped landscape

7. Complex Shots Work Now

Layer multiple actions: "He picks up the banana, raises it to his ear, and smirks"
Combine character performance + environment + camera motion

Lip-Sync Prompt Template

A [description of person] sits/stands [location]. He/she speaks directly 
to camera, lips moving in perfect sync with his/her voice. [Gesture details]. 
Head stays level and gaze remains locked on camera throughout. 
[Environment description softly blurred in background]. 
[Lighting]. [Camera: holds steady at eye level, front-on].
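
The template above can be filled in programmatically when many lip-sync prompts are needed. This is a small sketch using plain string formatting; the function and parameter names are illustrative and not part of the skill.

```python
LIP_SYNC_TEMPLATE = (
    "A {person} {pose} {location}. {subject} speaks directly to camera, "
    "lips moving in perfect sync with {possessive} voice. {gesture} "
    "Head stays level and gaze remains locked on camera throughout. "
    "{environment} softly blurred in background. {lighting} "
    "Camera holds steady at eye level, front-on."
)


def build_lip_sync_prompt(person, pose, location, gesture, environment,
                          lighting, subject="He", possessive="his"):
    """Fill the lip-sync template; each free-text field should end with a period
    where the template expects a full sentence (gesture, lighting)."""
    return LIP_SYNC_TEMPLATE.format(
        person=person, pose=pose, location=location, subject=subject,
        possessive=possessive, gesture=gesture, environment=environment,
        lighting=lighting,
    )


if __name__ == "__main__":
    print(build_lip_sync_prompt(
        person="woman in a grey blazer", pose="sits", location="at a desk",
        gesture="She nods slightly as she talks.",
        environment="A bookshelf is", lighting="Soft window light from the left.",
        subject="She", possessive="her",
    ))
```

Keeping the fixed sentences (level head, locked gaze, steady camera) in the template means the face-consistency rules are applied to every generated prompt.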

ComfyUI Node

Custom nodes for ComfyUI (no manual API calls):

cd ComfyUI/custom_nodes
git clone https://github.com/PauldeLavallaz/comfyui-ltx-node

Nodes: LTX Text to Video, LTX Image to Video, LTX Extend Video
Category: LTX Video

API Key

Paul's key: stored in ~/clawd/.env as LTX_API_KEY

LTX_API_KEY=<redacted>
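
Reading the key at runtime keeps it out of scripts and documentation. The sketch below looks for `LTX_API_KEY` in the process environment first and then in a `.env` file of simple `NAME=value` lines; `load_api_key` is a hypothetical helper, and the parser is deliberately minimal (it does not handle `export` prefixes or multi-line values).

```python
import os
from pathlib import Path


def load_api_key(env_path="~/clawd/.env", name="LTX_API_KEY"):
    """Return the key from the environment, falling back to a .env file."""
    value = os.environ.get(name)
    if value:
        return value
    path = Path(env_path).expanduser()
    if path.is_file():
        for line in path.read_text().splitlines():
            line = line.strip()
            # Skip blanks, comments, and lines that are not NAME=value pairs.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, val = line.partition("=")
            if key.strip() == name:
                return val.strip().strip("'\"")
    raise RuntimeError(f"{name} not found in environment or {path}")
```

The returned value can be passed as `api_key` to the wrapper functions, so the key itself never appears in source files.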

Security Recommendations

This skill appears to do what it says (call ltx.video to generate video), but the manifest is incomplete. Before installing:

1) Ask the publisher to declare LTX_API_KEY (or equivalent) and list required binaries (curl, python3, ffmpeg) in the registry metadata so you know what credentials and tools are needed.
2) Be aware that using the skill will upload media to external endpoints (uguu.se and ltx.video). Don't upload sensitive content unless you trust those services and accept their privacy and retention policies.
3) Verify the API base URL and vendor (there's no homepage provided) and prefer an official vendor URL or public repo.
4) Never provide unrelated credentials, and keep your LTX API key scoped and revocable.
5) If you need higher assurance, request source code or an installable package with verifiable provenance.

If the publisher can't or won't correct the missing metadata and provenance, treat this skill with caution.

Functional Analysis

Type: OpenClaw Skill
Name: ltx-video
Version: 1.0.0

The skill bundle contains instructions in SKILL.md that direct the AI agent to upload local user files (images and audio) to a public, anonymous file-sharing service (uguu.se) to generate URLs for API consumption. This pattern constitutes a significant data privacy and exfiltration risk. Additionally, the documentation includes a hardcoded LTX API key, which is a poor security practice, though likely intended for functional convenience rather than explicit malice.

Capability Assessment

ℹ Purpose & Capability

The name, description, and SKILL.md all consistently describe a video-generation integration with https://api.ltx.video/v1 and appropriate endpoints (text-to-video, image-to-video, audio-to-video, extend, retake). That capability aligns with the documented curl and Python examples. However, the metadata does not declare the primary credential (LTX_API_KEY) used throughout the examples, nor does it list common required binaries (curl, python3, ffmpeg) mentioned in the docs — an inconsistency between capability and declared requirements.

⚠ Instruction Scope

The runtime instructions tell the agent to send media and prompts to the third‑party LTX API and to upload local media to a public file host (uguu.se) via curl. They reference environment variables (LTX_API_KEY) and tools (curl, python3, ffmpeg) that are not declared in metadata. Uploading local media to a public host is expected for this workflow but is a privacy-sensitive operation and should be explicit in the skill manifest. The instructions do not request arbitrary system files, but they do direct reading and uploading of local media files — which is reasonable for the feature but must be disclosed.

✓ Install Mechanism

This is an instruction-only skill with no install spec and no code files. That minimizes on-disk code risk. There is no downloader or extract step in the manifest.

⚠ Credentials

The SKILL.md examples consistently use an Authorization header with a Bearer token (LTX_API_KEY), but the skill metadata lists no required environment variables or primary credential. Requiring an API key for an external service is expected, so the omission from the manifest is a coherence/visibility problem. Additionally, the instructions instruct uploading user media to a third-party host (uguu.se) — this is functionally necessary but privacy‑sensitive and should be called out in the metadata/consent flow.

✓ Persistence & Privilege

always is false and the skill is user-invocable and can be called autonomously (normal). The skill does not request persistent or system-wide privileges in the manifest and does not modify other skills' configs.

Version History

v1.0.0

Initial skill: text-to-video, image-to-video, audio-to-video lip-sync, extend. LTX-2.3 prompting guide, critical rules for HTTPS/MIME, resolution limits, and face consistency.

Metadata

Slug ltx-video

Version 1.0.0

License MIT-0

Total installs 4

Current installs 3

Number of versions 1

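FAQ

What is LTX-2.3 Video API?

Generate videos via LTX-2.3 API (ltx.video). Supports text-to-video, image-to-video, audio-to-video (lip-sync from audio + image), extend, and retake. Use wh... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 389 downloads to date.

How do I install LTX-2.3 Video API?

Run the command "/install ltx-video" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is LTX-2.3 Video API free?

Yes. LTX-2.3 Video API is completely free and is released under the MIT-0 license; it can be freely downloaded, installed, and used.

Which platforms does LTX-2.3 Video API support?

LTX-2.3 Video API is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed LTX-2.3 Video API?

It is developed and maintained by Paul de Lavallaz (@pauldelavallaz); the current version is v1.0.0.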
