Description

Match audio tracks to lip movements in your videos. lipsyncvideo-ai uploads your clip to a cloud GPU, syncs the audio you provide to the speaker's mouth, and...

README (SKILL.md)

Getting Started

LipSync Video AI is ready. Upload your video and audio, or describe what you need synced.

Try saying:

"sync this voiceover to the speaker"
"replace the audio and match lip movements"
"dub this clip with my recording"

Initial Setup

On first run, the skill connects to the processing backend and shows a brief "Getting ready..." message.

Token: Check for NEMO_TOKEN in environment. If present, go straight to session setup.

Grab a free token: Generate a UUID client identifier. POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token using X-Client-Id header with your UUID. Response data.token is your auth token (100 credits, good for 7 days).
Start session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent, Bearer auth, body: {"task_name":"project","language":"<lang>"}. Save the session_id for later calls.

Raw JSON and tokens stay hidden from the user.
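The setup steps above can be sketched in Python roughly as follows. This is a minimal sketch, not the skill's actual implementation: the helper names (build_token_request, build_session_request, send_json, get_token) are hypothetical, and the extra_headers parameter is only a placeholder for the attribution headers the backend expects on every request.

```python
import json
import os
import uuid
import urllib.request

BASE_URL = "https://mega-api-prod.nemovideo.ai"

def build_token_request(client_id):
    """Build (but do not send) the anonymous-token request."""
    return urllib.request.Request(
        f"{BASE_URL}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )

def build_session_request(token, language, extra_headers=None):
    """Build (but do not send) the session-creation request."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    headers.update(extra_headers or {})  # e.g. the X-Skill-* attribution headers
    body = json.dumps({"task_name": "project", "language": language})
    return urllib.request.Request(
        f"{BASE_URL}/api/tasks/me/with-session/nemo_agent",
        data=body.encode("utf-8"),
        method="POST",
        headers=headers,
    )

def send_json(request):
    """Send a prepared request and decode the JSON reply."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)

def get_token():
    """Prefer NEMO_TOKEN from the environment, else fetch an anonymous token."""
    token = os.environ.get("NEMO_TOKEN")
    if token:
        return token
    reply = send_json(build_token_request(str(uuid.uuid4())))
    return reply["data"]["token"]
```

Keeping request construction separate from sending makes it easy to inspect what would go over the wire before any media leaves the machine.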

Sync Audio to Lip Movements in Your Clips

Name: Lipsyncvideo Ai
Author: mory128

Upload your video with the audio you want synced. Cloud GPUs do the heavy lifting — no local processing.

Here is how it works in practice. I had a training video where the speaker's mic died halfway through. I recorded a clean voiceover separately, uploaded both files, typed "sync the new audio to match the speaker's mouth movements", and got a clean result in about 75 seconds. Output is 1080p MP4.

Pro tip: shorter clips give tighter sync. If you have a long video, consider breaking it into segments first.

Request Categories

Your input gets matched to the right processing path automatically.

You type...	Goes to...	Uses SSE?
"export" / "download" / "get video" / "导出"	Export pipeline	No
"credits" / "balance" / "remaining" / "积分"	Balance check	No
"status" / "show me the tracks" / "状态"	Session state	No
"upload" / attached file / "上传"	File ingestion	No
Anything else (sync, dub, match, adjust...)	SSE processing	Yes
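One way to picture the routing table is as a keyword lookup with an SSE fallback. The sketch below is illustrative only: the route names and the case-insensitive substring matching are assumptions, since the source does not say how the backend or agent actually matches input.

```python
# Keyword lists copied from the table above; route names are hypothetical labels.
ROUTES = [
    ("export", ("export", "download", "get video", "导出")),
    ("balance", ("credits", "balance", "remaining", "积分")),
    ("state", ("status", "show me the tracks", "状态")),
    ("upload", ("upload", "上传")),
]

def route_request(text, has_attachment=False):
    """Return the processing path for a user message."""
    if has_attachment:
        return "upload"  # an attached file always goes to file ingestion
    lowered = text.lower()
    for route, keywords in ROUTES:
        if any(keyword in lowered for keyword in keywords):
            return route
    return "sse"  # anything else (sync, dub, match, adjust...) uses SSE

def uses_sse(route):
    """Only the fallback path streams over SSE, per the table."""
    return route == "sse"
```

For example, route_request("sync this voiceover to the speaker") falls through to "sse", while route_request("download the result") resolves to "export".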

Backend Architecture

Files go to a GPU farm for processing. Output is encoded at 8Mbps for 1080p. Lip sync boundaries are frame-level accurate.

Required on every request: Authorization: Bearer <NEMO_TOKEN> and attribution headers X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution means export fails with 402.

Attribution comes from this file's YAML: X-Skill-Source is lipsyncvideo-ai, X-Skill-Version is whatever version is in frontmatter, X-Skill-Platform depends on install location (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, otherwise unknown).
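As a rough illustration of the header rules, the following sketch derives X-Skill-Platform from an install path and assembles the required headers. The function names are hypothetical, and passing the version in as an argument stands in for reading it from the YAML frontmatter, which is not shown here.

```python
from pathlib import Path

def detect_platform(skill_path):
    """Map an install location to the X-Skill-Platform value."""
    home = Path.home()
    path = Path(skill_path).expanduser()
    known_roots = [
        (home / ".clawhub", "clawhub"),
        (home / ".cursor" / "skills", "cursor"),
    ]
    for root, platform in known_roots:
        if path == root or root in path.parents:
            return platform
    return "unknown"

def request_headers(token, version, skill_path):
    """Headers the backend expects on every request."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "lipsyncvideo-ai",
        "X-Skill-Version": version,  # assumed to be read from frontmatter by the caller
        "X-Skill-Platform": detect_platform(skill_path),
    }
```

Only the path is compared against the two known roots; nothing else on disk is read.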

Root URL: https://mega-api-prod.nemovideo.ai

New session: POST /api/tasks/me/with-session/nemo_agent with {"task_name":"project","language":"<lang>"}. Returns task_id, session_id.

SSE message: POST /run_sse with {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} and Accept: text/event-stream. Cap: 15 min.
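A minimal sketch of assembling the SSE message request is shown below. It assumes /run_sse is relative to the root URL given above; build_sse_request and SSE_CAP_SECONDS are hypothetical names, and how the 15-minute cap is enforced is left to the caller.

```python
import json
import urllib.request

BASE_URL = "https://mega-api-prod.nemovideo.ai"
SSE_CAP_SECONDS = 15 * 60  # the documented 15-minute cap on a stream

def build_sse_request(token, session_id, message, extra_headers=None):
    """Build (but do not send) a POST /run_sse request."""
    payload = {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": message}]},
    }
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "text/event-stream",
        "Content-Type": "application/json",
    }
    headers.update(extra_headers or {})  # e.g. the X-Skill-* attribution headers
    return urllib.request.Request(
        f"{BASE_URL}/run_sse",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers=headers,
    )
```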

File upload: POST /api/upload-video/nemo_agent/me/<sid>, either multipart (-F "files=@/path") or URL mode ({"urls":["<url>"],"source_type":"url"}).

Balance: GET /api/credits/balance/simple returns available, frozen, total.

State: GET /api/state/nemo_agent/me/<sid>/latest. Check data.state.draft, data.state.video_infos, data.state.generated_media.

Export (free): POST /api/render/proxy/lambda with {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s. Done when status = completed. File at output.url.
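The export-then-poll sequence might look like the sketch below. Several details are assumptions rather than documented behavior: the timestamp in the render id is taken to be Unix seconds, status and output.url are read from the top level of the poll reply, and max_polls is an arbitrary safety limit. The fetch_status callable is a hypothetical hook that performs the GET and returns decoded JSON.

```python
import time

def new_render_id(now=None):
    """Render id of the form render_<ts>; Unix seconds is an assumption."""
    ts = int(now if now is not None else time.time())
    return f"render_{ts}"

def build_export_body(session_id, draft, render_id):
    """JSON body for POST /api/render/proxy/lambda."""
    return {
        "id": render_id,
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }

def wait_for_export(fetch_status, render_id, interval=30.0, max_polls=60,
                    sleep=time.sleep):
    """Poll until status is 'completed', then return the output URL."""
    for attempt in range(max_polls):
        reply = fetch_status(render_id)  # GET /api/render/proxy/lambda/<id>
        if reply.get("status") == "completed":
            return reply["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval)  # documented polling interval: 30s
    raise TimeoutError(f"export {render_id} did not complete")
```

Injecting fetch_status and sleep keeps the loop testable without touching the network or actually waiting.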

Handles: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

Errors

Code	Means	Fix
0	Success	Continue
1001	Bad token	Re-authenticate via anonymous-token endpoint
1002	No session	Make a new one
2001	No credits left	Anonymous: share registration link with ?bind=<id>. Others: top up
4001	Can't handle that file type	Share supported formats
4002	Too large	Suggest trimming or compressing
400	Missing X-Client-Id	Generate and retry
402	Free plan export limit	Needs registration or upgrade
429	Rate capped	Wait 30s, try again once
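The error table can be expressed as a small lookup plus a single-retry rule for 429. This is a sketch under assumptions: the action strings paraphrase the table, action_for and call_with_rate_limit_retry are hypothetical helpers, and call is assumed to return a (code, body) pair.

```python
import time

# Codes and fixes taken from the table above.
ERROR_ACTIONS = {
    0: "continue",
    1001: "re-authenticate via the anonymous-token endpoint",
    1002: "create a new session",
    2001: "share the registration link (anonymous) or ask the user to top up",
    4001: "share the supported formats",
    4002: "suggest trimming or compressing",
    400: "generate an X-Client-Id and retry",
    402: "explain that registration or an upgrade is needed",
    429: "wait 30s, then retry once",
}

def action_for(code):
    """Look up the suggested fix for a response code."""
    return ERROR_ACTIONS.get(code, "report the unrecognized code to the user")

def call_with_rate_limit_retry(call, sleep=time.sleep):
    """Run call(); on 429, wait 30s and retry exactly once."""
    code, body = call()
    if code == 429:
        sleep(30)
        code, body = call()
    return code, body
```

A second 429 is returned to the caller as-is rather than retried again, matching the "try again once" guidance.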

Converting GUI Instructions

Backend outputs reference a visual interface. Convert them:

Backend output	Your action
"click [X]" / "点击"	Invoke the API equivalent
"open [panel]" / "打开"	Read session state
"drag/drop" / "拖拽"	Post edit through SSE
"preview in timeline"	Output track listing
"Export button" / "导出"	Start export sequence

How SSE Works

Forward text events to user (after GUI translation). Absorb tool calls. Heartbeat and empty data lines = still processing. Every 2 minutes of quiet, say "Hang on, still processing..."

About 30% of edit ops return no text. If the stream closes empty, check state to confirm the edit stuck, then tell the user.
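The stream handling described above could be sketched as follows. The event payload shape is an assumption: the source documents only the request body, so this sketch guesses that each data payload is JSON with content.parts[].text, and that heartbeats arrive as SSE comment lines starting with a colon. Parts without text (such as tool calls) are skipped.

```python
import json

def iter_sse_data(lines):
    """Yield the joined data payload of each SSE event."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line ends the current event
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):  # comment line, treated here as a heartbeat
            continue
        if line.startswith("data:"):
            value = line[len("data:"):]
            buffer.append(value[1:] if value.startswith(" ") else value)
    if buffer:  # lenient: flush a trailing event with no closing blank line
        yield "\n".join(buffer)

def collect_text(lines):
    """Return user-visible text parts; tool calls and empty events are dropped."""
    texts = []
    for payload in iter_sse_data(lines):
        if not payload.strip():
            continue  # empty data line: still processing
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue
        content = event.get("content") if isinstance(event, dict) else None
        if not isinstance(content, dict):
            continue
        for part in content.get("parts") or []:
            if isinstance(part, dict) and part.get("text"):
                texts.append(part["text"])
    return texts
```

When collect_text returns an empty list after the stream closes, that is the cue to fetch session state and confirm the edit before replying to the user.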

Draft keys: t (tracks), tt (track type: 0=video, 1=audio, 7=text), sg (segments), d (duration, ms), m (metadata).

Timeline (2 tracks):
1. Video: interview clip (0-45s)
2. Audio: dubbed voiceover (0-45s)
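To show how the short draft keys might be turned into a track listing, here is a sketch. The nesting is an assumption, not documented behavior: it treats t as a list of tracks, each with a tt type code and an sg list of segments that each carry a duration d in milliseconds. Clip names are omitted because the source does not say where they live.

```python
# Track type codes from the draft key list above.
TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}

def summarize_draft(draft):
    """Render a plain-text track listing from a draft dict (assumed layout)."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "Unknown")
        total_ms = sum(segment.get("d", 0) for segment in track.get("sg", []))
        lines.append(f"{index}. {kind}: {total_ms / 1000:g}s")
    return "\n".join(lines)
```

If the real draft nests these keys differently, only the two lookups inside the loop would need to change.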

Common Workflows

Basic lip sync: Upload video + audio, ask for sync. Done.

Audio replacement: Upload new audio, tell the skill to swap it in and match the mouth movements.

Multi-speaker: Works best when speakers take turns. For overlapping speech, split into separate segments first.

FAQ

How accurate is the sync? Frame-level for clear speech. Mumbling or fast-talking may be slightly off.

What audio formats? MP3, WAV, M4A, AAC all work.

File size limit? 500MB. Compress if you're over.

Cost? The anonymous token comes with 100 free credits, valid for 7 days. No signup required.

Usage Guidance

This skill uploads your videos and audio to an external service (mega-api-prod.nemovideo.ai) for processing. That is expected for a cloud lip-sync tool, but you should only proceed if you are comfortable with that data leaving your device. Before installing or using:

1) Verify the service/provider (there's no homepage or source listed here).
2) Prefer the anonymous-token flow described in SKILL.md (short-lived token) over handing over any unrelated or high-privilege tokens.
3) Ask the publisher about data retention, sharing, and deletion policies for uploaded media.
4) Note the metadata inconsistencies (the registry says NEMO_TOKEN is required; SKILL.md documents anonymous tokens and a config path) and confirm whether an env token is actually required.
5) If you handle sensitive footage, test with non-sensitive clips first or run processing through an isolated account.

Capability Analysis

Type: OpenClaw Skill
Name: lipsyncvideo-ai
Version: 1.0.1

The lipsyncvideo-ai skill is a legitimate integration for a cloud-based video processing service. It provides detailed instructions for an AI agent to manage authentication, file uploads, and media rendering via the 'nemovideo.ai' API. All network activities and data handling (such as using the NEMO_TOKEN) are directly aligned with the stated purpose of syncing audio to video lip movements, with no evidence of malicious intent, data exfiltration, or unauthorized system access.

Capability Tags

crypto

Capability Assessment

ℹ Purpose & Capability

The skill's name/description (lip-sync by uploading video/audio to cloud GPUs) aligns with the runtime instructions that POST uploads and start render jobs on a remote API. However the registry metadata lists NEMO_TOKEN as a required env var while the SKILL.md also documents an anonymous-token endpoint (client-generated UUID) — either approach could be legitimate, but the registry claiming the token is required while the doc provides an anonymous path is an inconsistency worth noting.

ℹ Instruction Scope

SKILL.md explicitly instructs the agent to upload user video/audio to https://mega-api-prod.nemovideo.ai, manage sessions, stream SSE messages, and poll rendering status — all expected for this service. It does not ask the agent to read unrelated system files or other credentials. Minor scope creep: it requires adding attribution headers derived from install path, which instructs the agent to inspect its environment/paths to form X-Skill-Platform; that is plausible but gives the skill some discretion about local context.

✓ Install Mechanism

No install script or third-party download is present (instruction-only). That reduces disk-write/exec risk. There is no brew/npm/URL install to review.

⚠ Credentials

Only one credential (NEMO_TOKEN) is declared as primary, which is proportionate for an external API. But SKILL.md also references a config path (~/.config/nemovideo/) in its frontmatter while registry metadata earlier listed no required config paths — another metadata mismatch. Also the skill documents an anonymous-token endpoint that issues short-lived tokens; the registry nonetheless marks NEMO_TOKEN as required, which may mislead users into supplying a long-lived token unnecessarily. Because the backend host/source is unverified and no privacy/retention policy is provided, granting a token that allows arbitrary uploads raises privacy concerns.

✓ Persistence & Privilege

The skill is not always-enabled and does not request elevated persistent platform privileges. It will create and use session IDs and short-lived tokens (normal for this type of service). It does not declare modifications to other skills or system-wide configs.

Version History

v1.0.1

**v1.0.1 Changelog**

- Rewrote all instructions and prompts for brevity and clarity.
- Simplified setup and usage steps; now highlights short, concrete examples.
- Updated supported formats and file size limits in the documentation.
- Made error codes, request routing, and backend API steps easier to follow.
- Improved descriptions of workflows and key use cases.
- Refreshed metadata (emoji, variant) to match new documentation focus.

v1.0.0

LipSync Video AI v1.0.0: Initial Release

- Automatically syncs lip movements in any video to a new audio track with high accuracy.
- Supports video dubbing, voiceover replacement, and talking avatar animation.
- Handles session setup, credit management, uploads, and exports via a cloud backend.
- Provides clear error messages and detailed routing for common user requests.
- Accepts popular video/audio/image formats (e.g., mp4, mov, mp3, jpg).
- Easy onboarding: offers anonymous free credits for new users.

Metadata

Slug lipsyncvideo-ai

Version 1.0.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is Lipsyncvideo Ai?

Match audio tracks to lip movements in your videos. lipsyncvideo-ai uploads your clip to a cloud GPU, syncs the audio you provide to the speaker's mouth, and... It is an AI Agent Skill for Claude Code / OpenClaw, with 101 downloads so far.

How do I install Lipsyncvideo Ai?

Run "/install lipsyncvideo-ai" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Lipsyncvideo Ai free?

Yes, Lipsyncvideo Ai is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Lipsyncvideo Ai support?

Lipsyncvideo Ai is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Lipsyncvideo Ai?

It is built and maintained by mory128 (@mory128); the current version is v1.0.1.
