Description

Cloud-based tool for creating realistic talking avatar videos from photos and scripts. Upload JPG, PNG, MP4, MOV files (up to 200M...

README (SKILL.md)

Getting Started

Got text or images to work with? Send them over and tell me what you need, and I'll take care of the AI avatar video creation.

Try saying:

"generate a headshot photo and a 200-word script into a 1080p MP4"
"create a talking avatar video from my photo and script in English"
"creating realistic talking avatar videos from photos and scripts for marketers, educators, content creators"

Automatic Setup

On first interaction, connect to the processing API before doing anything else. Show a brief status like "Setting things up...".

Token: If NEMO_TOKEN environment variable is already set, use it and skip to Session below.

Free token: Generate a UUID as client identifier, then POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id: <uuid>. The response field data.token becomes your NEMO_TOKEN (100 credits, 7-day expiry).

Session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Save session_id from the response.
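A minimal Python sketch of the setup calls above, using only the standard library. It builds `urllib.request.Request` objects rather than sending them, so their shape can be checked offline. The function names and the `fetch_anonymous` callback (which would actually send the request and parse the JSON) are hypothetical. The attribution headers required on every call are left out here for brevity.

```python
import json
import os
import urllib.request
import uuid
from typing import Callable

BASE_URL = "https://mega-api-prod.nemovideo.ai"


def anonymous_token_request(client_id: str) -> urllib.request.Request:
    """Build the POST that exchanges a client UUID for a free anonymous token."""
    return urllib.request.Request(
        f"{BASE_URL}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def session_request(token: str, task_name: str = "project") -> urllib.request.Request:
    """Build the POST that opens an editing session with Bearer auth."""
    body = json.dumps({"task_name": task_name}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/tasks/me/with-session/nemo_agent",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def resolve_token(fetch_anonymous: Callable[[str], dict]) -> str:
    """Reuse NEMO_TOKEN if it is set; otherwise request an anonymous token.

    fetch_anonymous is a hypothetical callback that sends the anonymous-token
    request for the given client id and returns the parsed JSON response.
    """
    existing = os.environ.get("NEMO_TOKEN")
    if existing:
        return existing
    client_id = str(uuid.uuid4())  # fresh client identifier
    response = fetch_anonymous(client_id)
    return response["data"]["token"]  # field named in the setup steps
```

To actually send one of these requests, pass it to `urllib.request.urlopen` and decode the response body with `json.loads`.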

Confirm to the user you're connected and ready. Don't print tokens or raw JSON.

Best Avatar Video — Generate Talking Avatar Videos

Name: Best Avatar Video
Author: tk8544-b

Send me your text or images and describe the result you want. The AI avatar video creation runs on remote GPU nodes — nothing to install on your machine.

A quick example: upload a headshot photo and a 200-word script, type "create a talking avatar video from my photo and script in English", and you'll get a 1080p MP4 back in roughly 1-2 minutes. All rendering happens server-side.

Worth noting: shorter scripts under 60 seconds render noticeably faster.

Matching Input to Actions

User prompts referencing best avatar video, aspect ratio, text overlays, or audio tracks get routed to the corresponding action via keyword and intent classification.

User says...	Action	Skip SSE?
"export" / "导出" / "download" / "send me the video"	→ §3.5 Export	✅
"credits" / "积分" / "balance" / "余额"	→ §3.3 Credits	✅
"status" / "状态" / "show tracks"	→ §3.4 State	✅
"upload" / "上传" / user sends file	→ §3.2 Upload	✅
Everything else (generate, edit, add BGM…)	→ §3.1 SSE	❌
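The routing table can be approximated with a plain keyword matcher. This is a sketch, not the skill's actual classifier. The skill mentions keyword and intent classification, while this version only does case-insensitive substring checks in table order. `route` and `ROUTES` are hypothetical names.

```python
from typing import Tuple

# Keyword lists copied from the routing table; the first matching row wins.
ROUTES = [
    ("export", ["export", "导出", "download", "send me the video"]),
    ("credits", ["credits", "积分", "balance", "余额"]),
    ("state", ["status", "状态", "show tracks"]),
    ("upload", ["upload", "上传"]),
]


def route(message: str, has_attachment: bool = False) -> Tuple[str, bool]:
    """Return (action, skip_sse) for one user message."""
    if has_attachment:
        return ("upload", True)  # "user sends file" row
    text = message.lower()
    for action, keywords in ROUTES:
        if any(k in text for k in keywords):
            return (action, True)
    return ("sse", False)  # everything else goes through the SSE stream
```

Checking the attachment flag first means a file sent along with other text is treated as an upload. That ordering is a design choice for this sketch, not something the table specifies.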

Cloud Render Pipeline Details

Each export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.

Base URL: https://mega-api-prod.nemovideo.ai

Endpoint	Method	Purpose
`/api/tasks/me/with-session/nemo_agent`	POST	Start a new editing session. Body: `{"task_name":"project","language":"<lang>"}`. Returns `session_id`.
`/run_sse`	POST	Send a user message. Body includes `app_name`, `session_id`, `new_message`. Stream response with `Accept: text/event-stream`. Timeout: 15 min.
`/api/upload-video/nemo_agent/me/<sid>`	POST	Upload a file (multipart) or URL.
`/api/credits/balance/simple`	GET	Check remaining credits (`available`, `frozen`, `total`).
`/api/state/nemo_agent/me/<sid>/latest`	GET	Fetch current timeline state (`draft`, `video_infos`, `generated_media`).
`/api/render/proxy/lambda`	POST	Start export. Body: `{"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}`. Poll status every 30s.
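The export call and its 30-second polling can be sketched as below. The skill does not name the status endpoint or its response fields. `check` is therefore a hypothetical callable, assumed to return a dict with a `status` key whose final values are `"done"` or `"failed"`. Treating `<ts>` as a millisecond timestamp is also an assumption.

```python
import time
from typing import Callable, Optional


def render_body(session_id: str, draft: dict, ts: Optional[int] = None) -> dict:
    """Build the /api/render/proxy/lambda request body from the endpoint table."""
    ts = int(time.time() * 1000) if ts is None else ts  # assumed: milliseconds
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_until_done(
    check: Callable[[], dict],
    interval_s: float = 30.0,
    max_attempts: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call check() every interval_s seconds until it reports a final status."""
    for _ in range(max_attempts):
        result = check()
        if result.get("status") in ("done", "failed"):  # hypothetical values
            return result
        sleep(interval_s)
    raise TimeoutError("render did not finish within the polling budget")
```

Injecting `sleep` keeps the loop testable without real 30-second waits. Capping attempts avoids polling forever if a job is orphaned, for example when the session closes mid-render.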

Accepted file types: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
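Before uploading, the agent could reject obvious failures locally. This is a minimal sketch that assumes the extension list above is authoritative. It reads the 200MB limit from the Tips section as 200 × 1024 × 1024 bytes, because the exact unit is not stated. The return values mirror error codes 4001 and 4002 from the Error Handling table, and `precheck_upload` is a hypothetical name.

```python
from pathlib import Path

ACCEPTED = {
    "mp4", "mov", "avi", "webm", "mkv", "jpg", "png", "gif",
    "webp", "mp3", "wav", "m4a", "aac",
}
MAX_BYTES = 200 * 1024 * 1024  # assumed binary megabytes


def precheck_upload(name: str, size_bytes: int) -> int:
    """Return 0 if the file looks uploadable, else the matching error code."""
    ext = Path(name).suffix.lower().lstrip(".")
    if ext not in ACCEPTED:
        return 4001  # unsupported file
    if size_bytes > MAX_BYTES:
        return 4002  # file too large
    return 0
```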

Skill attribution — read from this file's YAML frontmatter at runtime:

X-Skill-Source: best-avatar-video
X-Skill-Version: from frontmatter version
X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)

Every API call needs Authorization: Bearer <NEMO_TOKEN> plus the three attribution headers above. If any header is missing, exports return 402.
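The following is a sketch of assembling the required headers. Parsing the YAML frontmatter is left out, so the version is passed in directly. The install-path check follows the rule above. The `home` parameter exists only so the function can be tested without depending on the real home directory. Function names are hypothetical.

```python
import os
from typing import Optional


def detect_platform(install_path: str, home: Optional[str] = None) -> str:
    """Map the skill's install location to an X-Skill-Platform value."""
    home = home or os.path.expanduser("~")
    path = os.path.abspath(os.path.expanduser(install_path)) + os.sep
    prefixes = {
        "clawhub": os.path.join(home, ".clawhub") + os.sep,
        "cursor": os.path.join(home, ".cursor", "skills") + os.sep,
    }
    for platform, prefix in prefixes.items():
        if path.startswith(prefix):
            return platform
    return "unknown"


def api_headers(token: str, version: str, install_path: str,
                home: Optional[str] = None) -> dict:
    """Auth plus the three attribution headers every call must carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "best-avatar-video",
        "X-Skill-Version": version,  # from the frontmatter `version` field
        "X-Skill-Platform": detect_platform(install_path, home),
    }
```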

Error Handling

Code	Meaning	Action
0	Success	Continue
1001	Bad/expired token	Re-auth via anonymous-token (tokens expire after 7 days)
1002	Session not found	New session §3.0
2001	No credits	Anonymous: show registration URL with `?bind=<id>` (get `<id>` from create-session or state response when needed). Registered: "Top up credits in your account"
4001	Unsupported file	Show supported formats
4002	File too large	Suggest compress/trim
400	Missing X-Client-Id	Generate Client-Id and retry (see §1)
402	Free plan export blocked	Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export."
429	Rate limit (1 token/client/7 days)	Retry in 30s once
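The error table maps naturally onto a dispatch function. In this sketch the action names are hypothetical labels for the agent's next step. The registration URL is a parameter because the skill does not give it. Returning `give_up` for unlisted codes is an assumption, not documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    action: str            # next step for the agent
    message: str = ""      # user-facing text, if any
    retry_after_s: int = 0


def handle_error(code: int, anonymous: bool = True, register_url: str = "",
                 bind_id: str = "", already_retried: bool = False) -> Decision:
    """Translate an API result code into the action from the error table."""
    if code == 0:
        return Decision("continue")
    if code == 1001:  # bad/expired token (7-day expiry)
        return Decision("reauth")
    if code == 1002:  # session not found
        return Decision("new_session")
    if code == 2001:  # out of credits
        if anonymous:
            return Decision("show_message", f"{register_url}?bind={bind_id}")
        return Decision("show_message", "Top up credits in your account")
    if code == 4001:
        return Decision("show_message",
                        "Supported formats: mp4, mov, avi, webm, mkv, jpg, "
                        "png, gif, webp, mp3, wav, m4a, aac")
    if code == 4002:
        return Decision("show_message", "File too large: compress or trim it")
    if code == 400:  # missing X-Client-Id
        return Decision("regenerate_client_id")
    if code == 402:  # plan tier issue, not credits
        return Decision("show_message",
                        "Register or upgrade your plan to unlock export.")
    if code == 429:  # rate limit: retry once after 30s
        if already_retried:
            return Decision("give_up")
        return Decision("retry", retry_after_s=30)
    return Decision("give_up", f"Unexpected error code {code}")
```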

Reading the SSE Stream

Text events go straight to the user (after GUI translation). Tool calls stay internal. Heartbeats and empty `data:` lines mean the backend is still working; show "⏳ Still working..." every 2 minutes.

About 30% of edit operations close the stream without any text. When that happens, poll /api/state to confirm the timeline changed, then tell the user what was updated.
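The following is a minimal stream reader that follows the standard `text/event-stream` rules. A blank line ends an event, and lines starting with `:` are comments, which are commonly used as keep-alives. A single space after `data:` is dropped, and an unterminated event at end of stream is discarded. The skill does not document the payload schema. This sketch therefore treats any non-empty payload as user-facing text instead of separating out tool calls. `poll_state` is a hypothetical callback that would query `/api/state` and describe what changed.

```python
from typing import Callable, Iterable, Iterator, List


def sse_events(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event ("" = heartbeat)."""
    data: List[str] = []
    seen_data = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # blank line dispatches the pending event
            if seen_data:
                yield "\n".join(data)
            data, seen_data = [], False
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "data":
            data.append(value)
            seen_data = True
    # Per the SSE rules, a pending event without a trailing blank line is dropped.


def consume(lines: Iterable[str], poll_state: Callable[[], str]) -> List[str]:
    """Collect user-facing text; fall back to polling state if none arrived."""
    texts = [d for d in sse_events(lines) if d.strip()]
    if not texts:
        texts = [poll_state()]  # stream closed silently: confirm via state
    return texts
```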

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

Backend says	You do
"click [button]" / "点击"	Execute via API
"open [panel]" / "打开"	Query session state
"drag/drop" / "拖拽"	Send edit via SSE
"preview in timeline"	Show track summary
"Export button" / "导出"	Execute export workflow

Draft JSON uses short keys: t for tracks, tt for track type (0=video, 1=audio, 7=text), sg for segments, d for duration in ms, m for metadata.

Example timeline summary:

Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)
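This is a sketch that turns a short-key draft into a summary like the one above. Only the keys `t`, `tt`, `sg`, `d`, and `m` come from the skill. Reading a segment label from a `name` field inside `m` is hypothetical, as is the assumption that segments play back-to-back from 0. The BGM label and volume shown in the example would need metadata fields the skill does not describe.

```python
TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}  # tt values from the skill


def summarize(draft: dict) -> str:
    """Render a one-line-per-track timeline summary from a short-key draft."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for i, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "Other")
        start_ms = 0
        parts = []
        for seg in track.get("sg", []):
            end_ms = start_ms + seg.get("d", 0)  # d is duration in ms
            name = seg.get("m", {}).get("name", "untitled")  # hypothetical key
            parts.append(f"{name} ({start_ms / 1000:g}-{end_ms / 1000:g}s)")
            start_ms = end_ms  # assumed: segments are back-to-back
        desc = ", ".join(parts) or "empty"
        lines.append(f"{i}. {kind}: {desc}")
    return "\n".join(lines)
```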

Common Workflows

Quick edit: Upload → "create a talking avatar video from my photo and script in English" → Download MP4. Takes 1-2 minutes for a 30-second clip.

Batch style: Upload multiple files in one session. Process them one by one with different instructions. Each gets its own render.

Iterative: Start with a rough cut, preview the result, then refine. The session keeps your timeline state so you can keep tweaking.

Tips and Tricks

The backend processes faster when you're specific. Instead of "make it look better", try "create a talking avatar video from my photo and script in English" — concrete instructions get better results.

Max file size is 200MB. Stick to JPG, PNG, MP4, MOV for the smoothest experience.

Export as MP4 for widest compatibility across social and presentation platforms.

Usage Guidance

This skill looks like a legitimate client for an external avatar-rendering API and asks only for a NEMO_TOKEN, which is appropriate. However:
1) Inspect the SKILL.md source for invisible or control characters (the scanner flagged them); remove them or request a clean copy if you see any hidden content.
2) Confirm the service domain (mega-api-prod.nemovideo.ai) and the skill's author before supplying real credentials. Prefer an ephemeral or restricted NEMO_TOKEN (or an anonymous token) over a long-lived account credential.
3) Any images, audio, or scripts you upload are sent to the external service; do not upload sensitive or private material unless you accept that risk.
4) For stronger assurance, ask the publisher for provenance (homepage, owner identity) or request an audited, verifiable implementation (code or official client).
If you proceed, test first with non-sensitive sample media and limited-scope credentials.

Capability Analysis

Type: OpenClaw Skill
Name: best-avatar-video
Version: 1.0.0

The skill provides a functional interface for a cloud-based talking avatar video generation service (nemovideo.ai). It includes detailed instructions for the AI agent to manage authentication via anonymous tokens, handle file uploads, and monitor rendering tasks through a structured API. The behavior is consistent with its stated purpose, and there is no evidence of data exfiltration, malicious execution, or unauthorized access to sensitive system files.

Capability Assessment

✓ Purpose & Capability

Name/description align with the runtime instructions: the skill routes uploads and text to the nemo video API and requires a NEMO_TOKEN. Declared config path (~/.config/nemovideo/) and primaryEnv NEMO_TOKEN are coherent with a cloud rendering provider.

ℹ Instruction Scope

Instructions are explicit about API endpoints, session/token lifecycle, SSE streaming, uploads, and error codes — these are within the expected scope. Two items to note: (1) the skill asks the agent to read the SKILL.md frontmatter and to detect install path (to populate X-Skill-Platform), which requires access to local path context; (2) the SKILL.md contained unicode-control-chars flagged by the scanner, which can be used for prompt-injection or to hide directives. Neither alone proves malicious, but both merit manual inspection.

✓ Install Mechanism

Instruction-only skill with no install spec and no code files — lowest-risk installation surface (nothing is written to disk by an installer).

✓ Credentials

Only NEMO_TOKEN is required (primary credential). That matches the described cloud API usage. The metadata also references a config path (~/.config/nemovideo/) which is plausible for local token caching; however, any token storage/access should be reviewed because it grants the external service the ability to act on behalf of the user.

✓ Persistence & Privilege

The skill is not force-included (always: false) and requests only to save session_id for jobs — expected behavior for a remote rendering workflow. It does not request elevated platform-wide privileges.

Version History

v1.0.0

Best Avatar Video version 1.0.0: initial release.
- Generate realistic talking avatar videos from user-uploaded photos and scripts via a cloud-based platform.
- Supports multiple input file types (JPG, PNG, MP4, MOV, and more), up to 200MB per file.
- Guided setup with automatic token and session management; quick onboarding for both free and registered users.
- Cloud GPU rendering returns 1080p MP4 videos in about 1–2 minutes.
- Includes credit management, file upload, export actions, and clear error handling.
- User prompts mapped to relevant workflow actions for streamlined editing and exporting.

Metadata

Slug best-avatar-video

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Best Avatar Video?

Best Avatar Video is an AI Agent Skill for Claude Code / OpenClaw that creates realistic talking avatar videos from photos and scripts. It has 97 downloads so far.

How do I install Best Avatar Video?

Run "/install best-avatar-video" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Best Avatar Video free?

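Yes, the skill itself is free: it is licensed under MIT-0, and you can download, install, and use it at no cost. Rendering runs on the nemovideo.ai backend and consumes credits. An anonymous token includes 100 credits, and exporting may require registering or upgrading your plan.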

Which platforms does Best Avatar Video support?

Best Avatar Video is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Best Avatar Video?

It is built and maintained by tk8544-b (@tk8544-b); the current version is v1.0.0.
