Description

Skip the learning curve of professional editing software. Describe what you want — create a music video with visuals that sync to the beat of my song — and g...

README (SKILL.md)

Getting Started

Share your audio files and I'll get started on AI music video creation. Or just tell me what you're thinking.

Try saying:

"generate my audio files"
"export 1080p MP4"
"create a music video with visuals"

Automatic Setup

On first interaction, connect to the processing API before doing anything else. Show a brief status like "Setting things up...".

Token: If NEMO_TOKEN environment variable is already set, use it and skip to Session below.

Free token: Generate a UUID as client identifier, then POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id: <uuid>. The response field data.token becomes your NEMO_TOKEN (100 credits, 7-day expiry).

Session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Save session_id from the response.

Confirm to the user you're connected and ready. Don't print tokens or raw JSON.
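The setup steps above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the skill's actual implementation: the helper names (`anonymous_token_request`, `session_request`, `resolve_token`), the injected `fetch` callable, and the `Content-Type` header are hypothetical. Only the URLs, the `X-Client-Id` header, the Bearer auth, the request body, and the `data.token` field come from the text.

```python
import json
import uuid

API_BASE = "https://mega-api-prod.nemovideo.ai"


def anonymous_token_request(client_id):
    """Describe the free-token request; nothing is sent here."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/auth/anonymous-token",
        "headers": {"X-Client-Id": client_id},
        "body": None,
    }


def session_request(token):
    """Describe the session-creation request with Bearer auth."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed, not stated in the text
        },
        "body": json.dumps({"task_name": "project"}),
    }


def resolve_token(env, fetch):
    """Return NEMO_TOKEN from env if set, else request an anonymous token.

    `fetch` is a hypothetical callable that sends a request description
    and returns the decoded JSON response as a dict.
    """
    token = env.get("NEMO_TOKEN")
    if token:
        return token
    response = fetch(anonymous_token_request(str(uuid.uuid4())))
    return response["data"]["token"]
```

In practice `env` would be `os.environ` and `fetch` would wrap whatever HTTP client is available. Keeping the request descriptions as plain dicts makes it easy to avoid printing tokens or raw JSON to the user.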

AI Music Video App — Turn Songs Into Music Videos

Name: Ai Music Video App
Author: dsewell-583h0

Drop your audio files in the chat and tell me what you need. I'll handle the AI music video creation on cloud GPUs — you don't need anything installed locally.

Here's a typical use: you send a 3-minute MP3 song track, ask to "create a music video with visuals that sync to the beat of my song", and about 1-2 minutes later you've got an MP4 file ready to download. The whole thing runs at 1080p by default.

One thing worth knowing — shorter tracks under 2 minutes process faster and give tighter beat sync results.

Matching Input to Actions

User prompts referencing ai music video app, aspect ratio, text overlays, or audio tracks get routed to the corresponding action via keyword and intent classification.

User says...	Action	Skip SSE?
"export" / "导出" / "download" / "send me the video"	→ §3.5 Export	✅
"credits" / "积分" / "balance" / "余额"	→ §3.3 Credits	✅
"status" / "状态" / "show tracks"	→ §3.4 State	✅
"upload" / "上传" / user sends file	→ §3.2 Upload	✅
Everything else (generate, edit, add BGM…)	→ §3.1 SSE	❌
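The routing table above can be approximated with a simple keyword match. This is only an illustrative sketch: the text mentions "keyword and intent classification" without giving details, so the case-insensitive substring matching, the `route` function, and the action labels here are assumptions.

```python
# Keyword groups copied from the routing table; action labels are hypothetical.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message, has_file=False):
    """Return (action, skip_sse) for a user message."""
    if has_file:
        # A file attachment always goes to the upload action.
        return ("upload", True)
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return (action, True)
    # Everything else (generate, edit, add BGM...) goes through SSE.
    return ("sse", False)
```

For example, `route("export 1080p MP4")` yields `("export", True)`, while a free-form editing request falls through to `("sse", False)`.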

Cloud Render Pipeline Details

Each export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.

Skill attribution — read from this file's YAML frontmatter at runtime:

X-Skill-Source: ai-music-video-app
X-Skill-Version: from frontmatter version
X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)

Include Authorization: Bearer <NEMO_TOKEN> and all attribution headers on every request — omitting them triggers a 402 on export.
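A minimal sketch of how the attribution headers might be assembled. The function names and the path-matching approach (looking for a `/.clawhub/` or `/.cursor/skills/` segment in the install path) are assumptions. The header names and platform values are taken from the text, and the version is expected to be read from the frontmatter by the caller.

```python
def detect_platform(install_path):
    """Map the skill's install path to a platform label."""
    normalized = install_path.replace("\\", "/")
    if "/.clawhub/" in normalized:
        return "clawhub"
    if "/.cursor/skills/" in normalized:
        return "cursor"
    return "unknown"


def request_headers(token, version, install_path):
    """Build the auth and attribution headers sent on every request."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "ai-music-video-app",
        "X-Skill-Version": version,  # read from the YAML frontmatter by the caller
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers in one place makes it harder to forget one of them on an export call, which is where the text says a 402 would appear.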

API base: https://mega-api-prod.nemovideo.ai

Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.

Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}

Credits: GET /api/credits/balance/simple — returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
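The export-and-poll step in the endpoint list above could look roughly like this. Several details are assumptions rather than documented behavior: that `<ts>` is a Unix timestamp in seconds, that `status` and `output.url` sit at the top level of the poll response, and the `max_polls` cap. `get_job` is a hypothetical callable that performs the GET and returns the decoded JSON.

```python
import time


def export_payload(session_id, draft, timestamp=None):
    """Build the export request body; the timestamp format is assumed."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def wait_for_render(get_job, render_id, interval=30, max_polls=30, sleep=time.sleep):
    """Poll until the job reports status == "completed", then return its URL."""
    for _ in range(max_polls):
        job = get_job(render_id)
        if job.get("status") == "completed":
            return job["output"]["url"]
        sleep(interval)
    raise TimeoutError(f"render {render_id} not completed after {max_polls} polls")
```

The text does not list failure statuses, so this sketch only recognizes `completed` and otherwise gives up after a fixed number of polls.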

Reading the SSE Stream

Text events go straight to the user (after GUI translation). Tool calls stay internal. Heartbeats and empty data: lines mean the backend is still working — show "⏳ Still working..." every 2 minutes.

About 30% of edit operations close the stream without any text. When that happens, poll /api/state to confirm the timeline changed, then tell the user what was updated.
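A hedged sketch of the stream handling described above. The request body mirrors the documented `/run_sse` shape, but the payload format of individual events is not specified in this document, so text extraction is left to a caller-supplied `extract_text` function (hypothetical). The sketch only shows skipping heartbeats and empty `data:` lines and flagging a stream that closed without text.

```python
def sse_body(session_id, message):
    """Request body for POST /run_sse, as given in the endpoint list."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": message}]},
    }


def read_stream(lines, extract_text):
    """Collect user-facing text from SSE lines.

    Returns (texts, needs_state_poll). needs_state_poll is True when the
    stream closed without any text, in which case the caller should poll
    the session state and summarize what changed.
    """
    texts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and ":"-prefixed SSE comments
        payload = line[len("data:"):].strip()
        if not payload:
            continue  # empty data: line, backend still working
        text = extract_text(payload)
        if text:
            texts.append(text)
    return texts, not texts
```

A caller would also track elapsed time while iterating to show the "Still working" notice every 2 minutes, which is omitted here to keep the sketch short.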

Translating GUI Instructions

The backend responds as if there's a visual interface. Map its instructions to API calls:

"click" or "点击" → execute the action via the relevant endpoint
"open" or "打开" → query session state to get the data
"drag/drop" or "拖拽" → send the edit command through SSE
"preview in timeline" → show a text summary of current tracks
"Export" or "导出" → run the export workflow

Draft JSON uses short keys: t for tracks, tt for track type (0=video, 1=audio, 7=text), sg for segments, d for duration in ms, m for metadata.

Example timeline summary:

Timeline (3 tracks): 1. Video: city timelapse (0-10s) 2. BGM: Lo-fi (0-10s, 35%) 3. Title: "Urban Dreams" (0-3s)
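As an illustration of the short keys, a timeline summary could be derived from the draft along these lines. The nesting is an assumption: the text names the keys but not their layout, so this sketch supposes `t` is a list of tracks, each with a `tt` type and an `sg` list of segments carrying `d` in milliseconds. The `summarize` function and its output format are hypothetical and simpler than the example summary above.

```python
# Track type codes from the text: 0=video, 1=audio, 7=text.
TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}


def summarize(draft):
    """Return a one-line text summary of the tracks in a draft."""
    tracks = draft.get("t", [])
    parts = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "Unknown")
        total_ms = sum(segment.get("d", 0) for segment in track.get("sg", []))
        parts.append(f"{index}. {kind}: {total_ms / 1000:g}s")
    return " ".join(parts)
```

Clip names and volume levels, as shown in the example summary, would presumably come from the `m` metadata, whose fields are not documented here.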

Error Codes

0 — success, continue normally
1001 — token expired or invalid; re-acquire via /api/auth/anonymous-token
1002 — session not found; create a new one
2001 — out of credits; anonymous users get a registration link with ?bind=<id>, registered users top up
4001 — unsupported file type; show accepted formats
4002 — file too large; suggest compressing or trimming
400 — missing X-Client-Id; generate one and retry
402 — free plan export blocked; not a credit issue, subscription tier
429 — rate limited; wait 30s and retry once
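The error table above lends itself to a small lookup plus a single retry for rate limiting. This is a sketch only: the action labels, the `action_for` and `call_with_rate_limit_retry` names, and the `(code, result)` return shape of `call` are all hypothetical.

```python
import time

# Codes from the table above; action labels are illustrative.
ERROR_ACTIONS = {
    0: "continue",
    1001: "reacquire_token",
    1002: "create_session",
    2001: "out_of_credits",
    4001: "show_accepted_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "explain_subscription_tier",
    429: "wait_and_retry_once",
}


def action_for(code):
    """Look up the follow-up action for a response code."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")


def call_with_rate_limit_retry(call, sleep=time.sleep, wait_seconds=30):
    """Run call(); on 429, wait 30s and retry exactly once."""
    code, result = call()
    if code == 429:
        sleep(wait_seconds)
        code, result = call()
    return code, result
```

Retrying only once matches the table. A second 429 is returned to the caller rather than looped on.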

Tips and Tricks

The backend processes faster when you're specific. Instead of "make it look better", try "create a music video with visuals that sync to the beat of my song" — concrete instructions get better results.

Max file size is 200MB. Stick to MP3, WAV, AAC, FLAC for the smoothest experience.

Export as MP4 for widest compatibility across YouTube, Spotify Canvas, and social platforms.

Common Workflows

Quick edit: Upload → "create a music video with visuals that sync to the beat of my song" → Download MP4. Takes 1-2 minutes for a 30-second clip.

Batch style: Upload multiple files in one session. Process them one by one with different instructions. Each gets its own render.

Iterative: Start with a rough cut, preview the result, then refine. The session keeps your timeline state so you can keep tweaking.

Usage Guidance

Use this skill if you are comfortable sending your audio/media files and prompts to `https://mega-api-prod.nemovideo.ai`. Prefer a dedicated NEMO_TOKEN, avoid sensitive files, and review generated exports before sharing them.

Capability Analysis

Type: OpenClaw Skill
Name: ai-music-video-app
Version: 1.0.0

The skill is a legitimate integration for an AI music video service hosted at mega-api-prod.nemovideo.ai. It facilitates audio uploads, session management, and video rendering via standard REST and SSE endpoints. While it handles authentication tokens and performs environment detection for attribution (e.g., checking install paths like ~/.cursor/skills/), these actions are transparently documented in SKILL.md and align with the stated purpose of the application without evidence of malicious intent or data exfiltration.

Capability Assessment

ℹ Purpose & Capability

The cloud upload, generation, and export workflow matches the stated purpose of creating music videos from user media, but users should understand their files leave the local environment.

ℹ Instruction Scope

The skill tells the agent to connect automatically and translate backend GUI-style instructions into API calls; this is bounded to the NemoVideo workflow, but users should be aware of it.

✓ Install Mechanism

No install spec, binaries, package dependencies, or code files are present; this is an instruction-only skill.

ℹ Credentials

Use of NEMO_TOKEN or an anonymous provider token is proportionate for the stated API integration and is disclosed in the artifact.

✓ Persistence & Privilege

The artifact mentions a provider session_id and render jobs, but does not show local autostart, background workers, privilege escalation, or persistent local execution.

Version History

v1.0.0

- Initial release of the AI Music Video App skill.
- Instantly creates synced music videos from user-uploaded audio files (MP3, WAV, AAC, FLAC up to 200MB).
- Handles all backend setup: authenticates, manages session, uploads audio, and returns downloadable MP4/videos.
- Supports user prompts for exporting, checking credits/status, uploading, and generating music videos via cloud GPU rendering.
- Provides clear summaries of current timelines and workflow status.
- Includes error handling for token issues, file size/type limits, credits, and server responses.

Metadata

Slug ai-music-video-app

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Ai Music Video App?

Skip the learning curve of professional editing software. Describe what you want — create a music video with visuals that sync to the beat of my song — and g... It is an AI Agent Skill for Claude Code / OpenClaw, with 61 downloads so far.

How do I install Ai Music Video App?

Run "/install ai-music-video-app" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Ai Music Video App free?

Yes, Ai Music Video App is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Ai Music Video App support?

Ai Music Video App is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Ai Music Video App?

It is built and maintained by dsewell-583h0 (@dsewell-583h0); the current version is v1.0.0.
