Description

Turn any song into a captivating music video with music-video-maker-ai — the smart creative tool that syncs visuals to your beat, generates scene ideas, and...

Usage Instructions (SKILL.md)

Getting Started

Drop your song lyrics, genre, or track mood and I'll generate a full music video concept with scenes, shot ideas, and visual direction. No lyrics? Just describe the feeling you want your video to have.

Try saying:

"I have an indie pop song about heartbreak with a slow tempo — can you write a storyboard with 8 scenes that match the emotional arc of the lyrics?"
"Create a music video concept for a high-energy hip-hop track. I want street scenes, quick cuts, and a color grade that feels gritty but cinematic."
"My band is shooting a music video this weekend with minimal budget. Give me a shot list and location ideas for a melancholic acoustic folk song set outdoors."

First-Time Connection

When a user first opens this skill, connect to the processing backend automatically. Briefly let them know (e.g. "Setting up...").

Authentication: Check whether NEMO_TOKEN is set in the environment. If it is, skip to step 2.

1. Obtain a free token: Generate a random UUID as the client identifier. POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the header X-Client-Id set to that UUID. The response field data.token is your NEMO_TOKEN (100 free credits, valid for 7 days).
2. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and body {"task_name":"project","language":"<detected>"}. Store the returned session_id for all subsequent requests.

Keep setup communication brief. Don't display raw API responses or token values to the user.
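
The connection steps above can be sketched as a few pure request builders. This is a minimal illustration, not the skill's actual implementation: the function names are hypothetical, the returned dictionaries are just descriptions of the requests (nothing is sent), and the URLs, headers, and body fields are the ones stated in the steps.

```python
import os
import uuid

API_BASE = "https://mega-api-prod.nemovideo.ai"


def needs_anonymous_token(env=None):
    """True when NEMO_TOKEN is absent, i.e. step 1 cannot be skipped."""
    env = os.environ if env is None else env
    return not env.get("NEMO_TOKEN")


def build_token_request(client_id=None):
    """Describe the anonymous-token POST (step 1)."""
    # A random UUID serves as the client identifier unless one is supplied.
    client_id = client_id or str(uuid.uuid4())
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/auth/anonymous-token",
        "headers": {"X-Client-Id": client_id},
    }


def build_session_request(token, language):
    """Describe the create-session POST (step 2)."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        # JSON body; `language` is whatever language was detected for the user.
        "json": {"task_name": "project", "language": language},
    }
```

Keeping the builders free of network calls makes it easy to check the request shape before handing it to whichever HTTP client the agent uses.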

From Beat Drop to Final Cut — Visually

Name: Music Video Maker Ai
Author: susan4731-wilfordf

Music videos are no longer just for major label artists with six-figure budgets. Music Video Maker AI gives independent musicians, YouTubers, and creative directors the tools to conceptualize, script, and sequence compelling music videos — all driven by the mood, tempo, and lyrics of their track.

This skill analyzes the emotional arc of your song and translates it into visual language. Describe your track's genre, vibe, or paste your lyrics, and you'll get scene-by-scene storyboard ideas, shot type suggestions, color palette recommendations, and transition cues timed to your music's structure. Think of it as having a creative director on call.

Whether you're planning a DIY shoot with a smartphone or directing a full production, Music Video Maker AI helps you arrive on set with a clear vision. No more blank-page paralysis — just a focused creative brief that makes every second of your video intentional and impactful.

Routing Your Visual Requests

Every prompt you send — whether it's a mood, genre, lyric snippet, or full creative brief — gets parsed and dispatched to the appropriate generation pipeline based on video style, beat-sync requirements, and visual complexity.

User says...	Action	Skip SSE?
"export" / "导出" / "download" / "send me the video"	→ §3.5 Export	✅
"credits" / "积分" / "balance" / "余额"	→ §3.3 Credits	✅
"status" / "状态" / "show tracks"	→ §3.4 State	✅
"upload" / "上传" / user sends file	→ §3.2 Upload	✅
Everything else (generate, edit, add BGM…)	→ §3.1 SSE	❌
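
One way to read the routing table is as an ordered keyword lookup. The sketch below assumes case-insensitive substring matching, which the table does not specify; the `route` function and the action labels are hypothetical names chosen for illustration.

```python
# Keyword groups taken from the routing table, checked in table order.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message, has_file=False):
    """Return (action, skip_sse) for a user message."""
    if has_file:
        # "user sends file" row: go straight to upload.
        return ("upload", True)
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return (action, True)
    # Everything else (generate, edit, add BGM, ...) goes through SSE.
    return ("sse", False)
```

For example, `route("add BGM to the chorus")` falls through to the SSE path, while `route("余额")` resolves to the credits check without opening a stream.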

Cloud Rendering API Reference

Music Video Maker AI runs on a distributed cloud rendering backend that processes audio waveform analysis, scene segmentation, and frame generation concurrently to deliver beat-matched visuals at scale. Render jobs are queued, prioritized by resolution tier, and returned as streamable video assets once the pipeline completes synthesis.

Skill attribution — read from this file's YAML frontmatter at runtime:

X-Skill-Source: music-video-maker-ai
X-Skill-Version: from frontmatter version
X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)

All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
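
The header rules above could be assembled roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the version string is passed in by the caller (reading the YAML frontmatter is left out), and the platform check is a simple substring test on the install path.

```python
def detect_platform(install_path):
    """Map an install path to the X-Skill-Platform value."""
    # Normalise separators so Windows-style paths are handled too.
    path = install_path.replace("\\", "/")
    if "/.clawhub/" in path:
        return "clawhub"
    if "/.cursor/skills/" in path:
        return "cursor"
    return "unknown"


def build_headers(token, version, install_path):
    """Headers that every request is expected to carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "music-video-maker-ai",
        "X-Skill-Version": version,  # taken from the frontmatter `version`
        "X-Skill-Platform": detect_platform(install_path),
    }
```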

API base: https://mega-api-prod.nemovideo.ai

Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}. Returns task_id, session_id.

Send message (SSE): POST /run_sse with body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} and header Accept: text/event-stream. Max timeout: 15 minutes.

Upload: POST /api/upload-video/nemo_agent/me/<sid>. File: multipart -F "files=@/path"; URL: {"urls":["<url>"],"source_type":"url"}

Credits: GET /api/credits/balance/simple — returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest. Key fields: data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda with body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. The download URL is at output.url.
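
The export flow can be sketched as a payload builder plus a polling loop. Several details here are assumptions rather than documented behaviour: `<ts>` is taken to be a Unix timestamp in milliseconds, the poll response is assumed to expose `status` and `output.url` at the top level, and `max_polls` is a hypothetical safety limit. The HTTP call itself is injected as `fetch_status`, so the sketch stays independent of any particular client library.

```python
import time


def build_export_payload(session_id, draft, ts=None):
    """Body for POST /api/render/proxy/lambda."""
    # Assumption: <ts> is a millisecond Unix timestamp.
    ts = int(time.time() * 1000) if ts is None else ts
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_export(fetch_status, interval_s=30, max_polls=40, sleep=time.sleep):
    """Call fetch_status() until the render completes; return the download URL."""
    for attempt in range(max_polls):
        result = fetch_status()
        if result.get("status") == "completed":
            return result["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval_s)  # wait 30s between polls by default
    raise TimeoutError("render did not complete within the polling budget")
```

In practice `fetch_status` would wrap a GET to /api/render/proxy/lambda/<id> using the `id` from the payload. A failed render status is not handled here because the text does not name one.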

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

SSE Event Handling

Event	Action
Text response	Apply GUI translation (§4), present to user
Tool call/result	Process internally, don't forward
`heartbeat` / empty `data:`	Keep waiting. Every 2 min: "⏳ Still working..."
Stream closes	Process final response

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
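
A rough sketch of the handling rules above follows. The wire format of individual events is not described in this document, so the parser assumes each `data:` line carries a JSON object whose `content.parts` entries have a `text` key for user-facing text and no `text` key for tool calls and results. That shape, and the helper names, are assumptions for illustration only.

```python
import json


def build_sse_body(session_id, text):
    """Body for POST /run_sse, as given in the API reference."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": text}]},
    }


def extract_text(lines):
    """Collect user-facing text from raw SSE lines (assumed event shape)."""
    texts = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # event names, comments, blank separators
        payload = line[len("data:"):].strip()
        if not payload or payload == "heartbeat":
            continue  # heartbeat or empty data: keep waiting
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # not JSON; ignore in this sketch
        content = event.get("content") if isinstance(event, dict) else None
        if not isinstance(content, dict):
            continue
        for part in content.get("parts") or []:
            # Parts without a "text" key are treated as tool traffic and dropped.
            if isinstance(part, dict) and isinstance(part.get("text"), str):
                texts.append(part["text"])
    return texts


def next_step(texts):
    """Decide what to do once the stream closes."""
    if any(t.strip() for t in texts):
        return "present_text"
    # No text came back: verify the edit via session state instead.
    return "poll_session_state"
```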

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

Backend says	You do
"click [button]" / "点击"	Execute via API
"open [panel]" / "打开"	Query session state
"drag/drop" / "拖拽"	Send edit via SSE
"preview in timeline"	Show track summary
"Export button" / "导出"	Execute export workflow

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)
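
To illustrate the draft field mapping, the sketch below turns an abbreviated draft into a short track summary. The nesting is an assumption: it treats `t` as a list of tracks, each with a `tt` type code and an `sg` list of segments that carry a duration `d` in milliseconds. The function name is hypothetical.

```python
# Track type codes from the field mapping.
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft):
    """Return summary lines for a draft using the abbreviated keys."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        # Sum segment durations (ms) and report the total in seconds.
        total_ms = sum(segment.get("d", 0) for segment in track.get("sg", []))
        lines.append(f"{index}. {kind}: {total_ms / 1000:g}s")
    return lines
```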

Error Handling

Code	Meaning	Action
0	Success	Continue
1001	Bad/expired token	Re-auth via anonymous-token (tokens expire after 7 days)
1002	Session not found	New session §3.0
2001	No credits	Anonymous: show registration URL with `?bind=<id>` (get `<id>` from create-session or state response when needed). Registered: "Top up credits in your account"
4001	Unsupported file	Show supported formats
4002	File too large	Suggest compress/trim
400	Missing X-Client-Id	Generate Client-Id and retry (see §1)
402	Free plan export blocked	Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export."
429	Rate limit (1 token/client/7 days)	Retry in 30s once
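
The error table lends itself to a simple lookup. In this sketch the action labels and function names are hypothetical, and the registration URL is passed in by the caller because this document does not state it.

```python
# Code -> action label, following the error table row by row.
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauth_anonymous_token",
    1002: "create_new_session",
    2001: "handle_no_credits",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "explain_plan_restriction",
    429: "retry_once_after_30s",
}


def action_for(code):
    """Look up the action for a response code."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")


def no_credits_message(registered, registration_url=None, bind_id=None):
    """Message for code 2001, depending on the account type."""
    if registered:
        return "Top up credits in your account"
    # Anonymous users get the registration URL with ?bind=<id>.
    return f"{registration_url}?bind={bind_id}"
```

Note that 402 is kept separate from 2001: the table treats it as a subscription tier issue, not a credit shortage.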

Best Practices

Before diving into full storyboard generation, start with a one-paragraph brief about your artist identity, target audience, and the story you want the video to tell. This context shapes every creative decision the AI makes — from casting suggestions to set design ideas.

Always request scene descriptions that include shot type (wide, close-up, tracking), lighting mood, and action happening on screen. Vague outputs are usually the result of vague inputs. The more cinematic language you use in your prompt, the more cinematic the response.

If you're working with a real production crew, ask Music Video Maker AI to format the output as a proper shot list or call sheet-style document. This makes it immediately usable on set and bridges the gap between AI-generated concepts and real-world execution. Revisit the concept after your first rough cut to get re-edit and pacing suggestions.

Troubleshooting

If your generated concept feels too generic, it usually means the prompt lacked specificity. Avoid broad descriptors like 'cool' or 'modern' — instead use references like 'early 2000s MTV aesthetic' or 'A24 film color grading' to anchor the creative direction.

Getting scenes that don't match your song's energy? Try explicitly stating the BPM or describing the song structure (e.g., 'verse is slow and whispered, chorus explodes with drums'). Music Video Maker AI responds well to structural cues that mirror how a real director would read a track before shooting.

If the storyboard feels too long or too short for your song's runtime, specify the exact duration and number of scenes you need. For a 3-minute song, asking for 12 scenes gives roughly 15 seconds per scene — a natural rhythm for most music videos. You can always ask for scene expansions or cuts to tighten the pacing.
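
The pacing arithmetic above is easy to check or adapt to other runtimes. A small sketch, with hypothetical helper names:

```python
def seconds_per_scene(runtime_s, scenes):
    """Average scene length when a song is split into equal scenes."""
    return runtime_s / scenes


def scenes_for(runtime_s, target_scene_s):
    """Number of scenes needed to hit a target average scene length."""
    return max(1, round(runtime_s / target_scene_s))
```

For the 3-minute example, `seconds_per_scene(180, 12)` gives 15 seconds per scene, and `scenes_for(180, 15)` recovers the 12-scene request.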

Tips and Tricks

Get the most out of Music Video Maker AI by giving it as much context as possible upfront. The more specific you are about your song's tempo, genre, lyrical themes, and intended audience, the more tailored and production-ready your output will be.

Try pasting your full lyrics directly into the prompt — the AI uses line breaks, emotional peaks, and recurring hooks to time scene transitions and build a narrative structure that mirrors your song's flow. Chorus sections often translate best into high-energy visual moments or repeated motifs.

Don't overlook the power of color and mood prompts. Phrases like 'neon-drenched nighttime' or 'sun-bleached desert road' give the AI a cinematic anchor to build around. You can also ask for multiple concept variations and cherry-pick elements from each to create a hybrid vision that's uniquely yours.

Security Recommendations

This skill appears to do what it says: it calls a nemo-video backend to analyze audio/lyrics and produce video assets. Before installing, consider: (1) Privacy — the skill uploads user files (audio/video) and sends them to https://mega-api-prod.nemovideo.ai. Do not upload sensitive content you do not want transmitted. (2) Token handling — it will use NEMO_TOKEN if supplied, or obtain an anonymous token automatically; ask where session IDs/tokens are stored and revoke them if needed. (3) Trust & provenance — the skill's source/homepage are unknown; verify the nemovideo.ai service and its privacy/terms if you rely on it. (4) Prefer supplying your own token if you want control, and avoid keeping long-lived credentials in shared environments. If you need more assurance, request the skill's source code or an official publisher/homepage before installing.

Capability Assessment

✓ Purpose & Capability

The skill is a cloud-backed music-video generation helper. Requesting a single NEMO_TOKEN and a config path under ~/.config/nemovideo/ is consistent with calling a nemo-video API backend. No unrelated credentials or binaries are requested.

ℹ Instruction Scope

SKILL.md instructs the agent to automatically connect to the nemo API on first use (obtain an anonymous token if NEMO_TOKEN is missing), create sessions, send SSE requests, and upload files/URLs to the remote service. These actions are expected for a video-generation service, but they do mean the skill will (1) perform outbound network calls without further prompting, (2) upload user-provided files or local paths to an external host, and (3) read its own frontmatter and detect install paths for attribution headers. The instructions explicitly say not to display raw API responses or tokens to users.

✓ Install Mechanism

There is no install spec and no code files — this is instruction-only, so nothing is written to disk by an installer. Lowest install risk.

ℹ Credentials

Only NEMO_TOKEN is declared as required (primary credential), which aligns with the described API. The skill will generate an anonymous token if none is provided, and it references a config path (~/.config/nemovideo/) where session state may be stored. This is proportionate to the backend usage, but it means credentials/tokens and session IDs may be created and persisted by the skill unless the agent/user manages them explicitly.

ℹ Persistence & Privilege

always:false and default autonomous invocation are set (normal). The skill instructs storing a returned session_id for subsequent requests; it does not explicitly say where or how long, so session tokens may be persisted. The skill does not request system-wide privileges or modify other skills' configurations.

Version History

v1.0.0

Initial release of Music Video Maker AI skill:
- Instantly generates scene-by-scene music video concepts, synced to your song's tempo, lyrics, and mood.
- Handles lyric timing, shot sequencing, mood-matched visuals, and storyboard scripting for all skill levels.
- Supports prompt-based requests for genres, moods, visual styles, and specific video details.
- Built-in authentication and cloud rendering using the Nemo Video API, with free trial credits for new users.
- Provides export, upload, credits check, and session management directly via commands.
- Includes robust error handling and direct translation of backend actions for seamless user experience.

Metadata

Slug music-video-maker-ai

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Number of versions 1

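FAQ

What is Music Video Maker Ai?

Turn any song into a captivating music video with music-video-maker-ai — the smart creative tool that syncs visuals to your beat, generates scene ideas, and... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 77 times in total to date.

How do I install Music Video Maker Ai?

Run the command "/install music-video-maker-ai" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Music Video Maker Ai free?

Yes. Music Video Maker Ai is completely free and is released under the MIT-0 license, so it can be freely downloaded, installed, and used.

Which platforms does Music Video Maker Ai support?

Music Video Maker Ai is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Music Video Maker Ai?

It is developed and maintained by susan4731-wilfordf (@susan4731-wilfordf); the current version is v1.0.0.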
