Description

Drop a video and describe what you want — the gemini-ai-video-generator skill uses Google's Gemini AI to analyze your footage and generate new scenes, summar...

README (SKILL.md)

Getting Started

Welcome! Ready to let Gemini AI do the creative work? Upload your video and tell me what you want to generate — a highlight reel, scene description, caption track, or something entirely new. Let's get started.

Try saying:

"Here's a 10-minute product demo video — generate a 60-second highlight reel focusing on the key features shown"
"Analyze this lecture recording and write a structured summary with timestamps for each major topic covered"
"Watch this travel footage and generate a narration script I can record as a voiceover"

First-Time Connection

When a user first opens this skill, connect to the NemoVideo backend automatically. Briefly let them know (e.g. "Connecting to NemoVideo...").

Authentication: Check if NEMO_TOKEN is set in the environment. If it is, skip to step 3.

Client identifier: Look for ~/.config/nemovideo/client_id. If missing, generate a UUID and save it there. This file stores only a UUID for rate-limit tracking.
Obtain a free token: POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id set to your identifier. The response data.token is your NEMO_TOKEN — 100 free credits, valid 7 days.
Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer \x3Ctoken>, Content-Type: application/json, and body {"task_name":"project","language":"\x3Cdetected>"}. Store the returned session_id for all subsequent requests.

Keep setup communication brief. Don't display raw API responses or token values to the user.

Turn Your Words Into Video With Gemini AI

Name: Gemini Ai Video Generator
Author: peand-rover

The Gemini AI Video Generator skill brings Google's multimodal Gemini model directly into your video workflow. Instead of manually cutting, scripting, or re-shooting, you simply upload your video and tell the skill what you want — in plain English. Gemini reads both the visual content and your instructions together, then generates output that actually understands what's on screen.

This skill is built for people who have video and a vision but don't want to spend hours in editing software. Whether you're a solo creator repurposing long-form content into short clips, a marketing team generating product highlight reels, or an educator turning lecture recordings into structured summaries, this skill handles the heavy lifting.

You're not limited to trimming or adding filters. Gemini AI Video Generator can describe what's happening in a scene, suggest narrative structure, generate spoken-word scripts based on visual cues, or produce entirely new content framed around your uploaded footage. It's a fundamentally different kind of video tool — one that listens before it creates.

Prompt Routing and Generation Flow

When you describe a scene, your natural-language prompt is parsed, enriched with cinematic parameters, and dispatched directly to the Gemini video synthesis pipeline for frame-by-frame generation.

User says...	Action	Skip SSE?
"export" / "导出" / "download" / "send me the video"	→ §3.5 Export	✅
"credits" / "积分" / "balance" / "余额"	→ §3.3 Credits	✅
"status" / "状态" / "show tracks"	→ §3.4 State	✅
"upload" / "上传" / user sends file	→ §3.2 Upload	✅
Everything else (generate, edit, add BGM…)	→ §3.1 SSE	❌

NemoVideo Backend Reference

ClawHub routes all video generation requests through the NemoVideo API, which handles Gemini model orchestration, render queuing, and secure video delivery. NemoVideo manages diffusion sampling, temporal coherence, and output encoding so your clips stay smooth and consistent across every generation.

Skill attribution — read from this file's YAML frontmatter at runtime:

X-Skill-Source: gemini-ai-video-generator
X-Skill-Version: from frontmatter version
X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)

All requests must include: Authorization: Bearer \x3CNEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.

API base: https://mega-api-prod.nemovideo.ai

Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"\x3Clang>"} — returns task_id, session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=&task=\x3Ctask_id>&session=\x3Csession_id>&skill_name=gemini-ai-video-generator&skill_version=1.0.0&skill_source=\x3Cplatform>

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"\x3Csid>","new_message":{"parts":[{"text":"\x3Cmsg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.

Upload: POST /api/upload-video/nemo_agent/me/\x3Csid> — file: multipart -F "files=@/path", or URL: {"urls":["\x3Curl>"],"source_type":"url"}

Credits: GET /api/credits/balance/simple — returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/\x3Csid>/latest — key fields: data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_\x3Cts>","sessionId":"\x3Csid>","draft":\x3Cjson>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/\x3Cid> every 30s until status = completed. Download URL at output.url.

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

SSE Event Handling

Event	Action
Text response	Apply GUI translation (§4), present to user
Tool call/result	Process internally, don't forward
`heartbeat` / empty `data:`	Keep waiting. Every 2 min: "⏳ Still working..."
Stream closes	Process final response

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

Backend says	You do
"click [button]" / "点击"	Execute via API
"open [panel]" / "打开"	Query session state
"drag/drop" / "拖拽"	Send edit via SSE
"preview in timeline"	Show track summary
"Export button" / "导出"	Execute export workflow

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

Timeline (3 tracks): 1. Video: city timelapse (0-10s) 2. BGM: Lo-fi (0-10s, 35%) 3. Title: "Urban Dreams" (0-3s)

Error Handling

Code	Meaning	Action
0	Success	Continue
1001	Bad/expired token	Re-auth via anonymous-token (tokens expire after 7 days)
1002	Session not found	New session §3.0
2001	No credits	Anonymous: show registration URL with `?bind=\x3Cid>` (get `\x3Cid>` from create-session or state response when needed). Registered: "Top up at nemovideo.ai"
4001	Unsupported file	Show supported formats
4002	File too large	Suggest compress/trim
400	Missing X-Client-Id	Generate Client-Id and retry (see §1)
402	Free plan export blocked	Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export."
429	Rate limit (1 token/client/7 days)	Retry in 30s once

FAQ

What video formats does Gemini AI Video Generator support? You can upload mp4, mov, avi, webm, and mkv files. Mp4 is the most reliably processed format across all generation task types.

Can it generate entirely new video footage? The skill generates text-based outputs — scripts, descriptions, summaries, captions, and structured content — derived from your uploaded video. It does not render new video frames or visual animations.

Does it understand spoken audio in the video? Yes. Gemini processes both the visual content and any audible speech in your video, which means it can cross-reference what's being said with what's on screen for more accurate generation.

What if my video has no dialogue? No problem — Gemini AI Video Generator analyzes visual cues, movement, scene transitions, and on-screen text independently. Silent product demos, tutorials, and b-roll footage all work well with descriptive or script-generation prompts.

Tips and Tricks

Be specific in your prompt — Gemini AI Video Generator responds much better to 'generate a 3-sentence product description focusing on the unboxing moment at the start' than to 'summarize this video.' The more context you give about your intended audience or output format, the sharper the results.

If you're generating captions or subtitles, mention the tone you want (formal, casual, punchy) and any terminology specific to your industry. Gemini will adapt its language accordingly rather than defaulting to generic phrasing.

For content repurposing workflows, try uploading the same video with different prompts — one for a short-form social caption, one for a blog summary, one for an email teaser. You'll get distinct outputs tailored to each format without re-editing the source file.

When generating scripts or voiceovers, ask Gemini to match the pacing of the original footage. This produces scripts that actually fit the visual rhythm rather than running long or cutting short.

Performance Notes

Gemini AI Video Generator performs best on videos under 10 minutes in length, where the model can maintain full visual context throughout the clip. Longer videos may be processed in segments, which can occasionally affect continuity in generated outputs like scripts or summaries.

File format matters less than resolution and clarity — mp4 and webm files with clear audio tracks tend to produce the most accurate scene analysis and generation results. Heavily compressed or low-light footage may result in less precise visual descriptions.

Generation time scales with video length and task complexity. A simple scene description on a 2-minute clip returns quickly, while generating a full narration script for a 15-minute video will take noticeably longer. Plan accordingly if you're working with batch content.

Usage Guidance

Before installing: (1) Understand that using the skill will upload your videos to nemovideo's servers (https://mega-api-prod.nemovideo.ai and https://nemovideo.com). Do not use it for sensitive or private footage unless you trust NemoVideo’s policies. (2) The skill needs a NemoVideo token (NEMO_TOKEN); if not provided it will request an anonymous token and will create a small config file at ~/.config/nemovideo/client_id. Decide if you’re comfortable with that file being created. (3) The skill advertises 'Gemini' but communicates solely with the NemoVideo API — verify you accept that NemoVideo, not Google directly, will process your data. (4) Check NemoVideo’s homepage/repository and privacy/terms if you need legal/ownership guarantees for uploaded media. (5) If you want additional assurance, request or inspect a concrete network/schema spec (exact fields sent with uploads, persistence of returned tokens, retention policy for uploaded videos) before enabling the skill.

Capability Analysis

Type: OpenClaw Skill Name: gemini-ai-video-generator Version: 1.0.0 The gemini-ai-video-generator skill is a standard integration for the NemoVideo SaaS platform, using Google's Gemini AI for video processing. It manages its own configuration and state within `~/.config/nemovideo/` and communicates exclusively with the stated backend at `mega-api-prod.nemovideo.ai`. The instructions in `SKILL.md` are well-defined, focusing on automating the authentication flow and mapping API responses to agent actions without any evidence of data exfiltration, unauthorized access, or malicious intent.

Capability Assessment

✓ Purpose & Capability

The skill name/description claim: take an uploaded video and request generation/summaries/edits via Gemini-powered processing. The runtime instructions send video and commands to a NemoVideo API (mega-api-prod.nemovideo.ai) and require a NemoVideo token — this is coherent with the stated purpose. Note: the skill advertises use of Google's Gemini model but does not contact Google APIs directly; instead it routes through NemoVideo's backend which it claims orchestrates Gemini. That distinction may matter if you expect a direct Google integration.

ℹ Instruction Scope

Runtime instructions are specific: check NEMO_TOKEN, optionally create ~/.config/nemovideo/client_id, POST to NemoVideo endpoints to get an anonymous token and create sessions, upload video files or URLs, and stream SSEs. These actions are within scope for a service that must upload and process video. Important privacy/security note: the skill explicitly directs uploading user videos to an external third‑party (Nemovideo) — any sensitive content will be transmitted off‑host. The skill instructs not to display raw tokens or API responses, which is sensible.

✓ Install Mechanism

No install spec and no code files — instruction-only skill. This minimizes disk/write risk beyond the explicit config file the skill itself creates (~/.config/nemovideo/client_id).

ℹ Credentials

The declared primary credential is NEMO_TOKEN, which aligns with the backend API usage. The skill will also obtain an anonymous token via an API call if NEMO_TOKEN isn't set. It writes a client_id file under ~/.config/nemovideo/ for rate-limiting; that config path appears only in SKILL.md metadata (the registry summary earlier showed no required config paths) — a small metadata inconsistency. No unrelated credentials (AWS, GitHub, etc.) are requested.

✓ Persistence & Privilege

always:false and normal autonomous invocation. The only persistent change the skill requests is writing a single client_id file in ~./config/nemovideo/ and storing session_id/token for the session’s lifetime — this is proportional for a client that needs to track sessions and rate-limits. The skill does not request system-wide changes or other skills' configuration.

Version History

v1.0.0

Gemini AI Video Generator 1.0.0 — Initial Release - Generate highlight reels, scene descriptions, summaries, captions, and more from your video files using Google's Gemini AI. - Supports mp4, mov, avi, webm, and mkv formats. - Automated first-time connection and authentication process for a seamless setup. - Handles uploads, credit checks, export, and real-time edit requests through the NemoVideo backend. - Includes robust error handling with clear user feedback and session management. - Designed for creators, marketers, and educators to transform videos without manual editing.

Metadata

Slug gemini-ai-video-generator

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Gemini Ai Video Generator?

Drop a video and describe what you want — the gemini-ai-video-generator skill uses Google's Gemini AI to analyze your footage and generate new scenes, summar... It is an AI Agent Skill for Claude Code / OpenClaw, with 119 downloads so far.

How do I install Gemini Ai Video Generator?

Run "/install gemini-ai-video-generator" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Gemini Ai Video Generator free?

Yes, Gemini Ai Video Generator is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Gemini Ai Video Generator support?

Gemini Ai Video Generator is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Gemini Ai Video Generator?

It is built and maintained by peandrover adam (@peand-rover); the current version is v1.0.0.

More Skills

Gemini Ai Video Generator