
Janitor Ai Image To Video

by francemichaell-15 · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security Clean
Downloads: 75 · Stars: 0 · Active Installs: 0 · Versions: 1
Install in OpenClaw
/install janitor-ai-image-to-video
Description
Skip the learning curve of professional editing software. Describe what you want — animate this character image into a short video clip with motion — and get...
README (SKILL.md)

Getting Started

Got static images to work with? Send them over and tell me what you need, and I'll take care of the AI video creation.

Try saying:

  • "convert a single character illustration or scene image into a 1080p MP4"
  • "animate this character image into a short video clip with motion"
  • "convert AI-generated character images into short animated videos for AI art creators"

Getting Connected

Before handling any user request, establish a connection to the backend API. Show a brief status like "Connecting...".

If NEMO_TOKEN is in the environment, use it directly and create a session. Otherwise, acquire a free starter token:

  • Generate a UUID as client identifier
  • POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the X-Client-Id header
  • The response includes a token with 100 free credits valid for 7 days — use it as NEMO_TOKEN

Then create a session by POSTing to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer authorization and body {"task_name":"project","language":"en"}. The session_id in the response is needed for all following requests.
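
As a minimal sketch of the connection steps above, the following Python builds the two requests without sending them. The function names are hypothetical, and the Content-Type header on the session request is an assumption; the source only specifies the URLs, the X-Client-Id header, Bearer authorization, and the JSON body.

```python
import json
import os
import uuid
import urllib.request
from typing import Mapping, Optional

API_BASE = "https://mega-api-prod.nemovideo.ai"


def existing_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return NEMO_TOKEN if it is set and non-empty, else None."""
    return env.get("NEMO_TOKEN") or None


def new_client_id() -> str:
    """Generate a UUID to use as the client identifier."""
    return str(uuid.uuid4())


def anonymous_token_request(client_id: str) -> urllib.request.Request:
    """Build (but do not send) the request for a free starter token."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def create_session_request(token: str, language: str = "en") -> urllib.request.Request:
    """Build (but do not send) the session-creation request."""
    body = json.dumps({"task_name": "project", "language": language}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: the backend expects a JSON content type.
            "Content-Type": "application/json",
        },
    )
```

Passing either request to urllib.request.urlopen would perform the call; the name of the token field in the response is not documented here, so parsing it is left out.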

Tell the user you're ready. Keep the technical details out of the chat.

Janitor AI Image to Video — Convert Images Into Video Clips

Send me your static images and describe the result you want. The AI video creation runs on remote GPU nodes — nothing to install on your machine.

A quick example: upload a single character illustration or scene image, type "animate this character image into a short video clip with motion", and you'll get a 1080p MP4 back in roughly 30-90 seconds. All rendering happens server-side.

Worth noting: clean images with clear subjects animate more smoothly than cluttered backgrounds.

Matching Input to Actions

User prompts referencing janitor ai image to video, aspect ratio, text overlays, or audio tracks get routed to the corresponding action via keyword and intent classification.

User says... | Action | Skip SSE?
"export" / "导出" / "download" / "send me the video" | → §3.5 Export |
"credits" / "积分" / "balance" / "余额" | → §3.3 Credits |
"status" / "状态" / "show tracks" | → §3.4 State |
"upload" / "上传" / user sends file | → §3.2 Upload |
Everything else (generate, edit, add BGM…) | → §3.1 SSE |
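
The routing rules above could be approximated with plain keyword matching, as in this sketch. It covers only the keyword half of "keyword and intent classification"; the function name, the action labels, and the rule that an attached file always wins are assumptions.

```python
from typing import List, Tuple

# Keywords copied from the routing rules; the action labels are hypothetical.
ROUTES: List[Tuple[Tuple[str, ...], str]] = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message: str, has_file: bool = False) -> str:
    """Pick an action for a user message; anything unmatched goes to SSE."""
    if has_file:
        # Assumption: an attached file takes priority over any keyword.
        return "upload"
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action
    return "sse"
```

Substring matching is deliberately crude: it checks rules in table order and does no intent analysis, so a real router would likely need something stricter.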

Cloud Render Pipeline Details

Each export job queues on a cloud GPU node that composites video layers, applies platform-specific compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.

Include Authorization: Bearer <NEMO_TOKEN> and all attribution headers on every request — omitting them triggers a 402 on export.

Headers are derived from this file's YAML frontmatter. X-Skill-Source is janitor-ai-image-to-video, X-Skill-Version comes from the version field, and X-Skill-Platform is detected from the install path (~/.clawhub/ = clawhub, ~/.cursor/skills/ = cursor, otherwise unknown).
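
A small sketch of how the headers described above might be assembled. The function names are hypothetical, and matching the install path by substring (with backslashes normalised) is an assumption about how "detected from the install path" works.

```python
from typing import Dict


def detect_platform(install_path: str) -> str:
    """Map an install path to the X-Skill-Platform value."""
    normalized = install_path.replace("\\", "/")
    if "/.clawhub/" in normalized:
        return "clawhub"
    if "/.cursor/skills/" in normalized:
        return "cursor"
    return "unknown"


def attribution_headers(token: str, version: str, install_path: str) -> Dict[str, str]:
    """Build the headers sent with every request."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "janitor-ai-image-to-video",
        "X-Skill-Version": version,  # taken from the frontmatter version field
        "X-Skill-Platform": detect_platform(install_path),
    }
```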

API base: https://mega-api-prod.nemovideo.ai

Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.

Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}

Credits: GET /api/credits/balance/simple — returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
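
The export step could be sketched as a payload builder plus a polling loop. Everything named here is hypothetical: the caller supplies fetch_status (for example, a function that performs the GET and returns parsed JSON), the timestamp in the render id is assumed to be Unix seconds, and max_polls is an arbitrary safety limit. Failure statuses are not documented in the source, so the sketch only recognises "completed".

```python
import time
from typing import Any, Callable, Dict, Optional


def build_export_body(session_id: str, draft: Any, now: Optional[float] = None) -> Dict[str, Any]:
    """Build the JSON body for POST /api/render/proxy/lambda."""
    ts = int(now if now is not None else time.time())  # assumption: seconds
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_until_completed(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 30.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status every interval_s seconds; return output.url when done."""
    for attempt in range(max_polls):
        job = fetch_status()
        if job.get("status") == "completed":
            return job["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"render not completed after {max_polls} polls")
```

Injecting sleep and fetch_status keeps the loop testable without network access or real waiting.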

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

Error Codes

  • 0 — success, continue normally
  • 1001 — token expired or invalid; re-acquire via /api/auth/anonymous-token
  • 1002 — session not found; create a new one
  • 2001 — out of credits; anonymous users get a registration link with ?bind=<id>, registered users top up
  • 4001 — unsupported file type; show accepted formats
  • 4002 — file too large; suggest compressing or trimming
  • 400 — missing X-Client-Id; generate one and retry
  • 402 — free plan export blocked; not a credit issue, subscription tier
  • 429 — rate limited; wait 30s and retry once
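
One way to keep these codes in a single place is a lookup table, sketched below. The action labels and function names are hypothetical, and the table mixes the application codes with what appear to be HTTP status codes (400, 402, 429) exactly as the list above does.

```python
from typing import Dict

# Code -> next step, following the list above; labels are illustrative only.
ERROR_ACTIONS: Dict[int, str] = {
    0: "continue",
    1001: "reacquire_token",
    1002: "create_session",
    2001: "prompt_for_credits",
    4001: "show_accepted_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "explain_subscription_tier",
    429: "wait_and_retry_once",
}

# Seconds to wait before the single retry allowed for rate limiting.
RETRY_DELAY_S: Dict[int, int] = {429: 30}


def next_step(code: int) -> str:
    """Return the handling step for a code, or a fallback for unknown codes."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")


def retry_delay(code: int) -> int:
    """Return the wait in seconds before retrying, or 0 if no wait applies."""
    return RETRY_DELAY_S.get(code, 0)
```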

Translating GUI Instructions

The backend responds as if there's a visual interface. Map its instructions to API calls:

  • "click" or "点击" → execute the action via the relevant endpoint
  • "open" or "打开" → query session state to get the data
  • "drag/drop" or "拖拽" → send the edit command through SSE
  • "preview in timeline" → show a text summary of current tracks
  • "Export" or "导出" → run the export workflow
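
The mapping above can be expressed as a phrase scan over the backend's text. This is only a sketch: the action labels are hypothetical, matching is by substring, and an instruction that mentions several GUI verbs returns every matching action in table order.

```python
from typing import List, Tuple

# GUI phrases from the list above mapped to hypothetical action labels.
GUI_HINTS: List[Tuple[Tuple[str, ...], str]] = [
    (("click", "点击"), "call_endpoint"),
    (("open", "打开"), "query_state"),
    (("drag", "drop", "拖拽"), "send_sse_edit"),
    (("preview in timeline",), "summarize_tracks"),
    (("export", "导出"), "run_export"),
]


def translate_gui(instruction: str) -> List[str]:
    """Return the API-side actions implied by a GUI-style instruction."""
    text = instruction.lower()
    return [
        action
        for phrases, action in GUI_HINTS
        if any(phrase in text for phrase in phrases)
    ]
```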

Reading the SSE Stream

Text events go straight to the user (after GUI translation). Tool calls stay internal. Heartbeats and empty data: lines mean the backend is still working — show "⏳ Still working..." every 2 minutes.

About 30% of edit operations close the stream without any text. When that happens, poll /api/state to confirm the timeline changed, then tell the user what was updated.
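
A rough sketch of the stream handling described above. The event payload format is not documented here, so the caller supplies extract_text, a hypothetical hook that returns the user-facing text of a payload or None for tool calls. Treating SSE comment lines (those starting with ":") as heartbeats is also an assumption.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def parse_sse_line(line: str) -> Tuple[str, str]:
    """Classify one raw stream line as data, heartbeat, or other."""
    line = line.rstrip("\r\n")
    if line.startswith("data:"):
        payload = line[len("data:"):].strip()
        # An empty data: line means the backend is still working.
        return ("data", payload) if payload else ("heartbeat", "")
    if line.startswith(":"):
        return ("heartbeat", "")
    return ("other", line)


def collect_text(
    lines: Iterable[str],
    extract_text: Callable[[str], Optional[str]],
) -> Tuple[List[str], bool]:
    """Gather user-facing text; the flag is True when state must be polled."""
    texts: List[str] = []
    for line in lines:
        kind, payload = parse_sse_line(line)
        if kind != "data":
            continue
        text = extract_text(payload)
        if text:
            texts.append(text)
    # No text before the stream closed: fall back to polling /api/state.
    return texts, not texts
```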

Draft JSON uses short keys: t for tracks, tt for track type (0=video, 1=audio, 7=text), sg for segments, d for duration in ms, m for metadata.

Example timeline summary:

Timeline (3 tracks):
  1. Video: city timelapse (0-10s)
  2. BGM: Lo-fi (0-10s, 35%)
  3. Title: "Urban Dreams" (0-3s)
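
To illustrate the short keys, here is a sketch that turns a draft into a text summary. The nesting is an assumption (a top-level t list of tracks, each with tt and an sg list of segments carrying d); the source names the keys but not their exact layout, and the m metadata contents are not documented, so the sketch reports only track type and total duration.

```python
from typing import Any, Dict, List

# Track type codes from the draft format description.
TRACK_TYPES: Dict[int, str] = {0: "Video", 1: "Audio", 7: "Text"}


def summarize_draft(draft: Dict[str, Any]) -> str:
    """Render a short text summary of the tracks in a draft."""
    tracks: List[Dict[str, Any]] = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "Unknown")
        # d is a duration in milliseconds; sum it across the track's segments.
        total_ms = sum(segment.get("d", 0) for segment in track.get("sg", []))
        lines.append(f"  {index}. {kind}: {total_ms / 1000:g}s")
    return "\n".join(lines)
```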

Tips and Tricks

The backend processes faster when you're specific. Instead of "make it look better", try "animate this character image into a short video clip with motion" — concrete instructions get better results.

Max file size is 200MB. Stick to JPG, PNG, WEBP, GIF for the smoothest experience.
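
A local pre-check along these lines can catch problems before uploading. It is a sketch only: the function name is hypothetical, 200MB is assumed to mean 200 × 1024 × 1024 bytes, and the returned values merely mirror the documented 4001 and 4002 codes; the server remains the authority.

```python
import os

# Extensions from the supported formats list.
SUPPORTED_EXTENSIONS = {
    "mp4", "mov", "avi", "webm", "mkv",
    "jpg", "png", "gif", "webp",
    "mp3", "wav", "m4a", "aac",
}
MAX_BYTES = 200 * 1024 * 1024  # assumption: binary megabytes


def check_upload(filename: str, size_bytes: int) -> int:
    """Return 0 if the file looks acceptable, else 4001 or 4002."""
    extension = os.path.splitext(filename)[1].lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        return 4001  # unsupported file type
    if size_bytes > MAX_BYTES:
        return 4002  # file too large
    return 0
```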

Export as MP4 for widest compatibility.

Common Workflows

Quick edit: Upload → "animate this character image into a short video clip with motion" → Download MP4. Takes 30-90 seconds for a 30-second clip.

Batch style: Upload multiple files in one session. Process them one by one with different instructions. Each gets its own render.

Iterative: Start with a rough cut, preview the result, then refine. The session keeps your timeline state so you can keep tweaking.

Usage Guidance
This skill appears to do what it says: upload images to a remote API and return rendered video URLs. Before installing or using it, consider: (1) only supply a NEMO_TOKEN if you trust the nemovideo.ai service; any token in your environment could be used to run export/upload requests. (2) Avoid uploading private or sensitive images you wouldn't want sent to a third-party GPU render service. (3) The skill will generate an anonymous token and make network calls if NEMO_TOKEN is absent — expect traffic to mega-api-prod.nemovideo.ai. (4) The metadata mentions a config path (~/.config/nemovideo/) though the runtime steps don't require it; that is minor but you may want to verify the skill won't access other local config files before granting broader file access. If you need stronger assurance, ask the publisher for source code or an official homepage and confirm the API host is legitimate.
Capability Analysis
Type: OpenClaw Skill
Name: janitor-ai-image-to-video
Version: 1.0.0
The skill bundle provides a functional interface for converting images to videos via the nemovideo.ai API. It contains detailed instructions for the AI agent to manage sessions, handle file uploads, and process server-sent events (SSE) for video generation. The code and instructions (SKILL.md) are consistent with the stated purpose, and there are no indicators of data exfiltration, malicious execution, or unauthorized access to sensitive system files.
Capability Assessment
Purpose & Capability
The name/description (image→video rendering) aligns with the required env var (NEMO_TOKEN) and the documented API endpoints on mega-api-prod.nemovideo.ai. The metadata's configPaths entry (~/.config/nemovideo/) is present but not used in the SKILL.md instructions — a minor inconsistency but not disproportional to the stated purpose.
Instruction Scope
SKILL.md gives explicit API calls for session creation, SSE-based generation, uploads, status, credits, and export. These network calls and local file uploads (multipart file posts) are expected for the stated service. Note: the instructions will cause the agent to read user-supplied local files for upload and to generate/retain a short-lived session token; this is normal for an upload/remote-render flow but worth user attention.
Install Mechanism
No install spec and no code files — instruction-only. That minimizes on-disk risk; nothing is fetched or executed by an installer.
Credentials
Only a single credential is required (NEMO_TOKEN) and the SKILL.md provides a clear anonymous-token fallback flow if the env var is absent. There are no unrelated credentials or broad system secrets requested.
Persistence & Privilege
The skill does not request 'always: true' or other elevated persistent privileges, nor does it instruct modification of other skills or global agent settings. Session tokens are used for the service and are ephemeral per the instructions.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install janitor-ai-image-to-video
  3. After installation, invoke the skill by name or use /janitor-ai-image-to-video
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Janitor AI Image to Video v1.0.0 — Initial Release
  • Instantly transforms static images (JPG, PNG, WEBP, GIF up to 200MB) into animated video clips using AI, with no editing software required.
  • Supports simple, conversational instructions to generate and edit video clips with motion, text overlays, and audio tracks.
  • Cloud-based rendering pipeline delivers 1080p MP4 videos in about 30–90 seconds; all processing is remote with no local installation needed.
  • Automatic session and token management, with a free starter quota for new users.
  • Provides real-time status updates, credits checking, and easy export/download of finished videos.
Metadata
Slug janitor-ai-image-to-video
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Janitor Ai Image To Video?

Skip the learning curve of professional editing software. Describe what you want — animate this character image into a short video clip with motion — and get... It is an AI Agent Skill for Claude Code / OpenClaw, with 75 downloads so far.

How do I install Janitor Ai Image To Video?

Run "/install janitor-ai-image-to-video" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Janitor Ai Image To Video free?

Yes, Janitor Ai Image To Video is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Janitor Ai Image To Video support?

Janitor Ai Image To Video is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Janitor Ai Image To Video?

It is built and maintained by francemichaell-15 (@francemichaell-15); the current version is v1.0.0.
