Description

Skip the learning curve of professional editing software. Describe what you want — add an AI voiceover narrating each slide in English — and get narrated vid...

README (SKILL.md)

Getting Started

Send me your images or slides and I'll handle the AI voiceover generation. Or just describe what you're after.

Try saying:

"turn a Canva-exported slide deck (MP4 or images) into a 1080p MP4"
"add an AI voiceover narrating each slide in English"
"add AI voiceover to a Canva presentation for marketers, educators, or content creators"

Quick Start Setup

This skill connects to a cloud processing backend. On first use, set up the connection automatically and let the user know ("Connecting...").

Token check: Look for NEMO_TOKEN in the environment. If found, skip to session creation. Otherwise:

1. Generate a UUID as client identifier
2. POST https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with X-Client-Id header
3. Extract data.token from the response — this is your NEMO_TOKEN (100 free credits, 7-day expiry)
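The token check and the three steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the skill's actual implementation. The helper names (`build_anonymous_token_request`, `extract_token`, `resolve_token`, `fetch_anonymous_token`) are hypothetical, and sending the POST with an empty body is an assumption, since the source does not describe a request body.

```python
import json
import urllib.request
import uuid
from typing import Callable, Mapping

API_BASE = "https://mega-api-prod.nemovideo.ai"


def build_anonymous_token_request(client_id: str) -> urllib.request.Request:
    """Build (without sending) the POST that asks for an anonymous token."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def extract_token(response_body: str) -> str:
    """Read data.token from the JSON response body."""
    return json.loads(response_body)["data"]["token"]


def resolve_token(env: Mapping[str, str], fetch_anonymous: Callable[[], str]) -> str:
    """Prefer NEMO_TOKEN from the environment; otherwise mint an anonymous one."""
    token = env.get("NEMO_TOKEN")
    return token if token else fetch_anonymous()


def fetch_anonymous_token() -> str:
    """Hypothetical network step: send the request and parse the reply."""
    request = build_anonymous_token_request(str(uuid.uuid4()))
    with urllib.request.urlopen(request) as response:
        return extract_token(response.read().decode("utf-8"))


# Typical use (after `import os`):
#   token = resolve_token(os.environ, fetch_anonymous_token)
```

Passing the fetcher in as a callable keeps the environment check separate from the network call, so the anonymous endpoint is only contacted when no token is present.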

Session: POST https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Keep the returned session_id for all operations.

Let the user know with a brief "Ready!" when setup is complete. Don't expose tokens or raw API output.

AI Voiceover for Canva — Add AI Voice to Videos

Name: Ai Voiceover Canva
Author: dsewell-583h0

Send me your images or slides and describe the result you want. The AI voiceover generation runs on remote GPU nodes — nothing to install on your machine.

A quick example: upload a Canva-exported slide deck as MP4 or images, type "add an AI voiceover narrating each slide in English", and you'll get a 1080p MP4 back in roughly 30-60 seconds. All rendering happens server-side.

Worth noting: export your Canva design as MP4 first, then upload for the cleanest voiceover sync.

Matching Input to Actions

User prompts referencing ai voiceover canva, aspect ratio, text overlays, or audio tracks get routed to the corresponding action via keyword and intent classification.

User says...	Action	Skip SSE?
"export" / "导出" / "download" / "send me the video"	→ §3.5 Export	✅
"credits" / "积分" / "balance" / "余额"	→ §3.3 Credits	✅
"status" / "状态" / "show tracks"	→ §3.4 State	✅
"upload" / "上传" / user sends file	→ §3.2 Upload	✅
Everything else (generate, edit, add BGM…)	→ §3.1 SSE	❌
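One way to picture the routing table above is a plain keyword lookup. The sketch below is an assumption: the source mentions "keyword and intent classification" without describing the classifier, so simple substring matching stands in for it here, and the `route` function and action names are hypothetical.

```python
from typing import List, Tuple

# Keyword groups copied from the routing table; order decides which wins.
ROUTES: List[Tuple[Tuple[str, ...], str]] = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message: str, has_file: bool = False) -> Tuple[str, bool]:
    """Return (action, skip_sse) for a user message."""
    if has_file:
        # "user sends file" goes straight to upload.
        return "upload", True
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action, True
    # Everything else (generate, edit, add BGM...) goes through SSE.
    return "sse", False
```

Substring matching is deliberately naive: a message such as "don't export yet" would still route to export, which is why a real router would likely add intent classification on top.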

Cloud Render Pipeline Details

Each export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.

All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.

Three attribution headers are required on every request and must match this file's frontmatter:

Header	Value
`X-Skill-Source`	`ai-voiceover-canva`
`X-Skill-Version`	frontmatter `version`
`X-Skill-Platform`	auto-detect: `clawhub` / `cursor` / `unknown` from install path
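The header table above could be assembled roughly like this. The sketch assumes the platform can be detected by looking for the platform name inside the install path; the source only says "auto-detect from install path", so `detect_platform` and its matching rule are hypothetical.

```python
from typing import Dict


def detect_platform(install_path: str) -> str:
    """Guess the platform from the install path (assumed substring rule)."""
    lowered = install_path.lower()
    for name in ("clawhub", "cursor"):
        if name in lowered:
            return name
    return "unknown"


def build_headers(token: str, version: str, install_path: str) -> Dict[str, str]:
    """Return the auth header plus the three attribution headers."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "ai-voiceover-canva",
        "X-Skill-Version": version,  # taken from the frontmatter `version`
        "X-Skill-Platform": detect_platform(install_path),
    }
```

For example, `build_headers(token, "1.0.0", "/home/me/.cursor/skills/ai-voiceover-canva")` would report `cursor` as the platform under this assumed rule.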

API base: https://mega-api-prod.nemovideo.ai

Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
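As an illustration of the message call described above, the request could be built as shown below. The `build_sse_request` helper is hypothetical, and the `Content-Type: application/json` header is an assumption based on the JSON body; the source names only the `Accept` header.

```python
import json
import urllib.request
from typing import Dict

API_BASE = "https://mega-api-prod.nemovideo.ai"
SSE_TIMEOUT_SECONDS = 15 * 60  # "Max timeout: 15 minutes"


def build_sse_request(
    session_id: str, text: str, headers: Dict[str, str]
) -> urllib.request.Request:
    """Build (without sending) the POST /run_sse request for one user message."""
    body = {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": text}]},
    }
    return urllib.request.Request(
        f"{API_BASE}/run_sse",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            **headers,  # auth and attribution headers supplied by the caller
            "Accept": "text/event-stream",
            "Content-Type": "application/json",  # assumption, see lead-in
        },
    )


# Sending would look like:
#   urllib.request.urlopen(request, timeout=SSE_TIMEOUT_SECONDS)
```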

Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}

Credits: GET /api/credits/balance/simple — returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
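The export-then-poll flow above might look like the sketch below. Several details are assumptions: that `<ts>` is a Unix timestamp in seconds, that `status` and `output.url` sit at the top level of the poll response, and the `max_polls` cap, which the source does not mention. The status fetcher is injected so the loop can be exercised without a network.

```python
import time
from typing import Any, Callable, Dict, Optional


def build_export_body(
    session_id: str, draft: Dict[str, Any], timestamp: Optional[int] = None
) -> Dict[str, Any]:
    """Build the JSON body for POST /api/render/proxy/lambda."""
    ts = int(time.time()) if timestamp is None else timestamp
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def wait_for_render(
    fetch_status: Callable[[], Dict[str, Any]],
    interval: float = 30.0,
    max_polls: int = 20,  # hypothetical safety cap
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until status is "completed", then return the download URL."""
    for _ in range(max_polls):
        job = fetch_status()
        if job.get("status") == "completed":
            return job["output"]["url"]
        sleep(interval)
    raise TimeoutError("render did not complete within the polling budget")
```

The source does not list failure statuses, so this loop only distinguishes "completed" from everything else; a fuller version would stop early on an explicit error status.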

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

Error Codes

0 — success, continue normally
1001 — token expired or invalid; re-acquire via /api/auth/anonymous-token
1002 — session not found; create a new one
2001 — out of credits; anonymous users get a registration link with ?bind=<id>, registered users top up
4001 — unsupported file type; show accepted formats
4002 — file too large; suggest compressing or trimming
400 — missing X-Client-Id; generate one and retry
402 — free plan export blocked; not a credit issue, subscription tier
429 — rate limited; wait 30s and retry once
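The code list above maps naturally onto a lookup table. In the sketch below the action names are hypothetical labels, and keeping application codes (1001, 2001, ...) and HTTP-style codes (400, 402, 429) in one dictionary assumes the two ranges never collide.

```python
from typing import Dict

# Hypothetical action labels, one per code from the list above.
ERROR_ACTIONS: Dict[int, str] = {
    0: "continue",
    1001: "reacquire_token",
    1002: "create_session",
    2001: "out_of_credits",
    4001: "show_accepted_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "explain_subscription_tier",
    429: "wait_30s_and_retry_once",
}


def action_for(code: int) -> str:
    """Return the handling action for a code, or a fallback for unknown ones."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")
```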

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

Backend says	You do
"click [button]" / "点击"	Execute via API
"open [panel]" / "打开"	Query session state
"drag/drop" / "拖拽"	Send edit via SSE
"preview in timeline"	Show track summary
"Export button" / "导出"	Execute export workflow

SSE Event Handling

Event	Action
Text response	Apply GUI translation (§4), present to user
Tool call/result	Process internally, don't forward
`heartbeat` / empty `data:`	Keep waiting. Every 2 min: "⏳ Still working..."
Stream closes	Process final response

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
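A rough sketch of the stream handling above follows. The event payload shape is an assumption: the source does not document it, so this example guesses that each `data:` line carries JSON with text under `content.parts[].text`, and that tool calls appear as parts without a `text` key. The function names are hypothetical.

```python
import json
from typing import Iterable, List


def collect_text(lines: Iterable[str]) -> List[str]:
    """Gather user-facing text from raw SSE lines, skipping everything else."""
    texts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # event:, id:, comments, and blank separators
        payload = line[len("data:"):].strip()
        if not payload:
            continue  # empty data: line, treated as a keep-alive
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # non-JSON payloads (e.g. a bare heartbeat marker)
        if not isinstance(event, dict):
            continue
        parts = (event.get("content") or {}).get("parts") or []
        for part in parts:
            # Assumed shape: tool call/result parts carry no "text" key.
            if isinstance(part, dict) and part.get("text"):
                texts.append(part["text"])
    return texts


def needs_state_poll(texts: List[str]) -> bool:
    """True when the stream closed without text, so state must be polled."""
    return not texts
```

When `needs_state_poll` returns true, the next step would be the session state call, followed by a summary of what changed.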

Draft JSON uses short keys: t for tracks, tt for track type (0=video, 1=audio, 7=text), sg for segments, d for duration in ms, m for metadata.

Example timeline summary:

Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)
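To show how the short draft keys could feed a summary like the one above, here is a minimal sketch. The nesting is an assumption: the source lists the keys but not the layout, so this example guesses that `t` is a list of tracks, each with a `tt` type and an `sg` list of segments that each carry `d` in milliseconds. It prints a simpler summary than the example, since names and volume levels live in fields the source does not describe.

```python
from typing import Any, Dict, List

TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize(draft: Dict[str, Any]) -> List[str]:
    """Return one line per track with its type and total segment duration."""
    lines: List[str] = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        # Assumes segments do not overlap, so durations can simply be summed.
        total_ms = sum(segment.get("d", 0) for segment in track.get("sg", []))
        lines.append(f"{index}. {kind}: {total_ms / 1000:g}s")
    return lines
```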

Tips and Tricks

The backend processes faster when you're specific. Instead of "make it look better", try "add an AI voiceover narrating each slide in English" — concrete instructions get better results.

Max file size is 200MB. Stick to MP4, PNG, JPG, PDF for the smoothest experience.
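A pre-upload check along these lines could catch problems before any bytes leave the machine. This is a sketch under stated assumptions: "200MB" is read as 200 × 1024 × 1024 bytes, and the `check_upload` helper and its messages are hypothetical.

```python
from pathlib import Path
from typing import List

MAX_BYTES = 200 * 1024 * 1024  # assumption: "200MB" interpreted as MiB
RECOMMENDED_SUFFIXES = {".mp4", ".png", ".jpg", ".pdf"}


def check_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file looks fine."""
    problems: List[str] = []
    if size_bytes > MAX_BYTES:
        problems.append("file too large")
    if Path(filename).suffix.lower() not in RECOMMENDED_SUFFIXES:
        problems.append("format not in the recommended set")
    return problems
```

A file outside the recommended set may still be accepted by the backend, so the second message is better treated as a warning than a hard stop.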

Export your Canva project as MP4 before uploading for best audio-video alignment.

Common Workflows

Quick edit: Upload → "add an AI voiceover narrating each slide in English" → Download MP4. Takes 30-60 seconds for a 30-second clip.

Batch style: Upload multiple files in one session. Process them one by one with different instructions. Each gets its own render.

Iterative: Start with a rough cut, preview the result, then refine. The session keeps your timeline state so you can keep tweaking.

Usage Guidance

Things to consider before installing or using this skill:

- Data privacy: the skill uploads any media you give it to https://mega-api-prod.nemovideo.ai (a domain with no homepage in the registry entry). Do not upload sensitive or private material unless you trust that endpoint and its privacy policy.
- Credentials: the registry declares NEMO_TOKEN as required, but the runtime instructions will obtain an anonymous token if none is present. Decide whether you want the skill to use an account token you control (safer) or let it mint ephemeral anonymous tokens.
- Config access inconsistency: SKILL.md frontmatter references a local config path (~/.config/nemovideo/) but registry metadata did not list config paths — ask the publisher whether the skill will read/write local config files.
- Verify provenance: this skill has no homepage and an unknown source owner. If you plan to use it regularly or with valuable content, ask for documentation, a project homepage, or a privacy/terms link and verify the domain and operator.
- If you need stronger guarantees: prefer a skill from a verifiable publisher, or use a local/offline tool so your files never leave your environment. If you proceed, consider providing your own NEMO_TOKEN tied to an account with limited scope, and avoid uploading sensitive material until you confirm policies and ownership.

Capability Analysis

Type: OpenClaw Skill
Name: ai-voiceover-canva
Version: 1.0.0

The skill provides a functional integration for the NemoVideo AI service, allowing an AI agent to automate voiceover generation for Canva videos. It outlines standard API interactions with 'mega-api-prod.nemovideo.ai', including session management, file uploads, and polling for render results. The instructions include security-conscious directives for the agent, such as not exposing tokens, and the requested permissions (NEMO_TOKEN and ~/.config/nemovideo/) are consistent with the skill's stated purpose.

Capability Assessment

ℹ Purpose & Capability

The declared purpose (add AI voiceovers to uploaded videos/images) matches the runtime instructions which call a remote rendering API and accept uploads. That functionality justifies network calls, uploads, and session tokens.

⚠ Instruction Scope

SKILL.md instructs the agent to look for NEMO_TOKEN, create an anonymous token via https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token if missing, keep a session_id, and upload user files (multipart or URL). Uploading user media and returning URLs is expected, but the instructions also reference a config path in the skill frontmatter (~/.config/nemovideo/) and require building attribution headers and detecting the install path for X-Skill-Platform — these items expand scope beyond the bare minimum and conflict with registry metadata.

✓ Install Mechanism

This is an instruction-only skill with no install spec or code files, so nothing is written to disk by an installer. That lowers install-time risk.

⚠ Credentials

The skill declares a single required env var (NEMO_TOKEN / primaryEnv) which is appropriate for a remote API. However, SKILL.md explicitly describes generating an anonymous token if NEMO_TOKEN is absent, so marking NEMO_TOKEN as required is inconsistent. The frontmatter also mentions a config path (~/.config/nemovideo/) which suggests the agent might read or expect local config — this is not reflected in the registry metadata. These mismatches make it unclear whether the skill needs pre-provisioned credentials or will create/store tokens itself.

✓ Persistence & Privilege

always is false and there is no install script or config modification described. The skill can be invoked autonomously by the agent (normal for skills), but it does not request elevated persistent presence.

Version History

v1.0.0

ai-voiceover-canva 1.0.0 — Initial Release

- Instantly add English AI voiceover narration to Canva videos, slides, or images—no editing skills needed.
- Handles MP4, PNG, JPG, PDF uploads up to 200MB and generates narrated 1080p MP4 videos in 30-60 seconds.
- Automatic cloud setup: connects seamlessly and manages tokens for free trial usage.
- Easily check credits, upload files, track project status, and export narrated videos via simple prompts.
- Designed for marketers, educators, and creators to quickly turn silent Canva content into narrated videos, all cloud-rendered, no install required.

Metadata

Slug ai-voiceover-canva

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Ai Voiceover Canva?

Skip the learning curve of professional editing software. Describe what you want — add an AI voiceover narrating each slide in English — and get narrated vid... It is an AI Agent Skill for Claude Code / OpenClaw, with 88 downloads so far.

How do I install Ai Voiceover Canva?

Run "/install ai-voiceover-canva" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Ai Voiceover Canva free?

The skill itself is free to download and install, and it is licensed under MIT-0. Usage of the cloud backend is metered in credits: an anonymous token comes with 100 free credits and expires after 7 days.

Which platforms does Ai Voiceover Canva support?

Ai Voiceover Canva is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Ai Voiceover Canva?

It is built and maintained by dsewell-583h0 (@dsewell-583h0); the current version is v1.0.0.
