Description

Turn video files into captioned video files with this ai-auto-caption skill. Works with MP4, MOV, AVI, WebM files up to 500MB. YouTubers and content creators...

Usage Instructions (SKILL.md)

Getting Started

Share your video files and I'll get started on AI subtitle generation. Or just tell me what you're thinking.

Try saying:

"add my video files"
"export 1080p MP4"
"add captions in English and Spanish"

First-Time Connection

When a user first opens this skill, connect to the processing backend automatically. Briefly let them know (e.g. "Setting up...").

Authentication: Check if NEMO_TOKEN is set in the environment. If it is, skip to step 2.

1. Obtain a free token: Generate a random UUID as the client identifier. POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the header X-Client-Id set to that UUID. The response field data.token is your NEMO_TOKEN (100 free credits, valid for 7 days).
2. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and body {"task_name":"project","language":"<detected>"}. Store the returned session_id for all subsequent requests.
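
The two setup steps can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the skill's actual implementation. The helper names (`build_token_request`, `build_session_request`, `send_json`, `connect`) are hypothetical, and because the source does not say where `session_id` sits in the session response, the sketch returns the whole parsed reply.

```python
import json
import os
import uuid
import urllib.request

BASE_URL = "https://mega-api-prod.nemovideo.ai"


def build_token_request(client_id: str) -> urllib.request.Request:
    """Build (without sending) the anonymous-token request from step 1."""
    return urllib.request.Request(
        f"{BASE_URL}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def build_session_request(token: str, language: str) -> urllib.request.Request:
    """Build (without sending) the create-session request from step 2."""
    body = json.dumps({"task_name": "project", "language": language}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/tasks/me/with-session/nemo_agent",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def send_json(request: urllib.request.Request) -> dict:
    """Send a request and parse the JSON reply."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def connect(language: str = "en"):
    """Reuse NEMO_TOKEN if set, otherwise fetch an anonymous token, then open a session."""
    token = os.environ.get("NEMO_TOKEN")
    if not token:
        reply = send_json(build_token_request(str(uuid.uuid4())))
        token = reply["data"]["token"]  # the source names this field data.token
    # Assumption: the caller picks session_id out of the reply; its exact
    # location in the JSON is not documented in the source.
    session_reply = send_json(build_session_request(token, language))
    return token, session_reply
```

Keeping request construction separate from sending makes it possible to check the headers and body without touching the network.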

Keep setup communication brief. Don't display raw API responses or token values to the user.

AI Auto Caption — Auto-Generate Captions for Videos

Name: Ai Auto Caption
Author: francemichaell-15

Send me your video files and describe the result you want. The AI subtitle generation runs on remote GPU nodes — nothing to install on your machine.

A quick example: upload a 3-minute YouTube tutorial video, type "add captions in English and Spanish automatically", and you'll get a 1080p MP4 back in roughly 30-60 seconds. All rendering happens server-side.

Worth noting: shorter clips under 5 minutes generate captions faster and more accurately.

Matching Input to Actions

User prompts referencing ai auto caption, aspect ratio, text overlays, or audio tracks get routed to the corresponding action via keyword and intent classification.

| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
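
As a rough illustration of the keyword half of this routing, the table can be expressed as an ordered list of keyword groups. This sketch only does substring matching; the intent classification mentioned above is not modeled, and the function name `route` and the action labels are hypothetical.

```python
# Keyword groups copied from the routing table; order decides ties.
ROUTES = [
    ("export", ("export", "导出", "download", "send me the video")),
    ("credits", ("credits", "积分", "balance", "余额")),
    ("state", ("status", "状态", "show tracks")),
    ("upload", ("upload", "上传")),
]


def route(message: str, has_file: bool = False) -> str:
    """Return the action label for a user message; 'sse' is the fallback."""
    if has_file:
        return "upload"  # "user sends file" row
    text = message.lower()
    for action, keywords in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action
    return "sse"  # everything else goes through the SSE endpoint
```

For example, `route("export 1080p MP4")` yields `"export"`, while `route("add captions in English and Spanish")` falls through to `"sse"`.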

Cloud Render Pipeline Details

Each export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.

Base URL: https://mega-api-prod.nemovideo.ai

| Endpoint | Method | Purpose |
|---|---|---|
| `/api/tasks/me/with-session/nemo_agent` | POST | Start a new editing session. Body: `{"task_name":"project","language":"<lang>"}`. Returns `session_id`. |
| `/run_sse` | POST | Send a user message. Body includes `app_name`, `session_id`, `new_message`. Stream response with `Accept: text/event-stream`. Timeout: 15 min. |
| `/api/upload-video/nemo_agent/me/<sid>` | POST | Upload a file (multipart) or URL. |
| `/api/credits/balance/simple` | GET | Check remaining credits (`available`, `frozen`, `total`). |
| `/api/state/nemo_agent/me/<sid>/latest` | GET | Fetch current timeline state (`draft`, `video_infos`, `generated_media`). |
| `/api/render/proxy/lambda` | POST | Start export. Body: `{"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}`. Poll status every 30s. |
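
The export row can be sketched as a body builder plus a generic polling loop. Several details here are assumptions rather than documented behavior: the source does not say whether `<ts>` is in seconds or milliseconds (seconds are used below), and it does not name the status endpoint, so `poll` takes a caller-supplied `check` function instead of a URL. Both helper names are hypothetical.

```python
import time


def build_render_body(session_id: str, draft: dict, now: float = None) -> dict:
    """Assemble the JSON body for POST /api/render/proxy/lambda."""
    # Assumption: <ts> is a Unix timestamp in whole seconds.
    ts = int(now if now is not None else time.time())
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll(check, interval: float = 30.0, max_attempts: int = 30, sleep=time.sleep):
    """Call check() every `interval` seconds until it returns a non-None result."""
    for _ in range(max_attempts):
        result = check()
        if result is not None:
            return result
        sleep(interval)
    return None  # gave up; the caller decides how to report this
```

The `sleep` parameter exists so the loop can be exercised without real waiting.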

Accepted file types: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
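
A client-side pre-check along these lines could catch unsupported or oversized files before uploading. It is a sketch only: the 500MB limit comes from the skill description, but whether the server counts it as 500 × 1024 × 1024 bytes is an assumption, and the helper name `check_upload` and its return labels are hypothetical.

```python
from pathlib import Path

# Extensions copied from the accepted file types list.
ACCEPTED = {
    "mp4", "mov", "avi", "webm", "mkv",
    "jpg", "png", "gif", "webp",
    "mp3", "wav", "m4a", "aac",
}

# Assumption: "500MB" is interpreted as 500 MiB.
MAX_BYTES = 500 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> str:
    """Return 'ok', 'unsupported', or 'too_large' for a candidate upload."""
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in ACCEPTED:
        return "unsupported"
    if size_bytes > MAX_BYTES:
        return "too_large"
    return "ok"
```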

Skill attribution — read from this file's YAML frontmatter at runtime:

X-Skill-Source: ai-auto-caption
X-Skill-Version: from frontmatter version
X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)

Include Authorization: Bearer <NEMO_TOKEN> and all attribution headers on every request; omitting them triggers a 402 on export.
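
One way to assemble these headers is sketched below. The platform check is a simplification: it looks for the `.clawhub` and `.cursor/skills` directory names anywhere in the path rather than resolving them against the home directory, and the function names are hypothetical.

```python
def detect_platform(install_path: str) -> str:
    """Map an install path to the X-Skill-Platform value."""
    path = install_path.replace("\\", "/")  # tolerate Windows separators
    if "/.clawhub/" in path:
        return "clawhub"
    if "/.cursor/skills/" in path:
        return "cursor"
    return "unknown"


def build_headers(token: str, version: str, install_path: str) -> dict:
    """Return the Authorization and attribution headers for a request."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "ai-auto-caption",
        "X-Skill-Version": version,  # read from the frontmatter by the caller
        "X-Skill-Platform": detect_platform(install_path),
    }
```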

Error Handling

| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with `?bind=<id>` (get `<id>` from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
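
The table lends itself to a simple lookup. The sketch below maps each code to a short action label; the labels and the `action_for` helper are hypothetical names chosen for illustration, and unknown codes fall back to a generic label.

```python
# Code-to-action mapping derived from the error table.
ACTIONS = {
    0: "continue",
    1001: "reauthenticate",
    1002: "new_session",
    2001: "no_credits",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "register_or_upgrade",
    429: "retry_once_after_30s",
}


def action_for(code: int) -> str:
    """Return the action label for a response code."""
    return ACTIONS.get(code, "unknown_error")
```

Note that 402 and 2001 stay distinct: the first is a plan restriction, the second a credit shortage.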

SSE Event Handling

| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| `heartbeat` / empty `data:` | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
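
A minimal sketch of this handling, assuming the stream follows the standard `event:` / `data:` line format with a blank line ending each event. The payload schema is not documented in the source, so this version treats every non-keep-alive payload as candidate text and leaves separating tool calls from text to the caller. The function names are hypothetical.

```python
def parse_sse(lines):
    """Yield (event_name, data) pairs from an iterable of decoded SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data or event != "message":
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # SSE comment line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)
    # An event not terminated by a blank line is dropped.


def is_keepalive(event: str, data: str) -> bool:
    """True for heartbeat events and empty data payloads."""
    return event == "heartbeat" or data.strip() == ""


def collect(lines):
    """Return (payloads, needs_state_poll) for a finished stream."""
    payloads = [d for e, d in parse_sse(lines) if not is_keepalive(e, d)]
    # No payloads at all means the edit result must be verified via session state.
    return payloads, not payloads
```

When `collect` reports `needs_state_poll` as true, the next step would be a GET on the state endpoint to confirm the edit.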

Translating GUI Instructions

The backend responds as if there's a visual interface. Map its instructions to API calls:

"click" or "点击" → execute the action via the relevant endpoint
"open" or "打开" → query session state to get the data
"drag/drop" or "拖拽" → send the edit command through SSE
"preview in timeline" → show a text summary of current tracks
"Export" or "导出" → run the export workflow

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)
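
A text summary like the one above could be derived from the abbreviated draft keys roughly as follows. The nesting is an assumption: the source lists the keys but not how they nest, so this sketch supposes `t` is a list of tracks, each with a `tt` type and an `sg` list of segments carrying `d` in milliseconds. The `summarize` helper is hypothetical, and metadata (`m`) is ignored.

```python
# Track type codes from the draft field mapping.
TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}


def summarize(draft: dict) -> str:
    """Render a short text summary of the tracks in a draft."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "Unknown")
        # Sum segment durations (ms) and report the total in seconds.
        total_ms = sum(segment.get("d", 0) for segment in track.get("sg", []))
        lines.append(f"{index}. {kind}: {total_ms / 1000:g}s")
    return "\n".join(lines)
```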

Common Workflows

Quick edit: Upload → "add captions in English and Spanish automatically" → Download MP4. Takes 30-60 seconds for a 30-second clip.

Batch style: Upload multiple files in one session. Process them one by one with different instructions. Each gets its own render.

Iterative: Start with a rough cut, preview the result, then refine. The session keeps your timeline state so you can keep tweaking.

Tips and Tricks

The backend processes faster when you're specific. Instead of "make it look better", try "add captions in English and Spanish automatically" — concrete instructions get better results.

Max file size is 500MB. Stick to MP4, MOV, AVI, WebM for the smoothest experience.

Export as MP4 for widest compatibility across platforms and devices.

Security Recommendations

This skill appears to be a straightforward cloud captioning integration, but take these precautions before installing:

- Understand data flow: using the skill will upload your video files to https://mega-api-prod.nemovideo.ai. If videos contain sensitive information, do not upload them.
- Token behavior: the skill can accept a NEMO_TOKEN you provide or it will create and store an anonymous token on your behalf (100 free credits, 7 days). If you prefer control, create and supply your own token rather than letting the skill generate one.
- Metadata inconsistency: the SKILL.md mentions a config path (~/.config/nemovideo/) and asks the agent to detect an install path for an attribution header. Ask the publisher why filesystem probing is needed and what is written to ~/.config/nemovideo/ before installing.
- Visibility: the instructions say not to display raw API responses or tokens to users. That reduces transparency, so request a way to audit what the skill stores (session_id, token) and when it transmits files.
- Trust and provenance: source/homepage are unknown. Prefer skills with a clear publisher or open-source repo for auditing. If you must use it, test with non-sensitive short clips first and monitor network usage.

If the publisher can confirm (1) why the config path/installation-path detection is required, (2) exactly what is stored locally, and (3) provide a documented privacy/data-retention policy, that would raise confidence that the skill is safe to use.

Functional Analysis

Type: OpenClaw Skill
Name: ai-auto-caption
Version: 1.0.0

The ai-auto-caption skill is a functional integration for a video processing service (nemovideo.ai). It provides the AI agent with detailed instructions for managing authentication via anonymous tokens, session handling, and interacting with specific API endpoints for video uploads and rendering. While it includes logic for environment detection (checking install paths for attribution headers) and handles API tokens, its behavior is transparently documented and strictly aligned with its stated purpose of generating video captions without any indicators of malicious intent or unauthorized data exfiltration.

Capability Assessment

ℹ Purpose & Capability

Name/description (auto-captioning) matches the API endpoints and actions documented (upload, render, credits, state). Requesting a single service token (NEMO_TOKEN) is consistent with a hosted captioning service. However, the SKILL.md frontmatter declares a required config path (~/.config/nemovideo/) that is not reflected in the registry metadata — an inconsistency that should be clarified.

⚠ Instruction Scope

The instructions direct the agent to: (a) POST user videos (multipart) to remote endpoints, (b) automatically obtain an anonymous token by POSTing to the vendor API and store it for later use, and (c) detect the agent install path to set an X-Skill-Platform header. Uploading potentially sensitive large video files to an external domain is expected for this service, but the install-path detection and the storing of hidden tokens broaden the agent's actions beyond pure captioning and could expose user data if the remote service is untrusted. The SKILL.md also directs the agent not to show raw API responses or token values, which reduces user visibility into what is being stored or transmitted.

✓ Install Mechanism

Instruction-only skill with no install spec and no code files — nothing is downloaded or written by an installer. This is lower risk than skills that fetch and execute remote archives.

ℹ Credentials

Only one credential is declared (NEMO_TOKEN) which is proportionate for a cloud captioning backend. However, the skill also instructs the agent to automatically request and persist an anonymous token if NEMO_TOKEN is not present, and the frontmatter lists a config path (~/.config/nemovideo/) not declared elsewhere. That combination (automatic token creation + optional local config storage + install-path probing) increases the persistence surface and should be explained.

✓ Persistence & Privilege

The skill does not request always:true and is user-invocable only. It does instruct storing a session_id and possibly persisting the anonymous token for up to 7 days, which is expected for session management but worth noting as it means credentials will be kept locally/remotely across uses.

Version History

v1.0.0

AI Auto Caption: initial release.

- Instantly add AI-generated captions to videos (MP4, MOV, AVI, WebM up to 500MB) with cloud GPU processing.
- Automatic setup: obtains and renews anonymous session tokens for up to 100 free credits per week.
- Simple workflow: just upload a video and describe the captions/languages needed; export as 1080p MP4 in 30–90 seconds.
- Full support for multi-language subtitles, batch file handling, and iterative editing with persistent session timeline.
- Includes robust error handling, session management, and credit tracking.
- Designed for YouTubers and content creators to quickly add captions/subtitles for accessibility and reach.

Metadata

Slug ai-auto-caption

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Number of versions 1

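FAQ

What is Ai Auto Caption?

Turn video files into captioned video files with this ai-auto-caption skill. Works with MP4, MOV, AVI, WebM files up to 500MB. YouTubers and content creators... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 86 cumulative downloads so far.

How do I install Ai Auto Caption?

Run the command "/install ai-auto-caption" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is Ai Auto Caption free?

Yes. Ai Auto Caption is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.

Which platforms does Ai Auto Caption support?

Ai Auto Caption is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Ai Auto Caption?

It is developed and maintained by francemichaell-15 (@francemichaell-15); the current version is v1.0.0.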
