francemichaell-15

Best Auto Caption

by francemichaell-15 · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security Clean
98 downloads · 0 stars · 0 active installs · 1 version
Install in OpenClaw
/install best-auto-caption
Description
YouTubers, TikTok creators, and content marketers use this skill to turn video clips into captioned video files. Accepts MP4, MOV, AVI, WebM up to 500MB, renders o...
README (SKILL.md)

Getting Started

Share your video clips and I'll get started on automatic subtitle generation. Or just tell me what you're thinking.

Try saying:

  • "add my video clips"
  • "export 1080p MP4"
  • "auto-generate captions in English and sync"

Getting Connected

Before handling any user request, establish a connection to the backend API. Show a brief status like "Connecting...".

If NEMO_TOKEN is in the environment, use it directly and create a session. Otherwise, acquire a free starter token:

  • Generate a UUID as client identifier
  • POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the X-Client-Id header
  • The response includes a token with 100 free credits valid for 7 days — use it as NEMO_TOKEN

Then create a session by POSTing to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer authorization and body {"task_name":"project","language":"en"}. The session_id in the response is needed for all following requests.

Tell the user you're ready. Keep the technical details out of the chat.
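
The connection steps above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the `send` transport callable is hypothetical, and the flat response shapes (`token` and `session_id` at the top level) are assumptions, since the exact response envelope is not documented here. The attribution headers required on every request are passed in through `extra_headers`.

```python
import json
import os
import uuid
import urllib.request

API_BASE = "https://mega-api-prod.nemovideo.ai"


def build_anonymous_token_request(client_id):
    """POST /api/auth/anonymous-token with the X-Client-Id header."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        data=b"",
        headers={"X-Client-Id": client_id},
        method="POST",
    )


def build_session_request(token, language="en", extra_headers=None):
    """POST the session-creation body with Bearer authorization."""
    body = json.dumps({"task_name": "project", "language": language})
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    headers.update(extra_headers or {})
    return urllib.request.Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=body.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def connect(send, env=os.environ):
    """Reuse NEMO_TOKEN if set, otherwise fetch a starter token, then open a session.

    `send` is a hypothetical callable: it takes a Request and returns the
    decoded JSON response as a dict.
    """
    token = env.get("NEMO_TOKEN")
    if not token:
        client_id = str(uuid.uuid4())
        # Assumption: the token is returned under a top-level "token" key.
        token = send(build_anonymous_token_request(client_id))["token"]
    session = send(build_session_request(token))
    # Assumption: session_id is a top-level key in the response.
    return {"token": token, "session_id": session["session_id"]}
```

Injecting `send` keeps the request construction separate from the network call, so the flow can be exercised with a stub before pointing it at the real service.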

Best Auto Caption — Auto-Generate Synced Video Captions

Send me your video clips and describe the result you want. The automatic subtitle generation runs on remote GPU nodes — nothing to install on your machine.

A quick example: upload a 3-minute YouTube tutorial recording, type "auto-generate captions in English and sync them to the video", and you'll get a 1080p MP4 back in roughly 30-60 seconds. All rendering happens server-side.

Worth noting: clear audio with minimal background noise produces the most accurate auto-captions.

Matching Input to Actions

User prompts referencing best auto caption, aspect ratio, text overlays, or audio tracks get routed to the corresponding action via keyword and intent classification.

User says... Action Skip SSE?
"export" / "导出" / "download" / "send me the video" → §3.5 Export
"credits" / "积分" / "balance" / "余额" → §3.3 Credits
"status" / "状态" / "show tracks" → §3.4 State
"upload" / "上传" / user sends file → §3.2 Upload
Everything else (generate, edit, add BGM…) → §3.1 SSE
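
As a rough sketch of the keyword half of this routing, the table above could be expressed as a lookup. The action names and the `route` function are hypothetical, and the intent-classification step mentioned earlier is not modeled; anything that matches no keyword falls through to the SSE path.

```python
# Keyword groups in the same order as the routing table.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message, has_attachment=False):
    """Return the action name for a user message (hypothetical names)."""
    if has_attachment:
        # "user sends file" maps to the upload action.
        return "upload"
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action
    # Everything else (generate, edit, add BGM...) goes through SSE.
    return "sse"
```

For example, `route("export 1080p MP4")` yields `"export"`, while `route("auto-generate captions in English and sync")` yields `"sse"`.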

Cloud Render Pipeline Details

Each export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.

Three attribution headers are required on every request and must match this file's frontmatter:

Header → Value
X-Skill-Source → best-auto-caption
X-Skill-Version → frontmatter version
X-Skill-Platform → auto-detect: clawhub / cursor / unknown from install path

All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
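
A minimal sketch of assembling these headers is shown below. How the platform is detected from the install path is not specified in this file, so the substring check in `detect_platform` is an assumption, and both function names are hypothetical.

```python
def detect_platform(install_path):
    """Guess the platform from the install path (assumed: substring match)."""
    lowered = install_path.lower()
    for name in ("clawhub", "cursor"):
        if name in lowered:
            return name
    return "unknown"


def build_headers(token, skill_version, install_path):
    """Return the authorization and attribution headers for every request."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "best-auto-caption",
        # Must match the version declared in the frontmatter.
        "X-Skill-Version": skill_version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers in one place makes it harder to send a request that is missing an attribution header.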

API base: https://mega-api-prod.nemovideo.ai

Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
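
The message request might be built like this. It is only a sketch: `build_run_sse_request` is a hypothetical helper, and `headers` stands for the Authorization and X-Skill-* headers required on every request.

```python
import json
import urllib.request

API_BASE = "https://mega-api-prod.nemovideo.ai"
# Upper bound from the text above. Note that urlopen(timeout=...) is a
# socket-level timeout, not a cap on total stream duration.
SSE_TIMEOUT_SECONDS = 15 * 60


def build_run_sse_request(session_id, text, headers):
    """Build the POST /run_sse request for one user message."""
    body = {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": text}]},
    }
    merged = dict(headers)
    merged["Accept"] = "text/event-stream"
    merged["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"{API_BASE}/run_sse",
        data=json.dumps(body).encode("utf-8"),
        headers=merged,
        method="POST",
    )
```

The resulting request would then be opened with something like `urllib.request.urlopen(req, timeout=SSE_TIMEOUT_SECONDS)` and read line by line.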

Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}

Credits: GET /api/credits/balance/simple — returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
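
The export-and-poll loop can be sketched as below. Several details are assumptions: `<ts>` is taken to be a Unix timestamp in seconds, `get_status` is a hypothetical callable that performs the GET and returns the decoded JSON, and `max_polls` is an illustrative cap that does not come from the API.

```python
import time


def build_export_body(session_id, draft, now=None):
    """Build the render request body; `now` lets callers fix the timestamp."""
    ts = int(now if now is not None else time.time())
    return {
        "id": f"render_{ts}",  # assumption: <ts> is Unix seconds
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_render(get_status, render_id, interval=30.0, max_polls=30, sleep=time.sleep):
    """Poll until status is "completed", then return output.url."""
    for attempt in range(max_polls):
        if attempt:
            # Wait between polls, but not before the first one.
            sleep(interval)
        job = get_status(render_id)
        if job.get("status") == "completed":
            return job["output"]["url"]
    raise TimeoutError(f"render {render_id} not completed after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.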

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

Reading the SSE Stream

Text events go straight to the user (after GUI translation). Tool calls stay internal. Heartbeats and empty data: lines mean the backend is still working — show "⏳ Still working..." every 2 minutes.

About 30% of edit operations close the stream without any text. When that happens, poll GET /api/state/nemo_agent/me/<sid>/latest to confirm the timeline changed, then tell the user what was updated.
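
One way to separate text events from heartbeats is sketched here. The event payload format is not documented in this file, so the `content.parts[].text` shape is an assumption that mirrors the request's `new_message.parts`; `collect_text` is a hypothetical helper.

```python
import json


def collect_text(lines):
    """Return user-facing text chunks from an iterable of raw SSE lines."""
    texts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        payload = line[len("data:"):].strip()
        if not payload:
            continue  # empty data: line, i.e. a heartbeat
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue
        content = event.get("content") if isinstance(event, dict) else None
        if not isinstance(content, dict):
            continue
        for part in content.get("parts") or []:
            # Assumption: tool calls carry no "text" key and stay internal.
            if isinstance(part, dict) and part.get("text"):
                texts.append(part["text"])
    return texts
```

If `collect_text` returns an empty list once the stream has closed, that is the signal to fall back to polling the session state.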

Translating GUI Instructions

The backend responds as if there's a visual interface. Map its instructions to API calls:

  • "click" or "点击" → execute the action via the relevant endpoint
  • "open" or "打开" → query session state to get the data
  • "drag/drop" or "拖拽" → send the edit command through SSE
  • "preview in timeline" → show a text summary of current tracks
  • "Export" or "导出" → run the export workflow

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

Timeline (3 tracks):
  1. Video: city timelapse (0-10s)
  2. BGM: Lo-fi (0-10s, 35%)
  3. Title: "Urban Dreams" (0-3s)
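
A text summary like this could be derived from the draft roughly as follows. Only the key meanings (t, tt, sg, d) come from the mapping above; the nesting assumed here, with segments listed under each track and a duration on each segment, is a guess at the draft layout, and `summarize_draft` is a hypothetical helper.

```python
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft):
    """Render a short text summary of the tracks in a draft."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        # Assumption: each segment carries its own duration "d" in ms.
        total_ms = sum(seg.get("d", 0) for seg in track.get("sg", []))
        lines.append(f"{index}. {kind}: {total_ms / 1000:g}s")
    return "\n".join(lines)
```

Names and volume levels, as in the example above, would presumably come from the metadata field `m`, whose structure is not described here.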

Error Codes

  • 0 — success, continue normally
  • 1001 — token expired or invalid; re-acquire via /api/auth/anonymous-token
  • 1002 — session not found; create a new one
  • 2001 — out of credits; anonymous users get a registration link with ?bind=<id>, registered users top up
  • 4001 — unsupported file type; show accepted formats
  • 4002 — file too large; suggest compressing or trimming
  • 400 — missing X-Client-Id; generate one and retry
  • 402 — free plan export blocked; this is a subscription-tier restriction, not a credit issue
  • 429 — rate limited; wait 30s and retry once
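
The list above maps naturally onto a lookup table plus a single-retry helper for rate limiting. The action names are hypothetical labels, and the assumption that a response exposes its code under a `code` key is not confirmed by this file.

```python
import time

ERROR_ACTIONS = {
    0: "continue",
    1001: "reacquire_token",
    1002: "create_session",
    2001: "out_of_credits",
    4001: "show_accepted_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "explain_subscription_tier",
    429: "wait_and_retry_once",
}


def next_action(code):
    """Map an error code to a handler label (hypothetical names)."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")


def call_with_rate_limit_retry(call, sleep=time.sleep):
    """Run `call`; on 429, wait 30s and retry exactly once."""
    result = call()
    # Assumption: the code is exposed as result["code"].
    if result.get("code") == 429:
        sleep(30)
        result = call()
    return result
```

Retrying only once matches the guidance for 429; a second 429 is returned to the caller rather than looping.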

Tips and Tricks

The backend processes faster when you're specific. Instead of "make it look better", try "auto-generate captions in English and sync them to the video" — concrete instructions get better results.

Max file size is 500MB. Stick to MP4, MOV, AVI, WebM for the smoothest experience.
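
A client-side pre-check can catch oversized or unsupported files before uploading. In this sketch, 500MB is read as 500,000,000 bytes, which is an assumption (the service may count differently), and `check_upload` is a hypothetical helper that returns the matching error code from this document.

```python
import os

MAX_BYTES = 500 * 1000 * 1000  # assumption: decimal megabytes
SUPPORTED = {
    "mp4", "mov", "avi", "webm", "mkv",
    "jpg", "png", "gif", "webp",
    "mp3", "wav", "m4a", "aac",
}


def check_upload(filename, size_bytes):
    """Return 0 if the file looks acceptable, else 4001 or 4002."""
    extension = os.path.splitext(filename)[1].lower().lstrip(".")
    if extension not in SUPPORTED:
        return 4001  # unsupported file type
    if size_bytes > MAX_BYTES:
        return 4002  # file too large
    return 0
```

The server remains the authority on both checks; this only avoids an upload that is likely to be rejected.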

Export as MP4 for widest compatibility across social platforms.

Common Workflows

Quick edit: Upload → "auto-generate captions in English and sync them to the video" → Download MP4. Takes 30-60 seconds for a 30-second clip.

Batch style: Upload multiple files in one session. Process them one by one with different instructions. Each gets its own render.

Iterative: Start with a rough cut, preview the result, then refine. The session keeps your timeline state so you can keep tweaking.

Usage Guidance
This skill uploads your videos and metadata to mega-api-prod.nemovideo.ai and uses a NEMO_TOKEN for authorization. Before installing, confirm you trust that external service and are comfortable sending your media to it. Prefer supplying your own NEMO_TOKEN (if one exists) rather than relying on the skill's anonymous-token flow if you need tighter control. Note the small inconsistency in the frontmatter (a referenced ~/.config/nemovideo/ path) — ask the publisher to clarify whether any local config files will be read. Finally, check the service's privacy/retention policy (are generated download URLs public? how long are files kept?) and whether credits or billing apply for larger jobs.
Capability Analysis
Type: OpenClaw Skill · Name: best-auto-caption · Version: 1.0.0
The skill is a functional integration for a cloud-based video captioning service hosted at nemovideo.ai. It follows standard API patterns for authentication, session management, and file processing, including the use of SSE for real-time updates and multipart uploads for video files. While it requests access to a specific configuration path (~/.config/nemovideo/) and environment variables, these actions are consistent with its stated purpose, and there is no evidence of data exfiltration, malicious execution, or harmful prompt injection.
Capability Assessment
Purpose & Capability
The skill is an auto-caption/render service and requires a NEMO_TOKEN and API interactions with mega-api-prod.nemovideo.ai, which matches the stated purpose of remote GPU rendering and captioning.
Instruction Scope
SKILL.md instructs only to create a session, upload user-supplied video files, stream SSE events, poll render status, and download results — all consistent with the service. One small ambiguity: it asks to auto-detect an install path for the X-Skill-Platform header ("clawhub"/"cursor"/"unknown"), which implies reading runtime/install metadata; this is plausible but underspecified. Otherwise there are no instructions to read unrelated files or secrets.
Install Mechanism
No install step or external downloads — instruction-only skill (lowest install risk).
Credentials
Only NEMO_TOKEN is declared as required, which is proportionate for a third‑party captioning API. Minor inconsistency: the frontmatter lists a configPaths entry (~/.config/nemovideo/) while the registry metadata showed no required config paths — this mismatch should be clarified but does not by itself indicate malicious intent.
Persistence & Privilege
always is false and there is no installation of persistent agents or modification of other skills. The skill can be invoked autonomously by the agent (default) which is normal for skills.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install best-auto-caption
  3. After installation, invoke the skill by name or use /best-auto-caption
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Version 1.0.0 — Initial Release
  • Introduces automatic, cloud-based video captioning for creators (YouTube, TikTok, etc.), supporting MP4, MOV, AVI, and WebM up to 500MB.
  • Simple workflow: upload video, auto-generate synced captions, and export 1080p MP4 files in under a minute.
  • Handles authentication using a simple token system, with free credits available for new users.
  • Maps user prompts to actions like upload, export, balance check, and status monitoring.
  • Includes robust error handling for authentication, file validation, and rate limits.
  • Optimized for clear user instructions and adaptable to iterative, batch, or quick-edit workflows.
Metadata
Slug best-auto-caption
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Best Auto Caption?

YouTubers, TikTok creators, and content marketers use this skill to turn video clips into captioned video files. Accepts MP4, MOV, AVI, WebM up to 500MB, renders o... It is an AI Agent Skill for Claude Code / OpenClaw, with 98 downloads so far.

How do I install Best Auto Caption?

Run "/install best-auto-caption" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Best Auto Caption free?

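The skill itself is free and licensed under MIT-0, so you can download and install it at no cost. The nemovideo.ai backend it calls runs on credits: an anonymous token comes with 100 free credits valid for 7 days.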

Which platforms does Best Auto Caption support?

Best Auto Caption is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Best Auto Caption?

It is built and maintained by francemichaell-15 (@francemichaell-15); the current version is v1.0.0.
