vcarolxhberger

Free Caption

by vcarolxhberger · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security Clean
57 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install free-caption
Description
Turn video files into captioned video files with this skill. Works with MP4, MOV, AVI, and WebM files up to 500MB. Content creators use it for adding free automat...
README (SKILL.md)

Getting Started

Send me your video files and I'll handle the automatic caption generation. Or just describe what you're after.

Try saying:

  • "turn a 3-minute tutorial video recording into a captioned 1080p MP4"
  • "add free captions to my video automatically"
  • "adding free automatic captions to videos for content creators"

Automatic Setup

On first interaction, connect to the processing API before doing anything else. Show a brief status like "Setting things up...".

Token: If NEMO_TOKEN environment variable is already set, use it and skip to Session below.

Free token: Generate a UUID as the client identifier, then POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the header X-Client-Id: <uuid>. The response field data.token becomes your NEMO_TOKEN (100 credits, 7-day expiry).

Session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Save session_id from the response.

Confirm to the user you're connected and ready. Don't print tokens or raw JSON.
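The setup steps above can be sketched in Python with only the standard library. This is a minimal illustration under stated assumptions, not the skill's own code: the helper names are hypothetical, the "en" language default and 30-second timeout are guesses, the session response is assumed to carry session_id either at the top level or under "data" (the text only says to save it), and the required attribution headers are passed in through extra_headers.

```python
import json
import os
import uuid
import urllib.request
from typing import Optional

API_BASE = "https://mega-api-prod.nemovideo.ai"


def anonymous_token_request(client_id: str, extra_headers: Optional[dict] = None) -> urllib.request.Request:
    """POST that trades a client UUID for a free NEMO_TOKEN (response field data.token)."""
    headers = {"X-Client-Id": client_id, **(extra_headers or {})}
    return urllib.request.Request(f"{API_BASE}/api/auth/anonymous-token", method="POST", headers=headers)


def session_request(token: str, language: str = "en", extra_headers: Optional[dict] = None) -> urllib.request.Request:
    """POST that opens a nemo_agent session for the given token."""
    body = json.dumps({"task_name": "project", "language": language}).encode("utf-8")
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json", **(extra_headers or {})}
    return urllib.request.Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent", data=body, method="POST", headers=headers
    )


def extract_session_id(payload: dict) -> str:
    # Assumption: the response shape is not documented; accept session_id at the top level or under "data".
    if "session_id" in payload:
        return payload["session_id"]
    return payload["data"]["session_id"]


def send_json(req: urllib.request.Request) -> dict:
    """Perform the request and decode the JSON reply."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def connect(extra_headers: Optional[dict] = None) -> tuple:
    """Reuse NEMO_TOKEN if set, otherwise fetch an anonymous token, then open a session."""
    token = os.environ.get("NEMO_TOKEN")
    if not token:
        reply = send_json(anonymous_token_request(str(uuid.uuid4()), extra_headers))
        token = reply["data"]["token"]
    session_id = extract_session_id(send_json(session_request(token, extra_headers=extra_headers)))
    return token, session_id  # keep both out of user-facing output
```

Calling connect() performs real network requests. Keeping request construction in separate functions makes the URLs, headers, and bodies easy to inspect without touching the network.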

Free Caption — Auto-Generate Captions for Videos

This tool takes your video files and runs automatic caption generation through a cloud rendering pipeline. You upload, describe what you want, and download the result.

Say you have a 3-minute tutorial video recording and ask to "add free captions to my video automatically": the backend processes it in about 30-60 seconds and hands you a 1080p MP4.

Tip: shorter clips under 2 minutes generate captions fastest.

Matching Input to Actions

User prompts referencing free caption, aspect ratio, text overlays, or audio tracks get routed to the corresponding action via keyword and intent classification.

User says... → Action (skips SSE?)
"export" / "导出" / "download" / "send me the video" → §3.5 Export (skips SSE)
"credits" / "积分" / "balance" / "余额" → §3.3 Credits (skips SSE)
"status" / "状态" / "show tracks" → §3.4 State (skips SSE)
"upload" / "上传" / user sends file → §3.2 Upload (skips SSE)
Everything else (generate, edit, add BGM…) → §3.1 SSE (goes through SSE)
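The routing table can be approximated with plain keyword matching. This is only a sketch: the real skill also uses intent classification, and two details here are assumptions: rows are checked top to bottom with the first match winning, and an attached file routes to Upload only when no keyword matched first.

```python
from typing import Tuple

# Rows from the routing table, checked top to bottom; the first match wins (assumed order).
ROUTES: Tuple[Tuple[str, Tuple[str, ...]], ...] = (
    ("§3.5 Export", ("export", "导出", "download", "send me the video")),
    ("§3.3 Credits", ("credits", "积分", "balance", "余额")),
    ("§3.4 State", ("status", "状态", "show tracks")),
    ("§3.2 Upload", ("upload", "上传")),
)


def route(message: str, has_attachment: bool = False) -> str:
    """Return the action section for a user message."""
    text = message.lower()
    for action, keywords in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action
    if has_attachment:  # "user sends file" also routes to Upload
        return "§3.2 Upload"
    return "§3.1 SSE"  # everything else goes through the SSE chat endpoint
```

Substring matching is deliberately naive; it shows the shape of the dispatch, not a robust classifier.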

Cloud Render Pipeline Details

Each export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.

All calls go to https://mega-api-prod.nemovideo.ai. The main endpoints:

  1. Session: POST /api/tasks/me/with-session/nemo_agent with {"task_name":"project","language":"<lang>"}. Returns a session_id.
  2. Chat (SSE): POST /run_sse with session_id and your message in new_message.parts[0].text. Set Accept: text/event-stream. The stream can stay open for up to 15 minutes.
  3. Upload: POST /api/upload-video/nemo_agent/me/<sid> with either a multipart file or JSON containing URLs.
  4. Credits: GET /api/credits/balance/simple returns available, frozen, and total.
  5. State: GET /api/state/nemo_agent/me/<sid>/latest returns the current draft and media info.
  6. Export: POST /api/render/proxy/lambda with a render ID and the draft JSON. Poll GET /api/render/proxy/lambda/<id> every 30s until the status is completed, then read the download URL.
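Two of these endpoints, chat and export polling, can be sketched as follows. Treat this as an illustration under assumptions: session_id is assumed to be a top-level field of the /run_sse body, the render status reply is assumed to use "status" and "download_url" keys, the cap of 40 polls (about 20 minutes at 30 s) is arbitrary, and fetch_status is a hypothetical caller-supplied function that performs the GET.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

API_BASE = "https://mega-api-prod.nemovideo.ai"


def chat_request(token: str, session_id: str, text: str, headers: Dict[str, str]) -> urllib.request.Request:
    """Build the POST /run_sse call; the message goes in new_message.parts[0].text."""
    body = {"session_id": session_id, "new_message": {"parts": [{"text": text}]}}
    all_headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "text/event-stream",
        "Content-Type": "application/json",
        **headers,  # attribution headers
    }
    return urllib.request.Request(f"{API_BASE}/run_sse", data=json.dumps(body).encode("utf-8"),
                                  method="POST", headers=all_headers)


def render_status_url(render_id: str) -> str:
    return f"{API_BASE}/api/render/proxy/lambda/{render_id}"


def poll_render(render_id: str, fetch_status: Callable[[str], dict], interval_s: float = 30.0,
                max_polls: int = 40, sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the render completes and return its download URL."""
    for _ in range(max_polls):
        status = fetch_status(render_id)  # e.g. a GET on render_status_url(render_id)
        if status.get("status") == "completed":
            return status["download_url"]
        if status.get("status") == "failed":
            raise RuntimeError(f"render {render_id} failed")
        sleep(interval_s)
    raise TimeoutError(f"render {render_id} not finished after {max_polls} polls")
```

Injecting fetch_status and sleep keeps the loop testable and lets the caller reuse whatever HTTP client already carries the auth and attribution headers.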

Formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
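A client-side check against this format list and the 500MB limit can catch the 4001 and 4002 errors before an upload is attempted. A minimal sketch; the backend stays authoritative, and reading "500MB" as 500 MiB applied to every format is an assumption.

```python
from pathlib import Path
from typing import Optional

ACCEPTED_FORMATS = {"mp4", "mov", "avi", "webm", "mkv", "jpg", "png", "gif", "webp",
                    "mp3", "wav", "m4a", "aac"}
MAX_UPLOAD_BYTES = 500 * 1024 * 1024  # assumption: "500MB" read as 500 MiB, applied to every format


def precheck_upload(filename: str, size_bytes: int) -> Optional[str]:
    """Return a user-facing problem description, or None if the file looks uploadable."""
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in ACCEPTED_FORMATS:
        return "unsupported file type (4001); accepted: " + ", ".join(sorted(ACCEPTED_FORMATS))
    if size_bytes > MAX_UPLOAD_BYTES:
        return "file too large (4002); try compressing or trimming it"
    return None
```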

Three attribution headers are required on every request and must match this file's frontmatter:

  • X-Skill-Source: free-caption
  • X-Skill-Version: the version from the frontmatter
  • X-Skill-Platform: auto-detected from the install path (clawhub, cursor, or unknown)

All requests must include Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, and X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
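Assembling these headers in one place makes it hard to forget them. A sketch under assumptions: the version string is hard-coded to the current 1.0.0 and must be kept in sync with the frontmatter by hand, and platform detection is a simple substring test on the install path, since the exact detection rule is not specified.

```python
from typing import Dict

SKILL_SOURCE = "free-caption"
SKILL_VERSION = "1.0.0"  # must track the version field in the frontmatter


def detect_platform(install_path: str) -> str:
    """Guess X-Skill-Platform from the install path (substring match is an assumption)."""
    path = install_path.lower()
    for platform in ("clawhub", "cursor"):
        if platform in path:
            return platform
    return "unknown"


def request_headers(token: str, install_path: str) -> Dict[str, str]:
    """Auth plus the three attribution headers required on every call."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": SKILL_SOURCE,
        "X-Skill-Version": SKILL_VERSION,
        "X-Skill-Platform": detect_platform(install_path),
    }
```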

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

Example timeline (3 tracks):
  1. Video: city timelapse (0-10s)
  2. BGM: Lo-fi (0-10s, 35%)
  3. Title: "Urban Dreams" (0-3s)
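Using the field mapping above, a summary like this example timeline could be generated from a draft. The sketch assumes a layout the text does not spell out: a draft shaped as {"t": [{"tt": ..., "sg": [{"d": ..., "m": {...}}]}]}, segments on each track playing back to back from 0 ms, and a hypothetical "name" key inside the metadata. Tracks are labeled by type (Video, Audio, Text) rather than by role such as BGM or Title.

```python
from typing import Any, Dict

TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}  # tt values from the field mapping


def summarize_draft(draft: Dict[str, Any]) -> str:
    """Render t/tt/sg/d/m into numbered, human-readable track lines."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), f"Track type {track.get('tt')}")
        cursor_ms = 0  # assumption: segments are laid out back to back from 0 ms
        parts = []
        for segment in track.get("sg", []):
            end_ms = cursor_ms + segment["d"]  # d is a duration in milliseconds
            name = segment.get("m", {}).get("name", "untitled")  # hypothetical metadata key
            parts.append(f"{name} ({cursor_ms / 1000:g}-{end_ms / 1000:g}s)")
            cursor_ms = end_ms
        lines.append(f"{index}. {kind}: " + ", ".join(parts))
    return "\n".join(lines)
```

For a draft with a 10,000 ms "city timelapse" video segment, a 10,000 ms "Lo-fi" audio segment, and a 3,000 ms "Urban Dreams" text segment, this returns "1. Video: city timelapse (0-10s)", "2. Audio: Lo-fi (0-10s)", and "3. Text: Urban Dreams (0-3s)" on separate lines.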

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

Backend says → You do
"click [button]" / "点击" → Execute via API
"open [panel]" / "打开" → Query session state
"drag/drop" / "拖拽" → Send edit via SSE
"preview in timeline" → Show track summary
"Export button" / "导出" → Execute export workflow

Reading the SSE Stream

Text events go straight to the user (after GUI translation). Tool calls stay internal. Heartbeats and empty "data:" lines mean the backend is still working; while they continue, show "⏳ Still working..." every 2 minutes.

About 30% of edit operations close the stream without any text. When that happens, poll /api/state to confirm the timeline changed, then tell the user what was updated.
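Reading the stream comes down to splitting text/event-stream framing into events and treating empty payloads as heartbeats. The sketch below follows the SSE framing rules (a blank line ends an event, "data:" lines accumulate, an unterminated trailing event is dropped). Two parts are assumptions: comment lines starting with ":" are counted as heartbeats, and is_user_text is a hypothetical hook, because the payload schema that separates text from tool calls is not documented here. Lines must already be decoded to str.

```python
import time
from typing import Callable, Iterable, Iterator

STILL_WORKING = "⏳ Still working..."


def sse_events(lines: Iterable[str]) -> Iterator[str]:
    """Yield each event's data; comment lines (":...") are yielded as "" heartbeats."""
    data, has_data = [], False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line ends the current event
            if has_data:
                yield "\n".join(data)
            data, has_data = [], False
        elif line.startswith(":"):
            yield ""  # assumption: treat SSE comments as heartbeats
        else:
            field, _, value = line.partition(":")
            if field == "data":
                data.append(value[1:] if value.startswith(" ") else value)
                has_data = True
    # an unterminated trailing event is discarded, as the SSE format requires


def relay(lines: Iterable[str], show: Callable[[str], None],
          is_user_text: Callable[[str], bool] = lambda data: True,
          now: Callable[[], float] = time.monotonic, notice_every_s: float = 120.0) -> bool:
    """Forward user-facing text, post a progress notice on quiet stretches, report whether text arrived."""
    got_text, last_notice = False, now()
    for data in sse_events(lines):
        if data.strip() and is_user_text(data):
            show(data)
            got_text = True
        elif not data.strip() and now() - last_notice >= notice_every_s:
            show(STILL_WORKING)
            last_notice = now()
        # non-empty data rejected by is_user_text (e.g. tool calls) stays internal
    return got_text
```

When relay returns False, the stream closed without any text, which is the case where the state endpoint should be polled to confirm what changed.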

Error Codes

  • 0 — success, continue normally
  • 1001 — token expired or invalid; re-acquire via /api/auth/anonymous-token
  • 1002 — session not found; create a new one
  • 2001 — out of credits; anonymous users get a registration link with ?bind=<id>, and registered users are asked to top up
  • 4001 — unsupported file type; show accepted formats
  • 4002 — file too large; suggest compressing or trimming
  • 400 — missing X-Client-Id; generate one and retry
  • 402 — export blocked on the free plan; this is a subscription-tier restriction, not a credit issue
  • 429 — rate limited; wait 30s and retry once
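The recoverable codes in this list (429, 1001, 1002) follow a retry-once pattern that can be sketched as a small wrapper. This is a hypothetical helper, not part of the documented API: send is assumed to fold the HTTP status and the body code into one integer, and every other code (2001, 4001, 4002, 402, and so on) is handed back for the caller to explain to the user.

```python
import time
from typing import Any, Callable, Tuple

Result = Tuple[int, Any]


def call_with_recovery(send: Callable[[], Result], refresh_token: Callable[[], None],
                       new_session: Callable[[], None],
                       sleep: Callable[[float], None] = time.sleep) -> Result:
    """Run send() and retry it once after handling a recoverable code."""
    code, payload = send()
    if code == 429:
        sleep(30)  # rate limited: wait 30 s, then retry once
    elif code == 1001:
        refresh_token()  # token expired or invalid: re-acquire via /api/auth/anonymous-token
    elif code == 1002:
        new_session()  # session not found: create a new one
    else:
        return code, payload  # success or a code the user must be told about
    return send()
```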

Common Workflows

Quick edit: Upload → "add free captions to my video automatically" → Download MP4. Takes 30-60 seconds for a 30-second clip.

Batch style: Upload multiple files in one session. Process them one by one with different instructions. Each gets its own render.

Iterative: Start with a rough cut, preview the result, then refine. The session keeps your timeline state so you can keep tweaking.

Tips and Tricks

The backend processes requests faster when they are specific. Instead of "make it look better", try "add free captions to my video automatically"; concrete instructions also get better results.

Max file size is 500MB. Stick to MP4, MOV, AVI, WebM for the smoothest experience.

Export as MP4 for widest compatibility.

Usage Guidance
Before installing, understand that this skill is cloud-based: it can create a NemoVideo session, use or generate a NEMO_TOKEN, upload selected media, and export rendered videos. It looks purpose-aligned, but avoid using it for confidential videos unless you trust the provider and are comfortable with the external processing.
Capability Analysis
Type: OpenClaw Skill · Name: free-caption · Version: 1.0.0
The skill is a functional wrapper for a third-party video processing service (nemovideo.ai). It provides detailed instructions for an AI agent to manage authentication, session state, file uploads, and polling for video rendering tasks. While it involves uploading user media to a remote endpoint (mega-api-prod.nemovideo.ai) and requires an API token (NEMO_TOKEN), these behaviors are transparently documented and directly aligned with the stated purpose of providing cloud-based video captioning.
Capability Assessment
Purpose & Capability
The stated purpose is coherent with the API workflow: SKILL.md says it will run automatic caption generation through a cloud rendering pipeline. Users should note it also describes adjacent video-editing actions such as aspect ratio, text overlays, audio tracks, and export.
Instruction Scope
SKILL.md instructs the agent to connect to the processing API on first interaction and to translate backend GUI-style responses into API actions. This is purpose-aligned but means the skill may perform service-side workflow steps without showing every raw API action.
Install Mechanism
There is no install spec and no code files; this is instruction-only. The review therefore rests on SKILL.md behavior, registry metadata, and the disclosed API endpoints rather than inspectable helper code.
Credentials
Uploading user-provided videos to https://mega-api-prod.nemovideo.ai is proportionate for cloud caption generation, but users should treat uploaded videos as shared with that external provider.
Persistence & Privilege
The skill uses or creates a NEMO_TOKEN and saves a session_id for the service workflow. Frontmatter also lists ~/.config/nemovideo/ as a config path, although the supplied instructions do not show concrete local reads or writes.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install free-caption
  3. After installation, invoke the skill by name or use /free-caption
  4. Upload your video files and describe what you want; the skill processes them in the cloud and returns a download link for the result
Version History
v1.0.0
Initial release of Free Caption — Auto-Generate Captions for Videos.
  • Add automatic video captioning for MP4, MOV, AVI, and WebM files up to 500MB
  • Supports seamless 30–90 second cloud processing; outputs 1080p MP4 videos
  • Handles quick edits, multi-file batch processing, and iterative workflows with session state tracking
  • Simple onboarding: auto-generate or use provided token for API access
  • Provides clear user feedback, error handling, and status updates throughout the process
Metadata
Slug free-caption
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Free Caption?

Turn video files into captioned video files with this skill. Works with MP4, MOV, AVI, and WebM files up to 500MB. Content creators use it for adding free automat... It is an AI agent skill for Claude Code / OpenClaw, with 57 downloads so far.

How do I install Free Caption?

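Run "/install free-caption" in the OpenClaw or Claude Code chat to install it in one step. No manual setup is required: on first use the skill connects to the processing API and obtains a token automatically if NEMO_TOKEN is not set.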

Is Free Caption free?

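The skill itself is free to download, install, and use, and is licensed under MIT-0. The NemoVideo backend it calls meters usage in credits: an anonymous token includes 100 credits and expires after 7 days, and exports on the free plan can be blocked with error 402.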

Which platforms does Free Caption support?

Free Caption is cross-platform and runs anywhere OpenClaw or Claude Code is available.

Who created Free Caption?

It is built and maintained by vcarolxhberger (@vcarolxhberger); the current version is v1.0.0.
