Ai Subtitle Extractor

by vcarolxhberger · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
82 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install ai-subtitle-extractor
Description
Turn a 10-minute YouTube tutorial video into 1080p captioned video files just by typing what you need. Whether it's extracting and embedding subtitles from e...
README (SKILL.md)

Getting Started

Share your video files and I'll get started on AI subtitle extraction. Or just tell me what you're thinking.

Try saying:

  • "extract my video files"
  • "export 1080p MP4"
  • "extract subtitles from this video and"

Quick Start Setup

This skill connects to a cloud processing backend. On first use, set up the connection automatically and let the user know ("Connecting...").

Token check: Look for NEMO_TOKEN in the environment. If found, skip to session creation. Otherwise:

  • Generate a UUID as client identifier
  • POST https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with X-Client-Id header
  • Extract data.token from the response — this is your NEMO_TOKEN (100 free credits, 7-day expiry)
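The token check above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: `send` is a hypothetical callable that takes a `urllib.request.Request` and returns the response body as text, and the sketch assumes the endpoint accepts a POST with no body, which the source does not state.

```python
import json
import uuid
import urllib.request

API_BASE = "https://mega-api-prod.nemovideo.ai"


def build_anonymous_token_request(client_id: str) -> urllib.request.Request:
    """Build (but do not send) the anonymous-token request."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def extract_token(response_body: str) -> str:
    """Read data.token from a JSON response body."""
    return json.loads(response_body)["data"]["token"]


def resolve_token(env: dict, send) -> str:
    """Return NEMO_TOKEN from env if set; otherwise fetch an anonymous token.

    `send` is a hypothetical transport callable (Request -> response text).
    """
    token = env.get("NEMO_TOKEN")
    if token:
        return token
    request = build_anonymous_token_request(str(uuid.uuid4()))
    return extract_token(send(request))
```

A real caller might pass `os.environ` and `lambda r: urllib.request.urlopen(r).read().decode()`; keeping the transport injectable lets the logic be checked without touching the network.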

Session: POST https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Keep the returned session_id for all operations.

Let the user know with a brief "Ready!" when setup is complete. Don't expose tokens or raw API output.

AI Subtitle Extractor — Extract and Embed Video Subtitles

This tool takes your video files and runs AI subtitle extraction through a cloud rendering pipeline. You upload, describe what you want, and download the result.

Say you have a 10-minute YouTube tutorial video and ask to "extract subtitles from this video and export as SRT". The backend processes the request in about 30-90 seconds and hands you a 1080p MP4.

Tip: cleaner audio with less background noise produces more accurate subtitle extraction.

Matching Input to Actions

User prompts referencing ai subtitle extractor, aspect ratio, text overlays, or audio tracks get routed to the corresponding action via keyword and intent classification.

User says... Action Skip SSE?
"export" / "导出" / "download" / "send me the video" → §3.5 Export
"credits" / "积分" / "balance" / "余额" → §3.3 Credits
"status" / "状态" / "show tracks" → §3.4 State
"upload" / "上传" / user sends file → §3.2 Upload
Everything else (generate, edit, add BGM…) → §3.1 SSE
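The routing table above could be approximated with plain keyword matching. The source mentions "keyword and intent classification" without describing the classifier, so the substring check and the action names below are assumptions for illustration only.

```python
# Keyword groups copied from the routing table; action labels are hypothetical.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message: str, has_attachment: bool = False) -> str:
    """Pick an action for a user message; fall back to the SSE endpoint."""
    if has_attachment:
        # "user sends file" routes to upload regardless of the text.
        return "upload"
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action
    return "sse"
```

Substring matching is deliberately naive: it checks rows in table order and returns the first hit, so a message that mentions two actions resolves to whichever row comes first.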

Cloud Render Pipeline Details

Each export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.

Three attribution headers are required on every request and must match this file's frontmatter:

| Header | Value |
| --- | --- |
| X-Skill-Source | ai-subtitle-extractor |
| X-Skill-Version | frontmatter version |
| X-Skill-Platform | auto-detect: clawhub / cursor / unknown from install path |

Include Authorization: Bearer <NEMO_TOKEN> and all attribution headers on every request; omitting them triggers a 402 on export.
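A small helper can assemble these headers in one place. The source only says the platform is auto-detected "from install path", so the substring test in `detect_platform` is an assumed heuristic, and both function names are hypothetical.

```python
def detect_platform(install_path: str) -> str:
    """Guess the platform from the install path (assumed heuristic)."""
    lowered = install_path.lower()
    for name in ("clawhub", "cursor"):
        if name in lowered:
            return name
    return "unknown"


def build_headers(token: str, skill_version: str, install_path: str) -> dict:
    """Return the auth header plus the three attribution headers."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "ai-subtitle-extractor",
        "X-Skill-Version": skill_version,  # should match the frontmatter version
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers through a single function makes it harder to send a request that is missing one of them.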

API base: https://mega-api-prod.nemovideo.ai

Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}; returns task_id, session_id.

Send message (SSE): POST /run_sse with body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} and header Accept: text/event-stream. Max timeout: 15 minutes.

Upload: POST /api/upload-video/nemo_agent/me/<sid> with either a file (multipart, -F "files=@/path") or a URL body {"urls":["<url>"],"source_type":"url"}.
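The request bodies for session creation, messaging, and URL upload listed above can be built with plain dictionaries. This sketch only constructs the payloads and the upload path; the function names are hypothetical, and nothing here sends a request.

```python
import json


def session_body(language: str) -> dict:
    """Body for POST /api/tasks/me/with-session/nemo_agent."""
    return {"task_name": "project", "language": language}


def message_body(session_id: str, text: str) -> dict:
    """Body for POST /run_sse (send with Accept: text/event-stream)."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": text}]},
    }


def upload_path(session_id: str) -> str:
    """Path for the upload endpoint of a given session."""
    return f"/api/upload-video/nemo_agent/me/{session_id}"


def url_upload_body(url: str) -> dict:
    """Body for uploading by URL instead of multipart file."""
    return {"urls": [url], "source_type": "url"}


if __name__ == "__main__":
    # ensure_ascii=False keeps non-ASCII prompts such as "导出" readable.
    print(json.dumps(message_body("sid-123", "导出"), ensure_ascii=False))
```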

Credits: GET /api/credits/balance/simple — returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest. Key fields: data.state.draft, data.state.video_infos, data.state.generated_media.

Export (free, no credits): POST /api/render/proxy/lambda with body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
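The export-and-poll loop might look like the sketch below. Several details are assumptions: `<ts>` is treated as an integer timestamp, the poll response is assumed to expose `status` and `output.url` at the top level, and `fetch_status` is a hypothetical callable that performs the GET and returns the parsed JSON. The poll cap is also an invention; the source gives no limit.

```python
import time


def build_render_body(session_id: str, draft: dict, timestamp: int) -> dict:
    """Body for POST /api/render/proxy/lambda (timestamp format is assumed)."""
    return {
        "id": f"render_{timestamp}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_render(fetch_status, render_id: str, interval_s: float = 30.0,
                max_polls: int = 40, sleep=time.sleep) -> str:
    """Poll until status is "completed", then return the download URL.

    `fetch_status(render_id)` is a hypothetical callable returning a dict.
    """
    for attempt in range(max_polls):
        result = fetch_status(render_id)
        if result.get("status") == "completed":
            return result["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"render {render_id} did not complete after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop testable without real 30-second waits.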

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

Reading the SSE Stream

Text events go straight to the user (after GUI translation). Tool calls stay internal. Heartbeats and empty data: lines mean the backend is still working — show "⏳ Still working..." every 2 minutes.

About 30% of edit operations close the stream without any text. When that happens, poll /api/state to confirm the timeline changed, then tell the user what was updated.
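The stream handling described above can be sketched as a line classifier. The source does not document the event payload shape, so text extraction is delegated to a caller-supplied `extract_text` function; `example_extract_text` assumes a JSON object with a top-level `text` key and is purely hypothetical.

```python
import json
from typing import Callable, Iterable, Optional


def example_extract_text(payload: str) -> Optional[str]:
    """Hypothetical extractor: assumes a JSON payload with a "text" key."""
    return json.loads(payload).get("text")


def read_sse(lines: Iterable[str],
             extract_text: Callable[[str], Optional[str]]) -> dict:
    """Collect user-facing text and count empty data: lines from an SSE stream."""
    texts = []
    empty_data_lines = 0
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if not payload:
            empty_data_lines += 1  # backend is still working
            continue
        text = extract_text(payload)
        if text:
            texts.append(text)  # events without text (e.g. tool calls) stay internal
    return {
        "texts": texts,
        "empty_data_lines": empty_data_lines,
        # No text before the stream closed: fall back to polling session state.
        "needs_state_poll": not texts,
    }
```

When `needs_state_poll` is true, the caller would fetch the latest session state and describe the timeline change to the user.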

Translating GUI Instructions

The backend responds as if there's a visual interface. Map its instructions to API calls:

  • "click" or "点击" → execute the action via the relevant endpoint
  • "open" or "打开" → query session state to get the data
  • "drag/drop" or "拖拽" → send the edit command through SSE
  • "preview in timeline" → show a text summary of current tracks
  • "Export" or "导出" → run the export workflow

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

Timeline (3 tracks):
  1. Video: city timelapse (0-10s)
  2. BGM: Lo-fi (0-10s, 35%)
  3. Title: "Urban Dreams" (0-3s)
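A text summary like the one above could be derived from the draft using the short field names. The nesting is an assumption: this sketch treats `t` as a list of tracks, each with a `tt` type code and an `sg` list of segments carrying `d` in milliseconds. Start offsets and display names are not covered by the documented mapping, so the summary reports only segment counts and total duration.

```python
# Track type codes from the draft field mapping.
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft: dict) -> str:
    """Render a short text summary of the tracks in a draft (assumed layout)."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(
            f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:.1f}s"
        )
    return "\n".join(lines)
```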

Error Codes

  • 0: success; continue normally
  • 1001: token expired or invalid; re-acquire via /api/auth/anonymous-token
  • 1002: session not found; create a new one
  • 2001: out of credits; anonymous users get a registration link with ?bind=<id>, registered users top up
  • 4001: unsupported file type; show accepted formats
  • 4002: file too large; suggest compressing or trimming
  • 400: missing X-Client-Id; generate one and retry
  • 402: free-plan export blocked; this is a subscription-tier restriction, not a credit issue
  • 429: rate limited; wait 30s and retry once
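The error codes above map naturally onto a lookup table plus one retry helper for the rate-limit case. The action labels are hypothetical names for illustration, and `do_request` is an assumed callable returning a `(code, payload)` tuple; the source does not say where the code appears in a response.

```python
import time

# Action labels are illustrative names, not part of the API.
ERROR_ACTIONS = {
    0: "continue",
    1001: "refresh_token",
    1002: "create_session",
    2001: "out_of_credits",
    4001: "show_accepted_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "explain_subscription_tier",
    429: "wait_and_retry_once",
}


def next_action(code: int) -> str:
    """Map a response code to the follow-up step."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")


def call_with_rate_limit_retry(do_request, sleep=time.sleep):
    """Run do_request(); on 429, wait 30 seconds and retry exactly once."""
    code, payload = do_request()
    if code == 429:
        sleep(30)
        code, payload = do_request()
    return code, payload
```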

Common Workflows

Quick edit: Upload → "extract subtitles from this video and export as SRT" → Download MP4. Takes 30-90 seconds for a 30-second clip.

Batch style: Upload multiple files in one session. Process them one by one with different instructions. Each gets its own render.

Iterative: Start with a rough cut, preview the result, then refine. The session keeps your timeline state so you can keep tweaking.

Tips and Tricks

The backend processes faster when you're specific. Instead of "make it look better", try "extract subtitles from this video and export as SRT" — concrete instructions get better results.

Max file size is 500MB. Stick to MP4, MOV, AVI, WebM for the smoothest experience.
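A pre-upload check can catch oversized or unsupported files before any bytes leave the machine. This sketch assumes "500MB" means 500 × 1024 × 1024 bytes (the source does not say whether the limit is binary or decimal) and judges the format by file extension only; `check_upload` is a hypothetical helper.

```python
import os

MAX_BYTES = 500 * 1024 * 1024  # assumption: MB interpreted as MiB

# Extensions from the supported-formats list.
SUPPORTED = {
    "mp4", "mov", "avi", "webm", "mkv",
    "jpg", "png", "gif", "webp",
    "mp3", "wav", "m4a", "aac",
}


def check_upload(filename: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the file looks acceptable."""
    issues = []
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    if extension not in SUPPORTED:
        accepted = ", ".join(sorted(SUPPORTED))
        issues.append(f"unsupported file type '{extension}'; accepted: {accepted}")
    if size_bytes > MAX_BYTES:
        issues.append("file too large; try compressing or trimming")
    return issues
```

The two messages mirror the guidance for error codes 4001 and 4002, so the user sees the same advice whether the check fails locally or on the backend.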

Export as MP4 for widest compatibility.

Usage Guidance
This skill will upload your video files and related session metadata to the remote domain mega-api-prod.nemovideo.ai and will use or obtain a NEMO_TOKEN (it can fetch an anonymous token if you don't provide one). Before installing or using:
  1. Confirm you trust the remote service: check its privacy/data-retention policy and who runs the service.
  2. If your videos contain sensitive content, avoid uploading them or provide a vetted, self-managed processing option.
  3. Clarify how and where tokens and session IDs are stored (in memory only vs written under ~/.config/nemovideo/).
  4. Consider supplying your own NEMO_TOKEN only if you trust the provider.
  5. Because the skill source is unknown and the registry metadata has a small inconsistency (config path present in SKILL.md but not in registry), proceed cautiously and ask the skill author to explain the discrepancy and the service's data handling practices.
Capability Analysis
Type: OpenClaw Skill · Name: ai-subtitle-extractor · Version: 1.0.0
The skill is a standard integration for a cloud-based video processing service (nemovideo.ai). It provides instructions for the AI agent to manage sessions, upload media, and trigger rendering tasks via a specific API (mega-api-prod.nemovideo.ai). The behavior, including the handling of the NEMO_TOKEN and session state, is consistent with the stated purpose of AI subtitle extraction and does not exhibit signs of data exfiltration, malicious execution, or harmful prompt injection.
Capability Assessment
Purpose & Capability
The name/description align with the instructions: the skill routes uploads and render jobs to a cloud rendering backend (mega-api-prod.nemovideo.ai) and requires a NEMO_TOKEN. However the SKILL.md frontmatter lists a config path (~/.config/nemovideo/) while the registry metadata reported no required config paths — this metadata mismatch is inconsistent and worth clarifying.
Instruction Scope
The instructions explicitly upload user video files (multipart file POSTs or URL uploads) and stream SSE responses from the remote API. Uploading user files to an external service is expected for this functionality, but it is sensitive: the skill will transmit full media content and session metadata to mega-api-prod.nemovideo.ai and will obtain or reuse tokens. The SKILL.md also instructs the agent to auto-create an anonymous token if NEMO_TOKEN is missing, which involves contacting the API and storing/using the returned token for subsequent operations. There is no instruction about where (or whether) the anonymous token or session_id is stored locally, which is a scope/privacy concern.
Install Mechanism
Instruction-only skill with no install spec and no code files. This lowers filesystem/installation risk because nothing is downloaded or written by an installer, but runtime network calls to the external API remain the primary risk surface.
Credentials
Only one environment credential is declared (NEMO_TOKEN), which is proportional to a cloud-rendering service. The SKILL.md also documents acquiring an anonymous token when none is present. The frontmatter's inclusion of a config path (~/.config/nemovideo/) is not reflected in the registry metadata and should be clarified because access to config paths could imply additional local state access.
Persistence & Privilege
The skill is not marked always:true and uses normal model-invocation defaults. It does not request elevated platform privileges in the instructions. The main persistence-related behavior is retaining a session_id/token for ongoing operations; the SKILL.md does not say whether tokens/session IDs are persisted to disk.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install ai-subtitle-extractor
  3. After installation, invoke the skill by name or use /ai-subtitle-extractor
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release of AI Subtitle Extractor: extract and embed subtitles in videos via cloud rendering.
  • Upload video files and generate subtitle-embedded videos or SRT/VTT files in 30–90 seconds.
  • No timeline editing or export settings required; just describe your desired outcome.
  • Automatic token/session setup with 100 free credits for new users.
  • Supports common workflows: quick edits, batch processing, iterative refinement.
  • Handles videos up to 500MB; outputs MP4, MOV, AVI, and more.
Metadata
Slug ai-subtitle-extractor
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Ai Subtitle Extractor?

Turn a 10-minute YouTube tutorial video into 1080p captioned video files just by typing what you need. Whether it's extracting and embedding subtitles from e... It is an AI Agent Skill for Claude Code / OpenClaw, with 82 downloads so far.

How do I install Ai Subtitle Extractor?

Run "/install ai-subtitle-extractor" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Ai Subtitle Extractor free?

Yes, Ai Subtitle Extractor is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Ai Subtitle Extractor support?

Ai Subtitle Extractor is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Ai Subtitle Extractor?

It is built and maintained by vcarolxhberger (@vcarolxhberger); the current version is v1.0.0.
