peand-rover

Caption Generator Srt

Author: peandrover adam · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 91 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw
/install caption-generator-srt
Description
Turn a 10-minute interview recording in MP4 into 1080p captioned videos with SRT just by typing what you need. Whether it's generating SRT caption files for...
Usage (SKILL.md)

Getting Started

Share your video files and I'll get started on SRT subtitle generation. Or just tell me what you're thinking.

Try saying:

  • "generate my video files"
  • "export 1080p MP4"
  • "generate an SRT caption file for"

Quick Start Setup

This skill connects to a cloud processing backend. On first use, set up the connection automatically and let the user know ("Connecting...").

Token check: Look for NEMO_TOKEN in the environment. If found, skip to session creation. Otherwise:

  • Generate a UUID as client identifier
  • POST https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with X-Client-Id header
  • Extract data.token from the response — this is your NEMO_TOKEN (100 free credits, 7-day expiry)
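
The token check above could be sketched roughly as follows. This is a minimal Python illustration, not the skill's actual implementation: the helper names (`build_anonymous_token_request`, `extract_token`, `get_token`) are hypothetical, and the empty request body is an assumption, since the text only specifies the URL, the `X-Client-Id` header, and the `data.token` field.

```python
import json
import os
import urllib.request
import uuid

BASE_URL = "https://mega-api-prod.nemovideo.ai"


def build_anonymous_token_request(client_id: str) -> urllib.request.Request:
    """Build (but do not send) the anonymous-token request."""
    return urllib.request.Request(
        f"{BASE_URL}/api/auth/anonymous-token",
        data=b"",  # assumption: the source does not describe a request body
        headers={"X-Client-Id": client_id},
        method="POST",
    )


def extract_token(response_body: bytes) -> str:
    """Pull data.token out of the JSON response."""
    return json.loads(response_body)["data"]["token"]


def get_token() -> str:
    """Return NEMO_TOKEN from the environment, or request an anonymous one."""
    token = os.environ.get("NEMO_TOKEN")
    if token:
        return token
    request = build_anonymous_token_request(str(uuid.uuid4()))
    with urllib.request.urlopen(request) as response:
        return extract_token(response.read())
```

Keeping request construction separate from sending makes the flow easy to inspect without touching the network.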

Session: POST https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Keep the returned session_id for all operations.
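
As an illustration of the session step, the request might be built like this. The function names are hypothetical, and the location of `session_id` in the response is an assumption: the text only says it is returned, so the sketch checks the top level and then a `data` wrapper.

```python
import json
import urllib.request

BASE_URL = "https://mega-api-prod.nemovideo.ai"


def build_session_request(token: str, task_name: str = "project") -> urllib.request.Request:
    """Build (but do not send) the session-creation request."""
    body = json.dumps({"task_name": task_name}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/tasks/me/with-session/nemo_agent",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_session_id(payload: dict) -> str:
    """Find session_id in a decoded response.

    Assumption: the source does not say where session_id sits, so this
    checks the top level first and then a "data" wrapper.
    """
    if "session_id" in payload:
        return payload["session_id"]
    return payload["data"]["session_id"]
```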

Let the user know with a brief "Ready!" when setup is complete. Don't expose tokens or raw API output.

Caption Generator SRT — Generate SRT Captions from Video

This tool takes your video files and runs SRT subtitle generation through a cloud rendering pipeline. You upload, describe what you want, and download the result.

Say you have a 10-minute interview recording in MP4 and want an English SRT caption file for it: the backend processes the video in about 30-90 seconds and hands you a 1080p MP4.

Tip: shorter clips under 5 minutes produce the most accurate captions.

Matching Input to Actions

User prompts referencing caption generator srt, aspect ratio, text overlays, or audio tracks get routed to the corresponding action via keyword and intent classification.

User says... | Action | Skip SSE?
"export" / "导出" / "download" / "send me the video" | §3.5 Export | yes
"credits" / "积分" / "balance" / "余额" | §3.3 Credits | yes
"status" / "状态" / "show tracks" | §3.4 State | yes
"upload" / "上传" / user sends file | §3.2 Upload | yes
Everything else (generate, edit, add BGM…) | §3.1 SSE | no
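
The routing above can be approximated with a simple keyword lookup. The sketch below is a simplification: the text mentions keyword and intent classification, while this covers only substring matching, and the action names and the `route` function are hypothetical.

```python
# Keyword groups copied from the routing rules above; order matters because
# the first matching group wins.
ROUTES = [
    ("export", ("export", "导出", "download", "send me the video")),
    ("credits", ("credits", "积分", "balance", "余额")),
    ("state", ("status", "状态", "show tracks")),
    ("upload", ("upload", "上传")),
]


def route(message: str, has_attachment: bool = False) -> str:
    """Return the action for a user message; fall back to the SSE chat."""
    if has_attachment:
        return "upload"  # "user sends file" routes to Upload
    text = message.lower()
    for action, keywords in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action
    return "sse"
```

For example, `route("Please export the video")` selects the export action, while `route("add BGM")` falls through to the SSE chat.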

Cloud Render Pipeline Details

Each export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.

All calls go to https://mega-api-prod.nemovideo.ai. The main endpoints:

  1. Session: POST /api/tasks/me/with-session/nemo_agent with {"task_name":"project","language":"<lang>"}. Gives you a session_id.
  2. Chat (SSE): POST /run_sse with session_id and your message in new_message.parts[0].text. Set Accept: text/event-stream. Up to 15 min.
  3. Upload: POST /api/upload-video/nemo_agent/me/<sid>, as a multipart file or as JSON with URLs.
  4. Credits: GET /api/credits/balance/simple, which returns available, frozen, total.
  5. State: GET /api/state/nemo_agent/me/<sid>/latest, which returns the current draft and media info.
  6. Export: POST /api/render/proxy/lambda with render ID and draft JSON. Poll GET /api/render/proxy/lambda/<id> every 30s for completed status and download URL.
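
The export polling in step 6 could look something like the sketch below. The response shape is not documented here, so the `status` and `url` keys are assumptions, as are the `poll_render` name and the `max_attempts` cap. The HTTP call is injected as `fetch_status` so the loop can be exercised without a network.

```python
import time
from typing import Callable, Optional


def poll_render(
    fetch_status: Callable[[], dict],
    interval_s: float = 30.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until the render reports completion; return the download URL.

    fetch_status is expected to perform the GET on the render endpoint and
    return the decoded JSON. The "status" and "url" keys are assumed names.
    """
    for attempt in range(max_attempts):
        payload = fetch_status()
        if payload.get("status") == "completed":
            return payload.get("url")
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait before the next poll
    return None  # gave up without seeing a completed status
```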

Formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

Headers are derived from this file's YAML frontmatter. X-Skill-Source is caption-generator-srt, X-Skill-Version comes from the version field, and X-Skill-Platform is detected from the install path (~/.clawhub/ = clawhub, ~/.cursor/skills/ = cursor, otherwise unknown).

Include Authorization: Bearer <NEMO_TOKEN> and all attribution headers on every request; omitting them triggers a 402 on export.
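
A minimal sketch of how these headers might be assembled, assuming the install path is available as a string. The function names are hypothetical, and matching on path substrings is one possible reading of "detected from the install path".

```python
def detect_platform(install_path: str) -> str:
    """Map an install path to the X-Skill-Platform value."""
    normalized = install_path.replace("\\", "/")
    if "/.clawhub/" in normalized:
        return "clawhub"
    if "/.cursor/skills/" in normalized:
        return "cursor"
    return "unknown"


def build_headers(token: str, version: str, install_path: str) -> dict:
    """Authorization plus the three attribution headers described above."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "caption-generator-srt",
        "X-Skill-Version": version,  # taken from the frontmatter version field
        "X-Skill-Platform": detect_platform(install_path),
    }
```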

Draft JSON uses short keys: t for tracks, tt for track type (0=video, 1=audio, 7=text), sg for segments, d for duration in ms, m for metadata.

Example timeline summary:

Timeline (3 tracks):
  1. Video: city timelapse (0-10s)
  2. BGM: Lo-fi (0-10s, 35%)
  3. Title: "Urban Dreams" (0-3s)
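
To illustrate how the short draft keys might be read, the sketch below turns a draft into one summary line per track. The nesting is an assumption (a top-level `t` list whose tracks carry `tt` and an `sg` list of segments with `d`); the real draft schema may differ, and `summarize_draft` is a hypothetical helper.

```python
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft: dict) -> list:
    """Return one line per track with its type and total duration."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        # d is a duration in milliseconds; sum it over the track's segments
        total_ms = sum(segment.get("d", 0) for segment in track.get("sg", []))
        lines.append(f"{index}. {kind}: {total_ms / 1000:g}s")
    return lines
```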

Translating GUI Instructions

The backend responds as if there's a visual interface. Map its instructions to API calls:

  • "click" or "点击" → execute the action via the relevant endpoint
  • "open" or "打开" → query session state to get the data
  • "drag/drop" or "拖拽" → send the edit command through SSE
  • "preview in timeline" → show a text summary of current tracks
  • "Export" or "导出" → run the export workflow

Reading the SSE Stream

Text events go straight to the user (after GUI translation). Tool calls stay internal. Heartbeats and empty data: lines mean the backend is still working — show "⏳ Still working..." every 2 minutes.

About 30% of edit operations close the stream without any text. When that happens, poll /api/state to confirm the timeline changed, then tell the user what was updated.
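
The stream handling described here might be sketched as below. It only separates heartbeats from non-empty `data:` payloads. Telling text events apart from tool calls depends on the payload schema, which is not documented here, so that step is left out. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def read_sse_lines(lines: Iterable[str]) -> Tuple[List[str], int]:
    """Return (payloads, heartbeat_count) for a finished stream."""
    payloads = []
    heartbeats = 0
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            heartbeats += 1  # SSE comment line, commonly used as a keep-alive
        elif line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload:
                payloads.append(payload)
            else:
                heartbeats += 1  # empty data: line, backend still working
    return payloads, heartbeats
```

If the stream ends and none of the payloads turns out to be a text event, that is the case where falling back to the state endpoint applies.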

Error Codes

  • 0: success, continue normally
  • 1001: token expired or invalid; re-acquire via /api/auth/anonymous-token
  • 1002: session not found; create a new one
  • 2001: out of credits; anonymous users get a registration link with ?bind=<id>, registered users top up
  • 4001: unsupported file type; show accepted formats
  • 4002: file too large; suggest compressing or trimming
  • 400: missing X-Client-Id; generate one and retry
  • 402: export blocked on the free plan; this is a subscription-tier restriction, not a credit issue
  • 429: rate limited; wait 30s and retry once
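
One way to act on these codes is a lookup table plus a single retry for rate limiting. The action labels and function names below are hypothetical; only the codes and the 30-second wait come from the list above.

```python
import time
from typing import Callable

ERROR_ACTIONS = {
    0: "continue",
    1001: "reacquire_token",
    1002: "create_session",
    2001: "out_of_credits",
    4001: "show_accepted_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "explain_subscription_tier",
    429: "wait_30s_and_retry_once",
}


def next_action(code: int) -> str:
    """Map a response code to a follow-up action label."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")


def call_with_rate_limit_retry(
    call: Callable[[], int],
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run call(); on 429 wait 30 seconds and try exactly once more."""
    code = call()
    if code == 429:
        sleep(30)
        code = call()
    return code
```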

Tips and Tricks

The backend processes faster when you're specific. Instead of "make it look better", try "generate an SRT caption file for my video in English" — concrete instructions get better results.

Max file size is 500MB. Stick to MP4, MOV, AVI, WebM for the smoothest experience.
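
A local pre-check before uploading could look like this sketch. It assumes "500MB" means 500 × 1024 × 1024 bytes, which the text does not specify, and uses the format list given earlier. `check_upload` and its return strings are hypothetical.

```python
from pathlib import Path

ACCEPTED_FORMATS = {
    "mp4", "mov", "avi", "webm", "mkv",
    "jpg", "png", "gif", "webp",
    "mp3", "wav", "m4a", "aac",
}
MAX_BYTES = 500 * 1024 * 1024  # assumption: "500MB" read as 500 MiB


def check_upload(filename: str, size_bytes: int) -> str:
    """Return "ok" or a short reason the file should not be uploaded."""
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in ACCEPTED_FORMATS:
        return "unsupported file type"
    if size_bytes > MAX_BYTES:
        return "file too large"
    return "ok"
```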

Export as MP4 for widest compatibility across video platforms.

Common Workflows

Quick edit: Upload → "generate an SRT caption file for my video in English" → Download MP4. Takes 30-90 seconds for a 30-second clip.

Batch style: Upload multiple files in one session. Process them one by one with different instructions. Each gets its own render.

Iterative: Start with a rough cut, preview the result, then refine. The session keeps your timeline state so you can keep tweaking.

Security Recommendations
This skill will upload your videos and related metadata to mega-api-prod.nemovideo.ai and requires a NEMO_TOKEN (or it will request an anonymous token from the service). Before installing:

  1. Confirm you trust the nemovideo.ai endpoint and its privacy policy, because your media is transmitted off-device.
  2. Avoid supplying a high-privilege or unrelated personal token as NEMO_TOKEN; use a token scoped for this service or rely on the anonymous token if appropriate.
  3. Ask the publisher to clarify the conflicting metadata about config paths and whether the skill reads local install/config paths (it currently indicates it will detect install path and references ~/.config/nemovideo/).
  4. Because the source/homepage is unknown, prefer using it with non-sensitive/test content until you can verify the vendor and domain.

If the author can confirm the configPath usage and the exact header telemetry behavior, and provide a homepage or source repo, that would raise confidence.
Functional Analysis
Type: OpenClaw Skill · Name: caption-generator-srt · Version: 1.0.0
The skill is a functional wrapper for the NemoVideo AI captioning service, facilitating video uploads and SRT generation via the `mega-api-prod.nemovideo.ai` backend. It includes well-defined instructions for session management, error handling, and token acquisition. The behavior, including environment variable access for its own API token and platform attribution checks, is consistent with its stated purpose and lacks indicators of malicious intent or data exfiltration.
Capability Assessment
Purpose & Capability
The skill claims to run cloud caption/render jobs and requires a NEMO_TOKEN — this aligns with the described nemovideo.ai backend. However the SKILL.md frontmatter declares a config path (~/.config/nemovideo/) while the registry metadata earlier reported no required config paths, which is an internal inconsistency. Detecting install path for X-Skill-Platform headers is also unnecessary for basic captioning and expands the skill's local footprint in a way not described in the high-level description.
Instruction Scope
The instructions are primarily limited to uploading video files and driving the nemovideo.ai API (session creation, SSE chat, upload, export). They will transmit user video/audio and metadata to https://mega-api-prod.nemovideo.ai, which is expected for a cloud-rendering service. Notable behaviors: (1) if NEMO_TOKEN is missing the skill will call the anonymous-token endpoint to self-provision a token — the registry nevertheless lists NEMO_TOKEN as required; (2) the skill instructs detecting the agent's install path and including it in headers (X-Skill-Platform), which implies reading local paths/config to craft telemetry headers. Both are scope expansions the user should be aware of.
Install Mechanism
Instruction-only skill with no install steps or code files; nothing will be downloaded or written to disk by an installer. This is the low-risk install model.
Credentials
The only declared credential is NEMO_TOKEN (primaryEnv), which is appropriate for a cloud API. However, the SKILL.md includes a flow to obtain an anonymous token if NEMO_TOKEN is absent, making the 'required' label misleading. The frontmatter references a config path (~/.config/nemovideo/) which was not listed in the registry metadata — that mismatch should be clarified. Also note: if you provide a real NEMO_TOKEN, it will be used on every request to the external service; ensure that token is scoped appropriately and not a sensitive credential used for other purposes.
Persistence & Privilege
The skill does not request 'always: true' and does not declare writing to other skills' configs or system-wide settings. It retains session_id for the session lifecycle (normal for this service) but does not explicitly instruct persistent local storage of credentials. Autonomous invocation is enabled (default) but is not combined with other high-risk flags.
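How to Use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install caption-generator-srt
  3. Once installation finishes, invoke the Skill by name or trigger it with /caption-generator-srt
  4. Provide the required inputs according to the Skill's parameter description to get structured output.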
Version History
v1.0.0
Caption Generator SRT 1.0.0: Initial Release
  • Easily generate SRT caption files and 1080p captioned videos from your MP4 interview or social video uploads.
  • No manual editing: upload your video, describe the captions you want, and receive your result in 30–90 seconds.
  • Integrates automatic cloud backend setup and session management, including free credits for new users.
  • Supports direct file uploads and a range of video/audio formats (MP4, MOV, AVI, WebM, etc.).
  • Simple, intent-based commands for generating captions, exporting videos, checking credits, and viewing project state.
  • Handles errors gracefully, with clear messages for missing tokens, credit issues, and file problems.
Metadata
Slug: caption-generator-srt
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 1
FAQ

What is Caption Generator Srt?

Turn a 10-minute interview recording in MP4 into 1080p captioned videos with SRT just by typing what you need. Whether it's generating SRT caption files for... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 91 total downloads so far.

How do I install Caption Generator Srt?

Run the command "/install caption-generator-srt" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is required.

Is Caption Generator Srt free?

Yes. Caption Generator Srt is completely free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does Caption Generator Srt support?

Caption Generator Srt is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who developed Caption Generator Srt?

It is developed and maintained by peandrover adam (@peand-rover); the current version is v1.0.0.
