peand-rover

Audio To Subtitle Generator

Author: peandrover adam · GitHub · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 140
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install audio-to-subtitle-generator
Description
Tell me what you need and I'll turn your spoken audio into clean, time-synced subtitles in minutes. This audio-to-subtitle-generator skill transcribes dialog...
Usage instructions (SKILL.md)

Getting Started

Welcome! I'm here to help you generate accurate, time-synced subtitles from your video's audio track. Upload your video file and tell me your preferred subtitle format or any specific requirements — let's get your captions ready to go!

Try saying:

  • "Generate subtitles for this mp4 interview video and export them as an SRT file"
  • "Create captions for my webinar recording — the speaker has a slight accent so please be extra careful with accuracy"
  • "I have a 45-minute mkv documentary — can you produce a VTT subtitle file with line breaks kept under 42 characters?"

Quick Start Setup

This skill connects to NemoVideo's cloud backend. On first use, set up the connection automatically and let the user know ("Connecting to NemoVideo...").

Token check: Look for NEMO_TOKEN in the environment. If found, skip to session creation. Otherwise:

  • Read ~/.config/nemovideo/client_id or generate a UUID and save it there
  • POST https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with X-Client-Id header
  • Extract data.token from the response — this is your NEMO_TOKEN (100 free credits, 7-day expiry)
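A minimal sketch of the token-check steps above, using only Python's standard library. The helper names (`load_or_create_client_id`, `build_anonymous_token_request`, `extract_token`) are hypothetical; only the endpoint, the `X-Client-Id` header, and the `data.token` field come from the steps themselves. The request is constructed but not sent.

```python
import json
import urllib.request
import uuid
from pathlib import Path

API_BASE = "https://mega-api-prod.nemovideo.ai"


def load_or_create_client_id(path: Path) -> str:
    """Return the stored client ID, or generate a UUID and save it at `path`."""
    if path.exists():
        stored = path.read_text().strip()
        if stored:
            return stored
    client_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(client_id)
    return client_id


def build_anonymous_token_request(client_id: str) -> urllib.request.Request:
    """Build (but do not send) the anonymous-token POST request."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def extract_token(response_body: str) -> str:
    """Pull data.token out of the JSON response body."""
    return json.loads(response_body)["data"]["token"]
```

In use, the path would be `Path.home() / ".config" / "nemovideo" / "client_id"`, and the built request would be passed to `urllib.request.urlopen` only when `NEMO_TOKEN` is absent from the environment.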

Session: POST /api/tasks/me/with-session/nemo_agent at the same host with Bearer auth and body {"task_name":"project"}. Keep the returned session_id for all operations.

Let the user know with a brief "Ready!" when setup is complete. Don't expose tokens or raw API output.

Turn Every Word Spoken Into Readable, Synced Subtitles

Whether you're publishing a YouTube tutorial, captioning a corporate training video, or making a documentary accessible to deaf and hard-of-hearing audiences, getting subtitles right matters. This skill listens to the audio in your video file and converts every spoken word into a properly timed subtitle file — no manual typing, no tedious timestamp adjustments, and no expensive transcription services required.

The audio-to-subtitle-generator works by analyzing the speech track in your uploaded video, segmenting it into readable lines, and attaching precise start and end timestamps to each segment. The result is a subtitle file you can drop directly into your video editor, upload to YouTube or Vimeo, or embed into your website player.
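To make the "segment plus start and end timestamp" idea concrete, here is a small sketch that renders timed segments as SRT text. It illustrates the SRT file layout only and is not a description of how the NemoVideo backend works; the function names and the `(start_ms, end_ms, text)` tuple shape are assumptions chosen for the example.

```python
def srt_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(segments) -> str:
    """Render (start_ms, end_ms, text) tuples as numbered SRT cues."""
    blocks = []
    for index, (start_ms, end_ms, text) in enumerate(segments, start=1):
        timing = f"{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


print(to_srt([(0, 1500, "Hello"), (1500, 4000, "Welcome to the webinar.")]))
```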

This is especially valuable for multilingual teams, solo creators working at scale, or anyone who needs to repurpose recorded content across multiple formats. Instead of spending hours scrubbing through a timeline, you get a complete subtitle draft in a fraction of the time — ready to review, edit if needed, and publish with confidence.

Routing Your Transcription Requests

Each subtitle generation request is parsed for audio source, language preference, and caption format, then routed to the appropriate transcription pipeline automatically.

User says... Action Skip SSE?
"export" / "导出" / "download" / "send me the video" → §3.5 Export
"credits" / "积分" / "balance" / "余额" → §3.3 Credits
"status" / "状态" / "show tracks" → §3.4 State
"upload" / "上传" / user sends file → §3.2 Upload
Everything else (generate, edit, add BGM…) → §3.1 SSE
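The routing table above can be sketched as a keyword lookup. This is one possible reading, not the skill's actual implementation: the substring matching, the `route` function name, and the short action labels are assumptions. Rows are checked in the table's order, and anything unmatched falls through to the SSE path.

```python
# Keyword groups in the same order as the routing table.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message: str, has_file: bool = False) -> str:
    """Return the action label for a user message (hypothetical labels)."""
    lowered = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in lowered for keyword in keywords):
            return action
    if has_file:
        # "user sends file" belongs to the upload row.
        return "upload"
    return "sse"  # everything else: generate, edit, add BGM, ...
```

Plain substring matching is deliberately simple; it will also match words such as "downloaded", which may or may not be desirable.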

NemoVideo API Reference

The NemoVideo backend handles speech-to-text processing by analyzing audio waveforms, detecting speaker segments, and outputting time-coded subtitle tracks in SRT, VTT, or plain text formats. Requests are authenticated via bearer token and processed asynchronously, with subtitle files returned once the transcription job completes.

Skill attribution — read from this file's YAML frontmatter at runtime:

  • X-Skill-Source: audio-to-subtitle-generator
  • X-Skill-Version: from frontmatter version
  • X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)

All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
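A sketch of how the required headers might be assembled. It reads the install-path rule as "a path under ~/.clawhub/ means clawhub, a path under ~/.cursor/skills/ means cursor, anything else is unknown", which is an interpretation; the function names are hypothetical.

```python
def detect_platform(install_path: str) -> str:
    """Map the skill's install path to a platform label (assumed rule)."""
    normalized = install_path.replace("\\", "/")
    if "/.clawhub/" in normalized:
        return "clawhub"
    if "/.cursor/skills/" in normalized:
        return "cursor"
    return "unknown"


def build_headers(token: str, version: str, install_path: str) -> dict:
    """Return the auth and attribution headers every request must carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "audio-to-subtitle-generator",
        "X-Skill-Version": version,  # read from the YAML frontmatter
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers in one place makes it harder to omit an attribution header, which the text above says causes export to fail with 402.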

API base: https://mega-api-prod.nemovideo.ai

Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=&task=<task_id>&session=<session_id>&skill_name=audio-to-subtitle-generator&skill_version=1.0.0&skill_source=<platform>
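The session body and claim link can be built as below. This is a sketch: the function names are hypothetical, and the `token` query parameter is left empty because the link template above shows no value for it.

```python
from urllib.parse import urlencode


def build_session_body(language: str) -> dict:
    """JSON body for the create-session request."""
    return {"task_name": "project", "language": language}


def build_claim_link(task_id: str, session_id: str, platform: str) -> str:
    """Workspace claim link to show the user after session creation."""
    query = urlencode({
        "token": "",  # empty, as in the link template
        "task": task_id,
        "session": session_id,
        "skill_name": "audio-to-subtitle-generator",
        "skill_version": "1.0.0",
        "skill_source": platform,
    })
    return f"https://nemovideo.com/workspace/claim?{query}"
```

Using `urlencode` rather than string concatenation keeps IDs that contain reserved characters from breaking the link.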

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
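A sketch of the message body and a small parser for the `text/event-stream` response. The parser follows the general server-sent events framing (events end at a blank line, lines starting with `:` are comments); the function names are hypothetical, and the content carried in each `data:` field is not specified here, so it is returned as a raw string.

```python
def build_run_sse_body(session_id: str, message: str) -> dict:
    """JSON body for POST /run_sse."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": message}]},
    }


def iter_sse_data(lines):
    """Yield the data payload of each event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        if line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # one optional leading space is stripped
            buffer.append(value)
    if buffer:
        # Lenient: also emit a trailing event that lacks a final blank line.
        yield "\n".join(buffer)
```

An empty string yielded by `iter_sse_data` corresponds to an empty `data:` event, which a caller can treat as a heartbeat.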

Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}

Credits: GET /api/credits/balance/simple — returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
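The export flow is a submit-then-poll loop, sketched below. Several details are assumptions: that `<ts>` is a Unix timestamp in seconds, that `status` and `output.url` sit at the top level of the poll response, and the `max_polls` cap. The HTTP call is injected as `fetch_status` so the loop can be exercised without a network.

```python
import time


def build_render_body(session_id: str, draft: dict, now: float = None) -> dict:
    """JSON body for POST /api/render/proxy/lambda."""
    ts = int(now if now is not None else time.time())  # assumed: seconds
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_render(fetch_status, render_id, interval_s=30, max_polls=60,
                sleep=time.sleep) -> str:
    """Poll until status == "completed", then return the download URL.

    fetch_status(render_id) should perform the GET and return parsed JSON.
    """
    for attempt in range(max_polls):
        result = fetch_status(render_id)
        if result.get("status") == "completed":
            return result["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"render {render_id} not completed after {max_polls} polls")
```

With the defaults, the loop gives up after roughly 30 minutes; that limit is a choice made for this sketch, not a documented backend limit.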

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

SSE Event Handling

| Event | Action |
| --- | --- |
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
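The event rules and the no-text fallback can be combined in one consumer loop, sketched here. The event dictionaries (`{"type": ..., "text": ...}`) are a hypothetical, already-parsed shape, since the wire format of each event is not specified above; `notify` and `fetch_state` are injected callbacks.

```python
import time


def consume_stream(events, notify, fetch_state,
                   clock=time.monotonic, notice_every_s=120):
    """Collect text from the stream; fall back to session state if none arrives."""
    texts = []
    last_notice = clock()
    for event in events:
        if event.get("type") == "text" and event.get("text"):
            texts.append(event["text"])
        # Tool calls/results and heartbeats are handled silently.
        now = clock()
        if now - last_notice >= notice_every_s:
            notify("⏳ Still working...")
            last_notice = now
    if texts:
        return {"source": "stream", "text": "\n".join(texts)}
    # No text in the stream: verify the edit via session state instead.
    return {"source": "state", "state": fetch_state()}
```

Note that the progress notice is only checked when an event arrives, so it relies on the backend sending heartbeats during long operations.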

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

| Backend says | You do |
| --- | --- |
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

Timeline (3 tracks):
  1. Video: city timelapse (0-10s)
  2. BGM: Lo-fi (0-10s, 35%)
  3. Title: "Urban Dreams" (0-3s)
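A rough sketch of turning a draft into a track summary using the abbreviated keys above. The nesting is an assumption made for illustration (a top-level `t` list of tracks, each with `tt` and an `sg` list of segments carrying `d` in milliseconds); the real draft layout may differ, and the output here is simpler than the example summary because segment names and start offsets are not covered by the field mapping.

```python
TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}


def summarize_draft(draft: dict) -> str:
    """Summarize tracks from a draft, assuming the nesting described above."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for number, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "Unknown")
        segments = track.get("sg", [])
        # d is a duration in milliseconds; report the track total in seconds.
        total_s = sum(seg.get("d", 0) for seg in segments) / 1000
        lines.append(f"{number}. {kind}: {len(segments)} segment(s), {total_s:g}s")
    return "\n".join(lines)
```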

Error Handling

| Code | Meaning | Action |
| --- | --- | --- |
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
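One way to keep the error handling in a single place is a lookup from code to action, sketched below. The action labels and the `action_for` helper are hypothetical names; only the codes and their meanings come from the list above.

```python
# Error code -> short action label (labels are illustrative).
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauthenticate",           # bad or expired token
    1002: "create_session",           # session not found
    2001: "show_credit_options",      # no credits
    4001: "show_supported_formats",   # unsupported file
    4002: "suggest_compress_or_trim", # file too large
    400: "generate_client_id_and_retry",
    402: "prompt_registration",       # subscription tier issue, not credits
    429: "retry_once_after_30s",      # rate limit
}


def action_for(code: int) -> str:
    """Return the action label for a code, with a fallback for unknown codes."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")
```

Keeping 402 and 2001 as separate entries preserves the distinction between a subscription-tier block and an empty credit balance.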

Best Practices

For the most accurate subtitle output, start with the cleanest audio possible. Videos with minimal background noise, consistent microphone placement, and clear speech will produce subtitles that need little to no manual correction after generation.

If your video features technical jargon, brand names, or industry-specific terminology, mention key terms upfront so they can be handled with greater care during transcription. This is particularly useful for medical, legal, or technology-focused content where a misheard word can change meaning significantly.

Keep subtitle line lengths readable — aim for no more than two lines on screen at a time and avoid breaking sentences mid-thought when possible. When reviewing your generated subtitles, pay special attention to speaker transitions and moments with overlapping dialogue, as these are the most common areas where timing may need a small manual nudge before publishing.

Quick Start Guide

Getting started with the audio-to-subtitle-generator is straightforward. Begin by uploading your video file in one of the supported formats: mp4, mov, avi, webm, or mkv. Once uploaded, specify your preferred output format — SRT is the most universally compatible, while VTT works best for web-based players and HTML5 video.

If your video contains multiple speakers, mention that upfront so subtitles can be segmented clearly between voices. You can also specify a maximum characters-per-line limit if your platform has display constraints — 42 characters per line is a common broadcast standard.
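A characters-per-line limit can also be applied locally when reviewing a subtitle draft. The sketch below uses Python's `textwrap` to wrap text at 42 characters and group the result into cues of at most two lines; the function name and defaults are illustrative, not part of the skill.

```python
import textwrap


def wrap_subtitle(text: str, max_chars: int = 42, max_lines: int = 2) -> list:
    """Split text into cues of at most max_lines lines of max_chars each."""
    lines = textwrap.wrap(text, width=max_chars)
    return [
        "\n".join(lines[i:i + max_lines])
        for i in range(0, len(lines), max_lines)
    ]


for cue in wrap_subtitle(
    "Once uploaded, specify your preferred output format and any display "
    "constraints your platform has."
):
    print(cue)
    print("---")
```

When one segment is split into several cues this way, its time range has to be divided among them as well, which this sketch does not attempt.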

Once processing is complete, you'll receive your subtitle file ready for download. You can import it directly into Adobe Premiere Pro, DaVinci Resolve, Final Cut Pro, or upload it alongside your video on YouTube, Vimeo, or any streaming platform that accepts external caption files.

Use Cases

The audio-to-subtitle-generator serves a wide range of real-world workflows. Content creators on YouTube and TikTok use it to add captions that boost watch time and reach viewers who watch without sound — a habit that now represents over 85% of mobile video consumption.

Educators and e-learning developers rely on it to make course videos ADA and WCAG compliant, ensuring students with hearing impairments have full access to lecture content. Legal and medical professionals use it to transcribe recorded depositions, patient consultations, or training sessions where accuracy and timestamping are critical for documentation.

Journalists and podcast producers convert recorded interviews into subtitle files that double as searchable transcripts. Corporate communications teams use it to caption internal town halls, product demos, and onboarding videos — making content reusable across global teams regardless of language or hearing ability.

Security recommendations
This skill appears to be what it claims: it uploads your video to NemoVideo's cloud API, creates or uses an API token (NEMO_TOKEN), and may save a client_id or token under ~/.config/nemovideo/. Before installing or using it, consider:
  1. Privacy: your audio/video (and the resulting transcripts) will be sent to https://mega-api-prod.nemovideo.ai and processed off your machine; do not upload sensitive data unless you trust NemoVideo's policies.
  2. Token storage: the skill may generate and persist an anonymous token locally; if you prefer, set NEMO_TOKEN yourself as an environment variable.
  3. Source and trust: the skill's homepage and repo are provided, but the registry owner is unknown; verify NemoVideo's terms and privacy policy if this matters.
If any of these are unacceptable, do not install the skill, or use it only with non-sensitive files.
Functional analysis
Type: OpenClaw Skill
Name: audio-to-subtitle-generator
Version: 1.0.0
The audio-to-subtitle-generator skill is a functional integration for the NemoVideo service, designed to automate speech-to-text transcription and subtitle generation. It manages its own authentication via the NEMO_TOKEN environment variable or an anonymous token flow (storing a client ID in ~/.config/nemovideo/), and communicates exclusively with the mega-api-prod.nemovideo.ai backend. The instructions in SKILL.md provide detailed logic for API orchestration, SSE stream handling, and error recovery, all of which are consistent with the stated purpose of processing media files for subtitles.
Capability assessment
Purpose & Capability
Name/description (audio → subtitles) align with the runtime instructions: the SKILL.md describes creating sessions, uploading video, requesting transcriptions, and exporting SRT/VTT via nemo's API. Declared primary credential (NEMO_TOKEN) and config path (~/.config/nemovideo/) are consistent with a cloud-backed transcription service.
Instruction Scope
Instructions are focused on the transcription workflow (token check/creation, create session, upload file, poll state, export). They instruct reading/writing a small config file (~/.config/nemovideo/client_id) and sending user files to nemo's API. This is expected, but it means user media and derived transcripts are sent off-box and a token may be persisted locally — a privacy consideration that users should be aware of.
Install Mechanism
No install spec and no code files — instruction-only skill. Nothing is downloaded or extracted to disk beyond the skill's suggested local config file, which reduces installation risk.
Credentials
Primary credential is NEMO_TOKEN which is appropriate for an API-backed transcription service. Minor metadata mismatch: 'requires.env' is empty while 'primaryEnv' is set to NEMO_TOKEN; SKILL.md handles the case by generating/storing an anonymous token if none is present. No unrelated credentials are requested.
Persistence & Privilege
always:false (no forced global enable). The skill writes/reads its own config (~/.config/nemovideo/) and may persist an anonymous token — reasonable for a client that maintains sessions. It does not request system-wide privileges or modify other skills.
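How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install audio-to-subtitle-generator
  3. Once installed, invoke the Skill by name or trigger it with /audio-to-subtitle-generator
  4. Provide the required inputs according to the Skill's parameter description to get structured output
Version history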
v1.0.0
Initial release of Audio to Subtitle Generator.
  • Instantly converts spoken audio from video files (mp4, mov, avi, webm, mkv) into accurate, time-synced subtitles (SRT, VTT).
  • Cloud-backed transcription with automatic setup and streamlined authentication (includes 100 free credits on first sign-in).
  • Simple file upload and subtitle export process with support for user requests and format preferences.
  • Real-time status updates and built-in error handling for common issues (authentication, file size, unsupported formats).
  • Designed for content creators, educators, accessibility, and fast video workflows.
Metadata
Slug audio-to-subtitle-generator
Version 1.0.0
License MIT-0
Total installs 0
Current installs 0
Version count 1
FAQ

What is Audio To Subtitle Generator?

Tell me what you need and I'll turn your spoken audio into clean, time-synced subtitles in minutes. This audio-to-subtitle-generator skill transcribes dialog... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 140 total downloads to date.

How do I install Audio To Subtitle Generator?

Run the command "/install audio-to-subtitle-generator" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is Audio To Subtitle Generator free?

Yes. Audio To Subtitle Generator is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.

Which platforms does Audio To Subtitle Generator support?

Audio To Subtitle Generator is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Audio To Subtitle Generator?

It is developed and maintained by peandrover adam (@peand-rover); the current version is v1.0.0.
