gizmogremlin

Dub YouTube with Voice.ai

Author: Nick Gill · v0.1.6
Platform: cross-platform · Security flag: ⚠ suspicious
Total downloads: 1404 · Favorites: 2 · Current installs: 0 · Versions: 7
Install in OpenClaw
/install dub-youtube-with-voiceai
Description
Dub YouTube videos with Voice.ai TTS. Turn scripts into publish-ready voiceovers with chapters, captions, and audio replacement for YouTube long-form and Shorts.
Usage (SKILL.md)

Dub YouTube with Voice.ai

This skill follows the Agent Skills specification.

Turn any script into a YouTube-ready voiceover — complete with numbered segments, a stitched master, chapter timestamps, SRT captions, and a review page. Drop the voiceover onto an existing video to dub it in one command.

Built for YouTube creators who want studio-quality narration without the studio. Powered by Voice.ai.


When to use this skill

| Scenario | Why it fits |
| --- | --- |
| YouTube long-form | Full narration with chapter markers and captions |
| YouTube Shorts | Quick hooks with punchy delivery |
| Course content | Professional narration for educational videos |
| Screen recordings | Dub a screencast with clean AI voiceover |
| Quick iteration | Smart caching: edit one section and only that segment re-renders |
| Batch production | Same voice, consistent quality across every video |

The one-command workflow

Have a script and a video? Dub it in one shot:

node voiceai-vo.cjs build \
  --input my-script.md \
  --voice oliver \
  --title "My YouTube Video" \
  --video ./my-recording.mp4 \
  --mux \
  --template youtube

This renders the voiceover, stitches the master audio, and drops it onto your video — all in one command. Output:

  • out/my-youtube-video/muxed.mp4 — your video dubbed with the AI voiceover
  • out/my-youtube-video/master.wav — the standalone audio
  • out/my-youtube-video/review.html — listen and review each segment
  • out/my-youtube-video/chapters.txt — paste directly into your YouTube description
  • out/my-youtube-video/captions.srt — upload to YouTube as subtitles
  • out/my-youtube-video/description.txt — ready-made YouTube description with chapters

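Use --sync pad to pad the audio with silence when it is shorter than the video, or --sync trim to cut the audio down when it is longer. The default, shortest, ends the output when the shorter track ends.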
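The output paths above suggest that the directory name is a slug of --title ("My YouTube Video" becomes my-youtube-video). A minimal sketch of such a slug rule follows, assuming a simple lowercase-and-hyphenate scheme. titleToSlug is a hypothetical name, and the CLI's actual handling of punctuation or non-ASCII titles is not documented here.

```javascript
// Hypothetical helper: approximates how a title could map to an output folder name.
function titleToSlug(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse each run of non-alphanumerics into one hyphen
    .replace(/^-+|-+$/g, "");    // strip leading and trailing hyphens
}

console.log(titleToSlug("My YouTube Video")); // my-youtube-video
```

This is handy in wrapper scripts that need to predict where master.wav or captions.srt will land; pass --out to avoid depending on the slug at all.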


Requirements

  • Node.js 20+ — runtime (no npm install needed — the CLI is a single bundled file)
  • VOICE_AI_API_KEY: set as an environment variable (.env files are not read as of v0.1.5). Get a key at voice.ai/dashboard.
  • ffmpeg (optional) — needed for master stitching, MP3 encoding, loudness normalization, and video dubbing. The pipeline still produces individual segments, the review page, chapters, and captions without it.

Configuration

Set VOICE_AI_API_KEY as an environment variable before running:

export VOICE_AI_API_KEY=your-key-here

The skill does not read .env files or access any files for credentials — only the environment variable.

Use --mock on any command to run the full pipeline without an API key (produces placeholder audio).


Commands

build — Generate a YouTube voiceover from a script

node voiceai-vo.cjs build \
  --input <script.md or script.txt> \
  --voice <voice-alias-or-uuid> \
  --title "My YouTube Video" \
  [--template youtube] \
  [--video input.mp4 --mux --sync shortest] \
  [--force] [--mock]

What it does:

  1. Reads the script and splits it into segments (by ## headings for .md, or by sentence boundaries for .txt)
  2. Optionally prepends/appends YouTube intro/outro segments
  3. Renders each segment via Voice.ai TTS
  4. Stitches a master audio file (if ffmpeg is available)
  5. Generates YouTube chapters, SRT captions, a review page, and a ready-made description
  6. Optionally dubs your video with the voiceover
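Step 1 for .md scripts can be sketched as follows. This is an illustration of heading-based splitting, not the CLI's actual code. It assumes that only level-2 ("## ") headings start a new segment and that text before the first heading becomes an untitled segment. splitByHeadings is a hypothetical name.

```javascript
// Hypothetical sketch: split a Markdown script into { title, text } segments at "## " headings.
function splitByHeadings(markdown) {
  const segments = [];
  let current = { title: "", lines: [] };
  const hasContent = (seg) => seg.title !== "" || seg.lines.join("").trim() !== "";

  for (const line of markdown.split(/\r?\n/)) {
    const match = /^##\s+(.*)$/.exec(line); // "###" and deeper headings do not match
    if (match) {
      if (hasContent(current)) segments.push(current);
      current = { title: match[1].trim(), lines: [] };
    } else {
      current.lines.push(line);
    }
  }
  if (hasContent(current)) segments.push(current);

  return segments.map((seg) => ({ title: seg.title, text: seg.lines.join("\n").trim() }));
}

const demo = splitByHeadings("## Intro\nHello there.\n\n## Setup\nInstall it.\n");
console.log(demo.map((seg) => seg.title)); // [ 'Intro', 'Setup' ]
```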

Full options:

| Option | Description |
| --- | --- |
| -i, --input <path> | Script file (.txt or .md). Required |
| -v, --voice <id> | Voice alias or UUID. Required |
| -t, --title <title> | Video title (defaults to filename) |
| --template youtube | Auto-inject YouTube intro/outro |
| --mode <mode> | headings or auto (default: headings for .md) |
| --max-chars <n> | Max characters per auto-chunk (default: 1500) |
| --language <code> | Language code (default: en) |
| --video <path> | Input video to dub |
| --mux | Enable video dubbing (requires --video) |
| --sync <policy> | shortest, pad, or trim (default: shortest) |
| --force | Re-render all segments (ignore cache) |
| --mock | Mock mode: no API calls, placeholder audio |
| -o, --out <dir> | Custom output directory |
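To make the auto mode and --max-chars concrete, here is a rough sketch of sentence-boundary chunking under a character budget. It assumes sentences end at ".", "!" or "?" followed by whitespace and that an oversized single sentence is kept whole. The CLI's real boundary rules are not documented, and chunkBySentences is a hypothetical name.

```javascript
// Hypothetical sketch: greedily pack whole sentences into chunks of at most maxChars.
function chunkBySentences(text, maxChars = 1500) {
  const sentences = text.trim().split(/(?<=[.!?])\s+/).filter(Boolean);
  const chunks = [];
  let current = "";

  for (const sentence of sentences) {
    const candidate = current ? current + " " + sentence : sentence;
    if (candidate.length <= maxChars || !current) {
      current = candidate; // fits, or is a lone sentence longer than the budget
    } else {
      chunks.push(current);
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

console.log(chunkBySentences("One. Two. Three.", 9)); // [ 'One. Two.', 'Three.' ]
```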

replace-audio — Dub an existing video

node voiceai-vo.cjs replace-audio \
  --video ./my-video.mp4 \
  --audio ./out/my-video/master.wav \
  [--out ./out/my-video/dubbed.mp4] \
  [--sync shortest|pad|trim]

Requires ffmpeg. If not installed, generates helper shell/PowerShell scripts instead.

| Sync policy | Behavior |
| --- | --- |
| shortest (default) | Output ends when the shorter track ends |
| pad | Pad audio with silence to match video duration |
| trim | Trim audio to match video duration |
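The three policies can be read as a small decision about how much audio to add or cut. The sketch below only restates the table in code and is not taken from the CLI. audioAdjustment is a hypothetical name, and it assumes that pad does nothing when the audio is already longer and that trim does nothing when it is already shorter.

```javascript
// Hypothetical sketch: seconds of silence to append or audio to cut for each sync policy.
function audioAdjustment(videoSec, audioSec, policy = "shortest") {
  const gap = videoSec - audioSec; // positive when the audio is shorter than the video
  switch (policy) {
    case "shortest":
      return { padSec: 0, trimSec: 0 }; // nothing is altered; output stops at the shorter track
    case "pad":
      return { padSec: Math.max(0, gap), trimSec: 0 };
    case "trim":
      return { padSec: 0, trimSec: Math.max(0, -gap) };
    default:
      throw new Error(`Unknown sync policy: ${policy}`);
  }
}

console.log(audioAdjustment(60, 45, "pad"));  // { padSec: 15, trimSec: 0 }
console.log(audioAdjustment(60, 75, "trim")); // { padSec: 0, trimSec: 15 }
```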

Video stream is copied without re-encoding (-c:v copy). Audio is encoded as AAC for YouTube compatibility.

Privacy: Video processing is entirely local. Only script text is sent to Voice.ai for TTS. Your video files never leave your machine.

voices — List available voices

node voiceai-vo.cjs voices [--limit 20] [--query "deep"] [--mock]

Available voices

Use short aliases or full UUIDs with --voice:

| Alias | Voice | Gender | Best for YouTube |
| --- | --- | --- | --- |
| ellie | Ellie | F | Vlogs, lifestyle, social content |
| oliver | Oliver | M | Tutorials, narration, explainers |
| lilith | Lilith | F | ASMR, calm walkthroughs |
| smooth | Smooth Calm Voice | M | Documentaries, long-form essays |
| corpse | Corpse Husband | M | Gaming, entertainment |
| skadi | Skadi | F | Anime, character content |
| zhongli | Zhongli | M | Gaming, dramatic intros |
| flora | Flora | F | Kids content, upbeat videos |
| chief | Master Chief | M | Gaming, action trailers |

The voices command also returns any additional voices available on the API. Voice list is cached for 10 minutes.


Build outputs

After a build, the output directory contains everything you need to publish on YouTube:

out/<title-slug>/
  segments/           # Numbered WAV files (001-intro.wav, 002-section.wav, …)
  master.wav          # Stitched voiceover (requires ffmpeg)
  master.mp3          # MP3 for upload (requires ffmpeg)
  muxed.mp4           # Dubbed video (if --video --mux used)
  chapters.txt        # Paste into YouTube description
  captions.srt        # Upload as YouTube subtitles
  description.txt     # Ready-made YouTube description with chapters
  review.html         # Interactive review page with audio players
  manifest.json       # Build metadata: voice, template, segment list
  timeline.json       # Segment durations and start times
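As an illustration of how chapters.txt could be derived from segment start times, the sketch below formats YouTube-style chapter lines. The { title, startSec } shape is an assumption, since the real timeline.json schema is not shown here, and both function names are hypothetical. Note that YouTube expects the first chapter to start at 0:00.

```javascript
// Hypothetical sketch: format seconds as M:SS, or H:MM:SS once past one hour.
function formatChapterTime(totalSeconds) {
  const s = Math.floor(totalSeconds);
  const hours = Math.floor(s / 3600);
  const minutes = Math.floor((s % 3600) / 60);
  const seconds = String(s % 60).padStart(2, "0");
  return hours > 0
    ? `${hours}:${String(minutes).padStart(2, "0")}:${seconds}`
    : `${minutes}:${seconds}`;
}

// One "timestamp title" line per segment, ready to paste into a description.
function buildChapters(segments) {
  return segments.map((seg) => `${formatChapterTime(seg.startSec)} ${seg.title}`).join("\n");
}

console.log(buildChapters([
  { title: "Intro", startSec: 0 },
  { title: "Setup", startSec: 75.4 },
  { title: "Wrap-up", startSec: 3725 },
]));
// 0:00 Intro
// 1:15 Setup
// 1:02:05 Wrap-up
```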

YouTube workflow

  1. Run the build command
  2. Upload muxed.mp4 (or your original video + master.mp3 as audio)
  3. Paste chapters.txt content into your YouTube description
  4. Upload captions.srt as subtitles in YouTube Studio
  5. Done — professional narration, chapters, and captions in minutes
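For readers who want to check or post-process captions.srt before uploading it in step 4, here is a small sketch of the SubRip cue layout (index, HH:MM:SS,mmm time range, text, blank line). The { startSec, endSec, text } cue shape and both function names are assumptions for illustration, not the CLI's internals.

```javascript
// Hypothetical sketch: format seconds as an SRT timestamp (HH:MM:SS,mmm).
function formatSrtTime(totalSeconds) {
  const ms = Math.round(totalSeconds * 1000);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  const hours = Math.floor(ms / 3600000);
  const minutes = Math.floor((ms % 3600000) / 60000);
  const seconds = Math.floor((ms % 60000) / 1000);
  return `${pad(hours)}:${pad(minutes)}:${pad(seconds)},${pad(ms % 1000, 3)}`;
}

// Cues are numbered from 1 and separated by a blank line.
function buildSrt(cues) {
  return cues
    .map((cue, i) => `${i + 1}\n${formatSrtTime(cue.startSec)} --> ${formatSrtTime(cue.endSec)}\n${cue.text}\n`)
    .join("\n");
}

console.log(formatSrtTime(3661.25)); // 01:01:01,250
```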

YouTube template

Use --template youtube to auto-inject a branded intro and outro:

| Segment | Source file |
| --- | --- |
| Intro (prepended) | templates/youtube_intro.txt |
| Outro (appended) | templates/youtube_outro.txt |

Edit the files in templates/ to customize your channel's branding.


Caching

Segments are cached by a hash of: text content + voice ID + language.

  • Unchanged segments are skipped on rebuild — fast iteration
  • Modified segments are re-rendered automatically
  • Use --force to re-render everything
  • Cache manifest is stored in segments/.cache.json
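The caching rule above amounts to a content-addressed key. A minimal sketch follows, assuming SHA-256 over a JSON-encoded [text, voiceId, language] tuple. The CLI's actual hash function, field order, and .cache.json layout are not documented here, and segmentCacheKey is a hypothetical name.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical sketch: the key changes whenever text, voice, or language changes.
function segmentCacheKey(text, voiceId, language) {
  // JSON-encoding the tuple keeps field boundaries unambiguous ("ab"+"c" vs "a"+"bc").
  const payload = JSON.stringify([text, voiceId, language]);
  return createHash("sha256").update(payload).digest("hex");
}

const before = segmentCacheKey("Welcome back.", "oliver", "en");
const after = segmentCacheKey("Welcome back!", "oliver", "en");
console.log(before === after); // false
```

Under this scheme, editing one section changes only that segment's key, which is why the other segments can be skipped on rebuild.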

Multilingual dubbing

Voice.ai supports 11 languages — dub your YouTube videos for global audiences:

en, es, fr, de, it, pt, pl, ru, nl, sv, ca

node voiceai-vo.cjs build \
  --input script-spanish.md \
  --voice ellie \
  --title "Mi Video" \
  --language es \
  --video ./my-video.mp4 \
  --mux

The pipeline auto-selects the multilingual TTS model for non-English languages.
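A wrapper script could validate --language before spending credits. The sketch below checks a code against the eleven languages listed above and flags whether the multilingual model would apply (any non-English code, per the sentence above). checkLanguage is a hypothetical helper, not part of the CLI.

```javascript
// Language codes copied from the list above.
const SUPPORTED_LANGUAGES = ["en", "es", "fr", "de", "it", "pt", "pl", "ru", "nl", "sv", "ca"];

// Hypothetical sketch: normalize and validate a language code.
function checkLanguage(code) {
  const language = code.trim().toLowerCase();
  if (!SUPPORTED_LANGUAGES.includes(language)) {
    throw new Error(`Unsupported language code: ${code}`);
  }
  return { language, multilingual: language !== "en" };
}

console.log(checkLanguage("es")); // { language: 'es', multilingual: true }
```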


Troubleshooting

| Issue | Solution |
| --- | --- |
| ffmpeg missing | The pipeline still works and produces segments, the review page, chapters, and captions. Install ffmpeg for stitching and video dubbing. |
| Rate limits (429) | Segments render sequentially, which stays under most limits. Wait and retry. |
| Insufficient credits (402) | Top up at voice.ai/dashboard. Cached segments won't re-use credits on retry. |
| Long scripts | Caching makes rebuilds fast. Text over 490 chars per segment is automatically split across API calls. |
| Windows paths | Wrap paths with spaces in quotes: --input "C:\My Scripts\script.md" |

See references/TROUBLESHOOTING.md for more.
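For the "wait and retry" advice on 429 responses, a wrapper script might space its retries with capped exponential backoff. The sketch below only computes the wait schedule. It does not describe how the CLI itself retries, which is not documented here, and backoffDelaysMs is a hypothetical name.

```javascript
// Hypothetical sketch: wait times (ms) for successive retries, doubling up to a cap.
function backoffDelaysMs(retries, baseMs = 1000, capMs = 30000) {
  return Array.from({ length: retries }, (_, attempt) =>
    Math.min(capMs, baseMs * 2 ** attempt)
  );
}

console.log(backoffDelaysMs(4, 1000, 5000)); // [ 1000, 2000, 4000, 5000 ]
```

Because unchanged segments are cached, re-running the same build after each wait should only re-request the segments that failed.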



Security recommendations
This skill largely matches its stated purpose (turning scripts into TTS and optionally muxing into video). However:

  1. The registry metadata claims no required env vars, while the SKILL.md, YAML and reference docs require VOICE_AI_API_KEY. That is an inconsistency you should resolve.
  2. The docs and YAML point to a non-production base URL (https://dev.voice.ai), and the troubleshooting notes say production endpoints may be placeholders; prefer running in --mock mode first.
  3. The README/SKILL.md contain contradictory statements about .env handling; the safest assumption is the script reads only the VOICE_AI_API_KEY environment variable.
  4. Because the implementation is a bundled Node script (voiceai-vo.cjs), inspect the top of that file for hard-coded endpoints or unexpected remote hosts before using a real API key.

Recommended steps before installing or running with real credentials:

  • Test with --mock to produce outputs locally without network calls.
  • Review voiceai-vo.cjs for any network calls and confirm they go to the expected Voice.ai API host.
  • Run the tool in an isolated environment (container or VM) and/or monitor outbound network requests (or set VOICEAI_API_BASE to a controlled proxy) to ensure no unexpected exfiltration.
  • Prefer obtaining a key from an official Voice.ai dashboard and do not use unrelated elevated credentials.

If you cannot verify the endpoints and credential usage in the bundled script, treat this package with caution.
Functional analysis
Type: OpenClaw Skill
Name: dub-youtube-with-voiceai
Version: 0.1.6

The skill is classified as suspicious due to the presence of the `VOICEAI_API_BASE` environment variable override in `voiceai-vo.cjs` (src/api.ts) and documented in `references/VOICEAI_API.md`. While a legitimate configuration option, this allows an external actor (e.g., via prompt injection against an AI agent) to redirect API calls for text-to-speech generation to an arbitrary, potentially malicious, endpoint. This creates a clear data exfiltration vector for the user's script content, which is sent to the TTS API. Additionally, the skill executes external binaries like `ffmpeg` via `child_process.execFile`, which, while generally safer than `exec`, still represents a risky capability if not handled with extreme care or if the external binaries themselves have vulnerabilities.
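One way to reduce the redirect risk described above is a pre-flight check that refuses to launch the CLI when VOICEAI_API_BASE points somewhere unexpected. The sketch below is a hypothetical guard for a wrapper script, not part of the skill. It assumes that only HTTPS URLs on voice.ai or its subdomains are acceptable, which you should adjust to whatever host you have actually verified.

```javascript
// Hypothetical sketch: accept an unset override, or an https URL on the trusted domain.
function isTrustedApiBase(value, trustedDomain = "voice.ai") {
  if (!value) return true; // unset: the CLI falls back to its built-in default
  let url;
  try {
    url = new URL(value);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "https:") return false;
  return url.hostname === trustedDomain || url.hostname.endsWith("." + trustedDomain);
}

console.log(isTrustedApiBase("https://dev.voice.ai"));          // true
console.log(isTrustedApiBase("https://voice.ai.evil.example")); // false
```

A wrapper would call isTrustedApiBase(process.env.VOICEAI_API_BASE) and exit before spawning voiceai-vo.cjs if it returns false. Matching on the parsed hostname, rather than searching the raw string, avoids lookalike URLs that merely contain "voice.ai".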
Capability assessment
Purpose & Capability
The code and documentation implement a Node CLI that renders text to audio and (optionally) muxes it into video using ffmpeg — this is coherent with the stated purpose. However the registry metadata lists no required environment variable while SKILL.md and other files declare VOICE_AI_API_KEY as required; the skill's YAML and reference docs point to a 'dev.voice.ai' base URL (a non-production host). These mismatches reduce confidence in the packaging/authoring.
Instruction Scope
SKILL.md states only script text is sent to Voice.ai and that video files are processed locally — which is appropriate for a dubbing tool. But the documentation contains contradictory statements about credentials (.env support vs. 'does not read .env files') and troubleshooting warns that 'production API endpoints' are not configured and to use --mock mode. The runtime instructions grant the CLI discretion to call API endpoints (and allow overriding base URL via VOICEAI_API_BASE), so confirm endpoints and credential usage before running.
Install Mechanism
No install spec; the skill is a single bundled Node script (voiceai-vo.cjs) and optionally uses ffmpeg on PATH. There are no downloads or archive extracts in the manifest, so install risk is low. The bundled script is large (bundled deps), so review is required but the packaging itself is not suspicious.
Credentials
Requesting a single VOICE_AI_API_KEY is proportionate for a TTS integration. However the registry metadata omitted this required env var (incoherent), and the code/docs expose an overridable base URL via VOICEAI_API_BASE. Those inconsistencies mean the declared required-env in the registry cannot be fully trusted without inspection of the bundled script.
Persistence & Privilege
The skill does not request permanent presence (always:false) and does not declare modifications to other skills or system-wide settings. It runs as a CLI tool; there is no indication it persists credentials beyond using the provided environment variable.
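How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install dub-youtube-with-voiceai
  3. After installation, invoke the Skill by name or trigger it with /dub-youtube-with-voiceai
  4. Provide the required inputs described in the Skill's parameter documentation to get structured output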
Version history
v0.1.6
  • Added skill manifest file: dub-youtube-with-voiceai.yaml to support discovery and integration.
  • No changes to core functionality or documentation.
v0.1.5
  • Changes environment variable handling: VOICE_AI_API_KEY must be set as an environment variable; .env files are no longer supported.
  • Updated documentation in SKILL.md and README.md to reflect this simplified credential setup.
  • Removed all references to reading credentials from files for improved clarity and security.
v0.1.4
  • Changed environment variable lookup logic: now only VOICE_AI_API_KEY is supported and loaded from the environment or .env file; alternate names are no longer checked.
  • Updated documentation to clarify configuration and explicitly state that only VOICE_AI_API_KEY in .env will be read.
  • No user-facing CLI/API changes.
v0.1.3
No visible file or documentation changes in this release.
  • No file changes detected between versions 0.1.1 and 0.1.3.
  • SKILL.md and feature set remain unchanged from the previous version.
v0.1.2
No user-facing changes in this release. Version number updated only.
v0.1.1
  • Major refactor: all source code and config files removed, leaving only documentation files.
  • README.md and SKILL.md updated; project is now documentation-only (no code or CLI interface included).
  • No functional skill code remains; installation and usage instructions are no longer applicable.
  • Previous build and command details are preserved in the documentation for reference.
v0.1.0
Initial release of dub-youtube-with-voiceai: dub YouTube videos with Voice.ai TTS.
  • Turn scripts into YouTube-ready voiceovers with chapters, captions, and master audio.
  • One-command workflow to dub videos with AI-narrated audio.
  • Outputs ready-made files for YouTube: dubbed video, audio, chapters, captions, and description.
  • Smart caching: edit a section and only that segment re-renders.
  • Optional ffmpeg support for advanced audio/video processing.
  • Privacy focused: video is processed locally, only script text sent to Voice.ai.
  • CLI includes voice listing, video dubbing, and asset generation commands.
Metadata
Slug dub-youtube-with-voiceai
Version 0.1.6
License
Total installs 0
Current installs 0
Versions 7
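FAQ

What is Dub YouTube with Voice.ai?

Dub YouTube videos with Voice.ai TTS. Turn scripts into publish-ready voiceovers with chapters, captions, and audio replacement for YouTube long-form and Shorts. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 1404 total downloads to date.

How do I install Dub YouTube with Voice.ai?

Run the command /install dub-youtube-with-voiceai in the OpenClaw or Claude Code chat box to install it in one step. Installation itself needs no extra configuration; real (non-mock) runs still require the VOICE_AI_API_KEY environment variable.

Is Dub YouTube with Voice.ai free?

Yes. Dub YouTube with Voice.ai is completely free (free and open source) to download, install, and use. Voice.ai TTS rendering itself consumes credits on your Voice.ai account (see the 402 entry under Troubleshooting).

Which platforms does Dub YouTube with Voice.ai support?

Dub YouTube with Voice.ai is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who develops Dub YouTube with Voice.ai?

It is developed and maintained by Nick Gill (@gizmogremlin). The current version is v0.1.6.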
