Wjs Burning Subtitles
/install wjs-burning-subtitles
wjs-burning-subtitles
Video + SRT → video with subtitles. Also the final-encode stage for the localization pipeline: takes a video, an optional dub track from /wjs-dubbing-video, and an optional SRT to burn, and produces the upload-ready MP4 in one ffmpeg pass. No cascade of decodes/re-encodes.
When to use
- User has an SRT and wants it always-visible on the video (burn-in for 微信视频号 / 抖音 / WeChat — players that won't honor embedded subtitle tracks).
- User wants a togglable subtitle track (soft-mux) for QuickTime / VLC / IINA / mobile players that support
mov_text. - Final composite after
/wjs-dubbing-video: burn target-language subs + mix dub over original-as-bed in one encode.
When NOT to use
- No SRT yet → run
/wjs-transcribing-audiothen/wjs-translating-subtitlesfirst. - HTML/CSS captions (kinetic, per-word highlights, custom fonts) on a clip composed in HyperFrames → use
/wjs-overlaying-videoinstead. Don't mix libass burn-in with HyperFrames captions on the same output. - The "subtitles" are actually motion graphics (animated callouts, lower-thirds with logos, kinetic typography) → that's
/wjs-overlaying-video, not this skill.
The 3 modes of render.py
scripts/render.py auto-detects mode from flags:
- Subtitles only —
--video + --srt→ re-encodes video with burned subs, original audio passes through. - Dub only —
--video + --dub→ keeps original video stream; replaces or mixes the audio track. - Full localized cut —
--video + --srt + --dub→ burns subs AND mixes dub. By default keeps original audio at low volume as a "bed" under the dub (set--bed-volume 0or--no-original-audioto drop it).
Burn-in requires an ffmpeg built with libass. The script auto-downloads a static libass-enabled build from evermeet.cx into /tmp/ff_bin/ on first use if needed.
Soft-mux (togglable subtitle track)
Player apps can show/hide. Works with any ffmpeg build — does not need libass:
ffmpeg -i input.mp4 -i input.zh-CN.srt \
-map 0:v -map 0:a -map 1:0 \
-c:v copy -c:a copy -c:s mov_text \
-metadata:s:s:0 language=zho -metadata:s:s:0 title="中文" \
output.mp4
This is fast (stream-copy) and reversible. Use it when:
- Target platform supports embedded subs (YouTube auto-detects; VLC/QuickTime honors).
- User wants viewers to be able to toggle off.
- You don't want to re-encode the video.
render.py --video IN.mp4 --srt SUB.srt --soft-mux runs this path.
Hardcoded burn-in (always visible, libass)
Required for WeChat/抖音/朋友圈 etc. where the player will not honor embedded subtitle tracks.
Verify libass is available BEFORE promising burn-in
ffmpeg -filters 2>&1 | grep -E "subtitles|^.. ass "
If neither subtitles nor ass shows up, the build lacks libass. Homebrew's default ffmpeg formula is often stripped (no --enable-libass, no --enable-libfreetype, no drawtext). Don't waste time fighting the comma-escaping inside force_style — it will fail with No such filter: 'subtitles' no matter how the shell quotes it.
Fastest fix on macOS — drop in a static build, no system changes
curl -fsSL -o /tmp/ff.zip https://evermeet.cx/ffmpeg/getrelease/zip
unzip -o /tmp/ff.zip -d /tmp/ff_bin >/dev/null
FF=/tmp/ff_bin/ffmpeg
$FF -version | grep -oE -- "--enable-(libass|libfreetype)"
Then use $FF instead of ffmpeg for the render. The brew binary is fine for everything else (probe, audio extraction, soft-mux). render.py does this auto-fallback if its default ffmpeg lacks libass.
Burn-in render with style overrides
🛑 Checkpoint — confirm before full-render. Burn-in re-encodes the entire video (minutes of CPU on a 5-min clip). Before kicking it off:
- Render only the first 30s with
-t 30for a fast preview. - Extract a frame from the longest-line cue (see Fontsize calibration below) and Read it.
- Show the user the preview frame + the cue text, ask: "字号/字体/边距 OK 吗?OK 才跑全片。" Wait for explicit confirmation.
Skip the checkpoint only if the user has already approved a full render of this exact video at this exact font config in the same conversation.
$FF -i input.mp4 \
-vf "subtitles=input.zh-CN.srt:force_style='Fontname=PingFang SC\,Fontsize=12\,PrimaryColour=&H00FFFFFF\,OutlineColour=&H00000000\,BorderStyle=1\,Outline=2\,Shadow=1\,MarginL=20\,MarginR=20\,MarginV=40'" \
-c:v libx264 -crf 18 -preset medium -pix_fmt yuv420p \
-c:a copy output.mp4
Inside force_style, escape every comma as \, (the filter graph parser eats the bare comma as a chain separator). All other special chars are fine.
Fontsize calibration — critical
libass scales its internal PlayRes up to the actual video resolution. The number you pass is not pixels in the output. As a starting calibration on a 544×960 vertical phone video, Fontsize=22 rendered each Chinese character at ~55px wide and overflowed the frame, while Fontsize=12 rendered at ~30–35px wide and fit cleanly with 15-char lines.
Rule of thumb: start at Fontsize=12, render, then always extract a frame and look:
$FF -ss 30 -i output.mp4 -frames:v 1 /tmp/frame.png -y
# then Read /tmp/frame.png to verify the longest-line cue fits
Pick a timestamp that lands on the cue with the most characters per line — short lines won't expose overflow. Add MarginL=20 MarginR=20 as a safety inset; never trust default left/right margins.
Style cheatsheet
Keys that matter (libass force_style):
Fontname=PingFang SC— macOS default CJK; alternates:Songti SC,Heiti SC,STHeiti,Hiragino Sans GB.Fontsize=12— start small, scale up only after frame check.PrimaryColour=&H00FFFFFF— white text (BBGGRR + alpha).OutlineColour=&H00000000— black outline.BorderStyle=1— outline only (clean over varied backgrounds). UseBorderStyle=3for an opaque box behind text when the background is busy.Outline=2— 2px outline thickness.Shadow=1— subtle drop shadow.MarginL=20 MarginR=20— keep text inside the frame.MarginV=40— vertical distance from the bottom edge.
SRT line-length discipline for burn-in
Even with correct Fontsize, lines that are too long will wrap or overflow. Keep each on-screen line ≤ ~15 Chinese characters (~42 Latin chars). Use explicit \ line breaks inside the SRT block — do not rely on auto-wrapping. Two short lines beat one long one every time. (This is upstream discipline — /wjs-translating-subtitles should already cap cues at these limits.)
Audio mixing — keep the original as a low-volume bed
A pure dub-only track sounds dubbed (because it is). Mixing the original audio at low volume under the dub gives the "professional translation" feel — you still hear the speaker's breath, emphasis, and laughter, just under the new voice.
$FF -i original.mp4 -i dub.mp4 \
-filter_complex "[0:a]volume=0.18[orig];\
[1:a]volume=1.0[dub];\
[orig][dub]amix=inputs=2:duration=longest:normalize=0[a]" \
-map 0:v -map "[a]" \
-c:v copy -c:a aac -b:a 192k mixed.mp4
Reasonable starting volumes:
- Original bed at
0.15–0.25(≈ −16 to −12 dB) - Dub at
1.0 - Use
normalize=0so amix doesn't auto-attenuate when both are active.
To drop the original entirely: --no-original-audio (equivalent to --bed-volume 0).
Combining dub + burn-in + bed (the full job)
One ffmpeg call does all three — burn the target subtitle onto the video stream and mix the two audio tracks:
$FF -i original.mp4 -i dub.mp4 \
-filter_complex "[0:v]subtitles=input.zh-CN.srt:force_style='Fontname=PingFang SC\,Fontsize=12\,PrimaryColour=&H00FFFFFF\,OutlineColour=&H00000000\,BorderStyle=1\,Outline=2\,Shadow=1\,MarginL=20\,MarginR=20\,MarginV=40'[v];\
[0:a]volume=0.18[orig];[1:a]volume=1.0[dub];\
[orig][dub]amix=inputs=2:duration=longest:normalize=0[a]" \
-map "[v]" -map "[a]" \
-c:v libx264 -crf 18 -preset medium -pix_fmt yuv420p \
-c:a aac -b:a 192k final.mp4
This is the "ship to social media" final cut. render.py --video original.mp4 --dub dub.mp4 --srt input.zh-CN.srt runs this exact pipeline.
Running render.py
# Subtitles only (burn):
python3 ~/.claude/skills/wjs-burning-subtitles/scripts/render.py \
--video IN.mp4 --srt SUB.srt --out OUT.mp4
# Dub only (replace audio, no subs):
python3 ~/.claude/skills/wjs-burning-subtitles/scripts/render.py \
--video IN.mp4 --dub IN_zh_dub.mp4 --out OUT.mp4
# Full localized cut (burn + dub + original bed):
python3 ~/.claude/skills/wjs-burning-subtitles/scripts/render.py \
--video IN.mp4 --srt IN.zh-CN.srt --dub IN_zh_dub.mp4 --out OUT.mp4
# Soft-mux (no re-encode):
python3 ~/.claude/skills/wjs-burning-subtitles/scripts/render.py \
--video IN.mp4 --srt SUB.srt --soft-mux --out OUT.mp4
See render.py --help for the full style/audio flag list (--font, --fontsize, --color, --outline-color, --margin-v, --bed-volume, --no-original-audio).
Output
- Burn mode:
\x3Csource>_burned.mp4(re-encoded, libass-rendered subs) - Soft-mux mode:
\x3Csource>_softsub.mp4(stream-copy,mov_texttrack) - Full cut:
\x3Csource>_final.mp4(re-encoded video with burned subs + mixed audio)
Anti-patterns
- ❌ Promising burn-in without verifying libass. Check
ffmpeg -filters | grep subtitlesfirst; auto-fall back to evermeet static build if missing. - ❌ Committing a burn render without a frame check. Always extract a frame at the longest-line cue and Read it before kicking off the full render.
- ❌ Bare commas inside
force_style. The filter graph parser eats them. Escape every internal comma as\,. - ❌ Mixing libass burn-in with HyperFrames captions. Pick ONE caption system per output video. If you're using HTML/CSS captions in
/wjs-overlaying-video, don't burn here too. - ❌ Using period milliseconds in the SRT. Whisper local writes
.mmm; libass tolerates it but other downstream tools choke. Normalize to,mmm. - ❌ Defaulting to
BorderStyle=3(opaque box). UseBorderStyle=1(outline only) unless the background is genuinely busy — the box looks heavy and dated.
Upstream
/wjs-transcribing-audio+/wjs-translating-subtitles— produce the SRT input./wjs-dubbing-video— produces the*_\x3Clang>_dub.mp4input for full-localized-cut mode. The dub-only file is technically a finished video; this skill is what mixes the original underneath and burns the subs to make it shippable.
Common pitfalls
- Fontsize that worked on one video looks tiny / huge on another. libass scales by PlayRes ratio, not pixels. Recalibrate per video resolution; don't trust a hardcoded value.
- Margin defaults clip text on vertical phone videos. Always set
MarginL=20 MarginR=20andMarginV=40(or higher) explicitly. mov_texttrack shows up in QuickTime but not in some Android players. If the target audience is mobile-Chinese, soft-mux is unreliable; burn instead.- Background-bus busy / contrast issues. Increase
Outline=2→Outline=3, or switch toBorderStyle=3for a translucent box (BackColour=&H80000000for 50% black).
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat:
/install wjs-burning-subtitles - After installation, invoke the skill by name or use
/wjs-burning-subtitles - Provide required inputs per the skill's parameter spec and get structured output
What is Wjs Burning Subtitles?
Use when the user has a video + an SRT and wants the subtitles either burned into the pixels (libass, always-visible) or soft-muxed as a togglable track. Als... It is an AI Agent Skill for Claude Code / OpenClaw, with 66 downloads so far.
How do I install Wjs Burning Subtitles?
Run "/install wjs-burning-subtitles" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Wjs Burning Subtitles free?
Yes, Wjs Burning Subtitles is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Wjs Burning Subtitles support?
Wjs Burning Subtitles is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).
Who created Wjs Burning Subtitles?
It is built and maintained by Jian Shuo Wang (@jianshuo); the current version is v0.1.0.