Description

Use when the user has a video + an SRT and wants the subtitles either burned into the pixels (libass, always-visible) or soft-muxed as a togglable track. Als...

Usage instructions (SKILL.md)

wjs-burning-subtitles

Name: Wjs Burning Subtitles
Author: jianshuo

Video + SRT → video with subtitles. Also the final-encode stage for the localization pipeline: takes a video, an optional dub track from /wjs-dubbing-video, and an optional SRT to burn, and produces the upload-ready MP4 in one ffmpeg pass. No cascade of decodes/re-encodes.

When to use

User has an SRT and wants it always visible on the video: burn-in for 微信视频号 (WeChat Channels), 抖音 (Douyin), and WeChat, whose players won't honor embedded subtitle tracks.
User wants a togglable subtitle track (soft-mux) for QuickTime / VLC / IINA / mobile players that support mov_text.
Final composite after /wjs-dubbing-video: burn target-language subs + mix dub over original-as-bed in one encode.

When NOT to use

No SRT yet → run /wjs-transcribing-audio then /wjs-translating-subtitles first.
HTML/CSS captions (kinetic, per-word highlights, custom fonts) on a clip composed in HyperFrames → use /wjs-overlaying-video instead. Don't mix libass burn-in with HyperFrames captions on the same output.
The "subtitles" are actually motion graphics (animated callouts, lower-thirds with logos, kinetic typography) → that's /wjs-overlaying-video, not this skill.

The 3 modes of `render.py`

scripts/render.py auto-detects mode from flags:

Subtitles only — --video + --srt → re-encodes video with burned subs, original audio passes through.
Dub only — --video + --dub → keeps original video stream; replaces or mixes the audio track.
Full localized cut — --video + --srt + --dub → burns subs AND mixes dub. By default keeps original audio at low volume as a "bed" under the dub (set --bed-volume 0 or --no-original-audio to drop it).
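The flag-to-mode mapping above can be sketched as a small function. This is an illustrative reconstruction, not the actual code in render.py: the function name `detect_mode`, the mode labels, and the error handling are all assumptions.

```python
from typing import Optional


def detect_mode(video: str, srt: Optional[str] = None,
                dub: Optional[str] = None, soft_mux: bool = False) -> str:
    """Return a mode label from the flags (hypothetical names, not render.py's)."""
    if not video:
        raise ValueError("--video is required")
    if soft_mux:
        # Soft-mux only makes sense when there is a subtitle file to mux.
        if not srt:
            raise ValueError("--soft-mux needs --srt")
        return "soft-mux"
    if srt and dub:
        return "full-cut"   # burn subs and mix the dub
    if srt:
        return "burn"       # subtitles only
    if dub:
        return "dub-only"   # audio replaced or mixed, video untouched
    raise ValueError("need --srt and/or --dub")


print(detect_mode("IN.mp4", srt="SUB.srt", dub="IN_zh_dub.mp4"))
```

Checking `srt and dub` before either flag alone keeps the full localized cut from being mistaken for one of the simpler modes.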

Burn-in requires an ffmpeg built with libass. The script auto-downloads a static libass-enabled build from evermeet.cx into /tmp/ff_bin/ on first use if needed.

Soft-mux (togglable subtitle track)

Player apps can show/hide. Works with any ffmpeg build — does not need libass:

ffmpeg -i input.mp4 -i input.zh-CN.srt \
  -map 0:v -map 0:a -map 1:0 \
  -c:v copy -c:a copy -c:s mov_text \
  -metadata:s:s:0 language=zho -metadata:s:s:0 title="中文" \
  output.mp4

This is fast (stream-copy) and reversible. Use it when:

Target platform supports embedded subs (YouTube auto-detects; VLC/QuickTime honors).
User wants viewers to be able to toggle off.
You don't want to re-encode the video.

render.py --video IN.mp4 --srt SUB.srt --soft-mux runs this path.

Hardcoded burn-in (always visible, libass)

Required for WeChat, 抖音 (Douyin), 朋友圈 (WeChat Moments), and similar platforms where the player will not honor embedded subtitle tracks.

Verify libass is available BEFORE promising burn-in

ffmpeg -filters 2>&1 | grep -E "subtitles|^.. ass "

If neither subtitles nor ass shows up, the build lacks libass. Homebrew's default ffmpeg formula is often stripped (no --enable-libass, no --enable-libfreetype, no drawtext). On such a build, don't waste time adjusting the comma-escaping inside force_style: the command fails with No such filter: 'subtitles' no matter how the shell quotes it.
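The same check can be scripted. The sketch below is an assumption about how such a probe might look, not render.py's actual implementation; the helper names `has_subtitle_filter` and `ffmpeg_has_libass` are hypothetical. It parses the `ffmpeg -filters` listing and looks for a row whose filter name is subtitles or ass.

```python
import subprocess


def has_subtitle_filter(filters_output: str) -> bool:
    """True if the `ffmpeg -filters` listing names the subtitles or ass filter."""
    for line in filters_output.splitlines():
        parts = line.split()
        # Listing rows look like: "<flags> <name> <io> <description>"
        if len(parts) >= 2 and parts[1] in ("subtitles", "ass"):
            return True
    return False


def ffmpeg_has_libass(ffmpeg: str = "ffmpeg") -> bool:
    """Run `<ffmpeg> -filters` and report whether a libass-backed filter exists."""
    try:
        result = subprocess.run([ffmpeg, "-hide_banner", "-filters"],
                                capture_output=True, text=True, check=False)
    except FileNotFoundError:
        return False  # no such binary on this machine
    return has_subtitle_filter(result.stdout + result.stderr)
```

Matching on the name column rather than grepping the whole line avoids false positives from other filters whose descriptions merely mention subtitles.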

Fastest fix on macOS — drop in a static build, no system changes

curl -fsSL -o /tmp/ff.zip https://evermeet.cx/ffmpeg/getrelease/zip
unzip -o /tmp/ff.zip -d /tmp/ff_bin >/dev/null
FF=/tmp/ff_bin/ffmpeg
$FF -version | grep -oE -- "--enable-(libass|libfreetype)"

Then use $FF instead of ffmpeg for the render. The brew binary is fine for everything else (probe, audio extraction, soft-mux). render.py does this auto-fallback if its default ffmpeg lacks libass.

Burn-in render with style overrides

🛑 Checkpoint — confirm before full-render. Burn-in re-encodes the entire video (minutes of CPU on a 5-min clip). Before kicking it off:

Render only the first 30s with -t 30 for a fast preview.
Extract a frame from the longest-line cue (see Fontsize calibration below) and Read it.
Show the user the preview frame + the cue text, ask: "字号/字体/边距 OK 吗？OK 才跑全片。" ("Are the font size, font, and margins OK? The full render runs only after you confirm.") Wait for explicit confirmation.

Skip the checkpoint only if the user has already approved a full render of this exact video at this exact font config in the same conversation.

$FF -i input.mp4 \
  -vf "subtitles=input.zh-CN.srt:force_style='Fontname=PingFang SC\,Fontsize=12\,PrimaryColour=&H00FFFFFF\,OutlineColour=&H00000000\,BorderStyle=1\,Outline=2\,Shadow=1\,MarginL=20\,MarginR=20\,MarginV=40'" \
  -c:v libx264 -crf 18 -preset medium -pix_fmt yuv420p \
  -c:a copy output.mp4

Inside force_style, escape every comma as \, (the filter graph parser eats the bare comma as a chain separator). All other special chars are fine.
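When the filter string is built in code rather than typed by hand, the escaping can be applied in one place. A minimal sketch, assuming the SRT path contains no characters that need filtergraph escaping of their own; `build_subtitles_filter` is a hypothetical helper name.

```python
def build_subtitles_filter(srt_path: str, style: dict) -> str:
    """Build a subtitles=... filter string with every style comma escaped as \\,"""
    pairs = [f"{key}={value}" for key, value in style.items()]
    # Join with backslash-comma so the filter graph parser does not split here.
    force_style = "\\,".join(pairs)
    return f"subtitles={srt_path}:force_style='{force_style}'"


style = {"Fontname": "PingFang SC", "Fontsize": 12, "MarginV": 40}
print(build_subtitles_filter("input.zh-CN.srt", style))
```

Passing the result as one element of an argument list (for example to `subprocess.run`) sidesteps shell quoting entirely, so only the filtergraph-level escaping remains.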

Fontsize calibration — critical

libass scales its internal PlayRes up to the actual video resolution. The number you pass is not pixels in the output. As a starting calibration on a 544×960 vertical phone video, Fontsize=22 rendered each Chinese character at ~55px wide and overflowed the frame, while Fontsize=12 rendered at ~30–35px wide and fit cleanly with 15-char lines.

Rule of thumb: start at Fontsize=12, render, then always extract a frame and look:

$FF -ss 30 -i output.mp4 -frames:v 1 /tmp/frame.png -y
# then Read /tmp/frame.png to verify the longest-line cue fits

Pick a timestamp that lands on the cue with the most characters per line — short lines won't expose overflow. Add MarginL=20 MarginR=20 as a safety inset; never trust default left/right margins.
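The scaling can be estimated before rendering. The sketch below assumes that ffmpeg converts SRT input to an ASS script with a height (PlayResY) of 288 and that libass scales font size by video height divided by that value. Neither figure comes from this document, so treat the result as a rough first guess and still do the frame check.

```python
ASSUMED_PLAY_RES_Y = 288  # assumption: script height ffmpeg uses for SRT input


def approx_pixel_size(fontsize: float, video_height: int,
                      play_res_y: int = ASSUMED_PLAY_RES_Y) -> float:
    """Nominal font height in output pixels under the assumed PlayResY."""
    return fontsize * video_height / play_res_y


for size in (12, 22):
    print(size, round(approx_pixel_size(size, 960), 1))
```

On the 544×960 example this gives a nominal 40 px for Fontsize=12 and about 73 px for Fontsize=22. Glyphs render somewhat narrower than the nominal height, which is in line with the ~30–35 px and ~55 px widths observed above.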

Style cheatsheet

Keys that matter (libass force_style):

Fontname=PingFang SC — macOS default CJK; alternates: Songti SC, Heiti SC, STHeiti, Hiragino Sans GB.
Fontsize=12 — start small, scale up only after frame check.
PrimaryColour=&H00FFFFFF — white text (ASS colour order is &HAABBGGRR: alpha byte first, where 00 is fully opaque, then blue, green, red).
OutlineColour=&H00000000 — black outline.
BorderStyle=1 — outline only (clean over varied backgrounds). Use BorderStyle=3 for an opaque box behind text when the background is busy.
Outline=2 — outline width of 2 in script units (scaled with the video like Fontsize, not literal output pixels).
Shadow=1 — subtle drop shadow.
MarginL=20 MarginR=20 — keep text inside the frame.
MarginV=40 — vertical distance from the bottom edge.
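Because ASS colours put alpha first and reverse the usual RGB order, converting from a web-style hex colour is easy to get wrong by hand. A small sketch; `ass_colour` is a hypothetical helper, not part of render.py.

```python
def ass_colour(rgb_hex: str, alpha: int = 0) -> str:
    """Convert 'RRGGBB' (optionally with '#') to ASS '&HAABBGGRR'.

    alpha follows the ASS convention: 0 is opaque, 255 is fully transparent.
    """
    rgb_hex = rgb_hex.lstrip("#")
    if len(rgb_hex) != 6:
        raise ValueError("expected a 6-digit RRGGBB value")
    int(rgb_hex, 16)  # raises ValueError on non-hex input
    if not 0 <= alpha <= 255:
        raise ValueError("alpha must be in 0..255")
    rr, gg, bb = rgb_hex[0:2], rgb_hex[2:4], rgb_hex[4:6]
    return f"&H{alpha:02X}{bb}{gg}{rr}".upper()


print(ass_colour("#FFFFFF"))            # white text
print(ass_colour("000000", alpha=128))  # half-transparent black
```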

SRT line-length discipline for burn-in

Even with correct Fontsize, lines that are too long will wrap or overflow. Keep each on-screen line ≤ ~15 Chinese characters (~42 Latin chars). Use explicit line breaks inside the SRT cue (a real newline in the cue's text block); do not rely on auto-wrapping. Two short lines beat one long one every time. (This is upstream discipline: /wjs-translating-subtitles should already cap cues at these limits.)
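A quick pre-flight check can flag cues that break these limits before any render starts. This is a sketch under simplifying assumptions: it treats East Asian wide and fullwidth characters as "Chinese-width" and everything else as Latin-width, and it expects a well-formed SRT. `overlong_lines` is a hypothetical helper name.

```python
import unicodedata

WIDE_UNITS = 42       # cost of one wide (CJK / fullwidth) character
NARROW_UNITS = 15     # cost of one narrow (Latin) character
MAX_UNITS = 15 * 42   # budget: 15 wide characters, or 42 narrow ones


def line_units(text: str) -> int:
    """Width of a line in integer units, so the comparison has no float error."""
    return sum(WIDE_UNITS if unicodedata.east_asian_width(ch) in ("W", "F")
               else NARROW_UNITS for ch in text)


def overlong_lines(srt_text: str, max_units: int = MAX_UNITS) -> list:
    """Return (cue_index, line) for every text line wider than the budget."""
    found = []
    for block in srt_text.replace("\r\n", "\n").strip().split("\n\n"):
        lines = block.split("\n")
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # not a normal index / timing / text block
        for text in lines[2:]:
            if line_units(text) > max_units:
                found.append((lines[0].strip(), text))
    return found
```

Mixed Chinese and Latin lines are handled proportionally, so a line of 10 Chinese characters plus 14 Latin characters uses exactly the full budget.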

Audio mixing — keep the original as a low-volume bed

A pure dub-only track sounds dubbed (because it is). Mixing the original audio at low volume under the dub gives the "professional translation" feel — you still hear the speaker's breath, emphasis, and laughter, just under the new voice.

$FF -i original.mp4 -i dub.mp4 \
  -filter_complex "[0:a]volume=0.18[orig];\
                   [1:a]volume=1.0[dub];\
                   [orig][dub]amix=inputs=2:duration=longest:normalize=0[a]" \
  -map 0:v -map "[a]" \
  -c:v copy -c:a aac -b:a 192k mixed.mp4

Reasonable starting volumes:

Original bed at 0.15–0.25 (≈ −16 to −12 dB)
Dub at 1.0
Use normalize=0 so amix doesn't auto-attenuate when both are active.
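The dB figures follow from the standard amplitude relation dB = 20·log10(gain). A short sketch for converting in either direction when tuning the bed level; the function names are illustrative.

```python
import math


def gain_to_db(gain: float) -> float:
    """Linear volume factor (as passed to volume=) to decibels."""
    if gain <= 0:
        raise ValueError("gain must be positive; 0 means silence")
    return 20.0 * math.log10(gain)


def db_to_gain(db: float) -> float:
    """Decibels back to a linear volume factor."""
    return 10.0 ** (db / 20.0)


for g in (0.15, 0.18, 0.25):
    print(g, round(gain_to_db(g), 1))
```

The default bed of 0.18 used in the commands above works out to roughly −14.9 dB, in the middle of the suggested range.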

To drop the original entirely: --no-original-audio (equivalent to --bed-volume 0).

Combining dub + burn-in + bed (the full job)

One ffmpeg call does all three — burn the target subtitle onto the video stream and mix the two audio tracks:

$FF -i original.mp4 -i dub.mp4 \
  -filter_complex "[0:v]subtitles=input.zh-CN.srt:force_style='Fontname=PingFang SC\,Fontsize=12\,PrimaryColour=&H00FFFFFF\,OutlineColour=&H00000000\,BorderStyle=1\,Outline=2\,Shadow=1\,MarginL=20\,MarginR=20\,MarginV=40'[v];\
                   [0:a]volume=0.18[orig];[1:a]volume=1.0[dub];\
                   [orig][dub]amix=inputs=2:duration=longest:normalize=0[a]" \
  -map "[v]" -map "[a]" \
  -c:v libx264 -crf 18 -preset medium -pix_fmt yuv420p \
  -c:a aac -b:a 192k final.mp4

This is the "ship to social media" final cut. render.py --video original.mp4 --dub dub.mp4 --srt input.zh-CN.srt runs this exact pipeline.
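For reference, here is one way the full-cut command could be assembled as an argument list. It is a sketch, not render.py's actual code: `full_cut_args` and its parameters are hypothetical, and `force_style` is assumed to be already comma-escaped. With a bed volume of 0 the original audio is dropped and the dub is mapped directly.

```python
def full_cut_args(video: str, dub: str, srt: str, out: str, force_style: str,
                  bed_volume: float = 0.18, ffmpeg: str = "ffmpeg") -> list:
    """Return the ffmpeg argv for burn-in plus dub, with an optional bed."""
    graph = f"[0:v]subtitles={srt}:force_style='{force_style}'[v]"
    if bed_volume > 0:
        # Original audio as a quiet bed under the full-volume dub.
        graph += (f";[0:a]volume={bed_volume}[orig];[1:a]volume=1.0[dub];"
                  "[orig][dub]amix=inputs=2:duration=longest:normalize=0[a]")
        audio_map = "[a]"
    else:
        audio_map = "1:a"  # no bed: use the dub's audio stream as-is
    return [ffmpeg, "-i", video, "-i", dub,
            "-filter_complex", graph,
            "-map", "[v]", "-map", audio_map,
            "-c:v", "libx264", "-crf", "18", "-preset", "medium",
            "-pix_fmt", "yuv420p",
            "-c:a", "aac", "-b:a", "192k", out]


args = full_cut_args("original.mp4", "dub.mp4", "input.zh-CN.srt", "final.mp4",
                     "Fontname=PingFang SC\\,Fontsize=12")
print(" ".join(args))
```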

Running `render.py`

# Subtitles only (burn):
python3 ~/.claude/skills/wjs-burning-subtitles/scripts/render.py \
    --video IN.mp4 --srt SUB.srt --out OUT.mp4

# Dub only (replace audio, no subs):
python3 ~/.claude/skills/wjs-burning-subtitles/scripts/render.py \
    --video IN.mp4 --dub IN_zh_dub.mp4 --out OUT.mp4

# Full localized cut (burn + dub + original bed):
python3 ~/.claude/skills/wjs-burning-subtitles/scripts/render.py \
    --video IN.mp4 --srt IN.zh-CN.srt --dub IN_zh_dub.mp4 --out OUT.mp4

# Soft-mux (no re-encode):
python3 ~/.claude/skills/wjs-burning-subtitles/scripts/render.py \
    --video IN.mp4 --srt SUB.srt --soft-mux --out OUT.mp4

See render.py --help for the full style/audio flag list (--font, --fontsize, --color, --outline-color, --margin-v, --bed-volume, --no-original-audio).

Output

Burn mode: <source>_burned.mp4 (re-encoded, libass-rendered subs)
Soft-mux mode: <source>_softsub.mp4 (stream-copy, mov_text track)
Full cut: <source>_final.mp4 (re-encoded video with burned subs + mixed audio)

Anti-patterns

❌ Promising burn-in without verifying libass. Check ffmpeg -filters | grep subtitles first; auto-fall back to evermeet static build if missing.
❌ Committing a burn render without a frame check. Always extract a frame at the longest-line cue and Read it before kicking off the full render.
❌ Bare commas inside force_style. The filter graph parser eats them. Escape every internal comma as \,.
❌ Mixing libass burn-in with HyperFrames captions. Pick ONE caption system per output video. If you're using HTML/CSS captions in /wjs-overlaying-video, don't burn here too.
❌ Using period milliseconds in the SRT. Whisper local writes .mmm; libass tolerates it but other downstream tools choke. Normalize to ,mmm.
❌ Defaulting to BorderStyle=3 (opaque box). Use BorderStyle=1 (outline only) unless the background is genuinely busy — the box looks heavy and dated.
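The period-millisecond fix from the list above is a one-regex job. A minimal sketch that rewrites only timing lines, so cue text containing something like "1.000" is left alone; `normalize_srt_timestamps` is a hypothetical helper name.

```python
import re

# HH:MM:SS.mmm, with the period captured separately from the digits.
_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2})\.(\d{3})")


def normalize_srt_timestamps(srt_text: str) -> str:
    """Rewrite HH:MM:SS.mmm as HH:MM:SS,mmm on timing lines only."""
    fixed = []
    for line in srt_text.splitlines(keepends=True):
        if "-->" in line:  # timing lines are the ones with the arrow
            line = _TIMESTAMP.sub(r"\1,\2", line)
        fixed.append(line)
    return "".join(fixed)


sample = "1\n00:00:01.000 --> 00:00:02.500\nHello\n"
print(normalize_srt_timestamps(sample))
```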

Upstream

/wjs-transcribing-audio + /wjs-translating-subtitles — produce the SRT input.
/wjs-dubbing-video — produces the *_<lang>_dub.mp4 input for full-localized-cut mode. The dub-only file is technically a finished video; this skill is what mixes the original underneath and burns the subs to make it shippable.

Common pitfalls

Fontsize that worked on one video looks tiny / huge on another. libass scales by PlayRes ratio, not pixels. Recalibrate per video resolution; don't trust a hardcoded value.
Margin defaults clip text on vertical phone videos. Always set MarginL=20 MarginR=20 and MarginV=40 (or higher) explicitly.
mov_text track shows up in QuickTime but not in some Android players. If the target audience is mobile viewers in China, soft-mux is unreliable; burn instead.
Busy background / contrast issues. Increase Outline=2 → Outline=3, or switch to BorderStyle=3 for a translucent box (BackColour=&H80000000 for 50% black).

Safe-use recommendations

Review before installing. The subtitle-rendering behavior is legitimate, but for safer use, install a trusted libass-enabled ffmpeg yourself and avoid letting the skill auto-download one. If you do use the auto-download, clear or inspect /tmp/ff_bin/ffmpeg first and write to an output path that cannot overwrite anything important.

Functional analysis

Type: OpenClaw Skill
Name: wjs-burning-subtitles
Version: 0.1.0

The skill includes a Python script (scripts/render.py) and instructions (SKILL.md) that automatically download and execute a static FFmpeg binary from an external third-party URL (evermeet.cx) if the local environment lacks libass support. While this behavior is documented and serves the stated purpose of burning subtitles, downloading and running unverified binaries from the internet is a high-risk pattern. No evidence of intentional malice, data exfiltration, or persistence was found.

Capability assessment

ℹ Purpose & Capability

The core behavior is coherent with the stated purpose: it uses ffmpeg to burn subtitles, soft-mux subtitles, and mix dub/original audio.

ℹ Instruction Scope

The SKILL.md discloses the ffmpeg/libass requirement and recommends a preview checkpoint before full renders, but the runtime fallback download happens automatically once the script decides libass is needed.

⚠ Install Mechanism

There is no install spec or declared required binary, yet the included script downloads a ZIP from evermeet.cx at runtime, extracts it, chmods the ffmpeg binary, and later executes it without checksum or signature verification.
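One way to close this gap is to pin a digest and refuse to use a download that does not match. The sketch below is a suggestion, not something the skill does today. The expected SHA-256 value must come from a source you trust, and `verify_download` is a hypothetical helper name.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large binaries are fine."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """True only if the file's digest matches the pinned value."""
    expected = expected_sha256.strip().lower()
    return hmac.compare_digest(sha256_of(path), expected)
```

A caller would check `verify_download("/tmp/ff.zip", PINNED_DIGEST)` before unzipping, where `PINNED_DIGEST` is a placeholder for a value recorded out of band. The same check can be applied to a cached /tmp/ff_bin/ffmpeg before each run.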

⚠ Credentials

Running ffmpeg is proportionate for this skill, but automatically executing an internet-downloaded binary is broader environmental authority than ordinary video rendering.

⚠ Persistence & Privilege

The script caches and trusts /tmp/ff_bin/ffmpeg if it already exists, which can persist across runs and is not validated before execution.

Version history

v0.1.0

Initial release

Metadata

Slug wjs-burning-subtitles

Version 0.1.0

License MIT-0

Total installs 0

Current installs 0

Number of versions 1

Frequently asked questions