Description

Create Korean AI podcast packages from QuickView trend notes. Use for dual-host script writing (Callie × Nick), Gemini multi-speaker TTS audio generation, subtitle timing/render fixes, thumbnail+MP4 packaging, and YouTube title/description output. Supports both full (15~20 min) and compressed (5~7 min) editions.

Usage Instructions (SKILL.md)

AI Podcast Pipeline

Name: Ai Podcast Pipeline
Author: jeong-wooseok

⚠️ Security Notice

This skill may trigger antivirus false positives due to legitimate use of:

base64 decoding: Used ONLY to decode audio data from Gemini TTS API responses (standard practice for binary data in JSON)
subprocess calls: Used ONLY to invoke ffmpeg for audio/video processing
Environment variables: Reads API keys from user-configured environment (GEMINI_API_KEY)
Network requests: Calls Google Gemini API for text-to-speech generation

All code is open source and auditable in this repository. No malicious behavior.
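
As a rough illustration of the base64 step described above, the sketch below decodes inline audio from a Gemini-style JSON response. The function name `extract_audio_bytes` and the sample payload are hypothetical, and the field layout (`candidates` / `content` / `parts` / `inlineData`) is assumed from the public REST response shape, not taken from this skill's scripts.

```python
import base64


def extract_audio_bytes(response: dict) -> bytes:
    """Decode the first inline audio part of a Gemini-style JSON response."""
    parts = response["candidates"][0]["content"]["parts"]
    for part in parts:
        inline = part.get("inlineData")
        if inline and "data" in inline:
            # JSON cannot carry raw bytes, so the audio arrives as base64 text.
            return base64.b64decode(inline["data"])
    raise ValueError("no inline audio data found in response")


# Fake two-byte payload for illustration only; a real response carries audio samples.
fake_response = {
    "candidates": [
        {
            "content": {
                "parts": [
                    {
                        "inlineData": {
                            "mimeType": "audio/L16",
                            "data": base64.b64encode(b"\x00\x01").decode("ascii"),
                        }
                    }
                ]
            }
        }
    ]
}
audio = extract_audio_bytes(fake_response)
```

Nothing here touches the network or the file system; the decode is a pure data transformation, which is why it is safe despite resembling patterns that scanners flag.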

Build end-to-end podcast assets from Trend/QuickView-* content.

Core Workflow

Select source QuickView file.
Generate script (full or compressed mode).
Build dual-voice MP3 (Gemini multi-speaker, chunked for reliability).
Generate full-text Korean subtitles (no ellipsis truncation).
Render subtitle MP4 with tuned font/size/timing shift.
Build thumbnail + YouTube metadata.
Deliver final package.

Step 1) Select Source

Prefer the weekly QuickView file from your configured Quartz root.

If the user gives a wk.aiee.app URL, map it to the local Quartz markdown file first.
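
One possible shape for the URL-to-file mapping is sketched below. The function `map_url_to_local`, the rule that a URL path maps one-to-one onto a `.md` file under the root, and the sample note name are all assumptions for illustration; the skill's real mapping logic may differ.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def map_url_to_local(url: str, quartz_root: str) -> Path:
    """Map an https://wk.aiee.app/... URL to a markdown path under quartz_root."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError("only https:// URLs are accepted")
    if parsed.hostname != "wk.aiee.app":
        raise ValueError("unexpected host")
    relative = unquote(parsed.path).strip("/")
    root = Path(quartz_root).resolve()
    candidate = (root / f"{relative}.md").resolve()
    # Refuse anything that resolves outside the configured root (e.g. via "..").
    if root not in candidate.parents:
        raise ValueError("path escapes the Quartz root")
    return candidate


# Hypothetical note name, used only to show the mapping.
local_path = map_url_to_local(
    "https://wk.aiee.app/Trend/QuickView-2026-W06", "/tmp/quartz"
)
```

Rejecting plain `http://` and out-of-root paths up front keeps the later file read confined to the directory the user configured.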

Step 2) Generate Script

Read and apply:

references/podcast_prompt_template_ko.md

Modes:

Full mode: 15~20 minutes
Compressed mode: 5~7 minutes (core tips only)

Rules:

no system/meta text in spoken lines
host intro once at opening only
conversational Korean, short sentences, actionable
save script in archive/

Step 3) Build Audio (Gemini Multi-Speaker, Reliable)

Preferred: chunked builder (timeout-safe)

# Set API key via environment (required)
export GEMINI_API_KEY="<YOUR_KEY>"

# Run from skills/ai-podcast-pipeline/
python3 scripts/build_dualvoice_audio.py \
  --input <script.txt> \
  --outdir <outdir> \
  --basename podcast_full_dualvoice \
  --chunk-lines 6
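
To make the `--chunk-lines 6` idea concrete, here is a minimal sketch of grouping dialogue lines into fixed-size chunks so each TTS request stays small. The helper `chunk_lines` and the `Callie:` line format are hypothetical; how `build_dualvoice_audio.py` actually splits and rejoins audio is not shown in this document.

```python
def chunk_lines(lines, chunk_size=6):
    """Group non-empty dialogue lines into consecutive chunks of chunk_size."""
    cleaned = [line.strip() for line in lines if line.strip()]
    return [cleaned[i:i + chunk_size] for i in range(0, len(cleaned), chunk_size)]


# Fourteen placeholder lines stand in for a real script.
script = [f"Callie: line {n}" for n in range(1, 15)]
chunks = chunk_lines(script, chunk_size=6)
sizes = [len(chunk) for chunk in chunks]
```

Smaller chunks mean more requests but a lower chance that any single request hits the timeout, which is the trade-off the chunked builder is making.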

Single-pass (short scripts)

python3 scripts/gemini_multispeaker_tts.py \
  --input-file <dialogue.txt> \
  --outdir <outdir> \
  --basename podcast_dualvoice \
  --retries 3 \
  --timeout-seconds 120
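
The `--retries` flag suggests a retry loop around the TTS call. The sketch below shows one common way to do that with exponential backoff; `call_with_retries` is a hypothetical helper, and treating the retry count as the total number of attempts is an assumption, not documented behavior of `gemini_multispeaker_tts.py`.

```python
import time


def call_with_retries(fn, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn() up to `retries` times, doubling the wait after each failure."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fn()
        except Exception as exc:  # a real script would catch narrower errors
            last_error = exc
            if attempt < retries:
                sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"failed after {retries} attempts") from last_error
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting; in production the default `time.sleep` applies.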

Default voice mapping (fixed as of 2026-02-10):

Callie (female) → Kore
Nick (male) → Puck

Output: MP3 (default delivery format)

Step 4) Build Korean Subtitles (Full Text)

Use full-text subtitle builder (no ... truncation):

python3 scripts/build_korean_srt.py \
  --script <script.txt> \
  --audio <final.mp3> \
  --output <outdir>/podcast.srt \
  --max-chars 22
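
The `--max-chars 22` limit implies that long spoken lines are split into several cues without dropping text. A minimal sketch of such a splitter follows; `split_caption` and its word-boundary rule are assumptions for illustration and may not match what `build_korean_srt.py` does.

```python
def split_caption(text, max_chars=22):
    """Split text into pieces of at most max_chars, breaking at spaces when possible."""
    pieces, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            pieces.append(current)
        # A single word longer than the limit is hard-split rather than truncated.
        while len(word) > max_chars:
            pieces.append(word[:max_chars])
            word = word[max_chars:]
        current = word
    if current:
        pieces.append(current)
    return pieces


sample = "오늘은 이번 주 AI 트렌드 가운데 바로 써먹을 수 있는 핵심 팁만 골라서 정리해 드리겠습니다"
cues = split_caption(sample, max_chars=22)
```

Because every word ends up in exactly one piece, the full text survives the split, which matches the "no ellipsis truncation" goal stated above.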

Step 5) Render Subtitled MP4 (Font + Timing)

Use renderer with adjustable font and timing shift:

python3 scripts/render_subtitled_video.py \
  --image <thumbnail.png> \
  --audio <final.mp3> \
  --srt <podcast.srt> \
  --output <outdir>/final.mp4 \
  --font-name "Do Hyeon" \
  --font-size 27 \
  --shift-ms -250

Notes:

A negative --shift-ms shows subtitles earlier (use it to fix lag)
If text is clipped, lower --font-size (e.g., 25~27)
keep text inside safe area; avoid overlap with character/object
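
The effect of a timing shift can be sketched as plain arithmetic on SRT timestamps. The helpers below are hypothetical and assume the shift is applied by rewriting `HH:MM:SS,mmm` stamps with clamping at zero; the actual renderer may apply the offset differently (for example inside ffmpeg).

```python
import re

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_timestamp(stamp, shift_ms):
    """Shift one SRT timestamp by shift_ms milliseconds, never going below zero."""
    hours, minutes, seconds, millis = map(int, TIMESTAMP.fullmatch(stamp).groups())
    total = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis + shift_ms
    total = max(0, total)
    hours, rest = divmod(total, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def shift_srt(srt_text, shift_ms):
    """Apply the same shift to every timestamp found in an SRT document."""
    return TIMESTAMP.sub(lambda m: shift_timestamp(m.group(0), shift_ms), srt_text)


shifted = shift_srt("1\n00:00:01,000 --> 00:00:03,500\n안녕하세요\n", -250)
```

Note that `shift_srt` would also rewrite any caption text that happens to look like a timestamp, so a stricter parser is preferable for real files.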

Step 6) Build Thumbnail + YouTube Metadata

# Set API key via environment (required)
export GEMINI_API_KEY="<YOUR_KEY>"

python3 scripts/build_podcast_assets.py \
  --source "<QuickView path or URL>"

Reference (layout/copy guardrails):

references/thumbnail_guidelines_ko.md

Step 7) Final Delivery Checklist

Always include:

source used
final MP3 path
subtitle MP4 path + size
thumbnail path
YouTube title options (3)
YouTube description

Reliability Rules

Gemini timeout on long input: use chunked builder (build_dualvoice_audio.py)
Subtitle clipping: reduce font size and increase bottom margin
Subtitle lag: adjust --shift-ms (usually -150 to -300)
Keep generated assets under Telegram practical limits

Security Notes

API keys must be passed via environment variables (GEMINI_API_KEY), not hardcoded.
Never paste raw keys into prompts, logs, screenshots, or public posts.
Recent hardening: thumbnail generation now passes keys via env (not CLI args).
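
The env-only key handling described above might look roughly like the following. Both helpers are hypothetical sketches and not the skill's own code: one builds a child-process environment that carries the key, the other scrubs the key from text before it is logged.

```python
import os


def child_env_with_key(environ=None, var="GEMINI_API_KEY"):
    """Return a copy of the environment, failing fast if the key is missing."""
    source = os.environ if environ is None else environ
    key = source.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running")
    env = dict(source)
    env[var] = key
    return env


def redact(text, secret):
    """Replace any occurrence of the secret in text before logging it."""
    return text.replace(secret, "***") if secret else text


env = child_env_with_key({"GEMINI_API_KEY": " demo-key ", "PATH": "/usr/bin"})
log_line = redact("request failed for key demo-key", env["GEMINI_API_KEY"])
```

The returned mapping can be handed to `subprocess.run(..., env=env)`, so the key never appears in the command line, where other users could see it in process listings.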

References

references/podcast_prompt_template_ko.md
references/workflow_runbook.md
references/thumbnail_guidelines_ko.md

Safe Usage Recommendations

Before installing, be aware of these issues and take steps to reduce risk:

1) Credentials: the scripts require GEMINI_API_KEY (or NANO_BANANA_KEY) even though the skill metadata lists no env vars. Provide an API key via environment variables only (as the SKILL.md advises) and do not paste keys into prompts or logs. Understand that the same key is used for both TTS and thumbnail generation.
2) Missing declared dependencies: the package invokes external binaries (ffmpeg, ffprobe), Python libraries (Pillow), and a 'uv' runner to call a separate nano-banana-pro script; those binaries and the font assets are not declared in metadata. Ensure you have ffmpeg/ffprobe installed, Python and packages available, and verify what 'uv run' refers to on your system before running.
3) Local-path assumptions: the code expects a QUARTZ_ROOT path and attempts to map HTTPS QuickView URLs to local markdown under that root; confirm that the default path is appropriate for your environment or set QUARTZ_ROOT to a safe directory. The build_podcast_assets script also expects WORKSPACE_DIR/skills/nano-banana-pro and youtube-editor fonts to exist; verify those paths or pass --no-image if you want to skip image generation.
4) Cross-skill invocation: the thumbnail step calls another skill/script (nano-banana-pro). Verify you trust that other code before letting this skill run it.
5) Audit recommended: because metadata is incomplete and the skill will send data to Google Gemini (network requests), review the included scripts yourself (or run in an isolated/test environment) to confirm they meet your privacy/security requirements. If you plan to use a real Gemini API key, consider quota/cost and rotate keys if you suspect misuse.

Taken together, these mismatches (undisclosed env vars and undeclared binary/config dependencies, plus cross-skill execution) make the package suspicious rather than plainly benign; resolve or validate these points before use.
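
Since the dependencies are undeclared, a small preflight check can confirm they exist before any script runs. The sketch below is a hypothetical helper using only the standard library; the tool and module names come from the list above, and the function names are illustrative.

```python
import importlib.util
import shutil


def missing_tools(names=("ffmpeg", "ffprobe", "uv")):
    """Return the command-line tools from names that are not on PATH."""
    return [name for name in names if shutil.which(name) is None]


def missing_modules(names=("PIL",)):
    """Return the importable module names that cannot be found (PIL = Pillow)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def preflight_report():
    """Summarize what is missing without running any of the tools."""
    return {"tools": missing_tools(), "modules": missing_modules()}
```

Calling `preflight_report()` in a test environment gives a quick view of what still needs installing, without invoking ffmpeg, uv, or any network request.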

Functional Analysis

Type: OpenClaw Skill
Name: ai-podcast-pipeline
Version: 0.1.5

The skill bundle is designed for creating AI podcast packages, utilizing Google Gemini for TTS and ffmpeg for audio/video processing. All code aligns with the stated purpose, and explicit security measures are in place, such as reading API keys from environment variables (`GEMINI_API_KEY`) and blocking insecure HTTP URLs in `scripts/build_podcast_assets.py`. The use of `subprocess.run` is extensive but appears to handle inputs carefully, with `ff_escape` for ffmpeg filter strings and internal path construction. There is no evidence of data exfiltration, persistence mechanisms, or malicious prompt injection attempts in `SKILL.md` or other documentation. The security notices and hardening efforts indicate a focus on transparency and secure practices.

Capability Assessment

⚠ Purpose & Capability

The code and SKILL.md implement Korean dual-voice TTS, subtitle generation, thumbnail composition, and packaging, which is consistent with the stated purpose. However, the registry metadata declares no required environment variables, no config paths, and no required binaries, while the code clearly expects a GEMINI_API_KEY (or NANO_BANANA_KEY), a QUARTZ_ROOT local path, and other workspace files (e.g., skills/nano-banana-pro/scripts/generate_image.py, youtube-editor fonts). This omission is an inconsistency: the skill needs keys, fonts, and external scripts to function, yet declares none of them.

ℹ Instruction Scope

Runtime instructions are focused on the stated workflow and reference the included scripts. They instruct reading local QuickView markdown, using GEMINI TTS, invoking ffmpeg/ffprobe, and calling a separate 'nano-banana-pro' image generator via 'uv run'. These actions are within the skill's purpose, but the instructions assume access to local workspace paths and another skill's script (cross-skill invocation), and they instruct supplying API keys via environment variables even though the registry metadata didn't list them.

✓ Install Mechanism

There is no install spec (instruction-only with included scripts). No remote downloads or archive extraction are present in the manifest, which keeps install risk low. The scripts will be written to disk as part of the skill bundle, but nothing is fetched from unknown URLs during install.

⚠ Credentials

The code legitimately needs one API credential (GEMINI_API_KEY or NANO_BANANA_KEY) for Gemini TTS/image generation and it reads environment variables. That is proportionate to the stated functionality, but the metadata declares no required env vars. The skill also implicitly requires local configuration (QUARTZ_ROOT) and access to other workspace skill scripts and font assets — these config path requirements were not declared. The absence of declared credentials/configs is misleading and could cause accidental key exposure if users are not alerted.

✓ Persistence & Privilege

The skill does not request always: true and does not modify other skills' configurations. It writes outputs to workspace/media directories and calls other scripts, but it does not request privileged/system-wide persistence. Autonomous invocation is enabled by default (normal for skills) and is not in itself a disqualifier.

Version History

v0.1.5

Security: Added notice for VirusTotal false positives

v0.1.4

Security: Remove .env refs, block http://, use relative paths only

v0.1.1

Security hardening: API key via env (not CLI), subprocess timeouts, no hardcoded paths, safe file-size display, removed sensitive examples from docs.

v0.1.0

Initial release: Korean dual-host podcast package pipeline (script, Gemini multi-speaker TTS, subtitle MP4, thumbnail, YouTube metadata).

Metadata

Slug ai-podcast-pipeline

Version 0.1.5

License (not specified)

Total installs 1

Current installs 1

Version count 4

FAQ