jeong-wooseok

Ai Podcast Pipeline

Author: jeong-wooseok · GitHub ↗ · v0.1.5
cross-platform ⚠ suspicious
Total downloads: 1382
Favorites: 0
Current installs: 1
Versions: 4
Install in OpenClaw
/install ai-podcast-pipeline
Description
Create Korean AI podcast packages from QuickView trend notes. Use for dual-host script writing (Callie × Nick), Gemini multi-speaker TTS audio generation, subtitle timing/render fixes, thumbnail+MP4 packaging, and YouTube title/description output. Supports both full (15~20 min) and compressed (5~7 min) editions.
Usage instructions (SKILL.md)

AI Podcast Pipeline

⚠️ Security Notice

This skill may trigger antivirus false positives due to legitimate use of:

  • base64 decoding: Used ONLY to decode audio data from Gemini TTS API responses (standard practice for binary data in JSON)
  • subprocess calls: Used ONLY to invoke ffmpeg for audio/video processing
  • Environment variables: Reads API keys from user-configured environment (GEMINI_API_KEY)
  • Network requests: Calls Google Gemini API for text-to-speech generation

All code is open source and auditable in this repository. No malicious behavior.
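For readers auditing the base64 claim, the following is a minimal sketch of what decoding binary audio out of a JSON response generally looks like. The payload shape and the field name `audio_b64` are hypothetical placeholders, not the actual Gemini response schema or the code in this skill's scripts.

```python
import base64
import json


def decode_audio_field(response_text: str, field: str = "audio_b64") -> bytes:
    """Decode a base64-encoded audio field from a JSON document.

    The field name is a hypothetical placeholder; a real API response
    may nest the data differently.
    """
    payload = json.loads(response_text)
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(payload[field], validate=True)


# Round-trip demo with fake bytes standing in for audio data.
fake_audio = b"RIFF\x00\x00\x00\x00WAVE"
response = json.dumps(
    {"audio_b64": base64.b64encode(fake_audio).decode("ascii")}
)
decoded = decode_audio_field(response)
```

Binary data cannot be embedded in JSON directly, which is why a decode step of this kind appears in any client that receives audio over a JSON API.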

Build end-to-end podcast assets from Trend/QuickView-* content.

Core Workflow

  1. Select source QuickView file.
  2. Generate script (full or compressed mode).
  3. Build dual-voice MP3 (Gemini multi-speaker, chunked for reliability).
  4. Generate full-text Korean subtitles (no ellipsis truncation).
  5. Render subtitle MP4 with tuned font/size/timing shift.
  6. Build thumbnail + YouTube metadata.
  7. Deliver final package.
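As a rough illustration of how steps 3 to 5 chain together, the sketch below only builds the argument lists, using the script names and flags documented in the step sections of this SKILL.md. The helper `build_commands` is hypothetical, and the assumption that the audio builder writes `<outdir>/<basename>.mp3` is not confirmed by the source.

```python
from typing import List


def build_commands(script: str, outdir: str, thumbnail: str,
                   shift_ms: int = -250) -> List[List[str]]:
    """Return argv lists for steps 3-5 (audio, subtitles, MP4 render)."""
    # Assumption: the audio builder writes <outdir>/<basename>.mp3.
    mp3 = f"{outdir}/podcast_full_dualvoice.mp3"
    srt = f"{outdir}/podcast.srt"
    return [
        ["python3", "scripts/build_dualvoice_audio.py", "--input", script,
         "--outdir", outdir, "--basename", "podcast_full_dualvoice",
         "--chunk-lines", "6"],
        ["python3", "scripts/build_korean_srt.py", "--script", script,
         "--audio", mp3, "--output", srt, "--max-chars", "22"],
        ["python3", "scripts/render_subtitled_video.py", "--image", thumbnail,
         "--audio", mp3, "--srt", srt, "--output", f"{outdir}/final.mp4",
         "--font-name", "Do Hyeon", "--font-size", "27",
         "--shift-ms", str(shift_ms)],
    ]


commands = build_commands("archive/script.txt", "out", "out/thumbnail.png")
```

Each list could be passed to `subprocess.run(..., check=True)` from the skill directory; building the lists separately keeps the chain easy to inspect before anything runs.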

Step 1) Select Source

Prefer weekly QuickView file from your configured Quartz root.

If user gives wk.aiee.app URL, map to local Quartz markdown first.
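One way the URL-to-file mapping could work is sketched below. It assumes, hypothetically, that the URL path mirrors the folder layout under the Quartz root; the function name `map_url_to_markdown` and the example file name are illustrative and not taken from the skill's code.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def map_url_to_markdown(url: str, quartz_root: str) -> Path:
    """Map an https QuickView URL to a markdown path under quartz_root.

    Assumes the URL path mirrors the Quartz content folder layout.
    """
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError("only https:// URLs are accepted")
    root = Path(quartz_root).resolve()
    relative = unquote(parsed.path).strip("/")
    candidate = (root / f"{relative}.md").resolve()
    # Refuse paths that escape the root (for example via "..").
    if not candidate.is_relative_to(root):
        raise ValueError("URL resolves outside the Quartz root")
    return candidate


local_path = map_url_to_markdown(
    "https://wk.aiee.app/Trend/QuickView-01", "/tmp/quartz"
)
```

The containment check matters because the mapped path is later read from disk; `Path.is_relative_to` needs Python 3.9 or newer.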

Step 2) Generate Script

Read and apply:

  • references/podcast_prompt_template_ko.md

Modes:

  • Full mode: 15~20 minutes
  • Compressed mode: 5~7 minutes (core tips only)

Rules:

  • no system/meta text in spoken lines
  • host intro once at opening only
  • conversational Korean, short sentences, actionable
  • save script in archive/

Step 3) Build Audio (Gemini Multi-Speaker, Reliable)

Preferred: chunked builder (timeout-safe)

# Set API key via environment (required)
export GEMINI_API_KEY="<YOUR_KEY>"

# Run from skills/ai-podcast-pipeline/
python3 scripts/build_dualvoice_audio.py \
  --input <script.txt> \
  --outdir <outdir> \
  --basename podcast_full_dualvoice \
  --chunk-lines 6
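The idea behind `--chunk-lines 6` can be illustrated with a short sketch: split the dialogue into small groups so each TTS request stays short. This is an assumed reading of the flag, not the logic of `build_dualvoice_audio.py` itself, and `chunk_lines` is a hypothetical helper.

```python
from typing import Iterator, List


def chunk_lines(lines: List[str], chunk_size: int = 6) -> Iterator[List[str]]:
    """Yield consecutive groups of at most chunk_size non-empty lines."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    spoken = [line for line in lines if line.strip()]
    for start in range(0, len(spoken), chunk_size):
        yield spoken[start:start + chunk_size]


# 14 dialogue lines plus one blank line that is skipped.
demo_lines = [f"Callie: line {i}" for i in range(14)] + [""]
chunks = list(chunk_lines(demo_lines))
```

Smaller chunks mean more requests but a lower chance that any single request hits a timeout; the per-chunk audio would then be concatenated, for example with ffmpeg.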

Single-pass (short scripts)

python3 scripts/gemini_multispeaker_tts.py \
  --input-file <dialogue.txt> \
  --outdir <outdir> \
  --basename podcast_dualvoice \
  --retries 3 \
  --timeout-seconds 120
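A retry setting such as `--retries 3` is commonly implemented as a loop with backoff. The sketch below is one such loop under the assumption that "3 retries" means three additional attempts after the first failure; the script's actual semantics and delays are not documented here, and `call_with_retries` is a hypothetical helper.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], retries: int = 3,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on any exception with exponential backoff."""
    if retries < 0:
        raise ValueError("retries must be non-negative")
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")


# Demo: a call that fails twice, then succeeds; delays are recorded, not slept.
attempts = {"count": 0}
delays = []


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


result = call_with_retries(flaky, retries=3, sleep=delays.append)
```

Injecting `sleep` keeps the demo instant; in real use the default `time.sleep` applies.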

Default voice mapping (2026-02-10 fixed):

  • Callie (female) → Kore
  • Nick (male) → Puck

Output: MP3 (default delivery format)
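The voice mapping above can be written as a lookup table. The sketch assumes, hypothetically, that each script line has the form `Speaker: text`; the real dialogue format expected by the scripts is not specified in this document.

```python
from typing import Tuple

# Host-to-voice mapping as listed in this document.
VOICE_BY_HOST = {"Callie": "Kore", "Nick": "Puck"}


def parse_dialogue_line(line: str) -> Tuple[str, str, str]:
    """Split 'Speaker: text' into (speaker, voice, text).

    The 'Speaker: text' line format is an assumption for illustration.
    """
    speaker, sep, text = line.partition(":")
    speaker = speaker.strip()
    if not sep or speaker not in VOICE_BY_HOST:
        raise ValueError(f"unrecognized speaker in line: {line!r}")
    return speaker, VOICE_BY_HOST[speaker], text.strip()


parsed = parse_dialogue_line("Callie: 안녕하세요, 여러분")
```

Rejecting unknown speakers early avoids sending a line to TTS with no voice assigned.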

Step 4) Build Korean Subtitles (Full Text)

Use full-text subtitle builder (no ... truncation):

python3 scripts/build_korean_srt.py \
  --script <script.txt> \
  --audio <final.mp3> \
  --output <outdir>/podcast.srt \
  --max-chars 22
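To make `--max-chars 22` concrete, the sketch below wraps a sentence into pieces of at most 22 characters without dropping any text, and formats a millisecond offset as an SRT timestamp. It uses the standard `textwrap` module as a stand-in; how `build_korean_srt.py` actually splits lines and derives timing from the audio is not shown in the source.

```python
import textwrap
from typing import List


def split_cue(text: str, max_chars: int = 22) -> List[str]:
    """Wrap text into pieces of at most max_chars characters.

    Counts code points, not rendered width; words longer than max_chars
    are hard-split so nothing is truncated.
    """
    return textwrap.wrap(text, width=max_chars, break_long_words=True)


def format_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


sentence = "오늘은 이번 주 AI 트렌드 핵심만 빠르게 정리해 드리겠습니다"
pieces = split_cue(sentence)
```

Joining the pieces with spaces reproduces the original sentence, which is the "full text, no ellipsis" property the step asks for.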

Step 5) Render Subtitled MP4 (Font + Timing)

Use renderer with adjustable font and timing shift:

python3 scripts/render_subtitled_video.py \
  --image <thumbnail.png> \
  --audio <final.mp3> \
  --srt <podcast.srt> \
  --output <outdir>/final.mp4 \
  --font-name "Do Hyeon" \
  --font-size 27 \
  --shift-ms -250

Notes:

  • shift-ms negative = subtitle earlier (for lag fixes)
  • If text clipping occurs, lower font-size (e.g., 25~27)
  • keep text inside safe area; avoid overlap with character/object
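The effect of a negative `--shift-ms` can be shown on raw SRT text. The sketch below shifts every timestamp on the timing lines and clamps at zero; it illustrates the concept only and is not the renderer's implementation. `shift_srt` is a hypothetical helper.

```python
import re

_TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def _shift_match(match: "re.Match[str]", shift_ms: int) -> str:
    h, m, s, ms = (int(group) for group in match.groups())
    total = max(0, ((h * 60 + m) * 60 + s) * 1000 + ms + shift_ms)
    h, rem = divmod(total, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def shift_srt(srt_text: str, shift_ms: int) -> str:
    """Shift timestamps on '-->' lines; negative values move cues earlier."""
    out = []
    for line in srt_text.splitlines(keepends=True):
        if "-->" in line:  # only touch timing lines, never subtitle text
            line = _TIMESTAMP.sub(lambda m: _shift_match(m, shift_ms), line)
        out.append(line)
    return "".join(out)


sample = "1\n00:00:01,000 --> 00:00:03,500\n안녕하세요\n"
shifted = shift_srt(sample, -250)
```

With `-250`, a cue that started at 1.000 s starts at 0.750 s, so the text appears a quarter second earlier relative to the audio.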

Step 6) Build Thumbnail + YouTube Metadata

# Set API key via environment (required)
export GEMINI_API_KEY="<YOUR_KEY>"

python3 scripts/build_podcast_assets.py \
  --source "<QuickView path or URL>"

Reference (layout/copy guardrails):

  • references/thumbnail_guidelines_ko.md

Step 7) Final Delivery Checklist

Always include:

  1. source used
  2. final MP3 path
  3. subtitle MP4 path + size
  4. thumbnail path
  5. YouTube title options (3)
  6. YouTube description
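The checklist can be captured as a small structure so that no item is forgotten. This is an illustrative sketch; the helper names, the dictionary keys, and the size formatting are assumptions rather than the skill's actual output format.

```python
from typing import Dict, List


def human_size(num_bytes: int) -> str:
    """Format a byte count with one decimal place (B, KB, MB, GB)."""
    size = float(max(0, num_bytes))
    for unit in ("B", "KB", "MB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GB"


def delivery_summary(source: str, mp3_path: str, mp4_path: str,
                     mp4_size_bytes: int, thumbnail_path: str,
                     titles: List[str], description: str) -> Dict[str, object]:
    """Collect the six checklist items; requires exactly 3 title options."""
    if len(titles) != 3:
        raise ValueError("the checklist asks for exactly 3 title options")
    return {
        "source": source,
        "mp3": mp3_path,
        "mp4": f"{mp4_path} ({human_size(mp4_size_bytes)})",
        "thumbnail": thumbnail_path,
        "titles": list(titles),
        "description": description,
    }


summary = delivery_summary(
    "Trend/QuickView-01.md", "out/final.mp3", "out/final.mp4",
    5 * 1024 * 1024, "out/thumbnail.png",
    ["Title A", "Title B", "Title C"], "Episode description",
)
```

In practice the size would come from `os.path.getsize(mp4_path)`; it is passed in here so the sketch runs without real files.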

Reliability Rules

  • Gemini timeout on long input: use chunked builder (build_dualvoice_audio.py)
  • Subtitle clipping: reduce font size and increase bottom margin
  • Subtitle lag: adjust --shift-ms (usually -150 to -300)
  • Keep generated assets under Telegram practical limits

Security Notes

  • API keys must be passed via environment variables (GEMINI_API_KEY), not hardcoded.
  • Never paste raw keys into prompts, logs, screenshots, or public posts.
  • Recent hardening: thumbnail generation now passes keys via env (not CLI args).
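The difference between passing a key via the environment and via CLI arguments can be shown with the standard `subprocess` module. The sketch below is a generic illustration with a dummy key and a throwaway child process; it is not the skill's thumbnail code.

```python
import os
import subprocess
import sys
from typing import List


def run_with_key_in_env(cmd: List[str],
                        api_key: str) -> "subprocess.CompletedProcess[str]":
    """Run cmd with the key in the child's environment, never in argv."""
    env = dict(os.environ, GEMINI_API_KEY=api_key)
    return subprocess.run(cmd, env=env, capture_output=True, text=True,
                          timeout=60, check=True)


# The child reports only the key length, so the key itself is never printed.
child_cmd = [sys.executable, "-c",
             "import os; print(len(os.environ['GEMINI_API_KEY']))"]
completed = run_with_key_in_env(child_cmd, "dummy-key")
```

Command-line arguments are visible to other users through process listings on many systems, whereas the environment of a process is generally readable only by its owner, which is why the env route is preferred.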

References

  • references/podcast_prompt_template_ko.md
  • references/workflow_runbook.md
  • references/thumbnail_guidelines_ko.md
Security recommendations
Before installing, be aware of these issues and take steps to reduce risk:

  1. Credentials: the scripts require GEMINI_API_KEY (or NANO_BANANA_KEY) even though the skill metadata lists no env vars. Provide an API key via environment variables only (as the SKILL.md advises) and do not paste keys into prompts or logs. Understand that the same key is used for both TTS and thumbnail generation.
  2. Missing declared dependencies: the package invokes external binaries (ffmpeg, ffprobe), Python libraries (Pillow), and a 'uv' runner to call a separate nano-banana-pro script; those binaries and the font assets are not declared in metadata. Ensure you have ffmpeg/ffprobe installed, Python and packages available, and verify what 'uv run' refers to on your system before running.
  3. Local-path assumptions: the code expects a QUARTZ_ROOT path and attempts to map HTTPS QuickView URLs to local markdown under that root; confirm that the default path is appropriate for your environment or set QUARTZ_ROOT to a safe directory. The build_podcast_assets script also expects WORKSPACE_DIR/skills/nano-banana-pro and youtube-editor fonts to exist; verify those paths or pass --no-image if you want to skip image generation.
  4. Cross-skill invocation: the thumbnail step calls another skill/script (nano-banana-pro). Verify you trust that other code before letting this skill run it.
  5. Audit recommended: because metadata is incomplete and the skill will send data to Google Gemini (network requests), review the included scripts yourself (or run in an isolated/test environment) to confirm they meet your privacy/security requirements. If you plan to use a real Gemini API key, consider quota/cost and rotate keys if you suspect misuse.

Taken together these mismatches (undisclosed env vars and undeclared binary/config dependencies, plus cross-skill execution) make the package suspicious rather than plainly benign; resolve or validate these points before use.
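A preflight check along the lines of this advice might look like the sketch below. The lists of binaries and variables come from the review text above; the function `preflight` and its injectable parameters are hypothetical and exist only so the check can be exercised without touching the real system.

```python
import importlib.util
import os
import shutil
from typing import Callable, List, Mapping, Optional

REQUIRED_BINARIES = ("ffmpeg", "ffprobe", "uv")


def preflight(env: Optional[Mapping[str, str]] = None,
              which: Callable[[str], Optional[str]] = shutil.which,
              find_spec: Callable[[str], object] = importlib.util.find_spec,
              ) -> List[str]:
    """Return human-readable problems; an empty list means ready to run."""
    env = os.environ if env is None else env
    problems = [f"missing binary: {name}"
                for name in REQUIRED_BINARIES if which(name) is None]
    if find_spec("PIL") is None:  # Pillow is imported as the PIL package
        problems.append("missing Python package: Pillow")
    if not (env.get("GEMINI_API_KEY") or env.get("NANO_BANANA_KEY")):
        problems.append("no API key: set GEMINI_API_KEY or NANO_BANANA_KEY")
    if not env.get("QUARTZ_ROOT"):
        problems.append("QUARTZ_ROOT is not set; a default path would be used")
    return problems


# Demo with injected lookups so the result does not depend on this machine.
ready = preflight(env={"GEMINI_API_KEY": "x", "QUARTZ_ROOT": "/tmp/quartz"},
                  which=lambda name: f"/usr/bin/{name}",
                  find_spec=lambda name: object())
```

Calling `preflight()` with no arguments checks the real environment; running it before the pipeline surfaces the undeclared dependencies up front instead of midway through a build.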
Functional analysis
Type: OpenClaw Skill Name: ai-podcast-pipeline Version: 0.1.5 The skill bundle is designed for creating AI podcast packages, utilizing Google Gemini for TTS and ffmpeg for audio/video processing. All code aligns with the stated purpose, and explicit security measures are in place, such as reading API keys from environment variables (`GEMINI_API_KEY`) and blocking insecure HTTP URLs in `scripts/build_podcast_assets.py`. The use of `subprocess.run` is extensive but appears to handle inputs carefully, with `ff_escape` for ffmpeg filter strings and internal path construction. There is no evidence of data exfiltration, persistence mechanisms, or malicious prompt injection attempts in `SKILL.md` or other documentation. The security notices and hardening efforts indicate a focus on transparency and secure practices.
Capability assessment
Purpose & Capability
The code and SKILL.md implement Korean dual-voice TTS, subtitle generation, thumbnail composition, and packaging — consistent with the stated purpose. However the registry metadata declares no required environment variables, no config paths, and no required binaries, while the code clearly expects a GEMINI_API_KEY (or NANO_BANANA_KEY), a QUARTZ_ROOT local path, and other workspace files (e.g., skills/nano-banana-pro/scripts/generate_image.py, youtube-editor fonts). This missing declaration is an incoherence (the skill will require keys, fonts, and external scripts to function).
Instruction Scope
Runtime instructions are focused on the stated workflow and reference the included scripts. They instruct reading local QuickView markdown, using GEMINI TTS, invoking ffmpeg/ffprobe, and calling a separate 'nano-banana-pro' image generator via 'uv run'. These actions are within the skill's purpose, but the instructions assume access to local workspace paths and another skill's script (cross-skill invocation), and they instruct supplying API keys via environment variables even though the registry metadata didn't list them.
Install Mechanism
There is no install spec (instruction-only with included scripts). No remote downloads or archive extraction are present in the manifest, which keeps install risk low. The scripts will be written to disk as part of the skill bundle, but nothing is fetched from unknown URLs during install.
Credentials
The code legitimately needs one API credential (GEMINI_API_KEY or NANO_BANANA_KEY) for Gemini TTS/image generation and it reads environment variables. That is proportionate to the stated functionality, but the metadata declares no required env vars. The skill also implicitly requires local configuration (QUARTZ_ROOT) and access to other workspace skill scripts and font assets — these config path requirements were not declared. The absence of declared credentials/configs is misleading and could cause accidental key exposure if users are not alerted.
Persistence & Privilege
The skill does not request always: true and does not modify other skills' configurations. It writes outputs to workspace/media directories and calls other scripts, but it does not request privileged/system-wide persistence. Autonomous invocation is enabled by default (normal for skills) and is not in itself a disqualifier.
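How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install ai-podcast-pipeline
  3. Once installed, invoke the Skill by name or trigger it with /ai-podcast-pipeline
  4. Provide the inputs the Skill's parameter documentation asks for to get structured output.
Version history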
v0.1.5
Security: Added notice for VirusTotal false positives
v0.1.4
Security: Remove .env refs, block http://, use relative paths only
v0.1.1
Security hardening: API key via env (not CLI), subprocess timeouts, no hardcoded paths, safe file-size display, removed sensitive examples from docs.
v0.1.0
Initial release: Korean dual-host podcast package pipeline (script, Gemini multi-speaker TTS, subtitle MP4, thumbnail, YouTube metadata).
Metadata
Slug ai-podcast-pipeline
Version 0.1.5
License (not listed)
Total installs 1
Current installs 1
Versions 4
FAQ

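What is Ai Podcast Pipeline?

Create Korean AI podcast packages from QuickView trend notes. Use for dual-host script writing (Callie × Nick), Gemini multi-speaker TTS audio generation, subtitle timing/render fixes, thumbnail+MP4 packaging, and YouTube title/description output. Supports both full (15~20 min) and compressed (5~7 min) editions. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 1382 total downloads to date.

How do I install Ai Podcast Pipeline?

Run /install ai-podcast-pipeline in the OpenClaw or Claude Code chat box for a one-step install. The install itself needs no extra configuration, but running the scripts requires an API key in GEMINI_API_KEY plus ffmpeg/ffprobe, as noted in the security guidance above.

Is Ai Podcast Pipeline free?

Yes. Ai Podcast Pipeline is completely free (free and open source) to download, install, and use.

Which platforms does Ai Podcast Pipeline support?

Ai Podcast Pipeline is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Ai Podcast Pipeline?

It is developed and maintained by jeong-wooseok (@jeong-wooseok); the current version is v0.1.5.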
