Description

Extract subtitles from YouTube or Bilibili, save raw transcript, and generate a structured summary. Triggers: ANY URL containing youtube.com, youtu.be, bilib...

Usage (SKILL.md)

Video Subtitle Extractor (YouTube + Bilibili)

Name: Youtube and Bilibili Subtitle Extraction and Summary
Author: xiapuyang

Detect platform → download subtitles → clean → save raw → generate summary.

Step 1 — Ensure yt-dlp is available

if ! command -v yt-dlp &>/dev/null; then
  echo "yt-dlp not found, installing..."
  pip install -q yt-dlp || pip3 install -q yt-dlp
fi
yt-dlp -U --quiet 2>/dev/null || true

If installation fails, stop and tell the user to install yt-dlp manually (pip install yt-dlp or brew install yt-dlp).
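
The same availability check can be made from Python before shelling out. This is a minimal sketch; `have_tool` is a hypothetical helper name, not part of the skill:

```python
import shutil


def have_tool(name: str) -> bool:
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    if not have_tool("yt-dlp"):
        # Mirrors the manual-install advice given above.
        print("yt-dlp not found; install it with: pip install yt-dlp")
```

The check only confirms that an executable exists on PATH; it says nothing about whether the installed version is current.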

Step 2 — Detect platform and download subtitles

Detect whether the URL is Bilibili or YouTube, then use the appropriate strategy.

URL="<user-provided URL>"
TMPDIR=$(mktemp -d)
SUB_FILE=""
SUBTITLE_LANG=""
COOKIE_ARGS=""   # stays empty for YouTube; the Bilibili branch overrides it

# Detect platform
if echo "$URL" | grep -qE '(bilibili\.com|b23\.tv)'; then
  PLATFORM="bilibili"
  SITE_NAME="Bilibili"
  SITE_DOMAIN="bilibili.com"
else
  PLATFORM="youtube"
  SITE_NAME="YouTube"
  SITE_DOMAIN="youtube.com"
fi
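
For illustration, the platform check can also be sketched in Python. Unlike the grep above, which matches the pattern anywhere in the URL, this version inspects only the hostname and assumes the URL includes a scheme such as https://. The function name `detect_platform` and the returned dictionary keys are hypothetical:

```python
import re
from urllib.parse import urlparse

# Hostnames treated as Bilibili; everything else falls through to YouTube,
# matching the default branch of the shell snippet.
_BILIBILI_HOST = re.compile(r"(^|\.)(bilibili\.com|b23\.tv)$")


def detect_platform(url: str) -> dict:
    """Classify a video URL by hostname."""
    host = (urlparse(url).hostname or "").lower()
    if _BILIBILI_HOST.search(host):
        return {"platform": "bilibili", "site_name": "Bilibili",
                "site_domain": "bilibili.com"}
    return {"platform": "youtube", "site_name": "YouTube",
            "site_domain": "youtube.com"}
```

Matching on the hostname avoids misclassifying a URL that merely mentions bilibili.com in its path or query string.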

Bilibili branch

Bilibili subtitles require login cookies. Always use a cookies file — refresh from Chrome if missing or stale (>30 days):

if [ "$PLATFORM" = "bilibili" ]; then
  BILI_COOKIES="${BILIBILI_COOKIES_FILE:-$HOME/bilibili_cookies.txt}"

  NEED_REFRESH=false
  if [ ! -f "$BILI_COOKIES" ]; then
    NEED_REFRESH=true
  elif [ "$(find "$BILI_COOKIES" -mtime +30 2>/dev/null | wc -l | tr -d ' ')" -gt 0 ]; then
    echo "Bilibili cookies older than 30 days, refreshing..."
    NEED_REFRESH=true
  fi

  if [ "$NEED_REFRESH" = true ]; then
    echo "Reading cookies from Chrome (one-time keychain prompt)..."
    yt-dlp --cookies-from-browser chrome --cookies "$BILI_COOKIES" \
      --skip-download -i "https://www.bilibili.com/" 2>/dev/null
  fi

  COOKIE_ARGS="--cookies $BILI_COOKIES"

  # List available subtitle langs — capture stderr to detect login failure
  LIST_OUTPUT=$(yt-dlp --list-subs $COOKIE_ARGS "$URL" 2>&1)
  if echo "$LIST_OUTPUT" | grep -qiE 'login|not logged|需要登录|please log'; then
    echo ""
    echo "❌ Bilibili cookies expired or invalid."
    echo "   Fix: delete the cookies file and retry — it will re-read from Chrome."
    echo "   rm \"$BILI_COOKIES\""
    rm -rf "$TMPDIR"
    exit 1
  fi
  AVAIL_LANGS=$(echo "$LIST_OUTPUT" | awk '/^[a-z]/{print $1}' | grep -v "^Language$")

  # Try ai-zh first, then any zh variant, then en
  for lang in ai-zh zh-Hans zh-CN zh en; do
    if echo "$AVAIL_LANGS" | grep -q "^${lang}$"; then
      yt-dlp \
        --write-sub \
        --sub-langs "$lang" \
        --skip-download \
        --retries 3 \
        -o "$TMPDIR/bili_%(id)s" \
        $COOKIE_ARGS \
        "$URL" 2>/dev/null
      SUB_FILE=$(ls "$TMPDIR"/*.${lang}.* 2>/dev/null | head -1)
      if [ -n "$SUB_FILE" ]; then
        SUBTITLE_LANG="$lang"
        break
      fi
    fi
  done
fi
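
The language preference loop in the Bilibili branch amounts to "first preferred language that is actually available". A minimal sketch of that selection, assuming the available languages have already been parsed into a list; `pick_lang` and `BILIBILI_PREFS` are hypothetical names:

```python
from typing import Iterable, Optional, Sequence

# Same order as the shell loop: AI-generated Chinese first, then zh variants, then English.
BILIBILI_PREFS = ("ai-zh", "zh-Hans", "zh-CN", "zh", "en")


def pick_lang(available: Iterable[str],
              prefs: Sequence[str] = BILIBILI_PREFS) -> Optional[str]:
    """Return the first language in `prefs` that appears in `available`, else None."""
    avail = set(available)
    for lang in prefs:
        if lang in avail:
            return lang
    return None
```

A None result corresponds to SUB_FILE staying empty, which the "Fail if no subtitles" check then reports.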

YouTube branch

if [ "$PLATFORM" = "youtube" ]; then
  for lang in zh-Hans zh-CN zh en; do
    yt-dlp \
      --write-subs \
      --write-auto-subs \
      --sub-langs "$lang" \
      --skip-download \
      --sub-format vtt \
      --retries 3 \
      --sleep-requests 1 \
      -o "$TMPDIR/yt_%(id)s" \
      "$URL" 2>/dev/null
    SUB_FILE=$(ls "$TMPDIR"/*.${lang}.vtt 2>/dev/null | head -1)
    if [ -n "$SUB_FILE" ]; then
      SUBTITLE_LANG="$lang"
      break
    fi
    sleep 1
  done
fi

Fail if no subtitles

if [ -z "$SUB_FILE" ]; then
  echo "No subtitles found for this video."
  echo "  - No manually uploaded subtitles"
  echo "  - No auto-generated subtitles"
  echo "Cannot proceed without a transcript."
  rm -rf "$TMPDIR"
  exit 1
fi

Step 3 — Clean subtitle file → plain text

Detect format (SRT vs VTT) and clean accordingly:

EXT="${SUB_FILE##*.}"

if [ "$EXT" = "srt" ]; then
  # SRT: remove sequence numbers, timestamps, HTML tags, deduplicate
  grep -v "^[0-9]*$" "$SUB_FILE" \
    | grep -v "^[0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9]* --> " \
    | sed 's/<[^>]*>//g' \
    | grep -v "^$" \
    | python3 -c "
import sys, html
seen = set()
for line in sys.stdin:
    line = html.unescape(line).strip()
    if line and line not in seen:
        seen.add(line)
        print(line)
        print()
" > "$TMPDIR/cleaned.txt"
else
  # VTT
  sed 's/<[^>]*>//g' "$SUB_FILE" \
    | grep -v "^WEBVTT" \
    | grep -v "^NOTE" \
    | grep -v "^Kind:" \
    | grep -v "^Language:" \
    | grep -v "^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]" \
    | grep -v "^$" \
    | python3 -c "
import sys, html
seen = set()
for line in sys.stdin:
    line = html.unescape(line).strip()
    if line and line not in seen:
        seen.add(line)
        print(line)
        print()
" > "$TMPDIR/cleaned.txt"
fi

Step 4 — Resolve output directory and set filename

OUTPUT_DIR="${YOUTUBE_SUBTITLES_DIR:-.}"
mkdir -p "$OUTPUT_DIR"

Use the original video title as the filename. Only strip characters illegal on macOS (/ and ASCII :); preserve all other characters including fullwidth punctuation (：、《》、、). Truncate to 100 chars:

SLUG=$(echo "<title>" | python3 -c "
import sys
title = sys.stdin.read().strip()
title = title.replace('/', '').replace(':', '')
print(title[:100])
")

Step 5 — Fetch video metadata

Use $COOKIE_ARGS (set in Step 2; empty string for YouTube, --cookies FILE for Bilibili):

yt-dlp --dump-json --no-playlist $COOKIE_ARGS "$URL" 2>/dev/null \
  | python3 -c "
import json, sys
d = json.load(sys.stdin)
desc = d.get('description','')
first_para = desc.split('\n\n')[0].replace('\n',' ')[:300]
chapters = d.get('chapters') or []
chapter_lines = '\n'.join(f'  - \"{int(c[\"start_time\"]//60)}:{int(c[\"start_time\"]%60):02d} {c[\"title\"]}\"' for c in chapters)
cats = d.get('categories') or []
print('TITLE:', d.get('title',''))
print('CHANNEL:', d.get('uploader',''))
print('DURATION:', d.get('duration_string',''))
print('DATE:', d.get('upload_date',''))
print('DESCRIPTION:', first_para)
print('CATEGORY:', cats[0] if cats else '')
print('CHAPTERS:')
print(chapter_lines)
"

Step 6 — Save raw transcript

Write $OUTPUT_DIR/$SLUG.md:

NOW=$(date +"%Y-%m-%dT%H:%M")
WORDS=$(wc -w < "$TMPDIR/cleaned.txt" | tr -d ' ')

---
title: "<title>"
source: "<URL>"
author:
  - "[[<channel>]]"
published: "<YYYYMMDD>"
description: "<DESCRIPTION>"
tags:
  - "<PLATFORM>"
ctime: "<NOW>"
mtime: "<NOW>"
words: "<WORDS>"
site: "<SITE_NAME>"
domain: "<SITE_DOMAIN>"
channel: "<channel>"
duration: "<duration>"
category: "<CATEGORY>"
subtitle_lang: "<SUBTITLE_LANG>"
chapters:
<CHAPTERS or empty>
type: "source"
---

<full cleaned transcript>
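
The template wraps each placeholder in double quotes, so a title that itself contains a double quote or a backslash would produce invalid YAML if substituted verbatim. The sketch below shows one way to quote values safely. It assumes that JSON string escaping is acceptable to whatever reads the frontmatter (YAML 1.2 is designed to accept JSON strings as double-quoted scalars). `yaml_quote` and `render_frontmatter` are hypothetical helpers, and only a subset of the template's fields is rendered:

```python
import json


def yaml_quote(value: str) -> str:
    """Quote a string using JSON escaping; ensure_ascii=False keeps CJK text readable."""
    return json.dumps(value, ensure_ascii=False)


def render_frontmatter(meta: dict) -> str:
    """Render a few scalar frontmatter fields from a metadata dictionary."""
    lines = ["---"]
    for key in ("title", "source", "published", "site", "channel", "subtitle_lang"):
        lines.append(f"{key}: {yaml_quote(str(meta.get(key, '')))}")
    lines.append("---")
    return "\n".join(lines)
```

List-valued fields such as author, tags, and chapters would need their own handling and are left out here.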

Step 7 — Check length and prepare summary input

# Note: wc -c counts bytes, not characters (UTF-8 CJK text uses about 3 bytes per character).
CHARS=$(wc -c < "$TMPDIR/cleaned.txt" | tr -d ' ')

≤ 120000: use full cleaned text as summary input directly.
> 120000: run map-reduce first (Step 8), then use the combined bullet points as summary input.

Step 8 — Map-reduce for long transcripts (> 120k only)

Split into ~10,000 character chunks:

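python3 - "$TMPDIR/cleaned.txt" <<'EOF'
import sys
# The heredoc delimiter is quoted, so the shell does not expand $TMPDIR inside it;
# the transcript path is passed as a command-line argument instead.
text = open(sys.argv[1], encoding="utf-8").read()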
size = 10000
chunks = [text[i:i+size] for i in range(0, len(text), size)]
for i, chunk in enumerate(chunks):
    print(f"=== CHUNK {i+1}/{len(chunks)} ===")
    print(chunk)
EOF

For each chunk, extract structured notes in this format:

TOPIC: [inferred topic name for this chunk, e.g. "AI bubble assessment"]
- [key claim or fact — 2 sentences: what was said + supporting detail]
- [key claim or fact — 2 sentences]
...
QUOTES: [1–3 verbatim lines worth preserving]
DATA: [any specific numbers or metrics]

Extract 8–12 entries per chunk. Collect all structured notes, grouped by TOPIC, as the summary input for Step 9. Merge notes under the same topic across chunks before passing to Step 9.
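
The merge step can be sketched as follows, assuming each chunk's notes follow the TOPIC / bullet / QUOTES / DATA layout shown above and that topics are merged only when their names match exactly. `merge_notes` is a hypothetical helper:

```python
from typing import Dict, List


def merge_notes(chunk_notes: List[str]) -> Dict[str, Dict[str, List[str]]]:
    """Group bullets, quotes, and data by topic across all chunk notes."""
    merged: Dict[str, Dict[str, List[str]]] = {}
    for notes in chunk_notes:
        topic = None
        for raw in notes.splitlines():
            line = raw.strip()
            if line.startswith("TOPIC:"):
                topic = line[len("TOPIC:"):].strip()
                merged.setdefault(topic, {"bullets": [], "quotes": [], "data": []})
            elif topic is None or not line:
                continue  # ignore anything before the first TOPIC line
            elif line.startswith("- "):
                merged[topic]["bullets"].append(line[2:])
            elif line.startswith("QUOTES:"):
                merged[topic]["quotes"].append(line[len("QUOTES:"):].strip())
            elif line.startswith("DATA:"):
                merged[topic]["data"].append(line[len("DATA:"):].strip())
    return merged
```

Topics keep the order in which they first appear, so the merged notes still follow the flow of the video. Near-duplicate topic names (for example "AI bubble" and "AI bubble assessment") would stay separate under this exact-match assumption.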

Step 9 — Generate summary

Resolve the summary language:

if [ -n "$YOUTUBE_SUBTITLES_SUMMARY_LANG" ]; then
  SUMMARY_LANG="$YOUTUBE_SUBTITLES_SUMMARY_LANG"
elif [[ "$SUBTITLE_LANG" == zh* || "$SUBTITLE_LANG" == ai-zh* ]]; then  # ai-zh (tried first in Step 2) is also Chinese
  SUMMARY_LANG="zh"
else
  _SYS_LANG="${LANG:-${LANGUAGE:-}}"
  case "$_SYS_LANG" in
    zh*) SUMMARY_LANG="zh" ;;
    *)   SUMMARY_LANG="en" ;;
  esac
fi
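
The resolution order is: explicit override, then subtitle language, then system locale, then English. A minimal Python sketch of that precedence follows. `resolve_summary_lang` is a hypothetical name, and treating the ai-zh subtitle code as Chinese is an assumption about the intended behaviour:

```python
def resolve_summary_lang(override: str = "",
                         subtitle_lang: str = "",
                         sys_lang: str = "") -> str:
    """Pick the summary language using the precedence described above."""
    if override:                                   # YOUTUBE_SUBTITLES_SUMMARY_LANG
        return override
    if subtitle_lang.startswith(("zh", "ai-zh")):  # Chinese subtitle track
        return "zh"
    # Fall back to the system locale (LANG or LANGUAGE), defaulting to English.
    return "zh" if sys_lang.startswith("zh") else "en"
```

For example, an English subtitle track on a machine whose locale is zh_CN.UTF-8 resolves to "zh", while the same track with an empty locale resolves to "en".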

Using the summary input from Step 7 or 8, generate the summary body (all sections) and write it to a temp file first:

cat > "$TMPDIR/summary_body.md" <<'SUMMARY_EOF'
<generated summary sections here>
SUMMARY_EOF
SUMMARY_WORDS=$(wc -w < "$TMPDIR/summary_body.md" | tr -d ' ')

Then write the final file $OUTPUT_DIR/$SLUG-summary.md in $SUMMARY_LANG by combining frontmatter + body:

cat > "$OUTPUT_DIR/$SLUG-summary.md" <<EOF
<frontmatter here>
EOF
cat "$TMPDIR/summary_body.md" >> "$OUTPUT_DIR/$SLUG-summary.md"

Frontmatter template:

---
title: "<title>"
source: "<URL>"
author:
published: "<YYYYMMDD>"
description: "<DESCRIPTION>"
tags:
  - "<PLATFORM>"
ctime: "<NOW>"
mtime: "<NOW>"
words: "<SUMMARY_WORDS>"
site: "<SITE_NAME>"
domain: "<SITE_DOMAIN>"
channel: "<channel>"
duration: "<duration>"
category: "<CATEGORY>"
subtitle_lang: "<SUBTITLE_LANG>"
chapters:
<CHAPTERS or empty>
type: "source"
lang: "<SUMMARY_LANG>"
---

Section headers must match the summary language. Use the headers for the appropriate language:

English (en): ## Overall Summary / ## Topic Chapters / ## Key Quotes / ## Novel Ideas / ## Counter-intuitive Views / ## Core Tensions / ## Methodology / ## Key Data

Chinese (zh): ## 总体摘要 / ## 话题章节 / ## 关键引用 / ## 新颖观点 / ## 反直觉观点 / ## 核心张力 / ## 方法论 / ## 关键数据

Voice and attribution

Write content directly. Never prefix bullets or sentences with attribution phrases like "X says", "X believes", "X points out", "X argues", "according to X", or their equivalents in any language. The speaker is already identified in the frontmatter — repeating their name before every claim is noise. State the idea itself.

Bad: "The speaker believes API businesses have no moat because users have zero loyalty."
Good: "API businesses have no moat: users switch instantly to any cheaper or better model."

Depth requirement (applies to ALL bullet points across ALL sections)

Every bullet point must be 2–3 sentences minimum:

Claim: state the specific view or fact
Reasoning / evidence: what reasoning, example, or data supports it (quote the transcript if helpful)
Nuance or implication: a caveat, consequence, or "so what" the reader needs

One-sentence bullets are not acceptable. If you cannot write 2–3 sentences about a point, the point is too thin to include — omit it rather than padding.
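
The depth rule can be spot-checked mechanically before a summary is saved. The sketch below is a rough heuristic, not part of the skill. It counts sentence-ending punctuation in each single-line bullet and reports bullets with fewer than two sentences. `thin_bullets` is a hypothetical helper; abbreviations such as "e.g." followed by a space are counted as sentence ends, and bullets that wrap across lines are not handled:

```python
import re

# ASCII terminators count only when followed by whitespace or end of line;
# CJK terminators count wherever they appear.
_SENTENCE_END = re.compile(r"[.!?](?=\s|$)|[。！？]")


def thin_bullets(markdown: str, minimum: int = 2) -> list:
    """Return the text of '- ' bullets that contain fewer than `minimum` sentences."""
    thin = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped.startswith("- "):
            continue
        body = stripped[2:]
        if len(_SENTENCE_END.findall(body)) < minimum:
            thin.append(body)
    return thin
```

Any bullet it reports is a candidate for expansion or removal, in line with the rule that thin points should be omitted rather than padded.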

Content rules per section

Overall Summary / 总体摘要: 3–5 sentence synthesis of the full content, followed by a "最值得关注的几个点 / Key Highlights:" bullet list (3–5 items). Each highlight: 2–3 sentences, specific and non-generic, representing the most surprising or high-value insight a busy reader needs.
Topic Chapters / 话题章节: Identify 4–8 major topics discussed. For each, write a ### [Topic Name] subsection with 3–5 bullets at the 2–3 sentence depth. Topics should reflect the actual conversation flow (e.g. "OpenAI strategy", "AI bubble assessment", "China vs US", "investment philosophy"). Cover all significant topics — do not drop a topic because it seems minor. Even brief or personal moments (a TV show analogy, a hobby mentioned) deserve a short subsection if they carry a meaningful idea.
Key Quotes / 关键引用: Verbatim quotes from the transcript as blockquotes. Aim for 10–15 quotes; prefer ones that are vivid, specific, or capture a stance in the speaker's own words. After each blockquote, add one sentence of context (who said it, in what context).
Novel Ideas / 新颖观点: Ideas that are fresh, uncommon, or reframe a familiar concept. Each bullet: state the idea (2–3 sentences including the reasoning behind it). Do not include ideas that are already mainstream.
Counter-intuitive Views / 反直觉观点: Claims that contradict common belief. Format each bullet as: Common belief: [X] → Actual claim: [Y] — then 1–2 sentences explaining what makes Y non-obvious and what evidence supports it.
Core Tensions / 核心张力: Opposing forces, unresolved debates, or structural contradictions. Each tension: name both sides (2–3 sentences each), then note whether the speaker resolves it or leaves it open.
Methodology / 方法论: Frameworks, decision processes, heuristics, or step-by-step approaches. Each entry: describe the framework (what it is, how to apply it, what problem it solves). Sub-bullets are encouraged for multi-step processes.
Key Data / 关键数据: Specific numbers, statistics, metrics, named comparisons. Include the source context (who said it, about what). Do not fabricate numbers — only include figures explicitly stated in the transcript.

Omit any section where the source has no relevant content. Do not fabricate.

Step 10 — Clean up and report

rm -rf "$TMPDIR"

Raw:     $OUTPUT_DIR/$SLUG.md
Summary: $OUTPUT_DIR/$SLUG-summary.md

Security recommendations

Review carefully before installing. The YouTube subtitle workflow is relatively ordinary, but the Bilibili workflow may read Chrome login cookies and save them under your home directory. Only use it if you are comfortable granting the agent access to your Bilibili browser session, and consider manually providing a scoped cookie file instead of allowing automatic browser-cookie extraction.

Functional analysis

Type: OpenClaw Skill
Name: youtube-subtitle-summary
Version: 1.0.0

The skill includes instructions in SKILL.md to automatically extract browser cookies from Chrome using 'yt-dlp --cookies-from-browser' and save them to a local file (~/bilibili_cookies.txt). While this is a functional feature of yt-dlp used to access Bilibili subtitles, it involves accessing sensitive authentication data from the user's system. Additionally, the skill performs automated software installation and updates (pip install yt-dlp, yt-dlp -U) directly within the agent's execution flow.

Capability assessment

⚠ Purpose & Capability

Extracting subtitles and summarizing videos is coherent with the stated purpose, but the Bilibili path requires authenticated browser cookies, which is high-impact account/session access.

ℹ Instruction Scope

The skill broadly tells the agent to trigger immediately for any YouTube/Bilibili URL and not use another fetch method; this is mostly purpose-aligned routing but broad.

ℹ Install Mechanism

Although there is no install spec and no required binaries declared, the skill instructs installing and updating yt-dlp at runtime. This is relevant to the purpose but should be visible to users.

⚠ Credentials

The skill can read cookies from the local Chrome profile and store them in a home-directory cookie file for Bilibili, which is more sensitive than ordinary subtitle extraction.

⚠ Persistence & Privilege

The Bilibili cookie file is persisted at $HOME/bilibili_cookies.txt by default and refreshed when missing or older than 30 days, creating ongoing session material on disk.

Version history

v1.0.0

youtube-summary v1.0.0

- Initial release: extracts subtitles from YouTube and Bilibili URLs or via key phrases.
- Supports authentication for Bilibili subtitle downloads using browser cookies.
- Cleans and saves the raw transcript in a structured Markdown file with video metadata.
- Generates a summary from the transcript, using map-reduce for long videos.
- Triggers automatically on video URLs or requests like "summarize this video" and similar phrases.

Metadata

Slug youtube-subtitle-summary

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Number of versions 1

FAQ

What is Youtube and Bilibili Subtitle Extraction and Summary?

Extract subtitles from YouTube or Bilibili, save raw transcript, and generate a structured summary. Triggers: ANY URL containing youtube.com, youtu.be, bilib... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 66 cumulative downloads to date.

How do I install Youtube and Bilibili Subtitle Extraction and Summary?

Run the command "/install youtube-subtitle-summary" in an OpenClaw or Claude Code chat to install it in one step; no additional configuration is needed.

Is Youtube and Bilibili Subtitle Extraction and Summary free?

Yes. Youtube and Bilibili Subtitle Extraction and Summary is completely free and released under the MIT-0 license; it can be freely downloaded, installed, and used.

Which platforms does Youtube and Bilibili Subtitle Extraction and Summary support?

Youtube and Bilibili Subtitle Extraction and Summary is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Youtube and Bilibili Subtitle Extraction and Summary?

It is developed and maintained by Xiapu Yang (@xiapuyang); the current version is v1.0.0.
