Youtube Transcript
/install youtube-transcript-skill
YouTube — Transcript Extraction & Content Reformatting
YouTube video URL → timestamped transcript → summary / chapters / thread / blog / quotes
Language
All process output to user (progress updates, process notifications) follows the user's language.
Objective
Extract the full transcript from a YouTube video's built-in transcript panel, then transform it into the output format the user requests.
Prerequisites
- Target YouTube video page is already open in the browser:
https://www.youtube.com/watch?v={VIDEO_ID}
Pre-execution Checks
1. Tool Readiness
If browser-act has been confirmed available in the current session → skip this step.
Invoke browser-act via Skill tool to load usage. If installation or configuration issues arise, follow its guidance to resolve then retry.
Capability Components
This Skill's operational boundary = what the user can manually do in their browser. It only reads data already displayed to the user on the page, never bypassing authentication or access controls. JS code is encapsulated in Python files under the
scripts/directory, invoked viaeval "$(python scripts/xxx.py)". Use the bash tool for execution.
DOM: Check transcript availability and list languages
eval "$(python scripts/get-languages.py)"
No parameters. Reads ytInitialPlayerResponse from the current page.
Output example:
{
"available_languages": [
{"code": "en", "name": "English", "kind": "manual", "is_auto": false},
{"code": "en", "name": "English (auto-generated)", "kind": "asr", "is_auto": true}
],
"count": 2
}
Returns {"error": true, "message": "..."} when transcripts are disabled or page is not a YouTube video.
DOM: Open transcript panel
eval "$(python scripts/open-transcript-panel.py)"
No parameters. Clicks the "Show transcript" button below the video (handles multiple UI language variants automatically for robustness).
Must call wait stable after this to allow the panel to fully load.
Output example:
{"success": true, "label": "Show transcript"}
DOM: Extract all transcript segments
eval "$(python scripts/extract-transcript-segments.py)"
No parameters. Scrolls the open transcript panel to trigger lazy loading for long videos, then extracts all segments.
Output example:
{
"segment_count": 24,
"segments": [
{"ts": "0:18", "text": "We're no strangers to love"},
{"ts": "0:27", "text": "You know the rules and so do I"}
],
"full_text": "We're no strangers to love You know the rules...",
"timestamped_text": "0:18 We're no strangers to love\
0:27 You know the rules..."
}
Composite: Full transcript fetch workflow
navigate https://www.youtube.com/watch?v={VIDEO_ID}→wait stableeval "$(python scripts/get-languages.py)"— confirm transcripts are available; note the language listeval "$(python scripts/open-transcript-panel.py)"— open the panelwait stable— wait for panel content to loadeval "$(python scripts/extract-transcript-segments.py)"— extract all segments
Use timestamped_text from the output as input for the Transform step below.
Transform: Content Reformatting
After fetching the transcript, transform it based on what the user requests. If the user did not specify a format, default to the Full Document — output all five sections in order.
- Summary: Concise 5–10 sentence overview of the entire video
- Chapters: Group by topic shifts, output timestamped chapter list
- Thread: Twitter/X thread format — numbered posts, each under 280 characters
- Blog post: Full article with title, H2 sections per major topic, key quotes, and takeaways
- Quotes: Notable quotes with their timestamps
Default Full Document output order (when no specific format is requested):
- Summary
- Chapters
- Thread
- Blog Post
- Quotes
Workflow
- Fetch transcript using the Composite component above.
- Validate: confirm
segment_count >= 1. If empty, tell the user the video has transcripts disabled. - Chunk if needed: if
full_textexceeds ~50,000 characters, splittimestamped_textinto overlapping chunks (~40K characters with 2K overlap) and summarize each chunk before merging. - Transform into the requested format(s) using the
timestamped_textfield. If no format specified, produce all five sections. - Verify: re-read the output for coherence, correct timestamps (if chapters), and completeness before presenting.
Example — Chapters Output
0:00 Introduction — host opens with the problem statement
3:45 Background — prior work and why existing solutions fall short
12:20 Core method — walkthrough of the proposed approach
24:10 Results — benchmark comparisons and key takeaways
31:55 Q&A — audience questions on scalability and next steps
Example — Thread Output
1/ Just watched an incredible video on [topic]. Key takeaways 🧵
2/ First insight: [point]. This matters because [reason].
3/ The surprising part: [finding]. Most assume [belief], but this shows otherwise.
4/ Practical takeaway: [action].
5/ Full video: [URL]
Error Handling
- Transcripts disabled:
get-languages.pyreturns error; tell user and suggest checking if captions are available on the video page - Private/unavailable video: page will not load correctly; relay the error and ask user to verify the URL
- Transcript button not found: usually means the user is not on a video page, or the page hasn't finished loading; navigate to the URL and retry
- No segments after panel opens: retry
open-transcript-panel.py+wait stable+extract-transcript-segments.pyonce
Known Limitations
- Language selection: the transcript panel shows the language YouTube defaults to for the user's region. Switching to a specific language requires changing the caption language in the player's CC settings first; automatic language switching is not implemented.
- Auto-generated transcripts (kind: asr) may have lower accuracy than manual captions.
- Videos that require login to view will not have a transcript panel accessible.
Execution Efficiency
- Batch orchestration: Write a bash script to loop through video URLs serially within a single session — navigate to each video, run the 3-step composite workflow, save result, then move to the next. Do not parallelize within one browser. To increase throughput for large batches, open multiple stealth browser sessions and distribute URLs across them.
- Test before batch execution: After writing a batch script, first test with 1–2 videos to confirm the full workflow runs correctly; only then run the full batch.
- Reduce redundant pre-operations: Pre-execution checks (tool readiness) only need to run once per session; skip them for subsequent videos in the same batch.
- Error resumption: Save each video's result immediately after extraction; on failure, resume from the failed video rather than starting over.
Success Criteria
segment_count >= 1 AND full_text length > 0
Experience Notes
Path: {working-directory}/browser-act-skill-forge-memories/youtube-content-youtube-transcript.memory.md
Before execution: If the file exists, read it first — it records unexpected situations encountered during past executions (e.g., a strategy has become ineffective); adjust strategy order accordingly.
After execution: If an unexpected situation is encountered (strategy became ineffective, page redesigned, anti-scraping upgraded, better path discovered), append a line:
{YYYY-MM-DD}: {what happened} → {conclusion}
Normal execution does not write to the file.
- 确保已安装 OpenClaw(本地或 Docker 部署)
- 在对话框中输入安装命令:
/install youtube-transcript-skill - 安装完成后,直接呼叫该 Skill 的名称或使用
/youtube-transcript-skill触发 - 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
Youtube Transcript 是什么?
YouTube transcript extraction and content reformatting: given a YouTube video URL, opens the video's transcript panel, extracts all timestamped segments, and... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 41 次。
如何安装 Youtube Transcript?
在 OpenClaw 或 Claude Code 对话框中运行命令「/install youtube-transcript-skill」即可一键安装,无需额外配置。
Youtube Transcript 是免费的吗?
是的,Youtube Transcript 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。
Youtube Transcript 支持哪些平台?
Youtube Transcript 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。
谁开发了 Youtube Transcript?
由 browser-act skill(@browseract-cli)开发并维护,当前版本 v1.0.0。