Bilibili Transcriber

by guorui303 · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
113 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install bilibili-transcriber
Description
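An expert skill for turning Bilibili videos into text summaries. It supports two transcription engines: cloud (Alibaba Cloud Paraformer) and local (faster-whisper). When the user provides a Bilibili video URL, it automatically downloads the audio, transcribes it to text, and generates a structured summary. Both BV IDs and full URLs are supported.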
README (SKILL.md)

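You are a Bilibili video content processing expert. Your task is to convert Bilibili videos into text and produce high-quality summaries.

Workflow

Step 1: Parse the video information

  • Extract the BV ID from the URL (e.g. BV1xx411c7mD)
  • If the user provides a short link, resolve it first to obtain the full URL
  • Call the Bilibili API to fetch basic video information (title, uploader (UP主), duration, description, etc.)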
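
The BV ID extraction in Step 1 can be sketched as follows. This is a minimal illustration, not the skill's own code: the function names are hypothetical, the pattern assumes a BV ID is "BV" followed by 10 alphanumeric characters (as in the example BV1xx411c7mD), and short links are assumed to use the b23.tv domain.

```python
import re
from typing import Optional

# Assumption: "BV" + 10 alphanumeric characters, as in the example BV1xx411c7mD.
BVID_PATTERN = re.compile(r"BV[0-9A-Za-z]{10}")


def extract_bvid(text: str) -> Optional[str]:
    """Return the first BV ID found in a URL or bare string, or None."""
    match = BVID_PATTERN.search(text)
    return match.group(0) if match else None


def is_short_link(url: str) -> bool:
    """Short links (assumed to live on b23.tv) must be resolved to a full URL first."""
    return "b23.tv/" in url
```

A short link carries no BV ID, so a caller would check is_short_link() first and only run extract_bvid() on the resolved URL.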

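Step 2: Obtain the video's subtitles or text content

Preferred approach: fetch CC subtitles

# Call the Bilibili API to check whether official subtitles exist
curl "https://api.bilibili.com/x/player/wbi/v2?cid={cid}&bvid={bvid}"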
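
Once the API response has been parsed as JSON, selecting a subtitle track might look like the sketch below. The response layout used here (data → subtitle → subtitles, with lan and subtitle_url fields) and the preferred language codes are assumptions for illustration; the source does not document the response, so verify it against a real reply before relying on it.

```python
from typing import Optional, Sequence


def pick_subtitle_url(
    player_response: dict,
    preferred_langs: Sequence[str] = ("zh-CN", "zh-Hans", "ai-zh"),
) -> Optional[str]:
    """Return a subtitle URL from a parsed response, or None if there are no subtitles.

    Assumed (unverified) shape:
    {"data": {"subtitle": {"subtitles": [{"lan": ..., "subtitle_url": ...}]}}}
    """
    data = player_response.get("data") or {}
    subtitles = (data.get("subtitle") or {}).get("subtitles") or []
    if not subtitles:
        return None  # caller should fall back to audio transcription

    by_lang = {item.get("lan"): item for item in subtitles}
    # Take the first preferred language that exists, else the first track listed.
    chosen = next(
        (by_lang[lang] for lang in preferred_langs if lang in by_lang),
        subtitles[0],
    )
    url = chosen.get("subtitle_url") or ""
    if not url:
        return None
    # Protocol-relative URLs need an explicit scheme before they can be fetched.
    return "https:" + url if url.startswith("//") else url
```

A None result is the signal to move on to one of the transcription options.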

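Option A (recommended): Alibaba Cloud Paraformer cloud transcription. If the video has no subtitles, prefer cloud transcription (fast, accurate on dialects, no GPU required):

  1. Download the audio

    python -m yt_dlp -f "bestaudio" --extract-audio --audio-format m4a -o "{output_path}.%(ext)s" "{video_url}"
    
  2. Cloud transcription

    from cloud_transcriber import cloud_transcribe
    
    # Upload audio → Paraformer transcription → returns timestamped results
    segments = cloud_transcribe("audio.m4a")
    for seg in segments:
        print(f"[{seg['start']:.1f}s] {seg['text']}")
    

    The environment variable DASHSCOPE_API_KEY (an Alibaba Cloud Bailian API key) must be set; OPENAI_API_KEY is also accepted as a fallback. Install dependencies: pip install dashscope requests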
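
Because the review on this page notes that a fallback OPENAI_API_KEY would be sent to dashscope.aliyuncs.com, a caller may want to make that fallback an explicit opt-in. The helper below is a hypothetical sketch of such a lookup and is not part of cloud_transcriber.py.

```python
import os
from typing import Mapping, Optional


def resolve_dashscope_key(
    env: Mapping[str, str] = os.environ,
    allow_openai_fallback: bool = False,
) -> Optional[str]:
    """Return the API key to use for cloud transcription, or None.

    DASHSCOPE_API_KEY always wins. OPENAI_API_KEY is only consulted when the
    caller opts in, since it would be sent to a non-OpenAI service.
    """
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    if key:
        return key
    if allow_openai_fallback:
        return env.get("OPENAI_API_KEY", "").strip() or None
    return None
```

When this returns None, the workflow would fall back to local transcription (Option B).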

Option B: local faster-whisper transcription (for offline use or when no API key is available). If there is no API key, or offline use is required, fall back to local transcription:

  1. Download the audio

    python -m yt_dlp -f "bestaudio" --extract-audio --audio-format m4a -o "{output_path}.%(ext)s" "{video_url}"
    
  2. Process the audio format (using ffmpeg)

    # Convert m4a to wav (the format recommended for whisper)
    ffmpeg -i input.m4a -ar 16000 -ac 1 -c:a pcm_s16le output.wav
    
  3. Speech to text (faster-whisper + model cache)

    Use the bundled transcriber module (recommended):

    from transcriber import transcribe_audio
    
    # The first call loads the model (about 2-5 seconds); later calls reuse the cached model
    text = transcribe_audio("audio.wav", language="zh")
    print(text)
    

    To process several videos in a batch:

    from transcriber import batch_transcribe
    
    audio_files = ["video1.wav", "video2.wav", "video3.wav"]
    results = batch_transcribe(audio_files, language="zh")
    for path, text in results.items():
        print(f"{path}: {text[:100]}...")
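
The download and conversion commands shown above can also be driven from Python. The sketch below only assembles the same yt-dlp and ffmpeg invocations and runs them with subprocess; the function names are hypothetical, and the -y flag (overwrite an existing output file) is an addition that the original commands do not use.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_download_cmd(video_url: str, output_stem: str) -> List[str]:
    """yt-dlp command from the steps above; %(ext)s is filled in by yt-dlp."""
    return [
        sys.executable, "-m", "yt_dlp",
        "-f", "bestaudio",
        "--extract-audio", "--audio-format", "m4a",
        "-o", f"{output_stem}.%(ext)s",
        video_url,
    ]


def build_convert_cmd(src: str, dst: str) -> List[str]:
    """ffmpeg command for 16 kHz, mono, 16-bit PCM wav (-y is an added flag)."""
    return ["ffmpeg", "-y", "-i", src,
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", dst]


def download_and_convert(video_url: str, workdir: str) -> Path:
    """Run both steps; raises CalledProcessError if either tool fails."""
    stem = str(Path(workdir) / "audio")
    subprocess.run(build_download_cmd(video_url, stem), check=True)
    subprocess.run(build_convert_cmd(stem + ".m4a", stem + ".wav"), check=True)
    return Path(stem + ".wav")
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain "&" or "?".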
    

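Step 3: Generate a structured summary

Based on the transcript, generate a summary containing the following:

  • Video title: the original title
  • Uploader (UP主): the creator's name
  • Duration: total length
  • Key points (3-5): the main ideas the video conveys
  • Detailed summary (300-500 characters): an overview organized by timeline or by topic
  • Key timestamps: timestamped markers for important content (format: [02:15] Explains the OpenClaw installation steps)
  • Target audience: who this video is suited for

Step 4: Output format

  • Output in Markdown
  • Keep it well organized with a clear hierarchy
  • If the content is long, save it as a .md file in the user's working directory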
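
The [02:15] timestamp format for key moments can be produced from a segment's start time in seconds. This is a small sketch with a hypothetical helper name; switching to [HH:MM:SS] for videos longer than an hour is an assumption, since the source only shows the [MM:SS] form.

```python
def format_timestamp(seconds: float) -> str:
    """Format a start time in seconds as [MM:SS], or [HH:MM:SS] past one hour."""
    total = int(seconds)  # drop fractional seconds
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"
    return f"[{minutes:02d}:{secs:02d}]"


def key_moment(seconds: float, label: str) -> str:
    """Build one key-timestamp line such as '[02:15] Install steps'."""
    return f"{format_timestamp(seconds)} {label}"
```

For example, a segment starting at 135.4 seconds maps to [02:15].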
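
One way to assemble the summary fields into Markdown and save long output is sketched below. The VideoSummary class, the section headings, and the 2000-character threshold for "long" content are all assumptions for illustration; the source does not define them.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class VideoSummary:
    title: str
    uploader: str
    duration: str
    key_points: List[str]
    detailed_summary: str
    key_moments: List[str] = field(default_factory=list)
    audience: str = ""


def render_markdown(s: VideoSummary) -> str:
    """Render the summary fields from Step 3 as a Markdown document."""
    lines = [f"# {s.title}", "",
             f"- Uploader: {s.uploader}", f"- Duration: {s.duration}", "",
             "## Key points", ""]
    lines += [f"{i}. {point}" for i, point in enumerate(s.key_points, 1)]
    lines += ["", "## Detailed summary", "", s.detailed_summary, "",
              "## Key timestamps", ""]
    lines += [f"- {moment}" for moment in s.key_moments]
    lines += ["", "## Target audience", "", s.audience]
    return "\n".join(lines) + "\n"


def save_if_long(markdown: str, path: Path, threshold: int = 2000) -> Optional[Path]:
    """Write to disk only when the text exceeds the (assumed) length threshold."""
    if len(markdown) <= threshold:
        return None
    path.write_text(markdown, encoding="utf-8")
    return path
```

Writing with encoding="utf-8" matters here because titles and transcripts are usually Chinese.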

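transcriber.py module notes

This skill includes a transcriber.py module with the following features:

Core features

  • Singleton model cache: the model stays in memory after the first load, so later calls skip model loading
  • Automatic device detection: prefers the GPU (CUDA) and falls back to the CPU automatically
  • Quantization: uses int8 quantization by default, for a stated 4-5x speedup
  • Batch processing: handles multiple files in one run while loading the model only once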
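
The caching idea behind these features can be illustrated with a small sketch. It is not the actual transcriber.py: ModelCache, load_faster_whisper and transcribe_all are hypothetical names, and the defaults ("small", device "auto", compute type "int8") are assumptions based on the description above. The loader is injectable so the cache can be exercised without faster-whisper installed.

```python
import threading
from typing import Any, Callable, Dict, Iterable, Tuple


def load_faster_whisper(model_size: str, device: str, compute_type: str) -> Any:
    """Default loader; the import is lazy so this file loads without faster-whisper."""
    from faster_whisper import WhisperModel
    return WhisperModel(model_size, device=device, compute_type=compute_type)


class ModelCache:
    """Keep one model instance per (size, device, compute_type) in memory."""

    def __init__(self, loader: Callable[[str, str, str], Any] = load_faster_whisper):
        self._loader = loader
        self._models: Dict[Tuple[str, str, str], Any] = {}
        self._lock = threading.Lock()

    def get(self, model_size: str = "small", device: str = "auto",
            compute_type: str = "int8") -> Any:
        key = (model_size, device, compute_type)
        with self._lock:  # avoid loading the same model twice from two threads
            if key not in self._models:
                self._models[key] = self._loader(*key)
            return self._models[key]


def transcribe_all(cache: ModelCache, paths: Iterable[str],
                   language: str = "zh") -> Dict[str, str]:
    """Transcribe several files with one shared model instance."""
    model = cache.get()
    results = {}
    for path in paths:
        segments, _info = model.transcribe(path, language=language)
        results[path] = "".join(segment.text for segment in segments)
    return results
```

Only the first get() pays the model-load cost; the transcription time itself still scales with audio length.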

API reference

from transcriber import transcribe_audio, batch_transcribe, get_model_info

# Transcribe a single file (model load takes about 2-5 s on the first call, <100 ms on later calls)
text = transcribe_audio("audio.wav", language="zh")

# Batch transcription (shares one model instance)
results = batch_transcribe(["a.wav", "b.wav"], language="zh")

# Inspect model information
info = get_model_info()
print(info)

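Performance comparison

Approach | First call | Subsequent calls | Memory usage | Accuracy
Original whisper | 5-10s | 5-10s | ~1GB | (not given)
faster-whisper (this skill) | 2-5s | <100ms | ~500MB | (not given)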

Installing dependencies

Install the dependencies before first use:

pip install faster-whisper yt-dlp

# ffmpeg must be installed separately
# Windows: winget install ffmpeg
# macOS: brew install ffmpeg
# Linux: sudo apt install ffmpeg
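
Before starting a transcription run, it can help to check that these dependencies are present. The helper below is a hypothetical sketch that uses only the standard library; the module names faster_whisper and yt_dlp are the import names of the packages installed above.

```python
import importlib.util
import shutil
from typing import List, Sequence


def missing_dependencies(
    modules: Sequence[str] = ("faster_whisper", "yt_dlp"),
    binaries: Sequence[str] = ("ffmpeg",),
) -> List[str]:
    """Return the names of Python modules and executables that cannot be found."""
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    missing += [name for name in binaries if shutil.which(name) is None]
    return missing


if __name__ == "__main__":
    absent = missing_dependencies()
    if absent:
        print("Missing dependencies:", ", ".join(absent))
    else:
        print("All dependencies found.")
```

An empty list means both the Python packages and ffmpeg are available on this machine.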

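Tool notes

What ffmpeg is used for

  • Format conversion: m4a → wav/mp3
  • Audio processing: adjusting the sample rate and channel count
  • Audio extraction: pulling the audio track out of a video file

Common commands:

# Show audio information
ffmpeg -i audio.m4a

# Convert format (recommended for whisper: 16 kHz, mono, 16-bit)
ffmpeg -i input.m4a -ar 16000 -ac 1 -c:a pcm_s16le output.wav

# Extract the audio from a video
ffmpeg -i video.mp4 -vn -acodec copy output.aac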

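Error handling

  • If the video has no subtitles, automatically switch to the audio transcription flow
  • If faster-whisper is not installed, prompt the user to install it: pip install faster-whisper
  • If ffmpeg is not installed, provide installation instructions
  • If model loading fails, automatically clear the cache and retry
  • Always respect copyright; use for personal study and research only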
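
The fallback order described in this document (CC subtitles first, then cloud transcription when an API key is available, otherwise local transcription) can be written as a small decision function. The names and the prefer_local flag are hypothetical and only illustrate the order; they are not taken from the skill's code.

```python
from enum import Enum


class TextSource(Enum):
    SUBTITLES = "subtitles"  # official CC subtitles from the Bilibili API
    CLOUD = "cloud"          # Alibaba Cloud Paraformer
    LOCAL = "local"          # faster-whisper on this machine


def choose_text_source(has_cc_subtitles: bool, has_api_key: bool,
                       prefer_local: bool = False) -> TextSource:
    """Pick where the transcript comes from, following the documented order.

    prefer_local is an assumed switch for audio that should not be uploaded.
    """
    if has_cc_subtitles:
        return TextSource.SUBTITLES
    if has_api_key and not prefer_local:
        return TextSource.CLOUD
    return TextSource.LOCAL
```

Keeping the decision in one place makes it easy to force the local path for sensitive audio.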

Best practices

  • Batch processing: to handle multiple videos, use batch_transcribe() to avoid reloading the model
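  • Model choice: the default is the "small" model (works well for Chinese); use "tiny" for more speed. For higher accuracy, choose a model larger than "small", such as "medium"; "base" sits between "tiny" and "small", so it is faster than the default but not more accurate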
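  • File format: prefer wav (16 kHz, mono) for the best compatibility
  • The summary must accurately reflect the original video and must not add personal bias
  • Timestamp markers should be accurate to the second
  • Key points should be stated concisely
  • For technical tutorial videos, highlight the key steps and commands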
Usage Guidance
This skill appears to implement the described Bilibili -> transcription -> summary flow, but take these precautions before installing:

  • Credential caution: The cloud path requires DASHSCOPE_API_KEY (Paraformer). The code will also accept OPENAI_API_KEY and send it as a Bearer token to dashscope.aliyuncs.com. Do not set your OpenAI key as a convenience fallback unless you understand that it will be sent to a third party.
  • Privacy: Using cloud transcribe uploads your audio to dashscope.aliyuncs.com / OSS temporary storage. If the audio contains sensitive information, prefer the local faster-whisper path.
  • Model downloads and mirrors: The skill sets HF endpoint to a mirror and can download models via ModelScope/snapshot_download. Only proceed if you trust those hosts (hf-mirror.com / ModelScope / the model owners listed). Consider running in an isolated environment if you are concerned.
  • Registry mismatch: The registry metadata did not declare required environment variables, but the code and SKILL.md do. Treat the skill as requiring optional cloud credentials and verify any env vars before exporting them.
  • Operational safety: The skill runs network I/O (calls Bilibili APIs, downloads audio, posts to Paraformer, downloads models). Review the included Python files yourself or run the skill in a sandbox/VM if you have limited trust.

If you only want local transcription, do not set DASHSCOPE_API_KEY / OPENAI_API_KEY and ensure faster-whisper, yt-dlp and ffmpeg are installed locally; then the skill will fall back to offline transcription.
Capability Analysis
Type: OpenClaw Skill · Name: bilibili-transcriber · Version: 1.0.0

The bilibili-transcriber skill bundle is a legitimate tool designed to transcribe Bilibili videos using either local Whisper models or the Alibaba Cloud Paraformer API. The code in fast_transcribe.py, transcriber.py, and cloud_transcriber.py follows standard practices for media processing, including using yt-dlp for downloads and ffmpeg for audio conversion. While it accesses environment variables (DASHSCOPE_API_KEY) and performs network requests to Bilibili and Alibaba Cloud (dashscope.aliyuncs.com), these actions are strictly aligned with its stated purpose, and no evidence of data exfiltration, malicious execution, or harmful prompt injection was found.
Capability Assessment
Purpose & Capability
Name/description match the code: modules implement Bilibili metadata fetch, subtitle fetching, audio download, local faster-whisper transcription, and an optional cloud Paraformer flow. The presence of model download logic, model selection heuristics, and yt-dlp/ffmpeg usage is coherent for the described goal.
Instruction Scope
SKILL.md and the Python modules consistently instruct: check for B站 CC subtitles first, otherwise download audio (yt_dlp), optionally convert with ffmpeg, transcribe locally or upload to cloud Paraformer, then produce a Markdown summary. The instructions reference environment variables (DASHSCOPE_API_KEY, OPENAI_API_KEY) and recommend pip installs; these actions stay within the stated purpose (they do network I/O, file I/O, and model downloads which are expected for transcription).
Install Mechanism
There is no formal install spec in the registry (instruction-only), but included code performs model downloads at runtime (ModelScope/snapshot_download and setting HF mirror), uses pip-installable packages (faster-whisper, yt-dlp, dashscope), and expects ffmpeg installed separately. Runtime model downloads from third-party mirrors (hf-mirror.com / ModelScope) are common for large models but increase trust surface (you should trust those hosts).
Credentials
Registry metadata lists no required env vars, but both SKILL.md and cloud_transcriber.py require/expect DASHSCOPE_API_KEY or OPENAI_API_KEY for cloud Paraformer. The code will accept OPENAI_API_KEY as a fallback and send it as a Bearer token to dashscope.aliyuncs.com — this can cause accidental leakage of an OpenAI key to a third-party service if the user sets that variable. The skill also uploads audio (potentially sensitive) to an external cloud when cloud mode is used; that is functionally coherent but has privacy implications that are not surfaced in the registry metadata.
Persistence & Privilege
The skill is not always-enabled and does not request elevated system-wide privileges. It does cache models in the user's home cache directory (~/.cache/modelscope) and keeps model instances in memory while running (normal for a transcriber). No code modifies other skills or global agent settings.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install bilibili-transcriber
  3. After installation, invoke the skill by name or use /bilibili-transcriber
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
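bilibili-transcriber 1.0.0

  • Initial release: transcribes Bilibili video audio to text and generates a summary
  • Automatic switching between Alibaba Cloud Paraformer cloud transcription and the local faster-whisper engine
  • Built-in quick workflow: accepts BV IDs and full or short links, then automatically downloads the audio, converts the format, transcribes, and generates a structured summary
  • Output includes the video title, uploader (UP主), duration, key points, detailed summary, key timestamps, and target audience
  • Provides the transcriber.py module with single-file and batch transcription, automatic model caching, and device optimization
  • Markdown output throughout, with local saving for long content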
Metadata
Slug bilibili-transcriber
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Bilibili Transcriber?

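Bilibili Transcriber turns Bilibili videos into text summaries. It supports two transcription engines: cloud (Alibaba Cloud Paraformer) and local (faster-whisper). When the user provides a Bilibili video URL, it automatically downloads the audio, transcribes it to text, and generates a structured summary. Both BV IDs and full URLs are supported. It is an AI Agent Skill for Claude Code / OpenClaw, with 113 downloads so far.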

How do I install Bilibili Transcriber?

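Run "/install bilibili-transcriber" in the OpenClaw or Claude Code chat to install the skill in one step. The README additionally lists Python packages (faster-whisper and yt-dlp, plus dashscope and requests for cloud mode) and ffmpeg as dependencies to install before first use.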

Is Bilibili Transcriber free?

Yes, Bilibili Transcriber is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Bilibili Transcriber support?

Bilibili Transcriber is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Bilibili Transcriber?

It is built and maintained by guorui303 (@guorui303); the current version is v1.0.0.
