cyberkurry

Video Metadata Intelligence System

Author: CyberKurry · GitHub · v1.0.0 · MIT-0
cross-platform · ✓ Security check passed
37 total downloads · 0 favorites · 0 current installs · 1 version
Install in OpenClaw
/install video-metadata-analyzer
Description
Drop in a video file and get back a complete Bilibili publishing package: title, intro, tags, category, cover suggestion, and content declaration, all generated automatically from visual and audio analysis.
Usage instructions (SKILL.md)

Video Analyzer

Three-stage video analysis pipeline: parallel visual + audio observation, then metadata synthesis for Bilibili publishing.

When to Use

  • User says "analyze this video", "视频分析" (video analysis), "提取视频信息" (extract video info), or "生成投稿元数据" (generate submission metadata)
  • User wants structured content analysis from a video file
  • User wants to prepare a video for Bilibili publishing (title, intro, tags, category, cover)
  • Upstream of bilibili-publish-playwright: this skill generates the metadata that feeds into Bilibili publishing

Architecture

Input Video ───────────────────────────────────────────────────────┐
    │                                                              │
    ├── Stage 1a: visual.py ──→ observations_visual.json           │
    │      (ffmpeg extract frames → encode → vision LLM observe)   │ PARALLEL
    │                                                              │
    ├── Stage 1b: transcribe.py ──→ observations_audio.json        │
    │      (ffmpeg extract audio → transcribe + structure)         │
    │                                                              │
    └── Stage 2: analyze.py ──→ metadata.json ←────────────────────┘
           (merge V+A observations → publishable metadata via LLM)

run.sh orchestrates the pipeline: it launches visual.py and transcribe.py as background processes (&), waits for both to finish, then optionally runs analyze.py.
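For readers who want to drive the stages from their own code, the `&` plus `wait` pattern in run.sh maps onto a thread pool. This is a minimal sketch assuming each stage is wrapped as a plain callable; `run_pipeline`, `observe_visual`, `observe_audio`, and `synthesize` are hypothetical names, not functions exported by the skill's scripts.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Optional


def run_pipeline(
    observe_visual: Callable[[], Any],
    observe_audio: Callable[[], Any],
    synthesize: Optional[Callable[[Any, Any], Any]] = None,
) -> dict:
    """Hypothetical stand-in for run.sh: Stage 1a and 1b in parallel, then optional Stage 2."""
    # Stage 1a and 1b start concurrently, like launching both scripts with '&'.
    with ThreadPoolExecutor(max_workers=2) as pool:
        visual_future = pool.submit(observe_visual)
        audio_future = pool.submit(observe_audio)
        # .result() blocks until each stage finishes, like the shell 'wait'.
        visual = visual_future.result()
        audio = audio_future.result()

    result = {"visual": visual, "audio": audio, "metadata": None}
    # Stage 2 is optional: without a synthesizer the run is observe-only.
    if synthesize is not None:
        result["metadata"] = synthesize(visual, audio)
    return result


if __name__ == "__main__":
    out = run_pipeline(
        lambda: [{"frame": 1, "desc": "title card"}],
        lambda: {"transcript": "hello"},
        lambda v, a: {"title": f"{len(v)} frame(s), transcript: {a['transcript']}"},
    )
    print(out["metadata"])
```

Threads are sufficient here because each real stage would spend most of its time waiting on ffmpeg subprocesses or HTTP calls rather than on Python computation.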

Output Directory

$OUTPUT/
├── observations_visual.json    # JSON array: one object per frame
├── observations_audio.json     # JSON object: transcript + structured info
├── metadata.json               # (optional) Synthesized Bilibili metadata
└── frames/                     # (only with --keep-frames, auto-cleaned otherwise)

Procedure

1. Full pipeline (recommended; every stage uses an external LLM API)

bash scripts/run.sh \
  --video VIDEO_PATH --output /tmp/va-out \
  --transcribe audio-llm \
  --audio-llm-key KEY --audio-llm-base URL --audio-llm-model MODEL \
  --vision-llm-key KEY --vision-llm-base URL --vision-llm-model MODEL \
  --max-frames 15 \
  --synthesize-method api \
  --analyze-llm-key KEY --analyze-llm-base URL --analyze-llm-model MODEL

2. Agent-direct mode (no external API; the agent reads frames and audio directly)

bash scripts/run.sh \
  --video VIDEO_PATH --output /tmp/va-out --keep-frames

The agent then reads observations_visual.json (placeholder entries per frame), observations_audio.json (which holds the audio file path), and optionally the frame images and audio file themselves, and generates the metadata from them.
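A minimal sketch of how an agent-side helper might gather these inputs from the output directory. It assumes the frames are saved as .jpg files under frames/ (the image format is an assumption) and that the audio path sits under the `audio_file` key, as the Output Summary describes; `collect_agent_inputs` is a hypothetical helper.

```python
import json
from pathlib import Path


def collect_agent_inputs(output_dir: str) -> dict:
    """Hypothetical helper: gather what an agent needs in agent-direct mode."""
    out = Path(output_dir)
    visual = json.loads((out / "observations_visual.json").read_text(encoding="utf-8"))
    audio = json.loads((out / "observations_audio.json").read_text(encoding="utf-8"))
    frames_dir = out / "frames"
    # Frame images exist only when run.sh was called with --keep-frames;
    # the .jpg extension is an assumption about how frames are saved.
    frames = sorted(str(p) for p in frames_dir.glob("*.jpg")) if frames_dir.is_dir() else []
    return {
        "visual_placeholders": visual,
        # Agent-direct observations carry the extracted audio path under audio_file.
        "audio_file": audio.get("audio_file"),
        "frames": frames,
    }
```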

3. Mixed / observe-only

Omit --synthesize-method to run only the observation stages, then run analyze.py separately later. Each stage (visual, audio, synthesis) can use its own keys and models.

Key Parameters

Parameter                   Default       Purpose
--video PATH                required      Input video file
--output DIR                required      Output directory
--transcribe MODE           agent-direct  Transcription mode: local / cloud / agent-direct / audio-llm
--max-frames N              15            Maximum frames per 4-minute segment
--keep-frames               false         Keep extracted frame images
--synthesize-method METHOD  not set       api / agent / manual; omit to run observation only

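Credential and model parameters follow the pattern --{stage}-llm-key, --{stage}-llm-base, and --{stage}-llm-model, where stage is vision, audio, or analyze (for example --vision-llm-key, --audio-llm-base, --analyze-llm-model). See references/REFERENCE.md for the complete parameter table.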
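Because the per-stage flags share one naming pattern, they can be generated from a small config. A sketch under that assumption: `stage_flags`, `build_run_command`, and the config layout are hypothetical, and "KEY", "URL", "MODEL" are placeholders rather than real values. Mode flags such as --transcribe and --synthesize-method still have to be appended separately.

```python
import shlex


def stage_flags(stage: str, key: str, base: str, model: str) -> list[str]:
    """Hypothetical helper: expand one stage's settings into --{stage}-llm-* flags."""
    prefix = f"--{stage}-llm-"
    return [prefix + "key", key, prefix + "base", base, prefix + "model", model]


def build_run_command(video: str, output: str, stages: dict[str, dict[str, str]]) -> list[str]:
    """Assemble an argv list for scripts/run.sh from per-stage settings."""
    argv = ["bash", "scripts/run.sh", "--video", video, "--output", output]
    for stage, cfg in stages.items():
        argv += stage_flags(stage, cfg["key"], cfg["base"], cfg["model"])
    return argv


if __name__ == "__main__":
    cmd = build_run_command(
        "in.mp4",
        "/tmp/va-out",
        {"vision": {"key": "KEY", "base": "URL", "model": "MODEL"}},
    )
    # shlex.join quotes each argument for display; pass the list itself to subprocess.
    print(shlex.join(cmd))
```

Keeping the command as a list (rather than one shell string) means it can be passed straight to `subprocess.run` without shell quoting issues in keys or URLs.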

Scripts

File                    Role
scripts/common.py       Shared utilities: HTTP retry with backoff, media duration via ffprobe, JSON parsing of LLM output
scripts/visual.py       Frame extraction (auto-segmentation, auto-compression of frames over 200 KB) + vision LLM observation. Long videos: segments are processed in parallel (max 4 concurrent)
scripts/transcribe.py   Audio extraction + transcription (4 modes). Large audio is auto-chunked with a 2 s overlap for deduplication
scripts/analyze.py      Observations → publishing metadata (3 methods: api / agent / manual). Heuristic fallback on API failure
scripts/run.sh          Orchestrator: parallel visual + audio, then optional synthesis
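transcribe.py is described as chunking large audio with a 2 s overlap and deduplicating afterwards. Below is one way that could work, not the script's actual logic: it assumes chunks are cut by time, each chunk's transcript comes back as (start, end, text) segments with absolute timestamps, and duplicates are dropped when a segment starts before the last kept segment ends. `chunk_windows`, `merge_segments`, and the chunk length are hypothetical.

```python
def chunk_windows(duration: float, chunk_len: float, overlap: float = 2.0) -> list[tuple[float, float]]:
    """Split [0, duration] into windows of chunk_len seconds overlapping by `overlap` seconds."""
    # chunk_len is a caller choice here; the real script's chunk size is not documented.
    if chunk_len <= overlap:
        raise ValueError("chunk_len must be longer than the overlap")
    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_len, duration)
        windows.append((start, end))
        if end >= duration:
            return windows
        # The next chunk starts `overlap` seconds before this one ends.
        start = end - overlap


def merge_segments(chunks: list[list[tuple[float, float, str]]]) -> list[tuple[float, float, str]]:
    """Merge per-chunk (start, end, text) segments, skipping duplicates from the overlap."""
    merged: list[tuple[float, float, str]] = []
    for segments in chunks:
        for seg in segments:
            # A segment that starts before the last kept segment ends was
            # already covered by the previous chunk's overlap region.
            if merged and seg[0] < merged[-1][1]:
                continue
            merged.append(seg)
    return merged
```

The overlap exists so that a word cut at a chunk boundary is heard whole in at least one chunk; the merge step then keeps only one copy of it.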

Output Summary

observations_visual.json: JSON array with one object per frame, containing frame, objects, desc, texts, actions, style, cover_candidate, segment, and segment_start.

observations_audio.json: JSON object with transcript, speakers, key_points, and tone. In agent-direct mode it also includes the audio_file path.

metadata.json: title (≤80 chars), intro (≤2000 chars), tags (≤10), category, sub_category, cover_suggestion (primary + reason + secondary), declaration, copyright_claim.
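The length limits on metadata.json can be checked before publishing. A minimal sketch, assuming limits are counted as Python characters (`len`) and that tags is a list of strings; `check_metadata_limits` is a hypothetical helper, and Bilibili's own validation may count differently.

```python
LIMITS = {"title": 80, "intro": 2000}  # character limits listed for metadata.json
MAX_TAGS = 10


def check_metadata_limits(meta: dict) -> list[str]:
    """Return human-readable problems; an empty list means the limits are met."""
    problems = []
    for field, limit in LIMITS.items():
        value = meta.get(field, "")
        if len(value) > limit:
            problems.append(f"{field}: {len(value)} chars exceeds {limit}")
    tags = meta.get("tags", [])
    if len(tags) > MAX_TAGS:
        problems.append(f"tags: {len(tags)} tags exceeds {MAX_TAGS}")
    return problems
```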

Pitfalls

  • API keys in chat: Some platforms truncate keys posted in chat messages. Always pass keys as command-line arguments, not through messages.
  • Model capability: Vision requires image_url support. Audio-LLM requires input_audio support. Check your provider.
  • Game recordings: Frames are large (~300–380 KB, versus ~30–80 KB for phone footage). Auto-compression handles this, but plan for rate limits on long videos.
  • Long videos mean parallel API calls: a 30-minute video is split into 8 segments of up to 15 frames each, which means 8 vision API calls (one per segment, at most 4 running concurrently). Plan for your provider's rate limits.
  • Missing credentials auto-degrade: Omitting LLM keys → preprocess-only or agent-direct mode. Scripts never crash on missing keys.
  • --interval is deprecated: it is ignored, and the frame interval is calculated automatically per segment from --max-frames.
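The sizing in the pitfalls above follows from simple arithmetic: 4-minute segments, up to --max-frames frames and one vision call per segment, and at most 4 calls in flight. A sketch under those assumptions; `plan_segments` is hypothetical, and the even frame spacing is an assumed rule, since the scripts' exact interval formula is not documented here.

```python
import math


def plan_segments(duration_s: float, max_frames: int = 15,
                  segment_s: float = 240.0, max_concurrent: int = 4) -> dict:
    """Hypothetical sizing helper: segments, frames, and vision calls for one video."""
    # 4-minute segments; a partial final segment still counts as one.
    segments = max(1, math.ceil(duration_s / segment_s))
    return {
        "segments": segments,
        "max_frames_total": segments * max_frames,
        # One vision API call per segment, as in the 30-minute example.
        "vision_calls": segments,
        # With at most max_concurrent calls in flight, calls need at least this many waves.
        "waves": math.ceil(segments / max_concurrent),
        # Assumption: frames are spaced evenly across a full segment.
        "full_segment_interval_s": min(segment_s, duration_s) / max_frames,
    }


print(plan_segments(30 * 60))
```

For a 30-minute video this gives 8 segments, at most 120 frames, 8 vision calls in 2 waves of 4, and one frame every 16 s within a full segment.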

Error Handling

Three-layer defense:

  1. HTTP retry — 3 retries with exponential backoff on 5xx / connection errors
  2. JSON parse retry — 3 attempts with error feedback sent back to LLM
  3. Graceful degradation — placeholder observations on visual failure, raw text on audio failure, heuristic fallback on synthesis failure
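The first two layers can be sketched as two small wrappers. This illustrates the described behavior and is not the code in scripts/common.py: `with_http_retry`, `parse_json_with_feedback`, `TransientError`, and the 1 s base delay are hypothetical, and `ask_llm` stands for any function that sends a prompt and returns the model's text.

```python
import json
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (HTTP 5xx or connection errors)."""


def with_http_retry(call: Callable[[], T], retries: int = 3, base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call`, retrying up to `retries` times with exponential backoff on TransientError."""
    for attempt in range(retries + 1):
        try:
            return call()
        except TransientError:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s with the default base_delay
    raise AssertionError("unreachable")


def parse_json_with_feedback(ask_llm: Callable[[str], str], prompt: str, attempts: int = 3):
    """Ask for JSON; on a parse error, send the error back to the LLM and try again."""
    current = prompt
    last_error = None
    for _ in range(attempts):
        text = ask_llm(current)
        try:
            return json.loads(text)
        except json.JSONDecodeError as err:
            last_error = err
            # Feed the concrete parse error back so the model can correct itself.
            current = (f"{prompt}\n\nYour previous reply was not valid JSON ({err}). "
                       "Reply with valid JSON only.")
    raise ValueError(f"no valid JSON after {attempts} attempts") from last_error
```

Injecting `sleep` keeps the backoff testable; a caller that exhausts both wrappers would then fall through to the third layer, graceful degradation.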

Verification

  1. Check exit code: run.sh returns 0 on success
  2. Verify observations_visual.json has entries for expected frame count
  3. Verify observations_audio.json has transcript field (non-empty for speech videos)
  4. If --synthesize-method used, verify metadata.json has all required fields (title, intro, tags, category, cover_suggestion)
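Checks 2 to 4 can be scripted once run.sh has exited with status 0. A minimal sketch assuming the output layout shown under Output Directory; `verify_output` is a hypothetical helper, and the expected frame count must be supplied by the caller.

```python
import json
from pathlib import Path
from typing import Optional

REQUIRED_METADATA = ("title", "intro", "tags", "category", "cover_suggestion")


def verify_output(output_dir: str, expected_frames: Optional[int] = None,
                  expect_speech: bool = True, expect_metadata: bool = False) -> list[str]:
    """Return a list of failed checks (empty when everything passes)."""
    out = Path(output_dir)
    failures = []

    # Check 2: one visual observation per expected frame.
    visual = json.loads((out / "observations_visual.json").read_text(encoding="utf-8"))
    if not isinstance(visual, list):
        failures.append("observations_visual.json is not a JSON array")
    elif expected_frames is not None and len(visual) != expected_frames:
        failures.append(f"expected {expected_frames} frame entries, found {len(visual)}")

    # Check 3: transcript field present, and non-empty for speech videos.
    audio = json.loads((out / "observations_audio.json").read_text(encoding="utf-8"))
    if "transcript" not in audio:
        failures.append("observations_audio.json has no transcript field")
    elif expect_speech and not audio["transcript"]:
        failures.append("transcript is empty for a speech video")

    # Check 4: required metadata fields, only when synthesis was requested.
    if expect_metadata:
        meta_path = out / "metadata.json"
        if not meta_path.exists():
            failures.append("metadata.json is missing")
        else:
            meta = json.loads(meta_path.read_text(encoding="utf-8"))
            missing = [f for f in REQUIRED_METADATA if f not in meta]
            if missing:
                failures.append(f"metadata.json missing fields: {', '.join(missing)}")
    return failures
```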

For complete parameter reference, output schemas, standalone usage per script, and detailed error handling, see references/REFERENCE.md.

Security advice
Do not treat this as a completed security review. The workspace files need to be readable before making an installation decision.
Capability tags
requires-sensitive-credentials
Capability assessment
Purpose & Capability
Workspace command execution failed before files could be read; no artifact-backed purpose or capability assessment was possible.
Instruction Scope
SKILL.md could not be inspected, so instruction scope could not be verified from artifacts.
Install Mechanism
Install metadata and specs could not be inspected from the workspace in this run.
Credentials
No artifact-backed environment requirements were available for review.
Persistence & Privilege
No artifact-backed persistence or privilege behavior was available for review.
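How to use
  1. Make sure OpenClaw is installed (local or Docker deployment).
  2. In the chat box, enter the install command: /install video-metadata-analyzer
  3. After installation, call the skill by name or trigger it with /video-metadata-analyzer.
  4. Provide the inputs listed in the skill's parameter description to get structured output.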
Version history
v1.0.0
Initial release of the video metadata analyzer skill.
  • Launches a three-stage video analysis pipeline: frame extraction + vision LLM, audio transcription, and Bilibili-ready metadata synthesis.
  • Supports multiple modes: full LLM-powered, agent-direct (API-free), and mixed observe-only.
  • Outputs detailed observations and synthesized metadata, including Bilibili title, intro, tags, category, and cover suggestions.
  • Robust error handling: retries, fallback methods, and graceful degradation on failures.
  • Flexible architecture with parallel processing and support for custom LLM/API configuration.
Metadata
Slug video-metadata-analyzer
Version 1.0.0
License MIT-0
Total installs 0
Current installs 0
Versions 1
FAQ

What is Video Metadata Intelligence System?

Drop in a video file and get back a complete Bilibili publishing package: title, intro, tags, category, cover suggestion, and content declaration, all generated automatically from visual and audio analysis. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 37 times so far.

How do I install Video Metadata Intelligence System?

Run the command "/install video-metadata-analyzer" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is Video Metadata Intelligence System free?

Yes. Video Metadata Intelligence System is completely free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does Video Metadata Intelligence System support?

Video Metadata Intelligence System is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Video Metadata Intelligence System?

It is developed and maintained by CyberKurry (@cyberkurry); the current version is v1.0.0.
