← Back to Skills Marketplace
cyberkurry

Video Metadata Intelligence System

by CyberKurry · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security Clean
37
Downloads
0
Stars
0
Active Installs
1
Versions
Install in OpenClaw
/install video-metadata-analyzer
Description
Drop in a video file and get back a complete Bilibili publishing package: title, intro, tags, category, cover suggestion, and content declaration, all generated automatically from visual and audio analysis.
README (SKILL.md)

Video Analyzer

Three-stage video analysis pipeline: parallel visual + audio observation, then metadata synthesis for Bilibili publishing.

When to Use

  • User says "analyze this video", "视频分析", "提取视频信息", "生成投稿元数据"
  • User wants structured content analysis from a video file
  • User wants to prepare a video for Bilibili publishing (title, intro, tags, category, cover)
  • Upstream of bilibili-publish-playwright: this skill generates the metadata that feeds into Bilibili publishing

Architecture

Input Video ──────────────────────────────────────────────────────
    │                                                            │
    ├── Stage 1a: visual.py ──→ observations_visual.json          │
    │      (ffmpeg extract frames → encode → vision LLM observe)  │ PARALLEL
    │                                                            │
    ├── Stage 1b: transcribe.py ──→ observations_audio.json       │
    │      (ffmpeg extract audio → transcribe + structure)         │
    │                                                            │
    └── Stage 2: analyze.py ──→ metadata.json ←───────────────────┘
           (merge V+A observations → publishable metadata via LLM)

run.sh orchestrates: launches visual.py and transcribe.py as background processes (&), wait for both, then optionally runs analyze.py.

Output Directory

$OUTPUT/
├── observations_visual.json    # JSON array: one object per frame
├── observations_audio.json     # JSON object: transcript + structured info
├── metadata.json               # (optional) Synthesized Bilibili metadata
└── frames/                     # (only with --keep-frames, auto-cleaned otherwise)

Procedure

1. Full pipeline (recommended — all external LLM API)

bash scripts/run.sh \
  --video VIDEO_PATH --output /tmp/va-out \
  --transcribe audio-llm \
  --audio-llm-key KEY --audio-llm-base URL --audio-llm-model MODEL \
  --vision-llm-key KEY --vision-llm-base URL --vision-llm-model MODEL \
  --max-frames 15 \
  --synthesize-method api \
  --analyze-llm-key KEY --analyze-llm-base URL --analyze-llm-model MODEL

2. Agent-direct mode (no external API — agent reads frames/audio directly)

bash scripts/run.sh \
  --video VIDEO_PATH --output /tmp/va-out --keep-frames

Agent then reads observations_visual.json (placeholder frames), observations_audio.json (audio file path), and optionally the frame images + audio file directly to generate metadata.

3. Mixed / observe-only

Omit --synthesize-method to observe only, then run analyze.py separately later. Each stage (visual, audio, synthesize) can use different keys and models.

Key Parameters

Parameter Default Purpose
--video PATH Required. Input video file
--output DIR Required. Output directory
--transcribe MODE agent-direct local / cloud / agent-direct / audio-llm
--max-frames N 15 Max frames per 4-min segment
--keep-frames false Keep extracted frame images
--synthesize-method METHOD api / agent / manual. Omit = observe only

All *-key, *-base, *-model parameters follow the pattern: --vision-llm-key, --audio-llm-key, --analyze-llm-key etc. See references/REFERENCE.md for the complete parameter table.

Scripts

File Role
scripts/common.py Shared utilities: HTTP retry with backoff, media duration via ffprobe, JSON parse from LLM output
scripts/visual.py Frame extraction (auto-segment, auto-compress >200KB) + vision LLM observation. Long videos: segments processed in parallel (max 4 concurrent)
scripts/transcribe.py Audio extraction + transcription (4 modes). Auto-chunks large audio with 2s overlap for dedup
scripts/analyze.py Observations → publish metadata (3 methods: api/agent/manual). Heuristic fallback on API failure
scripts/run.sh Orchestrator: parallel visual+audio, then optional synthesis

Output Summary

observations_visual.json — JSON array, one object per frame with frame, objects, desc, texts, actions, style, cover_candidate, segment, segment_start.

observations_audio.jsontranscript, speakers, key_points, tone. Agent-direct mode includes audio_file path.

metadata.jsontitle (≤80 chars), intro (≤2000 chars), tags (≤10), category, sub_category, cover_suggestion (primary + reason + secondary), declaration, copyright_claim.

Pitfalls

  • API keys in chat: Some platforms truncate keys with . Always pass keys via command-line arguments, not through messages.
  • Model capability: Vision requires image_url support. Audio-LLM requires input_audio support. Check your provider.
  • Game recordings: Frames are large (~300-380KB vs ~30-80KB for phone). Auto-compression handles this, but plan rate limits for long videos.
  • Long videos = parallel API calls: 30-min video = 8 segments × 15 frames = 8 vision API calls (capped at 4 concurrent). Consider rate limits.
  • Missing credentials auto-degrade: Omitting LLM keys → preprocess-only or agent-direct mode. Scripts never crash on missing keys.
  • --interval deprecated: Ignored. Interval auto-calculated per segment based on --max-frames.

Error Handling

Three-layer defense:

  1. HTTP retry — 3 retries with exponential backoff on 5xx / connection errors
  2. JSON parse retry — 3 attempts with error feedback sent back to LLM
  3. Graceful degradation — placeholder observations on visual failure, raw text on audio failure, heuristic fallback on synthesis failure

Verification

  1. Check exit code: run.sh returns 0 on success
  2. Verify observations_visual.json has entries for expected frame count
  3. Verify observations_audio.json has transcript field (non-empty for speech videos)
  4. If --synthesize-method used, verify metadata.json has all required fields (title, intro, tags, category, cover_suggestion)

For complete parameter reference, output schemas, standalone usage per script, and detailed error handling, see references/REFERENCE.md.

Usage Guidance
Do not treat this as a completed security review. The workspace files need to be readable before making an installation decision.
Capability Tags
requires-sensitive-credentials
Capability Assessment
Purpose & Capability
Workspace command execution failed before files could be read; no artifact-backed purpose or capability assessment was possible.
Instruction Scope
SKILL.md could not be inspected, so instruction scope could not be verified from artifacts.
Install Mechanism
Install metadata and specs could not be inspected from the workspace in this run.
Credentials
No artifact-backed environment requirements were available for review.
Persistence & Privilege
No artifact-backed persistence or privilege behavior was available for review.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install video-metadata-analyzer
  3. After installation, invoke the skill by name or use /video-metadata-analyzer
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release of the video metadata analyzer skill. - Launches a three-stage video analysis pipeline: frame extraction + vision LLM, audio transcription, and Bilibili-ready metadata synthesis. - Supports multiple modes: full LLM-powered, agent-direct (API-free), and mixed observe-only. - Outputs detailed observations and synthesized metadata, including Bilibili title, intro, tags, category, and cover suggestions. - Robust error handling: retries, fallback methods, and graceful degradation on failures. - Flexible architecture with parallel processing and support for custom LLM/API configuration.
Metadata
Slug video-metadata-analyzer
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Video Metadata Intelligence System?

Drop in a video file and get back a complete Bilibili publishing package: title, intro, tags, category, cover suggestion, and content declaration, all generated automatically from visual and audio analysis. It is an AI Agent Skill for Claude Code / OpenClaw, with 37 downloads so far.

How do I install Video Metadata Intelligence System?

Run "/install video-metadata-analyzer" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Video Metadata Intelligence System free?

Yes, Video Metadata Intelligence System is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Video Metadata Intelligence System support?

Video Metadata Intelligence System is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Video Metadata Intelligence System?

It is built and maintained by CyberKurry (@cyberkurry); the current version is v1.0.0.

💬 Comments