
Kb Collector

by arbiger · GitHub ↗ · v1.2.1
cross-platform ⚠ suspicious
455 Downloads · 0 Stars · 1 Active Installs · 4 Versions
Install in OpenClaw
/install kb-collector
Description
Knowledge Base Collector - save YouTube, URLs, text to Obsidian with AI summarization. Auto-transcribes videos, fetches pages, supports weekly/monthly digest...
README (SKILL.md)

KB Collector

Knowledge Base Collector - Save YouTube, URLs, and text to Obsidian with automatic transcription and summarization.

Features

  • YouTube Collection - Download audio, transcribe with Whisper, auto-summarize
  • URL Collection - Fetch and summarize web pages
  • Plain Text - Direct save with tags
  • Digest - Weekly/Monthly/Yearly review emails
  • Nightly Research - Automated AI/LLM/tech trend tracking

Installation

# Install dependencies
pip install yt-dlp faster-whisper requests beautifulsoup4

# For AI summarization (optional)
pip install openai anthropic

Usage (Python Version - Recommended)

# Collect YouTube video
python3 scripts/collect.py youtube "https://youtu.be/xxxxx" "stock,investing"

# Collect URL
python3 scripts/collect.py url "https://example.com/article" "python,api"

# Collect plain text
python3 scripts/collect.py text "My note content" "tag1,tag2"
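
The three commands above share one shape: a mode, a target (URL or text), and a comma-separated tag string. As a minimal sketch of how such arguments could be parsed (this is not the actual contents of scripts/collect.py; `parse_tags`, `build_parser`, and `parse_args` are hypothetical names, and treating the tag argument as optional is an assumption):

```python
import argparse


def parse_tags(raw):
    """Split "a, b,,a" into ["a", "b"], dropping blanks and duplicates."""
    seen, tags = set(), []
    for part in raw.split(","):
        tag = part.strip()
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags


def build_parser():
    # Positional order mirrors the documented usage: mode, target, tags.
    parser = argparse.ArgumentParser(prog="collect.py")
    parser.add_argument("mode", choices=["youtube", "url", "text"])
    parser.add_argument("target", help="URL to fetch, or the note text itself")
    # Assumption: tags may be omitted, in which case the note has no tags.
    parser.add_argument("tags", nargs="?", default="")
    return parser


def parse_args(argv):
    """Return (mode, target, tag_list) for a list of command-line arguments."""
    args = build_parser().parse_args(argv)
    return args.mode, args.target, parse_tags(args.tags)
```

Calling `parse_args(["url", "https://example.com/article", "python,api"])` would yield the mode, the URL, and a two-element tag list.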

Usage (Bash Version - Legacy)

# Collect YouTube
./scripts/collect.sh "https://youtu.be/xxxxx" "stock,investing" youtube

# Collect URL
./scripts/collect.sh "https://example.com/article" "python,api" url

# Collect plain text
./scripts/collect.sh "My note" "tag1,tag2" text

Nightly Research (New!)

Automated AI/LLM/tech trend tracking - runs daily and saves to Obsidian.

# Save to Obsidian only
./scripts/nightly-research.sh --save

# Save to Obsidian AND send email
./scripts/nightly-research.sh --save --send

# Send email only
./scripts/nightly-research.sh --send

Features

  • Searches multiple sources (Hacker News, Reddit, Twitter)
  • LLM summarization (optional)
  • Saves to Obsidian with tags
  • Optional email digest

Cron Setup (optional)

# Run every night at 10 PM
0 22 * * * /path/to/nightly-research.sh --save --send

Configuration

Edit the script to customize:

VAULT_PATH = os.path.expanduser("~/Documents/YourVault")
NOTE_AUTHOR = "YourName"
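
The review on this page reports that the scripts read an OBSIDIAN_VAULT environment variable. A sketch of resolving the vault path from that variable instead of editing the script might look like the following. Whether collect.py itself honors the variable is not confirmed here, and `resolve_vault_path` and `DEFAULT_VAULT` are hypothetical names.

```python
import os
from pathlib import Path

# Hypothetical fallback, matching the placeholder shown in the README.
DEFAULT_VAULT = "~/Documents/YourVault"


def resolve_vault_path(env=None):
    """Prefer OBSIDIAN_VAULT when set and non-empty, else use the fallback."""
    env = os.environ if env is None else env
    raw = env.get("OBSIDIAN_VAULT") or DEFAULT_VAULT
    # expanduser turns a leading "~" into the current user's home directory.
    return Path(raw).expanduser()
```

Passing an explicit mapping, for example `resolve_vault_path({"OBSIDIAN_VAULT": "/tmp/test-vault"})`, makes it easy to point a trial run at a throwaway vault.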

Output Format

Notes saved to: {VAULT_PATH}/yyyy-mm-dd-title.md

---
created: 2026-03-03T12:00:00
source: https://...
tags: [stock, investing]
author: George
---

# Title

> **TLDR:** Summary here...

---

Content...

---
*Saved: 2026-03-03*
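
To make the layout above concrete, here is a rough sketch that renders a note in that shape. It is not the skill's own code: `slugify`, `note_filename`, and `render_note` are hypothetical helpers, and the slug rule (lowercase, hyphens for anything non-alphanumeric) is an assumption about how `title` becomes a filename.

```python
import re
from datetime import datetime


def slugify(title):
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def note_filename(title, created):
    """Build a yyyy-mm-dd-title.md filename (slug rule is an assumption)."""
    return f"{created:%Y-%m-%d}-{slugify(title)}.md"


def render_note(title, source, tags, author, summary, content, created):
    """Return the note text: front matter, title, TLDR, body, footer."""
    lines = [
        "---",
        f"created: {created.isoformat(timespec='seconds')}",
        f"source: {source}",
        f"tags: [{', '.join(tags)}]",
        f"author: {author}",
        "---",
        "",
        f"# {title}",
        "",
        f"> **TLDR:** {summary}",
        "",
        "---",
        "",
        content,
        "",
        "---",
        f"*Saved: {created:%Y-%m-%d}*",
        "",
    ]
    return "\n".join(lines)
```

Keeping rendering separate from file writing means the output can be inspected in memory before anything touches a real vault.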

Dependencies

  • yt-dlp
  • faster-whisper (for transcription)
  • requests + beautifulsoup4 (for URL fetching)
  • Optional: openai/anthropic (for AI summarization)

Credits

Automated note-taking workflow for Obsidian.

Usage Guidance
What to check before installing/using this skill:

  • Expectation mismatch: The registry/metadata declare no env vars but the scripts read TAVILY_API_KEY, OBSIDIAN_VAULT, and RECIPIENT. Confirm whether you should provide any API keys and where those keys will be used.
  • Vault path & recipient: The Python and shell scripts default to a specific user's vault path (/Users/george/... or ~/Documents/Georges/Knowledge). Edit VAULT_PATH/VAULT/OBSIDIAN_VAULT to point to your own vault before running, and replace the hard-coded RECIPIENT ([email protected]) with your address or remove email sending if undesired.
  • External network and email: nightly-research.sh contacts https://api.tavily.com and digest/nightly can send mail via 'gog gmail send'. If you enable those features, you will send search queries and possibly note content to external services. Only set TAVILY_API_KEY if you trust Tavily and understand their data use. Inspect/verify how 'gog' is configured for Gmail on your machine; it may reuse stored credentials to send mail.
  • Data exfil channels: The main exfil vectors here are (1) posting queries/results to Tavily, and (2) sending digests via the 'gog' tool. There is no obfuscated code or hidden endpoints, but these channels can leak note contents if misconfigured.
  • Run in a safe environment first: Execute scripts in a sandbox or a test account/vault, with no sensitive notes present. Replace hard-coded values, and run with network disabled if you want only local behavior.
  • Dependency hygiene: The SKILL.md asks you to pip install yt-dlp, faster-whisper, etc. Those packages and the external binaries (yt-dlp, whisper, gog) will run code on your machine. Install them from official sources and review their own security considerations.
  • Ask the author / request metadata: The skill lacks homepage/author contact and doesn't declare env vars. If you plan to use it long-term, ask the publisher to add explicit docs for required credentials and configurable defaults (vault path, recipient), or update the skill to avoid hard-coded user-specific paths and recipients.

If you are uncomfortable with network calls or automatic email sending, either remove/disable those parts of the scripts or decline to install. If you proceed, make the environment variables explicit and verify behavior with small, non-sensitive test data first.
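
One way to act on the advice about hard-coded values is to scan the bundled scripts for email addresses and user-specific home paths before running them. The following is a rough sketch under stated assumptions: the regular expressions are simple heuristics rather than complete validators, and `find_hardcoded` and `scan_dir` are hypothetical helper names.

```python
import re
from pathlib import Path

# Heuristic patterns: a plain email address, and a macOS/Linux home directory.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
USER_PATH_RE = re.compile(r"/Users/[^/\s\"']+|/home/[^/\s\"']+")


def find_hardcoded(text):
    """Return (line_number, match) pairs for emails and user home paths."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in (EMAIL_RE, USER_PATH_RE):
            for match in pattern.finditer(line):
                hits.append((lineno, match.group(0)))
    return hits


def scan_dir(directory):
    """Scan .sh and .py files under a directory; map file path to its hits."""
    results = {}
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file() and path.suffix in {".sh", ".py"}:
            hits = find_hardcoded(path.read_text(errors="replace"))
            if hits:
                results[str(path)] = hits
    return results
```

Running `scan_dir("scripts")` from the skill's directory would list each file with the line numbers worth reviewing by hand; an empty result does not prove the scripts are clean.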
Capability Analysis
Type: OpenClaw Skill
Name: kb-collector
Version: 1.2.1

The skill bundle contains hardcoded configurations that pose a significant privacy and data exfiltration risk, most notably a fixed email recipient ('[email protected]') in 'scripts/digest.sh' that would send note summaries to a third party by default. Additionally, 'scripts/collect.sh' and 'scripts/collect.py' utilize hardcoded local file paths and include unused or undefined dependencies such as 'yfinance' and 'web_fetch'. While these appear to be artifacts of a personal workflow shared without proper sanitization, the hardcoded transmission of user data to an external domain is a high-risk behavior.
Capability Assessment
Purpose & Capability
Name/description align with the included scripts: downloading YouTube audio, transcribing, fetching pages, saving to an Obsidian vault, and generating digests/nightly research. One minor mismatch: SKILL.md claims it 'searches multiple sources (Hacker News, Reddit, Twitter)', but nightly-research.sh performs searches via the Tavily API only (it does not independently query those sites). Overall, capabilities match the stated purpose.
Instruction Scope
SKILL.md and scripts instruct the agent to fetch remote web pages and call external services (Tavily API via curl, and send email via the 'gog gmail send' tool). The scripts also write files into an Obsidian vault path and remove temporary audio files. The SKILL metadata declared no required env vars, but the scripts read/expect environment variables (TAVILY_API_KEY, OBSIDIAN_VAULT, RECIPIENT) and use a hard-coded email recipient and hard-coded vault paths (/Users/george/... and ~/Documents/Georges/Knowledge). These runtime actions (external API calls and email sending) are outside the declared requirements and should be explicitly disclosed.
Install Mechanism
There is no formal install spec in the registry (instruction-only), which is lower risk from an automatic installer perspective. The SKILL.md tells the user to pip install packages (yt-dlp, faster-whisper, requests, beautifulsoup4, optional openai/anthropic). That is consistent with the code. No downloads from arbitrary URLs or archive extraction are present. Because the code relies on external binaries (yt-dlp, whisper) and a third-party CLI tool 'gog', the user must install those manually — the absence of an install spec means the skill won't auto-install them but the runtime will fail or behave unexpectedly if they are missing.
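
Since nothing installs the external tools automatically, a quick check before the first run can show which ones are missing from PATH. This is a minimal sketch using the standard library's `shutil.which`. The tool names come from the review above, while the `missing_tools` helper and the split into required and optional tools are assumptions.

```python
import shutil

# Assumption: yt-dlp is needed for YouTube collection; the others are only
# needed for transcription (whisper) and email sending (gog).
REQUIRED_TOOLS = ["yt-dlp"]
OPTIONAL_TOOLS = ["whisper", "gog"]


def missing_tools(tools, which=shutil.which):
    """Return the subset of tool names that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    for label, tools in (("required", REQUIRED_TOOLS), ("optional", OPTIONAL_TOOLS)):
        missing = missing_tools(tools)
        status = ", ".join(missing) if missing else "none"
        print(f"Missing {label} tools: {status}")
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default is enough.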
Credentials
Registry lists no required environment variables or credentials, yet scripts make use of environment vars: TAVILY_API_KEY (sent to api.tavily.com), OBSIDIAN_VAULT (overrides VAULT), and RECIPIENT. digest.sh and nightly-research.sh also use or assume the presence of an email-sending tool ('gog') which uses credentials not declared here. The scripts also include hard-coded local paths and a hard-coded recipient email ([email protected]). Asking for or using API keys and email-sending capabilities is proportionate to the feature set — but they should be declared, and the hard-coded recipient is suspicious/unexpected behavior that could lead to unintended data exfiltration.
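
Because these variables are undeclared, it may help to check them explicitly before enabling the features that depend on them. The sketch below is not part of the skill. `preflight` and its `send_email` and `use_tavily` flags are hypothetical, and it only checks the three variable names reported above.

```python
import os


def preflight(env=None, send_email=False, use_tavily=False):
    """Return a list of configuration problems; empty means nothing flagged."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("OBSIDIAN_VAULT"):
        problems.append("OBSIDIAN_VAULT is not set; a hard-coded vault path may be used")
    if send_email and not env.get("RECIPIENT"):
        problems.append("RECIPIENT is not set; a hard-coded recipient may receive the digest")
    if use_tavily and not env.get("TAVILY_API_KEY"):
        problems.append("TAVILY_API_KEY is not set; Tavily searches will likely fail")
    return problems
```

A wrapper could refuse to call nightly-research.sh with --send whenever `preflight(send_email=True)` returns anything, so that no mail goes out with defaults the user never chose.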
Persistence & Privilege
The skill does not request permanent platform-level privileges (always: false) and does not modify other skills' configuration. It writes notes to a user-visible Obsidian vault and temporary files in /tmp, which is expected given its purpose. Autonomous invocation is allowed (default) — combined with the environment/credential concerns above this increases potential impact, but the skill alone does not request 'always' or system-wide config changes.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install kb-collector
  3. After installation, invoke the skill by name or use /kb-collector
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.2.1
  • Minor update to nightly-research.sh script.
  • No user-facing changes to documentation or features.
  • Maintains all existing functionalities for nightly research automation.
v1.2.0
  • Added nightly research automation script for AI/LLM/tech trend tracking, with options to save notes to Obsidian and send email digests.
  • Updated SKILL.md with documentation for the new nightly research feature, including usage instructions and cron setup.
  • Added package-lock.json for improved dependency management.
  • Enhanced digest script and overall skill documentation for clarity and new functionality.
v1.1.0
Python-based collector script added, improved documentation and options.
  • Added Python script (scripts/collect.py) for collecting YouTube, URLs, and text to Obsidian
  • SKILL.md updated: clearer instructions, Python usage recommended, improved formatting, new install/usage/configuration sections
  • Dependency list expanded (yt-dlp, faster-whisper, requests, beautifulsoup4; openai/anthropic optional)
  • Bash script instructions moved to a "legacy" section
  • Digest and summarization features mentioned but unchanged in function
  • Output note format documentation updated
v1.0.0
Version 1.0.0 of kb-collector
  • Initial release of Knowledge Base Collector.
  • Save articles, plain text, and YouTube videos to Obsidian vaults.
  • Auto-transcribe YouTube videos using Whisper and summarize content.
  • Fetch and summarize web page URLs; auto-tag content.
  • Supports digest emails (weekly, monthly, yearly) sent via Gmail.
  • Simple trigger commands for collecting and reviewing content.
Metadata
Slug kb-collector
Version 1.2.1
License
All-time Installs 2
Active Installs 1
Total Versions 4
Frequently Asked Questions

What is Kb Collector?

Knowledge Base Collector - save YouTube, URLs, text to Obsidian with AI summarization. Auto-transcribes videos, fetches pages, supports weekly/monthly digest... It is an AI Agent Skill for Claude Code / OpenClaw, with 455 downloads so far.

How do I install Kb Collector?

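Run "/install kb-collector" in the OpenClaw or Claude Code chat to install the skill in one step. The Python packages and external tools listed under Dependencies (yt-dlp, faster-whisper, and so on) are not installed automatically and must be set up separately.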

Is Kb Collector free?

Yes, Kb Collector is free to download, install, and use. Note that the Metadata section of this listing does not state a license.

Which platforms does Kb Collector support?

Kb Collector is listed as cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Kb Collector?

It is built and maintained by arbiger (@arbiger); the current version is v1.2.1.
