lovensky1992-wk

Content Collector

Author lovensky1992-wk · GitHub · v2.0.0 · MIT-0
cross-platform ⚠ suspicious
440 total downloads · 0 favorites · 1 current install · 6 versions
Install in OpenClaw:
/install content-collector
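Description
A personal content collection and knowledge management system: collect, organize, search, and repurpose. Use when: (1) the user says "收藏" (save)/"存一下" (keep this)/"记录下来" (note this down)/"save"/"bookmark"/"clip", (2) the user asks to search previously collected content, (3) the user asks for social media copy based on collected content (repurposing), (4) the user mentions "之前看过一个..." ("I saw something before...")/"上...
Usage (SKILL.md)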

Content Collector - Personal Content Collection System

Collect good content → organize it into a structure → search by keyword → repurpose it

Data locations

  • Primary storage: <WORKSPACE>/collections/ (articles/ tweets/ videos/ wechat/ ideas/)
  • Obsidian sync: <YOUR_OBSIDIAN_VAULT>/收藏/ (written on every save)
  • Index: collections/index.md + collections/tags.md (maintained automatically)

Collection workflow

Step 0: Deduplicate (required every time)

If there is a URL: run obsidian search query="<domain/path>" total (without the https:// prefix) or grep -rl "<url>" collections/. If the result is > 0, the item is already collected; stop. If it is 0, continue.
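The dedup check can also be expressed as a small Python function. This is a minimal sketch, assuming collections are Markdown files under collections/ and that a stored URL appears verbatim in its file; strip_scheme and is_already_collected are hypothetical helpers, not part of the skill.

```python
from pathlib import Path


def strip_scheme(url: str) -> str:
    """Drop http:// or https:// so both variants match the stored URL."""
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            return url[len(prefix):]
    return url


def is_already_collected(url: str, collections_dir: str = "collections") -> bool:
    """Return True if any Markdown file under collections_dir contains the URL.

    Similar to `grep -rl "<url>" collections/`, but limited to *.md files and
    case-insensitive; a hit means the item is already collected.
    """
    needle = strip_scheme(url).lower()
    root = Path(collections_dir)
    if not needle or not root.is_dir():
        return False
    for path in root.rglob("*.md"):
        if needle in path.read_text(encoding="utf-8", errors="ignore").lower():
            return True
    return False
```

Stripping the scheme mirrors the instruction to drop https:// before searching, so a link saved as https:// still matches a later http:// share.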

Step 1: URL routing

Match the URL to a processing path. See references/url-routing-and-site-specs.md for details.

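URL pattern | category | Handling
Intranet domains | articles | Chrome Relay; do not call web_fetch
arxiv.org/abs/* | articles | Extract abstract/authors
github.com/*/* | articles | README + stars/language
mp.weixin.qq.com | wechat | Prefer the browser
youtube.com/watch* | videos | Supadata transcript
Bilibili (B站) | videos | Local transcription with video_transcribe.sh
Xiaohongshu/Douyin (video) | videos | Local transcription with video_transcribe.sh
x.com/*/status/* | tweets | Extract engagement data; expand threads
Other | articles | Default flow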
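The routing table can be read as an ordered list of rules where the first match wins. The sketch below assumes that reading; the intranet suffix, the Bilibili/Douyin/Xiaohongshu host names, and the handling labels are illustrative assumptions, not values taken from references/url-routing-and-site-specs.md.

```python
from urllib.parse import urlparse

# Hypothetical intranet suffix; substitute your organisation's internal domains.
INTRANET_SUFFIXES = (".intranet.example",)


def host_matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def route(url: str) -> tuple[str, str]:
    """Map a URL to (category, handling label); the first matching rule wins."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path
    if host.endswith(INTRANET_SUFFIXES):
        return "articles", "chrome-relay"  # never call web_fetch for intranet pages
    if host_matches(host, "arxiv.org") and path.startswith("/abs/"):
        return "articles", "arxiv-abstract"
    if host_matches(host, "github.com") and len([p for p in path.split("/") if p]) >= 2:
        return "articles", "github-readme"
    if host == "mp.weixin.qq.com":
        return "wechat", "browser-first"
    if host_matches(host, "youtube.com") and path == "/watch":
        return "videos", "supadata-transcript"
    if any(host_matches(host, d) for d in ("bilibili.com", "b23.tv")):
        return "videos", "video_transcribe.sh"
    if any(host_matches(host, d) for d in ("douyin.com", "xiaohongshu.com")):
        # Assumes the Xiaohongshu link is a video note; image notes would need a page check.
        return "videos", "video_transcribe.sh"
    if host_matches(host, "x.com") and "/status/" in path:
        return "tweets", "thread-expand"
    return "articles", "default"
```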

Step 2: Content extraction

Articles/web pages:

  1. supadata_fetch.py web <url> (fallback: web_fetch)
  2. Schema.org extraction: see references/schema-extraction-spec.md
  3. Image extraction and download (required): see references/image-extraction-spec.md
  4. Theme keyword extraction: see references/theme-extraction-spec.md
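The image-extraction spec itself is not included here, so the following is only a standard-library sketch of the first half of step 3: collecting image URLs from fetched HTML in document order. Preferring data-src over src is an assumption about lazy-loaded pages, and the download step is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect absolute image URLs from <img> tags, without duplicates."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.images: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # Lazy-loading pages often keep the real URL in data-src (assumption).
        src = attr_map.get("data-src") or attr_map.get("src")
        if src and not src.startswith("data:"):  # skip inline placeholders
            absolute = urljoin(self.base_url, src)
            if absolute not in self.images:
                self.images.append(absolute)


def extract_images(html: str, base_url: str) -> list[str]:
    parser = ImageCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.images
```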

Videos:

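  1. Metadata: supadata_fetch.py metadata <url>, or bilibili_extract.py <url> for Bilibili
  2. Transcription: bash scripts/video_transcribe.sh <url> (detects the platform and subtitle source automatically)
  3. Highlight extraction (videos ≥ 10 min): see references/highlight-extraction-spec.md
  4. Theme keyword extraction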
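Step 3 only runs for videos of 10 minutes or more. A minimal sketch of that gate, assuming the duration frontmatter field holds a colon-separated value such as "12:34" or "1:02:03"; the helper names are hypothetical.

```python
HIGHLIGHT_THRESHOLD_S = 10 * 60  # highlights only for videos of 10 minutes or longer


def duration_seconds(duration: str) -> int:
    """Parse 'SS', 'MM:SS' or 'HH:MM:SS' into seconds (assumed field format)."""
    parts = duration.strip().split(":")
    if len(parts) > 3:
        raise ValueError(f"unrecognized duration: {duration!r}")
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + int(part)
    return seconds


def needs_highlights(duration: str) -> bool:
    return duration_seconds(duration) >= HIGHLIGHT_THRESHOLD_S
```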

Tweets/short content: extract the text and engagement data directly

Step 3: Write files

  1. Generate collections/{category}/YYYY-MM-DD-slug.md (format: see the schema below)
  2. Content overview diagram (articles over 1000 characters): see references/content-overview-spec.md
  3. Sync to Obsidian: see references/obsidian-integration.md
  4. obsidian daily:append content="- 📌 Collected [[{title}]]({source}) | {one-line summary}"
  5. Update index.md + tags.md
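The workflow says tags.md is maintained automatically but does not show its layout. The sketch below assumes one "## tag (count)" heading per tag followed by Obsidian-style [[links]] to file stems; build_tags_md is a hypothetical helper that rebuilds the whole file from each collection's tags.

```python
from collections import defaultdict


def build_tags_md(entries: dict[str, list[str]]) -> str:
    """Render tags.md from {file stem: tags} in an assumed '## tag (count)' layout."""
    by_tag: dict[str, list[str]] = defaultdict(list)
    for stem, tags in entries.items():
        for tag in set(tags):  # ignore a tag repeated within one file
            by_tag[tag].append(stem)
    lines = ["# Tags", ""]
    for tag in sorted(by_tag):
        stems = sorted(by_tag[tag])
        lines.append(f"## {tag} ({len(stems)})")
        lines.extend(f"- [[{stem}]]" for stem in stems)
        lines.append("")
    return "\n".join(lines)
```

Rebuilding from scratch instead of appending keeps counts correct when a tag is renamed by normalization.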

Step 3.5: WeChat image caching (required for the wechat category)

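If the URL is a WeChat Official Account article (mp.weixin.qq.com), run bash scripts/cache-wechat-images.sh <the collection file just written> after writing the file. It downloads the WeChat CDN images to collections/images/<slug>/ so they are kept locally and do not later return 404 when the CDN links expire.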
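cache-wechat-images.sh is not included in the package, so this is only a hypothetical sketch of what such a step might do: find WeChat CDN image URLs in the saved Markdown, assign each a local file under images/<slug>/, and rewrite the links. It assumes the images are served from mmbiz.qpic.cn, that a .jpg extension is acceptable, and that collection files sit one directory below collections/.

```python
import hashlib
import re

# Assumed WeChat CDN host; stops at whitespace, quotes, ')' or angle brackets.
WECHAT_IMG = re.compile(r"https?://mmbiz\.qpic\.cn/[^\s)\"'<>]+")


def plan_wechat_cache(markdown: str, slug: str) -> dict[str, str]:
    """Map each CDN image URL to a path relative to collections/."""
    plan: dict[str, str] = {}
    for url in WECHAT_IMG.findall(markdown):
        if url not in plan:
            digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:12]
            plan[url] = f"images/{slug}/{digest}.jpg"
    return plan


def rewrite_links(markdown: str, plan: dict[str, str]) -> str:
    """Point image links at the local copies (collection files live in collections/<category>/)."""
    def local(match: re.Match) -> str:
        url = match.group(0)
        return "../" + plan[url] if url in plan else url
    return WECHAT_IMG.sub(local, markdown)
```

The files themselves still have to be downloaded (for example with urllib.request) before rewriting; using one regex substitution avoids corrupting a URL that is a prefix of another.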

Step 4: Related-content matching

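Run bash scripts/post-collect.sh <the collection file just written>. The script automatically matches active projects and related collections and updates related_projects in the frontmatter. If related collections exist, mention them in the reply. Matching against collections/topics/topic-pool.md is still manual; append any matches to temp/handoffs/collector-to-writing.md.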
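How post-collect.sh decides what is "related" is not documented, and the script is missing from the package. One simple stand-in is tag overlap measured with Jaccard similarity; the threshold, limit and helper names below are assumptions chosen for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_items(new_tags, library, threshold=0.3, limit=5):
    """Rank existing collections {name: tags} by tag overlap with a new item."""
    new = set(new_tags)
    scored = [(jaccard(new, set(tags)), name) for name, tags in library.items()]
    scored = [item for item in scored if item[0] >= threshold]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first, then by name
    return [name for _, name in scored[:limit]]
```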

Pre-write self-check

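Before writing any file under collections/, confirm that the following steps are done. If anything is missing, set incomplete: true; never skip a step silently.

  • Deduplication ✓ → content extraction ✓ → images (articles, required) ✓ → theme keywords ✓ → file written ✓ → Obsidian sync ✓ → daily note ✓
  • Before writing tags, run bash scripts/normalize-tags.sh <tag1> <tag2> ... to check for existing near-duplicate tags, and reuse existing tag names where possible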
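normalize-tags.sh is not shipped with the package, so its matching rule is unknown. A plausible sketch uses fuzzy string matching from Python's difflib to map each proposed tag onto the closest existing one; the 0.8 cutoff is an assumption.

```python
import difflib


def normalize_tags(candidates, existing, cutoff=0.8):
    """Swap each candidate for the closest existing tag when it is similar enough."""
    lookup = {tag.lower(): tag for tag in existing}
    result = []
    for tag in candidates:
        key = tag.strip().lower()
        match = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
        chosen = lookup[match[0]] if match else tag.strip()
        if chosen not in result:  # near-duplicates collapse into one tag
            result.append(chosen)
    return result
```

For example, "LLM" and "llms" both collapse to an existing "llm", while an unrelated new tag is kept as written.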

Storage schema

File naming: YYYY-MM-DD-slug.md
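The slug rule is not specified. This sketch assumes a lowercase slug in which runs of punctuation and whitespace become "-" and Chinese characters are kept; slugify and collection_filename are hypothetical helpers.

```python
import re
import unicodedata
from datetime import date


def slugify(title: str, max_len: int = 60) -> str:
    """Lowercase the title and join word runs with '-'; CJK characters are kept."""
    text = unicodedata.normalize("NFKC", title).lower()
    text = re.sub(r"[\W_]+", "-", text)  # \W is Unicode-aware, so 中文 counts as word characters
    return text.strip("-")[:max_len].strip("-") or "untitled"


def collection_filename(title: str, collected: date) -> str:
    return f"{collected:%Y-%m-%d}-{slugify(title)}.md"
```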

---
title: ""
source: ""
url: ""
author: ""
date_published: ""
date_collected: ""
tags: []
category: "articles|tweets|videos|wechat|ideas"
language: "zh|en"
summary: ""
themes: []              # 5-7 conceptual facets
schema_type: ""         # Schema.org @type (optional)
schema_data: {}         # up to 10 key-value pairs (optional)
incomplete: false
# Video-only fields
duration: ""
platform: ""
bvid: ""
stats: {}
subtitle_source: ""     # native_cc|whisper
highlights: []          # highlight clips
related_projects: []
---
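The schema's constraints (allowed category and language values, 5-7 themes, at most 10 schema_data pairs) can be checked mechanically on the parsed frontmatter. A minimal sketch, assuming the frontmatter has already been parsed into a dict; which keys count as required is an assumption, since the schema does not say.

```python
ALLOWED_CATEGORIES = {"articles", "tweets", "videos", "wechat", "ideas"}
ALLOWED_LANGUAGES = {"zh", "en"}
# Assumed required keys; the schema itself does not mark any field as mandatory.
REQUIRED_KEYS = ("title", "url", "date_collected", "tags", "category", "summary")


def check_frontmatter(meta: dict) -> list:
    """Return a list of schema problems; an empty list means the checks pass."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not meta.get(key)]
    if meta.get("category") not in ALLOWED_CATEGORIES:
        problems.append("category must be articles|tweets|videos|wechat|ideas")
    if meta.get("language") not in ALLOWED_LANGUAGES:
        problems.append("language must be zh|en")
    if not 5 <= len(meta.get("themes") or []) <= 7:
        problems.append("themes should have 5-7 entries")
    if len(meta.get("schema_data") or {}) > 10:
        problems.append("schema_data allows at most 10 key-value pairs")
    return problems
```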

Content structure

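  • Content overview (Mermaid; triggered for content over 1000 characters)
  • Key points (3-7 items)
  • Highlights (videos ≥ 10 min)
  • Excerpts (memorable quotes as blockquotes)
  • Selected top comments (videos)
  • My notes
  • Summary of the original (200-500 characters)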
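Three of these sections are conditional. The sketch below turns the rules into a function that returns the section list for one item; the English section names are placeholders, and using the transcript length as the character count for videos is an assumption.

```python
def body_sections(category: str, char_count: int, duration_s: int = 0) -> list:
    """Return section names in order, applying the conditional rules above."""
    is_video = category == "videos"
    sections = []
    if char_count > 1000:
        sections.append("Content overview")  # Mermaid diagram
    sections.append("Key points")
    if is_video and duration_s >= 600:
        sections.append("Highlights")
    sections.append("Excerpts")
    if is_video:
        sections.append("Top comments")
    sections += ["My notes", "Summary of the original"]
    return sections
```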

English content

Translate in a storytelling style by default. Follow the terminology in <WORKSPACE>/references/glossary-ai-zh.md, and on a term's first occurrence write it as 中文(English).
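The "first occurrence as 中文(English)" rule can be applied mechanically after translation. A minimal sketch, assuming the glossary has already been loaded into a {Chinese term: English term} dict (the glossary file's format is not shown); annotate_first_use is a hypothetical helper.

```python
def annotate_first_use(text: str, glossary: dict) -> str:
    """Append (English) after the first occurrence of each Chinese glossary term.

    Longer terms are processed first; terms nested inside other terms are not
    specially handled in this sketch.
    """
    for zh, en in sorted(glossary.items(), key=lambda kv: -len(kv[0])):
        idx = text.find(zh)
        if idx == -1:
            continue
        end = idx + len(zh)
        tag = f"({en})"
        if not text.startswith(tag, end):  # skip if already annotated
            text = text[:end] + tag + text[end:]
    return text
```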

Search

  1. Tags: tags.md
  2. Full text: grep -ril "keyword" collections/
  3. Return the list of matches with their summaries
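Steps 2 and 3 can be combined: a case-insensitive scan like grep -ril that also pulls each match's summary from its frontmatter. This sketch reads the summary with a simple line scan rather than a YAML parser, so it assumes summary is a single-line value.

```python
from pathlib import Path


def extract_summary(text: str) -> str:
    """Read a single-line `summary:` value from YAML frontmatter, if present."""
    if not text.startswith("---"):
        return ""
    for line in text.splitlines()[1:]:
        if line.strip() == "---":  # end of frontmatter
            break
        if line.startswith("summary:"):
            return line[len("summary:"):].strip().strip('"')
    return ""


def search_collections(keyword: str, root: str = "collections") -> list:
    """Case-insensitive full-text search returning (path, summary) pairs."""
    needle = keyword.lower()
    results = []
    for path in sorted(Path(root).rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in text.lower():
            results.append((str(path), extract_summary(text)))
    return results
```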

Repurposing

Select material from the collection by topic and pass it to xiaohongshu-ops or wemp-ops. This skill only supplies the material.

Helper scripts

  • scripts/supadata_fetch.py web|transcript|metadata <url>: fetch content via the Supadata API
  • scripts/bilibili_extract.py <url>: Bilibili metadata
  • scripts/video_transcribe.sh <url>: video transcription (detects the platform automatically)
  • scripts/sync_to_obsidian.py: batch sync to Obsidian
  • scripts/cache-wechat-images.sh <file>: cache WeChat CDN images locally
  • scripts/normalize-tags.sh <tag1> <tag2> ...: normalize and deduplicate tags
  • scripts/post-collect.sh <file>: post-collection relation analysis

🔴 Final: Mechanical verification (cannot be skipped)

Before notifying the user, run:

bash scripts/skill-verify.sh content-collector <collections-file-path>
# e.g.: bash scripts/skill-verify.sh content-collector collections/wechat/2026-04-23-xxx.md
  • ✅ ALL PASSED → reply to the user with the collection result
  • ❌ FAILED → fill in the missing items listed in the output (Obsidian sync, images, index.md, etc.) and re-run verification until it passes

Never tell the user "done" while verification has not passed.
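skill-verify.sh is referenced but not shipped, so its checks are unknown. The following is a hypothetical stand-in covering three of the items the FAILED branch mentions: the file exists and is not marked incomplete, index.md references it, and (optionally) an Obsidian copy with the same file name exists. The Obsidian naming rule is an assumption.

```python
from pathlib import Path
from typing import List, Optional


def verify_collection(file_path: str, collections_root: str = "collections",
                      obsidian_dir: Optional[str] = None) -> List[str]:
    """Hypothetical stand-in for skill-verify.sh; returns the items still missing."""
    path = Path(file_path)
    if not path.is_file():
        return [f"collection file not found: {file_path}"]
    failures = []
    if "incomplete: true" in path.read_text(encoding="utf-8"):
        failures.append("frontmatter is marked incomplete: true")
    index = Path(collections_root) / "index.md"
    if not index.is_file() or path.stem not in index.read_text(encoding="utf-8"):
        failures.append("index.md does not reference this entry")
    if obsidian_dir is not None and not (Path(obsidian_dir) / path.name).is_file():
        failures.append("no copy found in the Obsidian folder")
    return failures
```

A caller would report "✅ ALL PASSED" only when the returned list is empty, and otherwise fix each listed item and run the check again.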

Collection result notifications

  • Success: 📌 Saved: <title>\nCore idea: <one-sentence summary>\nTags: <3-5 tags>
  • Duplicate: 📌 Already saved: <title> (collected earlier)
  • Failure: ❌ Collection failed: <URL>\nReason: <failure reason>
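The three notification templates can be generated by small formatting helpers so the wording stays consistent. These use English renderings of the templates and are hypothetical; the 3-5 tag check follows the success template.

```python
def notify_success(title: str, summary: str, tags: list) -> str:
    if not 3 <= len(tags) <= 5:
        raise ValueError("the success message expects 3-5 tags")
    return f"📌 Saved: {title}\nCore idea: {summary}\nTags: {', '.join(tags)}"


def notify_duplicate(title: str) -> str:
    return f"📌 Already saved: {title} (collected earlier)"


def notify_failure(url: str, reason: str) -> str:
    return f"❌ Collection failed: {url}\nReason: {reason}"
```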

Next-step suggestions (conditional)

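Trigger | Recommendation
Closely matches the Official Account's topic direction | Write it up with wemp-ops
Suits a short Xiaohongshu image-and-text post | Rewrite it with xiaohongshu-ops
3 or more items collected from the same author | Profile the author with x-profile-deep-dive
Involves a technical approach or architecture decision | Save it to memory for long-term reference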
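The "3 or more items from the same author" trigger is a simple count over the author field of existing collections. A sketch, assuming the author values have already been read from each file's frontmatter; profile_candidates is a hypothetical helper.

```python
from collections import Counter


def profile_candidates(authors, threshold=3):
    """Authors with at least `threshold` collected items, most-collected first."""
    counts = Counter(a.strip() for a in authors if a and a.strip())
    return [author for author, n in counts.most_common() if n >= threshold]
```

Each returned author would be a candidate for an x-profile-deep-dive suggestion.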
Security recommendations
This skill appears to implement a real content-collection workflow, but there are multiple inconsistencies you should resolve before installing or providing secrets: (1) supadata_fetch.py requires SUPADATA_API_KEY and will exit without it — despite the registry claiming no required env vars. If you do not want to provide an API key, confirm the skill has a working fallback path. (2) bilibili_extract.py accepts a cookie file or reads BILIBILI_COOKIE; video_transcribe.sh and some yt-dlp commands may try to read browser cookies — do not expose session cookies unless you trust the publisher. (3) SKILL.md calls several helper scripts (cache-wechat-images.sh, normalize-tags.sh, post-collect.sh, skill-verify.sh) that are not present in the manifest; ask the author for the missing files or an explanation. (4) Verify external binary requirements (yt-dlp, obsidian CLI, curl, Python deps) and run the skill in an isolated environment first. If you plan to provide any API keys or cookies, rotate them after testing and prefer giving only minimal, scoped credentials. If anything is unclear, request an updated package with accurate required-env metadata and the missing scripts before use.
Capability analysis
Type: OpenClaw Skill · Name: content-collector · Version: 2.0.0
The skill bundle provides a highly functional content collection system but includes several high-risk capabilities that warrant caution. Specifically, 'video_transcribe.sh' utilizes yt-dlp with the '--cookies-from-browser' flag to access sensitive browser session data for authentication on video platforms. Furthermore, 'url-routing-and-site-specs.md' defines specialized workflows for scraping internal corporate intranets (e.g., *.alibaba-inc.com) via a 'Chrome Relay' mechanism. While these features are plausibly necessary for the stated purpose of archiving restricted or internal content, the programmatic access to browser cookies and the focus on internal network data extraction represent a significant security surface that could be abused if the agent is compromised.
Capability tags
requires-sensitive-credentials
Capability assessment
Purpose & Capability
Overall functionality (web/video/article collection, transcription, image extraction, Obsidian sync) matches the skill name and description. Network access and local file writes are expected for this purpose. However, the registry metadata declares no required environment variables while the included code clearly expects SUPADATA_API_KEY and may consume BILIBILI_COOKIE (and other optional service credentials). That mismatch between declared requirements and actual code is an incoherence.
Instruction Scope
SKILL.md describes many concrete runtime steps (calling supadata_fetch.py, bilibili_extract.py, video_transcribe.sh, running browser evaluate JS, downloading images with curl, invoking Obsidian CLI, and running other helper scripts). Several referenced helper scripts are missing from the manifest (e.g., scripts/cache-wechat-images.sh, scripts/normalize-tags.sh, scripts/post-collect.sh, scripts/skill-verify.sh). The instructions also presuppose access to browser cookies/Chrome Relay and to an Obsidian CLI; the code uses cookies and may call external tools (yt-dlp, curl, obsidian). These actions are within the stated purpose but the missing files and implicit dependencies broaden scope and risk.
Install Mechanism
No install spec in the registry (instruction-only), which reduces supply-chain risk from downloads; however README suggests pip installs and a 'clawhub install' path that are not codified in registry metadata. The included scripts assume external binaries (yt-dlp, curl, python3) will be available, but the skill does not declare these as required binaries in the registry — an inconsistency but not necessarily malicious.
Credentials
Registry metadata lists no required env vars, but code requires SUPADATA_API_KEY (supadata_fetch.py will exit if the env var is not set). bilibili_extract.py reads a BILIBILI_COOKIE (via --cookie-file or BILIBILI_COOKIE env var) to fetch comments and may optionally use browser cookies via yt-dlp in video_transcribe.sh. Other optional envs are referenced in scripts (e.g., VOLC_ASR_APPID/TOKEN, COLLECTIONS_DIR, OBSIDIAN_DIR). Requesting API keys or cookies is consistent with functionality, but the omission of these from the declared requirements is an incoherence and increases the chance a user will unknowingly supply sensitive credentials.
Persistence & Privilege
The skill does not request always:true and does not modify other skills. It writes local files (collections/ and Obsidian vault) and invokes local tools, which matches its stated purpose. It does ask to call Obsidian CLI and browser relay — these are privileged in the sense of accessing user apps, but that is expected for an Obsidian-sync/content-extraction tool.
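How to use
  1. Make sure OpenClaw is installed (locally or as a Docker deployment)
  2. Enter the install command in the chat: /install content-collector
  3. Once installed, invoke the skill by name or trigger it with /content-collector
  4. Provide the inputs described in the skill's parameters to get structured output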
Version history
v2.0.0
Weekly sync
v1.1.1
Daily sync
v1.0.3
Daily sync
v1.0.2
Daily sync
v1.0.1
Daily sync
v1.0.0
Initial release: collect, organize, search and repurpose content from blogs, X/Twitter, websites, and Bilibili
Metadata
Slug: content-collector
Version: 2.0.0
License: MIT-0
Total installs: 1
Current installs: 1
Version count: 6
FAQ

What is Content Collector?

A personal content collection and knowledge management system for collecting, organizing, searching, and repurposing content; it is triggered by the conditions listed in the description. It is an AI agent skill for Claude Code / OpenClaw and has been downloaded 440 times so far.

How do I install Content Collector?

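Run /install content-collector in the OpenClaw or Claude Code chat to install it with one command. Installation itself needs no extra configuration, although supadata_fetch.py requires SUPADATA_API_KEY at runtime (see the security notes above).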

Is Content Collector free?

Yes. Content Collector is free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does Content Collector support?

Content Collector is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Content Collector?

It is developed and maintained by lovensky1992-wk (@lovensky1992-wk); the current version is v2.0.0.
