reed1898

Knowledge Base Collector

Author: Reed · GitHub · v0.1.3
cross-platform ⚠ suspicious
Total downloads: 1061 · Favorites: 1 · Current installs: 3 · Versions: 4
Install in OpenClaw
/install knowledge-base-collector
Description
Collect and organize a personal knowledge base from URLs (web/X/WeChat) and screenshots. Use when the user says they want to save a URL, ingest a link, archive content to KB, tag/classify notes, store screenshots, or search their saved knowledge in Telegram. Supports WeChat via a connected macOS node when cloud fetch is blocked.
Usage (SKILL.md)

Summary

  • Ingest: web URLs, X/Twitter links, WeChat Official Account links (mp.weixin.qq.com), and screenshots
  • Store: writes to a shared KB folder with per-item content.md + meta.json and a global index.jsonl
  • Organize: tag-first classification with richer tags (e.g. #agent, #coding-agent, #claude-code, #mcp, #rag, #prompt-injection, #security, #pricing, #database)
  • WeChat: cloud fetch may be blocked; when a macOS node (e.g. Reed-Mac) is online, prefer node-side fetch to improve success rate; otherwise create a placeholder entry
  • Search: designed to support Telegram Q&A / search flows on top of the index and content

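Save the links and screenshots a user sends into a shared knowledge base (KB) and organize them with tags.

Default KB location

  • KB root (configurable): /home/ubuntu/.openclaw/kb
  • Index: kb/20_Inbox/urls/index.jsonl
  • Per-item directory: kb/20_Inbox/urls/<YYYY-MM>/<item>/content.md + meta.json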
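
The layout above can be sketched as follows. This is a minimal illustration, not the skill's actual code: the function name `write_item`, the `path` field in the index record, and the contents of `meta` are assumptions made for the example.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_item(kb_root: Path, item_id: str, content: str, meta: dict) -> Path:
    """Write content.md + meta.json for one item and append it to index.jsonl."""
    urls_dir = kb_root / "20_Inbox" / "urls"
    month = datetime.now(timezone.utc).strftime("%Y-%m")  # the <YYYY-MM> bucket
    item_dir = urls_dir / month / item_id
    item_dir.mkdir(parents=True, exist_ok=True)

    (item_dir / "content.md").write_text(content, encoding="utf-8")
    (item_dir / "meta.json").write_text(
        json.dumps(meta, ensure_ascii=False, indent=2), encoding="utf-8"
    )

    # index.jsonl holds one JSON object per line; the "path" field is hypothetical.
    record = {**meta, "path": str(item_dir.relative_to(kb_root))}
    with (urls_dir / "index.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return item_dir
```

Appending one line per item keeps the global index cheap to update and easy to scan without opening every item directory.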

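Goal: get every item into the KB first so nothing is lost, then iterate on summaries, tags, and search.

What to do (by input type)

1) Ingest a regular web page / X (Twitter) / WeChat Official Account URL

Run the script:

python3 /home/ubuntu/.openclaw/skills/knowledge-base-collector/scripts/ingest_url.py "<URL>" --tags "#optional" --note "context"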

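Behavior:

  • Automatically detects the source (web/x/wechat)
  • Prefers r.jina.ai for extracting the main text (no login required)
  • When a WeChat Official Account article hits anti-bot verification, writes a placeholder entry: status=blocked_verification + tag #needs-manual
  • Deduplicates by a key derived from the URL (an entry that already exists is skipped)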
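
A rough sketch of the first, second, and fourth behaviors, under stated assumptions: the hostname rules, the normalization used for the dedup key, and all three function names are hypothetical and may differ from what ingest_url.py actually does. Only the `https://r.jina.ai/<URL>` prefix form comes from this page.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def detect_source(url: str) -> str:
    """Classify a URL as wechat, x, or web by hostname (assumed rules)."""
    host = (urlsplit(url).hostname or "").lower()
    if host == "mp.weixin.qq.com":
        return "wechat"
    if host in {"x.com", "twitter.com"} or host.endswith((".x.com", ".twitter.com")):
        return "x"
    return "web"


def url_key(url: str) -> str:
    """Hypothetical dedup key: hash of the URL with the host lowercased,
    the trailing slash trimmed, and the fragment dropped."""
    parts = urlsplit(url.strip())
    normalized = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path.rstrip("/"), parts.query, "")
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def reader_url(url: str) -> str:
    """Build the r.jina.ai extraction URL by prefixing the target URL."""
    return "https://r.jina.ai/" + url
```

The query string is kept in the key on purpose, since some article URLs differ only in their query parameters.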

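Higher WeChat success rate (recommended path)

When a cloud fetch hits an "abnormal environment / verification" page:

  • If a connected macOS node (e.g. Reed-Mac) is available and can access the article, use nodes.run to run the fetch on that node (requests+bs4), then write the result to the KB.
  • Note: this path depends on the node being online and on its network environment; 100% success cannot be guaranteed.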
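
When no node can fetch the article, the fallback is the placeholder entry. A minimal sketch of that decision, assuming the block page can be recognized by the marker text "环境异常": the marker list, the `#wechat` tag, the `ok` status, and the function name are all hypothetical; only status=blocked_verification and #needs-manual come from this page.

```python
# Assumed marker text shown on WeChat's "abnormal environment" verification page.
BLOCK_MARKERS = ("环境异常",)


def wechat_meta(url: str, html: str) -> dict:
    """Return item metadata, marking the entry as a placeholder when blocked."""
    meta = {"url": url, "source": "wechat", "tags": ["#wechat"]}
    if any(marker in html for marker in BLOCK_MARKERS):
        # Placeholder entry: keep the URL now, fetch the body manually later.
        meta["status"] = "blocked_verification"
        meta["tags"].append("#needs-manual")
    else:
        meta["status"] = "ok"  # hypothetical status for a successful fetch
    return meta
```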

2) Ingest screenshots/images (with OCR text)

Script:

python3 /home/ubuntu/.openclaw/skills/knowledge-base-collector/scripts/ingest_image.py /path/to/image.jpg \
  --text-file /path/to/ocr.txt \
  --title "..." --tags "#ai #product" --note "..."

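Notes:

  • ingest_image.py only handles writing to disk and indexing. OCR options:
    • local tesseract (if tesseract-ocr and chi_sim are installed)
    • or extract the text with a multimodal LLM and write it to the file passed as --text-file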
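
For the local tesseract option, the OCR step that produces the --text-file input might look like the sketch below. It assumes the `tesseract` binary is on PATH and that the `eng` language data is installed alongside `chi_sim`; the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def tesseract_command(image: Path, out_base: Path, langs: str = "chi_sim+eng") -> list[str]:
    """Build the argv for `tesseract <image> <outputbase> -l <langs>`.
    Tesseract appends .txt to the output base on its own."""
    return ["tesseract", str(image), str(out_base), "-l", langs]


def ocr_to_text_file(image: Path, out_base: Path) -> Path:
    """Run tesseract and return the path of the text file it wrote."""
    subprocess.run(tesseract_command(image, out_base), check=True)
    return Path(str(out_base) + ".txt")
```

The returned path can then be passed to ingest_image.py as --text-file.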

Ask directly in Telegram (search)

Start with the scripts (on the local machine or server):

python3 /home/ubuntu/.openclaw/skills/knowledge-base-collector/scripts/search_kb.py --q "claude code" --limit 10
python3 /home/ubuntu/.openclaw/skills/knowledge-base-collector/scripts/search_kb.py --tags "#claude-code #coding-agent" --limit 20
python3 /home/ubuntu/.openclaw/skills/knowledge-base-collector/scripts/search_kb.py --source wechat --since 7d --q "Elys"
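
To illustrate what these filters amount to, here is a minimal sketch of a search over index.jsonl. It is not search_kb.py itself: the record fields `title`, `note`, `tags`, `source`, and `created_at` (a timezone-aware ISO 8601 string) are assumptions about the index format.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path


def search_index(index_path, q=None, tags=(), source=None, since_days=None, limit=10):
    """Filter index.jsonl records by keyword, tags, source, and age."""
    cutoff = None
    if since_days is not None:
        cutoff = datetime.now(timezone.utc) - timedelta(days=since_days)
    hits = []
    with Path(index_path).open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            rec = json.loads(line)
            if source and rec.get("source") != source:
                continue
            # Every requested tag must be present on the record.
            if not set(tags) <= set(rec.get("tags", [])):
                continue
            haystack = f"{rec.get('title', '')} {rec.get('note', '')}".lower()
            if q and q.lower() not in haystack:
                continue
            created = rec.get("created_at")
            if cutoff and (not created or datetime.fromisoformat(created) < cutoff):
                continue
            hits.append(rec)
    # Newest first; ISO strings in one timezone sort chronologically.
    hits.sort(key=lambda r: r.get("created_at", ""), reverse=True)
    return hits[:limit]
```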

WeChat Official Account backlog queue (placeholder entries awaiting a re-fetch)

python3 /home/ubuntu/.openclaw/skills/knowledge-base-collector/scripts/wechat_backlog.py --limit 30

Candidate list for weekly/topic reports (input for an LLM to write the summary)

python3 /home/ubuntu/.openclaw/skills/knowledge-base-collector/scripts/weekly_digest.py --days 7 --limit 30
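
A candidate list of this kind could be assembled roughly as below. This is a sketch only, not weekly_digest.py: the `created_at` and `tags` fields, the top-5 tag summary, and the function name are assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def digest_candidates(records, days=7, limit=30, now=None):
    """Return the newest records from the last `days` days plus the top tags."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    recent = [r for r in records if datetime.fromisoformat(r["created_at"]) >= cutoff]
    recent.sort(key=lambda r: datetime.fromisoformat(r["created_at"]), reverse=True)
    # Tag frequencies hint at which themes the summary should cover.
    tag_counts = Counter(tag for r in recent for tag in r.get("tags", []))
    return recent[:limit], tag_counts.most_common(5)
```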

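Important notes (security/privacy)

  • Screenshots and web pages may contain tokens, verification codes, or keys: redact them before ingesting (replace with REDACTED).
  • WeChat Official Account fetches are subject to anti-bot checks: allow placeholder entries and fill them in later.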
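
The redaction step is manual in this skill. A minimal regex-based sketch of what it could automate is shown below; the three patterns are illustrative assumptions that will miss many secret formats and should not be treated as complete.

```python
import re

# key=value / key: value pairs whose key name suggests a secret
SECRET_KV = re.compile(r"(?i)\b(\w*(?:api[_-]?key|token|secret|password)\s*[:=]\s*)\S+")
# HTTP bearer credentials
BEARER = re.compile(r"(?i)\b(bearer\s+)[A-Za-z0-9._~+/=-]{8,}")
# 4-8 digit one-time codes shortly after a "verification code" label
OTP = re.compile(r"(?i)((?:验证码|verification code|otp)\D{0,10})\d{4,8}")


def redact(text: str) -> str:
    """Replace the secret part of each match with REDACTED, keeping the label."""
    for pattern in (SECRET_KV, BEARER, OTP):
        text = pattern.sub(lambda m: m.group(1) + "REDACTED", text)
    return text
```

Keeping the label while replacing the value preserves enough context to see later that something was removed and why.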
Safe usage recommendations
This skill appears to implement a simple local knowledge-base writer and searcher and is mostly coherent with its description, but review these points before installing:

  • Third-party extractor: ingest_url.py uses https://r.jina.ai/<URL> to extract article text. That sends the target URL to a third-party service, which then fetches its content; do not ingest URLs or articles that contain secrets or private tokens unless you accept that risk. Consider replacing r.jina.ai with a local extractor if privacy is required.
  • Claimed macOS node path is not implemented: SKILL.md mentions executing fetches on a connected macOS node (nodes.run) to bypass WeChat cloud blocks. The provided scripts do not implement remote node execution; instead they create placeholder entries for blocked WeChat pages. If you need automatic remote relays, the code does not provide them and the SKILL.md claim is misleading.
  • Local storage & permissions: by default the skill writes to /home/ubuntu/.openclaw/kb. Ensure that directory has appropriate filesystem permissions and that you don't inadvertently store screenshots or pages containing credentials, one-time codes, or other sensitive info. The code includes a reminder to redact tokens, but redaction is manual.
  • Network exposure: the scripts issue HTTP GETs to target URLs and to r.jina.ai via the host running the skill. If the agent runs in an environment with access to internal/intranet hosts, feeding it internal URLs will send those URLs to the external extractor and trigger requests against internal hosts (possible data leakage).
  • Review/validate: because the skill source and homepage are unknown and the package was published by an unfamiliar owner, consider running the scripts in a sandbox, inspecting KB output paths, and optionally forking/modifying the code to use a local extractor or to log fewer details before deploying to production.

If these caveats are acceptable (or you modify the extractor behavior and storage path), the skill looks usable for basic KB ingestion. If you need stronger privacy guarantees, treat it as untrusted until you replace the external extractor and confirm the macOS relay behavior you expect.
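
One way to reduce the network-exposure risk is to reject obviously internal targets before any fetch. The guard below is a sketch under assumptions: the function name and the blocked hostname suffixes are hypothetical, and it does not resolve DNS names, so a public-looking name that points at a private address would still pass.

```python
import ipaddress
from urllib.parse import urlsplit


def is_safe_to_fetch(url: str) -> bool:
    """Allow only http(s) URLs that do not obviously target internal hosts."""
    parts = urlsplit(url)
    if parts.scheme not in {"http", "https"} or not parts.hostname:
        return False
    host = parts.hostname.lower()
    if host == "localhost" or host.endswith((".local", ".internal", ".localhost")):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name; resolving it is out of scope for this sketch
    # Rejects loopback, private, and link-local ranges (e.g. 169.254.169.254).
    return ip.is_global
```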
Functional analysis
Type: OpenClaw Skill · Name: knowledge-base-collector · Version: 0.1.3

The skill bundle is classified as suspicious due to potential shell injection vulnerabilities and the use of powerful execution capabilities. The `SKILL.md` instructs the AI agent to execute `python3` scripts with user-provided arguments (URL, tags, notes, image paths). If the agent fails to properly sanitize or escape these arguments before constructing the shell command, it could lead to remote code execution (RCE). Additionally, the `SKILL.md` mentions using `nodes.run` to execute commands on connected macOS nodes, which is a powerful capability that could be abused for unauthorized remote execution if the agent is prompted to run arbitrary commands. While the scripts themselves appear to perform their stated function, the method of execution described in `SKILL.md` introduces significant risks.
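
The shell-injection concern goes away if the command is never passed through a shell. A minimal sketch, assuming the caller is Python code rather than a shell prompt: the script path and the --tags/--note flags are taken from SKILL.md, while the function names and the scheme check are hypothetical additions.

```python
import subprocess
from urllib.parse import urlsplit

INGEST_SCRIPT = "/home/ubuntu/.openclaw/skills/knowledge-base-collector/scripts/ingest_url.py"


def build_ingest_command(url: str, tags: str = "", note: str = "") -> list[str]:
    """Build an argv list; each user value stays a single, uninterpreted argument."""
    if urlsplit(url).scheme not in {"http", "https"}:
        raise ValueError(f"unsupported URL scheme: {url!r}")
    cmd = ["python3", INGEST_SCRIPT, url]
    if tags:
        cmd += ["--tags", tags]
    if note:
        cmd += ["--note", note]
    return cmd


def run_ingest(url: str, tags: str = "", note: str = "") -> subprocess.CompletedProcess:
    # A list argv with the default shell=False means no shell parses the values,
    # so characters such as ; | $( ) in a note are passed through literally.
    return subprocess.run(build_ingest_command(url, tags, note), check=True)
```

For example, a note of `hi; rm -rf /` reaches the script as one literal string instead of being executed.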
Capability assessment
Purpose & Capability
The name/description match the code: scripts ingest URLs and images, write content.md/meta.json entries, index.jsonl, tag entries, and provide search/weekly digest tools. However SKILL.md claims a higher-success WeChat path that uses a connected macOS node ('nodes.run' / Reed-Mac) to fetch blocked articles; the provided scripts contain no implementation of that node-side relay or any nodes.run call. Also the SKILL.md mentions supporting Telegram Q&A flows, but there is no Telegram integration code — only CLI search output suitable to be called by an external Telegram bridge.
Instruction Scope
Instructions stay focused on ingesting URLs/images and writing to a KB on disk. They instruct network fetches (requests) and using r.jina.ai to extract text; they do not ask the agent to read unrelated system files. Caveat: SKILL.md suggests using a macOS node for blocked WeChat fetches, but the code falls back to creating placeholders; that advertised automatic remote execution is not present in the codebase.
Install Mechanism
No install spec; this is instruction + small Python scripts that run with Python + requests. Nothing is downloaded or written outside the KB folder by the code itself. Low install risk.
Credentials
The skill requests no credentials or special env vars. However it makes outbound network requests to third parties: it fetches the target URLs and uses https://r.jina.ai/<URL> as an extraction proxy. That means the target URL (and potentially its content via the proxy) is sent to a third-party service — this is proportional to fetching/extracting content but may leak sensitive URLs or article content (including tokens or screenshots if you later add image-to-LLM OCR). The default KB root (/home/ubuntu/.openclaw/kb) may contain sensitive artifacts; the skill will write files there with no extra access control.
Persistence & Privilege
Skill does not request always:true, does not modify other skills, and only writes files under a single KB tree. It can run autonomously (normal default) but has no elevated platform privileges.
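How to use
  1. Make sure OpenClaw is installed (local or Docker deployment)
  2. Enter the install command in the chat box: /install knowledge-base-collector
  3. Once installed, invoke the Skill by name or trigger it with /knowledge-base-collector
  4. Provide the inputs described in the Skill's parameter notes to get structured output
Version history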
v0.1.3
chore: weekly digest + wechat backlog
v0.1.2
Tagging improvements: more stable 3-layer tag taxonomy (source/type + domain + entity) and added search_kb.py for local KB search by tags/keywords/source/time.
v0.1.1
Improve tagger: richer rule-based tags (agent/coding-agent/mcp/prompt-injection/security/engineering/etc) + language/entity tags.
v0.1.0
Initial release: ingest URLs (web/X/WeChat) + screenshots into a shared KB with tags, per-item markdown+meta, and an index. Supports WeChat node-side fetch (macOS) and placeholder entries when blocked.
Metadata
Slug knowledge-base-collector
Version 0.1.3
License
Total installs 3
Current installs 3
Versions 4
FAQ

What is Knowledge Base Collector?

Collect and organize a personal knowledge base from URLs (web/X/WeChat) and screenshots. Use when the user says they want to save a URL, ingest a link, archive content to KB, tag/classify notes, store screenshots, or search their saved knowledge in Telegram. Supports WeChat via a connected macOS node when cloud fetch is blocked. It is an AI agent skill plugin for Claude Code / OpenClaw, with 1061 total downloads to date.

How do I install Knowledge Base Collector?

Run the command "/install knowledge-base-collector" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.

Is Knowledge Base Collector free?

Yes. Knowledge Base Collector is completely free (free and open source) and can be downloaded, installed, and used freely.

Which platforms does Knowledge Base Collector support?

Knowledge Base Collector is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Knowledge Base Collector?

It is developed and maintained by Reed (@reed1898); the current version is v0.1.3.
