← 返回 Skills 市场
dainash

Knowledge Harvester

作者 InspireHub.ai · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
218
总下载
0
收藏
1
当前安装
1
版本数
在 OpenClaw 中安装
/install knowledge-harvester
功能描述
Daily automated briefings — fetches trending content via Google News RSS, summarizes into memory for RAG retrieval
使用说明 (SKILL.md)

Knowledge Harvester

You are a knowledge curation agent run by ClawForage. Your job: fetch trending content in the user's configured domains, summarize each article, and store summaries in memory for automatic RAG indexing.

Step 1: Read Domain Configuration

cat memory/clawforage/domains.md 2>/dev/null || echo "NO_DOMAINS"

If no domains file exists (output is "NO_DOMAINS"), create a default one:

mkdir -p memory/clawforage
cp {baseDir}/templates/domains-example.md memory/clawforage/domains.md

Then inform the user they should edit memory/clawforage/domains.md with their interests and stop.

Step 2: Fetch Articles for Each Domain

Parse the domains list:

bash {baseDir}/scripts/fetch-articles.sh --list-domains memory/clawforage/domains.md

For each domain returned, fetch articles:

bash {baseDir}/scripts/fetch-articles.sh "\x3Cdomain_query>" | head -10

This outputs JSONL — one JSON object per article with title, url, date, description, source, and domain.

Step 3: Deduplicate

Pipe each domain's articles through the dedup script to filter out already-harvested content:

bash {baseDir}/scripts/fetch-articles.sh "\x3Cdomain>" | head -10 | bash {baseDir}/scripts/dedup-articles.sh memory/knowledge

Step 4: Summarize and Write

Create the output directory:

mkdir -p memory/knowledge

For each new article from the dedup output, parse its JSON fields and write a summary file.

The slug should be the title in lowercase, spaces replaced with hyphens, special chars removed, max 50 chars.

Save to memory/knowledge/{DATE}-{slug}.md using this format:

---
date: {article date, YYYY-MM-DD format}
source: {source publication}
url: {original URL}
domain: {domain from config}
harvested: {today's date}
---

# {Article Title}

{Your 100-200 word summary capturing key facts, named entities, and implications}

**Key facts:** {comma-separated key points} **Impact:** {one sentence on relevance}

Write the summary yourself based on the article's description field from the RSS feed. Capture:

  • Key facts and data points
  • Named entities (people, companies, products)
  • Why this matters (implications)

Step 5: Validate Output

For each file written, validate it:

bash {baseDir}/scripts/validate-knowledge.sh memory/knowledge/{filename}.md

Fix any validation errors before finishing.

Step 6: Summary

After processing all domains, output a brief summary:

  • How many domains processed
  • How many new articles harvested
  • How many skipped (duplicates)

Constraints

  • Licensed sources only: Use Google News RSS — never scrape websites directly
  • Summaries only: Never reproduce more than 10 consecutive words from any source
  • Always attribute: Every article must have source and URL in frontmatter
  • Rate limits: Max 100 API calls per run, max 10 articles per domain
  • Model: Uses your default configured model — no override needed
  • Privacy: Domain interests are personal — never share externally
安全使用建议
This skill appears to do only what it claims: fetch Google News RSS for configured domains, summarize items, and store them as Markdown for later retrieval. Before installing, consider: (1) confirm how scheduling is set up — the README mentions a daily 2am run but the package contains no installer or cron setup, so you may need to schedule runs yourself; (2) review and edit memory/clawforage/domains.md to ensure you only harvest topics you want; (3) inspect the memory/knowledge directory periodically to ensure nothing sensitive ends up there; (4) run the skill manually the first time to verify behavior and that {baseDir} resolves correctly in your environment; (5) if you enable autonomous invocation, remember the agent will fetch external RSS feeds (Google News) and store summaries locally — there is no hidden network exfiltration observed, but autonomous runs increase blast radius if you later modify the skill or its environment.
功能分析
Type: OpenClaw Skill Name: knowledge-harvester Version: 1.0.0 The Knowledge Harvester bundle automates news retrieval and summarization but is classified as suspicious due to high-risk behaviors and potential vulnerabilities. The instructions in SKILL.md direct the AI agent to execute shell commands using unvalidated strings (e.g., article titles and domain queries), creating a risk of command injection. Additionally, the bundle requires network access via curl in scripts/fetch-articles.sh to reach news.google.com and performs broad file system operations. While these actions are consistent with the stated purpose, the lack of input sanitization and the use of shell-based processing of external data constitute significant security risks.
能力评估
Purpose & Capability
Name/description (daily news harvesting and summarization) align with required binaries (curl, jq, bash) and included scripts. One minor mismatch: README claims it 'Runs automatically every day at 2am', but there is no install spec or code here that sets up a scheduled job (cron/systemd/timer). Everything else requested is proportional to the stated purpose.
Instruction Scope
SKILL.md instructs the agent to read a domain config in memory/clawforage/domains.md, fetch Google News RSS via fetch-articles.sh, deduplicate against memory/knowledge MD files, create summaries into memory/knowledge, and validate them. The scripts only read/write those memory paths and call Google News RSS; they do not reference unrelated system files, credentials, or external endpoints beyond Google News.
Install Mechanism
No install spec (instruction-only skill plus shipped scripts) — lowest-risk pattern. Included scripts are plain shell and shipped in the repo; nothing is downloaded or executed from unknown remote URLs during install.
Credentials
No environment variables, secrets, or external credentials are required. Requested binaries (curl, jq, bash) are appropriate for fetching/parsing RSS and manipulating JSON/text. The skill reads/writes local memory directories only, which is expected for RAG ingestion.
Persistence & Privilege
always is false and the skill does not request elevated privileges or modify other skills or global agent configuration. The skill writes to its own memory paths (memory/clawforage, memory/knowledge), which is consistent with its purpose.
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install knowledge-harvester
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /knowledge-harvester 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v1.0.0
- Initial release of clawforage-knowledge-harvester for automated daily briefings. - Fetches trending articles via Google News RSS based on user-configured domains. - Summarizes each article and stores concise summaries for RAG (Retrieval-Augmented Generation) retrieval. - Deduplicates against previously harvested articles to avoid repeats. - Enforces source attribution, summary length, and privacy constraints. - Provides automatic domain configuration setup and clear output validation steps.
元数据
Slug knowledge-harvester
版本 1.0.0
许可证 MIT-0
累计安装 1
当前安装数 1
历史版本数 1
常见问题

Knowledge Harvester 是什么?

Daily automated briefings — fetches trending content via Google News RSS, summarizes into memory for RAG retrieval. 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 218 次。

如何安装 Knowledge Harvester?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install knowledge-harvester」即可一键安装,无需额外配置。

Knowledge Harvester 是免费的吗?

是的,Knowledge Harvester 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。

Knowledge Harvester 支持哪些平台?

Knowledge Harvester 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。

谁开发了 Knowledge Harvester?

由 InspireHub.ai(@dainash)开发并维护,当前版本 v1.0.0。

💬 留言讨论