← 返回 Skills 市场

Knowledge Harvester

Name: Knowledge Harvester
Author: dainash

作者 InspireHub.ai · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

218

总下载

当前安装

版本数

在 OpenClaw 中安装

/install knowledge-harvester

功能描述

Daily automated briefings — fetches trending content via Google News RSS, summarizes into memory for RAG retrieval

使用说明 (SKILL.md)

Knowledge Harvester

You are a knowledge curation agent run by ClawForage. Your job: fetch trending content in the user's configured domains, summarize each article, and store summaries in memory for automatic RAG indexing.

Step 1: Read Domain Configuration

cat memory/clawforage/domains.md 2>/dev/null || echo "NO_DOMAINS"

If no domains file exists (output is "NO_DOMAINS"), create a default one:

mkdir -p memory/clawforage
cp {baseDir}/templates/domains-example.md memory/clawforage/domains.md

Then inform the user they should edit memory/clawforage/domains.md with their interests and stop.

Step 2: Fetch Articles for Each Domain

Parse the domains list:

bash {baseDir}/scripts/fetch-articles.sh --list-domains memory/clawforage/domains.md

For each domain returned, fetch articles:

bash {baseDir}/scripts/fetch-articles.sh "\x3Cdomain_query>" | head -10

This outputs JSONL — one JSON object per article with title, url, date, description, source, and domain.

Step 3: Deduplicate

Pipe each domain's articles through the dedup script to filter out already-harvested content:

bash {baseDir}/scripts/fetch-articles.sh "\x3Cdomain>" | head -10 | bash {baseDir}/scripts/dedup-articles.sh memory/knowledge

Step 4: Summarize and Write

Create the output directory:

mkdir -p memory/knowledge

For each new article from the dedup output, parse its JSON fields and write a summary file.

The slug should be the title in lowercase, spaces replaced with hyphens, special chars removed, max 50 chars.

Save to memory/knowledge/{DATE}-{slug}.md using this format:

---
date: {article date, YYYY-MM-DD format}
source: {source publication}
url: {original URL}
domain: {domain from config}
harvested: {today's date}
---

# {Article Title}

{Your 100-200 word summary capturing key facts, named entities, and implications}

**Key facts:** {comma-separated key points} **Impact:** {one sentence on relevance}

Write the summary yourself based on the article's description field from the RSS feed. Capture:

Key facts and data points
Named entities (people, companies, products)
Why this matters (implications)

Step 5: Validate Output

For each file written, validate it:

bash {baseDir}/scripts/validate-knowledge.sh memory/knowledge/{filename}.md

Fix any validation errors before finishing.

Step 6: Summary

After processing all domains, output a brief summary:

How many domains processed
How many new articles harvested
How many skipped (duplicates)

Constraints

Licensed sources only: Use Google News RSS — never scrape websites directly
Summaries only: Never reproduce more than 10 consecutive words from any source
Always attribute: Every article must have source and URL in frontmatter
Rate limits: Max 100 API calls per run, max 10 articles per domain
Model: Uses your default configured model — no override needed
Privacy: Domain interests are personal — never share externally

安全使用建议

This skill appears to do only what it claims: fetch Google News RSS for configured domains, summarize items, and store them as Markdown for later retrieval. Before installing, consider: (1) confirm how scheduling is set up — the README mentions a daily 2am run but the package contains no installer or cron setup, so you may need to schedule runs yourself; (2) review and edit memory/clawforage/domains.md to ensure you only harvest topics you want; (3) inspect the memory/knowledge directory periodically to ensure nothing sensitive ends up there; (4) run the skill manually the first time to verify behavior and that {baseDir} resolves correctly in your environment; (5) if you enable autonomous invocation, remember the agent will fetch external RSS feeds (Google News) and store summaries locally — there is no hidden network exfiltration observed, but autonomous runs increase blast radius if you later modify the skill or its environment.

功能分析

Type: OpenClaw Skill Name: knowledge-harvester Version: 1.0.0 The Knowledge Harvester bundle automates news retrieval and summarization but is classified as suspicious due to high-risk behaviors and potential vulnerabilities. The instructions in SKILL.md direct the AI agent to execute shell commands using unvalidated strings (e.g., article titles and domain queries), creating a risk of command injection. Additionally, the bundle requires network access via curl in scripts/fetch-articles.sh to reach news.google.com and performs broad file system operations. While these actions are consistent with the stated purpose, the lack of input sanitization and the use of shell-based processing of external data constitute significant security risks.

能力评估

ℹ Purpose & Capability

Name/description (daily news harvesting and summarization) align with required binaries (curl, jq, bash) and included scripts. One minor mismatch: README claims it 'Runs automatically every day at 2am', but there is no install spec or code here that sets up a scheduled job (cron/systemd/timer). Everything else requested is proportional to the stated purpose.

✓ Instruction Scope

SKILL.md instructs the agent to read a domain config in memory/clawforage/domains.md, fetch Google News RSS via fetch-articles.sh, deduplicate against memory/knowledge MD files, create summaries into memory/knowledge, and validate them. The scripts only read/write those memory paths and call Google News RSS; they do not reference unrelated system files, credentials, or external endpoints beyond Google News.

✓ Install Mechanism

No install spec (instruction-only skill plus shipped scripts) — lowest-risk pattern. Included scripts are plain shell and shipped in the repo; nothing is downloaded or executed from unknown remote URLs during install.

✓ Credentials

No environment variables, secrets, or external credentials are required. Requested binaries (curl, jq, bash) are appropriate for fetching/parsing RSS and manipulating JSON/text. The skill reads/writes local memory directories only, which is expected for RAG ingestion.

✓ Persistence & Privilege

always is false and the skill does not request elevated privileges or modify other skills or global agent configuration. The skill writes to its own memory paths (memory/clawforage, memory/knowledge), which is consistent with its purpose.

如何使用

确保已安装 OpenClaw（本地或 Docker 部署）
在对话框中输入安装命令：/install knowledge-harvester
安装完成后，直接呼叫该 Skill 的名称或使用 /knowledge-harvester 触发
根据 Skill 的参数说明提供必要输入，即可获得结构化输出

版本历史

v1.0.0

- Initial release of clawforage-knowledge-harvester for automated daily briefings. - Fetches trending articles via Google News RSS based on user-configured domains. - Summarizes each article and stores concise summaries for RAG (Retrieval-Augmented Generation) retrieval. - Deduplicates against previously harvested articles to avoid repeats. - Enforces source attribution, summary length, and privacy constraints. - Provides automatic domain configuration setup and clear output validation steps.

元数据

Slug knowledge-harvester

版本 1.0.0

许可证 MIT-0

累计安装 1

当前安装数 1

历史版本数 1

常见问题