← Back to Skills Marketplace

Knowledge Harvester

Name: Knowledge Harvester
Author: dainash

by InspireHub.ai · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

218

Downloads

Stars

Active Installs

Versions

Install in OpenClaw

/install knowledge-harvester

Description

Daily automated briefings — fetches trending content via Google News RSS, summarizes into memory for RAG retrieval

README (SKILL.md)

Knowledge Harvester

You are a knowledge curation agent run by ClawForage. Your job: fetch trending content in the user's configured domains, summarize each article, and store summaries in memory for automatic RAG indexing.

Step 1: Read Domain Configuration

cat memory/clawforage/domains.md 2>/dev/null || echo "NO_DOMAINS"

If no domains file exists (output is "NO_DOMAINS"), create a default one:

mkdir -p memory/clawforage
cp {baseDir}/templates/domains-example.md memory/clawforage/domains.md

Then inform the user they should edit memory/clawforage/domains.md with their interests and stop.

Step 2: Fetch Articles for Each Domain

Parse the domains list:

bash {baseDir}/scripts/fetch-articles.sh --list-domains memory/clawforage/domains.md

For each domain returned, fetch articles:

bash {baseDir}/scripts/fetch-articles.sh "\x3Cdomain_query>" | head -10

This outputs JSONL — one JSON object per article with title, url, date, description, source, and domain.

Step 3: Deduplicate

Pipe each domain's articles through the dedup script to filter out already-harvested content:

bash {baseDir}/scripts/fetch-articles.sh "\x3Cdomain>" | head -10 | bash {baseDir}/scripts/dedup-articles.sh memory/knowledge

Step 4: Summarize and Write

Create the output directory:

mkdir -p memory/knowledge

For each new article from the dedup output, parse its JSON fields and write a summary file.

The slug should be the title in lowercase, spaces replaced with hyphens, special chars removed, max 50 chars.

Save to memory/knowledge/{DATE}-{slug}.md using this format:

---
date: {article date, YYYY-MM-DD format}
source: {source publication}
url: {original URL}
domain: {domain from config}
harvested: {today's date}
---

# {Article Title}

{Your 100-200 word summary capturing key facts, named entities, and implications}

**Key facts:** {comma-separated key points} **Impact:** {one sentence on relevance}

Write the summary yourself based on the article's description field from the RSS feed. Capture:

Key facts and data points
Named entities (people, companies, products)
Why this matters (implications)

Step 5: Validate Output

For each file written, validate it:

bash {baseDir}/scripts/validate-knowledge.sh memory/knowledge/{filename}.md

Fix any validation errors before finishing.

Step 6: Summary

After processing all domains, output a brief summary:

How many domains processed
How many new articles harvested
How many skipped (duplicates)

Constraints

Licensed sources only: Use Google News RSS — never scrape websites directly
Summaries only: Never reproduce more than 10 consecutive words from any source
Always attribute: Every article must have source and URL in frontmatter
Rate limits: Max 100 API calls per run, max 10 articles per domain
Model: Uses your default configured model — no override needed
Privacy: Domain interests are personal — never share externally

Usage Guidance

This skill appears to do only what it claims: fetch Google News RSS for configured domains, summarize items, and store them as Markdown for later retrieval. Before installing, consider: (1) confirm how scheduling is set up — the README mentions a daily 2am run but the package contains no installer or cron setup, so you may need to schedule runs yourself; (2) review and edit memory/clawforage/domains.md to ensure you only harvest topics you want; (3) inspect the memory/knowledge directory periodically to ensure nothing sensitive ends up there; (4) run the skill manually the first time to verify behavior and that {baseDir} resolves correctly in your environment; (5) if you enable autonomous invocation, remember the agent will fetch external RSS feeds (Google News) and store summaries locally — there is no hidden network exfiltration observed, but autonomous runs increase blast radius if you later modify the skill or its environment.

Capability Analysis

Type: OpenClaw Skill Name: knowledge-harvester Version: 1.0.0 The Knowledge Harvester bundle automates news retrieval and summarization but is classified as suspicious due to high-risk behaviors and potential vulnerabilities. The instructions in SKILL.md direct the AI agent to execute shell commands using unvalidated strings (e.g., article titles and domain queries), creating a risk of command injection. Additionally, the bundle requires network access via curl in scripts/fetch-articles.sh to reach news.google.com and performs broad file system operations. While these actions are consistent with the stated purpose, the lack of input sanitization and the use of shell-based processing of external data constitute significant security risks.

Capability Assessment

ℹ Purpose & Capability

Name/description (daily news harvesting and summarization) align with required binaries (curl, jq, bash) and included scripts. One minor mismatch: README claims it 'Runs automatically every day at 2am', but there is no install spec or code here that sets up a scheduled job (cron/systemd/timer). Everything else requested is proportional to the stated purpose.

✓ Instruction Scope

SKILL.md instructs the agent to read a domain config in memory/clawforage/domains.md, fetch Google News RSS via fetch-articles.sh, deduplicate against memory/knowledge MD files, create summaries into memory/knowledge, and validate them. The scripts only read/write those memory paths and call Google News RSS; they do not reference unrelated system files, credentials, or external endpoints beyond Google News.

✓ Install Mechanism

No install spec (instruction-only skill plus shipped scripts) — lowest-risk pattern. Included scripts are plain shell and shipped in the repo; nothing is downloaded or executed from unknown remote URLs during install.

✓ Credentials

No environment variables, secrets, or external credentials are required. Requested binaries (curl, jq, bash) are appropriate for fetching/parsing RSS and manipulating JSON/text. The skill reads/writes local memory directories only, which is expected for RAG ingestion.

✓ Persistence & Privilege

always is false and the skill does not request elevated privileges or modify other skills or global agent configuration. The skill writes to its own memory paths (memory/clawforage, memory/knowledge), which is consistent with its purpose.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install knowledge-harvester
After installation, invoke the skill by name or use /knowledge-harvester
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

- Initial release of clawforage-knowledge-harvester for automated daily briefings. - Fetches trending articles via Google News RSS based on user-configured domains. - Summarizes each article and stores concise summaries for RAG (Retrieval-Augmented Generation) retrieval. - Deduplicates against previously harvested articles to avoid repeats. - Enforces source attribution, summary length, and privacy constraints. - Provides automatic domain configuration setup and clear output validation steps.

Metadata

Slug knowledge-harvester

Version 1.0.0

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 1

Frequently Asked Questions

What is Knowledge Harvester?

Daily automated briefings — fetches trending content via Google News RSS, summarizes into memory for RAG retrieval. It is an AI Agent Skill for Claude Code / OpenClaw, with 218 downloads so far.

How do I install Knowledge Harvester?

Run "/install knowledge-harvester" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Knowledge Harvester free?

Yes, Knowledge Harvester is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Knowledge Harvester support?

Knowledge Harvester is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Knowledge Harvester?

It is built and maintained by InspireHub.ai (@dainash); the current version is v1.0.0.

More Skills

Knowledge Harvester

Knowledge Harvester

Step 1: Read Domain Configuration

Step 2: Fetch Articles for Each Domain

Step 3: Deduplicate

Step 4: Summarize and Write

Step 5: Validate Output

Step 6: Summary

Constraints

What is Knowledge Harvester?

How do I install Knowledge Harvester?

Is Knowledge Harvester free?

Which platforms does Knowledge Harvester support?

Who created Knowledge Harvester?

💬 Comments