Rss Sitemap
Overview
Use this skill to bootstrap site discovery from the site's own machine-readable indexes before doing general crawling. For any task that targets a specific website, first look for sitemap, Atom, and RSS resources and use them to find the latest publications or guide the crawl.
Workflow
- Normalize the target site to an origin such as https://example.com.
- Run the bundled preprocessor through the OpenClaw exec tool when Node.js 18+ is available. exec is the shell tool name; do not require a separate bash tool:
  node skills/rss-sitemap/scripts/preprocess-rss-sitemap.js --site https://example.com --output /tmp/rss-sitemap.json
- Probe these root resources first when running manually:
  - /sitemap.xml
  - /sitemaps.xml
  - /atom.xml
  - /rss.xml
- If available, also inspect /robots.txt for Sitemap: directives and include those sitemap URLs.
- Fetch only resources that return a successful HTTP response and XML-like content.
- Parse XML with a real parser when possible. Avoid ad hoc regex parsing except for quick triage.
- Use discovered URLs or entries as the crawl frontier before falling back to regular page crawling.
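For manual runs, the first workflow steps can be sketched roughly as follows. This is a minimal illustration, not the bundled script's implementation: the helper names (normalizeOrigin, probeUrls, sitemapsFromRobots, isUsableXml) are hypothetical, and the "XML-like" check is an assumed heuristic.

```javascript
// Sketch only: helper names are illustrative, not taken from the bundled script.

// Well-known root resources named in the workflow.
const ROOT_PATHS = ["/sitemap.xml", "/sitemaps.xml", "/atom.xml", "/rss.xml"];

// Reduce a site URL (or bare host) to its origin, e.g. https://example.com.
function normalizeOrigin(site) {
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(site);
  return new URL(hasScheme ? site : `https://${site}`).origin;
}

// Build the candidate URLs to probe for a given origin.
function probeUrls(origin) {
  return ROOT_PATHS.map((path) => new URL(path, origin).href);
}

// Collect sitemap URLs from "Sitemap:" directives in robots.txt text.
function sitemapsFromRobots(robotsText) {
  const found = [];
  for (const line of robotsText.split(/\r?\n/)) {
    const match = /^\s*sitemap\s*:\s*(\S+)/i.exec(line);
    if (match) found.push(match[1]);
  }
  return found;
}

// Assumed heuristic: accept a 2xx response whose content type mentions XML
// or whose body starts with an XML declaration or a known root element.
function isUsableXml(status, contentType, body) {
  const ok = status >= 200 && status < 300;
  const xmlType = /xml/i.test(contentType || "");
  const xmlBody = /^\uFEFF?\s*<(\?xml|rss|feed|urlset|sitemapindex)\b/i.test(body || "");
  return ok && (xmlType || xmlBody);
}
```

The body check is regex-based triage only; actual entry extraction should still go through a real XML parser, as the workflow recommends.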
Bundled Tool
Use scripts/preprocess-rss-sitemap.js for deterministic pre-crawl discovery. It has no npm dependencies and uses Node's built-in fetch, so it requires Node.js 18 or newer for URL fetching.
Common commands:
node skills/rss-sitemap/scripts/preprocess-rss-sitemap.js --site https://example.com
node skills/rss-sitemap/scripts/preprocess-rss-sitemap.js --url https://example.com/sitemap.xml --url https://example.com/feed.xml
node skills/rss-sitemap/scripts/preprocess-rss-sitemap.js --file ./sitemap.xml --file ./feed.xml
node skills/rss-sitemap/scripts/preprocess-rss-sitemap.js --site https://example.com --max-depth 2 --output /tmp/rss-sitemap.json
The script outputs JSON with:
- resources: probed XML or robots resources, HTTP status, content type, detected kind, and entry count.
- entries: normalized sitemap URLs, RSS items, or Atom entries with source provenance.
For latest-publication requests, sort entries by the best available date:
- RSS pubDate
- Atom updated
- Atom published
- Sitemap lastmod
If entries do not include dates, prefer RSS or Atom feed order before sitemap order because feeds usually list newest content first.
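The ordering rules above could be applied to parsed entries along these lines. This is a sketch under assumptions: the entry shape (a kind field plus pubDate, updated, published, or lastmod) is hypothetical and may not match the field names in the script's actual JSON output.

```javascript
// Sketch only: the entry shape { kind, pubDate, updated, published, lastmod }
// is an assumption, not the documented output schema of the bundled script.

// Return the best available timestamp in ms since epoch, or null if none parses.
function bestDate(entry) {
  for (const field of ["pubDate", "updated", "published", "lastmod"]) {
    const value = entry[field];
    if (!value) continue;
    const time = Date.parse(value);
    if (!Number.isNaN(time)) return time;
  }
  return null;
}

// Dated entries first, newest to oldest. Undated entries follow, with RSS/Atom
// entries ahead of sitemap entries and original order preserved within each group.
function sortLatest(entries) {
  const rank = (entry) => (entry.kind === "sitemap" ? 1 : 0);
  return entries
    .map((entry, index) => ({ entry, index, time: bestDate(entry) }))
    .sort((a, b) => {
      if (a.time !== null && b.time !== null) return b.time - a.time || a.index - b.index;
      if (a.time !== null) return -1;
      if (b.time !== null) return 1;
      return rank(a.entry) - rank(b.entry) || a.index - b.index;
    })
    .map((item) => item.entry);
}
```

Keeping the original index as a tie-breaker preserves feed order for undated entries, which matters because feeds usually list newest content first.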
If the script fails because the site blocks requests, needs JavaScript, or requires authentication, use the available web scraping/search/browser tools for fetching, then apply the same parsing and crawl strategy.
Required tools:
- OpenClaw exec enabled for host script execution.
- Node.js 18+ for remote URL discovery with the bundled script.
- Any available HTTP, scraping, search, or browser tool when Node fetch cannot access the target site.
Parsing Rules
For sitemaps:
- Treat <sitemapindex> as a list of nested sitemaps; recursively fetch each <loc>.
- Treat <urlset> as crawlable page URLs; extract <loc> and keep useful metadata such as <lastmod>, <changefreq>, and <priority> when present.
- De-duplicate URLs after canonicalizing obvious variants such as fragments.
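As an illustration of the de-duplication rule, the sketch below canonicalizes URLs with Node's built-in URL class before comparing them. The function names are hypothetical, and treating only the fragment as an "obvious variant" is an assumption; the bundled script may canonicalize more or less than this.

```javascript
// Sketch only: canonicalization here is limited to dropping the fragment.
// The URL parser also lowercases the scheme and host as a side effect.
function canonicalize(rawUrl) {
  const url = new URL(rawUrl);
  url.hash = "";
  return url.href;
}

// Return canonical URLs in first-seen order, skipping values that do not parse.
function dedupeUrls(urls) {
  const seen = new Set();
  const unique = [];
  for (const raw of urls) {
    let key;
    try {
      key = canonicalize(raw);
    } catch {
      continue; // not a parsable absolute URL
    }
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(key);
    }
  }
  return unique;
}
```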
For RSS feeds:
- Extract each <item> with title, link, guid, pubDate, and description when present.
- Prefer link as the crawl URL; fall back to guid only if it is URL-like.
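The link-versus-guid preference might look like the following for an already-parsed item. This is a sketch: the item object shape and the function names are hypothetical, and "URL-like" is interpreted here as "parses as an absolute http or https URL", which is an assumption.

```javascript
// Sketch only: "URL-like" is assumed to mean an absolute http(s) URL.
function isUrlLike(value) {
  if (typeof value !== "string") return false;
  try {
    const url = new URL(value.trim());
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    return false;
  }
}

// Choose the crawl URL for a parsed RSS item: link first, guid only if URL-like.
function rssCrawlUrl(item) {
  if (typeof item.link === "string" && item.link.trim() !== "") {
    return item.link.trim();
  }
  if (isUrlLike(item.guid)) return item.guid.trim();
  return null; // nothing crawlable in this item
}
```

The protocol check matters because opaque identifiers such as urn:uuid:... parse as valid URLs but are not fetchable pages.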
For Atom feeds:
- Extract each <entry> with title, id, updated, published, summary, and link.
- Prefer <link rel="alternate" href="...">; otherwise use the first URL-like href.
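For Atom, the link selection rule can be sketched over the link elements of one parsed entry. The input shape (an array of { rel, href } objects) and the function names are hypothetical, and "URL-like" is again assumed to mean an absolute http or https URL.

```javascript
// Sketch only: links is assumed to be an array of { rel, href } objects
// produced by whatever XML parser handled the entry.
function isHttpUrl(value) {
  if (typeof value !== "string") return false;
  try {
    const url = new URL(value);
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    return false;
  }
}

// Prefer rel="alternate"; otherwise take the first URL-like href.
function atomCrawlUrl(links) {
  const alternate = links.find((link) => link.rel === "alternate" && link.href);
  if (alternate) return alternate.href;
  const firstUrlLike = links.find((link) => isHttpUrl(link.href));
  return firstUrlLike ? firstUrlLike.href : null;
}
```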
Crawl Strategy
- Prefer newest or most relevant entries when the user asks for recent content.
- For "latest publications", "recent posts", "new articles", or equivalent requests, use RSS/Atom first and return dated entries in descending order when dates are available.
- Prefer sitemap URLs when the user asks for broad site coverage.
- Keep feed and sitemap provenance with each discovered URL so later summaries can explain where a URL came from.
- If none of the well-known resources exist, state that discovery fell back to normal crawling or search.
- Respect robots, rate limits, authentication boundaries, and user instructions before expanding a crawl.
- Make sure OpenClaw is installed (local or Docker).
- Run the install command in chat: /install rss-sitemap
- After installation, invoke the skill by name or use /rss-sitemap
- Provide required inputs per the skill's parameter spec and get structured output.
What is Rss Sitemap?
Discover website URLs, feed entries, and latest publications by checking sitemap.xml, sitemaps.xml, atom.xml, and rss.xml before crawling a specific site. Us... It is an AI Agent Skill for Claude Code / OpenClaw, with 45 downloads so far.
How do I install Rss Sitemap?
Run "/install rss-sitemap" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Rss Sitemap free?
Yes, Rss Sitemap is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Rss Sitemap support?
Rss Sitemap is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Rss Sitemap?
It is built and maintained by Carlos Delfino (@carlosdelfino); the current version is v1.0.0.