RSS Sitemap
Overview
Use this skill to bootstrap site discovery from the site's own machine-readable indexes before doing general crawling. For any task that targets a specific website, first look for sitemap, Atom, and RSS resources and use them to find the latest publications or guide the crawl.
Workflow
- Normalize the target site to an origin such as https://example.com.
- Run the bundled preprocessor through the OpenClaw exec tool when Node.js 18+ is available. exec is the shell tool name; do not require a separate bash tool:
  node skills/rss-sitemap/scripts/preprocess-rss-sitemap.js --site https://example.com --output /tmp/rss-sitemap.json
- Probe these root resources first when running manually:
  - /sitemap.xml
  - /sitemaps.xml
  - /atom.xml
  - /rss.xml
- If available, also inspect /robots.txt for Sitemap: directives and include those sitemap URLs.
- Fetch only resources that return a successful HTTP response and XML-like content.
- Parse XML with a real parser when possible. Avoid ad hoc regex parsing except for quick triage.
- Use discovered URLs or entries as the crawl frontier before falling back to regular page crawling.
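The manual probing steps above can be sketched in Python using only the standard library. This is an illustrative sketch, not the bundled script's logic: the function names are hypothetical, defaulting to HTTPS for bare hostnames is an assumption, and the actual HTTP fetching is left to whatever tool is available.

```python
from urllib.parse import urlsplit, urljoin

# Root resources named in the workflow above.
WELL_KNOWN_PATHS = ["/sitemap.xml", "/sitemaps.xml", "/atom.xml", "/rss.xml"]


def normalize_origin(site: str) -> str:
    """Reduce any URL on the site to scheme://host[:port]."""
    if "://" not in site:
        site = "https://" + site  # assumption: default to HTTPS
    parts = urlsplit(site)
    return f"{parts.scheme}://{parts.netloc}".lower()


def candidate_urls(site: str) -> list[str]:
    """Build the list of well-known resources to probe first."""
    origin = normalize_origin(site)
    return [origin + path for path in WELL_KNOWN_PATHS]


def sitemaps_from_robots(robots_txt: str, origin: str) -> list[str]:
    """Collect Sitemap: directives (the field name is matched case-insensitively)."""
    found = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop robots.txt comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(urljoin(origin + "/", value.strip()))
    return found


def looks_like_xml(status: int, content_type: str, body: str) -> bool:
    """Accept only successful responses whose type or body looks like XML."""
    if not 200 <= status < 300:
        return False
    return "xml" in content_type.lower() or body.lstrip().startswith("<?xml")
```

A caller would fetch each candidate URL plus any URLs found in robots.txt, then keep only the responses for which looks_like_xml returns True.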
Bundled Tool
Use scripts/preprocess-rss-sitemap.js for deterministic pre-crawl discovery. It has no npm dependencies and uses Node's built-in fetch, so it requires Node.js 18 or newer for URL fetching.
Common commands:
node skills/rss-sitemap/scripts/preprocess-rss-sitemap.js --site https://example.com
node skills/rss-sitemap/scripts/preprocess-rss-sitemap.js --url https://example.com/sitemap.xml --url https://example.com/feed.xml
node skills/rss-sitemap/scripts/preprocess-rss-sitemap.js --file ./sitemap.xml --file ./feed.xml
node skills/rss-sitemap/scripts/preprocess-rss-sitemap.js --site https://example.com --max-depth 2 --output /tmp/rss-sitemap.json
The script outputs JSON with:
- resources: probed XML or robots resources, with HTTP status, content type, detected kind, and entry count.
- entries: normalized sitemap URLs, RSS items, or Atom entries with source provenance.
For latest-publication requests, sort entries by the best available date:
- RSS pubDate
- Atom updated
- Atom published
- Sitemap lastmod
If entries do not include dates, prefer RSS or Atom feed order before sitemap order because feeds usually list newest content first.
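One way to apply this ordering is sketched below. It assumes each entry is a dictionary that may carry pubDate, updated, published, or lastmod keys; those key names, the url key in the usage example, and the function names are assumptions for illustration, since the script's exact entry schema is not shown here. Dates without a time zone are treated as UTC, which is also an assumption.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Date fields in the preference order listed above.
DATE_FIELDS = ["pubDate", "updated", "published", "lastmod"]


def parse_date(value: str):
    """Parse ISO 8601 (Atom, sitemap) or RFC 822 (RSS) dates; None if unparseable."""
    value = value.strip()
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        try:
            parsed = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive dates are UTC
    return parsed


def best_date(entry: dict):
    """Return the first parseable date in preference order, or None."""
    for field in DATE_FIELDS:
        if entry.get(field):
            parsed = parse_date(entry[field])
            if parsed is not None:
                return parsed
    return None


def sort_latest_first(entries: list[dict]) -> list[dict]:
    """Dated entries newest first; undated entries keep their original order at the end."""
    dated = [e for e in entries if best_date(e) is not None]
    undated = [e for e in entries if best_date(e) is None]
    return sorted(dated, key=best_date, reverse=True) + undated
```

Because undated entries keep their input order, passing feed entries ahead of sitemap entries preserves the feed-first preference described above.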
If the script fails because the site blocks requests, needs JavaScript, or requires authentication, use the available web scraping/search/browser tools for fetching, then apply the same parsing and crawl strategy.
Required tools:
- OpenClaw
execenabled for host script execution. - Node.js 18+ for remote URL discovery with the bundled script.
- Any available HTTP, scraping, search, or browser tool when Node fetch cannot access the target site.
Parsing Rules
For sitemaps:
- Treat <sitemapindex> as a list of nested sitemaps; recursively fetch each <loc>.
- Treat <urlset> as crawlable page URLs; extract <loc> and keep useful metadata such as <lastmod>, <changefreq>, and <priority> when present.
- De-duplicate URLs after canonicalizing obvious variants such as fragments.
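The sitemap rules above can be sketched with Python's built-in xml.etree.ElementTree parser. This is a minimal illustration under stated assumptions: parse_sitemap and local_name are hypothetical names, the returned dictionary shape is invented for the example, and fetching nested sitemaps is left to the caller.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_sitemap(xml_text: str) -> dict:
    """Classify a sitemap document and extract its <loc> values.

    Returns {"kind": "sitemapindex" | "urlset", "items": [...]}.
    """
    root = ET.fromstring(xml_text)
    kind = local_name(root.tag)
    if kind not in ("sitemapindex", "urlset"):
        raise ValueError(f"not a sitemap document: <{kind}>")

    items, seen = [], set()
    for node in root:
        fields = {local_name(c.tag): (c.text or "").strip() for c in node}
        loc = fields.get("loc")
        if not loc:
            continue
        loc, _fragment = urldefrag(loc)  # canonicalize: drop #fragment
        if loc in seen:
            continue  # de-duplicate after canonicalization
        seen.add(loc)
        item = {"loc": loc}
        for key in ("lastmod", "changefreq", "priority"):
            if fields.get(key):
                item[key] = fields[key]
        items.append(item)
    return {"kind": kind, "items": items}
```

When kind is sitemapindex, the caller would fetch each loc and call parse_sitemap again, ideally with a depth limit; when it is urlset, the items are page URLs ready for the crawl frontier.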
For RSS feeds:
- Extract each <item> with title, link, guid, pubDate, and description when present.
- Prefer link as the crawl URL; fall back to guid only if it is URL-like.
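A hedged sketch of the RSS rules follows, again using xml.etree.ElementTree. The function names and the url key that records the chosen crawl URL are assumptions for illustration. The sketch targets RSS 2.0, whose elements are not namespaced; namespaced RSS 1.0 (RDF) feeds would need extra handling.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

RSS_FIELDS = ("title", "link", "guid", "pubDate", "description")


def is_url_like(value: str) -> bool:
    """True for absolute http(s) URLs only."""
    parts = urlsplit(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def parse_rss_items(xml_text: str) -> list[dict]:
    """Extract RSS 2.0 <item> elements and pick a crawl URL for each."""
    root = ET.fromstring(xml_text)
    items = []
    for node in root.iter("item"):
        item = {}
        for field in RSS_FIELDS:
            text = (node.findtext(field) or "").strip()
            if text:
                item[field] = text
        if item.get("link"):
            item["url"] = item["link"]  # preferred crawl URL
        elif is_url_like(item.get("guid", "")):
            item["url"] = item["guid"]  # fallback only when guid is a URL
        items.append(item)
    return items
```

Items whose guid is an opaque identifier and that have no link are still returned, but without a url key, so the caller can decide whether to skip them.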
For Atom feeds:
- Extract each <entry> with title, id, updated, published, summary, and link.
- Prefer <link rel="alternate" href="...">; otherwise use the first URL-like href.
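The Atom rules can be sketched the same way. parse_atom_entries is a hypothetical name, and treating only absolute http(s) values as URL-like is an assumption, so relative href values are skipped in this sketch. A link element without a rel attribute is treated as rel="alternate", which is the default in the Atom specification (RFC 4287).

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
ATOM_FIELDS = ("title", "id", "updated", "published", "summary")


def parse_atom_entries(xml_text: str) -> list[dict]:
    """Extract Atom <entry> elements and pick a crawl link for each."""
    root = ET.fromstring(xml_text)
    entries = []
    for node in root.iter(ATOM + "entry"):
        entry = {}
        for field in ATOM_FIELDS:
            text = (node.findtext(ATOM + field) or "").strip()
            if text:
                entry[field] = text
        links = node.findall(ATOM + "link")
        # A <link> with no rel attribute counts as rel="alternate".
        alternate = [ln for ln in links if ln.get("rel", "alternate") == "alternate"]
        # Try alternate links first, then any link in document order.
        for link in alternate + links:
            href = (link.get("href") or "").strip()
            if href.startswith(("http://", "https://")):
                entry["link"] = href
                break
        entries.append(entry)
    return entries
```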
Crawl Strategy
- Prefer newest or most relevant entries when the user asks for recent content.
- For "latest publications", "recent posts", "new articles", or equivalent requests, use RSS/Atom first and return dated entries in descending order when dates are available.
- Prefer sitemap URLs when the user asks for broad site coverage.
- Keep feed and sitemap provenance with each discovered URL so later summaries can explain where a URL came from.
- If none of the well-known resources exist, state that discovery fell back to normal crawling or search.
- Respect robots, rate limits, authentication boundaries, and user instructions before expanding a crawl.
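As a rough illustration of respecting robots rules before expanding a crawl, the sketch below filters a frontier with Python's standard urllib.robotparser module. The function name and the default user agent string are hypothetical, and the robots.txt text is assumed to have been fetched already.

```python
from urllib.robotparser import RobotFileParser


def allowed_frontier(urls, robots_txt, user_agent="rss-sitemap-sketch"):
    """Return (allowed URLs, crawl delay in seconds or None) for this user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = [u for u in urls if parser.can_fetch(user_agent, u)]
    return allowed, parser.crawl_delay(user_agent)
```

The returned delay, when present, can be used as a minimum pause between requests to the same host.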
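- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install rss-sitemap
- Once installation finishes, invoke the skill by name or trigger it with /rss-sitemap.
- Provide the required inputs according to the skill's parameter description to get structured output.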
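What is Rss Sitemap?
Discover website URLs, feed entries, and latest publications by checking sitemap.xml, sitemaps.xml, atom.xml, and rss.xml before crawling a specific site. Us... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 45 times to date.
How do I install Rss Sitemap?
Run the command "/install rss-sitemap" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.
Is Rss Sitemap free?
Yes. Rss Sitemap is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.
Which platforms does Rss Sitemap support?
Rss Sitemap is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed Rss Sitemap?
It is developed and maintained by Carlos Delfino (@carlosdelfino); the current version is v1.0.0.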