News Crawler
by Oshin123456 · GitHub · v1.0.0 · MIT-0
Downloads: 829 · Stars: 0 · Active Installs: 7 · Versions: 1
Install in OpenClaw
/install news-crawler
Description
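An automated news crawling and summarization tool. It fetches news content from specified websites or RSS feeds and generates summary reports. Use cases: 1. The user asks to "get today's news" or "crawl a website's content". 2. The user needs to "summarize news" or "generate a daily report". 3. The user supplies a specific URL to crawl. 4. Specific news sources need to be monitored for the latest updates.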
README (SKILL.md)
News Crawler
Automatically crawls news websites and RSS feeds, extracts content, and generates summaries.
Quick Start
1. Fetch an RSS news list
List the available news sources:
python3 scripts/rss_fetcher.py
Fetch news from a specific RSS feed:
python3 scripts/rss_fetcher.py <rss_url> [max_items]
Example:
python3 scripts/rss_fetcher.py https://www.solidot.org/index.rss 5
2. Crawl a specific web page
python3 scripts/crawl.py <url> [max_length]
Example:
python3 scripts/crawl.py https://example.com/news/article.html 3000
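Workflow
Generating a daily news report
- Choose news sources - pick from the common sources or use an RSS URL provided by the user
- Fetch the news list - use rss_fetcher.py to get the latest articles
- Crawl full content - run crawl.py on each article to get the full text
- Generate summaries - use an LLM to summarize the key points of each article
- Compile the report - sort by category or time and produce a structured daily report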
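The workflow above could be tied together roughly as follows. This is a minimal sketch, not the skill's actual code: `build_daily_report`, `crawl`, and `summarize` are hypothetical names introduced for illustration. The item shape follows the rss_fetcher.py output format, and the summarizer is a stand-in for the LLM step that the agent performs.

```python
from typing import Callable, Dict, List


def build_daily_report(
    items: List[Dict[str, str]],
    crawl: Callable[[str], Dict[str, object]],
    summarize: Callable[[str], str],
) -> str:
    """Assemble a plain-text daily report from RSS items.

    items     -- entries shaped like the rss_fetcher.py output ("title", "link", ...)
    crawl     -- hypothetical callable returning a dict with a "content" key,
                 shaped like the crawl.py output
    summarize -- hypothetical stand-in for the LLM summarization step
    """
    lines = ["Daily News Report", ""]
    for index, item in enumerate(items, start=1):
        page = crawl(item["link"])
        summary = summarize(str(page.get("content", "")))
        lines.append(f"{index}. {item['title']}")
        lines.append(f"   {summary}")
        lines.append(f"   Source: {item['link']}")
    return "\n".join(lines)


if __name__ == "__main__":
    demo_items = [{"title": "Example story", "link": "https://example.com/a"}]
    # Both callables below are fakes so the sketch runs without network access.
    fake_crawl = lambda url: {"url": url, "content": "Full article text goes here."}
    first_words = lambda text: " ".join(text.split()[:3]) + "..."
    print(build_daily_report(demo_items, fake_crawl, first_words))
```

Passing the crawl and summarize steps in as callables keeps the report logic independent of how pages are fetched or which model writes the summaries.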
Supported RSS Sources
Common Chinese tech news sources:
- Solidot: https://www.solidot.org/index.rss
- TechWeb: https://www.techweb.com.cn/rss/all.xml
- 36Kr (36氪): https://36kr.com/feed
International sources:
- Hacker News: https://news.ycombinator.com/rss
- TechCrunch: https://techcrunch.com/feed/
Output Format
rss_fetcher.py output:
{
  "items": [
    {
      "title": "Article title",
      "link": "Article link",
      "description": "Short description",
      "published": "Publication time"
    }
  ],
  "count": 10
}
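A parser producing this shape might look like the sketch below. It is an illustration under assumptions, not the contents of rss_fetcher.py: `parse_rss` and `SAMPLE` are hypothetical, the "published" field is assumed to come from the RSS 2.0 `pubDate` element, and "count" is assumed to be the number of items returned. It handles plain RSS 2.0 only; Atom or namespaced feeds would need extra handling.

```python
import json
import xml.etree.ElementTree as ET


def parse_rss(xml_text: str, max_items: int = 10) -> dict:
    """Parse RSS 2.0 text into the {"items": [...], "count": N} shape."""
    root = ET.fromstring(xml_text)
    items = []
    for node in root.iter("item"):
        if len(items) >= max_items:
            break
        items.append({
            # findtext returns None for a missing child, so fall back to "".
            "title": (node.findtext("title") or "").strip(),
            "link": (node.findtext("link") or "").strip(),
            "description": (node.findtext("description") or "").strip(),
            "published": (node.findtext("pubDate") or "").strip(),
        })
    return {"items": items, "count": len(items)}


# Inline sample feed so the sketch runs without network access.
SAMPLE = """<rss version="2.0"><channel>
<item><title>First</title><link>https://example.com/1</link>
<description>Intro</description><pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate></item>
<item><title>Second</title><link>https://example.com/2</link></item>
</channel></rss>"""

if __name__ == "__main__":
    print(json.dumps(parse_rss(SAMPLE, max_items=1), ensure_ascii=False, indent=2))
```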
crawl.py output:
{
  "url": "Original link",
  "title": "Page title",
  "content": "Body text",
  "length": 5000
}
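For illustration, the sketch below shows one way to turn raw HTML into a record of this shape using only the standard library. It is not the code in crawl.py: `extract_page` and `_TextExtractor` are hypothetical names, the extraction is deliberately basic, and "length" is assumed to be the character count of the returned, truncated content.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and the <title>, skipping script/style bodies."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_page(url: str, html: str, max_length: int = 5000) -> dict:
    """Return a dict shaped like the crawl.py output for already-fetched HTML."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join text fragments with spaces, then truncate to max_length characters.
    content = " ".join(parser.text_parts)[:max_length]
    return {
        "url": url,
        "title": "".join(parser.title_parts).strip(),
        "content": content,
        "length": len(content),
    }
```

Fetching is left out on purpose so the extraction step can be exercised on a local HTML string before pointing it at a live site.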
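Notes
- Respect robots.txt - check the target site's crawler policy before crawling
- Limit request rate - avoid sending frequent requests to the same site
- Content length - content is truncated to 5000 characters by default; adjustable via the max_length argument
- Encoding - the scripts handle UTF-8; some sites may need additional handling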
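The first two notes can be enforced in code rather than left to the caller. The sketch below is one possible approach and is not part of the shipped scripts: `is_allowed` and `HostThrottle` are hypothetical helpers, and the one-second default interval is an arbitrary assumption. It uses the standard library's `urllib.robotparser` on robots.txt text that has already been fetched.

```python
import time
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against already-downloaded robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class HostThrottle:
    """Enforces a minimum interval between requests to the same host."""

    def __init__(self, min_interval: float = 1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock   # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = {}       # host -> time of the most recent permitted request

    def wait(self, url: str) -> float:
        """Sleep if needed before requesting url; return the delay applied."""
        host = urlsplit(url).netloc
        now = self._clock()
        last = self._last.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        self._last[host] = now + delay
        return delay
```

A caller would check `is_allowed(...)` first and then call `throttle.wait(url)` immediately before each request.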
Extending
To support more features, see:
- references/rss_sources.md - a longer list of RSS sources
- Adding scheduled-task support (with cron)
- Adding Feishu/email push notifications
Usage Guidance
This skill appears coherent and does what it says: fetch RSS feeds or web pages and produce text the agent can summarize. Before installing or running it, consider: 1) The scripts will fetch any URL you or the agent supplies — avoid pointing it at internal services or private endpoints (risk of exposing internal data). 2) The fetched article text will be sent to your agent/LLM for summarization, so don't feed paywalled or confidential pages unless you accept that exposure. 3) Review rate limiting and robots.txt compliance for targets you crawl to avoid abuse. 4) The HTML/text extraction is basic; for complex sites you may need to review or harden parsing. If you plan to use this skill in production, run it in a sandbox, audit the code, and consider adding host whitelists, request throttling, and logging controls.
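The host whitelist suggested above could be sketched as follows. This is an assumption-laden illustration, not something the skill ships: `is_url_permitted` is a hypothetical helper and the allowed hosts are examples. It checks only the URL as written and does not resolve DNS, so a hostname that resolves to an internal address is blocked only by not being on the list.

```python
import ipaddress
from urllib.parse import urlsplit


def is_url_permitted(url: str, allowed_hosts: set) -> bool:
    """Allow only http(s) URLs whose host is explicitly whitelisted."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    if not host:
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        address = None  # host is a name, not a literal IP address
    if address is not None and (
        address.is_private or address.is_loopback or address.is_link_local
    ):
        # Literal internal addresses are rejected even if someone lists them.
        return False
    return host in allowed_hosts


if __name__ == "__main__":
    allowed = {"www.solidot.org", "techcrunch.com"}
    for candidate in ("https://www.solidot.org/index.rss", "http://127.0.0.1:8080/admin"):
        print(candidate, is_url_permitted(candidate, allowed))
```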
Capability Analysis
Type: OpenClaw Skill
Name: news-crawler
Version: 1.0.0
The news-crawler skill bundle is a legitimate tool for fetching and summarizing news from RSS feeds and web pages. The scripts (scripts/crawl.py and scripts/rss_fetcher.py) use standard Python libraries like urllib and xml.etree.ElementTree to perform their tasks, with no evidence of data exfiltration, malicious execution, or prompt injection.
Capability Assessment
Purpose & Capability
Name/description align with the provided artifacts: scripts/rss_fetcher.py fetches RSS feeds and scripts/crawl.py fetches and extracts page text. The SKILL.md describes using an LLM to summarize (LLM calls are expected to be performed by the agent, not the shipped scripts), which is consistent with an instruction-only skill.
Instruction Scope
Runtime instructions only tell the agent to run the included Python scripts and then use an LLM to summarize results. The scripts fetch arbitrary user-supplied URLs (no host whitelisting), so while this is expected for a crawler, it means the agent could be used to fetch internal/private endpoints or other sensitive URLs if directed to — the SKILL.md does advise respecting robots.txt and rate limits.
Install Mechanism
No install spec is provided (instruction-only plus included Python scripts). Nothing is downloaded from external URLs or added to disk by an installer, so install risk is low.
Credentials
The skill declares no required environment variables, credentials, or config paths. The code does not access secrets or environment variables, so requested privileges are proportionate.
Persistence & Privilege
always is false and the skill does not request persistent/always-on presence or modify other skills. Autonomous invocation is allowed (platform default) but not combined with other privilege concerns.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install news-crawler
- After installation, invoke the skill by name or use /news-crawler
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
- Initial release of "news-crawler": an automated tool for fetching and summarizing news from specified websites or RSS feeds.
- Provides scripts for retrieving news lists and detailed content: `rss_fetcher.py` for RSS sources, `crawl.py` for web pages.
- Supports structured daily news report generation, including content extraction and summarization.
- Includes usage instructions, workflow, and output formats.
- Features guidance on ethical crawling and extensibility for additional features.
Frequently Asked Questions
What is News Crawler?
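News Crawler is an automated news crawling and summarization tool. It fetches news content from specified websites or RSS feeds and generates summary reports. Typical uses include getting today's news, crawling a given site or URL, summarizing news into a daily report, and monitoring specific news sources for updates. It is an AI Agent Skill for Claude Code / OpenClaw, with 829 downloads so far.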
How do I install News Crawler?
Run "/install news-crawler" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is News Crawler free?
Yes, News Crawler is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does News Crawler support?
News Crawler is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created News Crawler?
It is built and maintained by Oshin123456 (@oshin123456); the current version is v1.0.0.