Follow News
Execution Routing Policy
Use one of the following paths according to user intent:
- Default / on-demand digest
  - When user asks for a digest, report, or latest news aggregation.
  - Run run-pipeline.py first, then render with the requested template.
- Configuration check
  - When user asks about setup issues, source additions, or broken config.
  - Run validate-config.py before any long pipeline run.
- Single-source fallback
  - When full pipeline has an obvious source failure and user asks for partial results.
  - Run only the requested source fetcher (e.g. fetch-rss.py, fetch-web.py, fetch-github.py).
- Health / troubleshooting mode
  - When user reports errors, empty output, or stale data.
  - Run: config validation -> targeted 1h smoke fetch -> pipeline with verbose logs.
Priority rule:
- Operational/config queries take precedence over full news generation.
- When a source fails, keep the run result transparent: explicit success/failure source count and failure reason list.
Automated tech news digest system with unified data source model, quality scoring pipeline, and template-based output generation.
Quick Start
- Configuration Setup: Default configs are in config/defaults/. Copy to workspace for customization:
  mkdir -p workspace/config
  cp config/defaults/sources.json workspace/config/follow-news-sources.json
  cp config/defaults/topics.json workspace/config/follow-news-topics.json
- Environment Variables:
  - TWITTER_API_BACKEND - Twitter backend: auto|opencli|getxapi|twitterapiio|official (optional, default: auto)
  - OPENCLI_BIN - OpenCLI executable path override (optional)
  - OPENCLI_MAX_WORKERS - OpenCLI concurrency limit (optional, default: 10)
  - OPENCLI_CLOSE_TABS_AFTER_RUN - close OpenCLI-created X/Twitter tabs after fetch (optional, default: 1)
  - OPENCLI_CLOSE_CHROME_WINDOWS_AFTER_RUN - close Chrome automation windows opened by OpenCLI on macOS (optional, default: 1)
  - GETX_API_KEY - GetXAPI key for Twitter/X fallback (optional)
  - TWITTERAPI_IO_KEY - twitterapi.io API key for Twitter/X fallback (optional)
  - X_BEARER_TOKEN - Twitter/X official API bearer token for final fallback (optional)
  - TAVILY_API_KEY - Tavily Search API key, alternative to Brave (optional)
  - WEB_SEARCH_BACKEND - Web search backend: auto|brave|tavily|browser (optional, default: auto)
  - BRAVE_API_KEYS - Brave Search API keys, comma-separated for rotation (optional)
  - BRAVE_API_KEY - Single Brave key fallback (optional)
  - GITHUB_TOKEN - GitHub personal access token (optional, improves rate limits)

  OpenCLI is the preferred Twitter/X backend in auto mode. In OpenClaw environments where jackwener/opencli is installed, the agent should use that skill to validate opencli doctor, browser bridge state, and X login before asking for API keys.

  To use the OpenCLI backend, the user must install the OpenCLI executable and expose it on PATH, or set OPENCLI_BIN to its absolute path. OpenClaw users should also install the jackwener/opencli Skill so the agent can run opencli doctor and diagnose browser bridge or X login-state issues. OpenCLI requests default to 10 workers (OPENCLI_MAX_WORKERS=10). The fetcher closes X/Twitter tabs created during an OpenCLI run by default (OPENCLI_CLOSE_TABS_AFTER_RUN=1) and closes Chrome automation windows opened by OpenCLI on macOS (OPENCLI_CLOSE_CHROME_WINDOWS_AFTER_RUN=1) while preserving tabs and windows that existed before the run.
- Generate Digest:
  # Unified pipeline (recommended): runs all 6 sources in parallel + merge
  python3 scripts/run-pipeline.py \
    --defaults config/defaults \
    --config workspace/config \
    --hours 48 --freshness pd \
    --archive-dir workspace/archive/follow-news/ \
    --output /tmp/td-merged.json --verbose --force
- Use Templates: Apply Discord, email, or PDF templates to merged output
Configuration Files
sources.json - Unified Data Sources
{
"sources": [
{
"id": "openai-rss",
"type": "rss",
"name": "OpenAI Blog",
"url": "https://openai.com/blog/rss.xml",
"enabled": true,
"priority": true,
"topics": ["llm", "ai-agent"],
"note": "Official OpenAI updates"
},
{
"id": "sama-twitter",
"type": "twitter",
"name": "Sam Altman",
"handle": "sama",
"enabled": true,
"priority": true,
"topics": ["llm", "frontier-tech"],
"note": "OpenAI CEO"
}
]
}
topics.json - Enhanced Topic Definitions
{
"topics": [
{
"id": "llm",
"emoji": "🧠",
"label": "LLM / Large Models",
"description": "Large Language Models, foundation models, breakthroughs",
"search": {
"queries": ["LLM latest news", "large language model breakthroughs"],
"must_include": ["LLM", "large language model", "foundation model"],
"exclude": ["tutorial", "beginner guide"]
},
"display": {
"max_items": 8,
"style": "detailed"
}
}
]
}
Scripts Pipeline
run-pipeline.py - Unified Pipeline (Recommended)
python3 scripts/run-pipeline.py \
--defaults config/defaults [--config CONFIG_DIR] \
--hours 48 --freshness pd \
--archive-dir workspace/archive/follow-news/ \
--output /tmp/td-merged.json --verbose --force
- Features: Runs all 6 fetch steps in parallel, then merges + deduplicates + scores
- Output: Final merged JSON ready for report generation (~30s total)
- Metadata: Saves per-step timing and counts to *.meta.json
- GitHub Auth: Auto-generates GitHub App token if $GITHUB_TOKEN not set
- Fallback: If this fails, run individual scripts below
Global execution constraints
- Concurrency defaults: use OPENCLI_MAX_WORKERS=10 unless explicitly overridden.
- Retry policy: use exponential backoff + jitter where scripts support it; prefer shorter windows (--hours) for smoke checks before full-window runs.
- Failure behavior: mark partial completion explicitly (for example, sources succeeded/failed count and list).
- Rate-limited or flaky sources: pause and serialize before retrying.
- Output stability: keep report ordering deterministic so repeated runs produce stable section ordering.
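The retry policy above can be illustrated with a minimal sketch of exponential backoff with full jitter. The function names, the base delay, and the 30-second cap are illustrative assumptions; the skill's scripts may implement retries differently.

```python
import random
import time


def backoff_delays(retries, base=1.0, cap=30.0, rng=random.random):
    """Yield one delay (seconds) per retry: exponential ceiling with full jitter."""
    for attempt in range(retries):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling  # random point between 0 and the ceiling


def retry(fetch, attempts=4, sleep=time.sleep, **backoff_kwargs):
    """Call fetch() up to `attempts` times, sleeping between failed tries."""
    delays = list(backoff_delays(attempts - 1, **backoff_kwargs))
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:  # assumption: real code would catch narrower errors
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delays[attempt])
```

Passing `sleep` and `rng` as parameters keeps the sketch testable without real waiting or real randomness.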
Individual Scripts (Fallback)
fetch-rss.py - RSS Feed Fetcher
python3 scripts/fetch-rss.py [--defaults DIR] [--config DIR] [--hours 48] [--output FILE] [--verbose]
- Parallel fetching (10 workers), retry with backoff, feedparser + regex fallback
- Timeout: 30s per feed, ETag/Last-Modified caching
fetch-twitter.py - Twitter/X KOL Monitor
python3 scripts/fetch-twitter.py [--defaults DIR] [--config DIR] [--hours 48] [--output FILE] [--backend auto|opencli|getxapi|twitterapiio|official]
- Backend auto-detection: tries OpenCLI first, then GetXAPI, twitterapi.io, and official X API v2
- Rate limit handling, engagement metrics, retry with backoff
fetch-web.py - Web Search Engine
python3 scripts/fetch-web.py [--defaults DIR] [--config DIR] [--freshness pd] [--output FILE]
- Auto-detects Brave API rate limit: paid plans → parallel queries, free → sequential
- Without API/backend: falls back to browser-backed DuckDuckGo search for real articles
fetch-github.py - GitHub Releases Monitor
python3 scripts/fetch-github.py [--defaults DIR] [--config DIR] [--hours 168] [--output FILE]
- Parallel fetching (10 workers), 30s timeout
- Auth priority: $GITHUB_TOKEN → GitHub App auto-generate → gh CLI → unauthenticated (60 req/hr)
fetch-github.py --trending - GitHub Trending Repos
python3 scripts/fetch-github.py --trending [--hours 48] [--output FILE] [--verbose]
- Searches GitHub API for trending repos across 4 topics (LLM, AI Agent, Crypto, Frontier Tech)
- Quality scoring: base 5 + daily_stars_est / 10, max 15
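As a worked example of the scoring rule above, the sketch below computes the score directly from the stated formula. The function name is hypothetical, and how daily_stars_est itself is estimated is not described here, so it is taken as an input.

```python
def trending_score(daily_stars_est, base=5.0, cap=15.0):
    """Score = base 5 + daily_stars_est / 10, capped at 15."""
    return min(cap, base + daily_stars_est / 10)
```

Under this formula, a repo estimated at 40 stars per day scores 9, and anything at or above 100 stars per day hits the cap of 15.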
fetch-reddit.py - Reddit Posts Fetcher
python3 scripts/fetch-reddit.py [--defaults DIR] [--config DIR] [--hours 48] [--output FILE]
- Parallel fetching (4 workers), public JSON API (no auth required)
- 13 subreddits with score filtering
enrich-articles.py - Article Full-Text Enrichment
python3 scripts/enrich-articles.py --input merged.json --output enriched.json [--min-score 10] [--max-articles 15] [--verbose]
- Fetches full article text for high-scoring articles
- Cloudflare Markdown for Agents (preferred) → HTML extraction (fallback) → Skip (paywalled/social)
- Blog domain whitelist with lower score threshold (≥3)
- Parallel fetching (5 workers, 10s timeout)
merge-sources.py - Quality Scoring & Deduplication
python3 scripts/merge-sources.py --rss FILE --twitter FILE --web FILE --github FILE --reddit FILE
- Quality scoring, title similarity dedup (85%), previous digest penalty
- Output: topic-grouped articles sorted by score
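The title-similarity dedup can be sketched as follows. This is an assumption-laden illustration: it uses difflib.SequenceMatcher as the similarity measure and keeps the higher-scored article of each near-duplicate pair, while the actual metric and tie-breaking used by merge-sources.py are not specified here.

```python
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(title.lower().split())


def dedupe_by_title(articles, threshold=0.85):
    """Keep the highest-scored article among titles that are >= 85% similar."""
    kept = []
    for article in sorted(articles, key=lambda a: a["score"], reverse=True):
        title = normalize(article["title"])
        is_duplicate = any(
            SequenceMatcher(None, title, normalize(k["title"])).ratio() >= threshold
            for k in kept
        )
        if not is_duplicate:
            kept.append(article)
    return kept
```

The pairwise comparison is quadratic in the number of articles, which is acceptable for digest-sized inputs but would need blocking or hashing for much larger sets.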
validate-config.py - Configuration Validator
python3 scripts/validate-config.py [--defaults DIR] [--config DIR] [--verbose]
- JSON schema validation, topic reference checks, duplicate ID detection
generate-pdf.py - PDF Report Generator
python3 scripts/generate-pdf.py --input report.md --output digest.pdf [--verbose]
- Converts markdown digest to styled A4 PDF with Chinese typography (Noto Sans CJK SC)
- Emoji icons, page headers/footers, blue accent theme. Requires weasyprint.
sanitize-html.py - Safe HTML Email Converter
python3 scripts/sanitize-html.py --input report.md --output email.html [--verbose]
- Converts markdown to XSS-safe HTML email with inline CSS
- URL whitelist (http/https only), HTML-escaped text content
source-health.py - Source Health Monitor
python3 scripts/source-health.py --rss FILE --twitter FILE --github FILE --reddit FILE --web FILE [--verbose]
- Tracks per-source success/failure history over 7 days
- Reports unhealthy sources (>50% failure rate)
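The health rule above (more than 50% failures within a 7-day window) can be sketched like this. The history format, a list of (source_id, timestamp, ok) tuples, is a hypothetical stand-in; the on-disk format used by source-health.py is not described here.

```python
from datetime import datetime, timedelta, timezone


def unhealthy_sources(history, now, window_days=7, max_failure_rate=0.5):
    """Return ids of sources whose failure rate in the window exceeds the limit."""
    cutoff = now - timedelta(days=window_days)
    totals = {}  # source_id -> (runs, failures)
    for source_id, timestamp, ok in history:
        if timestamp < cutoff:
            continue  # older than the tracking window
        runs, failures = totals.get(source_id, (0, 0))
        totals[source_id] = (runs + 1, failures + (0 if ok else 1))
    return sorted(
        source_id
        for source_id, (runs, failures) in totals.items()
        if failures / runs > max_failure_rate
    )
```

Note the strict comparison: a source failing exactly half of its runs is still reported as healthy under this reading of ">50%".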
summarize-merged.py - Merged Data Summary
python3 scripts/summarize-merged.py --input merged.json [--top N] [--topic TOPIC]
- Human-readable summary of merged data for LLM consumption
- Shows top articles per topic with scores and metrics
User Customization
Workspace Configuration Override
Place custom configs in workspace/config/ to override defaults:
- Sources: Append new sources, disable defaults with "enabled": false
- Topics: Override topic definitions, search queries, display settings
- Merge Logic:
  - Sources with same id → user version takes precedence
  - Sources with new id → appended to defaults
  - Topics with same id → user version completely replaces default
Example Workspace Override
// workspace/config/follow-news-sources.json
{
"sources": [
{
"id": "simonwillison-rss",
"enabled": false,
"note": "Disabled: too noisy for my use case"
},
{
"id": "my-custom-blog",
"type": "rss",
"name": "My Custom Tech Blog",
"url": "https://myblog.com/rss",
"enabled": true,
"priority": true,
"topics": ["frontier-tech"]
}
]
}
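The merge rules can be sketched as below. One detail is an assumption: because the example override for simonwillison-rss supplies only a few fields, the sketch treats "user version takes precedence" for sources as a field-level update, whereas topics are replaced wholesale as stated. The function names are hypothetical.

```python
def merge_sources(defaults, overrides):
    """Same id: user fields win (assumed field-level). New id: appended."""
    merged = {source["id"]: dict(source) for source in defaults}
    for source in overrides:
        if source["id"] in merged:
            merged[source["id"]].update(source)
        else:
            merged[source["id"]] = dict(source)
    return list(merged.values())  # dicts preserve insertion order


def merge_topics(defaults, overrides):
    """Same id: the user topic completely replaces the default one."""
    merged = {topic["id"]: topic for topic in defaults}
    for topic in overrides:
        merged[topic["id"]] = topic
    return list(merged.values())
```

With the override file above, simonwillison-rss keeps its default type and url but ends up with "enabled": false, and my-custom-blog is appended after the defaults.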
User-facing output contract
- Keep outputs concise and structured for non-technical readers.
- Never expose internal implementation details (raw commands, file paths, env var names, rate-limit internals, cache state, retry counters).
- Preserve source links for every item.
- Keep sectioned numbering stable and clear so users can reference items quickly.
- In degraded mode, show scope explicitly (for example: 2/6 sources available) and avoid claiming completeness.
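A minimal sketch of the degraded-mode scope line follows. The function name and the per-source result mapping are hypothetical; the sketch deliberately reports only which sources are unavailable and omits failure reasons, in keeping with the rule against exposing internal details to readers.

```python
def scope_line(results):
    """results maps a source name to an error message, or None on success."""
    total = len(results)
    failed = sorted(name for name, error in results.items() if error is not None)
    available = total - len(failed)
    if not failed:
        return f"{total}/{total} sources available"
    return (
        f"{available}/{total} sources available "
        f"(partial digest; unavailable: {', '.join(failed)})"
    )
```

The raw error messages stay in logs for the operator; only the scope summary reaches the digest.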
Templates & Output
Discord Template (references/templates/discord.md)
- Bullet list format with link suppression (<link>)
- Mobile-optimized, emoji headers
- 2000 character limit awareness
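As an illustration of the two Discord constraints above, the sketch below wraps URLs in angle brackets and splits a long digest into chunks of at most 2000 characters, breaking on line boundaries where possible. Both helper names are hypothetical, and the actual template may handle the limit differently.

```python
def suppress_embed(url):
    """Wrap a URL in angle brackets so Discord does not expand a preview."""
    return f"<{url}>"


def chunk_message(text, limit=2000):
    """Split text into chunks of at most `limit` characters, preferring line breaks."""
    chunks, current = [], ""
    for line in text.splitlines():
        while len(line) > limit:  # a single oversized line is hard-split
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        candidate = line if not current else current + "\n" + line
        if len(candidate) > limit:
            chunks.append(current)
            current = line
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```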
Email Template (references/templates/email.md)
- Rich metadata, technical stats, archive links
- Executive summary, top articles section
- HTML-compatible formatting
PDF Template (references/templates/pdf.md)
- A4 layout with Noto Sans CJK SC font for Chinese support
- Emoji icons, page headers/footers with page numbers
- Generated via scripts/generate-pdf.py (requires weasyprint)
Default Sources (151 total)
- RSS Feeds (62): AI labs, tech blogs, crypto news, Chinese tech media
- Twitter/X KOLs (48): AI researchers, crypto leaders, tech executives
- GitHub Repos (28): Major open-source projects (LangChain, vLLM, DeepSeek, Llama, etc.)
- Reddit (13): r/MachineLearning, r/LocalLLaMA, r/CryptoCurrency, r/ChatGPT, r/OpenAI, etc.
- Web Search (4 topics): LLM, AI Agent, Crypto, Frontier Tech
All sources pre-configured with appropriate topic tags and priority levels.
Dependencies
pip install -r requirements.txt
Optional but Recommended:
- feedparser>=6.0.0 - Better RSS parsing (fallback to regex if unavailable)
- jsonschema>=4.0.0 - Configuration validation
All scripts work with Python 3.8+ standard library only.
Monitoring & Operations
Health Checks
# Validate configuration
python3 scripts/validate-config.py --verbose
# Test RSS feeds
python3 scripts/fetch-rss.py --hours 1 --verbose
# Check Twitter API
python3 scripts/fetch-twitter.py --hours 1 --verbose
Minimum sanity checklist (before long runs)
1. python3 scripts/validate-config.py --verbose
2. python3 scripts/fetch-rss.py --hours 1 --verbose
3. python3 scripts/run-pipeline.py --defaults config/defaults --hours 24 --freshness pd --archive-dir workspace/archive/follow-news/ --output /tmp/td-merged.json --verbose
If all pass, run the full windowed pipeline (--hours 48 or --hours 168) with the requested template.
Archive Management
- Digests automatically archived to <workspace>/archive/follow-news/
- Previous digest titles used for duplicate detection
- Old archives cleaned automatically (90+ days)
Error Handling
- Network Failures: Retry with exponential backoff
- Rate Limits: Automatic retry with appropriate delays
- Invalid Content: Graceful degradation, detailed logging
- Configuration Errors: Schema validation with helpful messages
Error playbook
- validate-config.py fails:
  - Return actionable schema errors and stop pipeline execution.
  - Ask user to patch config first.
- Empty result from one source fetcher:
  - Continue with other sources.
  - Mark the run with a partial status and surface the affected source.
- Pipeline succeeds but output is missing expected sections:
  - Re-run source fetch for the missing category with narrower windows (e.g. --hours 24) and compare.
- Repeated 429 / timeout:
  - Serialize retries, increase delay, and rerun with a narrower window.
- Single upstream provider failure:
  - Produce best-effort digest from healthy sources and expose degraded scope in output.
API Keys & Environment
Set in ~/.zshenv or similar:
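# Twitter (API keys are only used by the fallback backends; OpenCLI is tried first in auto mode)
export GETX_API_KEY="your_key"             # GetXAPI key (first API fallback)
export TWITTERAPI_IO_KEY="your_key"        # twitterapi.io key (second API fallback)
export X_BEARER_TOKEN="your_bearer_token"  # Official X API v2 (final fallback)
export TWITTER_API_BACKEND="auto"          # auto|opencli|getxapi|twitterapiio|official (default: auto)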
# Web Search (optional, enables web search layer)
export WEB_SEARCH_BACKEND="auto" # auto|brave|tavily|browser (default: auto)
export TAVILY_API_KEY="tvly-xxx" # Tavily Search API (free 1000/mo)
# Brave Search (alternative)
export BRAVE_API_KEYS="key1,key2,key3" # Multiple keys, comma-separated rotation
export BRAVE_API_KEY="key1" # Single key fallback
export BRAVE_PLAN="free" # Override rate limit detection: free|pro
# GitHub (optional, improves rate limits)
export GITHUB_TOKEN="ghp_xxx" # PAT (simplest)
export GH_APP_ID="12345" # Or use GitHub App for auto-token
export GH_APP_INSTALL_ID="67890"
export GH_APP_KEY_FILE="/path/to/key.pem"
- Twitter: OpenCLI is preferred in auto mode; API backends fall back in this order: GETX_API_KEY, TWITTERAPI_IO_KEY, X_BEARER_TOKEN
- Web Search: Tavily (preferred in auto mode) or Brave; falls back to browser-backed DuckDuckGo search when API keys are missing, exhausted, or unavailable
- GitHub: Auto-generates token from GitHub App if PAT not set; unauthenticated fallback (60 req/hr)
- Reddit: No API key needed (uses public JSON API)
Cron / Scheduled Task Integration
OpenClaw Cron (Recommended)
The cron prompt should NOT hardcode the pipeline steps. Instead, reference references/digest-prompt.md and only pass configuration parameters. This ensures the pipeline logic stays in the skill repo and is consistent across all installations.
Daily Digest Cron Prompt
Read <SKILL_DIR>/references/digest-prompt.md and follow the complete workflow to generate a daily digest.
Replace placeholders with:
- MODE = daily
- TIME_WINDOW = past 1-2 days
- FRESHNESS = pd
- RSS_HOURS = 48
- ITEMS_PER_SECTION = 3-5
- ENRICH = true
- BLOG_PICKS_COUNT = 3
- EXTRA_SECTIONS = (none)
- SUBJECT = Daily Tech Digest - YYYY-MM-DD
- WORKSPACE = <your workspace path>
- SKILL_DIR = <your skill install path>
- DISCORD_CHANNEL_ID = <your channel id>
- EMAIL = (optional)
- LANGUAGE = English
- TEMPLATE = discord
Follow every step in the prompt template strictly. Do not skip any steps.
Weekly Digest Cron Prompt
Read <SKILL_DIR>/references/digest-prompt.md and follow the complete workflow to generate a weekly digest.
Replace placeholders with:
- MODE = weekly
- TIME_WINDOW = past 7 days
- FRESHNESS = pw
- RSS_HOURS = 168
- ITEMS_PER_SECTION = 10-15
- ENRICH = true
- BLOG_PICKS_COUNT = 3-5
- EXTRA_SECTIONS = 📊 Weekly Trend Summary (2-3 sentences summarizing macro trends)
- SUBJECT = Weekly Tech Digest - YYYY-MM-DD
- WORKSPACE = <your workspace path>
- SKILL_DIR = <your skill install path>
- DISCORD_CHANNEL_ID = <your channel id>
- EMAIL = (optional)
- LANGUAGE = English
- TEMPLATE = discord
Follow every step in the prompt template strictly. Do not skip any steps.
Why This Pattern?
- Single source of truth: Pipeline logic lives in digest-prompt.md, not scattered across cron configs
- Portable: Same skill on different OpenClaw instances, just change paths and channel IDs
- Maintainable: Update the skill → all cron jobs pick up changes automatically
- Anti-pattern: Do NOT copy pipeline steps into the cron prompt — it will drift out of sync
Multi-Channel Delivery Limitation
OpenClaw enforces cross-provider isolation: a single session can only send messages to one provider (e.g., Discord OR Telegram, not both). If you need to deliver digests to multiple platforms, create separate cron jobs for each provider:
# Job 1: Discord + Email
- DISCORD_CHANNEL_ID = <your-discord-channel-id>
- EMAIL = <your email address>
- TEMPLATE = discord
# Job 2: Telegram DM
- DISCORD_CHANNEL_ID = (none)
- EMAIL = (none)
- TEMPLATE = telegram
Replace DISCORD_CHANNEL_ID delivery with the target platform's delivery in the second job's prompt.
This is a security feature, not a bug — it prevents accidental cross-context data leakage.
Security Notes
Execution Model
This skill uses a prompt template pattern: the agent reads digest-prompt.md and follows its instructions. This is the standard OpenClaw skill execution model — the agent interprets structured instructions from skill-provided files. All instructions are shipped with the skill bundle and can be audited before installation.
Network Access
The Python scripts make outbound requests to:
- RSS feed URLs (configured in follow-news-sources.json)
- Twitter/X API (api.x.com or api.twitterapi.io)
- Brave Search API (api.search.brave.com)
- Tavily Search API (api.tavily.com)
- GitHub API (api.github.com)
- Reddit JSON API (reddit.com)
No data is sent to any other endpoints. All API keys are read from environment variables declared in the skill metadata.
Shell Safety
Email delivery uses send-email.py which constructs proper MIME multipart messages with HTML body + optional PDF attachment. Subject formats are hardcoded (Daily Tech Digest - YYYY-MM-DD). PDF generation uses generate-pdf.py via weasyprint. The prompt template explicitly prohibits interpolating untrusted content (article titles, tweet text, etc.) into shell arguments. Email addresses and subjects must be static placeholder values only.
File Access
Scripts read from config/ and write to workspace/archive/. No files outside the workspace are accessed.
Support & Troubleshooting
Common Issues
- RSS feeds failing: Check network connectivity, use --verbose for details
- Twitter rate limits: Reduce sources or increase interval
- Configuration errors: Run validate-config.py for specific issues
- No articles found: Check time window (--hours) and source enablement
Debug Mode
All scripts support --verbose flag for detailed logging and troubleshooting.
Performance Tuning
- Parallel Workers: Adjust MAX_WORKERS in scripts for your system
- Timeout Settings: Increase TIMEOUT for slow networks
- Article Limits: Adjust MAX_ARTICLES_PER_FEED based on needs
Security Considerations
Shell Execution
The digest prompt instructs agents to run Python scripts via shell commands. All script paths and arguments are skill-defined constants — no user input is interpolated into commands. Two scripts use subprocess:
- run-pipeline.py orchestrates child fetch scripts (all within scripts/ directory)
- fetch-github.py has two subprocess calls:
  - openssl dgst -sha256 -sign for JWT signing (only if GH_APP_* env vars are set; signs a self-constructed JWT payload, no user content involved)
  - gh auth token CLI fallback (only if gh is installed; reads from gh's own credential store)
No user-supplied or fetched content is ever interpolated into subprocess arguments. Email delivery uses send-email.py which builds MIME messages programmatically — no shell interpolation. PDF generation uses generate-pdf.py via weasyprint. Email subjects are static format strings only — never constructed from fetched data.
Credential & File Access
Scripts do not directly read ~/.config/, ~/.ssh/, or any credential files. All API tokens are read from environment variables declared in the skill metadata. The GitHub auth cascade is:
1. $GITHUB_TOKEN env var (you control what to provide)
2. GitHub App token generation (only if you set GH_APP_ID, GH_APP_INSTALL_ID, and GH_APP_KEY_FILE; uses inline JWT signing via openssl CLI, no external scripts involved)
3. gh auth token CLI (delegates to gh's own secure credential store)
4. Unauthenticated (60 req/hr, safe fallback)
If you prefer no automatic credential discovery, simply set $GITHUB_TOKEN and the script will use it directly without attempting steps 2-3.
Dependency Installation
This skill does not install any packages. requirements.txt lists optional dependencies (feedparser, jsonschema) for reference only. All scripts work with Python 3.8+ standard library. Users should install optional deps in a virtualenv if desired — the skill never runs pip install.
Input Sanitization
- URL resolution rejects non-HTTP(S) schemes (javascript:, data:, etc.)
- RSS fallback parsing uses simple, non-backtracking regex patterns (no ReDoS risk)
- All fetched content is treated as untrusted data for display only
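The scheme check in the first bullet can be sketched with the standard library. The function name is hypothetical and the skill's own resolver may differ; the sketch only shows the general technique of resolving a link against its feed URL and accepting the result only if it is http or https.

```python
from urllib.parse import urljoin, urlparse


def resolve_safe_url(base_url, href):
    """Resolve href against base_url; return None unless the result is HTTP(S)."""
    resolved = urljoin(base_url, href.strip())
    parsed = urlparse(resolved)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return resolved
    return None  # javascript:, data:, mailto:, and similar schemes are rejected
```

Checking the scheme after resolution, rather than before, means relative and protocol-relative links are accepted while absolute links with other schemes are dropped.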
Network Access
Scripts make outbound HTTP requests to configured RSS feeds, Twitter API, GitHub API, Reddit JSON API, Brave Search API, and Tavily Search API. No inbound connections or listeners are created.
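- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: /install follow-news
- Once installed, invoke the Skill by name or trigger it with /follow-news
- Provide the inputs described in the Skill's parameter notes to get structured output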
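What is Follow News?
Generates tech news digests from six sources using a unified model with quality scoring, deduplication, and multi-format output including Discord, email, and... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 86 downloads to date.
How do I install Follow News?
Run the command "/install follow-news" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is required.
Is Follow News free?
Yes. Follow News is completely free and released under the MIT-0 license; it can be downloaded, installed, and used freely.
Which platforms does Follow News support?
Follow News is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who develops Follow News?
It is developed and maintained by tangwz (@tangwz); the current version is v0.1.0.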