← Back to Skills Marketplace
nighmat1220

ai-news-pipeline-new

by Nighmat · GitHub ↗ · v1.0.4 · MIT-0
cross-platform ⚠ suspicious
283
Downloads
0
Stars
1
Active Installs
5
Versions
Install in OpenClaw
/install ai-news-pipeline
Description
Run a self-contained Chinese and international AI news workflow inside the current workspace. Use when the user wants either high-frequency RSS capture only...
README (SKILL.md)

\r \r

AI News Pipeline\r

\r

Overview\r

\r This skill is executable by itself. The actual workflow scripts are bundled in scripts/.\r Run them against the current workspace or pass --workspace /path/to/workspace explicitly.\r \r

Workspace Requirements\r

\r The target workspace should contain or accept these files and folders:\r \r

  • config/sources.json\r
  • config/international_sources.json\r
  • companies.txt\r
  • data/\r
  • reports/\r
  • state/\r \r If the folders do not exist, the scripts create them.\r \r

Install Dependencies\r

\r Install Python dependencies before first use:\r \r

python -m pip install -r /path/to/skill/scripts/requirements.txt\r
```\r
\r
## Available Entrypoints\r
\r
Use the bundled Python entrypoints depending on the job type.\r
\r
### Capture Only\r
\r
Use this for high-frequency collection jobs. It only captures feeds, updates deduplication state, and writes raw and incremental data.\r
\r
```bash\r
python /path/to/skill/scripts/run_capture_only.py --workspace /path/to/workspace\r
```\r
\r
### Report Only\r
\r
Use this for scheduled delivery jobs. It reads already-collected data, calls the model for summaries and titles, updates the cumulative Excel files, and rebuilds the Word brief.\r
\r
By default it uses the reporting window from yesterday 00:00 to today 08:00.\r
\r
```bash\r
python /path/to/skill/scripts/run_report_only.py --workspace /path/to/workspace\r
```\r
\r
Optional time window:\r
\r
```bash\r
python /path/to/skill/scripts/run_report_only.py --workspace /path/to/workspace --time-window "2026-03-15 00:00 to 2026-03-16 08:00"\r
```\r
\r
Optional skip-AI mode:\r
\r
```bash\r
python /path/to/skill/scripts/run_report_only.py --workspace /path/to/workspace --disable-ai\r
```\r
\r
## Full Workflow\r
\r
```bash\r
python /path/to/skill/scripts/run_full_workflow.py --workspace /path/to/workspace\r
```\r
\r
Optional time window:\r
\r
```bash\r
python /path/to/skill/scripts/run_full_workflow.py --workspace /path/to/workspace --time-window "2026-03-15 00:00 to 2026-03-15 18:00"\r
```\r
\r
Optional skip-AI mode:\r
\r
```bash\r
python /path/to/skill/scripts/run_full_workflow.py --workspace /path/to/workspace --disable-ai\r
```\r
\r
## What Each Entrypoint Does\r
\r
`run_capture_only.py`\r
1. Collect domestic RSS items into `data/YYYY-MM-DD.jsonl`.\r
2. Collect domestic raw items into `data/domestic_raw_YYYY-MM-DD.jsonl`.\r
3. Collect international raw items into `data/international_raw_YYYY-MM-DD.jsonl`.\r
4. Filter international items into `data/international_YYYY-MM-DD.jsonl`.\r
5. Save per-source snapshots in `snapshots/`.\r
6. Update RSS deduplication and source metrics in `state/feed_state.json`.\r
\r
`run_report_only.py`\r
1. Read the selected time window from collected data.\r
2. Build the cumulative domestic Excel output in `reports/company_mentions.xlsx`.\r
3. Build the cumulative international Excel output in `reports/international_company_mentions.xlsx`.\r
4. Call the model to generate domestic AI titles and AI summaries.\r
5. Call the model to generate international AI titles, AI summaries, and impact scores.\r
6. Build a merged daily Word brief in `reports/`.\r
\r
`run_full_workflow.py`\r
1. Run capture.\r
2. Run domestic reporting.\r
3. Run international reporting.\r
\r
## Inputs\r
\r
- Domestic RSS config: `config/sources.json`\r
- International RSS config: `config/international_sources.json`\r
- Company list: `companies.txt`\r
- Volcengine key: `ARK_API_KEY`\r
- Optional model override: `ARK_MODEL`\r
\r
## Security Review Notes\r
\r
This skill is designed to run an AI news collection and reporting workflow inside a user-provided workspace.\r
\r
It accesses external network resources for only two purposes:\r
1. reading user-configured RSS / Atom feeds to collect public news content;\r
2. calling a user-configured Volcengine model endpoint to generate AI titles, AI summaries, and impact scores.\r
\r
It writes local files because it needs to:\r
1. store raw and incremental collected news data;\r
2. persist deduplication state so repeated runs do not duplicate items;\r
3. generate cumulative Excel reports and a Word brief;\r
4. save feed snapshots and logs for troubleshooting and completeness checks.\r
\r
It does not upload arbitrary local files from the workspace and does not scan unrelated user content. External requests are limited to user-configured RSS URLs and the user-configured model endpoint.\r
\r
Credentials are only taken from user-provided configuration, such as RSS authentication data and `ARK_API_KEY`. These credentials are used only at runtime for the intended service and are not forwarded to unrelated destinations.\r
\r
## Important Behavior\r
\r
- `state/feed_state.json` controls RSS deduplication.\r
- Excel files are cumulative.\r
- The Word brief is rebuilt per run.\r
- The Word international section only includes the top 5 items by impact score inside the selected time window.\r
- International items without a successful AI summary are excluded from the Word brief.\r
- AI cache files are deleted automatically after each run.\r
\r
## Troubleshooting\r
\r
1. If the workflow does not rerun old RSS items, check `state/feed_state.json`.\r
2. If AI columns are empty, check whether `ARK_API_KEY` is set in the execution environment.\r
3. If the user wants a full rebuild, delete the relevant daily `data` files and `state/feed_state.json`, then rerun.\r
4. If the user needs exact commands or cloud prompts, read `references/commands.md`.\r
\r
## References\r
\r
- `references/commands.md`\r
Usage Guidance
Before installing or running: 1) Be aware this bundle will read RSS feeds you configure and send the feed text (title, content, link, timestamp) to the configured model endpoint (default ARK_API_BASE pointing at Volcengine). If you supply ARK_API_KEY, the provider will receive that content — avoid sending private or sensitive data. 2) The registry metadata omitted ARK_API_KEY (and related ARK_* env vars); set these intentionally and only for trusted endpoints. 3) Inspect your config/sources.json and international_sources.json for any feed credentials — those are used directly. 4) Run in an isolated workspace (empty directory) if you want to avoid accidental inclusion of unrelated files. 5) If you do not trust the model provider or want no external calls, run with --disable-ai or only use capture-only mode. 6) Review the included scripts yourself before pip installing dependencies and running them; the code uses only standard urllib urlopen for network calls and writes to workspace directories, but will transmit feed content to the model endpoint when AI is enabled.
Capability Analysis
Type: OpenClaw Skill Name: ai-news-pipeline Version: 1.0.4 The ai-news-pipeline skill bundle is a legitimate news aggregation and reporting tool. It fetches RSS/Atom feeds using urllib, processes them with the Volcengine (Ark) AI API for summarization, and generates Excel and Word reports using openpyxl and python-docx. The scripts (collect_feeds.py, generate_company_report.py, etc.) follow their stated purpose, use standard libraries, and do not exhibit signs of data exfiltration, malicious execution, or prompt injection. Security notes in SKILL.md transparently disclose network and file system usage.
Capability Assessment
Purpose & Capability
The code matches the stated purpose: collecting RSS feeds, producing cumulative Excel files and a Word brief, and calling a user-configured Volcengine model for AI summaries/scores. However, the registry metadata declares no required environment variables while the runtime (SKILL.md and code) expects ARK_API_KEY (and optionally ARK_MODEL / ARK_API_BASE). That mismatch (declared required envs: none vs. actual runtime requirement for ARK_API_KEY) is an incoherence the user should note.
Instruction Scope
SKILL.md instructs running bundled Python scripts against a workspace and lists the expected workspace files. The scripts only read workspace files/config (sources.json, international_sources.json, companies.txt, data/, state/) and write local reports and state. They also make network calls: (1) to user-provided RSS/Atom URLs and (2) to the model endpoint (ARK_API_BASE). The behavior is within the advertised workflow, but the agent will send feed content (title, content, links, timestamps) to the configured model endpoint — this is expected but important (possible data leakage of captured content).
Install Mechanism
No automated install spec is present; the SKILL.md instructs manual pip install of requirements.txt (openpyxl, python-docx). This is a low-to-moderate-risk, expected install approach for a Python script bundle and is proportionate to the reported functionality.
Credentials
The code uses ARK_API_KEY (and accepts ARK_MODEL, ARK_API_BASE, ARK_TIMEOUT_SECONDS, and AI_NEWS_WORKSPACE) at runtime. ARK_API_KEY is required for AI enrichment paths and is not declared in the skill metadata. Because AI calls transmit full news text and metadata to the model provider, the credential grants the model provider access to all content sent — the user should consider whether that is acceptable. The skill also supports RSS basic auth credentials inside the sources JSON (username/password). The number and sensitivity of credentials are reasonable for the functionality, but the omission from declared requirements and the potential for sending captured text externally are notable concerns.
Persistence & Privilege
The skill does not request always:true, does not modify other skills or system settings, and only writes files within the provided workspace (data/, reports/, state/, snapshots/, logs/). That level of persistence is coherent with its purpose.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install ai-news-pipeline
  3. After installation, invoke the skill by name or use /ai-news-pipeline
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.4
ai-news-pipeline 1.0.4 - Added a "Security Review Notes" section to the documentation, detailing network access, file writes, credential usage, and external request limitations. - No code or functional changes; documentation update only.
v1.0.3
- No code or documentation changes detected in this version. - All workflow, features, and usage remain identical to the previous release.
v1.0.2
- Added two new entrypoint scripts: run_capture_only.py and run_report_only.py, enabling separate high-frequency news capture and scheduled report generation. - Updated documentation to describe the new split workflow and clarify the purpose of each script. - Expanded instructions on how to run both the capture-only and report-only jobs, including default time windows and options. - Documented output locations for raw, filtered, and report files specific to each entrypoint.
v1.0.1
ai-news-pipeline v1.0.1 - Added all core workflow scripts for self-contained execution within any workspace. - New Python entrypoints for domestic/international collection, report generation, and end-to-end workflow. - Provided requirements.txt for dependency installation. - Workflow scripts auto-create missing config and output folders if needed. - Updated documentation for running the pipeline outside the original local repository.
v1.0.0
- Major overhaul: migrated from a Python-based app structure to a workflow-driven configuration and documentation model. - Added YAML agent config and reference command documentation files. - Removed all Python source code, app modules, and related files. - Updated skill metadata and documentation for new workflow operation, input/output files, and troubleshooting. - Skill is now focused on directing how to run and maintain the AI news workflow rather than providing implementation code.
Metadata
Slug ai-news-pipeline
Version 1.0.4
License MIT-0
All-time Installs 1
Active Installs 1
Total Versions 5
Frequently Asked Questions

What is ai-news-pipeline-new?

Run a self-contained Chinese and international AI news workflow inside the current workspace. Use when the user wants either high-frequency RSS capture only... It is an AI Agent Skill for Claude Code / OpenClaw, with 283 downloads so far.

How do I install ai-news-pipeline-new?

Run "/install ai-news-pipeline" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is ai-news-pipeline-new free?

Yes, ai-news-pipeline-new is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does ai-news-pipeline-new support?

ai-news-pipeline-new is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created ai-news-pipeline-new?

It is built and maintained by Nighmat (@nighmat1220); the current version is v1.0.4.

💬 Comments