ajayhao

Article Fetcher (Article Fetching + Notion Archiving)

by haozhenjie · GitHub ↗ · v1.0.1 · MIT-0
cross-platform ✓ Security Clean
19 Downloads · 0 Stars · 0 Active Installs · 2 Versions
Install in OpenClaw
/install article-fetcher
Description
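Fetches articles from WeChat Official Accounts, Xiaohongshu, Douban, and Zhihu, automatically uploads their images to OSS, extracts keywords with an LLM, and archives everything to Notion in one step.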
README (SKILL.md)

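# Article Fetcher v1.0.1

Fetches articles from WeChat Official Accounts, Xiaohongshu, Douban, and Zhihu, automatically uploads images to an OSS image host, extracts keywords with an LLM, and archives the result to Notion in one step.

## Quick Start

### 1. Install dependencies

```bash
pip install -r requirements.txt
```
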
### 2. Configure environment variables (`~/.openclaw/.env`)

```bash
# Required: OSS image hosting
ALIYUN_OSS_AK=your_ak
ALIYUN_OSS_SK=your_sk
ALIYUN_OSS_BUCKET_ID=your_bucket
ALIYUN_OSS_ENDPOINT=oss-cn-shanghai.aliyuncs.com

# Required: Notion archiving
NOTION_API_KEY=secret_xxx
NOTION_ARTICLE_DATABASE_ID=database_id

# Optional: LLM keyword extraction (DashScope)
DASHSCOPE_API_KEY=sk-xxx
DASHSCOPE_MODEL=qwen3.5-plus

# Optional: cookies (anti-scraping, Netscape format)
WECHAT_COOKIES_FILE=~/.cookies/wechat_cookies.txt
ZHIHU_COOKIES_FILE=~/.cookies/zhihu_cookies.txt
```
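
A startup check along these lines can catch a missing setting before any fetch begins. This is only a sketch: the variable names come from the configuration above, while `missing_required` is a hypothetical helper and not a function shipped with the skill. It also assumes the values are already present in the process environment.

```python
import os

# Required variable names, taken from the configuration block above.
REQUIRED_VARS = [
    "ALIYUN_OSS_AK",
    "ALIYUN_OSS_SK",
    "ALIYUN_OSS_BUCKET_ID",
    "ALIYUN_OSS_ENDPOINT",
    "NOTION_API_KEY",
    "NOTION_ARTICLE_DATABASE_ID",
]


def missing_required(env=None):
    """Return the required names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_required()` with no argument inspects `os.environ`; passing a dict makes the check easy to exercise without touching real credentials.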

### 3. Usage

```bash
cd <skill-dir>
python3 main.py "article URL" [tag1] [tag2]
```

**Supported platforms**: WeChat Official Accounts (`mp.weixin.qq.com`), Xiaohongshu (`xiaohongshu.com` / `xhslink.com`), Douban (`douban.com`), Zhihu (`zhihu.com`)

## Processing Pipeline

```
URL → platform detection → content fetching → image upload to OSS → keyword extraction (LLM → word-frequency fallback) → Notion archiving
```

## Notion Database Fields

| Field | Type | Description |
|------|------|------|
| Title | title | Article title (≤200 characters) |
| Source | rich_text | Source platform |
| Author | rich_text | Author |
| Link | url | Link to the original article |
| Tags | multi_select | Auto-extracted keywords + manual tags |
| PubDate | date | Publication time |
| Words | number | Word count (HTML stripped) |
| ts | date | Archive time (UTC+8) |
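
For readers setting up the database, the field types in the table map onto Notion's page `properties` payload roughly as follows. This is a sketch under assumptions: `build_properties` and its parameters are hypothetical names, and the skill's own code may assemble the payload differently. Only the property-value shapes (`title`, `rich_text`, `url`, `multi_select`, `number`, `date`) follow the public Notion API.

```python
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=8))  # UTC+8, matching the `ts` field


def build_properties(title, source, author, link, tags, pub_date, words, now=None):
    """Build a Notion `properties` dict for the fields above (illustrative only)."""
    now = now or datetime.now(CST)
    props = {
        "Title": {"title": [{"text": {"content": title[:200]}}]},  # cap at 200 chars
        "Source": {"rich_text": [{"text": {"content": source}}]},
        "Author": {"rich_text": [{"text": {"content": author}}]},
        "Link": {"url": link},
        "Tags": {"multi_select": [{"name": tag} for tag in tags]},
        "Words": {"number": words},
        "ts": {"date": {"start": now.isoformat(timespec="seconds")}},
    }
    # Omit PubDate when the source gives no publication time.
    if pub_date is not None:
        props["PubDate"] = {"date": {"start": pub_date.isoformat(timespec="seconds")}}
    return props
```

The resulting dict would be sent as the `properties` member of a page-creation request whose parent is the database named by `NOTION_ARTICLE_DATABASE_ID`.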
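
## Key Notes

- **Cookies**: Zhihu and WeChat need cookies (Netscape format) to get past anti-scraping checks; Xiaohongshu and Douban require no login
- **Keywords**: The LLM (DashScope) is tried first; if it is not configured or the call fails, extraction falls back automatically to local word frequency
- **Images**: A failed upload does not stop the run; only the images that upload successfully are recorded
- **Time**: Always formatted as `YYYY-MM-DD HH:MM:SS`; left empty when missing (never fabricated)
- **Module**: `main.py` can be imported as a Python module: `from main import fetch_and_archive_article`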
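
The LLM-first keyword strategy with a local fallback can be sketched as below. Everything here is hypothetical: `extract_keywords`, `local_keywords`, and the `llm_extract` callable are illustrative names, and the tokenizer only counts Latin-alphabet words. Chinese text would need a proper word segmenter, which this sketch does not attempt.

```python
import re
from collections import Counter


def local_keywords(text, top_n=5):
    """Naive word-frequency fallback: count alphabetic tokens of 3+ letters."""
    tokens = re.findall(r"[A-Za-z]{3,}", text.lower())
    return [word for word, _ in Counter(tokens).most_common(top_n)]


def extract_keywords(text, llm_extract=None, top_n=5):
    """Try the LLM extractor first; fall back to local counts if absent or failing."""
    if llm_extract is not None:
        try:
            keywords = llm_extract(text)
            if keywords:
                return list(keywords)[:top_n]
        except Exception:
            pass  # any LLM failure triggers the local fallback
    return local_keywords(text, top_n)
```

Passing `llm_extract=None` models the case where `DASHSCOPE_API_KEY` is not configured, so the same function covers both "not configured" and "call failed".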
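
## Security and Privacy

- **URL validation**: Hostnames are matched strictly against an allowlist, which rejects path-splicing attacks
- **Cookie isolation**: Netscape cookies are filtered by domain and attached only to matching requests
- **LLM data egress**: When `DASHSCOPE_API_KEY` is configured, article content is sent to the DashScope API (for keyword extraction only)
- **Sensitive data**: The AK, SK, and API keys are stored locally only; the skill does not leak them
- **Least privilege**: Grant the OSS bucket only PutObject/GetObject, and grant the Notion integration read/write access to the target database only
- **Dependency pinning**: requirements.txt uses exact version numbers to limit supply-chain risk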
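
The URL validation and cookie isolation points can be illustrated with `urllib.parse.urlparse`. This is a sketch, not the skill's code: the `ALLOWED_HOSTS` mapping below is built from the supported-platform list and its real shape may differ, `detect_platform` and `cookies_for_url` are hypothetical names, and accepting subdomains of an allowed host is an assumption.

```python
from urllib.parse import urlparse

# Hostnames from the supported-platform list; the real ALLOWED_HOSTS may differ.
ALLOWED_HOSTS = {
    "mp.weixin.qq.com": "wechat",
    "xiaohongshu.com": "xiaohongshu",
    "xhslink.com": "xiaohongshu",
    "douban.com": "douban",
    "zhihu.com": "zhihu",
}


def detect_platform(url):
    """Return the platform for an allowed hostname, or None (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return None
    host = (parsed.hostname or "").lower()
    for allowed, platform in ALLOWED_HOSTS.items():
        # Compare the parsed hostname only, never the path or the raw string.
        if host == allowed or host.endswith("." + allowed):
            return platform
    return None


def cookies_for_url(cookies, url):
    """Keep only cookies whose domain matches the request host."""
    host = (urlparse(url).hostname or "").lower()
    selected = {}
    for cookie in cookies:
        domain = cookie["domain"].lstrip(".").lower()
        if host == domain or host.endswith("." + domain):
            selected[cookie["name"]] = cookie["value"]
    return selected
```

Because only `parsed.hostname` is compared, a URL such as `https://evil.com/mp.weixin.qq.com/...` is rejected: its hostname is `evil.com`, and the allowed name appears only in the path.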

## Adding a Platform

1. Create `xxx_fetcher.py` under `fetchers/`, subclass `BaseFetcher`, and implement `fetch_article()`
2. Add the platform's domain to `ALLOWED_HOSTS` in `detector/platform_detector.py`
3. Register the fetcher in `FETCHER_REGISTRY` in `main.py`
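
The three steps above might look like the following in outline. Treat every detail as an assumption: the `BaseFetcher` here is a stand-in written for this example, and the return value of `fetch_article()`, the shapes of `ALLOWED_HOSTS` and `FETCHER_REGISTRY`, and the `ExampleFetcher` / `example.com` names are all hypothetical. Check the skill's source for the real interfaces.

```python
from abc import ABC, abstractmethod


class BaseFetcher(ABC):
    """Stand-in for the skill's BaseFetcher; the real class may differ."""

    @abstractmethod
    def fetch_article(self, url):
        """Return the fetched article for `url`."""


class ExampleFetcher(BaseFetcher):
    """Step 1: a hypothetical fetcher for a new platform."""

    def fetch_article(self, url):
        # A real fetcher would download and parse the page here.
        return {"title": "", "author": "", "content": "", "images": [], "url": url}


# Step 2: allow the new platform's hostname (example.com is a placeholder).
ALLOWED_HOSTS = {"example.com": "example"}

# Step 3: map the platform name to its fetcher class.
FETCHER_REGISTRY = {"example": ExampleFetcher}
```

With this layout, dispatch reduces to a lookup: detect the platform from the hostname, then instantiate `FETCHER_REGISTRY[platform]`.
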
Usage Guidance
Before installing, create least-privilege OSS and Notion credentials, use only the target Notion database, provide platform cookies only if needed, and leave DashScope disabled if you do not want article text sent to an external LLM provider. Prefer the pinned requirements.txt install path.
Capability Analysis
Type: OpenClaw Skill
Name: article-fetcher
Version: 1.0.1

The article-fetcher skill is a well-structured tool for archiving content from WeChat, Xiaohongshu, Douban, and Zhihu to Notion. It demonstrates good security practices, such as strict URL hostname whitelisting in `detector/platform_detector.py` to prevent SSRF/path-traversal attacks and domain-specific cookie isolation in `fetchers/base_fetcher.py` to prevent credential leakage. While it handles sensitive API keys (Aliyun OSS, Notion, DashScope) and transmits data to these services, this behavior is transparently documented in `SKILL.md` and `README.md` as core functionality, with no evidence of unauthorized data exfiltration or malicious code execution.
Capability Tags
requires-sensitive-credentials
Capability Assessment
Purpose & Capability
The capabilities are coherent with the stated purpose: fetch supported article URLs, upload images to OSS, extract keywords, and archive to Notion. These are externally connected actions with real account impact, so users should review the configured scopes.
Instruction Scope
The instructions are user-invoked and centered on processing a supplied article URL. No hidden goal changes, background behavior, or instruction-priority manipulation was evident in the provided artifacts.
Install Mechanism
The documented quick start uses requirements.txt with pinned versions, but the SKILL.md front matter also lists an unpinned pip package set. Users should prefer the pinned requirements file.
Credentials
The required OSS and Notion credentials and optional DashScope/cookie configuration are proportionate to the feature set, but they are sensitive and should be limited to the intended bucket, database, and platforms.
Persistence & Privilege
The skill persists article content to Notion, images to OSS, and local logs under the skill directory. This is expected for archiving but should be understood before use.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install article-fetcher
  3. After installation, invoke the skill by name or use /article-fetcher
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.1
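## v1.0.1 (2026-05-07)

### 🔒 Security fixes (ClawScan scan)

- **Cookie domain isolation**: `base_fetcher.py` refactors `_load_cookies()` to keep the domain field and adds `_apply_cookies_for_url(url)`, which filters by target domain so that login state cannot leak to non-target sites
- **Strict URL validation**: `platform_detector.py` now uses `urllib.parse.urlparse` plus an allowlist match on the hostname, rejecting path-splicing attacks (e.g. `https://evil.com/mp.weixin.qq.com/...`)
- **Dependency pinning**: `requirements.txt` moves from `>=` to `==` exact versions, reducing supply-chain risk

### 📝 Documentation

- **Security notes**: SKILL.md gains a "Security and Privacy" section that discloses security boundaries such as LLM data egress, cookie isolation, and least privilege
- **Extension guide**: Updated the platform extension steps (`ALLOWED_HOSTS` replaces the old regex description)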
v1.0.0
Article Fetcher v1.0.0 – Initial Release

- Supports automatic fetching of articles from WeChat Official Accounts, Xiaohongshu, Douban, and Zhihu.
- Uploads images to OSS and uses LLM or local word frequency for smart keyword extraction.
- Archives articles to Notion with structured metadata fields.
- Supports optional anti-crawling cookies and LLM API integration.
- Easy to extend with additional platforms via plugin-like fetchers.
Metadata
Slug article-fetcher
Version 1.0.1
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 2
Frequently Asked Questions

What is Article Fetcher (Article Fetching + Notion Archiving)?

Article Fetcher fetches articles from WeChat Official Accounts, Xiaohongshu, Douban, and Zhihu, automatically uploads their images to OSS, extracts keywords with an LLM, and archives everything to Notion in one step. It is an AI Agent Skill for Claude Code / OpenClaw, with 19 downloads so far.

How do I install Article Fetcher (Article Fetching + Notion Archiving)?

Run "/install article-fetcher" in the OpenClaw or Claude Code chat to install it in one step. Before first use, set the required OSS and Notion variables in `~/.openclaw/.env` as described in the README.

Is Article Fetcher (Article Fetching + Notion Archiving) free?

Yes, Article Fetcher (Article Fetching + Notion Archiving) is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Article Fetcher (Article Fetching + Notion Archiving) support?

Article Fetcher (Article Fetching + Notion Archiving) is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Article Fetcher (Article Fetching + Notion Archiving)?

It is built and maintained by haozhenjie (@ajayhao); the current version is v1.0.1.
