Article Fetcher (Article Scraping + Notion Archiving)
Author: haozhenjie · v1.0.1 · MIT-0
Total downloads: 19 · Favorites: 0 · Current installs: 0 · Versions: 2
Install in OpenClaw
/install article-fetcher
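Description
Fetches articles from WeChat Official Accounts, Xiaohongshu, Douban, and Zhihu, uploads their images to OSS automatically, extracts keywords with an LLM, and archives everything to Notion in one step.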
Usage instructions (SKILL.md)
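# Article Fetcher v1.0.1

Fetches articles from WeChat Official Accounts, Xiaohongshu, Douban, and Zhihu, uploads their images to an OSS image host automatically, extracts keywords with an LLM, and archives everything to Notion in one step.

## Quick Start

### 1. Install dependencies

```bash
pip install -r requirements.txt
```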

### 2. Configure environment variables (`~/.openclaw/.env`)

```bash
# Required: OSS image hosting
ALIYUN_OSS_AK=your_ak
ALIYUN_OSS_SK=your_sk
ALIYUN_OSS_BUCKET_ID=your_bucket
ALIYUN_OSS_ENDPOINT=oss-cn-shanghai.aliyuncs.com

# Required: Notion archiving
NOTION_API_KEY=secret_xxx
NOTION_ARTICLE_DATABASE_ID=database_id

# Optional: LLM keyword extraction (DashScope)
DASHSCOPE_API_KEY=sk-xxx
DASHSCOPE_MODEL=qwen3.5-plus

# Optional: cookies (anti-scraping, Netscape format)
WECHAT_COOKIES_FILE=~/.cookies/wechat_cookies.txt
ZHIHU_COOKIES_FILE=~/.cookies/zhihu_cookies.txt
```
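
Before the first run it can help to confirm that the required settings are present. The following is a minimal sketch, not the skill's own startup code: `missing_required` is a hypothetical helper, only the variable names come from the configuration above, and it assumes the values have already been loaded into the process environment.

```python
import os

# Variable names come from the .env example above; the helper itself is hypothetical.
REQUIRED_VARS = (
    "ALIYUN_OSS_AK",
    "ALIYUN_OSS_SK",
    "ALIYUN_OSS_BUCKET_ID",
    "ALIYUN_OSS_ENDPOINT",
    "NOTION_API_KEY",
    "NOTION_ARTICLE_DATABASE_ID",
)


def missing_required(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_required()
    if missing:
        print("Missing required settings: " + ", ".join(missing))
    else:
        print("All required settings are present.")
```

The optional DashScope and cookie settings are deliberately left out of the check, since the skill is documented to work without them.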

### 3. Usage

```bash
cd <skill-dir>
python3 main.py "article URL" [tag1] [tag2]
```

**Supported platforms**: WeChat Official Accounts (`mp.weixin.qq.com`), Xiaohongshu (`xiaohongshu.com` / `xhslink.com`), Douban (`douban.com`), Zhihu (`zhihu.com`)

## Processing Flow

```
URL → platform detection → content fetching → image upload to OSS → keyword extraction (LLM → word-frequency fallback) → Notion archiving
```
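
The keyword step's "LLM first, word frequency as fallback" behaviour can be sketched as below. This is an illustration under assumptions, not the skill's code: `extract_keywords` and `llm_extract` are hypothetical names, and the local tokenizer is a crude stand-in that only counts Latin words (a real implementation for Chinese text would likely use a proper word segmenter).

```python
import re
from collections import Counter


def word_frequency_keywords(text, top_n=5):
    """Local fallback: most frequent words of 3+ letters (crude tokenizer)."""
    words = re.findall(r"[a-z]{3,}", text.lower())
    return [word for word, _ in Counter(words).most_common(top_n)]


def extract_keywords(text, llm_extract=None, top_n=5):
    """Try the LLM extractor first; fall back to word frequency.

    `llm_extract` is a hypothetical callable (text -> list of keywords).
    Pass None when no API key is configured.
    """
    if llm_extract is not None:
        try:
            keywords = llm_extract(text)
            if keywords:
                return list(keywords)[:top_n]
        except Exception:
            # Any LLM failure falls through to the local fallback.
            pass
    return word_frequency_keywords(text, top_n)
```

Passing the extractor in as a callable keeps the fallback path testable without network access.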

## Notion Database Fields

| Field | Type | Description |
|------|------|------|
| Title | title | Article title (≤200 characters) |
| Source | rich_text | Source platform |
| Author | rich_text | Author |
| Link | url | Original article link |
| Tags | multi_select | Auto-extracted keywords + manual tags |
| PubDate | date | Publication time |
| Words | number | Word count (HTML stripped) |
| ts | date | Archive time (UTC+8) |
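
As a rough illustration of how these fields map onto a Notion page payload, the sketch below builds a `properties` dictionary in the shape the Notion API uses for each property type. `build_properties` is a hypothetical helper, not a function from the skill, and it assumes `pub_date` is already an ISO 8601 string or empty.

```python
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=8))  # UTC+8, used for the archive timestamp


def build_properties(title, source, author, link, tags, pub_date, words, now=None):
    """Map article metadata onto the database fields listed above."""
    now = now or datetime.now(CST)
    props = {
        "Title": {"title": [{"text": {"content": title[:200]}}]},
        "Source": {"rich_text": [{"text": {"content": source}}]},
        "Author": {"rich_text": [{"text": {"content": author}}]},
        "Link": {"url": link},
        "Tags": {"multi_select": [{"name": tag} for tag in tags]},
        "Words": {"number": words},
        "ts": {"date": {"start": now.isoformat(timespec="seconds")}},
    }
    # A missing publication date is left empty rather than invented.
    props["PubDate"] = {"date": {"start": pub_date} if pub_date else None}
    return props
```

The resulting dictionary would be sent as the `properties` of a new page whose parent is the database named by `NOTION_ARTICLE_DATABASE_ID`.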
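
## Key Notes

- **Cookies**: Zhihu and WeChat require cookies (Netscape format) to get past anti-scraping measures; Xiaohongshu and Douban need no login
- **Keywords**: the LLM (DashScope) is tried first; if it is not configured or the call fails, extraction falls back to local word frequency automatically
- **Images**: a failed upload does not stop the run; every image that uploads successfully is recorded
- **Time**: normalized to `YYYY-MM-DD HH:MM:SS`; left empty when missing (never fabricated)
- **Module**: `main.py` can be imported as a Python module: `from main import fetch_and_archive_article`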
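
The time rule above (one output format, empty when missing) could look roughly like this. It is a sketch under assumptions: `normalize_time` is a hypothetical name, the accepted input formats are illustrative guesses, and treating unparseable input the same as missing input is a choice made here rather than documented behaviour.

```python
from datetime import datetime

OUTPUT_FORMAT = "%Y-%m-%d %H:%M:%S"
# Input formats are illustrative guesses, not a list taken from the skill.
INPUT_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M", "%Y-%m-%d", "%Y/%m/%d")


def normalize_time(raw):
    """Return `raw` as YYYY-MM-DD HH:MM:SS, or "" if missing or unparseable."""
    if not raw or not raw.strip():
        return ""
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime(OUTPUT_FORMAT)
        except ValueError:
            continue
    # Nothing matched: leave the field empty instead of guessing a date.
    return ""
```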
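
## Security and Privacy

- **URL validation**: the hostname is matched against a strict whitelist, which rejects path-splicing attacks
- **Cookie isolation**: Netscape cookies are filtered by domain and attached only to matching requests
- **LLM data egress**: when `DASHSCOPE_API_KEY` is set, article content is sent to the DashScope API (for keyword extraction only)
- **Sensitive data**: AK/SK/API keys are stored locally only; the skill does not leak them
- **Least privilege**: grant the OSS bucket only PutObject/GetObject, and grant the Notion integration read/write access to the target database only
- **Dependency pinning**: requirements.txt uses exact version numbers to reduce supply-chain risk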
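
To make the URL validation point concrete, here is a minimal sketch of hostname whitelisting with `urllib.parse.urlparse`. The hosts come from the supported-platform list; the dictionary layout of `ALLOWED_HOSTS`, the `detect_platform` name, and the "exact host or subdomain" matching rule are assumptions for illustration, not the contents of `detector/platform_detector.py`.

```python
from urllib.parse import urlparse

# Hosts taken from the supported-platform list; the matching rule
# (exact host or subdomain) is an assumption.
ALLOWED_HOSTS = {
    "mp.weixin.qq.com": "wechat",
    "xiaohongshu.com": "xiaohongshu",
    "xhslink.com": "xiaohongshu",
    "douban.com": "douban",
    "zhihu.com": "zhihu",
}


def detect_platform(url):
    """Return the platform name for an allowed URL, else None."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return None
    host = (parsed.hostname or "").lower()
    for allowed, platform in ALLOWED_HOSTS.items():
        if host == allowed or host.endswith("." + allowed):
            return platform
    return None
```

Because only the parsed hostname is compared, a URL that merely carries an allowed domain in its path (for example `https://evil.com/mp.weixin.qq.com/...`) is rejected.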

## Extending Platforms

1. Create `xxx_fetcher.py` under `fetchers/`, subclass `BaseFetcher`, and implement `fetch_article()`
2. Add the platform's domain to `ALLOWED_HOSTS` in `detector/platform_detector.py`
3. Register the fetcher in `FETCHER_REGISTRY` in `main.py`
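
The three steps above can be sketched in one self-contained file. Everything here is a stand-in: the real `BaseFetcher` interface, the return shape of `fetch_article()`, and the value types of `ALLOWED_HOSTS` and `FETCHER_REGISTRY` are not shown in this document, so `ExampleFetcher`, `example.com`, and the dictionary layouts are hypothetical.

```python
from abc import ABC, abstractmethod


class BaseFetcher(ABC):
    """Stand-in for the skill's base class; the real interface may differ."""

    @abstractmethod
    def fetch_article(self, url):
        ...


class ExampleFetcher(BaseFetcher):
    """Hypothetical fetcher for a new platform (step 1)."""

    def fetch_article(self, url):
        # A real fetcher would download and parse the page here.
        return {"title": "", "author": "", "content": "", "images": [], "url": url}


# Step 2: allow the new hostname (the value type is assumed).
ALLOWED_HOSTS = {"example.com": "example"}

# Step 3: register the fetcher under the same platform key.
FETCHER_REGISTRY = {"example": ExampleFetcher}
```

Keeping the whitelist and the registry keyed by the same platform name means a URL that passes validation always resolves to exactly one fetcher.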
Security recommendations
Before installing, create least-privilege OSS and Notion credentials, use only the target Notion database, provide platform cookies only if needed, and leave DashScope disabled if you do not want article text sent to an external LLM provider. Prefer the pinned requirements.txt install path.
Functional analysis
Type: OpenClaw Skill
Name: article-fetcher
Version: 1.0.1
The article-fetcher skill is a well-structured tool for archiving content from WeChat, Xiaohongshu, Douban, and Zhihu to Notion. It demonstrates good security practices, such as strict URL hostname whitelisting in `detector/platform_detector.py` to prevent SSRF/path-traversal attacks and domain-specific cookie isolation in `fetchers/base_fetcher.py` to prevent credential leakage. While it handles sensitive API keys (Aliyun OSS, Notion, DashScope) and transmits data to these services, this behavior is transparently documented in `SKILL.md` and `README.md` as core functionality, with no evidence of unauthorized data exfiltration or malicious code execution.
Capability assessment
Purpose & Capability
The capabilities are coherent with the stated purpose: fetch supported article URLs, upload images to OSS, extract keywords, and archive to Notion. These are externally connected actions with real account impact, so users should review the configured scopes.
Instruction Scope
The instructions are user-invoked and centered on processing a supplied article URL. No hidden goal changes, background behavior, or instruction-priority manipulation was evident in the provided artifacts.
Install Mechanism
The documented quick start uses requirements.txt with pinned versions, but the SKILL.md front matter also lists an unpinned pip package set. Users should prefer the pinned requirements file.
Credentials
The required OSS and Notion credentials and optional DashScope/cookie configuration are proportionate to the feature set, but they are sensitive and should be limited to the intended bucket, database, and platforms.
Persistence & Privilege
The skill persists article content to Notion, images to OSS, and local logs under the skill directory. This is expected for archiving but should be understood before use.
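How to use
- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: /install article-fetcher
- Once installation finishes, invoke the Skill by name or trigger it with /article-fetcher
- Provide the inputs listed in the Skill's parameter description to get structured output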
Version history
v1.0.1
## v1.0.1 (2026-05-07)
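### 🔒 Security fixes (ClawScan scan)
- **Cookie domain isolation**: `base_fetcher.py` refactors `_load_cookies()` to keep the domain field and adds `_apply_cookies_for_url(url)`, which filters by target domain so login state cannot leak to non-target sites
- **Strict URL validation**: `platform_detector.py` now uses `urllib.parse.urlparse` plus whitelist matching on the hostname, rejecting path-splicing attacks (e.g. `https://evil.com/mp.weixin.qq.com/...`)
- **Dependency version pinning**: `requirements.txt` moves from `>=` to `==` exact versions, reducing supply-chain risk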
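
The cookie isolation fix can be pictured roughly as follows. This is a sketch, not the code in `base_fetcher.py`: `load_cookies` and `cookies_for_url` are hypothetical stand-ins for the methods named above, and the parser assumes the standard seven tab-separated fields of a Netscape cookie file.

```python
from urllib.parse import urlparse


def load_cookies(netscape_text):
    """Parse Netscape cookie lines into dicts, keeping the domain field."""
    cookies = []
    for line in netscape_text.splitlines():
        # Comment lines (including "#HttpOnly_" entries) are skipped in this sketch.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # skip malformed rows
        domain, _flag, path, _secure, _expires, name, value = fields
        cookies.append({"domain": domain, "path": path, "name": name, "value": value})
    return cookies


def cookies_for_url(cookies, url):
    """Return only the cookies whose domain matches the URL's hostname."""
    host = (urlparse(url).hostname or "").lower()
    matched = {}
    for cookie in cookies:
        domain = cookie["domain"].lstrip(".").lower()
        if domain and (host == domain or host.endswith("." + domain)):
            matched[cookie["name"]] = cookie["value"]
    return matched
```

Filtering at request time, rather than loading every cookie into one shared session, is what keeps a Zhihu login cookie from being sent to an unrelated host.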
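### 📝 Documentation
- **Security notes**: SKILL.md adds a "Security and Privacy" section disclosing security boundaries such as LLM data egress, cookie isolation, and least privilege
- **Extension guide**: updated the platform extension steps (`ALLOWED_HOSTS` replaces the old regex description)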
v1.0.0
Article Fetcher v1.0.0 – Initial Release
- Supports automatic fetching of articles from WeChat Official Accounts, Xiaohongshu, Douban, and Zhihu.
- Uploads images to OSS and uses LLM or local word frequency for smart keyword extraction.
- Archives articles to Notion with structured metadata fields.
- Supports optional anti-crawling cookies and LLM API integration.
- Easy to extend with additional platforms via plugin-like fetchers.
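FAQ
What is Article Fetcher (Article Scraping + Notion Archiving)?
It fetches articles from WeChat Official Accounts, Xiaohongshu, Douban, and Zhihu, uploads their images to OSS automatically, extracts keywords with an LLM, and archives everything to Notion in one step. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 19 times so far.
How do I install Article Fetcher (Article Scraping + Notion Archiving)?
Run `/install article-fetcher` in the OpenClaw or Claude Code chat box to install it in one step. Installation itself needs no extra configuration; running the skill requires the OSS and Notion environment variables described in the usage instructions.
Is Article Fetcher (Article Scraping + Notion Archiving) free?
Yes. Article Fetcher (Article Scraping + Notion Archiving) is completely free under the MIT-0 license and can be downloaded, installed, and used freely.
Which platforms does Article Fetcher (Article Scraping + Notion Archiving) support?
Article Fetcher (Article Scraping + Notion Archiving) is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who develops Article Fetcher (Article Scraping + Notion Archiving)?
It is developed and maintained by haozhenjie (@ajayhao); the current version is v1.0.1.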