/install gmail-link-archiver
Gmail Link Archiver
Archive web content from the links in your email. This skill connects to Gmail via IMAP, filters emails in a chosen mailbox by a subject prefix, crawls the links they contain (up to --max-links, 50 by default) using Playwright (headless Chromium), converts each page to Markdown, and saves the results to your OpenClaw workspace.
Quick Start
1. Install dependencies (one-time)
bash references/setup.sh
This automatically installs:
- playwright (Python) + Chromium browser binary
- html2text for HTML→Markdown conversion
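If you want to confirm that the Python packages landed in the interpreter you will run the archiver with, a quick import check is enough. The sketch below is an illustration, not part of setup.sh; `missing_modules` is a hypothetical helper, and because it only checks importability it cannot tell whether the Chromium binary itself was downloaded.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on sys.path.

    Hypothetical helper: it checks importability only, not the Chromium binary.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["playwright", "html2text"])
    if missing:
        print("Missing Python modules:", ", ".join(missing))
    else:
        print("playwright and html2text are importable")
```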
2. First run — interactive setup
python3 references/gmail_link_archiver.py
The first run will prompt you for:
| Setting | Description | Default |
|---|---|---|
| IMAP server | Gmail IMAP host | imap.gmail.com |
| IMAP port | SSL port | 993 |
| Gmail address | Your full email address | — |
| App password | Gmail App Password (NOT your regular password) | — |
| Default mailbox | IMAP folder to search | INBOX |
| Subject prefix | Filter emails whose subject starts with this | — |
| Workspace path | Where to save Markdown files | ~/openclaw-workspace/mail-archive |
Credentials are saved locally to ~/.config/gmail-link-archiver/config.json with 0600 permissions. They are never logged and are sent only to the IMAP server.
Gmail App Password: generate one at https://myaccount.google.com/apppasswords (this requires 2-Step Verification to be enabled on your Google account).
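For reference, here is a minimal sketch of how a config file can be written with the permissions described above. `save_config` is a hypothetical helper and the JSON keys are up to the caller; it is not taken from gmail_link_archiver.py.

```python
import json
import os
from pathlib import Path

# Path from the text above; the keys stored in it are the caller's choice.
CONFIG_PATH = Path.home() / ".config" / "gmail-link-archiver" / "config.json"


def save_config(config: dict, path: Path = CONFIG_PATH) -> None:
    """Write config as JSON readable only by the owner (hypothetical helper)."""
    # Directory: owner-only access (0700), as in the Security section.
    path.parent.mkdir(parents=True, exist_ok=True)
    os.chmod(path.parent, 0o700)
    # Create the file with mode 0600 from the start; O_TRUNC overwrites an old config.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
    # The os.open mode is filtered by the umask and ignored for existing files,
    # so enforce 0600 explicitly afterwards.
    os.chmod(path, 0o600)
```

Usage: `save_config({"imap_server": "imap.gmail.com", "imap_port": 993})`. Creating the file with 0600 up front avoids a brief window in which a permissive umask could leave the credentials group- or world-readable.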
3. Subsequent runs
After the first setup, subsequent runs will read credentials from the saved config:
# Use saved config defaults
python3 references/gmail_link_archiver.py
# Override mailbox and prefix on the fly
python3 references/gmail_link_archiver.py --mailbox "INBOX" --subject-prefix "[Newsletter]"
# Save to a different workspace
python3 references/gmail_link_archiver.py --workspace ~/my-archive
# Limit number of links to crawl
python3 references/gmail_link_archiver.py --max-links 10
# Re-run the setup interview
python3 references/gmail_link_archiver.py --reconfigure
How It Works
- Connect — Authenticates to Gmail via IMAP SSL
- Filter — Searches the specified mailbox for emails matching the subject prefix
- Extract — Parses email bodies (HTML + plain text) to find HTTP/HTTPS links
- Crawl — Opens each link in headless Chromium via Playwright, which renders JavaScript and is less likely than a plain HTTP client to be blocked by simple bot checks
- Convert — Transforms the crawled HTML into clean Markdown with metadata headers
- Save — Writes each Markdown file to the workspace directory
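The Filter and Extract steps can be sketched with the standard library alone. This is an illustrative assumption about how such steps might look, not the archiver's actual code. `subject_matches` decodes MIME-encoded subjects before a case-sensitive prefix check (IMAP's own SUBJECT search matches substrings case-insensitively, so a client-side check is one way to enforce "starts with"), and `extract_links` collects `href` values from HTML plus bare URLs from plain text, keeping unique http/https links in order.

```python
import re
from email.header import decode_header, make_header
from html.parser import HTMLParser
from typing import List

# Rough pattern for bare URLs in plain-text bodies; trailing punctuation is trimmed later.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def subject_matches(raw_subject: str, prefix: str) -> bool:
    """Decode an RFC 2047 subject (e.g. =?UTF-8?B?...?=), then test its prefix."""
    subject = str(make_header(decode_header(raw_subject)))
    return subject.startswith(prefix)


class _LinkCollector(HTMLParser):
    """Collect href values of <a> tags (HTMLParser already unescapes &amp;)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value.strip())


def extract_links(html_body: str = "", text_body: str = "") -> List[str]:
    """Return unique http(s) links from the HTML and plain-text bodies, in order."""
    collector = _LinkCollector()
    collector.feed(html_body)
    collector.close()
    text_links = [m.rstrip(".,;:!?") for m in URL_RE.findall(text_body)]
    seen = set()
    result = []
    for url in collector.links + text_links:
        # Skip mailto:, tel:, relative links, and duplicates.
        if url.startswith(("http://", "https://")) and url not in seen:
            seen.add(url)
            result.append(url)
    return result
```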
Pipeline Diagram
Gmail IMAP ──► Filter by Subject ──► Extract Links
│
▼
Playwright + Chromium (headless)
│
▼
HTML → Markdown (html2text)
│
▼
Save to OpenClaw Workspace
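The crawl stage of the pipeline might look roughly like this, assuming the Playwright sync API. The names `crawl_links` and `crawl_with_playwright` are hypothetical, and the shared browser page, `wait_until="domcontentloaded"`, and 30-second timeout are assumptions rather than the archiver's actual settings. Separating the loop from the browser keeps the link cap and per-URL error handling testable without launching Chromium.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def crawl_links(urls: Iterable[str], fetch: Callable[[str], str],
                max_links: int = 50) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Fetch at most max_links URLs; a failing URL is recorded, not fatal."""
    pages: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for url in list(urls)[:max_links]:
        try:
            pages[url] = fetch(url)
        except Exception as exc:  # keep going and remember why this URL failed
            errors[url] = f"{type(exc).__name__}: {exc}"
    return pages, errors


def crawl_with_playwright(urls: List[str], max_links: int = 50, timeout_ms: int = 30_000):
    """Render each URL in one headless Chromium instance and return its HTML."""
    from playwright.sync_api import sync_playwright  # imported lazily

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()

        def fetch(url: str) -> str:
            page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)
            return page.content()  # HTML after JavaScript has run

        try:
            return crawl_links(urls, fetch, max_links)
        finally:
            browser.close()
```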
CLI Reference
usage: gmail_link_archiver.py [-h] [--mailbox MAILBOX]
[--subject-prefix PREFIX]
[--workspace PATH]
[--max-links N]
[--reconfigure]
Options:
--mailbox, -m IMAP mailbox to search (default: from config)
--subject-prefix, -s Subject prefix to filter emails
--workspace, -w Directory to save Markdown files
--max-links Max number of links to crawl (default: 50)
--reconfigure Re-run the setup interview
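The options above map naturally onto `argparse`. The sketch below only mirrors the documented flags and defaults; `build_parser` and `merge_with_config` are hypothetical names, and the config keys they use are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented options (a sketch, not the script's code)."""
    parser = argparse.ArgumentParser(prog="gmail_link_archiver.py")
    # None means "fall back to the value saved in config.json".
    parser.add_argument("--mailbox", "-m", default=None,
                        help="IMAP mailbox to search (default: from config)")
    parser.add_argument("--subject-prefix", "-s", metavar="PREFIX", default=None,
                        help="Subject prefix to filter emails")
    parser.add_argument("--workspace", "-w", metavar="PATH", default=None,
                        help="Directory to save Markdown files")
    parser.add_argument("--max-links", type=int, default=50, metavar="N",
                        help="Max number of links to crawl (default: 50)")
    parser.add_argument("--reconfigure", action="store_true",
                        help="Re-run the setup interview")
    return parser


def merge_with_config(args: argparse.Namespace, config: dict) -> dict:
    """Command-line values override saved config values (hypothetical keys)."""
    merged = dict(config)
    for key in ("mailbox", "subject_prefix", "workspace"):
        value = getattr(args, key)
        if value is not None:
            merged[key] = value
    merged["max_links"] = args.max_links
    return merged
```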
Output Format
Each crawled page is saved as a Markdown file with YAML frontmatter:
---
source: https://example.com/article
crawled_at: 2026-03-27T12:00:00Z
---
# Article Title
Article content converted to clean Markdown...
Files are named using a sanitized version of the URL plus a short hash for uniqueness.
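One way to produce names and frontmatter like the above is sketched below. The slug rules, the use of SHA-256, and the 8-character hash length are assumptions; the source only says "a sanitized version of the URL plus a short hash". `markdown_filename` and `render_markdown` are hypothetical helpers.

```python
import hashlib
import re
from datetime import datetime, timezone
from typing import Optional


def markdown_filename(url: str, max_len: int = 80) -> str:
    """Sanitized URL plus an 8-hex-digit hash, e.g. example-com-article-<hash>.md."""
    slug = re.sub(r"^https?://", "", url)
    slug = re.sub(r"[^A-Za-z0-9]+", "-", slug).strip("-").lower()[:max_len].rstrip("-")
    # The hash keeps names unique when different URLs sanitize to the same slug.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:8]
    return f"{slug or 'page'}-{digest}.md"


def render_markdown(url: str, body_md: str, crawled_at: Optional[datetime] = None) -> str:
    """Prepend YAML frontmatter with the source URL and a UTC timestamp."""
    crawled_at = crawled_at or datetime.now(timezone.utc)
    stamp = crawled_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"---\nsource: {url}\ncrawled_at: {stamp}\n---\n{body_md}"
```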
Example Usage with Claude
Ask Claude to run the archiver:
"Run the Gmail Link Archiver to crawl links from my emails with subject starting with '[ReadLater]'"
Claude will execute:
python3 references/gmail_link_archiver.py --subject-prefix "[ReadLater]"
Or to set up fresh:
"Set up the Gmail Link Archiver with my credentials"
python3 references/gmail_link_archiver.py --reconfigure
Troubleshooting
"App password" rejected?
- Ensure 2-Step Verification is enabled on your Google account
- Generate a new App Password at https://myaccount.google.com/apppasswords
- Use the 16-character password without spaces
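Google displays App Passwords in four space-separated groups, so pasting them verbatim is a common cause of rejected logins. A minimal sketch of normalizing the password before an IMAP SSL login follows; `normalize_app_password` and `connect` are hypothetical helpers, and the error message deliberately reports only the length, never the password.

```python
import imaplib


def normalize_app_password(raw: str) -> str:
    """Remove all whitespace (e.g. 'abcd efgh ijkl mnop') and check the length."""
    password = "".join(raw.split())
    if len(password) != 16:
        raise ValueError(f"expected 16 characters, got {len(password)}")
    return password


def connect(host: str, port: int, user: str, raw_password: str) -> imaplib.IMAP4_SSL:
    """Open an IMAP SSL session; a rejected login raises imaplib.IMAP4.error."""
    imap = imaplib.IMAP4_SSL(host, port)
    imap.login(user, normalize_app_password(raw_password))
    return imap
```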
Playwright/Chromium issues?
# Reinstall Chromium
python3 -m playwright install chromium
# Install system dependencies (Linux)
sudo python3 -m playwright install-deps chromium
No emails found?
- Check the mailbox name (e.g. INBOX, [Gmail]/All Mail)
- Verify the subject prefix matches exactly (case-sensitive)
- Try a broader prefix
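Gmail folder names such as [Gmail]/All Mail contain a space, and Python 3's imaplib sends mailbox arguments as-is, so the name has to be quoted by the caller. The sketch below shows one way to do that and to list the exact folder names the server reports; `quote_mailbox` and `list_mailboxes` are hypothetical helpers, and non-ASCII folder names (which need modified UTF-7) are not handled.

```python
import imaplib
from typing import List


def quote_mailbox(name: str) -> str:
    """Wrap a mailbox name in IMAP double quotes, escaping backslashes and quotes.

    Input that already starts and ends with a quote is assumed to be well-formed.
    """
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def list_mailboxes(imap: imaplib.IMAP4) -> List[str]:
    """Return the raw LIST response lines so exact folder names can be checked."""
    typ, data = imap.list()
    if typ != "OK":
        raise RuntimeError("IMAP LIST failed")
    return [line.decode("utf-8", "replace") for line in data if isinstance(line, bytes)]
```

Usage with an open connection: `imap.select(quote_mailbox("[Gmail]/All Mail"), readonly=True)`.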
Permission denied on config file?
chmod 600 ~/.config/gmail-link-archiver/config.json
Security
- Credentials are stored locally at ~/.config/gmail-link-archiver/config.json
- File permissions are set to 0600 (owner read/write only)
- Credentials are never transmitted anywhere except to the IMAP server
- Credentials are never logged or printed to stdout
- Use Gmail App Passwords (not your main Google password)
- The config directory has 0700 permissions
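To verify these permissions on your own machine, a small audit like the following can help. `permission_problems` is a hypothetical helper, not something the archiver ships; an empty result means the file is 0600 and its directory is 0700.

```python
import stat
from pathlib import Path
from typing import List

CONFIG_PATH = Path.home() / ".config" / "gmail-link-archiver" / "config.json"


def permission_problems(path: Path = CONFIG_PATH) -> List[str]:
    """Report deviations from 0600 (file) and 0700 (directory); empty list means OK."""
    problems = []
    for target, expected in ((path, 0o600), (path.parent, 0o700)):
        if not target.exists():
            problems.append(f"{target}: missing")
            continue
        mode = stat.S_IMODE(target.stat().st_mode)
        if mode != expected:
            problems.append(f"{target}: mode {oct(mode)}, expected {oct(expected)}")
    return problems
```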
Requirements
- Python 3.8+
- Linux (Ubuntu/Debian) for MVP
- Gmail account with IMAP enabled and App Password
- Internet connection for IMAP and web crawling
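Using the skill in OpenClaw
- Make sure OpenClaw is installed (local or Docker deployment)
- In the chat box, enter the install command: /install gmail-link-archiver
- After installation, invoke the skill by name or trigger it with /gmail-link-archiver
- Provide the inputs described in the skill's parameter documentation to get structured output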
What is Gmail Link Archiver?
Connects to Gmail via IMAP, filters emails by subject prefix keyword in a specified mailbox, crawls links found in filtered emails using Playwright (to bypas... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 125 times to date.
How do I install Gmail Link Archiver?
Run the command /install gmail-link-archiver in the OpenClaw or Claude Code chat box to install it in one step, with no extra configuration.
Is Gmail Link Archiver free?
Yes. Gmail Link Archiver is completely free and licensed under MIT-0, so it can be freely downloaded, installed, and used.
Which platforms does Gmail Link Archiver support?
Gmail Link Archiver is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed Gmail Link Archiver?
It is developed and maintained by 목진왕 (@jinwangmok). The current version is v1.1.0.