Description

Search and extract papers from paid academic databases via Browser Relay with low-token evaluate scripts. Currently fully tested on IEEE Xplore only; WoS, Sc...

README (SKILL.md)

Paid Database Access

Name: Openclaw Paid Db Access
Author: lishy227

Let AI search paywalled academic databases through your real browser, using about 30× fewer tokens than a snapshot and running about 10× faster.

Quick Verification

First run: verify end-to-end in about 30 seconds.

Open the browser → log into IEEE Xplore (institutional SSO) → click the Relay icon (it turns bright)
Ask: "search IEEE for 'large language model scientific writing'"
If you get a structured paper list → everything works ✅

Status: IEEE Xplore ✅ verified · CNKI 知网/WoS/Scopus/ACM 📋 template ready, pending verification

Prerequisites

Browser Relay extension installed (from OpenClaw assets/chrome-extension/)
Extension configured with Gateway URL + Token (from ~/.openclaw/openclaw.json)
User logged into the target database via institutional SSO
Relay icon activated on the database tab (bright/colored)

Workflow

Step 1: Verify connection

browser.status → profile: "chrome", running: true, cdpReady: true

If offline, tell the user: "Open browser → log into database → click Relay icon to activate it"
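
The readiness check can be written as a small predicate. This is a minimal sketch, assuming the `browser.status` result is available as a plain object with the three fields shown above. `isRelayReady` is a hypothetical helper name, not part of OpenClaw.

```javascript
// Hypothetical helper: decides whether the Relay connection is usable.
// Assumes the status result is a plain object with profile, running and cdpReady.
function isRelayReady(status) {
    return Boolean(
        status &&
        status.profile === "chrome" &&
        status.running === true &&
        status.cdpReady === true
    );
}

const ready = isRelayReady({ profile: "chrome", running: true, cdpReady: true });
const offline = isRelayReady({ profile: "chrome", running: false, cdpReady: false });
console.log(ready, offline); // true false
```

When the predicate is false, fall back to the "open browser, log in, click Relay icon" prompt instead of navigating.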

Step 2: Search strategy

Think before you navigate:

Situation	Action
Non-English concept (e.g. 自动生成科研论文, "automatic generation of research papers")	Translate first → split into 2-3 complementary queries
Results > 500	Add year/type filters or tighten query
Results < 5	Broaden query (remove quotes, add synonyms, expand year range)
Results = 0	Switch database or try broader keywords (e.g. remove quotes)
No good results on first try	Use simple evaluate to inspect page content, then adjust

Prefer Semantic Scholar + arXiv APIs for free/open papers first; use Browser Relay only for paywalled content, old papers, patents, citation reports, or Chinese papers.
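
The result-count rules in the table can be sketched as one decision function. `nextSearchAction` and its return labels are hypothetical names chosen for illustration; the thresholds (0, 5, 500) are the ones listed in the table.

```javascript
// Hypothetical helper: maps a result count to the next step from the strategy table.
// Accepts numbers or strings such as "59" or "1,234"; anything non-numeric (e.g. "?") means "probe".
function nextSearchAction(totalResults) {
    const n = Number(String(totalResults).replace(/,/g, ""));
    if (!Number.isFinite(n)) return "probe-page";          // count unknown: inspect the page first
    if (n === 0) return "switch-database-or-broaden";      // Results = 0
    if (n < 5) return "broaden-query";                     // Results < 5
    if (n > 500) return "narrow-query";                    // Results > 500
    return "extract";                                      // manageable result set
}

console.log(nextSearchAction("59"));    // extract
console.log(nextSearchAction("1,234")); // narrow-query
```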

Step 3: Navigate to search

Construct the search URL with query + filters directly; never simulate clicking the search box.

navigate to: https://ieeexplore.ieee.org/search/searchresult.jsp?queryText=<query>&ranges=<year1>_<year2>_Year

Database	Search URL	Page param
IEEE Xplore	`.../search/searchresult.jsp?queryText={q}&ranges={y1}_{y2}_Year`	`&pageNumber={n}`
ACM DL	`.../action/doSearch?AllField={q}`	`&startPage={n}`
Scopus	`.../results/results.uri?query={q}`	`&offset={n}`
WoS	`.../wos/woscc/basic-search` (requires interaction)	—
CNKI 知网	`.../kns8s/AdvSearch` (requires interaction)	—
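
As an illustration of the URL-first approach, the IEEE Xplore pattern from the table can be assembled like this. `buildIeeeSearchUrl` is a hypothetical helper; only the base URL and the `queryText`, `ranges` and `pageNumber` parameters come from this document.

```javascript
// Hypothetical helper: builds an IEEE Xplore search URL from the pattern in the table.
// The query is percent-encoded so spaces and quotes survive navigation.
function buildIeeeSearchUrl(query, yearFrom, yearTo, page = 1) {
    let url = "https://ieeexplore.ieee.org/search/searchresult.jsp" +
        "?queryText=" + encodeURIComponent(query);
    if (yearFrom && yearTo) url += `&ranges=${yearFrom}_${yearTo}_Year`; // optional year filter
    if (page > 1) url += `&pageNumber=${page}`;                          // page 1 needs no parameter
    return url;
}

console.log(buildIeeeSearchUrl("large language model", 2020, 2024, 2));
// https://ieeexplore.ieee.org/search/searchresult.jsp?queryText=large%20language%20model&ranges=2020_2024_Year&pageNumber=2
```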

Step 4: Load extractor

Read from extractors/; never write JS manually.

read extractors/ieee.js    # IEEE Xplore
read extractors/acm.js     # ACM Digital Library
read extractors/scopus.js  # Scopus
read extractors/wos.js     # Web of Science
read extractors/cnki.js    # CNKI 知网

Step 5: Extract + auto-paginate

Use browser.act(kind="evaluate", fn=<script>). Scripts have built-in deduplication (by link), so count equals the number of unique papers on the page.

browser.act(profile="chrome", targetId="<ID>", kind="evaluate", fn="<extractor content>")

Pagination logic: The extractor returns totalPages, currentPage, perPage. If totalPages > 1:

Page 1 evaluate → { totalResults: "59", count: 25, totalPages: 3 }
  → navigate: URL + &pageNumber=2 → evaluate Page 2
  → navigate: URL + &pageNumber=3 → evaluate Page 3
  → Merge all papers (cross-page dedup by link)

If totalPages is "?", try &pageNumber=2 manually. If it returns 404 or an empty page, there is only 1 page.
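
The pagination loop and the cross-page merge can be sketched as two small functions. `pageUrls` and `mergePages` are hypothetical helpers; they assume each page result has the `papers` array returned by the extractor and that every paper carries a `link`.

```javascript
// Hypothetical helper: URLs for pages 2..totalPages (page 1 is the base URL itself).
// Returns [] when totalPages is 1 or not numeric (e.g. "?").
function pageUrls(baseUrl, totalPages) {
    const urls = [];
    for (let n = 2; n <= totalPages; n++) urls.push(`${baseUrl}&pageNumber=${n}`);
    return urls;
}

// Hypothetical helper: merges per-page extractor results, keeping the first paper per link.
function mergePages(pages) {
    const seen = new Set();
    const merged = [];
    for (const page of pages) {
        for (const paper of page.papers) {
            if (seen.has(paper.link)) continue; // cross-page duplicate
            seen.add(paper.link);
            merged.push(paper);
        }
    }
    return merged;
}

const merged = mergePages([
    { papers: [{ link: "a" }, { link: "b" }] },
    { papers: [{ link: "b" }, { link: "c" }] },
]);
console.log(pageUrls("https://example.org/search?q=x", 3).length, merged.length); // 2 3
```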

Step 6: arXiv matching

Match extracted papers to free arXiv versions:

echo '<papers JSON array>' | python scripts/arxiv_match.py --delay 1.5

Papers	Strategy
≤ 10	Sync match, reply together
> 10	Reply with results first, match arXiv in background, append PDF links

HIGH confidence → provide PDF link
MEDIUM confidence → provide PDF with "verify manually" warning
LOW confidence → no PDF link, mark "verify manually"
No match → mark "no arXiv version"
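
The four confidence rules can be read as a lookup from match result to what gets shown. The sketch below assumes a match object with `confidence` and `pdfUrl` fields; those field names are hypothetical, since the actual output format of arxiv_match.py is not shown here.

```javascript
// Hypothetical helper: turns one arXiv match into the link/note shown to the user.
// `match` is null/undefined when no arXiv version was found.
function pdfPolicy(match) {
    if (!match) return { pdf: null, note: "no arXiv version" };
    switch (match.confidence) {
        case "HIGH":
            return { pdf: match.pdfUrl, note: "" };
        case "MEDIUM":
            return { pdf: match.pdfUrl, note: "verify manually" };
        default: // LOW or anything unrecognized: withhold the link
            return { pdf: null, note: "verify manually" };
    }
}

console.log(pdfPolicy({ confidence: "MEDIUM", pdfUrl: "https://arxiv.org/pdf/0000.00000" }).note);
// verify manually
```

Treating unrecognized confidence values like LOW is a conservative choice made for this sketch: a link is only shown when the match is explicitly HIGH or MEDIUM.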

Step 7: Present results

Consolidate papers from all pages and databases, deduplicate globally, and present them to the user with arXiv PDF links where available.

Token efficiency: evaluate ≈ 500 tokens/page vs snapshot ≈ 15,000 tokens/page, a 30× saving.
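
To make the arithmetic concrete, here is the same estimate for a multi-page search. The per-page figures are the approximate values quoted above, not measurements, and `estimateTokens` is a hypothetical helper.

```javascript
// Approximate per-page costs quoted in this README (estimates, not measured values).
const EVALUATE_TOKENS_PER_PAGE = 500;
const SNAPSHOT_TOKENS_PER_PAGE = 15000;

// Hypothetical helper: total token estimate for a search spanning `pages` result pages.
function estimateTokens(pages) {
    return {
        evaluate: pages * EVALUATE_TOKENS_PER_PAGE,
        snapshot: pages * SNAPSHOT_TOKENS_PER_PAGE,
        ratio: SNAPSHOT_TOKENS_PER_PAGE / EVALUATE_TOKENS_PER_PAGE,
    };
}

console.log(estimateTokens(3)); // { evaluate: 1500, snapshot: 45000, ratio: 30 }
```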

Extractor Spec

All extractors/*.js must follow the v2 spec: built-in dedup + pagination info.

(() => {
    const seen = new Set();   // dedup by link
    const results = [];

    // Selector priority: try most specific first, pick the one with most matches

    // ... extraction logic ...

    return {
        totalResults: "number or '?'",
        count: results.length,
        totalPages: "number or '?'",
        currentPage: 1,
        perPage: 25,
        database: 'ieee',
        papers: results
    };
})()

Standard paper fields:

{
    "title": "Paper title",
    "authors": "Author1; Author2",
    "year": "2024",
    "venue": "Journal / Conference name",
    "type": "Journal Article | Conference Paper | ...",
    "link": "Original URL",
    "doi": "DOI (if available)",
    "abstract": "Abstract snippet (if available)",
    "citations": "Citation count (if available)"
}
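
One way to keep every extractor's output in this shape is a small normalizer. This is a sketch under assumptions: `toStandardPaper` is a hypothetical helper, and filling missing optional fields with an empty string is a choice made here, not something the spec states.

```javascript
// Field names taken from the standard paper object above.
const PAPER_FIELDS = ["title", "authors", "year", "venue", "type", "link", "doi", "abstract", "citations"];

// Hypothetical helper: coerces a raw record into the standard shape.
// Arrays (e.g. a list of authors) are joined with "; "; missing values become "".
function toStandardPaper(raw) {
    const paper = {};
    for (const field of PAPER_FIELDS) {
        const value = raw[field];
        if (Array.isArray(value)) paper[field] = value.join("; ");
        else paper[field] = value == null ? "" : String(value).trim();
    }
    return paper;
}

const paper = toStandardPaper({ title: "  A Study ", authors: ["Author1", "Author2"], year: 2024, link: "https://example.org/1" });
console.log(paper.title, "|", paper.authors, "|", paper.year); // A Study | Author1; Author2 | 2024
```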

For databases without an existing extractor, first probe with a simple evaluate:

() => { return {
    title: document.title,
    bodyClasses: document.body.className,
    mainSelectors: Array.from(document.querySelectorAll('h1,h2,h3')).map(h=>h.innerText).slice(0,10)
};}

Error Handling

Error	Cause	Fix
`browser.status → running: false`	Relay not activated	Click the Relay icon on the browser tab
`tabs: []`	No attached tab	Same as above
`navigate` returns 418	Cloudflare block	Cookie expired; re-login
`evaluate` returns `count: 0`	Selector mismatch	Probe page first: `() => ({title: document.title, text: document.body.innerText.substring(0,500)})` then adjust selectors
`evaluate` returns `undefined`	JS syntax error	Test script in browser Console first
Page title contains "Sign In"	Login lost	Ask the user to re-login
`Can't reach browser control service`	Gateway down	Run `openclaw gateway restart`
`evaluate` → `tab not found`	CDP not attached	Click the Relay icon on the current tab
`count` >> expected (e.g. 100 vs 25)	Old script without dedup	Use the v2 extractor (with `new Set()`)
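
Several rows of the table can be checked mechanically before asking the user for help. The sketch below assumes an observation object with `running`, `tabs`, `httpStatus`, `pageTitle` and `result` fields; that shape and the `diagnose` name are hypothetical and only illustrate the order of checks.

```javascript
// Hypothetical helper: maps observed symptoms to the fix listed in the table.
// Connection problems are checked first because they make later checks meaningless.
function diagnose(obs) {
    if (obs.running === false) return "Click the Relay icon on the browser tab";
    if (Array.isArray(obs.tabs) && obs.tabs.length === 0) return "Click the Relay icon on the browser tab";
    if (obs.httpStatus === 418) return "Cookie expired; re-login";
    if (typeof obs.pageTitle === "string" && obs.pageTitle.includes("Sign In")) return "Ask the user to re-login";
    if ("result" in obs && obs.result === undefined) return "Test script in browser Console first";
    if (obs.result && obs.result.count === 0) return "Probe page, then adjust selectors";
    return "ok";
}

console.log(diagnose({ pageTitle: "IEEE Xplore Sign In", result: { count: 0 } }));
// Ask the user to re-login
```

The login check runs before the `count: 0` check because a login page also yields zero papers, and adjusting selectors would not help there.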

Database Status

⚠️ Honest disclosure: Only IEEE Xplore has been fully tested end-to-end (search → extract → paginate → arXiv match). All other extractors are templates, written from static page structure analysis and never run against a live logged-in search. Assume they need selector adjustments before they work.

Database	Search	Extraction	Pagination	arXiv	Notes
IEEE Xplore	✅	✅	✅	✅	Only fully tested DB
CNKI 知网	📋	📋	—	N/A	Template, needs campus VPN
Web of Science	📋	📋	—	📋	Template, untested
Scopus	📋	📋	—	📋	Template, untested
ACM DL	📋	📋	—	📋	Template, untested

✅ = verified in live session · 📋 = template provided, needs verification

Adding a New Database

Want to add PubMed, JSTOR, ProQuest, or your university's custom repository? Here's the 4-step recipe.

1. Probe the search page

Log into the database in your browser, do a test search, then run:

// Paste into browser.act(kind="evaluate", fn=...)
() => { return {
    url: window.location.href,
    title: document.title,
    resultCount: document.querySelector('[class*=result], [class*=count]')?.innerText?.substring(0,200),
    itemSelector: (() => {
        // Try common patterns — find the one that matches paper cards
        for (const sel of [
            '[class*=result-item]', '[class*=search-result]',
            '.document-item', '[class*=record]', 'article',
            '.List-results-items > *', '.results > li'
        ]) {
            const n = document.querySelectorAll(sel).length;
            if (n >= 3) return sel + ' → ' + n + ' items';
        }
        return 'UNKNOWN — inspect manually';
    })(),
    sampleHTML: (() => {
        const first = document.querySelector('[class*=result-item], [class*=search-result], article, [class*=record]');
        return first?.innerHTML?.substring(0, 1000) || 'no match';
    })(),
    pagination: (() => {
        const nextBtn = document.querySelector('[class*=next], [class*=pagination] a:last-child, [aria-label*=next]');
        return nextBtn ? 'Found next button: ' + (nextBtn.href || nextBtn.outerHTML?.substring(0,100)) : 'No pagination found';
    })(),
    searchURL: (() => {
        // Check if URL contains search params (easy case) or is generic (hard case)
        const u = window.location.href;
        if (u.includes('query') || u.includes('search') || u.includes('q=')) return 'URL-based: ' + u.substring(0,200);
        return 'Form-based (may need POST) — current URL: ' + u.substring(0,200);
    })()
};}

2. Write the extractor

Copy extractors/ieee.js as a starting point. Three things matter:

// a) Item selector — from step 1 probe results
const items = document.querySelectorAll('.your-result-item-selector');

// b) Inner selectors — open browser DevTools, inspect one paper card
const title = item.querySelector('.your-title-selector');
const authors = item.querySelector('.your-author-selector');
// ...

// c) Dedup key — mandatory for every extractor
const seen = new Set();  // key on link or DOI

Testing tip: Before saving, paste your extractor into the browser DevTools Console and check the output.
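
Putting the three points together, an extractor core might look like the sketch below. It is not the contents of extractors/ieee.js: `extractPapers` and the selector names are hypothetical. Taking the root node as a parameter (pass `document` in the browser) makes the function easy to try against a stub before running it on a live page.

```javascript
// Hypothetical extractor core: item selector + inner selectors + dedup by link.
// `root` is anything with querySelectorAll (the page's `document` in practice).
function extractPapers(root, selectors) {
    const seen = new Set();   // c) dedup key: the paper link
    const papers = [];
    for (const item of root.querySelectorAll(selectors.item)) {     // a) item selector
        const titleEl = item.querySelector(selectors.title);        // b) inner selectors
        const linkEl = item.querySelector(selectors.link);
        const link = linkEl ? linkEl.href : "";
        if (!titleEl || !link || seen.has(link)) continue;          // skip incomplete or duplicate cards
        seen.add(link);
        const authorsEl = item.querySelector(selectors.authors);
        papers.push({
            title: titleEl.textContent.trim(),
            authors: authorsEl ? authorsEl.textContent.trim() : "",
            link,
        });
    }
    return { count: papers.length, papers };
}
```

In the browser this would be called as `extractPapers(document, { item: '.your-result-item-selector', title: '.your-title-selector', authors: '.your-author-selector', link: 'a' })`, with the placeholder selectors replaced by the ones found in step 1.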

3. Find the pagination pattern

Three common patterns; test which one works:

// Pattern A: URL parameter (like IEEE: &pageNumber=3)
navigate to: baseURL + '&pageNumber=2'  // or &page=2, &start=25

// Pattern B: Offset parameter (like Scopus: &offset=25)
navigate to: baseURL + '&offset=25'

// Pattern C: Next button click (for JS-heavy sites)
browser.act(kind="click", ref="next-page-button")
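
Patterns A and B differ only in how the page number is converted into a URL suffix. A minimal sketch, assuming the offset is zero-based and counts records (so page 2 at 25 per page gives `&offset=25`, as in the example above); `pageSuffix` is a hypothetical helper.

```javascript
// Hypothetical helper: URL suffix for a 1-based page number under pattern A or B.
function pageSuffix(pattern, page, perPage = 25) {
    if (pattern === "pageNumber") return `&pageNumber=${page}`;          // Pattern A
    if (pattern === "offset") return `&offset=${(page - 1) * perPage}`;  // Pattern B
    throw new Error(`Unsupported pattern: ${pattern}`);                  // Pattern C needs a click, not a URL
}

console.log(pageSuffix("pageNumber", 3)); // &pageNumber=3
console.log(pageSuffix("offset", 2));     // &offset=25
```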

4. Add to config.yaml

databases:
  your_db:
    name: "Database Name"
    enabled: true
    base_url: "https://..."
    search_url: "https://.../search?query={q}"
    page_param: "&page={n}"          # or "&start={n}" / "click"
    extractor: "extractors/your_db.js"
    cookies:
      required: ["SESSION_ID"]

That's it. The flow is always: probe → write extractor → find pagination → test.

Known Limitations

Browser dependency: The user's real browser must be online and logged in
Cookie expiry: Session cookies expire; re-login required
Selector fragility: Database site redesigns may break extractors
PDF download: Institutional-level auth is often required; arxiv_match.py provides free arXiv versions as a workaround
Not all DBs verified: Only IEEE Xplore is fully tested; CNKI/WoS/Scopus/ACM are template-quality
Compliance: Respect database ToS. Don't bulk download. Don't share credentials.

Project Structure

paid-db-access/
├── SKILL.md                  # This file
├── config.yaml               # User config
├── extractors/               # DB-specific JS extractors
│   ├── ieee.js               # v2 (dedup + pagination), verified ✅
│   ├── cnki.js               # v2, template 📋
│   ├── acm.js                # v2, template 📋
│   ├── scopus.js             # v2, template 📋
│   └── wos.js                # v2, template 📋
└── scripts/
    ├── cookie-extractor.py   # Extract minimal cookies from browser export
    └── arxiv_match.py        # Match papers to arXiv free PDFs

Tech Principle

Traditional snapshot:
  Walk every DOM node → serialize to accessibility tree → return entire tree
  Cost: ~15,000 tokens/page

This skill (evaluate):
  Inject JS into page → extract only paper data → return structured JSON
  Cost: ~500 tokens/page

Result: 30× token savings, 10× speed

Disclaimer

This project is for learning and research purposes only.

You must have legitimate institutional access (e.g. university subscription). This is not a paywall bypass tool.
Database websites may change at any time, breaking extractors.
cookie-extractor.py runs locally and does not upload data. Still, clear sensitive cookies after use.
Do not bulk download papers. Do not use for commercial purposes. Do not share your access credentials.

MIT License

Usage Guidance

Install only if you are comfortable letting the agent inspect search-result pages inside your logged-in academic database session. Treat any copied cookies as passwords: avoid pasting them into chats or logs, clear them after use, and prefer Browser Relay's live browser session over storing reusable cookie values in configuration.

Capability Tags

requires-sensitive-credentials

Capability Assessment

⚠ Purpose & Capability

Searching authenticated academic databases through Browser Relay is coherent with the stated purpose, but the package also includes a cookie extraction workflow for institutional database session cookies that is not clearly disclosed in the top-level description and is not necessary to the main Browser Relay flow.

⚠ Instruction Scope

The runtime instructions tell the agent to run evaluate scripts inside the user's logged-in database tab and probe page content; that is purpose-aligned, but the skill does not prominently explain that those scripts can read protected page content visible to the authenticated session.

✓ Install Mechanism

No install-time persistence, package installation, autostart behavior, or hidden setup actions were found in the artifacts.

ℹ Credentials

Browser Relay access to a logged-in browser tab is proportionate for extracting search results from paid databases, and the extractor scripts appear limited to DOM parsing of paper metadata. The arXiv matcher performs network requests only to arXiv for title matching.

⚠ Persistence & Privilege

The cookie extractor processes live authentication cookies and prints full cookie headers to stdout for copying into configuration, which increases the chance of credential leakage through terminal logs, screenshots, clipboard history, or shared sessions. No exfiltration was found, but the handling is under-scoped for session secrets.

Version History

v1.0.0

Initial release of paid-db-access: enables efficient academic paper extraction from paywalled databases using Browser Relay.
- Fully tested and verified on IEEE Xplore; templates included for WoS, Scopus, ACM, and CNKI (require verification).
- Supports end-to-end extraction across multiple pages with built-in deduplication and optional arXiv PDF matching.
- Detailed setup, troubleshooting, and usage guide provided for fast onboarding.
- All extractors follow a unified specification; new databases can be added via template scripts.
- Optimized for low token usage (≈30× more efficient than snapshots).

Metadata

Slug paid-db-access

Version 1.0.0

License MIT-0

Total Versions 1

Frequently Asked Questions

What is Openclaw Paid Db Access?

Search and extract papers from paid academic databases via Browser Relay with low-token evaluate scripts. Currently fully tested on IEEE Xplore only; WoS, Sc... It is an AI Agent Skill for Claude Code / OpenClaw, with 13 downloads so far.

How do I install Openclaw Paid Db Access?

Run "/install paid-db-access" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Openclaw Paid Db Access free?

Yes, Openclaw Paid Db Access is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Openclaw Paid Db Access support?

Openclaw Paid Db Access is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Openclaw Paid Db Access?

It is built and maintained by lishy227 (@lishy227); the current version is v1.0.0.
