oreosofat

Literature Search and Download Pipeline (文献检索与下载全流程)

by Oreosofat · GitHub ↗ · v1.0.3 · MIT-0
cross-platform ⚠ suspicious
105 Downloads · 1 Star · 0 Active Installs · 4 Versions
Install in OpenClaw
/install literature-research-pipeline
Description
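End-to-end automation of academic literature search and download. The skill is triggered when the user asks to search the literature, download a paper, look up academic material, or find papers, or says something like "help me find literature on XX" (帮我找XX相关的文献), "download this paper" (下载这篇论文), or "I need a certain paper" (需要某篇文献). Full workflow: search → recommend → multi-channel download → AbleSci (科研通) monitoring → notification → progress tracking.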
README (SKILL.md)

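Literature Search and Download Pipeline

Overview

End-to-end automation of academic literature search and download. The skill takes the user's research topic → searches the literature → recommends high-value targets → downloads through multiple channels → monitors AbleSci (科研通) help requests → downloads automatically once a request is fulfilled → notifies the user.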


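Environment Variables and Configuration

This skill relies on the following environment variables, which must be configured before first use:

| Variable | Required | Description | Example |
|----------|----------|-------------|---------|
| LIT_DOWNLOAD_DIR | | Directory where downloaded papers are saved | ~/Downloads |
| LIT_PROGRESS_FILE | | Path of the download progress tracking file | memory/literature-progress.md |
| LIT_CDP_PORT | | Browser remote debugging port (default 9334) | 9334 |
| LIT_NOTIFY_CHANNEL | | Notification channel (for example wechat-access or telegram) | wechat-access |
| LIT_NOTIFY_USER | | ID of the user to notify | your-user-id |
| SEMANTIC_SCHOLAR_API_KEY | | Semantic Scholar API key | your-key |
| LIT_UNPAYWALL_EMAIL | | Email address required by the Unpaywall API | [email protected] |

On first use, the AI should check whether the variables above are configured. If any are missing, it should ask the user and walk them through the setup. If no notification channel is configured, skip the notification step and report results in the conversation only.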
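A minimal sketch of the first-run check described above. The split between required and optional variables is an assumption made for illustration, and `missing_vars` and `notifications_enabled` are hypothetical helper names, not part of the skill.

```python
import os

# Assumption: the download directory and progress file are treated as required,
# everything else as optional. Adjust to match the skill's actual metadata.
REQUIRED = ["LIT_DOWNLOAD_DIR", "LIT_PROGRESS_FILE"]
NOTIFY = ["LIT_NOTIFY_CHANNEL", "LIT_NOTIFY_USER"]


def missing_vars(names, env=None):
    """Return the names that are unset or blank in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


def notifications_enabled(env=None):
    """Notifications are sent only when both the channel and the user are set."""
    return not missing_vars(NOTIFY, env)


if __name__ == "__main__":
    missing = missing_vars(REQUIRED)
    if missing:
        print("Please configure:", ", ".join(missing))
    if not notifications_enabled():
        print("No notification channel configured; reporting in chat only.")
```

Passing `env` explicitly keeps the check easy to exercise without touching the real process environment.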


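Workflow Overview

1. Search  →  2. Present results  →  3. User confirms  →  4a. Direct download succeeds
                                                              ↓ (on failure)
                                                          4b. AbleSci help request
                                                              ↓
                                                          5. Set up cron monitoring
                                                              ↓
                                                          6. Request fulfilled → auto-download → notify user
                                                              ↓
                                                          7. Inform user + update progress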

Step 1: Literature Search

First read the academic-literature-search skill. Locate it as follows:

  1. Look first for skills/academic-literature-search/SKILL.md in the current workspace
  2. Otherwise look for ~/.qclaw/skills/academic-literature-search/SKILL.md
  3. If neither exists, ask the user to install the academic-literature-search skill first

Run the search with its scripts/search.py:

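import os
import subprocess
import sys

# Locate the skill in the documented order: workspace first, then ~/.qclaw/skills
workspace = os.environ.get("OPENCLAW_WORKSPACE", os.path.expanduser("~/.qclaw/workspace"))
candidates = [
    os.path.join(workspace, "skills", "academic-literature-search"),
    os.path.expanduser("~/.qclaw/skills/academic-literature-search"),
]
skill_dir = next((d for d in candidates if os.path.isfile(os.path.join(d, "SKILL.md"))), None)
if skill_dir is None:
    sys.exit("academic-literature-search not found; please install it first")
search_script = os.path.join(skill_dir, "scripts", "search.py")

result = subprocess.run([
    "python3", search_script,
    "--query", "the user's research topic",
    "--databases", "semantic_scholar,crossref",
    "--max_results", "20",
    "--output_format", "json"
], capture_output=True, text=True)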

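Prefer the Crossref database (its DOI data is the most authoritative and reliable).

Immediately after the search, check the is_open_access field of every paper:

  • is_open_access = true → mark the paper as a candidate for direct download via Unpaywall
  • is_open_access = false → plan the AbleSci (科研通) help-request route right away and avoid wasting time on channels that cannot work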
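The routing rule above can be sketched as a small partition step. This assumes each search result is a dict carrying an `is_open_access` key; `plan_routes` is a hypothetical name, and treating a missing field as "not open access" is an assumption.

```python
def plan_routes(papers):
    """Split search results into (try_unpaywall, go_to_ablesci) lists."""
    direct, assist = [], []
    for paper in papers:
        # Assumption: a missing or null field counts as "not open access".
        if paper.get("is_open_access"):
            direct.append(paper)
        else:
            assist.append(paper)
    return direct, assist


# Illustrative records only; the DOIs are placeholders.
sample = [
    {"doi": "10.1000/a", "is_open_access": True},
    {"doi": "10.1000/b", "is_open_access": False},
    {"doi": "10.1000/c"},
]
direct, assist = plan_routes(sample)
```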

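Step 2: Presenting Results

Present the search results in the following format (Markdown):

## 📚 Literature Search Results (N papers)

| # | Title | Authors | Year | Journal/Conference | DOI | Citations | Open Access |
|---|-------|---------|------|--------------------|-----|-----------|-------------|
| 1 | ... | ... | 2023 | ... | 10.xxxx/xxx | 45 | ✅ |

### 🎯 High-Value Recommendations

1. **[Paper title 1]** (reason for recommendation)
   - DOI: `10.xxxx/xxx`
   - Highlights: ...
2. **[Paper title 2]** (reason for recommendation)
   - DOI: `10.xxxx/xxx`

Recommendation criteria: high citation count / recent publication year / openly accessible / directly relevant to the user's topic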


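Step 3: Confirm What the User Wants

After presenting the results, ask the user: "Which papers would you like to download (by number or title), or would you like me to recommend some?"

Once the user replies, record the following for each target paper:

  • DOI, title, and publication year
  • Whether it is open access
  • Download priority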
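Step 4a: Multi-Channel Direct Download

Try the channels below one by one, in priority order:

Channel 1: Unpaywall (fastest)

GET https://api.unpaywall.org/v2/{DOI}?email={LIT_UNPAYWALL_EMAIL}
  • From the response, take best_oa_location.url_for_pdf, falling back to the landing page (best_oa_location.url_for_landing_page) when no direct PDF link is given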
  • Note: Unpaywall is rate-limited; its documentation asks clients to stay within 100,000 calls per day
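A minimal sketch of the Unpaywall lookup using only the standard library. The helper names are hypothetical; the endpoint and the `best_oa_location.url_for_pdf` field follow the description above.

```python
import json
import urllib.parse
import urllib.request


def unpaywall_url(doi, email):
    """Build the Unpaywall v2 lookup URL for a DOI."""
    return "https://api.unpaywall.org/v2/{}?{}".format(
        urllib.parse.quote(doi, safe="/"),
        urllib.parse.urlencode({"email": email}),
    )


def pick_pdf_url(record):
    """Return the best OA PDF URL from an Unpaywall record, or None."""
    best = record.get("best_oa_location") or {}
    return best.get("url_for_pdf") or None


def lookup(doi, email, timeout=30):
    """Fetch the record and return a PDF URL if one is listed (network call)."""
    with urllib.request.urlopen(unpaywall_url(doi, email), timeout=timeout) as resp:
        return pick_pdf_url(json.load(resp))
```

`pick_pdf_url` returns `None` both when there is no OA location and when the location has no direct PDF link, so the caller can move on to the next channel in either case.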

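Channel 2: DOI.org redirect

GET https://doi.org/{DOI}
(follow redirects and look for a final URL whose Content-Type is application/pdf)
  • If the redirect lands on Springer/IEEE/Elsevier and returns 418 or requires login, give up on this channel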

Channel 3: Semantic Scholar PDF

GET https://api.semanticscholar.org/graph/v1/paper/DOI:{DOI}?fields=openAccessPdf
(send SEMANTIC_SCHOLAR_API_KEY in the x-api-key header; the PDF link, if any, is in openAccessPdf.url)

Channel 4: Crossref PDF link

GET https://api.crossref.org/works/{DOI}
(take the `link` field from the response)
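As a sketch of how the `link` field might be filtered: in a Crossref works response the entries under `message.link` usually carry `URL` and `content-type` keys. `crossref_pdf_links` is a hypothetical helper, and the sample record below is illustrative, not real data.

```python
def crossref_pdf_links(work):
    """Return URLs from a Crossref works response whose content-type is PDF."""
    links = (work.get("message") or {}).get("link") or []
    return [
        entry["URL"]
        for entry in links
        if entry.get("content-type") == "application/pdf" and "URL" in entry
    ]


# Illustrative response fragment with placeholder URLs.
sample_work = {
    "message": {
        "link": [
            {"URL": "https://example.org/paper.pdf", "content-type": "application/pdf"},
            {"URL": "https://example.org/paper.xml", "content-type": "text/xml"},
        ]
    }
}
pdf_links = crossref_pdf_links(sample_work)
```

Many records have no `link` entry at all, in which case the function returns an empty list and the download falls through to the failure handling.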

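Success criteria: the file starts with %PDF (magic bytes) and is larger than 50 KB.

Failure handling: record the reason for the failure (418 / 403 / 404 / no OA version) and move on to Step 4b.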
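The success check can be sketched in a few lines. `looks_like_pdf` is a hypothetical helper name; the `%PDF` header and the 50 KB threshold come from the criteria above.

```python
import os

MIN_SIZE = 50 * 1024  # 50 KB threshold from the success criteria


def looks_like_pdf(path, min_size=MIN_SIZE):
    """True if the file is larger than min_size bytes and starts with %PDF."""
    if os.path.getsize(path) <= min_size:
        return False
    with open(path, "rb") as fh:
        return fh.read(4) == b"%PDF"
```

The size check catches the common case where a publisher returns a small HTML error or login page saved under a .pdf name.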


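Step 4b: AbleSci (科研通) Help Request

Prerequisites

  • The user is already logged in in the browser, and the browser was started with the remote debugging port enabled (LIT_CDP_PORT, default 9334)
  • Reference command (Mac): "/Applications/Microsoft Edge.app/Contents/MacOS/Microsoft Edge" --remote-debugging-port=9334 "--remote-allow-origins=*"
  • Reference command (Linux): google-chrome --remote-debugging-port=9334 "--remote-allow-origins=*"

Step 4b-1: Post the help request

Connect to the browser over CDP and post the request at https://www.ablesci.com/assist/create:

Key parameters (re-read them from the page before every operation):

  1. CSRF token: extract it from the page's <meta name="csrf-token"> tag
  2. Cookies: use Network.getAllCookies to collect all cookies for the ablesci.com domain
  3. Tabs: call list_tabs() and attach again before every operation; do not cache tab IDs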

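Posting API

POST https://www.ablesci.com/assist/create
Content-Type: application/x-www-form-urlencoded

_csrf={token}&title={title}&content={content}&tag_id={category ID}

Suggested title format: 【求助全文】【journal name + year】paper title (求助全文 means "full text requested").

Suggested content: include the DOI, paper title, authors, and publication details.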
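A minimal sketch of preparing the request, using only the standard library. It assumes the CSRF token sits in the `content` attribute of the meta tag and that journal name and year are joined by a space; all helper names are hypothetical, and the actual POST (with the session cookies) is left to the CDP layer.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class CsrfMetaParser(HTMLParser):
    """Collect the content of the first <meta name="csrf-token"> tag."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and attr.get("name") == "csrf-token" and self.token is None:
            # Assumption: the token is stored in the "content" attribute.
            self.token = attr.get("content")


def extract_csrf(html):
    parser = CsrfMetaParser()
    parser.feed(html)
    return parser.token


def build_title(journal, year, paper_title):
    """Follow the suggested title format for the help request."""
    return "【求助全文】【{} {}】{}".format(journal, year, paper_title)


def build_assist_form(csrf_token, title, content, tag_id):
    """URL-encode the four form fields for the POST body."""
    return urlencode(
        {"_csrf": csrf_token, "title": title, "content": content, "tag_id": tag_id}
    )
```

Using `urlencode` rather than string formatting matters here because titles routinely contain spaces, ampersands, and non-ASCII characters.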

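Step 4b-2: Record the request status

After posting each request, update the download progress table (path: LIT_PROGRESS_FILE):

## 📥 Literature Download Progress

| Paper | DOI | Status | Source | Notes |
|-------|-----|--------|--------|-------|
| Paper title 1 | 10.xxxx/xxx | ✅ Downloaded | AbleSci response | save path |
| Paper title 2 | 10.xxxx/xxx | ⏳ Requested | AbleSci | ID: xxx |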
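One way to keep the progress table current is to update a row in place when its DOI already exists and append otherwise. The sketch below works on the table text only; `upsert_row` is a hypothetical helper and assumes cell values contain no `|` characters.

```python
def upsert_row(table_md, title, doi, status, source, note=""):
    """Replace the row whose DOI column matches, or append a new row."""
    new_row = "| {} | {} | {} | {} | {} |".format(title, doi, status, source, note)
    lines = table_md.rstrip("\n").split("\n")
    for i, line in enumerate(lines):
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # The DOI is the second column in the progress table.
        if len(cells) >= 2 and cells[1] == doi:
            lines[i] = new_row
            break
    else:
        lines.append(new_row)
    return "\n".join(lines) + "\n"
```

Keying on the DOI rather than the title avoids duplicate rows when the same paper is recorded under slightly different titles.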

Step 5: Set Up the Cron Monitoring Task

Read qclaw-cron-skill for the correct cron configuration syntax.

Path: ~/Library/Application Support/QClaw/openclaw/config/skills/qclaw-cron-skill/SKILL.md

Monitoring task configuration

Check the status of the AbleSci requests every 30 minutes:

Schedule

{"kind": "every", "everyMs": 1800000}

Payload (isolated session)

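{
  "kind": "agentTurn",
  "message": "Check the status of the AbleSci help requests...\n\n1. Read the progress file (LIT_PROGRESS_FILE) to get the current progress\n2. Connect to the browser over CDP (http://127.0.0.1:{LIT_CDP_PORT})\n3. Visit each request's detail page (URL format: https://www.ablesci.com/assist/detail?id={post ID})\n4. Check the status of each paper:\n   - Requested (求助中) → no action\n   - Awaiting confirmation (待确认, someone has uploaded a file) → download automatically (see the download procedure below)\n   - Completed (已完成) → no action\n5. If there is a new response (status: awaiting confirmation):\n   a. Extract the download page link\n   b. Trigger the download through the browser\n   c. Update the progress file\n   d. Notify the user (if a notification channel is configured)\n6. If all papers have been downloaded, notify the user and delete the cron monitoring task"
}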

Delivery (set only when a notification channel is configured):

{
  "mode": "announce",
  "channel": "{LIT_NOTIFY_CHANNEL}",
  "to": "{LIT_NOTIFY_USER}"
}

Note: sessionTarget = "isolated" is mandatory, and payload.kind must be "agentTurn".
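The three fragments above can be assembled programmatically. In this sketch the top-level key names `schedule`, `payload`, and `delivery` and their nesting are assumptions; the authoritative syntax is whatever qclaw-cron-skill documents. `build_monitor_job` is a hypothetical helper.

```python
import json

CHECK_INTERVAL_MS = 30 * 60 * 1000  # 30 minutes expressed in milliseconds


def build_monitor_job(message, channel=None, user=None):
    """Assemble the monitoring job; delivery is added only when configured."""
    job = {
        # Assumed key names; check qclaw-cron-skill for the real schema.
        "schedule": {"kind": "every", "everyMs": CHECK_INTERVAL_MS},
        "sessionTarget": "isolated",
        "payload": {"kind": "agentTurn", "message": message},
    }
    if channel and user:
        job["delivery"] = {"mode": "announce", "channel": channel, "to": user}
    return job


if __name__ == "__main__":
    print(json.dumps(build_monitor_job("Check AbleSci requests"), indent=2))
```

Leaving out `delivery` entirely when no channel is set mirrors the rule that notifications fall back to the conversation.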

Notification template

📥 Paper download complete!

Paper: {title}
Source: {source}
Saved to: {LIT_DOWNLOAD_DIR}/{filename}
Status: {progress table update}

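Steps 6 & 7: Automatic Download and Progress Update

When the cron task detects a new response:

  1. Extract the download ID: parse the download link from the detail page HTML
  2. Trigger the download
    • Open the download link in a new tab
    • Call Page.setDownloadBehavior(behavior=allow, downloadPath={LIT_DOWNLOAD_DIR})
    • Wait for the file to change from .crdownload to .pdf (usually 5-30 seconds)
  3. Rename the file: strip suffixes such as (科研通-ablesci.com) and keep the year
  4. Validate the PDF: the header is %PDF and the size is > 50 KB
  5. Update the progress table (LIT_PROGRESS_FILE)
  6. Notify the user (if a notification channel is configured)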
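The waiting and renaming steps can be sketched with the standard library. Both helper names are hypothetical, and the regular expression covers only the `(科研通-ablesci.com)` suffix named above, in ASCII or full-width parentheses.

```python
import os
import re
import time

# Matches " (科研通-ablesci.com)" with ASCII or full-width parentheses.
SUFFIX = re.compile(r"\s*[\((]科研通-ablesci\.com[\))]")


def clean_filename(name):
    """Strip the site suffix while keeping the rest of the name (year included)."""
    return SUFFIX.sub("", name)


def wait_for_download(directory, timeout=60.0, poll=1.0):
    """Return True once no .crdownload file remains, False on timeout.

    Call this only after the download has started; an empty directory
    also counts as "finished".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not any(f.endswith(".crdownload") for f in os.listdir(directory)):
            return True
        time.sleep(poll)
    return False
```

After `wait_for_download` returns, the renamed file would still go through the PDF validation step before the progress table is updated.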

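Known Pitfalls (from hands-on experience)

CDP operation timing

  • Accessibility tree refs and backendDOMNodeId values become invalid across calls
  • Fix: fetch a fresh ref before every operation and never cache it across steps
  • Prefer DOM.querySelectorAll + DOM.resolveNode to obtain an objectId, then send Input.dispatchMouseEvent

AbleSci's download mechanism

  • The file/request-download-token API returns code=0 but does not return a URL
  • The actual download is streamed through a browser XHR; after triggering it, wait for the browser to download the file on its own
  • file_server=2 is the regular line and file_server=3 is the high-speed line

Common causes of download failure

  • Commercial publishers such as IEEE/Elsevier: return 418 (IP/region restriction) or 403 (login required)
  • No open-access version: Unpaywall finds nothing → go straight to AbleSci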

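Dependencies

| Skill | Purpose | Installation |
|-------|---------|--------------|
| academic-literature-search | Crossref/Semantic Scholar literature search | skillhub install academic-literature-search |
| browser-cdp | CDP browser automation | Built in, or skillhub install browser-cdp |
| qclaw-cron-skill | Scheduled task management | Built in |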
Usage Guidance
Before installing or enabling this skill:
  1. Understand that it will ask the agent to control your browser via the CDP port and read CSRF tokens and cookies for ablesci.com so it can post and later act on those posts. This gives it the ability to act on your logged-in session for that site.
  2. It will read and execute a script from another local skill (academic-literature-search) via subprocess, so inspect that script first.
  3. It will create recurring cron tasks that run every ~30 minutes and perform automated actions; if you don't want recurring autonomous operations, do not enable cron or browser-cdp.
  4. If you trust the source, audit the academic-literature-search skill code and test in an isolated environment (or with a browser profile that is not logged into sensitive accounts).
  5. If you are uncomfortable with browser cookie/session access, decline browser-cdp permission or require a dedicated, logged-in browser profile limited to the target site.
  6. Confirm that posting to third-party services (ablesci.com) and automated downloads comply with your institution's policies and the target site's terms.
Capability Analysis
Type: OpenClaw Skill
Name: literature-research-pipeline
Version: 1.0.3

The skill automates academic paper retrieval using high-risk capabilities including 'subprocess' for script execution, 'cron' for task persistence, and 'browser-cdp' for browser manipulation. It explicitly instructs the user to launch their browser with remote debugging enabled and wildcard origins allowed, which creates a significant security vulnerability. Additionally, the skill programmatically extracts browser cookies and CSRF tokens to automate interactions with the 'ablesci.com' platform. While these actions are aligned with the stated functional goals, the combination of broad filesystem access and the requirement to weaken browser security poses a substantial risk to the user's environment.
Capability Assessment
Purpose & Capability
The skill claims end-to-end literature search and download, and its declared env vars (download dir, progress file, optional API keys) plus filesystem and subprocess access are consistent with that. However, the addition of browser-cdp and instructions to extract cookies/CSRF tokens and post on the user's behalf (to ablesci.com) is a materially more sensitive capability than most download helpers require — it can be justified for 'post for help' functionality but is higher‑privilege than a pure downloader.
Instruction Scope
Runtime instructions explicitly tell the agent to locate and execute an external skill's script from the user's workspace, connect to the local browser via CDP to retrieve CSRF tokens and cookies for ablesci.com, and post help requests; then create cron jobs that repeatedly connect to the browser and automatically act on responses. These actions access browser session secrets and perform autonomous remote postings and downloads. The instructions also require reading/writing arbitrary progress files and invoking subprocesses — all of which broaden the attack surface beyond simple HTTP API calls.
Install Mechanism
This is an instruction-only skill (no install spec, no downloaded archives or third-party packages), which minimizes supply-chain/install-time risk.
Credentials
Declared environment variables (download dir, progress file, optional API keys/email) are reasonable. But the skill’s behavior depends on runtime access to browser cookies and CSRF tokens via CDP (not declared as an env var) — that is effectively access to session credentials. It also reads other skills from the workspace and executes their scripts via subprocess, which may introduce unexpected privileges if those scripts are untrusted.
Persistence & Privilege
The skill sets up recurring cron checks (every 30 minutes) that will autonomously connect to the browser and external site to monitor and download results. While 'always' is false, cron + browser-cdp + cookie access gives a recurring, autonomous capability with a nontrivial blast radius if abused.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install literature-research-pipeline
  3. After installation, invoke the skill by name or use /literature-research-pipeline
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
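v1.0.3
1.0.3 (2026-04-14)

Fixed
  • Privacy compliance final version: all personal identity information removed
  • SKILL.md description optimized, clarifying trigger conditions and dependencies
  • Removed hardcoded path references from publish.py

Changed
  • Skill dependencies description optimized
  • Workflow steps streamlined, reduced redundancy
v1.0.2
  • Added structured env and permissions sections to skill metadata for clarity and compatibility.
  • No functional changes to the literature-research pipeline logic or workflow.
  • Documentation is now more explicit about required and optional environment variables.
  • Explicitly lists permissions needed: browser-cdp, filesystem-read, filesystem-write, cron, subprocess.
v1.0.1
Version 1.0.1
  • Added environment variable configuration (download path, notification channel, API key, and so on), with automatic checks that guide the user through setup, improving usability and portability.
  • Strengthened notification and progress tracking: the notification channel, recipient, and progress file are configurable, with an automatic fallback to in-conversation messages when they are not configured.
  • More flexible and fault-tolerant path lookup for the academic-literature-search skill.
  • Much clearer documentation of dependent skills, external configuration, key environment variables, and reference deployment commands.
  • Refined some implementation notes and increased the parameterization of the automated workflow.
  • Simplified some descriptions (for example, removed the hard dependency on WeChat), making the workflow self-consistent and more general.
v1.0.0
Initial release of the literature-research-pipeline skill.
  • First version published.
  • Basic placeholder documentation (SKILL.md) added.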
Metadata
Slug literature-research-pipeline
Version 1.0.3
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 4
Frequently Asked Questions

What is 文献检索与下载全流程?

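End-to-end automation of academic literature search and download. The skill is triggered when the user asks to search the literature, download a paper, look up academic material, or find papers, or says something like "help me find literature on XX" (帮我找XX相关的文献), "download this paper" (下载这篇论文), or "I need a certain paper" (需要某篇文献). Full workflow: search → recommend → multi-channel download → AbleSci (科研通) monitoring → notification → progress tracking. It is an AI Agent Skill for Claude Code / OpenClaw, with 105 downloads so far.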

How do I install 文献检索与下载全流程?

Run "/install literature-research-pipeline" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is 文献检索与下载全流程 free?

Yes, 文献检索与下载全流程 is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does 文献检索与下载全流程 support?

文献检索与下载全流程 is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created 文献检索与下载全流程?

It is built and maintained by Oreosofat (@oreosofat); the current version is v1.0.3.
