Journal Deep Intel Extractor

by Chenghan66 · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security Clean
Downloads: 99 · Stars: 0 · Active Installs: 0 · Versions: 1
Install in OpenClaw
/install journal-intel-extractor
Description
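A professional academic-intelligence extraction tool. It supports major international journals such as Nature, Science, and Cell, automatically fetches the Articles or Reviews added in the past N days, and extracts the PMID and the full abstract text of each one, providing the core data source for AI-generated lay summaries.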
README (SKILL.md)

🎓 Journal Deep Intel Intelligence Station

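This is an automated intelligence tool built for researchers in medicine and the life sciences. It addresses the pain point of "seeing only the title without knowing the substance": by visiting each record's detail page, it builds a complete abstract record for every new paper.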

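🌟 Core Features

  • Deep fetching: unlike an ordinary crawler, this tool opens each PubMed detail page in turn and extracts the Abstract.
  • Precise filtering: uses PubMed's official Publication Type tags to drop news items, editorials, and briefs automatically, keeping only substantive research content.
  • Time-window monitoring: based on [pdat] (publication date) logic, it can generate customized literature briefings by week or by month.
  • AI-friendly output: produces structured JSON data that fits OpenClaw's internal LLM summarization workflow.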
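
As a minimal sketch of how the Publication Type filter and the [pdat] window above could be combined into one PubMed search term: the function name `build_pubmed_term` and the mapping from the skill's `type` parameter to PubMed publication types are assumptions made for illustration, not taken from main.py.

```python
from datetime import date, timedelta

# Assumed mapping from the skill's `type` parameter to PubMed Publication Type values.
PUB_TYPES = {"Article": "Journal Article", "Review": "Review"}


def build_pubmed_term(journal: str, pub_type: str, days: int, today: date) -> str:
    """Compose a search term limited by journal, publication type, and date window."""
    start = today - timedelta(days=days)
    # PubMed date ranges use the form ("YYYY/MM/DD"[pdat] : "YYYY/MM/DD"[pdat]).
    window = f'("{start:%Y/%m/%d}"[pdat] : "{today:%Y/%m/%d}"[pdat])'
    return (
        f'"{journal}"[Journal] AND '
        f'"{PUB_TYPES[pub_type]}"[Publication Type] AND {window}'
    )


print(build_pubmed_term("Nature", "Article", 7, date(2024, 3, 15)))
# "Nature"[Journal] AND "Journal Article"[Publication Type] AND ("2024/03/08"[pdat] : "2024/03/15"[pdat])
```

Passing `today` explicitly keeps the function deterministic, which makes the date window easy to check by hand.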

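🛠️ Technical Implementation

  1. Engine: Python 3.x with BeautifulSoup4 for HTML parsing.
  2. Rate limiting: a built-in 0.5 s delay between requests, to keep your IP from being temporarily blocked by PubMed.
  3. Local archive: data is saved automatically under ~/Documents/Journal_Intel/, organized by date and journal name.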
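
The delay and archive steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in main.py: the helper names, the `<date>/<journal>.json` layout, and the record keys (`pmid`, `title`, `abstract`) are all hypothetical, and the fetcher is a stub so the example runs without network access.

```python
import json
import tempfile
import time
from pathlib import Path


def fetch_all(pmids, fetch_one, delay=0.5, sleep=time.sleep):
    """Call fetch_one(pmid) for each PMID, pausing `delay` seconds between requests."""
    records = []
    for i, pmid in enumerate(pmids):
        if i:  # no pause is needed before the first request
            sleep(delay)
        records.append(fetch_one(pmid))
    return records


def archive_path(base: Path, journal: str, day: str) -> Path:
    """Hypothetical layout: <base>/<date>/<journal>.json"""
    return base / day / f"{journal}.json"


def save_records(records, path: Path) -> Path:
    """Write the records as UTF-8 JSON, creating parent folders as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


def stub_fetch(pmid):
    # Stand-in for the real detail-page request and HTML parsing.
    return {"pmid": pmid, "title": f"Title {pmid}", "abstract": "..."}


# Demo writes to a temporary folder instead of ~/Documents/Journal_Intel/.
with tempfile.TemporaryDirectory() as tmp:
    records = fetch_all(["111", "222"], stub_fetch, delay=0.5)
    out = save_records(records, archive_path(Path(tmp), "Nature", "2024-03-15"))
    print(out.name, len(records))
```

Injecting `sleep` as a parameter lets the pause be replaced in tests, so the throttling logic can be verified without actually waiting.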

📖 Usage Scenarios

  • Scenario 1: Nature weekly digest. Parameters: journal="Nature", type="Article", days=7
  • Scenario 2: tracking top-tier reviews. Parameters: journal="Science", type="Review", days=30
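
For illustration, the three parameters in the scenarios above could be exposed on a command line along these lines. The flag names and defaults are assumptions; the listing only states that the script takes journal, type, and days arguments.

```python
import argparse


def parse_args(argv=None):
    """Parse the journal / type / days parameters (flag names are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Fetch recent PubMed abstracts for one journal."
    )
    parser.add_argument("--journal", required=True, help='Journal name, e.g. "Nature"')
    parser.add_argument("--type", choices=["Article", "Review"], default="Article")
    parser.add_argument("--days", type=int, default=7, help="Look-back window in days")
    return parser.parse_args(argv)


# Scenario 1: Nature weekly digest
weekly = parse_args(["--journal", "Nature", "--type", "Article", "--days", "7"])
# Scenario 2: tracking top-tier reviews
reviews = parse_args(["--journal", "Science", "--type", "Review", "--days", "30"])
print(weekly.journal, weekly.type, weekly.days)
print(reviews.journal, reviews.type, reviews.days)
```

Restricting `--type` with `choices` rejects anything other than Article or Review before a request is made.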

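⚠️ Runtime Notes

Because each article's detail page has to be fetched individually, throughput is roughly 1 second per article. If the journal published many papers that week (for example, more than 50), please wait for the script to finish.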
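
As a rough worked example of that figure, the expected wait scales linearly with the number of articles. The helper below is purely illustrative and assumes the quoted rate of about 1 second per article holds.

```python
def estimate_runtime(n_articles: int, seconds_per_article: float = 1.0) -> float:
    """Approximate total run time in seconds at a fixed per-article cost."""
    return n_articles * seconds_per_article


for n in (10, 50, 200):
    print(f"{n} articles -> about {estimate_runtime(n):.0f} s")
```

Network latency and retries are not modelled, so real runs may take longer.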

Usage Guidance
This skill appears to do what it says: it scrapes PubMed search results and article pages for PMIDs, titles, and abstracts and saves them as JSON in ~/Documents/Journal_Intel/. Before installing or running, consider:
  1. The description's phrase “deep access / 全文” ("full text") may imply retrieving paywalled full text, but this script only fetches PubMed pages and abstracts. If you expect full articles, you'll need different code or credentials.
  2. Respect PubMed/NLM terms of use and robots.txt. If you plan frequent runs, increase the delay or use official APIs (e.g., Entrez E-utilities) to avoid throttling.
  3. The SKILL.md assumes a virtualenv (venv/bin/python3) but provides no install step. Create a virtualenv and run pip install -r requirements.txt before running.
  4. The script writes files to your Documents folder. Confirm you're comfortable with that path and disk use, or modify the save location.
  5. No credentials or external endpoints are requested, and the code does not exfiltrate data beyond contacting PubMed.
If you need stronger guarantees, inspect and modify the script to use Entrez APIs (with an API key) and add explicit error handling and rate limiting.
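
As a sketch of the Entrez E-utilities alternative mentioned above, the search step could be expressed as an ESearch request URL. The function name `esearch_url` and its defaults are hypothetical; `db`, `term`, `datetype`, `reldate`, `retmax`, `retmode`, and `api_key` are standard ESearch parameters. The example only builds the URL and does not contact NCBI.

```python
from urllib.parse import parse_qs, urlencode, urlparse

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, days: int, retmax: int = 100, api_key: str = "") -> str:
    """Build an ESearch URL for PubMed records published in the last `days` days."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # filter on publication date
        "reldate": days,     # look back this many days from today
        "retmax": retmax,
        "retmode": "json",
    }
    if api_key:              # optional; omit the parameter entirely when unset
        params["api_key"] = api_key
    return f"{ESEARCH}?{urlencode(params)}"


url = esearch_url('"Nature"[Journal] AND "Journal Article"[Publication Type]', days=7)
print(parse_qs(urlparse(url).query)["reldate"])
```

The PMIDs returned by such a query could then be passed to EFetch to retrieve abstracts, which avoids parsing HTML detail pages.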
Capability Analysis
Type: OpenClaw Skill · Name: journal-intel-extractor · Version: 1.0.0
The skill is a legitimate academic scraping tool designed to extract article abstracts from PubMed. The code in main.py uses standard libraries (requests, BeautifulSoup) to fetch data from a specific domain (pubmed.ncbi.nlm.nih.gov), implements basic rate limiting, and saves the results to a local directory (~/Documents/Journal_Intel) as described in SKILL.md. There are no signs of data exfiltration, malicious execution, or prompt injection.
Capability Assessment
Purpose & Capability
Name/description claim to collect PMIDs and abstracts from major journals; the code queries PubMed and visits PubMed detail pages to extract abstracts and titles. Requested resources (none) match the task. Minor wording mismatch: the README language suggests “deep access” and may be interpreted as retrieving full-text articles, but the implementation only fetches PubMed pages/abstracts.
Instruction Scope
SKILL.md instructs running the Python script with journal/type/days arguments. The runtime behavior is limited to HTTP GETs to pubmed.ncbi.nlm.nih.gov, HTML parsing, and writing a JSON file to ~/Documents/Journal_Intel. The script does not read other files, environment vars, or contact third-party endpoints beyond PubMed.
Install Mechanism
There is no install spec; the skill is instruction-only but includes requirements.txt and a script. Dependencies (requests, beautifulsoup4, lxml) are reasonable for the task. The SKILL.md entry references venv/bin/python3 but no virtualenv creation step is provided — this is an operational mismatch (not a security issue) you should be aware of.
Credentials
The skill requires no environment variables, no credentials, and does not request unrelated secrets. Network access is only used for PubMed; User-Agent header is hard-coded in the script.
Persistence & Privilege
The skill writes output files under the user's home Documents folder (~/Documents/Journal_Intel). It is not always-enabled and does not modify other skills or system configuration. Autonomous invocation is allowed (platform default); combined with file writes, consider whether you want the agent to run this without manual review.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install journal-intel-extractor
  3. After installation, invoke the skill by name or use /journal-intel-extractor
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Journal Deep Intel Extractor 1.7.0 introduces deep extraction mode for journal articles.
  • Now fetches both titles and article abstracts by navigating to PubMed article detail pages.
  • Provides raw materials (abstracts) for AI-generated lay summaries.
  • Includes support for filtering by journal name, article type (Article or Review), and days to look back.
  • Note: extraction time increases with the number of articles (about 1 second per article), as each abstract is fetched individually.
Metadata
Slug journal-intel-extractor
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Journal Deep Intel Extractor?

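A professional academic-intelligence extraction tool. It supports major international journals such as Nature, Science, and Cell, automatically fetches the Articles or Reviews added in the past N days, and extracts the PMID and the full abstract text of each one, providing the core data source for AI-generated lay summaries. It is an AI Agent Skill for Claude Code / OpenClaw, with 99 downloads so far.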

How do I install Journal Deep Intel Extractor?

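Run "/install journal-intel-extractor" in the OpenClaw or Claude Code chat to install it in one step. Note that the Usage Guidance section reports no bundled install step for the Python dependencies, so a virtualenv with requirements.txt installed is still needed before the script will run.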

Is Journal Deep Intel Extractor free?

Yes, Journal Deep Intel Extractor is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Journal Deep Intel Extractor support?

Journal Deep Intel Extractor is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Journal Deep Intel Extractor?

It is built and maintained by Chenghan66 (@chenghan66); the current version is v1.0.0.
