gateswell

Biomed Dataset Finder

Author: Shuhuan Cao · GitHub · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 53
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install biomed-dataset-finder
Description
Search NCBI GEO/SRA, NGDC-GSA, and CNGB for biomedical datasets by disease, treatment, species, pathology subtype, and data type. Returns bold dataset ID, li...
Usage Instructions (SKILL.md)

Biomedical Dataset Finder

Search public biomedical datasets from NCBI, NGDC, and CNGB by conversational query keywords.

Usage Trigger

User asks for datasets related to a disease/treatment/species/subtype/data type combination. Examples:

  • "Find colon cancer dMMR immunotherapy single-cell data"
  • "hepatocellular carcinoma PD-1 scRNA-seq baseline"
  • "lung cancer immunotherapy single cell data"

Data Sources (Priority Order)

| Priority | Source | Database | Accession Prefix |
| --- | --- | --- | --- |
| 1st | NCBI | GEO Datasets (gds) | GSE |
| 1st | NCBI | SRA (single-cell queries) | SRP/SRR |
| 1st | NGDC | Genome Sequence Archive | CRA |
| 2nd | CNGB | CNGBdb | CNP (requires token for some data) |

Workflow

Step 1 — Parse Query

Extract from user message:

  • Disease/Cancer: e.g. colon cancer, hepatocellular carcinoma, lung cancer
  • Treatment: e.g. immunotherapy, PD-1, chemotherapy, baseline therapy
  • Species: human, mouse (defaults to human if unspecified)
  • Pathology Subtype: e.g. dMMR, MSI-H, KRAS mutant
  • Data Type: e.g. scRNA-seq, single-cell, RNA-seq, ChIP-seq, ATAC-seq

If any critical field is missing, ask the user to clarify.
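
The parsed fields can be held in a small container, sketched below. The `DatasetQuery` class and its method are hypothetical names, and treating disease and data type as the "critical" fields is an assumption; the source does not say which fields are critical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetQuery:
    """Fields extracted from the user's message (hypothetical container)."""
    disease: Optional[str] = None
    treatment: Optional[str] = None
    species: str = "human"  # defaults to human if unspecified
    subtype: Optional[str] = None
    data_type: Optional[str] = None

    def missing_critical(self) -> List[str]:
        # Assumption: disease and data type are the critical fields.
        critical = {"disease": self.disease, "data_type": self.data_type}
        return [name for name, value in critical.items() if not value]


query = DatasetQuery(disease="colon cancer", treatment="immunotherapy", subtype="dMMR")
print(query.species)             # human
print(query.missing_critical())  # ['data_type']
```

A non-empty result from `missing_critical()` is the cue to ask the user for clarification before searching.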

Step 2 — NCBI Search (Primary)

Use NCBI E-utilities (free, no auth).

  1. Search gds database (GEO Datasets, NOT gse) with combined keywords
  2. For each result, pull accession (GSE prefix), title, summary, and pubmedids (list)
  3. Fetch article info (authors, title, journal, year, DOI) for each PMID
  4. For single-cell queries, also search sra database

Query: ({disease}) AND ({treatment}) AND ({species}) AND ({data_type})

Rate limit: about 3 requests/second when no API key is used.
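
As a minimal sketch of the query and rate limit above, the following builds the search term and an ESearch URL and throttles calls to about 3 per second. The helper names (`build_term`, `esearch_url`, `Throttle`) are hypothetical, and skipping empty fields is an assumption. No request is sent here; ESearch returns record UIDs, which a later ESummary call would resolve to the accession, title, summary, and pubmedids.

```python
import time
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_term(disease, treatment, species, data_type):
    """Combine the parsed fields with AND; empty fields are skipped (assumption)."""
    parts = [disease, treatment, species, data_type]
    return " AND ".join(f"({p})" for p in parts if p)


def esearch_url(term, db="gds", retmax=10):
    """Build an ESearch URL; retmode=json requests a JSON response."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


class Throttle:
    """Sleep as needed so successive calls stay at or below `per_second`."""

    def __init__(self, per_second=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / per_second
        self.clock, self.sleep = clock, sleep
        self.last = None

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now


term = build_term("colon cancer", "immunotherapy", "human", "scRNA-seq")
print(term)  # (colon cancer) AND (immunotherapy) AND (human) AND (scRNA-seq)
print(esearch_url(term))
```

Calling `Throttle().wait()` before each request keeps a loop over PMIDs within the stated limit; passing `db="sra"` to `esearch_url` covers the single-cell case.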

Step 3 — NGDC Search (Primary)

API: https://ngdc.cncb.ac.cn/search/api/specific?q={keywords}&db=gsa&size=20

Requires User-Agent header. Filter response for type=="GSA" entries (CRA accessions).
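
The request and filter described above might look like the sketch below. The User-Agent string and the helper names are hypothetical, and the layout of the response entries (a list of dicts with a `type` key) is an assumption; the source does not show the response format. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

NGDC_SEARCH = "https://ngdc.cncb.ac.cn/search/api/specific"


def ngdc_request(keywords, size=20, user_agent="biomed-dataset-finder/1.0"):
    """Build (but do not send) the GSA search request with a User-Agent header."""
    query = urlencode({"q": keywords, "db": "gsa", "size": size})
    return Request(f"{NGDC_SEARCH}?{query}", headers={"User-Agent": user_agent})


def keep_gsa(entries):
    """Keep entries whose `type` is "GSA"; the entry layout is an assumption."""
    return [e for e in entries if e.get("type") == "GSA"]


req = ngdc_request("colon cancer")
print(req.full_url)
print(req.get_header("User-agent"))  # urllib stores header names capitalized
```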

Step 4 — CNGB Search (Secondary)

If a CNGB token is provided, search the CNGBdb API. On an authentication error, ask the user whether to provide a token or skip this source.

Step 5 — Output

Markdown table with bold dataset ID, article info (authors, title, journal, year, DOI), and direct links.

If no results: "No public datasets found matching your criteria. Try adjusting keywords or switching data sources."
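
One way to render the output is sketched below. The column order, the row keys, and the `render_table` name are assumptions, and `GSE_PLACEHOLDER` is a stand-in, not a real accession.

```python
NO_RESULTS = ("No public datasets found matching your criteria. "
              "Try adjusting keywords or switching data sources.")

COLUMNS = ("authors", "title", "journal", "year", "doi", "link")


def render_table(rows):
    """Render rows as a Markdown table; missing fields stay blank."""
    if not rows:
        return NO_RESULTS
    lines = ["| Dataset ID | Authors | Title | Journal | Year | DOI | Link |",
             "| --- | --- | --- | --- | --- | --- | --- |"]
    for row in rows:
        # Escape pipes so a title cannot break the table layout.
        cells = [str(row.get(k) or "").replace("|", "\\|") for k in COLUMNS]
        lines.append("| " + " | ".join([f"**{row['id']}**"] + cells) + " |")
    return "\n".join(lines)


print(render_table([{"id": "GSE_PLACEHOLDER", "year": 2024}]))
print(render_table([]))
```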

Factuality Requirements (Critical — No Hallucinations)

This skill handles scientific research data. Fabricating a single dataset entry undermines the user's work.

Hard Rules

  1. Dataset IDs: Only use IDs returned by actual API responses. Never invent, guess, or infer IDs.
  2. Article info: Only populate from actual API/PubMed responses. Leave blank if no data returned.
  3. Links: Build from verified accession patterns (e.g. https://.../acc.cgi?acc={GSE}). Never guess URLs.
  4. "Not found" is valid: If a source returns 0 results, output the empty result — do not fabricate entries to fill the table.

Verification Checklist (before presenting results)

  • Every Dataset ID is from an API response, not memory or guess
  • Every Article Title + Authors + Journal is from a PubMed/API response, not reconstructed
  • Every Link follows the confirmed URL pattern for that database
  • If a field is empty in the API response, it must stay blank ("-") in the table; never fill it with plausible text
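
Part of this checklist can be enforced in code. The sketch below assumes each row is a dict and that `api_ids` is the set of IDs collected from actual API responses; both names, and the `ID_A`/`ID_B` stand-ins, are hypothetical.

```python
ARTICLE_FIELDS = ("authors", "title", "journal", "year", "doi")


def verify_rows(rows, api_ids):
    """Drop rows whose ID was not returned by an API; blank any missing fields."""
    verified = []
    for row in rows:
        if row.get("id") not in api_ids:
            continue  # never present an ID that did not come from a response
        clean = {"id": row["id"]}
        for field in ARTICLE_FIELDS:
            clean[field] = row.get(field) or ""  # blank, never plausible text
        verified.append(clean)
    return verified


rows = [{"id": "ID_A", "title": "From PubMed"}, {"id": "ID_B"}]
print(verify_rows(rows, api_ids={"ID_A"}))
```

The check only guards against IDs that were never returned; it cannot confirm that article fields were copied faithfully, so the manual checklist still applies.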

Why This Matters

A researcher using wrong dataset IDs or fake article info could: waste weeks on non-existent data, cite non-existent papers, or compromise the validity of their research. The cost of hallucination here is far higher than in general conversation.

Security Notes

  • User keywords are private — do NOT log the raw search query string to stderr/stdout. Log only counts (e.g. "Searching 5 keywords...").
  • Token handling — CNGB token is passed via CLI arg only; never hardcode or log it.
  • No external exfiltration — results table contains only public dataset metadata; no user-provided content is stored or transmitted elsewhere.
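
A minimal sketch of the first two notes: log only the keyword count, and accept the token as a command-line argument. The `--cngb-token` flag name and the helper names are hypothetical; the source says only that the token is passed via a CLI arg.

```python
import argparse
import logging

log = logging.getLogger("biomed_dataset_finder")


def search_summary(keywords):
    """Return a log-safe message that reveals only the keyword count."""
    return f"Searching {len(keywords)} keywords..."


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical flag name; the token is never hardcoded or logged.
    parser.add_argument("--cngb-token", default=None)
    return parser.parse_args(argv)


keywords = ["colon cancer", "immunotherapy", "human", "dMMR", "scRNA-seq"]
log.info(search_summary(keywords))  # message: "Searching 5 keywords..."
```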

CLI Tool

python3 skills/biomed-dataset-finder/scripts/search_datasets.py \
  --disease "colon cancer" --treatment "immunotherapy" \
  --species human --subtype dMMR --type scRNA-seq --max-results 10

API Reference

See references/ncbi_api.md for NCBI E-utilities details. See references/ngdc_api.md for NGDC GSA API details. See references/cngb_api.md for CNGBdb API details.

Security Usage Recommendations
Install only if you want an agent to help with ClawHub/Convex development or staff workflows. Use extra care with moderation, migration, deploy, PR publishing, and autoreview helper commands because they can affect accounts, production data, GitHub comments, or local execution permissions; review the command shown before allowing writes.
Capability Assessment
Purpose & Capability
The skills cover Convex setup/auth/performance/migrations and ClawHub maintainer/moderation workflows; some capabilities are high impact, such as user bans, role changes, PR comments, deployments, and migrations, but these match the stated purposes.
Instruction Scope
Runtime instructions are explicit about when to use each workflow, require concrete targets for moderation, require reasons and confirmation before writes, and warn against bypassing server auth, role checks, or audit logs.
Install Mechanism
No hidden installer or persistence hook was found in the skill artifacts; executable content is limited to an autoreview helper script and documented repo-local commands.
Credentials
The skills may use local repo files, GitHub CLI, Convex CLI, npm/npx installs, auth provider configuration, and local proof artifacts; this is proportionate to development, moderation, and deployment tasks, though users should understand these tools can affect real services.
Persistence & Privilege
No stealth persistence was identified. Persistent effects are disclosed workflow outcomes, such as Convex deployments/migrations, GitHub PR comments, proof artifacts, or ClawHub moderation changes.
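How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install biomed-dataset-finder
  3. Once installed, call the Skill by name or trigger it with /biomed-dataset-finder
  4. Provide the inputs described in the Skill's parameter documentation to get structured output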
Version History
v1.0.0
Initial release: search NCBI GEO/SRA, NGDC-GSA, CNGB for biomedical datasets by disease/treatment/species/subtype/data type. Returns bold dataset ID + article info in structured table.
Metadata
Slug: biomed-dataset-finder
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Version count: 1
FAQ

What is Biomed Dataset Finder?

Search NCBI GEO/SRA, NGDC-GSA, and CNGB for biomedical datasets by disease, treatment, species, pathology subtype, and data type. Returns bold dataset ID, li... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 53 total downloads to date.

How do I install Biomed Dataset Finder?

Run the command "/install biomed-dataset-finder" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is required.

Is Biomed Dataset Finder free?

Yes. Biomed Dataset Finder is completely free and released under the MIT-0 license; it can be freely downloaded, installed, and used.

Which platforms does Biomed Dataset Finder support?

Biomed Dataset Finder is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Biomed Dataset Finder?

Developed and maintained by Shuhuan Cao (@gateswell); the current version is v1.0.0.
