# Discovery Engine: Paper Extraction Skill

## Why This Exists

A nanofiltration paper provides selective passage below a size threshold.
A drug delivery paper requires selective transport to a target.
Neither team knows the other exists: they publish in different journals, use different vocabulary, and will never cite each other.

This skill extracts structured provides/requires relationships from scientific papers, building a knowledge graph that surfaces these hidden cross-domain connections. Each paper you extract adds a node. The graph finds the bridges.

## When to Use

| Trigger | Action |
|---------|--------|
| `/discovery-extract` | Discover papers, extract, and save results |
| "Extract some papers" | Run the full pipeline (discover → extract → save) |
| "Submit my extractions" | Create a PR with your batch results |
| "Find papers from arXiv" | Discover from a specific source |

## Core Concept

Every paper is decomposed into:
- **Part A (Facts)**: entities, properties, relations (what the paper reports)
- **Part B (Cross-domain)**: core friction, mechanism, bridge tags, provides/requires interface, unsolved tensions (what connects it to other fields)

The `cross_domain` section is where discovery happens. The `provides` and `requires` fields use abstract functional language (not domain jargon) so a materials science paper can match a biology paper.

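As a rough illustration of why abstract functional language matters, the sketch below pairs papers whose `provides` operation equals another paper's `requires` operation. The paper IDs, the `selective_transport` operation string, the `find_bridges` helper, and the exact-match rule are all hypothetical; how the real graph matches entries is not described here and is likely more sophisticated.

```python
# Hypothetical sketch: pair papers whose `provides` operation matches
# another paper's `requires` operation. Not the engine's actual matcher.

def find_bridges(papers):
    """Return (provider_id, consumer_id, operation) triples."""
    bridges = []
    for provider in papers:
        offered = {entry["operation"] for entry in provider["provides"]}
        for consumer in papers:
            if consumer["id"] == provider["id"]:
                continue  # a paper does not bridge to itself
            for need in consumer["requires"]:
                if need["operation"] in offered:
                    bridges.append((provider["id"], consumer["id"], need["operation"]))
    return bridges


# Illustrative data only; the IDs and operation strings are made up.
papers = [
    {
        "id": "example:nanofiltration",
        "provides": [{"operation": "selective_transport"}],
        "requires": [],
    },
    {
        "id": "example:drug-delivery",
        "provides": [],
        "requires": [{"operation": "selective_transport"}],
    },
]

for provider_id, consumer_id, operation in find_bridges(papers):
    print(f"{provider_id} -> {consumer_id}: {operation}")
```

Had the two papers used domain jargon (say, "membrane rejection" versus "tumor targeting"), this simple comparison would find nothing, which is the reason for writing both fields in shared functional terms.
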
## You Are the Extractor

No external API keys or LLM calls are needed: you read the paper text and produce the structured JSON yourself. The bundled prompt (`references/prompt.txt`) is your extraction specification.

## How It Works

- Run `python scripts/extract.py discover` to find new papers with abstracts
- Read `references/prompt.txt`, the full extraction format specification
- For each paper: read its abstract and produce the extraction JSON following the prompt
- Save each result via `python scripts/extract.py save`
- Optionally submit results as a PR via `gh`

## Step 1: Discover Papers

```bash
python scripts/extract.py discover --count 5
```

This outputs a JSON array of papers (id, source, title, abstract) to stdout.
Already-processed papers are automatically excluded.

To target a specific source:
```bash
python scripts/extract.py discover --source arxiv --count 5
python scripts/extract.py discover --source pmc --count 5
```

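If you want to handle the discovered papers programmatically, the output can be parsed with the standard library. This is a minimal sketch, assuming each array element is an object keyed by `id`, `source`, `title`, and `abstract` as listed above; the sample payload and the `summarize_papers` helper are made up for illustration.

```python
import json


def summarize_papers(discover_output):
    """Parse the JSON array printed by `discover` and return (id, title) pairs."""
    papers = json.loads(discover_output)
    return [(paper["id"], paper["title"]) for paper in papers]


# Hypothetical sample of what `discover` might print; not real output.
sample_output = """
[
  {"id": "arxiv:2401.00001", "source": "arxiv",
   "title": "Paper Title Here", "abstract": "Abstract text..."}
]
"""

for paper_id, title in summarize_papers(sample_output):
    print(paper_id, "|", title)
```

In practice you might redirect stdout to a file (`python scripts/extract.py discover --count 5 > papers.json`) and pass the file contents to the same function.
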
## Step 2: Read the Extraction Prompt

Read `references/prompt.txt` to understand the output format. It specifies:
- **Part A (Facts)**: entities, properties, relations
- **Part B (Cross-domain)**: core_friction, mechanism, bridge_tags, provides/requires interface, unsolved_tensions

The prompt contains detailed rules, examples, and a self-check procedure.

## Step 3: Extract

For each paper from Step 1, produce a JSON object following the schema in
`references/prompt.txt`. The paper's abstract is your input text.

Write the JSON to a temporary file (e.g., `/tmp/result.json` or any local path).

**Key requirements:**
- Output ONLY valid JSON (no markdown wrapping, no commentary)
- The top-level key must be `paper_analysis` (not `analysis`)
- `unsolved_tensions` entries must be objects with `{tension, constraint_class, why_it_matters, source_quote}`
- `provides` entries must be objects with `{operation, description, performance, conditions}`
- `requires` entries must be objects with `{operation, description, reason}`
- `bridge_tags` must be abstract functional descriptors, not domain nouns
- The `cross_domain` section is where discovery happens, so invest effort here

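Before saving, a quick local check of the key requirements above can catch the most common mistakes. This is only a sketch: it assumes `provides`, `requires`, and `unsolved_tensions` sit under `paper_analysis` → `cross_domain`, which is a guess about the nesting. `references/prompt.txt` and `references/schema.json` remain the authoritative definitions, and `check_extraction` is a hypothetical helper, not part of `scripts/extract.py`.

```python
# Required keys per entry type, copied from the key requirements listed above.
REQUIRED_KEYS = {
    "provides": {"operation", "description", "performance", "conditions"},
    "requires": {"operation", "description", "reason"},
    "unsolved_tensions": {"tension", "constraint_class", "why_it_matters", "source_quote"},
}


def check_extraction(doc):
    """Return a list of problems found in a parsed extraction (empty if none)."""
    if "paper_analysis" not in doc:
        return ["top-level key must be 'paper_analysis'"]
    # ASSUMPTION: the three lists live under paper_analysis -> cross_domain.
    cross_domain = doc["paper_analysis"].get("cross_domain", {})
    problems = []
    for field, required in REQUIRED_KEYS.items():
        for index, entry in enumerate(cross_domain.get(field, [])):
            if not isinstance(entry, dict):
                problems.append(f"{field}[{index}] must be an object")
                continue
            missing = sorted(required - set(entry))
            if missing:
                problems.append(f"{field}[{index}] is missing {missing}")
    return problems


# Hypothetical example: the wrong top-level key is reported immediately.
print(check_extraction({"analysis": {}}))
```
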
## Step 4: Save Results

```bash
python scripts/extract.py save /tmp/result.json \
  --paper-id "arxiv:2401.00001" \
  --source arxiv \
  --title "Paper Title Here"
```

The save command normalizes format issues, validates, adds metadata, and saves
to `~/.discovery/data/batch/`. It will report any validation warnings.

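When saving several papers in one session, the command can be assembled from each paper's metadata. The sketch below only builds the argument list using the flags shown above; `build_save_command` is a hypothetical helper, and actually running the command (for example with `subprocess.run(cmd, check=True)`) assumes your working directory is the skill's root.

```python
import shlex


def build_save_command(result_path, paper):
    """Build the argv list for `extract.py save` from a discovered paper record."""
    return [
        "python", "scripts/extract.py", "save", result_path,
        "--paper-id", paper["id"],
        "--source", paper["source"],
        "--title", paper["title"],
    ]


# Illustrative record with the same fields that `discover` outputs.
paper = {"id": "arxiv:2401.00001", "source": "arxiv", "title": "Paper Title Here"}
cmd = build_save_command("/tmp/result.json", paper)

# Print a shell-quoted version for inspection before running anything.
print(shlex.join(cmd))
```

Passing the arguments as a list rather than a single string avoids quoting problems when a title contains spaces or quotes.
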
## Step 5: Validate (optional)

```bash
python scripts/extract.py validate ~/.discovery/data/batch/
```

## Step 6: Submit Results (optional)

After extracting a batch, submit results as a PR:

```bash
# Fork (first time only)
gh repo fork pcdeni/discovery-engine --clone=false

# Clone your fork, not the upstream repo, so that "origin" is writable.
# Assumes the fork kept the default name "discovery-engine".
gh repo clone "$(gh api user --jq .login)/discovery-engine" discovery-engine-submit
cd discovery-engine-submit

# Create branch and copy results
BRANCH="contrib/$(gh api user --jq .login)/$(date +%Y%m%d-%H%M%S)"
git checkout -b "$BRANCH"
cp ~/.discovery/data/batch/*.json submissions/
git add submissions/
git commit -m "Add extraction results"
git push -u origin "$BRANCH"

# Create PR against the upstream repository
gh pr create --title "extraction: $(ls submissions/*.json | wc -l) papers" \
  --body "Extraction results from discovery-extract skill" \
  --repo pcdeni/discovery-engine
```

GitHub Actions CI validates submissions and auto-merges passing PRs.

## Bundled Files

| File | Purpose |
|------|---------|
| `scripts/extract.py` | Paper discovery, normalization, validation, saving (Python stdlib only) |
| `references/prompt.txt` | The extraction format specification (444 lines) |
| `references/schema.json` | JSON schema for validation |
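
- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: `/install discovery-extract`
- Once installation completes, call the Skill by name or trigger it with `/discovery-extract`
- Provide the inputs described in the Skill's parameter documentation to get structured output

**What is Discovery Engine?**
Cross-domain scientific discovery through structured extraction of scientific publications. What one paper solves, another needs: this skill extracts provid... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 324 downloads to date.

**How do I install Discovery Engine?**
Run the command `/install discovery-extract` in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

**Is Discovery Engine free?**
Yes. Discovery Engine is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.

**Which platforms does Discovery Engine support?**
Discovery Engine is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

**Who develops Discovery Engine?**
It is developed and maintained by pcdeni (@pcdeni). The current version is v1.0.7.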