# Discovery Engine: Paper Extraction Skill

## Why This Exists

A nanofiltration paper provides selective passage below a size threshold.
A drug delivery paper requires selective transport to a target.
Neither team knows the other exists: they publish in different journals, use different vocabulary, and will never cite each other.

This skill extracts structured provides/requires relationships from scientific papers, building a knowledge graph that surfaces these hidden cross-domain connections. Each paper you extract adds a node. The graph finds the bridges.

## When to Use

| Trigger | Action |
|---------|--------|
| `/discovery-extract` | Discover papers, extract, and save results |
| "Extract some papers" | Run the full pipeline (discover → extract → save) |
| "Submit my extractions" | Create a PR with your batch results |
| "Find papers from arXiv" | Discover from a specific source |

## Core Concept

Every paper is decomposed into:
- **Part A (Facts)**: entities, properties, relations (what the paper reports)
- **Part B (Cross-domain)**: core friction, mechanism, bridge tags, provides/requires interface, unsolved tensions (what connects it to other fields)

The `cross_domain` section is where discovery happens. The `provides` and `requires` fields use abstract functional language (not domain jargon) so a materials science paper can match a biology paper.
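
To make the matching idea concrete, here is a minimal Python sketch. It assumes, purely for illustration, that two extractions are linked when a `provides` entry in one paper and a `requires` entry in another share the same `operation` string; how the real knowledge graph matches entries is not described in this document. The `find_bridges` helper and the sample records are hypothetical.

```python
def find_bridges(papers):
    """Pair each provided operation with papers that require the same operation.

    `papers` maps a paper id to {"provides": [...], "requires": [...]}, where
    each entry is a dict with an "operation" key. Exact, case-insensitive
    string equality is an assumption made for illustration only.
    """
    # Index: normalized operation -> ids of papers that provide it
    providers = {}
    for paper_id, record in papers.items():
        for entry in record.get("provides", []):
            key = entry["operation"].strip().lower()
            providers.setdefault(key, []).append(paper_id)

    bridges = []
    for paper_id, record in papers.items():
        for entry in record.get("requires", []):
            key = entry["operation"].strip().lower()
            for provider_id in providers.get(key, []):
                if provider_id != paper_id:  # skip self-matches
                    bridges.append((provider_id, paper_id, key))
    return bridges


# Hypothetical records loosely echoing the nanofiltration / drug delivery example
papers = {
    "membrane-paper": {
        "provides": [{"operation": "selective passage below a size threshold"}],
        "requires": [],
    },
    "delivery-paper": {
        "provides": [],
        "requires": [{"operation": "Selective passage below a size threshold"}],
    },
}

for provider, consumer, operation in find_bridges(papers):
    print(f"{provider} -> {consumer}: {operation}")
```

The sketch only works when both papers describe the operation in the same words, which is why abstract functional phrasing matters more than domain vocabulary.
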

## You Are the Extractor

No external API keys or LLM calls are needed: you read the paper text and produce the structured JSON yourself. The bundled prompt (`references/prompt.txt`) is your extraction specification.

## How It Works

1. Run `python scripts/extract.py discover` to find new papers with abstracts
2. Read `references/prompt.txt`, the full extraction format specification
3. For each paper: read its abstract and produce the extraction JSON following the prompt
4. Save each result via `python scripts/extract.py save`
5. Optionally submit results as a PR via `gh`

## Step 1: Discover Papers

```bash
python scripts/extract.py discover --count 5
```

This outputs a JSON array of papers (id, source, title, abstract) to stdout.
Already-processed papers are automatically excluded.

To target a specific source:
```bash
python scripts/extract.py discover --source arxiv --count 5
python scripts/extract.py discover --source pmc --count 5
```
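
Because `discover` prints a JSON array to stdout, its output can be loaded directly. The sketch below parses such output and keeps only records that carry all four fields (id, source, title, abstract) with a non-blank abstract. The `load_papers` helper and the sample payload are hypothetical; the payload is made up and is not real `discover` output.

```python
import json


def load_papers(discover_stdout):
    """Parse the JSON array printed by `discover` and drop unusable records."""
    papers = json.loads(discover_stdout)
    if not isinstance(papers, list):
        raise ValueError("expected a JSON array of papers")
    required = ("id", "source", "title", "abstract")
    usable = []
    for paper in papers:
        # Keep only records with all four fields and a non-blank abstract
        if all(paper.get(field) for field in required) and paper["abstract"].strip():
            usable.append(paper)
    return usable


# Made-up payload in the shape described above (not real discover output)
sample = json.dumps([
    {"id": "arxiv:2401.00001", "source": "arxiv",
     "title": "Paper Title Here", "abstract": "We report ..."},
    {"id": "pmc:0000000", "source": "pmc", "title": "No abstract", "abstract": ""},
])

for paper in load_papers(sample):
    print(paper["id"], "|", paper["title"])
```

In practice the input string could come from `subprocess.run(..., capture_output=True, text=True).stdout` after invoking the discover command shown above.
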

## Step 2: Read the Extraction Prompt

Read `references/prompt.txt` to understand the output format. It specifies:
- **Part A (Facts)**: entities, properties, relations
- **Part B (Cross-domain)**: core_friction, mechanism, bridge_tags, provides/requires interface, unsolved_tensions

The prompt contains detailed rules, examples, and a self-check procedure.

## Step 3: Extract

For each paper from Step 1, produce a JSON object following the schema in
`references/prompt.txt`. The paper's abstract is your input text.

Write the JSON to a temporary file (e.g., `/tmp/result.json` or any local path).

**Key requirements:**
- Output ONLY valid JSON (no markdown wrapping, no commentary)
- The top-level key must be `paper_analysis` (not `analysis`)
- `unsolved_tensions` entries must be objects with `{tension, constraint_class, why_it_matters, source_quote}`
- `provides` entries must be objects with `{operation, description, performance, conditions}`
- `requires` entries must be objects with `{operation, description, reason}`
- `bridge_tags` must be abstract functional descriptors, not domain nouns
- The `cross_domain` section is where discovery happens, so invest effort here
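
The key requirements above can be pre-checked before saving. The following is a minimal sketch, not the validation performed by `scripts/extract.py` or `references/schema.json`. Since the exact nesting of `provides`, `requires`, and `unsolved_tensions` inside `paper_analysis` is defined in `references/prompt.txt` and not reproduced here, the hypothetical `check_extraction` helper searches for those keys at any depth instead of assuming a path.

```python
import json

# Required keys per entry type, taken from the key requirements above
REQUIRED_KEYS = {
    "unsolved_tensions": {"tension", "constraint_class", "why_it_matters", "source_quote"},
    "provides": {"operation", "description", "performance", "conditions"},
    "requires": {"operation", "description", "reason"},
}


def find_key(node, name):
    """Yield every value stored under `name` anywhere in nested dicts/lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == name:
                yield value
            yield from find_key(value, name)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, name)


def check_extraction(text):
    """Return a list of problems; an empty list means these checks passed."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict) or "paper_analysis" not in data:
        return ["top-level key must be 'paper_analysis'"]
    problems = []
    for name, required in REQUIRED_KEYS.items():
        for entries in find_key(data["paper_analysis"], name):
            if not isinstance(entries, list):
                problems.append(f"{name}: expected a list")
                continue
            for i, entry in enumerate(entries):
                if not isinstance(entry, dict):
                    problems.append(f"{name}[{i}]: expected an object")
                    continue
                missing = sorted(required - set(entry))
                if missing:
                    problems.append(f"{name}[{i}]: missing {missing}")
    return problems


print(check_extraction('{"analysis": {}}'))
```

The check is deliberately shallow: it does not judge whether `bridge_tags` are abstract enough, which still requires reading the extraction against the prompt.
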

## Step 4: Save Results

```bash
python scripts/extract.py save /tmp/result.json \
  --paper-id "arxiv:2401.00001" \
  --source arxiv \
  --title "Paper Title Here"
```

The save command normalizes format issues, validates, adds metadata, and saves
to `~/.discovery/data/batch/`. It will report any validation warnings.
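
When saving several papers in a row, the save invocation can be assembled from each paper's metadata. This is a sketch under assumptions: `build_save_command` and `write_result` are hypothetical helpers, the flags are the three shown in the command above, and the paper metadata and empty extraction body are placeholders.

```python
import json
import shlex
import tempfile
from pathlib import Path


def build_save_command(paper, result_path):
    """Return the argv list for `extract.py save` for one discovered paper."""
    return [
        "python", "scripts/extract.py", "save", str(result_path),
        "--paper-id", paper["id"],
        "--source", paper["source"],
        "--title", paper["title"],
    ]


def write_result(extraction, directory):
    """Write one extraction dict to <directory>/result.json and return the path."""
    path = Path(directory) / "result.json"
    path.write_text(json.dumps(extraction, indent=2), encoding="utf-8")
    return path


# Placeholder paper metadata and an empty extraction body, for illustration only
paper = {"id": "arxiv:2401.00001", "source": "arxiv", "title": "Paper Title Here"}
with tempfile.TemporaryDirectory() as tmp:
    result_path = write_result({"paper_analysis": {}}, tmp)
    command = build_save_command(paper, result_path)
    print(shlex.join(command))
    # To actually save, pass `command` to subprocess.run(command, check=True),
    # assuming the working directory contains scripts/extract.py.
```

Building an argument list instead of a shell string avoids quoting problems when a paper title contains quotes or other shell metacharacters.
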

## Step 5: Validate (optional)

```bash
python scripts/extract.py validate ~/.discovery/data/batch/
```

## Step 6: Submit Results (optional)

After extracting a batch, submit results as a PR:

```bash
# Fork (first time only)
gh repo fork pcdeni/discovery-engine --clone=false

# Clone your fork (not the upstream repository) so that `origin` is writable
GH_USER="$(gh api user --jq .login)"
gh repo clone "$GH_USER/discovery-engine" discovery-engine-submit
cd discovery-engine-submit

# Create branch and copy results
BRANCH="contrib/$GH_USER/$(date +%Y%m%d-%H%M%S)"
git checkout -b "$BRANCH"
cp ~/.discovery/data/batch/*.json submissions/
git add submissions/
git commit -m "Add extraction results"
git push -u origin "$BRANCH"

# Create PR against the upstream repository from your fork's branch
gh pr create --title "extraction: $(ls submissions/*.json | wc -l) papers" \
  --body "Extraction results from discovery-extract skill" \
  --repo pcdeni/discovery-engine \
  --head "$GH_USER:$BRANCH"
```

GitHub Actions CI validates submissions and auto-merges passing PRs.

## Bundled Files

| File | Purpose |
|------|---------|
| `scripts/extract.py` | Paper discovery, normalization, validation, saving (Python stdlib only) |
| `references/prompt.txt` | The extraction format specification (444 lines) |
| `references/schema.json` | JSON schema for validation |

## Installation

1. Make sure OpenClaw is installed (local or Docker)
2. Run the install command in chat: `/install discovery-extract`
3. After installation, invoke the skill by name or use `/discovery-extract`
4. Provide required inputs per the skill's parameter spec and get structured output

## FAQ

**What is Discovery Engine?**
Cross-domain scientific discovery through structured extraction of scientific publications. What one paper solves, another needs; this skill extracts provid... It is an AI Agent Skill for Claude Code / OpenClaw, with 324 downloads so far.

**How do I install Discovery Engine?**
Run `/install discovery-extract` in the OpenClaw or Claude Code chat to install it in one step; no extra setup is required.

**Is Discovery Engine free?**
Yes. Discovery Engine is free to download, install, and use, and is licensed under MIT-0.

**Which platforms does Discovery Engine support?**
Discovery Engine runs anywhere OpenClaw or Claude Code is available.

**Who created Discovery Engine?**
It is built and maintained by pcdeni (@pcdeni); the current version is v1.0.7.