Description

Cross-domain scientific discovery through structured extraction of scientific publications. What one paper solves, another needs — this skill extracts provid...

README (SKILL.md)


# Discovery Engine: Paper Extraction Skill

Name: Discovery Engine
Author: pcdeni

## Why This Exists

A nanofiltration paper provides selective passage below a size threshold.
A drug delivery paper requires selective transport to a target.
Neither team knows the other exists: they publish in different journals, use different vocabulary, and will never cite each other.

This skill extracts structured provides/requires relationships from scientific papers, building a knowledge graph that surfaces these hidden cross-domain connections. Each paper you extract adds a node. The graph finds the bridges.

## When to Use

| Trigger | Action |
|---------|--------|
| `/discovery-extract` | Discover papers, extract, and save results |
| "Extract some papers" | Run the full pipeline (discover → extract → save) |
| "Submit my extractions" | Create a PR with your batch results |
| "Find papers from arXiv" | Discover from a specific source |

## Core Concept

Every paper is decomposed into:

- **Part A (Facts)**: entities, properties, relations (what the paper reports)
- **Part B (Cross-domain)**: core friction, mechanism, bridge tags, provides/requires interface, unsolved tensions (what connects it to other fields)

The `cross_domain` section is where discovery happens. The `provides` and `requires` fields use abstract functional language (not domain jargon) so a materials science paper can match a biology paper.
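To make the provides/requires idea concrete, here is a minimal sketch of how two extractions could be paired. It is an illustration only: the `Paper` class, the `find_bridges` helper, and the shared label `"size-selective passage"` are hypothetical, and exact string matching is an assumption, not a description of how the Discovery Engine graph actually matches papers.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Hypothetical, trimmed-down view of one extraction (not the real schema)."""
    paper_id: str
    provides: set[str] = field(default_factory=set)  # abstract operations offered
    requires: set[str] = field(default_factory=set)  # abstract operations needed


def find_bridges(papers: list[Paper]) -> list[tuple[str, str, str]]:
    """Return (provider_id, consumer_id, operation) for every exact match."""
    bridges = []
    for provider in papers:
        for consumer in papers:
            if provider.paper_id == consumer.paper_id:
                continue  # a paper does not bridge to itself
            for op in sorted(provider.provides & consumer.requires):
                bridges.append((provider.paper_id, consumer.paper_id, op))
    return bridges


# Both papers describe the same function in abstract terms, so they match
# even though their fields share no domain vocabulary.
papers = [
    Paper("nanofiltration", provides={"size-selective passage"}),
    Paper("drug-delivery", requires={"size-selective passage"}),
]
print(find_bridges(papers))
```

The sketch also shows why abstract wording matters: had the first paper recorded "membrane rejection" and the second "targeted uptake", no match would be found.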

## You Are the Extractor

No external API keys or LLM calls are needed: you read the paper text and produce the structured JSON yourself. The bundled prompt (`references/prompt.txt`) is your extraction specification.

## How It Works

1. Run `python scripts/extract.py discover` to find new papers with abstracts
2. Read `references/prompt.txt`, the full extraction format specification
3. For each paper: read its abstract and produce the extraction JSON following the prompt
4. Save each result via `python scripts/extract.py save`
5. Optionally submit results as a PR via `gh`

## Step 1: Discover Papers

```bash
python scripts/extract.py discover --count 5
```

This outputs a JSON array of papers (id, source, title, abstract) to stdout.
Already-processed papers are automatically excluded.

To target a specific source:
```bash
python scripts/extract.py discover --source arxiv --count 5
python scripts/extract.py discover --source pmc --count 5
```
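The JSON array that `discover` prints can also be consumed programmatically. The following is a minimal sketch, assuming each record carries the four fields listed above (`id`, `source`, `title`, `abstract`). The sample records, the `pmc:` ID format, and the `usable_papers` helper are hypothetical; real output may differ.

```python
import json

# Hypothetical sample of what `discover` might print; real output will differ.
sample_stdout = """
[
  {"id": "arxiv:2401.00001", "source": "arxiv",
   "title": "Paper Title Here", "abstract": "Abstract text..."},
  {"id": "pmc:PMC0000000", "source": "pmc",
   "title": "Another Title", "abstract": ""}
]
"""

REQUIRED_FIELDS = ("id", "source", "title", "abstract")


def usable_papers(stdout_text: str) -> list[dict]:
    """Parse the JSON array and keep records where all four fields are non-empty."""
    papers = json.loads(stdout_text)
    return [p for p in papers if all(p.get(k) for k in REQUIRED_FIELDS)]


for paper in usable_papers(sample_stdout):
    print(paper["id"], "-", paper["title"])
```

Skipping records with an empty abstract is a design choice of this sketch: the abstract is the input text for extraction, so a record without one leaves nothing to extract.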

## Step 2: Read the Extraction Prompt

Read `references/prompt.txt` to understand the output format. It specifies:
- **Part A (Facts)**: entities, properties, relations
- **Part B (Cross-domain)**: core_friction, mechanism, bridge_tags, provides/requires interface, unsolved_tensions

The prompt contains detailed rules, examples, and a self-check procedure.

## Step 3: Extract

For each paper from Step 1, produce a JSON object following the schema in
`references/prompt.txt`. The paper's abstract is your input text.

Write the JSON to a temporary file (e.g., `/tmp/result.json` or any local path).

**Key requirements:**
- Output ONLY valid JSON (no markdown wrapping, no commentary)
- The top-level key must be `paper_analysis` (not `analysis`)
- `unsolved_tensions` entries must be objects with `{tension, constraint_class, why_it_matters, source_quote}`
- `provides` entries must be objects with `{operation, description, performance, conditions}`
- `requires` entries must be objects with `{operation, description, reason}`
- `bridge_tags` must be abstract functional descriptors, not domain nouns
- The `cross_domain` section is where discovery happens, so invest effort here
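The key requirements above can be checked mechanically before saving. The sketch below is a hypothetical pre-check, not part of `scripts/extract.py`. The field names come from the list above, but the nesting (`provides`, `requires`, and `unsolved_tensions` placed under `paper_analysis.cross_domain`) is an assumption; `references/schema.json` remains the authoritative definition.

```python
import json

# Field names are taken from the requirements above; how they nest is an assumption.
ENTRY_KEYS = {
    "unsolved_tensions": {"tension", "constraint_class", "why_it_matters", "source_quote"},
    "provides": {"operation", "description", "performance", "conditions"},
    "requires": {"operation", "description", "reason"},
}


def precheck(raw_text: str) -> list[str]:
    """Return a list of problems; an empty list means these basic checks passed."""
    try:
        doc = json.loads(raw_text)  # fails on markdown wrapping or commentary
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(doc, dict) or "paper_analysis" not in doc:
        return ["top-level key must be 'paper_analysis'"]
    analysis = doc["paper_analysis"]
    cross = analysis.get("cross_domain") if isinstance(analysis, dict) else None
    if not isinstance(cross, dict):
        return ["'cross_domain' section is missing or not an object"]
    problems = []
    for field_name, expected in ENTRY_KEYS.items():
        entries = cross.get(field_name, [])
        if not isinstance(entries, list):
            problems.append(f"{field_name} must be a list")
            continue
        for i, entry in enumerate(entries):
            if not isinstance(entry, dict):
                problems.append(f"{field_name}[{i}] must be an object")
            elif missing := expected - set(entry):
                problems.append(f"{field_name}[{i}] missing {sorted(missing)}")
    return problems


good = json.dumps({"paper_analysis": {"cross_domain": {
    "requires": [{"operation": "x", "description": "y", "reason": "z"}]}}})
print(precheck(good))
print(precheck('{"analysis": {}}'))
```

This only catches structural slips. Whether `bridge_tags` are abstract enough is a judgment call that the prompt's self-check procedure covers.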

## Step 4: Save Results

```bash
python scripts/extract.py save /tmp/result.json \
  --paper-id "arxiv:2401.00001" \
  --source arxiv \
  --title "Paper Title Here"
```

The save command normalizes format issues, validates, adds metadata, and saves
to `~/.discovery/data/batch/`. It will report any validation warnings.
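When saving several papers in a row, the command line can be assembled from each discovered record. This is a small sketch under one assumption: that the `id`, `source`, and `title` fields printed by `discover` map directly onto `--paper-id`, `--source`, and `--title`. The `build_save_command` helper is hypothetical.

```python
import shlex
import sys


def build_save_command(result_path: str, paper: dict) -> list[str]:
    """Assemble the argv for `extract.py save` from one discovered paper record."""
    return [
        sys.executable, "scripts/extract.py", "save", result_path,
        "--paper-id", paper["id"],
        "--source", paper["source"],
        "--title", paper["title"],
    ]


paper = {"id": "arxiv:2401.00001", "source": "arxiv", "title": "Paper Title Here"}
cmd = build_save_command("/tmp/result.json", paper)
print(shlex.join(cmd[1:]))  # shell-quoted form, for inspection only

# To actually run it from the skill's root directory:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems with titles that contain quotes or other special characters.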

## Step 5: Validate (optional)

```bash
python scripts/extract.py validate ~/.discovery/data/batch/
```
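Before submitting, it can help to confirm how many files are in the batch and that each still parses as JSON. The sketch below is a hypothetical helper, not part of `scripts/extract.py`, and it does not replace the `validate` command, which checks against the schema. It runs on a throwaway directory here; in practice it would be pointed at `~/.discovery/data/batch/`.

```python
import json
import tempfile
from pathlib import Path


def summarize_batch(batch_dir: Path) -> tuple[int, list[str]]:
    """Return (count of parseable .json files, names of files that fail to parse)."""
    ok, broken = 0, []
    for path in sorted(batch_dir.glob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
            ok += 1
        except (json.JSONDecodeError, UnicodeDecodeError):
            broken.append(path.name)
    return ok, broken


# Demonstration on a temporary directory with one good and one broken file.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    (demo / "a.json").write_text('{"paper_analysis": {}}', encoding="utf-8")
    (demo / "b.json").write_text("not json", encoding="utf-8")
    demo_result = summarize_batch(demo)
print(demo_result)
```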

## Step 6: Submit Results (optional)

After extracting a batch, submit results as a PR:

```bash
# Fork (first time only)
gh repo fork pcdeni/discovery-engine --clone=false

# Clone your fork (gh adds the original repository as the "upstream" remote)
LOGIN="$(gh api user --jq .login)"
gh repo clone "$LOGIN/discovery-engine" discovery-engine-submit
cd discovery-engine-submit

# Create branch and copy results
BRANCH="contrib/$LOGIN/$(date +%Y%m%d-%H%M%S)"
git checkout -b "$BRANCH"
cp ~/.discovery/data/batch/*.json submissions/
git add submissions/
git commit -m "Add extraction results"
git push -u origin "$BRANCH"

# Create PR from your fork's branch against the main repository
gh pr create --title "extraction: $(ls submissions/*.json | wc -l) papers" \
  --body "Extraction results from discovery-extract skill" \
  --repo pcdeni/discovery-engine \
  --head "$LOGIN:$BRANCH"
```

GitHub Actions CI validates submissions and auto-merges passing PRs.

## Bundled Files

| File | Purpose |
|------|---------|
| `scripts/extract.py` | Paper discovery, normalization, validation, saving (Python stdlib only) |
| `references/prompt.txt` | The extraction format specification (444 lines) |
| `references/schema.json` | JSON schema for validation |

Capability Analysis

Type: OpenClaw Skill
Name: discovery-extract
Version: 1.0.7

The discovery-extract skill bundle is a legitimate tool designed for scientific data extraction and cross-domain knowledge graph construction. The Python script `scripts/extract.py` fetches paper metadata and abstracts from reputable public APIs (arXiv, PubMed Central, OpenAlex, and OSTI) and manages local storage in `~/.discovery/`. The bundle includes a highly detailed extraction prompt (`references/prompt.txt`) that specifically instructs the agent to remain in an extraction-only mode, serving as a safeguard against orchestration-based prompt injection. While the skill uses the GitHub CLI (`gh`) to submit results, this behavior is transparently documented as a contribution mechanism for the open-source project hosted at `github.com/pcdeni/discovery-engine`.

Version History

v1.0.7

Version 1.1.0: Major documentation revamp and metadata update

- Revamped SKILL.md with a clear introduction, use-cases table, and improved explanations for extraction workflow and core concepts.
- Added version and homepage metadata to the skill config.
- Clarified the role of "provides/requires" relationships and the cross-domain discovery logic.
- Updated metadata with an emoji and better structure; removed user-invocable and legacy keys.
- No core logic/code changes; documentation only.

v1.0.6

- Updated description to emphasize cross-domain discovery and extraction of provides/requires relationships.
- Clarified the purpose: surfacing hidden connections between fields via structured scientific extraction.
- No functional or file changes; documentation only.

v1.0.5

- No user-visible changes in this release.
- No file or documentation updates detected.

v1.0.4

- Major update: The skill no longer requires an external LLM or API keys. You now read the paper abstract and produce the extraction JSON directly.
- Simplified workflow: Discovery, extraction, and submission steps are streamlined for manual operation.
- Updated instructions: Step-by-step usage now highlights reading the prompt, producing valid JSON, running save/validate, and submitting results.
- Removal of provider/API-key options; focus shifted to user-driven extraction and manual JSON generation.
- Input validation and schema checking are handled via the provided script and prompt guidance.

v1.0.3

Version 1.0.3: No code or documentation changes detected.

- No file changes in this release.
- Functionality and documentation remain unchanged from the previous version.

v1.0.2

- Major update: Everything now runs through a fully self-contained Python script with no external dependencies.
- Added `scripts/extract.py` to handle paper discovery, extraction, validation, and saving results.
- Bundled extraction prompt (`references/prompt.txt`) and schema (`references/schema.json`) for local/offline use.
- Updated instructions for running and submitting results: no pip install needed, just run the script.
- Simplified prerequisites and usage flow: Python 3.10+, GitHub CLI, and an LLM provider.

v1.0.1

- Updated environment requirements: GitHub CLI (`gh`) is now required; API keys are optional and support multiple provider types.
- Improved setup instructions, clarifying cloud vs local LLM configuration and GitHub CLI authentication.
- Submission process now explicitly uses the GitHub CLI; instructions for PR submission and authentication are clearer.
- Expanded documentation on supported cloud and local LLM providers.
- Minor edits for clarity, conciseness, and up-to-date usage examples.

v1.0.0

Initial release of discovery-extract skill.

- Discovers and processes scientific papers from arXiv, PubMed Central, OpenAlex, and OSTI.
- Extracts structured scientific knowledge (entities, relations, bridge tags, cross-domain connections) using an LLM.
- Validates output and auto-submits results as pull requests to the Discovery Engine dataset.
- Supports multiple LLM providers, including Anthropic, OpenAI, OpenRouter, Google, and local models.
- Fully decentralized operation; avoids duplicate effort via GitHub tracking and CI validation.

Metadata

Slug discovery-extract

Version 1.0.7

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 8

Frequently Asked Questions

What is Discovery Engine?

Cross-domain scientific discovery through structured extraction of scientific publications. What one paper solves, another needs — this skill extracts provid... It is an AI Agent Skill for Claude Code / OpenClaw, with 324 downloads so far.

How do I install Discovery Engine?

Run "/install discovery-extract" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Discovery Engine free?

Yes, Discovery Engine is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Discovery Engine support?

Discovery Engine is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Discovery Engine?

It is built and maintained by pcdeni (@pcdeni); the current version is v1.0.7.
