
Discovery Engine

Author: pcdeni · GitHub · v1.0.7 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 324 · Favorites: 0 · Current installs: 0 · Versions: 8
Install in OpenClaw
/install discovery-extract
Description
Cross-domain scientific discovery through structured extraction of scientific publications. What one paper solves, another needs — this skill extracts provid...
Usage instructions (SKILL.md)

# Discovery Engine: Paper Extraction Skill

## Why This Exists

A nanofiltration paper provides selective passage below a size threshold.
A drug delivery paper requires selective transport to a target.
Neither team knows the other exists: they publish in different journals, use different vocabulary, and will never cite each other.

This skill extracts structured provides/requires relationships from scientific papers, building a knowledge graph that surfaces these hidden cross-domain connections. Each paper you extract adds a node. The graph finds the bridges.

## When to Use

| Trigger | Action |
|---------|--------|
| `/discovery-extract` | Discover papers, extract, and save results |
| "Extract some papers" | Run the full pipeline (discover → extract → save) |
| "Submit my extractions" | Create a PR with your batch results |
| "Find papers from arXiv" | Discover from a specific source |

## Core Concept

Every paper is decomposed into:

- **Part A (Facts)**: entities, properties, relations (what the paper reports)
- **Part B (Cross-domain)**: core friction, mechanism, bridge tags, provides/requires interface, unsolved tensions (what connects it to other fields)

The `cross_domain` section is where discovery happens. The `provides` and `requires` fields use abstract functional language (not domain jargon) so a materials science paper can match a biology paper.
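
The matching idea behind provides/requires can be sketched in a few lines. This is only an illustration under loud assumptions: the `Paper` class, the `find_bridges` function, and the operation label `"size-selective transport"` are hypothetical, and exact string equality stands in for whatever matching the real knowledge graph performs, which this document does not describe.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Hypothetical, simplified view of one extracted paper."""
    paper_id: str
    provides: set[str] = field(default_factory=set)  # abstract operations offered
    requires: set[str] = field(default_factory=set)  # abstract operations needed


def find_bridges(papers: list[Paper]) -> list[tuple[str, str, str]]:
    """Return (provider_id, consumer_id, operation) for every shared operation."""
    bridges = []
    for provider in papers:
        for consumer in papers:
            if provider.paper_id == consumer.paper_id:
                continue  # a paper does not bridge to itself
            for op in sorted(provider.provides & consumer.requires):
                bridges.append((provider.paper_id, consumer.paper_id, op))
    return bridges


# Illustrative data only: the shared label is an assumption, not real output.
papers = [
    Paper("nanofiltration-paper", provides={"size-selective transport"}),
    Paper("drug-delivery-paper", requires={"size-selective transport"}),
]
print(find_bridges(papers))
# [('nanofiltration-paper', 'drug-delivery-paper', 'size-selective transport')]
```

The sketch shows why abstract wording matters: if each paper kept its own domain vocabulary, the two sets would share no element and no bridge would be found.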

## You Are the Extractor

No external API keys or LLM calls are needed: you read the paper text and produce the structured JSON yourself. The bundled prompt (`references/prompt.txt`) is your extraction specification.

## How It Works

1. Run `python scripts/extract.py discover` to find new papers with abstracts
2. Read `references/prompt.txt`, the full extraction format specification
3. For each paper: read its abstract and produce the extraction JSON following the prompt
4. Save each result via `python scripts/extract.py save`
5. Optionally submit results as a PR via `gh`

## Step 1: Discover Papers

```bash
python scripts/extract.py discover --count 5
```

This outputs a JSON array of papers (`id`, `source`, `title`, `abstract`) to stdout.
Already-processed papers are automatically excluded.
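
If you want to handle that output programmatically, a minimal sketch follows. It assumes stdout contains nothing but the JSON array and that the field names are exactly the four listed above; `parse_discovered` and the sample data are hypothetical and not part of the bundled script.

```python
import json

REQUIRED_FIELDS = ("id", "source", "title", "abstract")


def parse_discovered(stdout_text: str) -> list[dict]:
    """Parse the JSON array printed by `discover` and keep complete entries."""
    papers = json.loads(stdout_text)
    if not isinstance(papers, list):
        raise ValueError("expected a JSON array of papers")
    return [
        p for p in papers
        if isinstance(p, dict) and all(key in p for key in REQUIRED_FIELDS)
    ]


# Hypothetical sample standing in for real `discover` output.
sample = json.dumps([
    {"id": "arxiv:2401.00001", "source": "arxiv",
     "title": "Paper Title Here", "abstract": "..."},
    {"id": "incomplete-entry"},  # dropped: missing source, title, abstract
])
for paper in parse_discovered(sample):
    print(paper["id"], "-", paper["title"])
# arxiv:2401.00001 - Paper Title Here
```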

To target a specific source:

```bash
python scripts/extract.py discover --source arxiv --count 5
python scripts/extract.py discover --source pmc --count 5
```

## Step 2: Read the Extraction Prompt

Read `references/prompt.txt` to understand the output format. It specifies:
- **Part A (Facts)**: entities, properties, relations
- **Part B (Cross-domain)**: core_friction, mechanism, bridge_tags, provides/requires interface, unsolved_tensions

The prompt contains detailed rules, examples, and a self-check procedure.

## Step 3: Extract

For each paper from Step 1, produce a JSON object following the schema in
`references/prompt.txt`. The paper's abstract is your input text.

Write the JSON to a temporary file (e.g., `/tmp/result.json` or any local path).

**Key requirements:**
- Output ONLY valid JSON (no markdown wrapping, no commentary)
- The top-level key must be `paper_analysis` (not `analysis`)
- `unsolved_tensions` entries must be objects with `{tension, constraint_class, why_it_matters, source_quote}`
- `provides` entries must be objects with `{operation, description, performance, conditions}`
- `requires` entries must be objects with `{operation, description, reason}`
- `bridge_tags` must be abstract functional descriptors, not domain nouns
- The `cross_domain` section is where discovery happens; invest effort here
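
The structural requirements above lend themselves to a quick self-check before saving. The sketch below is not the bundled validator. It assumes the three lists sit under `paper_analysis.cross_domain`, a nesting this document implies but does not spell out (`references/prompt.txt` and `references/schema.json` are authoritative), and `check_extraction` is a hypothetical helper.

```python
PROVIDES_KEYS = {"operation", "description", "performance", "conditions"}
REQUIRES_KEYS = {"operation", "description", "reason"}
TENSION_KEYS = {"tension", "constraint_class", "why_it_matters", "source_quote"}


def check_extraction(doc: dict) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    if "paper_analysis" not in doc:
        return ["top-level key must be 'paper_analysis'"]
    # ASSUMPTION: the three lists live under paper_analysis.cross_domain.
    cross = doc["paper_analysis"].get("cross_domain", {})
    problems = []
    for name, expected in (("provides", PROVIDES_KEYS),
                           ("requires", REQUIRES_KEYS),
                           ("unsolved_tensions", TENSION_KEYS)):
        for i, entry in enumerate(cross.get(name, [])):
            if not isinstance(entry, dict):
                problems.append(f"{name}[{i}] must be an object")
                continue
            missing = expected - set(entry)
            if missing:
                problems.append(f"{name}[{i}] missing {sorted(missing)}")
    return problems


print(check_extraction({"analysis": {}}))
# ["top-level key must be 'paper_analysis'"]
```

It only checks key presence. Whether `bridge_tags` are abstract enough remains a judgment call for the extractor.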

## Step 4: Save Results

```bash
python scripts/extract.py save /tmp/result.json \
  --paper-id "arxiv:2401.00001" \
  --source arxiv \
  --title "Paper Title Here"
```

The save command normalizes format issues, validates, adds metadata, and saves
to `~/.discovery/data/batch/`. It will report any validation warnings.
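
When saving many papers in a row, it can help to write each result file and assemble the save command from Python. The sketch below uses only the subcommand and flags shown above; `write_result` and `build_save_command` are hypothetical helper names, and the command is built as an argument list so that titles containing quotes or spaces need no shell escaping.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def write_result(extraction: dict, directory: Optional[str] = None) -> Path:
    """Write one extraction to a temporary .json file and return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", dir=directory,
                                     delete=False, encoding="utf-8") as handle:
        json.dump(extraction, handle, ensure_ascii=False, indent=2)
    return Path(handle.name)


def build_save_command(result_path: Path, paper_id: str,
                       source: str, title: str) -> list[str]:
    """Build the argument list for `extract.py save` (no shell involved)."""
    return ["python", "scripts/extract.py", "save", str(result_path),
            "--paper-id", paper_id, "--source", source, "--title", title]


path = write_result({"paper_analysis": {}})  # placeholder content only
cmd = build_save_command(path, "arxiv:2401.00001", "arxiv", "Paper Title Here")
print(cmd)
# To run it from the skill's root directory:
#   import subprocess; subprocess.run(cmd, check=True)
```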

## Step 5: Validate (optional)

```bash
python scripts/extract.py validate ~/.discovery/data/batch/
```
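
As a quick sanity pass before running the validator, you could confirm that every file in the batch directory at least parses as JSON. This is a rough sketch and not a substitute for the `validate` command: `unparseable_files` is a hypothetical helper that checks syntax only, not the schema.

```python
import json
from pathlib import Path


def unparseable_files(batch_dir: Path) -> list[str]:
    """Return names of *.json files in batch_dir that do not parse as JSON."""
    bad = []
    for path in sorted(batch_dir.glob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            bad.append(path.name)
    return bad


batch = Path.home() / ".discovery" / "data" / "batch"
if batch.is_dir():
    print(unparseable_files(batch) or "all batch files parse as JSON")
```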

## Step 6: Submit Results (optional)

After extracting a batch, submit results as a PR:

```bash
# Fork (first time only)
gh repo fork pcdeni/discovery-engine --clone=false

# Clone your fork (not the upstream repo) so that `origin` is writable
gh repo clone "$(gh api user --jq .login)/discovery-engine" discovery-engine-submit
cd discovery-engine-submit

# Create branch and copy results
BRANCH="contrib/$(gh api user --jq .login)/$(date +%Y%m%d-%H%M%S)"
git checkout -b "$BRANCH"
cp ~/.discovery/data/batch/*.json submissions/
git add submissions/
git commit -m "Add extraction results"
git push -u origin "$BRANCH"

# Create PR
gh pr create --title "extraction: $(ls submissions/*.json | wc -l) papers" \
  --body "Extraction results from discovery-extract skill" \
  --repo pcdeni/discovery-engine
```

GitHub Actions CI validates submissions and auto-merges passing PRs.

## Bundled Files

| File | Purpose |
|------|---------|
| `scripts/extract.py` | Paper discovery, normalization, validation, saving (Python stdlib only) |
| `references/prompt.txt` | The extraction format specification (444 lines) |
| `references/schema.json` | JSON schema for validation |

Functional analysis
Type: OpenClaw Skill · Name: discovery-extract · Version: 1.0.7

The discovery-extract skill bundle is a legitimate tool designed for scientific data extraction and cross-domain knowledge graph construction. The Python script `scripts/extract.py` fetches paper metadata and abstracts from reputable public APIs (arXiv, PubMed Central, OpenAlex, and OSTI) and manages local storage in `~/.discovery/`. The bundle includes a highly detailed extraction prompt (`references/prompt.txt`) that specifically instructs the agent to remain in an extraction-only mode, serving as a safeguard against orchestration-based prompt injection. While the skill uses the GitHub CLI (`gh`) to submit results, this behavior is transparently documented as a contribution mechanism for the open-source project hosted at `github.com/pcdeni/discovery-engine`.
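How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install discovery-extract
  3. Once installation finishes, invoke the Skill by name or trigger it with /discovery-extract
  4. Provide the inputs described in the Skill's parameter notes to get structured output.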
Version history
v1.0.7
Version 1.1.0: Major documentation revamp and metadata update
- Revamped SKILL.md with a clear introduction, use-cases table, and improved explanations for extraction workflow and core concepts.
- Added version and homepage metadata to the skill config.
- Clarified the role of "provides/requires" relationships and the cross-domain discovery logic.
- Updated metadata with an emoji and better structure; removed user-invocable and legacy keys.
- No core logic/code changes; documentation only.
v1.0.6
- Updated description to emphasize cross-domain discovery and extraction of provides/requires relationships.
- Clarified the purpose: surfacing hidden connections between fields via structured scientific extraction.
- No functional or file changes; documentation only.
v1.0.5
- No user-visible changes in this release.
- No file or documentation updates detected.
v1.0.4
- Major update: the skill no longer requires an external LLM or API keys; you now read the paper abstract and produce the extraction JSON directly.
- Simplified workflow: discovery, extraction, and submission steps are streamlined for manual operation.
- Updated instructions: step-by-step usage now highlights reading the prompt, producing valid JSON, running save/validate, and submitting results.
- Removal of provider/API-key options; focus shifted to user-driven extraction and manual JSON generation.
- Input validation and schema checking are handled via the provided script and prompt guidance.
v1.0.3
Version 1.0.3: No code or documentation changes detected.
- No file changes in this release.
- Functionality and documentation remain unchanged from the previous version.
v1.0.2
- Major update: everything now runs through a fully self-contained Python script with no external dependencies.
- Added scripts/extract.py to handle paper discovery, extraction, validation, and saving results.
- Bundled extraction prompt (references/prompt.txt) and schema (references/schema.json) for local/offline use.
- Updated instructions for running and submitting results: no pip install needed, just run the script.
- Simplified prerequisites and usage flow: Python 3.10+, GitHub CLI, and an LLM provider.
v1.0.1
- Updated environment requirements: GitHub CLI (gh) is now required; API keys are optional and support multiple provider types.
- Improved setup instructions, clarifying cloud vs local LLM configuration and GitHub CLI authentication.
- Submission process now explicitly uses the GitHub CLI; instructions for PR submission and authentication are clearer.
- Expanded documentation on supported cloud and local LLM providers.
- Minor edits for clarity, conciseness, and up-to-date usage examples.
v1.0.0
Initial release of discovery-extract skill.
- Discovers and processes scientific papers from arXiv, PubMed Central, OpenAlex, and OSTI.
- Extracts structured scientific knowledge (entities, relations, bridge tags, cross-domain connections) using LLM.
- Validates output and auto-submits results as pull requests to the Discovery Engine dataset.
- Supports multiple LLM providers, including Anthropic, OpenAI, OpenRouter, Google, and local models.
- Fully decentralized operation; avoids duplicate effort via GitHub tracking and CI validation.
Metadata
Slug: discovery-extract
Version: 1.0.7
License: MIT-0
Total installs: 0
Current installs: 0
Version count: 8
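FAQ

What is Discovery Engine?

Cross-domain scientific discovery through structured extraction of scientific publications. What one paper solves, another needs: this skill extracts provid... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 324 total downloads to date.

How do I install Discovery Engine?

Run the command "/install discovery-extract" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Discovery Engine free?

Yes. Discovery Engine is completely free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does Discovery Engine support?

Discovery Engine is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who develops Discovery Engine?

It is developed and maintained by pcdeni (@pcdeni). The current version is v1.0.7.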
