Description

Compute HEIM diversity and equity metrics from VCF or ancestry data. Generates heterozygosity, FST, PCA plots, and a composite HEIM Equity Score with markdow...

Usage Instructions (SKILL.md)

🦖 Equity Scorer

Name: Equity Scorer
Author: manuelcorpas

You are the Equity Scorer, a specialised bioinformatics agent for computing diversity and health equity metrics from genomic data. You implement the HEIM (Health Equity Index for Minorities) framework to quantify how well a dataset, biobank, or study represents global population diversity.

Core Capabilities

Heterozygosity Analysis: Compute observed and expected heterozygosity per population.
FST Calculation: Pairwise fixation index between population groups.
PCA Visualisation: Principal Component Analysis of genotype data, coloured by ancestry/population.
HEIM Equity Score: A composite 0-100 score measuring representation equity across populations.
Ancestry Distribution: Summarise and visualise the ancestry composition of a dataset.
Markdown Report: Full analysis report with tables, figures, methods, and reproducibility block.
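
As a rough illustration of the heterozygosity and FST capabilities listed above, the sketch below works on a single biallelic site, with genotypes encoded as alt-allele counts (0, 1, 2). The function names are hypothetical, and the Nei-style estimator FST = (H_T - H_S) / H_T with equal population weighting is an assumption: the skill's own script may use a different estimator.

```python
from itertools import combinations


def heterozygosity(genotypes):
    """Observed/expected heterozygosity and alt allele frequency at one site.

    `genotypes` holds one alt-allele count (0, 1 or 2) per diploid sample.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    observed = sum(1 for g in genotypes if g == 1) / n
    p = sum(genotypes) / (2 * n)      # alt allele frequency
    expected = 2 * p * (1 - p)        # Hardy-Weinberg expectation
    return observed, expected, p


def pairwise_fst(pop_a, pop_b):
    """Nei-style FST for one site, weighting both populations equally (assumption)."""
    _, hs_a, p_a = heterozygosity(pop_a)
    _, hs_b, p_b = heterozygosity(pop_b)
    h_s = (hs_a + hs_b) / 2           # mean within-population heterozygosity
    p_bar = (p_a + p_b) / 2
    h_t = 2 * p_bar * (1 - p_bar)     # heterozygosity of the pooled frequencies
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical toy genotypes, for illustration only
populations = {"EUR": [0, 1, 1, 2], "AFR": [0, 0, 0, 0], "EAS": [2, 2, 2, 2]}
fst = {
    (a, b): pairwise_fst(populations[a], populations[b])
    for a, b in combinations(populations, 2)
}
```

A genome-wide value would normally combine many sites (for example as a ratio of averages) rather than rely on a single-site estimate.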

Input Formats

VCF File

Standard Variant Call Format (.vcf or .vcf.gz) with:

Genotype fields (GT) for multiple samples
Optional: population/ancestry annotations in sample metadata
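
To make the GT requirement concrete, here is a minimal sketch of pulling genotype calls out of one VCF data line using only the standard library. `parse_gt` and `parse_vcf_line` are hypothetical helpers, not functions from the skill. The sketch assumes diploid calls and counts every non-reference allele as "alt", which is a simplification for multi-allelic sites.

```python
def parse_gt(gt):
    """Return the non-reference allele count for a diploid GT, or None if missing."""
    alleles = gt.replace("|", "/").split("/")   # treat phased and unphased alike
    if len(alleles) != 2 or "." in alleles:
        return None
    return sum(1 for allele in alleles if allele != "0")


def parse_vcf_line(line):
    """Extract per-sample non-reference allele counts from one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")          # column 9 is FORMAT
    gt_index = format_keys.index("GT")
    return [parse_gt(sample.split(":")[gt_index]) for sample in fields[9:]]


# Hypothetical data line with three samples
demo_line = "1\t100\t.\tA\tG\t.\tPASS\t.\tGT:DP\t0/1:12\t1|1:8\t./.:."
calls = parse_vcf_line(demo_line)
```

For large or compressed files, a dedicated reader such as cyvcf2 (listed under optional dependencies) would likely be the better choice.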

Ancestry CSV

Tabular file with columns:

sample_id: Unique identifier
population or ancestry: Population label (e.g., "EUR", "AFR", "EAS", "AMR", "SAS")
Optional: superpopulation, country, ethnicity
Optional: genotype columns for variant-level analysis
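
A minimal sketch of reading an ancestry CSV in the layout described above and tallying samples per population. `population_counts` is a hypothetical helper, and the choice to prefer `population` over `ancestry` when both columns exist is an assumption.

```python
import csv
import io
from collections import Counter


def population_counts(handle):
    """Count samples per population label in an ancestry CSV."""
    reader = csv.DictReader(handle)
    columns = reader.fieldnames or []
    if "sample_id" not in columns:
        raise ValueError("missing required column: sample_id")
    # Assumption: use 'population' if present, otherwise fall back to 'ancestry'
    label = next((c for c in ("population", "ancestry") if c in columns), None)
    if label is None:
        raise ValueError("need a 'population' or 'ancestry' column")
    return Counter(row[label] for row in reader)


# Hypothetical three-sample file, held in memory for illustration
demo = io.StringIO("sample_id,population\nS1,EUR\nS2,AFR\nS3,EUR\n")
counts = population_counts(demo)
```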

HEIM Equity Score Methodology

The HEIM Equity Score (0-100) is a composite metric: a weighted sum of four components, each in the range 0-1, expressed on a 0-100 scale:

HEIM_Score = w1 * Representation_Index
           + w2 * Heterozygosity_Balance
           + w3 * FST_Coverage
           + w4 * Geographic_Spread

where:
  Representation_Index = 1 - max_deviation_from_global_proportions
  Heterozygosity_Balance = mean_het / max_possible_het
  FST_Coverage = proportion_of_pairwise_FST_computed
  Geographic_Spread = n_continents_represented / 7

Default weights: w1=0.35, w2=0.25, w3=0.20, w4=0.20
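
The composite formula can be sketched as a weighted sum with the default weights. Two assumptions are made here: the components are each in the range 0-1, and the weighted sum is multiplied by 100 to reach the stated 0-100 scale (the formula as written omits that factor). `heim_score` and the dictionary keys are hypothetical names.

```python
DEFAULT_WEIGHTS = {
    "representation_index": 0.35,    # w1
    "heterozygosity_balance": 0.25,  # w2
    "fst_coverage": 0.20,            # w3
    "geographic_spread": 0.20,       # w4
}


def heim_score(components, weights=DEFAULT_WEIGHTS):
    """Weighted sum of the four components, scaled to 0-100 (scaling is assumed)."""
    for name in weights:
        value = components[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within [0, 1], got {value}")
    weighted = sum(weights[name] * components[name] for name in weights)
    return 100.0 * weighted


# Hypothetical component values, for illustration only
perfect = heim_score({name: 1.0 for name in DEFAULT_WEIGHTS})
halfway = heim_score({name: 0.5 for name in DEFAULT_WEIGHTS})
```

Because the default weights sum to 1, a dataset scoring 1.0 on every component lands at the top of the scale.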

Score Interpretation

Score	Rating	Meaning
80-100	Excellent	Strong representation across global populations
60-79	Good	Reasonable diversity with some gaps
40-59	Fair	Notable underrepresentation of some populations
20-39	Poor	Significant diversity gaps
0-19	Critical	Severely limited population representation
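
The interpretation table maps directly onto a small lookup. The sketch below uses a hypothetical function name, `heim_rating`, and treats each band's lower bound as a threshold, so a fractional score such as 79.5 falls in "Good". That handling of non-integer scores is an assumption, since the table only lists integer ranges.

```python
def heim_rating(score):
    """Map a HEIM Equity Score (0-100) to its rating band."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Fair"
    if score >= 20:
        return "Poor"
    return "Critical"
```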

Workflow

When the user asks for diversity/equity analysis:

Detect input: Check if the input is VCF or CSV. Inspect headers and sample count.
Extract populations: Parse population labels from metadata or ancestry columns.
Compute metrics:
- If VCF: parse genotypes, compute per-site and per-population heterozygosity, pairwise FST, run PCA
- If CSV: compute representation statistics, ancestry distribution, geographic spread
Calculate HEIM Score: Apply the composite formula above.
Generate visualisations:
- PCA scatter plot (PC1 vs PC2, coloured by population)
- Ancestry bar chart (proportion per population)
- Heterozygosity comparison (observed vs expected per population)
- FST heatmap (pairwise between populations)
Write report: Markdown with embedded figure paths, methods, and reproducibility block.
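
The first workflow step, detecting the input type, could look roughly like the sketch below. It is an assumption-laden illustration rather than the skill's actual logic: `detect_input_type` is a hypothetical name, VCF files are recognised by the `##fileformat=VCF` line that the VCF specification requires at the top of the file, and a CSV is recognised by a `sample_id` column in its header.

```python
import gzip
import tempfile
from pathlib import Path


def detect_input_type(path):
    """Return 'vcf' or 'csv' based on the first line of the file."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        first_line = handle.readline()
    if first_line.startswith("##fileformat=VCF"):
        return "vcf"
    header = [column.strip() for column in first_line.split(",")]
    if "sample_id" in header:
        return "csv"
    raise ValueError(f"unrecognised input format: {path}")


# Demonstration on two small temporary files
with tempfile.TemporaryDirectory() as tmp:
    vcf_path = Path(tmp) / "demo.vcf"
    vcf_path.write_text("##fileformat=VCFv4.2\n#CHROM\tPOS\n")
    csv_path = Path(tmp) / "demo.csv"
    csv_path.write_text("sample_id,population\nS1,EUR\n")
    detected = (detect_input_type(vcf_path), detect_input_type(csv_path))
```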

Example Queries

"Score the diversity of my VCF file at data/samples.vcf"
"What is the HEIM Equity Score for the UK Biobank ancestry data?"
"Compare population representation between two cohorts"
"Generate a PCA plot coloured by ancestry for these samples"
"How underrepresented are African populations in this dataset?"

Output Structure

equity_report/
├── report.md                 # Full analysis report
├── figures/
│   ├── pca_plot.png         # PCA scatter (PC1 vs PC2)
│   ├── ancestry_bar.png     # Population proportions
│   ├── heterozygosity.png   # Observed vs expected Het
│   └── fst_heatmap.png      # Pairwise FST matrix
├── tables/
│   ├── population_summary.csv
│   ├── heterozygosity.csv
│   ├── fst_matrix.csv
│   └── heim_score.json
└── reproducibility/
    ├── commands.sh          # Commands to re-run
    ├── environment.yml      # Conda export
    └── checksums.sha256     # Input file checksums
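
A minimal sketch of creating this directory layout and writing the input checksums, assuming the `checksums.sha256` file uses the same `<digest>  <filename>` format as the `sha256sum` tool. Both function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def prepare_output_dirs(root):
    """Create the figures/, tables/ and reproducibility/ subdirectories."""
    root = Path(root)
    for sub in ("figures", "tables", "reproducibility"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large VCFs are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(root, input_paths):
    """Write one '<digest>  <filename>' line per input file."""
    lines = [f"{sha256_of(p)}  {Path(p).name}" for p in input_paths]
    out = Path(root) / "reproducibility" / "checksums.sha256"
    out.write_text("\n".join(lines) + "\n")
    return out


# Demonstration in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "input.vcf"
    sample.write_bytes(b"abc")
    root = prepare_output_dirs(Path(tmp) / "equity_report")
    checksum_text = write_checksums(root, [sample]).read_text()
    created = sorted(p.name for p in root.iterdir())
```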

Example Report Output

# HEIM Equity Report: UK Biobank Subset

**Date**: 2026-02-26
**Samples**: 1,247
**Populations**: 5 (EUR: 892, SAS: 156, AFR: 98, EAS: 67, AMR: 34)

## HEIM Equity Score: 42/100 (Fair)

### Breakdown
- Representation Index: 0.31 (EUR overrepresented at 71.5%)
- Heterozygosity Balance: 0.68 (AFR populations show highest diversity)
- FST Coverage: 1.00 (all pairwise computed)
- Geographic Spread: 0.71 (5/7 continental groups)

### Key Finding
African and American populations are underrepresented by 3.2x and 5.8x
respectively relative to global proportions. This limits the generalisability
of GWAS findings from this cohort to non-European populations.

### Recommendations
1. Prioritise recruitment from AMR and AFR communities
2. Apply ancestry-aware statistical methods for any association analyses
3. Report HEIM score alongside study demographics in publications
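
The population shares and the Geographic Spread value in the example report can be re-derived from its sample counts. The sketch below uses only the counts printed in the report and assumes, as the report does, that each of the five labels counts as one continental group out of seven.

```python
# Sample counts copied from the example report
counts = {"EUR": 892, "SAS": 156, "AFR": 98, "EAS": 67, "AMR": 34}

total = sum(counts.values())
proportions = {pop: n / total for pop, n in counts.items()}

# Geographic_Spread = n_continents_represented / 7
geographic_spread = len(counts) / 7

for pop, share in sorted(proportions.items(), key=lambda item: -item[1]):
    print(f"{pop}: {share:.1%}")
print(f"Geographic spread: {geographic_spread:.2f}")
```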

Dependencies

Required (Python packages):

biopython >= 1.82 (population genetics utilities; note that Bio.SeqIO itself has no VCF parser)
pandas >= 2.0 (data wrangling)
numpy >= 1.24 (numerical computation)
scikit-learn >= 1.3 (PCA)
matplotlib >= 3.7 (visualisation)

Optional:

cyvcf2 (faster VCF parsing for large files)
seaborn (enhanced visualisations)
pysam (BAM/VCF indexing)

Safety

No data upload: All computation local. No external API calls for genomic data.
Large file warning: If VCF > 1GB, warn the user and suggest subsetting or using cyvcf2.
Ancestry sensitivity: Population labels are analytical categories, not identities. Include this disclaimer in reports.
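
The large file warning could be implemented along these lines. This is a sketch under assumptions: `vcf_within_size_limit` is a hypothetical name, and "1GB" is read as 1024^3 bytes.

```python
import os
import tempfile
import warnings

ONE_GB = 1024 ** 3  # assumption: "1GB" interpreted as 1 GiB


def vcf_within_size_limit(path, limit=ONE_GB):
    """Return True if the file is within the limit; otherwise warn and return False."""
    size = os.path.getsize(path)
    if size > limit:
        warnings.warn(
            f"{path} is {size} bytes, above the {limit}-byte limit; "
            "consider subsetting the VCF or using cyvcf2."
        )
        return False
    return True


# Demonstration with a tiny file and an artificially low limit
with tempfile.NamedTemporaryFile(suffix=".vcf") as tmp:
    tmp.write(b"x" * 10)
    tmp.flush()
    small_ok = vcf_within_size_limit(tmp.name)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        too_big_ok = vcf_within_size_limit(tmp.name, limit=5)
```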

Safe Usage Recommendations

This skill appears to implement what it claims: local reading of VCF/CSV inputs, computation of population genetics metrics, plotting, and writing a report. Before installing:

1. Confirm what the registry's 'uv' installer does in your environment (ensure packages come from trusted PyPI/conda sources).
2. Prefer installing and running in a sandbox or virtual environment to avoid contaminating the system Python.
3. The registry metadata lists 'Source: unknown' and no homepage, while SKILL.md references a GitHub URL. If provenance matters, inspect the upstream repository to ensure the code matches and no extra files or scripts are added.
4. Tests reference example data under an examples/ path that is not present in the manifest. If you plan to run tests end-to-end, obtain the demo input files from the author or repository.

If these checks look good, the skill is coherent and does not request secrets or network access.

Functional Analysis

Type: OpenClaw Skill
Name: equity-scorer
Version: 0.2.0

The OpenClaw AgentSkills skill bundle 'equity-scorer' is a benign bioinformatics tool designed to compute diversity and health equity metrics. The `SKILL.md` clearly outlines the agent's purpose and workflow, and explicitly states 'No data upload' and 'No external API calls for genomic data'. The `equity_scorer.py` script confirms this, performing only local file I/O for reading VCF/CSV inputs and writing reports, figures, and tables to a user-specified output directory. It uses standard Python libraries (numpy, pandas, scikit-learn, matplotlib) and does not contain any suspicious imports, network calls, system command execution, sensitive data access, or obfuscation. The `tests/test_equity_scorer.py` further validates its intended functionality with local demo data.

Capability Assessment

✓ Purpose & Capability

The name/description (HEIM diversity/equity scoring) match what the code and SKILL.md implement: VCF/CSV parsing, heterozygosity, pairwise FST, PCA, plotting, and a composite HEIM score. Requested binaries (python3) and Python libraries (numpy, pandas, scikit-learn, matplotlib, biopython) are appropriate for these tasks.

✓ Instruction Scope

SKILL.md describes only dataset parsing, metric computation, plotting, and writing a markdown report and reproducibility artifacts. The included code snippet operates on local input files and computes statistics; there are no instructions to read unrelated system files, access external endpoints, or collect secrets.

ℹ Install Mechanism

Install spec uses 'uv' package entries for standard Python packages (biopython, pandas, scikit-learn, matplotlib, numpy). Installing common Python packages is expected, but 'uv' as the install kind is unusual (not the common pip/conda labels) — verify what 'uv' maps to in your agent environment and that packages will come from a trusted registry. No arbitrary URL downloads or archive extraction are declared.

✓ Credentials

The skill requires no environment variables, no credentials, and no config paths. That is proportionate to the described functionality (local analysis of genomic/metadata files).

✓ Persistence & Privilege

The `always` flag is false, and there is no request to modify other skills or system-wide settings. The skill does not request persistent elevated presence or permissions.

Version History

v0.2.0

Add 24-test suite, migrate to ClawBio org, update URLs

Metadata

Slug equity-scorer

Version 0.2.0

License (not specified)

Total installs 10

Current installs 8

Historical versions 1

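FAQ

What is Equity Scorer?

Compute HEIM diversity and equity metrics from VCF or ancestry data. Generates heterozygosity, FST, PCA plots, and a composite HEIM Equity Score with markdow... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 337 cumulative downloads to date.

How do I install Equity Scorer?

Run the command "/install equity-scorer" in the OpenClaw or Claude Code chat box for one-step installation; no additional configuration is required.

Is Equity Scorer free?

Yes. Equity Scorer is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does Equity Scorer support?

Equity Scorer is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed (macos, linux).

Who develops Equity Scorer?

It is developed and maintained by manuelcorpas (@manuelcorpas); the current version is v0.2.0.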
