Description

Predict rice agronomic traits (yield, plant height, heading date, grain size, etc.) from genotype and environmental data using pre-trained MMoE deep learning...

README (SKILL.md)


# Rice Phenotype Prediction

Name: GAIN
Author: qianlvdouhua

Self-contained skill for predicting 10 rice agronomic traits via pre-trained MMoE models.
All models, data, and scripts are inside this directory; give users this one folder.

## Setup

### First-time check

```bash
python <SKILL_DIR>/scripts/check_env.py
```
This verifies Python dependencies and data integrity. If packages are missing:
```bash
pip install -r <SKILL_DIR>/requirements.txt
```

Required: `torch>=2.0 numpy pandas scikit-learn scipy requests`
GPU is optional; CPU works (just slower). If a GPU is present, `cuda:0` is used automatically.

### `<SKILL_DIR>` convention

Throughout this file, `<SKILL_DIR>` means the absolute path to this skill's root directory
(the folder containing this SKILL.md). When running commands, substitute the actual path.
`--base_dir` is optional; if omitted, scripts auto-detect it from their own location.

## Supported Traits

| Code | Chinese | English | Unit |
|------|---------|---------|------|
| HD | 抽穗期 | Heading Date | days |
| PH | 株高 | Plant Height | cm |
| PL | 穗长 | Panicle Length | cm |
| TN | 分蘖数 | Tiller Number | count |
| GP | 每穗粒数 | Grains Per Panicle | count |
| SSR | 结实率 | Seed Setting Rate | % |
| TGW | 千粒重 | Thousand Grain Weight | g |
| GL | 粒长 | Grain Length | mm |
| GW | 粒宽 | Grain Width | mm |
| Y | 产量 | Yield | kg/ha |

## Supported Locations (7 built-in stations)

| Code | City | Lat | Lon |
|------|------|-----|-----|
| km | 昆明 (Kunming) | 25.02 | 102.68 |
| gzl | 六盘水 (Liupanshui) | 26.59 | 104.83 |
| nn | 南宁 (Nanning) | 22.82 | 108.37 |
| wh | 武汉 (Wuhan) | 30.58 | 114.27 |
| hf | 合肥 (Hefei) | 31.82 | 117.25 |
| hz | 杭州 (Hangzhou) | 30.25 | 120.17 |
| th | 通化 (Tonghua) | 41.73 | 125.94 |

Any input lat/lon is auto-matched to the nearest station via Haversine distance.
When internet access is available, daily weather data can also be fetched from the NASA POWER API for the exact coordinates.
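
The nearest-station matching can be illustrated with a minimal sketch. It is not the code in `grid_manager.py`; the function names and the mean Earth radius constant are assumptions made for this example, and only the station coordinates come from the table above.

```python
import math

# Station coordinates copied from the table above.
STATIONS = {
    "km": (25.02, 102.68),
    "gzl": (26.59, 104.83),
    "nn": (22.82, 108.37),
    "wh": (30.58, 114.27),
    "hf": (31.82, 117.25),
    "hz": (30.25, 120.17),
    "th": (41.73, 125.94),
}

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(lat, lon):
    """Return (station code, distance in km) for the closest built-in station."""
    code = min(STATIONS, key=lambda c: haversine_km(lat, lon, *STATIONS[c]))
    return code, haversine_km(lat, lon, *STATIONS[code])
```

With these definitions, `nearest_station(30.5, 114.3)` selects `wh`, the station used in the command examples in this file.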

## Stress Types

| Type | Chinese | Default effect |
|------|---------|----------------|
| high_temp | 高温胁迫 | +3°C max / +2°C min |
| low_temp | 低温胁迫 | -3°C max / -2°C min |
| drought | 干旱胁迫 | 90% precipitation reduction |
| flood | 涝害胁迫 | 3x precipitation increase |
| low_light | 寡照胁迫 | 60% PAR reduction |
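
As a rough illustration of the default effects, the sketch below applies one stress type to a single daily weather record. It is not the code in `stress_simulator.py`: the record keys (`tmax`, `tmin`, `precip`, `par`) are hypothetical names, and reading "3x precipitation increase" as multiplying precipitation by 3 is an assumption.

```python
def apply_stress(day, stress):
    """Return a copy of one daily weather record with a default stress applied.

    `day` is a dict with the hypothetical keys tmax, tmin (°C), precip, par.
    """
    out = dict(day)
    if stress == "high_temp":
        out["tmax"] += 3.0
        out["tmin"] += 2.0
    elif stress == "low_temp":
        out["tmax"] -= 3.0
        out["tmin"] -= 2.0
    elif stress == "drought":
        out["precip"] *= 0.10  # 90% reduction leaves 10%
    elif stress == "flood":
        out["precip"] *= 3.0   # assumed meaning of "3x"
    elif stress == "low_light":
        out["par"] *= 0.40     # 60% reduction leaves 40%
    else:
        raise ValueError(f"unknown stress type: {stress}")
    return out
```

The function returns a copy, so the unstressed record stays available for the normal-versus-stressed comparison.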

## Prediction Commands

### Full prediction (recommended)
```bash
python <SKILL_DIR>/scripts/predict.py --lat 30.5 --lon 114.3 --sample sample1
```

### Genotype-only / environment-only
```bash
python <SKILL_DIR>/scripts/predict.py --lat 30.5 --lon 114.3 --sample sample1 --mode gene
python <SKILL_DIR>/scripts/predict.py --lat 30.5 --lon 114.3 --sample sample1 --mode env
```

### Specific traits
```bash
python <SKILL_DIR>/scripts/predict.py --lat 30.5 --lon 114.3 --sample sample1 --trait PH,Y
```

### With stress
```bash
python <SKILL_DIR>/scripts/predict.py --lat 30.5 --lon 114.3 --sample sample1 --stress high_temp
python <SKILL_DIR>/scripts/predict.py --lat 30.5 --lon 114.3 --sample sample1 --stress high_temp --stress_delta 5.0
```

### Multiple samples
```bash
python <SKILL_DIR>/scripts/predict.py --lat 30.5 --lon 114.3 --sample "sample1,sample2,sample3"
```

### Custom genotype file
```bash
python <SKILL_DIR>/scripts/predict.py --lat 30.5 --lon 114.3 --genotype_file /path/to/user_vae.csv
```
Format: CSV with 1024 columns (VAE-encoded features), first column = sample index.
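
Before passing a custom file, a quick shape check can catch formatting mistakes. The sketch below is not part of the skill. It assumes one header row and an index column followed by 1024 numeric feature columns; the format line above does not say whether the index counts toward the 1024 or whether a header is present, so both are exposed as parameters.

```python
import csv


def check_genotype_csv(path, n_features=1024, has_header=True):
    """Return the number of sample rows; raise ValueError if the file is malformed.

    Assumed layout: first column = sample index, then n_features numeric columns.
    """
    n_rows = 0
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        if has_header:
            next(reader, None)  # skip the assumed header row
        first_line = 2 if has_header else 1
        for line_no, row in enumerate(reader, start=first_line):
            if len(row) != n_features + 1:
                raise ValueError(
                    f"line {line_no}: expected {n_features + 1} columns, got {len(row)}"
                )
            for value in row[1:]:
                float(value)  # raises ValueError on non-numeric features
            n_rows += 1
    if n_rows == 0:
        raise ValueError("no sample rows found")
    return n_rows
```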

### Force CPU / specific device
```bash
python <SKILL_DIR>/scripts/predict.py --lat 30.5 --lon 114.3 --sample sample1 --device cpu
python <SKILL_DIR>/scripts/predict.py --lat 30.5 --lon 114.3 --sample sample1 --device cuda:0
```

### Human-readable table
```bash
python <SKILL_DIR>/scripts/predict.py --lat 30.5 --lon 114.3 --sample sample1 --output table
```

### All CLI arguments
| Arg | Default | Description |
|-----|---------|-------------|
| `--lat` | required | Latitude |
| `--lon` | required | Longitude |
| `--sample` | None | Built-in sample ID(s), comma-separated (sample1..sample3925) |
| `--genotype_file` | None | Custom 1024-dim VAE CSV path |
| `--mode` | full | `gene`, `env`, or `full` |
| `--trait` | all | Comma-separated trait codes or `all` |
| `--stress` | None | Stress type name |
| `--stress_delta` | None | Override temperature delta |
| `--device` | auto | `auto`, `cpu`, or `cuda:0` |
| `--year` | 2024 | Year for environmental data |
| `--output` | json | `json` or `table` |
| `--base_dir` | auto | Override skill directory path |
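
When the command is assembled programmatically, building an argument list avoids shell-quoting problems. The helper below is a hypothetical sketch that uses only the flags listed in the table above; the function name and its keyword arguments are assumptions for illustration.

```python
import os
import sys


def build_predict_command(skill_dir, lat, lon, samples=None, genotype_file=None,
                          traits=None, mode=None, stress=None, stress_delta=None,
                          device=None, year=None, output=None):
    """Assemble an argv list for scripts/predict.py from the documented flags."""
    cmd = [sys.executable, os.path.join(skill_dir, "scripts", "predict.py"),
           "--lat", str(lat), "--lon", str(lon)]
    if samples:
        cmd += ["--sample", ",".join(samples)]    # comma-separated sample IDs
    if genotype_file:
        cmd += ["--genotype_file", genotype_file]
    if traits:
        cmd += ["--trait", ",".join(traits)]      # comma-separated trait codes
    optional = {"--mode": mode, "--stress": stress, "--stress_delta": stress_delta,
                "--device": device, "--year": year, "--output": output}
    for flag, value in optional.items():
        if value is not None:                     # omitted flags keep CLI defaults
            cmd += [flag, str(value)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`. Whether the JSON result is written to stdout is not stated in this file, so check that before parsing the captured output.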

## Handling User Requests

### 1. Extract location
- "经纬度30.5, 114.3" (coordinates 30.5, 114.3) → `--lat 30.5 --lon 114.3`
- "武汉" (Wuhan) → `--lat 30.58 --lon 114.27`
- "北纬25度，东经103度" (25°N, 103°E) → `--lat 25 --lon 103`

### 2. Map trait names
- 株高/plant height → PH
- 产量/yield → Y
- 粒长/grain length → GL
- 抽穗期/heading date → HD
- 千粒重/1000-grain weight → TGW
- 穗长/panicle length → PL
- 结实率/seed setting rate → SSR
- 每穗粒数/grains per panicle → GP
- 粒宽/grain width → GW
- 分蘖数/tiller number → TN
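
The trait mapping above can be held in a lookup table. The following sketch is one possible way to turn free text into a `--trait` argument; the function names are hypothetical and the substring matching is a simplification, not the skill's own logic.

```python
# Aliases copied from the list above (Chinese and lower-case English).
TRAIT_ALIASES = {
    "株高": "PH", "plant height": "PH",
    "产量": "Y", "yield": "Y",
    "粒长": "GL", "grain length": "GL",
    "抽穗期": "HD", "heading date": "HD",
    "千粒重": "TGW", "1000-grain weight": "TGW",
    "穗长": "PL", "panicle length": "PL",
    "结实率": "SSR", "seed setting rate": "SSR",
    "每穗粒数": "GP", "grains per panicle": "GP",
    "粒宽": "GW", "grain width": "GW",
    "分蘖数": "TN", "tiller number": "TN",
}


def traits_from_text(text):
    """Return the trait codes whose aliases occur in the text, without duplicates."""
    lowered = text.lower()
    found = []
    for alias, code in TRAIT_ALIASES.items():
        if alias in lowered and code not in found:
            found.append(code)
    return found


def trait_flag(text):
    """Build the --trait argument, or an empty list to keep the default (all)."""
    codes = traits_from_text(text)
    return ["--trait", ",".join(codes)] if codes else []
```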

### 3. Map stress requests
- 高温/heat → high_temp
- 低温/cold/chilling → low_temp
- 干旱/drought → drought
- 洪涝/flooding → flood
- 阴天/寡照/low light → low_light
- "高温+5度" (heat +5 degrees) → `--stress high_temp --stress_delta 5.0`
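
A stress request can be mapped in a similar way. The sketch below is hypothetical: the regular expression for a "+N degrees" suffix is an assumption, and the delta is only attached to `high_temp` because that is the only combination shown in the list above.

```python
import re

# Aliases copied from the list above.
STRESS_ALIASES = {
    "high_temp": ["高温", "heat"],
    "low_temp": ["低温", "cold", "chilling"],
    "drought": ["干旱", "drought"],
    "flood": ["洪涝", "flooding"],
    "low_light": ["阴天", "寡照", "low light"],
}

# Matches e.g. "+5度", "+ 2.5°C", "＋3℃" (assumed phrasing).
DELTA_RE = re.compile(r"[+＋]\s*(\d+(?:\.\d+)?)\s*(?:度|°|℃)")


def stress_flags(text):
    """Return CLI flags for the first stress type mentioned, or an empty list."""
    lowered = text.lower()
    for stress, aliases in STRESS_ALIASES.items():
        if any(alias in lowered for alias in aliases):
            flags = ["--stress", stress]
            match = DELTA_RE.search(text)
            if match and stress == "high_temp":
                flags += ["--stress_delta", str(float(match.group(1)))]
            return flags
    return []
```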

### 4. Genotype data
- Built-in samples: `--sample sample1` (3925 available: sample1..sample3925)
- User file: `--genotype_file /path/to/file.csv`

### 5. Interpreting output
JSON contains: `location`, `genotype_prediction`, `environment_prediction`, `stress_prediction`, `trait_info`.

Report `environment_prediction` as the primary result, since it includes environmental context.
Use `genotype_prediction` as the baseline for comparison.
For stress, compare normal vs stressed values.

Rounding: HD/TN/GP → integer, PH/PL/TGW/SSR → 1 decimal, GL/GW → 2 decimals, Y → integer.
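
The rounding rule can be written as a small lookup. This is a sketch for formatting reported values; the names `DECIMALS` and `round_trait` are hypothetical and not part of the skill.

```python
# Decimal places per trait code, taken from the rounding rule above.
DECIMALS = {
    "HD": 0, "TN": 0, "GP": 0, "Y": 0,
    "PH": 1, "PL": 1, "TGW": 1, "SSR": 1,
    "GL": 2, "GW": 2,
}


def round_trait(code, value):
    """Round a predicted value to the precision used when reporting that trait."""
    digits = DECIMALS[code]
    rounded = round(value, digits)
    # Integer traits are returned as int so they print without ".0".
    return int(rounded) if digits == 0 else rounded
```

Python's built-in `round` rounds exact ties to the nearest even digit, which can differ from the half-up rounding some readers expect.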

## Directory Structure
```
rice_prediction/                   ← give users this folder
├── SKILL.md                       ← this file
├── requirements.txt               ← pip dependencies
├── data/
│   ├── grid_points.json           ← 7 station coordinates
│   ├── vae_features.csv           ← 3925 built-in genotype samples (1024-dim VAE)
│   ├── season_history.csv         ← historical season data for normalization
│   ├── env_cache/                 ← cached daily weather (auto-populated)
│   ├── models_env/                ← 10 trait-specific env+gene models (~4.6MB each)
│   └── models_gene/               ← 7 location-specific genotype models (~8MB each)
└── scripts/
    ├── predict.py                 ← main entry point
    ├── check_env.py               ← dependency checker
    ├── model_def.py               ← MMoE model architectures
    ├── grid_manager.py            ← nearest grid point finder
    ├── env_data_fetcher.py        ← NASA POWER API fetcher + cache
    ├── env_processor.py           ← environmental feature engineering
    └── stress_simulator.py        ← stress scenario simulation
```

## Architecture (for reference)
- **Model**: Multi-gate Mixture-of-Experts (MMoE) with ResidualMLP experts
- **Genotype features**: 1024-dim VAE latent encoding of genomic data
- **Environment features**: 53 season-aggregated variables from daily weather
- **Environmental data**: NASA POWER API (auto-fetched and cached locally)\r
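
For orientation, a request to the NASA POWER daily point endpoint can be assembled as below. This is a sketch only: the parameter list (`T2M_MAX`, `T2M_MIN`, `PRECTOTCORR`), the `AG` community, and the JSON format are assumptions, and this file does not say which variables or response format `env_data_fetcher.py` actually requests.

```python
from urllib.parse import urlencode

POWER_DAILY_POINT = "https://power.larc.nasa.gov/api/temporal/daily/point"


def power_daily_url(lat, lon, year=2024,
                    parameters=("T2M_MAX", "T2M_MIN", "PRECTOTCORR")):
    """Build a daily-data request URL for one coordinate and one calendar year."""
    query = {
        "parameters": ",".join(parameters),  # assumed variable set
        "community": "AG",                   # assumed community
        "latitude": lat,
        "longitude": lon,
        "start": f"{year}0101",
        "end": f"{year}1231",
        "format": "JSON",                    # assumed response format
    }
    # Keep commas unescaped so the parameter list stays readable.
    return f"{POWER_DAILY_POINT}?{urlencode(query, safe=',')}"
```

The function only builds the URL; fetching and caching are left out because the cache layout under `data/env_cache/` is not described here.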

Usage Guidance

This skill appears to do what it says: predict rice traits from genotype and environment. Before installing or using it, consider the following:
(1) Model checkpoints (.pt) are loaded with torch.load. Make sure any .pt files packaged with or later added to this skill come from a trusted source, because malicious checkpoints can execute code when unpickled.
(2) The skill calls NASA POWER (power.larc.nasa.gov) if internet is available and caches responses under data/env_cache in the skill folder. To limit network access, run offline or provide local CSVs.
(3) The included check_env.py references model files under data/models_* that are not listed in the provided manifest. Prediction will fail unless the model checkpoints are present, so verify where the model files originate.
(4) Run check_env.py in a sandbox or review the model files before running predict.py; inspect large .pt files or obtain them from the original author or repository.
(5) If users are allowed to pass a genotype_file path, remember that the CLI reads that file; do not point it at sensitive system files.
For higher assurance, request the upstream model artifacts or their provenance and validate them, or run inference in an isolated environment.
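
One way to validate model artifacts is to compare their SHA-256 digests against values obtained from the original author. The sketch below assumes such reference digests are available, which this page does not state; the function names are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected_hex):
    """True if the file's digest matches the digest supplied by a trusted source."""
    return sha256_of(path) == expected_hex.strip().lower()
```

A matching digest only shows the file is the one the author published. If the checkpoints are plain state dicts, loading them with `torch.load(path, weights_only=True)` further restricts what unpickling can do, though whether these particular files load that way is not known from this page.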

Capability Assessment

✓ Purpose & Capability

Name/description (predict rice traits from genotype+environment) matches the provided files (VAE features, env caches, grid points, env processing, model definitions, prediction CLI). Required binaries/env are minimal and consistent with the stated purpose.

ℹ Instruction Scope

SKILL.md and scripts instruct the agent to load local data, optionally fetch weather from NASA POWER (https://power.larc.nasa.gov), cache responses under data/env_cache, process environmental features, and load model checkpoints to produce predictions. These actions are within scope, but the code uses torch.load() to load .pt checkpoints (model files), which can execute arbitrary code during unpickling if checkpoints are untrusted—this is expected for model-based skills but is a security consideration.

ℹ Install Mechanism

No install spec is declared (instruction-only), but the skill package contains code and data files. The skill relies on pip-installable Python packages listed in requirements.txt; there are no downloads from untrusted URLs or URL shorteners. The runtime will write cached NASA POWER CSVs into the skill's data/env_cache directory.

✓ Credentials

The skill does not request environment variables, secrets, or unrelated credentials. Network access is used only to call the NASA POWER API (documented). File access is limited to files within the skill directory and any user-specified genotype CSV; this is proportionate to the stated functionality.

✓ Persistence & Privilege

The skill does not request always:true and does not modify other skills or system settings. It will create cache files under its own data/env_cache directory to store fetched weather data, which is expected behaviour.

Version History

v1.0.1

Add README.md with full usage guide in Chinese and English

v1.0.0

Initial release: Rice phenotype prediction skill using MMoE deep learning models

Metadata

Slug gain

Version 1.0.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is GAIN?

Predict rice agronomic traits (yield, plant height, heading date, grain size, etc.) from genotype and environmental data using pre-trained MMoE deep learning... It is an AI Agent Skill for Claude Code / OpenClaw, with 109 downloads so far.

How do I install GAIN?

Run "/install gain" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is GAIN free?

Yes, GAIN is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does GAIN support?

GAIN is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created GAIN?

It is built and maintained by Qianlvdouhua (@qianlvdouhua); the current version is v1.0.1.
