
expflow Pipeline HPO

Name: expflow Pipeline HPO
Author: diamond2nv

By diamond2nv · GitHub · v0.5.0 · MIT-0

cross-platform ✓ Security check passed


Install in OpenClaw

/install expflow-pipeline-hpo

Description

PDEBench competition workflow orchestration with expflow — three pipeline modes (full/fast/skip), distributed HPO, pruner integration, and ClearML HyperParam...

Usage (SKILL.md)

# expflow PDEBench Pipeline & HPO

Orchestrate experiment workflows for the AI4S PDE competition using expflow.
Three modes for three competition phases.

## Triggers

- User says "run HPO", "submit pipeline", "distributed experiment"
- User says "competition sprint" or "fast iterate"
- User asks about automating the train→eval→submit loop
- User mentions needing to find best hyperparams

## Installation

```bash
pip install "expflow-pde[pipeline]"
```

## Available Pipeline Modes

Three pipeline modes, each mapped to a CLI command:

### Mode A: Full (HPO → Train → Eval)

For the **exploration phase** of a competition task. Optuna finds the best params
via distributed clearml-agent trials, the pipeline trains with them, then evaluates.

```bash
expflow pipeline submit-full train_task1.py \
    --queue default \
    --trials 50 --parallel 4 \
    --eval-script eval_task1.py \
    --metric seg_total --direction maximize
```

Flags:
- `--trials N`: total HPO trials
- `--parallel M`: max concurrent trials (use GPU node count)
- `--metric`: objective metric name, read from `METRIC:`-prefixed lines in script stdout
- `--pruner hyperband|median|percentile`: early-stop poor trials
- `--study-name`: Optuna study name (auto-generated if omitted; persists to SQLite)
- `--skip hpo --skip eval`: run train only within the full skeleton

### Mode B: Fast (Train → Eval)

For the **competition sprint** phase. You already know the best params, so skip HPO
and run directly with fixed args.

```bash
expflow pipeline submit train_task1.py \
    --queue default \
    --train-param lr=0.001 --train-param epochs=80 \
    --eval-script eval_task1.py \
    --eval-param sub_step=5
```

Flags:
- `--skip eval`: train-only (just submit checkpoint)
- `--train-param key=val`: injected as `--key=val` to the training script
- `--eval-param key=val`: injected as `--key=val` to the eval script
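
The `key=val` to `--key=val` mapping can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not expflow's actual code; the helper name `params_to_argv` is hypothetical.

```python
def params_to_argv(params):
    """Turn ["lr=0.001", "epochs=80"] into ["--lr=0.001", "--epochs=80"].

    Hypothetical helper: expflow's real injection code may differ.
    """
    argv = []
    for item in params:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=val, got {item!r}")
        argv.append(f"--{key}={value}")
    return argv


# Command line the training step would end up running under this assumption.
train_argv = ["python", "train_task1.py"] + params_to_argv(["lr=0.001", "epochs=80"])
print(train_argv)
```

Splitting on the first `=` only means values that themselves contain `=` are passed through unchanged.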

### Mode C: Flexible Skip

Override step inclusion on either mode:

```bash
expflow pipeline submit-full train_task1.py \
    --skip hpo --skip eval          # = train only
expflow pipeline submit-full train_task1.py \
    --skip train --skip eval         # = HPO only
```
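
One way to picture how repeated `--skip` values narrow a pipeline is the sketch below. It assumes the full skeleton is the ordered steps `hpo`, `train`, `eval` and the fast skeleton is `train`, `eval`, as described above; `resolve_steps` and the two constants are hypothetical names, not part of expflow.

```python
FULL_STEPS = ("hpo", "train", "eval")  # assumed skeleton of `submit-full`
FAST_STEPS = ("train", "eval")         # assumed skeleton of `submit`


def resolve_steps(mode_steps, skip):
    """Return the steps that remain after applying the --skip values."""
    unknown = set(skip) - set(mode_steps)
    if unknown:
        raise ValueError(f"cannot skip unknown step(s): {sorted(unknown)}")
    return [step for step in mode_steps if step not in skip]


print(resolve_steps(FULL_STEPS, {"hpo", "eval"}))    # ['train']
print(resolve_steps(FULL_STEPS, {"train", "eval"}))  # ['hpo']
print(resolve_steps(FAST_STEPS, {"eval"}))           # ['train']
```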

## HPO: Three Execution Modes

HPO (`expflow optuna run`) has three backends:

| Mode | Flag | Description | Best for |
|------|------|-------------|----------|
| Local | (default) | Serial subprocesses on CPU | ≤20 trials, quick test |
| Distributed | `--distributed` | ask/tell + ClearML Task clone | Multi-GPU, custom control |
| Optimizer | `--optimizer -O` | ClearML `HyperParameterOptimizer` | Production, 50-200+ trials |

### Key flags across all HPO modes:
- `--pruner hyperband|median|percentile|none`: early-stops poor trials; the ASHA pruner saves ~40% GPU time
- `--metric <name>`: reads `METRIC:<name>=<value>` from script stdout
- `--direction maximize|minimize`
- `--timeout <min>`: safety cutoff in minutes

## Script Requirements

The training/eval script must:
1. Accept hyperparams as `--key=value` CLI arguments
2. Output `METRIC:<name>=<value>` to stdout for objective capture (local mode)
3. Report ClearML scalars for distributed/optimizer mode:
   ```python
   from clearml import Task
   Task.current_task().get_logger().report_scalar("Score", "seg_total", value, iteration=epoch)
   ```
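
Requirements 1 and 2 can be sketched as a small script skeleton, together with a runner-side parser for the `METRIC:` line. The hyperparameter names `--lr` and `--epochs` come from the examples above; the helpers `emit_metric` and `parse_metric`, the regular expression, and the choice to keep the last reported value are assumptions for illustration, not expflow's implementation.

```python
import argparse
import re

# Assumed line format: METRIC:<name>=<value>, one metric per line.
METRIC_RE = re.compile(r"^METRIC:(?P<name>[^=\s]+)=(?P<value>\S+)\s*$")


def parse_args(argv):
    """Requirement 1: accept hyperparams as --key=value arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--epochs", type=int, default=80)
    return parser.parse_args(argv)


def emit_metric(name, value):
    """Requirement 2: print the objective as METRIC:<name>=<value>."""
    line = f"METRIC:{name}={value}"
    print(line, flush=True)
    return line


def parse_metric(stdout, name):
    """Runner side (hypothetical): return the last value reported for `name`."""
    found = None
    for line in stdout.splitlines():
        match = METRIC_RE.match(line.strip())
        if match and match.group("name") == name:
            found = float(match.group("value"))
    return found


args = parse_args(["--lr=0.01", "--epochs=5"])
score = 0.5  # placeholder; a real script would compute this during evaluation
line = emit_metric("seg_total", score)
print(parse_metric(f"epoch {args.epochs} done\n{line}\n", "seg_total"))
```

Flushing stdout on each metric line helps a parent process that reads the output incrementally see the value as soon as it is printed.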

## Pitfalls

- **Pruner needs `trial.report()` calls during training.** If the script only reports at the end, the pruner has no intermediate values to prune on. Call `trial.report(val_loss, epoch)` at least every 10 epochs.
- **HyperParameterOptimizer needs the metric name in `Title/Series` format.** A bare metric name such as `seg_total` is interpreted as `title=seg_total, series=seg_total`. If your script calls `report_scalar("Score", "seg_total", v)`, pass `--metric Score/seg_total`.
- **clearml-agent must be running on GPU nodes** before submitting. Verify with `expflow clearml workers` or check the Web UI.
- **`_collect_one_trial` polls every 5s** and waits up to 60 min per trial. If trials are expected to run longer, increase `timeout_minutes`.
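
The `Title/Series` rule from the second pitfall can be illustrated with a short sketch. It mirrors the behavior described above (a bare name is used for both title and series); `split_metric` is a hypothetical helper, not a function from expflow or ClearML.

```python
def split_metric(metric):
    """Split "Title/Series" into (title, series).

    A bare name is used for both parts, matching the pitfall described above.
    Hypothetical helper, not expflow's own code.
    """
    title, sep, series = metric.partition("/")
    if not title or (sep and not series):
        raise ValueError(f"invalid metric name: {metric!r}")
    return (title, series) if sep else (title, title)


print(split_metric("Score/seg_total"))  # ('Score', 'seg_total')
print(split_metric("seg_total"))        # ('seg_total', 'seg_total')
```

Under this reading, passing `--metric seg_total` while the script reports under the title `Score` would point the optimizer at a scalar that is never written.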

## Architecture Reference

Key files in `expflow_pde/`:
- `hpo.py`: 3-mode HPO runner (local/distributed/optimizer)
- `pipeline.py`: ExperimentPipeline class (fast/full modes)
- `cli_pipeline.py`: `pipeline submit` + `pipeline submit-full`
- `cli_optuna.py`: `optuna run` with all three backends

## Related

- `experiment-lifecycle-governance`: PIN, metrics registry, compare-scores, competition rules audit
- `pde-experiment-hyperparameters`: PDEBench-specific hyperparameter reference
- `multi-agent-distributed-experiment-workflow`: Hermes → OpenCode → ClearML

Security Recommendations

Treat this review as incomplete because local artifact reads failed; review metadata.json and the artifact directory before installing.

Capability Assessment

✓ Purpose & Capability

No reviewed artifact evidence showed a purpose-capability mismatch.

✓ Instruction Scope

No reviewed artifact evidence showed hidden or overbroad runtime instructions.

✓ Install Mechanism

No reviewed artifact evidence showed a risky install mechanism.

✓ Credentials

No reviewed artifact evidence showed disproportionate environment access.

✓ Persistence & Privilege

No reviewed artifact evidence showed persistence or privilege abuse.

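How to Use

1. Make sure OpenClaw is installed (locally or via Docker)
2. Enter the install command in the chat box: /install expflow-pipeline-hpo
3. After installation, invoke the Skill by name or trigger it with /expflow-pipeline-hpo
4. Provide the required inputs according to the Skill's parameter description to get structured output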

Version History

v0.5.0

expflow-pipeline-hpo 0.5.0
- Adds robust PDEBench competition pipeline orchestration via expflow with three selectable modes: Full (HPO→Train→Eval), Fast (Train→Eval), and Flexible Skip.
- Integrates distributed hyperparameter optimization (HPO) with pruner support and native ClearML HyperParameterOptimizer.
- CLI supports custom step skipping, dynamic parameter injection, and three HPO execution backends (local, distributed, optimizer).
- Enhances script compatibility requirements, pruner integration details, and error-proofing for distributed workflows.
- Documentation updated with detailed usage, CLI flags, and troubleshooting tips for efficient competition submissions.

Metadata

Slug expflow-pipeline-hpo

Version 0.5.0

License MIT-0

Total installs 0

Current installs 0

Versions 1

FAQ