expflow Pipeline HPO
/install expflow-pipeline-hpo
\r \r
expflow PDEBench Pipeline & HPO\r
\r Orchestrate experiment workflows for the AI4S PDE competition using expflow.\r Three modes for three competition phases.\r \r
Triggers\r
\r
- User says "run HPO", "submit pipeline", "distributed experiment"\r
- User says "competition sprint" or "fast iterate"\r
- User asks about automating the train→eval→submit loop\r
- User mentions needing to find best hyperparams\r \r
Installation\r
\r
pip install "expflow-pde[pipeline]"\r
```\r
\r
## Available Pipeline Modes\r
\r
Three pipeline modes, each mapped to a CLI command:\r
\r
### Mode A — Full (HPO → Train → Eval)\r
\r
For the **exploration phase** of a competition task. Optuna finds best params\r
via distributed clearml-agent trials, trains with best, then evaluates.\r
\r
```bash\r
expflow pipeline submit-full train_task1.py \\r
--queue default \\r
--trials 50 --parallel 4 \\r
--eval-script eval_task1.py \\r
--metric seg_total --direction maximize\r
```\r
\r
Flags used:\r
- `--trials N`: total HPO trials\r
- `--parallel M`: max concurrent trials (use GPU node count)\r
- `--metric`: objective metric name prefixed `METRIC:` in script stdout\r
- `--pruner hyperband|median|percentile`: early-stop poor trials\r
- `--study-name`: Optuna study name (auto if omitted; persists to SQLite)\r
- `--skip hpo --skip eval`: run train only within full skeleton\r
\r
### Mode B — Fast (Train → Eval)\r
\r
For the **competition sprint** phase. You already know best params. Skip HPO,\r
run directly with fixed args.\r
\r
```bash\r
expflow pipeline submit train_task1.py \\r
--queue default \\r
--train-param lr=0.001 --train-param epochs=80 \\r
--eval-script eval_task1.py \\r
--eval-param sub_step=5\r
```\r
\r
Flags:\r
- `--skip eval`: train-only (just submit checkpoint)\r
- `--train-param key=val`: injected as `--key=val` to training script\r
- `--eval-param key=val`: injected as `--key=val` to eval script\r
\r
### Mode C — Flexible Skip\r
\r
Override step inclusion on either mode:\r
\r
```bash\r
expflow pipeline submit-full train_task1.py \\r
--skip hpo --skip eval # = train only\r
expflow pipeline submit-full train_task1.py \\r
--skip train --skip eval # = HPO only\r
```\r
\r
## HPO: Three Execution Modes\r
\r
HPO (`expflow optuna run`) has three backends:\r
\r
| Mode | Flag | Description | Best for |\r
|------|------|-------------|----------|\r
| Local | (default) | subprocess serial on CPU | ≤20 trials, quick test |\r
| Distributed | `--distributed` | ask/tell + clearml Task clone| Multi-GPU, custom control|\r
| Optimizer | `--optimizer -O` | Clearml `HyperParameterOptimizer` | Production, 50-200+ trials |\r
\r
### Key flags across all HPO modes:\r
- `--pruner hyperband|median|percentile|none`: ASHA pruner saves ~40% GPU time\r
- `--metric \x3Cname>`: reads `METRIC:\x3Cname>=\x3Cvalue>` from script stdout\r
- `--direction maximize|minimize`\r
- `--timeout \x3Cmin>`: safety cutoff\r
\r
## Script Requirements\r
\r
The training/eval script must:\r
1. Accept hyperparams as `--key=value` CLI arguments\r
2. Output `METRIC:\x3Cname>=\x3Cvalue>` to stdout for objective capture (local mode)\r
3. Report clearml scalars for distributed/optimizer mode:\r
```python\r
Task.current_task().report_scalar("Score", "seg_total", value, iteration=epoch)\r
```\r
\r
## Pitfalls\r
\r
- **Pruner needs `trial.report()` calls during training.** If the script only reports at the end, the pruner has nothing to prune on. Call `trial.report(val_loss, epoch)` at least every 10 epochs.\r
- **HyperParameterOptimizer needs the metric name in `Title/Series` format.** If your metric is `seg_total`, it becomes `title=seg_total, series=seg_total`. If your clearml report_scalar is `report_scalar("Score", "seg_total", v)`, pass `--metric Score/seg_total`.\r
- **Clearml-agent must be running on GPU nodes** before submitting. Verify with `expflow clearml workers` or check Web UI.\r
- **`_collect_one_trial` polls every 5s** — waits up to 60min per trial. If trials are expected to run longer, increase `timeout_minutes`.\r
\r
## Architecture Reference\r
\r
Key files in `expflow_pde/`:\r
- `hpo.py` — 3-mode HPO runner (local/distributed/optimizer)\r
- `pipeline.py` — ExperimentPipeline class (fast/full modes)\r
- `cli_pipeline.py` — `pipeline submit` + `pipeline submit-full`\r
- `cli_optuna.py` — `optuna run` with all three backends\r
\r
## Related\r
\r
- `experiment-lifecycle-governance` — PIN, metrics registry, compare-scores, competition rules audit\r
- `pde-experiment-hyperparameters` — PDEBench-specific hyperparameter reference\r
- `multi-agent-distributed-experiment-workflow` — Hermes → OpenCode → clearml\r
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat:
/install expflow-pipeline-hpo - After installation, invoke the skill by name or use
/expflow-pipeline-hpo - Provide required inputs per the skill's parameter spec and get structured output
What is expflow Pipeline HPO?
PDEBench competition workflow orchestration with expflow — three pipeline modes (full/fast/skip), distributed HPO, pruner integration, and ClearML HyperParam... It is an AI Agent Skill for Claude Code / OpenClaw, with 49 downloads so far.
How do I install expflow Pipeline HPO?
Run "/install expflow-pipeline-hpo" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is expflow Pipeline HPO free?
Yes, expflow Pipeline HPO is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does expflow Pipeline HPO support?
expflow Pipeline HPO is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).
Who created expflow Pipeline HPO?
It is built and maintained by diamond2nv (@diamond2nv); the current version is v0.5.0.