Experiment Lifecycle Governance
Overview
Governance layer for experiment workflows: protect destructive operations, standardize metrics, rank experiments with gating, and audit against competition rules.
Three core sub-systems (section 4 adds a competition-rules audit on top of them):
- PIN Protection — 4-digit PIN guard for cancel/stop/delete operations
- Metrics Registry — Standardized metric definitions with thresholds
- Compare-Scores — Multi-model ranking with gating
Installation
```bash
pip install expflow-pde
```
1. PIN Protection Pattern
Architecture
```text
~/.expflow/pin.hash           # SHA-256 hash of 4-digit PIN (never plaintext)
~/.expflow/experiments.jsonl  # Experiment registry (each line = JSON record)
```
Module Design
```python
# pin.py: 4 components
#   1. init_pin(pin: str) -> str           # validate + hash + write; returns the hash
#   2. verify_pin(pin: str) -> bool        # hash comparison
#   3. pin_is_set() -> bool                # check whether a PIN is configured
#   4. guard(action_description) -> bool   # interactive prompt
import hashlib


def _hash_pin(pin: str) -> str:
    """SHA-256 hex digest of the PIN; the raw PIN is never stored."""
    return hashlib.sha256(pin.encode()).hexdigest()


def _validate_pin(pin: str) -> None:
    """Require exactly 4 ASCII digits (isdigit() alone also accepts e.g. '²')."""
    if len(pin) != 4 or not (pin.isascii() and pin.isdigit()):
        raise ValueError("PIN must be exactly 4 digits (0-9)")
```
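The remaining components can be sketched as follows, assuming the hash lives in a one-line file at the path shown under Architecture. The `path` parameter, the injectable `prompt` callable, the three-attempt limit, and the "no PIN set means allow" behaviour are hypothetical choices made for illustration and testability, not the package's confirmed API. `hmac.compare_digest` keeps the comparison constant-time. Since a 4-digit PIN has only 10,000 possible values, the hash keeps the PIN out of plain sight rather than resisting brute force; the guard is mainly protection against accidental destructive commands.

```python
import getpass
import hashlib
import hmac
from pathlib import Path
from typing import Callable

# Hypothetical default mirroring the "Architecture" layout.
DEFAULT_PIN_PATH = Path.home() / ".expflow" / "pin.hash"


def _hash_pin(pin: str) -> str:
    return hashlib.sha256(pin.encode()).hexdigest()


def _validate_pin(pin: str) -> None:
    # isascii() also rejects Unicode digits such as "²", which isdigit() accepts.
    if len(pin) != 4 or not (pin.isascii() and pin.isdigit()):
        raise ValueError("PIN must be exactly 4 digits (0-9)")


def init_pin(pin: str, path: Path = DEFAULT_PIN_PATH) -> str:
    """Validate the PIN, write only its hash to `path`, and return the hash."""
    _validate_pin(pin)
    digest = _hash_pin(pin)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(digest + "\n")
    path.chmod(0o600)  # owner-only read/write on POSIX systems
    return digest


def pin_is_set(path: Path = DEFAULT_PIN_PATH) -> bool:
    """True if the hash file exists and is non-empty."""
    return path.is_file() and path.read_text().strip() != ""


def verify_pin(pin: str, path: Path = DEFAULT_PIN_PATH) -> bool:
    """Hash `pin` and compare it to the stored hash in constant time."""
    if not pin_is_set(path):
        return False
    stored = path.read_text().strip()
    return hmac.compare_digest(_hash_pin(pin), stored)


def guard(
    action_description: str,
    path: Path = DEFAULT_PIN_PATH,
    prompt: Callable[[str], str] = getpass.getpass,
    max_attempts: int = 3,
) -> bool:
    """Ask for the PIN before a destructive action; 'q' or too many misses aborts."""
    if not pin_is_set(path):
        return True  # assumption: with no PIN configured there is nothing to guard
    for _ in range(max_attempts):
        answer = prompt(f"Enter PIN to {action_description} (q to quit): ").strip()
        if answer.lower() == "q":
            return False
        if verify_pin(answer, path):
            return True
    return False
```

Passing a temporary `path` and a lambda as `prompt` lets tests exercise `guard()` without a terminal.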
CLI Commands
```bash
expflow pin init 1234           # Set PIN (SHA-256 hash stored)
expflow pin check               # Interactive verify
expflow pin clear [--force]     # Remove PIN
expflow pin status              # Show whether a PIN is active

# Guarded commands (require PIN unless --force):
expflow run cancel <id>          # Interactive PIN prompt
expflow run cancel <id> --force  # Skip PIN
```
2. Standardized Metrics Registry
Structure
```python
STANDARD_METRICS = {
    "seg_total": {
        "type": "scalar", "group": "Score",
        "higher_is_better": True,
        "description": "Total segment score (primary competition metric)",
    },
    "pde_mean": {
        "type": "scalar", "group": "PDE",
        "higher_is_better": False,
        "threshold": 18.09,  # Competition gate
    },
    "train_time_min": {
        "type": "scalar", "group": "Time",
        "higher_is_better": False,
        "threshold": 60,  # Competition limit
    },
    # ... 13 total metrics across Score/Loss/PDE/Time/Model/Training groups
}
```
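The registry pairs `higher_is_better` with an optional `threshold`. A minimal sketch of how a checker might combine the two, assuming a strict comparison in the "better" direction (consistent with the `< 18.09` and `< 60` conditions in the audit table under section 4). `passes_threshold` is a hypothetical helper, not part of the documented API, and the registry below is a trimmed copy.

```python
# Trimmed copy of the registry (3 of the 13 metrics).
STANDARD_METRICS = {
    "seg_total": {"type": "scalar", "group": "Score", "higher_is_better": True},
    "pde_mean": {"type": "scalar", "group": "PDE", "higher_is_better": False, "threshold": 18.09},
    "train_time_min": {"type": "scalar", "group": "Time", "higher_is_better": False, "threshold": 60},
}


def passes_threshold(name: str, value: float) -> bool:
    """Hypothetical helper: True if `value` clears the metric's threshold.

    Metrics without a threshold (e.g. seg_total) are reported but never gated.
    """
    info = STANDARD_METRICS[name]
    threshold = info.get("threshold")
    if threshold is None:
        return True
    if info["higher_is_better"]:
        return value > threshold
    return value < threshold


print(passes_threshold("pde_mean", 15.0))      # True
print(passes_threshold("train_time_min", 60))  # False: the limit is strict
print(passes_threshold("seg_total", 57.09))    # True: no gate
```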
report_standard()
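```python
from __future__ import annotations

from typing import Any


def report_standard(task: Any | None = None, **kwargs: float) -> dict[str, float]:
    """Validate names against STANDARD_METRICS (defined above); optionally log each value."""
    reported: dict[str, float] = {}
    for name, value in kwargs.items():
        info = STANDARD_METRICS.get(name)
        if info is None:
            raise ValueError(f"Unknown metric '{name}'...")
        reported[name] = float(value)
        if task is not None:
            # Metrics are grouped under their registry group (e.g. "Score", "PDE").
            task.report_scalar(title=info["group"], series=name, value=float(value), iteration=0)
    return reported
```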
3. Compare-Scores: Multi-Model Ranking
CLI
```bash
expflow clearml compare-scores \
  --project PDEBench --tags task1 \
  --sort-by pde_mean --ascending \
  --gate pde_mean:lt:18.09 --gate train_time_min:lt:60
```
Gate Format
Gates use metric:op:value triplets:
- pde_mean:lt:18.09 (PDE mean < 18.09)
- train_time_min:le:60 (training time ≤ 60 min)
- seg_total:ge:50 (score ≥ 50)
Operators: lt, le, gt, ge.
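A minimal sketch of the gate-then-rank flow behind `compare-scores`, assuming each run has already been reduced to a flat `{metric: value}` dict. `parse_gate`, `apply_gate`, and `rank` are illustrative names; the test list mentions an internal `_apply_gate`, but its real signature is not documented here. Treating a run that lacks a gated metric as failing is also an assumption.

```python
from __future__ import annotations

import operator
from typing import Callable

# The four documented operators mapped onto Python comparisons.
OPS: dict[str, Callable[[float, float], bool]] = {
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
}


def parse_gate(spec: str) -> tuple[str, str, float]:
    """Split 'metric:op:value' (e.g. 'pde_mean:lt:18.09') into its parts."""
    metric, op, value = spec.rsplit(":", 2)
    if op not in OPS:
        raise ValueError(f"Unknown operator '{op}' in gate '{spec}' (use lt/le/gt/ge)")
    return metric, op, float(value)


def apply_gate(metrics: dict[str, float], gate: tuple[str, str, float]) -> bool:
    """True if the run reports the metric and it satisfies the comparison."""
    metric, op, threshold = gate
    return metric in metrics and OPS[op](metrics[metric], threshold)


def rank(
    runs: dict[str, dict[str, float]],
    sort_by: str,
    gates: list[str],
    ascending: bool = False,
) -> list[str]:
    """Drop runs that fail any gate, then sort the survivors by `sort_by`."""
    parsed = [parse_gate(g) for g in gates]
    passing = [
        run_id
        for run_id, m in runs.items()
        if sort_by in m and all(apply_gate(m, g) for g in parsed)
    ]
    return sorted(passing, key=lambda r: runs[r][sort_by], reverse=not ascending)


# Made-up values for illustration.
runs = {
    "exp-001": {"pde_mean": 15.0, "train_time_min": 45.5},
    "exp-002": {"pde_mean": 12.3, "train_time_min": 75.0},
    "exp-003": {"pde_mean": 17.2, "train_time_min": 58.0},
}
print(rank(runs, "pde_mean", ["pde_mean:lt:18.09", "train_time_min:lt:60"], ascending=True))
# ['exp-001', 'exp-003']  (exp-002 fails the time gate)
```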
4. Competition Rules Audit
CLI
```bash
expflow audit validate exp-001 --competition-rules --task-id abc123
```
Python API
```python
from expflow_pde.audit import validate_competition_rules

result = validate_competition_rules(
    task_metrics={"seg_total": 57.09, "pde_mean": 15.0, "train_time_min": 45.5},
    task_params={"Args/--sub_step": "5"},
)
print(f"All pass: {result['all_pass']}")
```
Validation Checks
| Check | Condition | Details |
|---|---|---|
| seg_total | Primary competition score (no gating) | Reported, not gated |
| pde_mean | Must be < 18.09 | Threshold from STANDARD_METRICS |
| train_time_min | Must be < 60 | Threshold from STANDARD_METRICS |
| sub_step parameter | Must exist and be > 0 | Parameter name matched case-insensitively |
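The four checks in the table can be sketched as below, assuming flat `task_metrics` and `task_params` dicts like those in the Python API example. `check_rules` is a hypothetical stand-in, not the real `validate_competition_rules`: the limits are hard-coded copies of the registry thresholds, the "key contains `sub_step`" match is an assumed reading of "case-insensitive", and only the `all_pass` key of the result is confirmed by the example above.

```python
from __future__ import annotations

from typing import Any

# Copies of the STANDARD_METRICS thresholds, both strict "<" limits.
LIMITS = {"pde_mean": 18.09, "train_time_min": 60.0}


def _find_sub_step(params: dict[str, Any]) -> float | None:
    """Case-insensitive search for a parameter whose name contains 'sub_step'."""
    for key, value in params.items():
        if "sub_step" in key.lower():
            try:
                return float(value)
            except (TypeError, ValueError):
                return None
    return None


def check_rules(task_metrics: dict[str, float], task_params: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical audit: gate pde_mean, train_time_min and sub_step; report seg_total."""
    checks: dict[str, bool] = {}
    for name, limit in LIMITS.items():
        value = task_metrics.get(name)
        checks[name] = value is not None and value < limit
    sub_step = _find_sub_step(task_params)
    checks["sub_step"] = sub_step is not None and sub_step > 0
    return {
        "seg_total": task_metrics.get("seg_total"),  # reported only, never gated
        "checks": checks,
        "all_pass": all(checks.values()),
    }


result = check_rules(
    task_metrics={"seg_total": 57.09, "pde_mean": 15.0, "train_time_min": 45.5},
    task_params={"Args/--sub_step": "5"},
)
print(f"All pass: {result['all_pass']}")  # All pass: True
```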
Testing Patterns
PIN Tests (36 tests)
- Hash consistency: same input → same hash
- Validation rejects: wrong length, non-numeric, empty
- Init → file exists with correct hash
- Guard (mocked prompt): correct PIN → True, quit → False
Metrics Tests
- Registry structure: each metric has type, group, higher_is_better
- report_standard: returns dict of reported metrics
Compare Tests
- _apply_gate: all 4 operators (lt/le/gt/ge) with passing and failing cases
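A condensed `unittest` sketch of the patterns above. To stay self-contained it inlines `_hash_pin` and `_validate_pin` and defines a hypothetical `_apply_gate(value, op, threshold)`; the real `_apply_gate` signature is not documented here, so the calls would need adapting to the actual module.

```python
import hashlib
import operator
import unittest


def _hash_pin(pin: str) -> str:
    return hashlib.sha256(pin.encode()).hexdigest()


def _validate_pin(pin: str) -> None:
    if len(pin) != 4 or not (pin.isascii() and pin.isdigit()):
        raise ValueError("PIN must be exactly 4 digits (0-9)")


def _apply_gate(value: float, op: str, threshold: float) -> bool:
    """Hypothetical signature, for illustration only."""
    ops = {"lt": operator.lt, "le": operator.le, "gt": operator.gt, "ge": operator.ge}
    return ops[op](value, threshold)


class PinTests(unittest.TestCase):
    def test_hash_is_deterministic(self):
        self.assertEqual(_hash_pin("1234"), _hash_pin("1234"))
        self.assertNotEqual(_hash_pin("1234"), _hash_pin("4321"))

    def test_validation_rejects_bad_input(self):
        for bad in ["123", "12345", "12a4", "", "²³⁴⁵"]:  # length, non-numeric, empty
            with self.subTest(pin=bad):
                with self.assertRaises(ValueError):
                    _validate_pin(bad)


class GateTests(unittest.TestCase):
    def test_all_operators_pass_and_fail(self):
        cases = [
            ("lt", 17.0, 18.09, True), ("lt", 18.09, 18.09, False),
            ("le", 60.0, 60.0, True), ("le", 60.5, 60.0, False),
            ("gt", 51.0, 50.0, True), ("gt", 50.0, 50.0, False),
            ("ge", 50.0, 50.0, True), ("ge", 49.9, 50.0, False),
        ]
        for op, value, threshold, expected in cases:
            with self.subTest(op=op, value=value):
                self.assertEqual(_apply_gate(value, op, threshold), expected)
```

Saved as a `test_*.py` file, this runs under `python -m unittest`.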
Pitfalls
1. YAML/Env vs File Storage for PIN
PIN hash must NOT go into config.yaml (risk of git commit). Use ~/.expflow/pin.hash.
Precedence: pin.hash file > .env EXPFLOW_PIN_HASH > config.yaml pin.hash.
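A sketch of that lookup order, assuming `.env` has already been loaded into the process environment (for example by a dotenv loader) and that the caller has parsed `config.yaml` into a dict, so no YAML library is needed here. `resolve_pin_hash` and the nested `{"pin": {"hash": ...}}` layout are hypothetical readings of "config.yaml pin.hash".

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path
from typing import Any


def resolve_pin_hash(
    pin_file: Path,
    env: Mapping[str, str] = os.environ,
    config: Mapping[str, Any] | None = None,
) -> str | None:
    """Return the PIN hash: pin.hash file > EXPFLOW_PIN_HASH > config.yaml pin.hash."""
    # 1. Dedicated file: preferred, because it lives outside the repository.
    if pin_file.is_file():
        value = pin_file.read_text().strip()
        if value:
            return value
    # 2. Environment variable (e.g. populated from .env).
    value = env.get("EXPFLOW_PIN_HASH", "").strip()
    if value:
        return value
    # 3. Fallback: nested pin.hash key in the parsed config.yaml.
    section = (config or {}).get("pin")
    if isinstance(section, Mapping):
        value = str(section.get("hash") or "").strip()
        if value:
            return value
    return None
```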
2. ClearML get_last_scalar_metrics() API
Returns nested dict: {"Score": {"seg_total": {"last": 57.09, ...}}, ...}. Flatten to {"seg_total": 57.09} for compare_scores.
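A minimal flattening sketch, assuming only the `{title: {series: {"last": value, ...}}}` shape described above. `flatten_last_metrics` is a hypothetical name; it keeps each series' `last` value, ignores the other per-series keys, and relies on series names being unique across groups (true for the registry names), since a repeated series name would be overwritten by the later group.

```python
from __future__ import annotations

from typing import Any


def flatten_last_metrics(nested: dict[str, dict[str, dict[str, Any]]]) -> dict[str, float]:
    """Turn {title: {series: {"last": v, ...}}} into {series: v}."""
    flat: dict[str, float] = {}
    for series_map in nested.values():
        for series, stats in series_map.items():
            if "last" in stats:  # skip series without a last value
                flat[series] = float(stats["last"])
    return flat


nested = {
    "Score": {"seg_total": {"last": 57.09}},
    "PDE": {"pde_mean": {"last": 15.0}},
}
print(flatten_last_metrics(nested))  # {'seg_total': 57.09, 'pde_mean': 15.0}
```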
3. --force Flag for Script Calls
Always pass --force / -f when calling PIN-guarded commands from CI or automation scripts, since no one is there to answer the PIN prompt.
4. Interactive getpass vs Non-Interactive
getpass.getpass() works in terminals but fails in piped commands, CI, or subagent calls. Always provide --pin or --force as alternative paths.
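A sketch of those alternative paths, assuming a hypothetical `--pin` option next to `--force`/`-f` (the flag definitions are not shown in the source). `confirm_destructive` is an illustrative name: it takes the stored hash directly and only falls back to `getpass` when stdin is a TTY, so pipes, CI jobs and subagents get a clear refusal instead of a blocked prompt.

```python
from __future__ import annotations

import argparse
import getpass
import hashlib
import hmac
import sys


def confirm_destructive(
    args: argparse.Namespace, stored_hash: str | None, interactive: bool | None = None
) -> bool:
    """Decide whether a PIN-guarded command may proceed."""
    if args.force or stored_hash is None:
        return True  # explicit override, or no PIN configured (assumption)
    pin = args.pin
    if pin is None:
        if interactive is None:
            interactive = sys.stdin.isatty()
        if not interactive:
            # No TTY: refuse rather than block on getpass.
            print("PIN required: pass --pin or --force", file=sys.stderr)
            return False
        pin = getpass.getpass("PIN: ")
    return hmac.compare_digest(hashlib.sha256(pin.encode()).hexdigest(), stored_hash)


parser = argparse.ArgumentParser(prog="expflow-run-cancel")  # hypothetical parser
parser.add_argument("id")
parser.add_argument("-f", "--force", action="store_true", help="skip the PIN check")
parser.add_argument("--pin", help="PIN for non-interactive use (hypothetical flag)")

stored = hashlib.sha256(b"1234").hexdigest()  # example stored hash
print(confirm_destructive(parser.parse_args(["exp-001", "--pin", "1234"]), stored))  # True
print(confirm_destructive(parser.parse_args(["exp-001", "--force"]), stored))  # True
```

A PIN passed via `--pin` can end up in shell history and process listings, so `--force` inside a locked-down CI job may be the cleaner choice.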
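Usage
- Make sure OpenClaw is installed (local or Docker deployment).
- In the chat box, enter the install command: /install experiment-lifecycle-governance
- Once installed, call the Skill by name or trigger it with /experiment-lifecycle-governance.
- Provide the inputs described in the Skill's parameter documentation to get structured output.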
What is Experiment Lifecycle Governance?
Add governance to experiment workflows: PIN-protected destructive ops, standardized metrics registry with thresholds, compare-scores ranking with gating, an... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 36 times so far.
How do I install Experiment Lifecycle Governance?
Run the command /install experiment-lifecycle-governance in the OpenClaw or Claude Code chat box for a one-step installation; no extra configuration is needed.
Is Experiment Lifecycle Governance free?
Yes. Experiment Lifecycle Governance is completely free under the MIT-0 license and can be freely downloaded, installed, and used.
Which platforms does Experiment Lifecycle Governance support?
Experiment Lifecycle Governance is cross-platform and works in any environment where OpenClaw / Claude Code is deployed.
Who developed Experiment Lifecycle Governance?
It is developed and maintained by diamond2nv (@diamond2nv); the current version is v0.5.0.