Experiment Lifecycle Governance

by diamond2nv · GitHub ↗ · v0.5.0 · MIT-0
cross-platform · Security Clean · 36 downloads · 0 stars · 0 active installs · 1 version
Install in OpenClaw
/install experiment-lifecycle-governance
Description
Add governance to experiment workflows — PIN-protected destructive ops, standardized metrics registry with thresholds, compare-scores ranking with gating, an...
README (SKILL.md)

Experiment Lifecycle Governance

Overview

Governance layer for experiment workflows: protect destructive operations, standardize metrics, rank experiments with gating, and audit against competition rules.

Three sub-systems:

  1. PIN Protection — 4-digit PIN guard for cancel/stop/delete operations
  2. Metrics Registry — Standardized metric definitions with thresholds
  3. Compare-Scores — Multi-model ranking with gating

Installation

pip install expflow-pde

1. PIN Protection Pattern

Architecture

~/.expflow/pin.hash          # SHA-256 hash of 4-digit PIN (never plaintext)
~/.expflow/experiments.jsonl # Experiment registry (each line = JSON record)

Module Design

# pin.py: 4 components
# 1. init_pin(pin: str) -> hash          # Validate + hash + write
# 2. verify_pin(pin: str) -> bool         # Hash comparison
# 3. pin_is_set() -> bool                 # Check if PIN configured
# 4. guard(action_description) -> bool    # Interactive prompt

import hashlib

# SHA-256 hash; never store the raw PIN
def _hash_pin(pin: str) -> str:
    return hashlib.sha256(pin.encode()).hexdigest()

# Validate exactly 4 ASCII digits. isascii() rejects Unicode digits
# (for example superscripts) that str.isdigit() alone would accept.
def _validate_pin(pin: str) -> None:
    if not (pin.isascii() and pin.isdigit()) or len(pin) != 4:
        raise ValueError("PIN must be exactly 4 digits (0-9)")
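The module outline lists init_pin, verify_pin, and pin_is_set without showing their bodies. A minimal sketch of how they might fit together follows. It assumes the hash file path is passed in explicitly (the pin_file parameter is hypothetical, added so the example can run against a temporary directory) and it uses hmac.compare_digest for the comparison, which is a choice made for this sketch rather than something the source confirms.

```python
import hashlib
import hmac
from pathlib import Path


def _hash_pin(pin: str) -> str:
    return hashlib.sha256(pin.encode()).hexdigest()


def _validate_pin(pin: str) -> None:
    # Exactly 4 ASCII digits; isascii() also rejects non-ASCII digit characters.
    if not (pin.isascii() and pin.isdigit()) or len(pin) != 4:
        raise ValueError("PIN must be exactly 4 digits (0-9)")


def init_pin(pin: str, pin_file: Path) -> str:
    """Validate the PIN, hash it, and write only the hash to pin_file."""
    _validate_pin(pin)
    digest = _hash_pin(pin)
    pin_file.parent.mkdir(parents=True, exist_ok=True)
    pin_file.write_text(digest)
    return digest


def pin_is_set(pin_file: Path) -> bool:
    """True when the hash file exists and is not empty."""
    return pin_file.is_file() and bool(pin_file.read_text().strip())


def verify_pin(pin: str, pin_file: Path) -> bool:
    """Hash the candidate PIN and compare it with the stored hash."""
    if not pin_is_set(pin_file):
        return False
    stored = pin_file.read_text().strip()
    return hmac.compare_digest(_hash_pin(pin), stored)
```

In real use pin_file would presumably be Path.home() / ".expflow" / "pin.hash", matching the layout under Architecture.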

CLI Commands

expflow pin init 1234          # Set PIN (SHA-256 stored)
expflow pin check              # Interactive verify
expflow pin clear [--force]    # Remove PIN
expflow pin status             # Show if active

# Guarded commands (require PIN unless --force):
expflow run cancel <id>            # Interactive PIN prompt
expflow run cancel <id> --force    # Skip PIN

2. Standardized Metrics Registry

Structure

STANDARD_METRICS = {
    "seg_total": {
        "type": "scalar", "group": "Score",
        "higher_is_better": True,
        "description": "Total segment score (primary competition metric)",
    },
    "pde_mean": {
        "type": "scalar", "group": "PDE",
        "higher_is_better": False,
        "threshold": 18.09,  # Competition gate
    },
    "train_time_min": {
        "type": "scalar", "group": "Time",
        "higher_is_better": False,
        "threshold": 60,  # Competition limit
    },
    # ... 13 total metrics across Score/Loss/PDE/Time/Model/Training groups
}
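To illustrate how the threshold and higher_is_better fields could be combined, here is a small sketch. The helper passes_threshold is hypothetical and not part of the package as described. It assumes a strict comparison in both directions: strict "less than" matches the competition rules listed later in this README, while strict "greater than" for higher-is-better metrics is only a guess, since none of the metrics shown has both a threshold and higher_is_better set to True.

```python
from __future__ import annotations

# Trimmed copy of the three registry entries shown above.
STANDARD_METRICS = {
    "seg_total": {"type": "scalar", "group": "Score", "higher_is_better": True},
    "pde_mean": {"type": "scalar", "group": "PDE",
                 "higher_is_better": False, "threshold": 18.09},
    "train_time_min": {"type": "scalar", "group": "Time",
                       "higher_is_better": False, "threshold": 60},
}


def passes_threshold(name: str, value: float) -> bool | None:
    """Hypothetical helper: None means the metric carries no threshold."""
    info = STANDARD_METRICS[name]
    threshold = info.get("threshold")
    if threshold is None:
        return None
    if info["higher_is_better"]:
        return value > threshold  # assumed strict, not confirmed by the source
    return value < threshold


print(passes_threshold("pde_mean", 15.0))    # True
print(passes_threshold("seg_total", 57.09))  # None (reported, not gated)
```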

report_standard()

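from __future__ import annotations  # lets "Any | None" parse on Python < 3.10

from typing import Any

def report_standard(task: Any | None = None, **kwargs: float) -> dict[str, float]:
    reported = {}
    for name, value in kwargs.items():
        # Only metrics defined in the registry may be reported
        info = STANDARD_METRICS.get(name)
        if info is None:
            raise ValueError(f"Unknown metric '{name}'...")
        reported[name] = float(value)
        # Forward to the task when one is given, grouped by the metric's group
        if task is not None:
            task.report_scalar(title=info["group"], series=name, value=float(value), iteration=0)
    return reported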

3. Compare-Scores: Multi-Model Ranking

CLI

expflow clearml compare-scores \
    --project PDEBench --tags task1 \
    --sort-by pde_mean --ascending \
    --gate pde_mean:lt:18.09 --gate train_time_min:lt:60

Gate Format

Gates use metric:op:value triplets:

  • pde_mean:lt:18.09 — PDE mean < 18.09
  • train_time_min:le:60 — Training time ≤ 60 min
  • seg_total:ge:50 — Score ≥ 50

Operators: lt, le, gt, ge.
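The triplet format maps naturally onto Python's operator module. The sketch below shows one way to parse and apply gates, then rank the experiments that pass. The names Gate, parse_gate, apply_gate, and rank are hypothetical, the experiment data is made up for illustration, and treating a missing metric as a failed gate is an assumption of this sketch.

```python
from __future__ import annotations

import operator
from typing import Callable, NamedTuple

_OPS: dict[str, Callable[[float, float], bool]] = {
    "lt": operator.lt, "le": operator.le,
    "gt": operator.gt, "ge": operator.ge,
}


class Gate(NamedTuple):
    metric: str
    op: str
    value: float


def parse_gate(spec: str) -> Gate:
    """Parse a 'metric:op:value' string into a Gate."""
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError(f"Gate must be metric:op:value, got {spec!r}")
    metric, op, raw = parts
    if op not in _OPS:
        raise ValueError(f"Unknown operator {op!r}; expected one of {sorted(_OPS)}")
    return Gate(metric, op, float(raw))


def apply_gate(gate: Gate, metrics: dict[str, float]) -> bool:
    """A missing metric fails the gate (an assumption of this sketch)."""
    if gate.metric not in metrics:
        return False
    return _OPS[gate.op](metrics[gate.metric], gate.value)


def rank(experiments: dict[str, dict[str, float]], gates: list[Gate],
         sort_by: str, ascending: bool = True) -> list[str]:
    """Keep experiments passing every gate, then sort them by one metric."""
    passing = [name for name, m in experiments.items()
               if all(apply_gate(g, m) for g in gates)]
    return sorted(passing, key=lambda n: experiments[n][sort_by],
                  reverse=not ascending)


# Illustrative data only, not real results.
experiments = {
    "exp-a": {"pde_mean": 15.0, "train_time_min": 45.5},
    "exp-b": {"pde_mean": 20.0, "train_time_min": 30.0},  # fails the pde_mean gate
    "exp-c": {"pde_mean": 12.0, "train_time_min": 59.0},
}
gates = [parse_gate("pde_mean:lt:18.09"), parse_gate("train_time_min:lt:60")]
print(rank(experiments, gates, sort_by="pde_mean"))  # ['exp-c', 'exp-a']
```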

4. Competition Rules Audit

CLI

expflow audit validate exp-001 --competition-rules --task-id abc123

Python API

from expflow_pde.audit import validate_competition_rules

result = validate_competition_rules(
    task_metrics={"seg_total": 57.09, "pde_mean": 15.0, "train_time_min": 45.5},
    task_params={"Args/--sub_step": "5"},
)
print(f"All pass: {result['all_pass']}")

Validation Checks

| Check | Condition | Details |
|---|---|---|
| seg_total | Primary competition score (no gating) | Reported, not gated |
| pde_mean | Must be < 18.09 | Threshold from STANDARD_METRICS |
| train_time_min | Must be < 60 | Threshold from STANDARD_METRICS |
| sub_step parameter | Must exist and be > 0 | Searches case-insensitive |
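The sub_step check is the least obvious of the four, because parameter keys arrive with prefixes such as "Args/--sub_step". A rough sketch of a case-insensitive lookup follows. The function names are hypothetical, and matching by substring is an assumption, since the source only says the search is case-insensitive.

```python
from __future__ import annotations


def find_sub_step(task_params: dict[str, str]) -> float | None:
    """Return the first parameter whose key contains 'sub_step', ignoring case."""
    for key, raw in task_params.items():
        if "sub_step" in key.lower():
            try:
                return float(raw)
            except (TypeError, ValueError):
                return None  # present but not numeric
    return None


def check_sub_step(task_params: dict[str, str]) -> bool:
    """The parameter must exist and be greater than 0."""
    value = find_sub_step(task_params)
    return value is not None and value > 0


print(check_sub_step({"Args/--sub_step": "5"}))  # True
print(check_sub_step({"Args/--SUB_STEP": "0"}))  # False
```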

Testing Patterns

PIN Tests (36 tests)

  • Hash consistency: same input → same hash
  • Validation rejects: wrong length, non-numeric, empty
  • Init → file exists with correct hash
  • Guard mock: correct → True, quit → False

Metrics Tests

  • Registry structure: each metric has type, group, higher_is_better
  • report_standard: returns dict of reported metrics
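As a rough illustration, the two metrics checks could be written as plain test functions. The block repeats a trimmed registry and the report_standard function from above so that it runs on its own. RecordingTask is a hypothetical stand-in for whatever task object exposes report_scalar; it is not part of the package.

```python
from __future__ import annotations

from typing import Any

STANDARD_METRICS = {
    "seg_total": {"type": "scalar", "group": "Score", "higher_is_better": True},
    "pde_mean": {"type": "scalar", "group": "PDE",
                 "higher_is_better": False, "threshold": 18.09},
}


def report_standard(task: Any | None = None, **kwargs: float) -> dict[str, float]:
    reported = {}
    for name, value in kwargs.items():
        info = STANDARD_METRICS.get(name)
        if info is None:
            raise ValueError(f"Unknown metric '{name}'...")
        reported[name] = float(value)
        if task is not None:
            task.report_scalar(title=info["group"], series=name,
                               value=float(value), iteration=0)
    return reported


class RecordingTask:
    """Hypothetical test double that records report_scalar calls."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str, float, int]] = []

    def report_scalar(self, title: str, series: str, value: float, iteration: int) -> None:
        self.calls.append((title, series, value, iteration))


def test_registry_structure() -> None:
    for info in STANDARD_METRICS.values():
        assert {"type", "group", "higher_is_better"} <= info.keys()


def test_report_standard_returns_reported() -> None:
    task = RecordingTask()
    assert report_standard(task, seg_total=57.09) == {"seg_total": 57.09}
    assert task.calls == [("Score", "seg_total", 57.09, 0)]
```

Both functions follow the pytest naming convention, so a test runner should collect them without extra configuration.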

Compare Tests

  • _apply_gate: all 4 operators (lt/le/gt/ge) with passing and failing cases

Pitfalls

1. YAML/Env vs File Storage for PIN

Do not put the PIN hash in config.yaml, where it risks being committed to git. Store it in ~/.expflow/pin.hash instead. Lookup precedence: pin.hash file > .env EXPFLOW_PIN_HASH > config.yaml pin.hash.

2. get_last_scalar_metrics() ClearML API

Returns nested dict: {"Score": {"seg_total": {"last": 57.09, ...}}, ...}. Flatten to {"seg_total": 57.09} for compare_scores.
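The flattening step might look like the sketch below. The function name flatten_last_scalars is hypothetical. It keeps only the "last" value of each series and drops the group level, so if two groups share a series name the later one wins, which is a simplification of this sketch.

```python
def flatten_last_scalars(
    nested: dict[str, dict[str, dict[str, float]]],
) -> dict[str, float]:
    """Turn {group: {series: {"last": v, ...}}} into {series: v}."""
    flat: dict[str, float] = {}
    for series_map in nested.values():
        for series, stats in series_map.items():
            if "last" in stats:
                flat[series] = stats["last"]
    return flat


nested = {
    "Score": {"seg_total": {"last": 57.09, "min": 10.0, "max": 57.09}},
    "PDE": {"pde_mean": {"last": 15.0, "min": 15.0, "max": 30.0}},
}
print(flatten_last_scalars(nested))  # {'seg_total': 57.09, 'pde_mean': 15.0}
```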

3. --force Flag for Script Calls

In CI and other automated scripts, always pass --force (or -f) to PIN-guarded commands, because nobody is present to answer the interactive prompt.

4. Interactive getpass vs Non-Interactive

getpass.getpass() works in terminals but fails in piped commands, CI, or subagent calls. Always provide --pin or --force as alternative paths.
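One way to provide those alternative paths is to prompt only when standard input is a terminal. The sketch below is an assumption about how such a fallback could be wired, not the package's actual code; obtain_pin and its parameters are hypothetical names standing in for the --pin and --force options.

```python
from __future__ import annotations

import getpass
import sys


def obtain_pin(pin_arg: str | None = None, force: bool = False) -> str | None:
    """Return the PIN to verify, or None when --force skips the check."""
    if force:
        return None                 # caller skips verification entirely
    if pin_arg is not None:
        return pin_arg              # value supplied through --pin
    if not sys.stdin.isatty():
        # Piped input, CI, or a subagent call: do not attempt to prompt.
        raise RuntimeError("No terminal available; pass --pin or --force")
    return getpass.getpass("PIN: ")  # interactive terminal only
```

Checking --force and --pin before touching the terminal keeps the function usable in pipelines, and raising a clear error is preferable to hanging on a prompt no one can answer.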

Usage Guidance
This result has low confidence because the artifact files could not be inspected; rerun the review once workspace file access is working before relying on it for installation decisions.
Capability Assessment
Purpose & Capability
Workspace inspection failed before artifacts could be read, so there is no evidence-backed purpose or capability concern to report.
Instruction Scope
No artifact instructions were available for review, so no scope concern is supported by evidence.
Install Mechanism
No install artifact was available for review, so no install-mechanism concern is supported by evidence.
Credentials
No artifact evidence was available showing disproportionate environment access.
Persistence & Privilege
No artifact evidence was available showing persistence or privilege concerns.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install experiment-lifecycle-governance
  3. After installation, invoke the skill by name or use /experiment-lifecycle-governance
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.5.0
- Initial public release of experiment-lifecycle-governance, providing experiment workflow governance features.
- Adds PIN-protected guard for destructive operations (cancel, stop, delete).
- Introduces standardized metrics registry with thresholds for consistent experiment reporting.
- Supports multi-model experiment ranking and scoring with gating based on metric thresholds.
- Implements an audit tool to validate compliance with competition rules using experiment metrics and parameters.
- Includes detailed CLI and Python API, error handling, edge-case testing, and guidance on secure PIN management.
Metadata
Slug experiment-lifecycle-governance
Version 0.5.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Experiment Lifecycle Governance?

Add governance to experiment workflows — PIN-protected destructive ops, standardized metrics registry with thresholds, compare-scores ranking with gating, an... It is an AI Agent Skill for Claude Code / OpenClaw, with 36 downloads so far.

How do I install Experiment Lifecycle Governance?

Run "/install experiment-lifecycle-governance" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Experiment Lifecycle Governance free?

Yes, Experiment Lifecycle Governance is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Experiment Lifecycle Governance support?

Experiment Lifecycle Governance is cross-platform and runs anywhere OpenClaw or Claude Code is available.

Who created Experiment Lifecycle Governance?

It is built and maintained by diamond2nv (@diamond2nv); the current version is v0.5.0.
