Experiment Lifecycle Governance

Author: diamond2nv · GitHub · v0.5.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 36
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install experiment-lifecycle-governance
Description
Add governance to experiment workflows — PIN-protected destructive ops, standardized metrics registry with thresholds, compare-scores ranking with gating, an...
Usage (SKILL.md)

Experiment Lifecycle Governance

Overview

Governance layer for experiment workflows: protect destructive operations, standardize metrics, rank experiments with gating, and audit against competition rules.

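Four sub-systems, one per numbered section below:

  1. PIN Protection: 4-digit PIN guard for cancel/stop/delete operations
  2. Metrics Registry: Standardized metric definitions with thresholds
  3. Compare-Scores: Multi-model ranking with gating
  4. Competition Rules Audit: Validation of metrics and parameters against competition rules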

Installation

pip install expflow-pde

1. PIN Protection Pattern

Architecture

~/.expflow/pin.hash          # SHA-256 hash of 4-digit PIN (never plaintext)
~/.expflow/experiments.jsonl # Experiment registry (each line = JSON record)

Module Design

import hashlib

# pin.py: 4 components
# 1. init_pin(pin: str) -> hash          # Validate + hash + write
# 2. verify_pin(pin: str) -> bool         # Hash comparison
# 3. pin_is_set() -> bool                 # Check if PIN configured
# 4. guard(action_description) -> bool    # Interactive prompt

# SHA-256 hash; never store the raw PIN
def _hash_pin(pin: str) -> str:
    return hashlib.sha256(pin.encode()).hexdigest()

# Validate exactly 4 ASCII digits (str.isdigit() alone also accepts non-ASCII digits)
def _validate_pin(pin: str) -> None:
    if not (pin.isascii() and pin.isdigit()) or len(pin) != 4:
        raise ValueError("PIN must be exactly 4 digits (0-9)")

CLI Commands

expflow pin init 1234          # Set PIN (SHA-256 stored)
expflow pin check              # Interactive verify
expflow pin clear [--force]    # Remove PIN
expflow pin status             # Show if active

# Guarded commands (require PIN unless --force):
expflow run cancel <id>            # Interactive PIN prompt
expflow run cancel <id> --force    # Skip PIN

2. Standardized Metrics Registry

Structure

STANDARD_METRICS = {
    "seg_total": {
        "type": "scalar", "group": "Score",
        "higher_is_better": True,
        "description": "Total segment score (primary competition metric)",
    },
    "pde_mean": {
        "type": "scalar", "group": "PDE",
        "higher_is_better": False,
        "threshold": 18.09,  # Competition gate
    },
    "train_time_min": {
        "type": "scalar", "group": "Time",
        "higher_is_better": False,
        "threshold": 60,  # Competition limit
    },
    # ... 13 total metrics across Score/Loss/PDE/Time/Model/Training groups
}
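One way the threshold and higher_is_better fields might be combined is sketched below. The helper passes_threshold is hypothetical and not part of the documented API; it assumes a strict comparison and that metrics without a threshold are never gated. The registry is an excerpt with three of the 13 metrics.

```python
from typing import Any

# Excerpt of the registry (three of the 13 metrics), copied from the Structure listing.
STANDARD_METRICS: dict[str, dict[str, Any]] = {
    "seg_total": {"type": "scalar", "group": "Score", "higher_is_better": True},
    "pde_mean": {
        "type": "scalar", "group": "PDE",
        "higher_is_better": False, "threshold": 18.09,
    },
    "train_time_min": {
        "type": "scalar", "group": "Time",
        "higher_is_better": False, "threshold": 60,
    },
}


def passes_threshold(name: str, value: float) -> bool:
    """Hypothetical helper: True if value clears the metric's threshold."""
    info = STANDARD_METRICS[name]
    threshold = info.get("threshold")
    if threshold is None:
        return True  # assumption: no threshold means the metric is not gated
    if info["higher_is_better"]:
        return value > threshold
    return value < threshold  # lower is better, so the value must stay below


print(passes_threshold("pde_mean", 15.0))        # True
print(passes_threshold("train_time_min", 60.0))  # False (strict comparison)
print(passes_threshold("seg_total", 57.09))      # True (no threshold)
```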

report_standard()

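from typing import Any

def report_standard(task: Any | None = None, **kwargs: float) -> dict[str, float]:
    """Check metric names against STANDARD_METRICS and optionally report them to task."""
    reported = {}
    for name, value in kwargs.items():
        info = STANDARD_METRICS.get(name)
        if info is None:
            # Only names present in the registry are accepted
            raise ValueError(f"Unknown metric '{name}'...")
        reported[name] = float(value)
        if task is not None:
            # The metric's group is used as the plot title, the metric name as the series
            task.report_scalar(title=info["group"], series=name, value=float(value), iteration=0)
    return reported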

3. Compare-Scores: Multi-Model Ranking

CLI

expflow clearml compare-scores \
    --project PDEBench --tags task1 \
    --sort-by pde_mean --ascending \
    --gate pde_mean:lt:18.09 --gate train_time_min:lt:60

Gate Format

Gates use metric:op:value triplets:

  • pde_mean:lt:18.09 (PDE mean < 18.09)
  • train_time_min:le:60 (training time ≤ 60 min)
  • seg_total:ge:50 (score ≥ 50)

Operators: lt, le, gt, ge.
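The gate format and the ranking step can be illustrated with a minimal sketch. The names parse_gate, apply_gate, and rank are hypothetical stand-ins (the test list mentions an internal _apply_gate, whose real signature is not shown here), the rule that a missing metric fails its gate is an assumption, and the run values are made up for the example.

```python
import operator
from typing import Callable

# Operator names from the list above, mapped to Python comparisons.
_OPS: dict[str, Callable[[float, float], bool]] = {
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
}


def parse_gate(spec: str) -> tuple[str, str, float]:
    """Split a 'metric:op:value' triplet into its parts."""
    try:
        metric, op, raw = spec.split(":")
    except ValueError:
        raise ValueError(f"Gate must be metric:op:value, got {spec!r}") from None
    if op not in _OPS:
        raise ValueError(f"Unknown operator {op!r}")
    return metric, op, float(raw)


def apply_gate(metrics: dict[str, float], spec: str) -> bool:
    """True if the gated metric is present and satisfies the comparison."""
    metric, op, value = parse_gate(spec)
    if metric not in metrics:
        return False  # assumption: a missing metric fails the gate
    return _OPS[op](metrics[metric], value)


def rank(runs: dict[str, dict[str, float]], gates: list[str],
         sort_by: str, ascending: bool = True) -> list[str]:
    """Drop runs that fail any gate, then order the rest by sort_by."""
    passing = {
        name: m for name, m in runs.items()
        if all(apply_gate(m, g) for g in gates)
    }
    return sorted(passing, key=lambda n: passing[n][sort_by], reverse=not ascending)


# Illustrative values only.
runs = {
    "model_a": {"pde_mean": 15.0, "train_time_min": 45.5},
    "model_b": {"pde_mean": 12.3, "train_time_min": 61.0},  # fails the time gate
    "model_c": {"pde_mean": 17.2, "train_time_min": 30.0},
}
gates = ["pde_mean:lt:18.09", "train_time_min:lt:60"]
print(rank(runs, gates, sort_by="pde_mean"))  # ['model_a', 'model_c']
```

Gating before sorting means a run with the best pde_mean can still be excluded when it breaks another limit, as model_b does here.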

4. Competition Rules Audit

CLI

expflow audit validate exp-001 --competition-rules --task-id abc123

Python API

from expflow_pde.audit import validate_competition_rules

result = validate_competition_rules(
    task_metrics={"seg_total": 57.09, "pde_mean": 15.0, "train_time_min": 45.5},
    task_params={"Args/--sub_step": "5"},
)
print(f"All pass: {result['all_pass']}")

Validation Checks

| Check | Condition | Details |
| --- | --- | --- |
| seg_total | Primary competition score (no gating) | Reported, not gated |
| pde_mean | Must be < 18.09 | Threshold from STANDARD_METRICS |
| train_time_min | Must be < 60 | Threshold from STANDARD_METRICS |
| sub_step parameter | Must exist and be > 0 | Searches case-insensitive |
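The checks in this table could be sketched roughly as below. This is not the body of validate_competition_rules: the helpers find_sub_step and check_rules, the "checks" key in the result, and the substring-based key match are all assumptions. Only the thresholds, the all_pass key, and the sample inputs come from the text above.

```python
from typing import Mapping, Optional


def find_sub_step(task_params: Mapping[str, str]) -> Optional[float]:
    """Return the sub_step value, matching the parameter key case-insensitively.

    Assumption: any key containing 'sub_step' (e.g. 'Args/--sub_step') counts.
    """
    for key, raw in task_params.items():
        if "sub_step" in key.lower():
            try:
                return float(raw)
            except (TypeError, ValueError):
                return None  # present but not numeric
    return None


def check_rules(task_metrics: Mapping[str, float],
                task_params: Mapping[str, str],
                pde_max: float = 18.09,
                time_max: float = 60.0) -> dict:
    """Evaluate the gated checks; seg_total is reported but not gated."""
    sub_step = find_sub_step(task_params)
    missing = float("inf")  # a missing metric fails its check
    checks = {
        "pde_mean": task_metrics.get("pde_mean", missing) < pde_max,
        "train_time_min": task_metrics.get("train_time_min", missing) < time_max,
        "sub_step": sub_step is not None and sub_step > 0,
    }
    return {"checks": checks, "all_pass": all(checks.values())}


result = check_rules(
    task_metrics={"seg_total": 57.09, "pde_mean": 15.0, "train_time_min": 45.5},
    task_params={"Args/--sub_step": "5"},
)
print(f"All pass: {result['all_pass']}")  # All pass: True
```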

Testing Patterns

PIN Tests (36 tests)

  • Hash consistency: same input → same hash
  • Validation rejects: wrong length, non-numeric, empty
  • Init → file exists with correct hash
  • Guard mock: correct → True, quit → False

Metrics Tests

  • Registry structure: each metric has type, group, higher_is_better
  • report_standard: returns dict of reported metrics

Compare Tests

  • _apply_gate: all 4 operators (lt/le/gt/ge) with passing and failing cases

Pitfalls

1. YAML/Env vs File Storage for PIN

Do not store the PIN hash in config.yaml, where it risks being committed to git. Store it in ~/.expflow/pin.hash instead. If more than one source is present, the precedence is: pin.hash file > .env EXPFLOW_PIN_HASH > config.yaml pin.hash.
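The precedence order can be expressed as a small lookup. This is a sketch under assumptions: resolve_pin_hash is a hypothetical helper, the .env value is assumed to be already loaded into the process environment, and the config is assumed to be a flat mapping with a "pin.hash" key.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def resolve_pin_hash(pin_file: Path,
                     env: Optional[Mapping[str, str]] = None,
                     config: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first PIN hash found: file, then environment, then config."""
    # 1. Dedicated hash file (highest precedence)
    if pin_file.is_file():
        text = pin_file.read_text().strip()
        if text:
            return text
    # 2. Environment variable (assumed to be populated from .env)
    env = os.environ if env is None else env
    if env.get("EXPFLOW_PIN_HASH"):
        return env["EXPFLOW_PIN_HASH"]
    # 3. Config entry (lowest precedence; the key layout is an assumption)
    if config and config.get("pin.hash"):
        return config["pin.hash"]
    return None


with tempfile.TemporaryDirectory() as tmp:
    pin_file = Path(tmp) / "pin.hash"
    env = {"EXPFLOW_PIN_HASH": "hash-from-env"}
    config = {"pin.hash": "hash-from-config"}
    print(resolve_pin_hash(pin_file, env, config))  # hash-from-env
    pin_file.write_text("hash-from-file")
    print(resolve_pin_hash(pin_file, env, config))  # hash-from-file
```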

2. get_last_scalar_metrics() ClearML API

Returns a nested dict keyed by group and then by metric name: {"Score": {"seg_total": {"last": 57.09, ...}}, ...}. Flatten it to {"seg_total": 57.09} before passing it to compare_scores.
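A minimal sketch of the flattening step follows. The helper name flatten_last_scalars is hypothetical, and the sample input mirrors the nested shape described above with illustrative values.

```python
def flatten_last_scalars(nested: dict) -> dict[str, float]:
    """Collapse {group: {metric: {"last": v, ...}}} into {metric: v}."""
    flat: dict[str, float] = {}
    for series_map in nested.values():
        for series, stats in series_map.items():
            if "last" in stats:
                # If the same metric name appears in two groups, the later one wins.
                flat[series] = float(stats["last"])
    return flat


nested = {
    "Score": {"seg_total": {"last": 57.09, "min": 50.0, "max": 57.09}},
    "PDE": {"pde_mean": {"last": 15.0, "min": 15.0, "max": 21.4}},
}
print(flatten_last_scalars(nested))  # {'seg_total': 57.09, 'pde_mean': 15.0}
```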

3. --force Flag for Script Calls

In CI and other automation, always pass --force / -f to PIN-guarded commands, because no one is available to answer the interactive prompt.

4. Interactive getpass vs Non-Interactive

getpass.getpass() works in terminals but fails in piped commands, CI, or subagent calls. Always provide --pin or --force as alternative paths.
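A guard that falls back gracefully when no terminal is attached might look like the sketch below. The extra parameters (verify, pin, force) and the TTY check are assumptions layered on top of the documented guard(action_description) -> bool signature.

```python
import getpass
import sys
from typing import Callable, Optional


def guard(action_description: str,
          verify: Callable[[str], bool],
          pin: Optional[str] = None,
          force: bool = False) -> bool:
    """Return True if the destructive action may proceed."""
    if force:
        return True  # the --force path skips the PIN entirely
    if pin is not None:
        return verify(pin)  # the --pin path needs no prompt
    if sys.stdin is None or not sys.stdin.isatty():
        # Pipes, CI, and subagent calls: do not attempt an interactive prompt.
        print(f"Cannot prompt for PIN to {action_description}; "
              "pass --pin or --force.", file=sys.stderr)
        return False
    return verify(getpass.getpass(f"Enter PIN to {action_description}: "))


def is_demo_pin(candidate: str) -> bool:
    """Stand-in verifier for the demo; real code would compare hashes."""
    return candidate == "1234"


print(guard("cancel run", is_demo_pin, force=True))  # True
print(guard("cancel run", is_demo_pin, pin="1234"))  # True
print(guard("cancel run", is_demo_pin, pin="0000"))  # False
```

Refusing when no terminal is attached keeps a scripted call from hanging on a prompt that nobody can answer.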

Security Advice
This result has low confidence because the artifact files could not be inspected; rerun the review once workspace file access is working before relying on it for installation decisions.
Capability Assessment
Purpose & Capability
Workspace inspection failed before artifacts could be read, so there is no evidence-backed purpose or capability concern to report.
Instruction Scope
No artifact instructions were available for review, so no scope concern is supported by evidence.
Install Mechanism
No install artifact was available for review, so no install-mechanism concern is supported by evidence.
Credentials
No artifact evidence was available showing disproportionate environment access.
Persistence & Privilege
No artifact evidence was available showing persistence or privilege concerns.
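How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install experiment-lifecycle-governance
  3. Once installed, invoke the Skill by name or trigger it with /experiment-lifecycle-governance
  4. Provide the inputs described in the Skill's parameter notes to get structured output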
Version History
v0.5.0
  • Initial public release of experiment-lifecycle-governance, providing experiment workflow governance features.
  • Adds PIN-protected guard for destructive operations (cancel, stop, delete).
  • Introduces standardized metrics registry with thresholds for consistent experiment reporting.
  • Supports multi-model experiment ranking and scoring with gating based on metric thresholds.
  • Implements an audit tool to validate compliance with competition rules using experiment metrics and parameters.
  • Includes detailed CLI and Python API, error handling, edge-case testing, and guidance on secure PIN management.
Metadata
Slug experiment-lifecycle-governance
Version 0.5.0
License MIT-0
Total installs 0
Current installs 0
Versions 1
FAQ

What is Experiment Lifecycle Governance?

Add governance to experiment workflows: PIN-protected destructive ops, standardized metrics registry with thresholds, compare-scores ranking with gating, an... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 36 total downloads to date.

How do I install Experiment Lifecycle Governance?

Run the command "/install experiment-lifecycle-governance" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is needed.

Is Experiment Lifecycle Governance free?

Yes. Experiment Lifecycle Governance is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Experiment Lifecycle Governance support?

Experiment Lifecycle Governance is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Experiment Lifecycle Governance?

It is developed and maintained by diamond2nv (@diamond2nv); the current version is v0.5.0.
