lanyasheng

Improvement Orchestrator

by _silhouette · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security Clean
Downloads: 98 · Stars: 0 · Active Installs: 0 · Versions: 1
Install in OpenClaw
/install auto-improvement-orchestrator
Description
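Use when you need to run the full generate → score → evaluate → execute → gate pipeline in one step, retry automatically after a failure, or improve multiple skills in a batch. Not for evaluating skill quality on its own (use improvement-learner) or for manual scoring (use improvement-discriminator).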
README (SKILL.md)

Improvement Orchestrator

Coordinates the full improvement pipeline: Generator → Discriminator → Evaluator → Executor → Gate.

When to Use

  • Run a full improvement cycle on one or more skills
  • Coordinate the 5-stage pipeline end-to-end (with optional evaluator)
  • Retry failed improvements with trace-aware feedback (Ralph Wiggum loop)

When NOT to Use

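  • Only want to check a skill's quality score → use improvement-learner
  • Only want to score candidates manually → use improvement-discriminator
  • Only want to change a single file → use improvement-executor
  • Only want to look up benchmark data → use benchmark-store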

Pipeline

propose → discriminate → evaluate* → execute → gate (7-layer)
         ↻ Ralph Wiggum: fail → inject trace → retry (max N)
         * evaluate skipped if: no --task-suite, OR low-risk docs/reference/guardrail (adaptive complexity)
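
The retry behaviour in the diagram can be sketched as a small loop. This is a minimal illustration, not the code in orchestrate.py: the function name run_with_retries, the attempt_fn callback, and the returned dictionary are hypothetical, and two assumptions are made that the text does not confirm: only a "revert" decision triggers a retry, and max_retries caps the total number of attempts.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumption: only a gate "revert" is retried; other decisions end the loop.
RETRYABLE_DECISIONS = {"revert"}


def run_with_retries(
    attempt_fn: Callable[[List[str]], Tuple[str, Optional[str]]],
    max_retries: int = 3,
) -> Dict[str, object]:
    """Run attempt_fn until it stops failing or the attempt budget is used.

    attempt_fn receives the failure traces collected so far and returns
    (decision, trace), where trace describes the failure or is None.
    """
    traces: List[str] = []
    decision = "no_candidates"
    attempts = 0
    for attempts in range(1, max_retries + 1):
        # Pass a copy so the callback cannot mutate the collected traces.
        decision, trace = attempt_fn(list(traces))
        if decision not in RETRYABLE_DECISIONS:
            break
        if trace is not None:
            # Inject the failure trace into the next attempt.
            traces.append(trace)
    return {"attempts": attempts, "final_decision": decision, "traces": traces}
```

Passing one full pipeline pass as attempt_fn keeps the loop independent of how each stage is invoked.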

Adaptive Complexity Skip: candidates with risk_level=low AND category in (docs, reference, guardrail) skip the evaluator stage entirely. Other categories always run evaluator when --task-suite is provided.
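
The skip rule above reduces to a small predicate. The sketch below is illustrative only: the function name should_run_evaluator and the constant are hypothetical, and the risk_level and category values are assumed to be plain strings.

```python
from typing import Optional

# Categories the text names as eligible for the skip when risk is low.
LOW_RISK_SKIP_CATEGORIES = {"docs", "reference", "guardrail"}


def should_run_evaluator(risk_level: str, category: str, task_suite: Optional[str]) -> bool:
    """Return True when the evaluator stage should run for a candidate."""
    if not task_suite:
        # Without --task-suite the evaluator stage is never run.
        return False
    if risk_level == "low" and category in LOW_RISK_SKIP_CATEGORIES:
        # Adaptive complexity skip: low-risk docs/reference/guardrail candidates.
        return False
    # Every other candidate is evaluated when a task suite is provided.
    return True
```

For example, a low-risk docs candidate is skipped even with a task suite, while a medium-risk docs candidate is still evaluated.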

Evaluator→Gate Forwarding: if evaluator produces an artifact, its path is forwarded to gate via --evaluation, enabling RegressionGate to check evaluator verdict.
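
One way to picture the forwarding step is a helper that appends --evaluation only when an evaluator artifact actually exists. The helper name with_evaluation_arg and the base gate command passed to it are hypothetical; only the --evaluation flag comes from the text.

```python
from pathlib import Path
from typing import List, Optional


def with_evaluation_arg(gate_cmd: List[str], evaluation_path: Optional[str]) -> List[str]:
    """Return a copy of gate_cmd, extended with --evaluation when an artifact exists."""
    cmd = list(gate_cmd)
    if evaluation_path and Path(evaluation_path).is_file():
        # Forward the evaluator artifact so the gate can check its verdict.
        cmd += ["--evaluation", evaluation_path]
    return cmd
```

When the evaluator was skipped or produced no file, the gate command is returned unchanged.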

Baseline Evaluation: when --task-suite is given, orchestrator first runs evaluator in --standalone mode on the current SKILL.md to discover which tasks fail, then injects those failures as --source feedback to the generator.

CLI

# --target and --state-root are REQUIRED; --source is repeatable.
# --task-suite enables the evaluator stage (real LLM eval); --eval-mock makes
# the evaluator use mock execution instead of the claude CLI.
python3 scripts/orchestrate.py \
  --target /path/to/skill \
  --state-root /path/to/state \
  --source feedback.json \
  --max-retries 3 \
  --task-suite tasks.yaml \
  --eval-mock
| Param | Default | When to change |
|---|---|---|
| --target | (required) | Always set: path to the skill dir to improve |
| --state-root | (required) | Always set: persistent state/artifact directory |
| --source | [] | Add feedback.json, memory files, or prior failure traces |
| --max-retries | 3 | Raise to 5 for hard-to-improve skills; lower to 1 for fast iteration |
| --task-suite | None | Provide to enable LLM-based evaluator; omit for docs-only changes |
| --eval-mock | false | Use in CI/testing to skip real claude -p calls |
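
The flags and defaults listed above map naturally onto argparse. The following is a sketch of how such a parser could be declared, not the parser in orchestrate.py; the function name build_parser and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="orchestrate.py")
    parser.add_argument("--target", required=True,
                        help="skill directory or file to improve")
    parser.add_argument("--state-root", required=True,
                        help="directory where artifacts are written")
    # Repeatable flag: each occurrence appends one path to the list.
    parser.add_argument("--source", action="append", default=[],
                        help="memory/feedback/trace file (repeatable)")
    parser.add_argument("--max-retries", type=int, default=3,
                        help="retry attempts")
    parser.add_argument("--task-suite", default=None,
                        help="task suite file; enables the evaluator stage")
    parser.add_argument("--eval-mock", action="store_true",
                        help="use mock execution in the evaluator")
    return parser
```

Parsing `--target s --state-root st --source a.json --source b.json` with this parser yields a two-element source list and leaves max_retries at 3.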

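<example>
Correct usage: run the full improvement pipeline on a skill (with evaluator)
$ python3 scripts/orchestrate.py --target /path/to/skill --state-root ./state --task-suite tasks.yaml
→ 0. Baseline evaluation: finds 2 failing tasks and injects them into the generator
→ 1. Generate candidates
→ 2. Multi-reviewer blind review
→ 3. Task evaluation
→ 4. Execute changes
→ 5. 7-layer gate
→ On failure, inject the trace and retry automatically (up to 3 times)
→ stdout: /path/to/state/pipeline-summary.json
</example>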

<anti-example>
Incorrect usage: running the orchestrator when you only want to see scores
$ python3 scripts/orchestrate.py --target /path/to/skill --state-root ./state
→ This actually applies changes! Use improvement-learner's self_improve.py instead.
</anti-example>

Error Handling

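  • Each subprocess has a 1200s timeout; a timeout raises RuntimeError
  • An evaluator failure does not abort the pipeline (a warning is printed and the run continues), but evaluation_failure_trace is injected into the next round
  • When the gate returns revert, extract_failure_trace() is called automatically and writes traces/trace-{run_id}.json
  • pipeline-summary.json is written to {state-root}/pipeline-summary.json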
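
The 1200s timeout that surfaces as RuntimeError can be sketched with subprocess.run, which raises subprocess.TimeoutExpired when the limit is hit. The wrapper name run_stage and the error message are hypothetical; only the timeout value and the exception type come from the list above.

```python
import subprocess
import sys
from typing import List

STAGE_TIMEOUT_SECONDS = 1200  # per-subprocess limit stated in the text


def run_stage(cmd: List[str], timeout: float = STAGE_TIMEOUT_SECONDS) -> subprocess.CompletedProcess:
    """Run one stage command, converting a timeout into RuntimeError."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before re-raising, so no process leaks.
        raise RuntimeError(f"stage timed out after {timeout}s: {cmd}") from exc


if __name__ == "__main__":
    # Usage example with a trivial command standing in for a pipeline stage.
    print(run_stage([sys.executable, "-c", "print('ok')"]).stdout.strip())
```

How a non-zero exit code is treated is left to the caller here, since the text only specifies the timeout case.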

Output

The final output is pipeline-summary.json:

{"target": "/path/to/skill", "attempts": 2, "max_retries": 3,
 "final_decision": "keep", "final_candidate_id": "cand-01-docs",
 "final_artifact_path": "/state/receipts/gate-run001-cand-01.json"}

final_decision values: keep | revert | reject | pending_promote | no_candidates | no_accepted_candidates
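
A caller that consumes the summary might validate final_decision against the listed values before acting on it. The sketch below assumes the JSON shape shown in the sample; the function name describe_summary and the output string format are hypothetical.

```python
import json

# The final_decision values listed in the document.
KNOWN_DECISIONS = {
    "keep", "revert", "reject", "pending_promote",
    "no_candidates", "no_accepted_candidates",
}


def describe_summary(summary_text: str) -> str:
    """Parse pipeline-summary.json content and return a one-line description."""
    summary = json.loads(summary_text)
    decision = summary["final_decision"]
    if decision not in KNOWN_DECISIONS:
        raise ValueError(f"unexpected final_decision: {decision!r}")
    return (
        f"{decision} (attempts={summary['attempts']}, "
        f"max_retries={summary['max_retries']})"
    )
```

Rejecting unknown values early keeps a downstream script from treating a malformed summary as a success.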

Related Skills

  • improvement-generator: Produces candidate proposals (stage 1) — orchestrator calls propose.py
  • improvement-discriminator: Multi-reviewer panel scoring (stage 2) — orchestrator calls score.py
  • improvement-evaluator: Task suite execution validation (stage 3) — called only when --task-suite provided; baseline failures injected as --source
  • improvement-executor: Applies changes with backup/rollback (stage 4) — orchestrator calls execute.py
  • improvement-gate: 7-layer quality gate (stage 5) — receives --evaluation artifact when evaluator ran
Usage Guidance
This skill is an on‑repo pipeline orchestrator: it will run local scripts (propose/score/evaluate/execute/gate), create state artifacts and backups, and may apply changes to files under the --target you provide. Before running:
  1. Inspect the scripts it invokes in your repository (improvement-generator/discriminator/evaluator/executor/gate) to ensure they do only what you expect.
  2. Run first with a disposable --state-root and use --eval-mock (avoid real LLM CLI calls) to observe behavior.
  3. Back up your target skill, or point --target at a copy, if you are not ready for automatic modifications.
  4. Be aware that the evaluator or other invoked scripts may require separate API keys or env vars; the orchestrator itself does not request credentials.
If you want to be extra cautious, run the included tests and review the executor's logic to confirm it only performs allowed actions (append_markdown_section, create_file) for low‑risk categories.
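
The cautious first run described above (a copy of the target, a disposable state root, and --eval-mock) could be prepared along these lines. This is a sketch: the helper name prepare_trial_run and the temporary directory layout are hypothetical, and the function only builds the command without executing it.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def prepare_trial_run(target: str) -> List[str]:
    """Copy the skill to a scratch directory and build a mock-evaluator command."""
    workdir = Path(tempfile.mkdtemp(prefix="orchestrator-trial-"))
    # Work on a copy so the original skill is never modified.
    target_copy = workdir / "skill"
    shutil.copytree(target, target_copy)
    # Disposable state root for artifacts and backups.
    state_root = workdir / "state"
    state_root.mkdir()
    return [
        "python3", "scripts/orchestrate.py",
        "--target", str(target_copy),
        "--state-root", str(state_root),
        "--eval-mock",
    ]
```

The returned list can be inspected first and then handed to subprocess.run from the repository root once the invoked scripts have been reviewed.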
Capability Analysis
Type: OpenClaw Skill
Name: auto-improvement-orchestrator
Version: 1.0.0
The bundle is a legitimate orchestration tool designed to automate the improvement lifecycle of OpenClaw skills. The core logic in `scripts/orchestrate.py` coordinates a five-stage pipeline (Propose, Discriminate, Evaluate, Execute, Gate) by executing local Python scripts via `subprocess.run`. The implementation includes robust state management, automated retries (the 'Ralph Wiggum' loop), and extensive documentation in the `references/` directory detailing safety guardrails and rollbacks. No evidence of data exfiltration, malicious prompt injection, or unauthorized remote execution was found; the tool's behavior is strictly aligned with its stated purpose of skill optimization.
Capability Assessment
Purpose & Capability
The name/description (orchestrating a 5‑stage improvement pipeline) matches the actual behavior: the script dispatches Generator→Discriminator→Evaluator→Executor→Gate and writes state/artifacts and backups. All declared requirements (none) are appropriate for a local orchestrator that runs other local scripts.
Instruction Scope
SKILL.md and scripts explicitly instruct the agent to run local subprocesses, read feedback sources, and apply changes to the target skill (append markdown sections, create files) with backups/rollback. This is expected for an orchestrator, but it means the skill will read/write arbitrary files under the provided --target and --state-root and can forward failure traces into subsequent runs. The orchestrator itself does not call external endpoints, but it invokes other scripts (e.g., evaluator) which may call LLM CLIs or network services — review those scripts before use.
Install Mechanism
Instruction-only (no install spec). The bundle includes orchestration code only; nothing is downloaded or extracted from external URLs. Lowest install risk.
Credentials
The skill declares no required env vars or credentials, which is coherent. Caveat: the orchestrator spawns other local scripts (generator/discriminator/evaluator/executor/gate) that are expected to live in the repo; those sub-scripts may themselves require API keys or credentials (e.g., for LLMs) even though the orchestrator doesn't declare them. Confirm what the invoked scripts expect before running with real task suites.
Persistence & Privilege
always=false and no special platform privileges. The orchestrator writes persistent artifacts and backups to the user-supplied --state-root (normal for its purpose). It does not modify other skills' configuration beyond running the standard executor workflow for the provided --target, but it will apply changes to the target path (intended behavior).
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install auto-improvement-orchestrator
  3. After installation, invoke the skill by name or use /auto-improvement-orchestrator
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release: closed-loop skill improvement pipeline
Metadata
Slug: auto-improvement-orchestrator
Version: 1.0.0
License: MIT-0
All-time Installs: 0
Active Installs: 0
Total Versions: 1
Frequently Asked Questions

What is Improvement Orchestrator?

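Improvement Orchestrator runs the full generate → score → evaluate → execute → gate pipeline in one step, retries automatically after a failure, and can improve multiple skills in a batch. It is not meant for evaluating skill quality on its own (use improvement-learner) or for manual scoring (use improvement-discriminator). It is an AI Agent Skill for Claude Code / OpenClaw, with 98 downloads so far.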

How do I install Improvement Orchestrator?

Run "/install auto-improvement-orchestrator" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Improvement Orchestrator free?

Yes, Improvement Orchestrator is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Improvement Orchestrator support?

Improvement Orchestrator is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Improvement Orchestrator?

It is built and maintained by _silhouette (@lanyasheng); the current version is v1.0.0.
