
Auto Improvement Orchestrator Skill

Name: Auto Improvement Orchestrator Skill
Author: lanyasheng

Author _silhouette · GitHub · v1.0.3 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 118

Install in OpenClaw

/install auto-improvement-orchestrator-skill

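Description

Automated evaluation and improvement pipeline for Skills. 9-dimension structural scoring (including LLM-as-Judge), 4-role weighting, category modifiers (tool/knowledge/orchestration/rule), Pareto front regression protection (security 2% / efficiency 10% / others 5%), trace-aware failure...

Usage (SKILL.md)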

Auto-Improvement Orchestrator

A complete pipeline from evaluation through improvement to verification, so that Skills get better automatically.

When to Use

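Evaluate the quality of a skill (9-dimension scoring + 4-role review)
Automatically improve a SKILL.md (generate candidates → score → execute → gate)
Improve multiple skills in batch (autoloop runs continuously)
Extract user feedback signals from Claude Code session logs
Compare a skill's Pareto front before and after improvement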
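
The description lists per-dimension regression tolerances (security 2%, efficiency 10%, all other dimensions 5%). The listing does not show how the pipeline applies them, so the following is only a minimal sketch of one plausible reading: a candidate is flagged when a dimension falls below the baseline by more than its tolerance, measured relative to the baseline score. The function name `regressions` and the dimension name `clarity` are hypothetical.

```python
from typing import Dict, List

# Tolerances quoted in the skill description; everything else here is assumed.
TOLERANCES: Dict[str, float] = {"security": 0.02, "efficiency": 0.10}
DEFAULT_TOLERANCE = 0.05


def regressions(baseline: Dict[str, float], candidate: Dict[str, float]) -> List[str]:
    """Return dimensions where the candidate drops more than the allowed tolerance.

    Assumption: the tolerance is a relative drop from the baseline score.
    A dimension missing from the candidate is treated as a score of 0.
    """
    failed = []
    for dim, base in baseline.items():
        tol = TOLERANCES.get(dim, DEFAULT_TOLERANCE)
        floor = base * (1.0 - tol)
        if candidate.get(dim, 0.0) < floor:
            failed.append(dim)
    return failed


baseline = {"security": 0.90, "efficiency": 0.80, "clarity": 0.70}
candidate = {"security": 0.87, "efficiency": 0.75, "clarity": 0.72}
print(regressions(baseline, candidate))  # ['security']
```

In this example the efficiency drop stays inside its 10% band, while the smaller security drop exceeds the tighter 2% band, which is the asymmetry the tolerances are meant to express.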

When NOT to Use

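Manually editing a single SKILL.md → edit the file directly
Agent execution reliability → use execution-harness (separate repository)
Pure documentation generation → use doc-gen
Prompt optimization (token level) → use DSPy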

Quick Start

# Score
python3 skills/improvement-learner/scripts/self_improve.py \
  --skill-path /your/skill --max-iterations 1

# Auto-improve for 5 rounds
python3 skills/improvement-learner/scripts/self_improve.py \
  --skill-path /your/skill --max-iterations 5

# Extract feedback from session logs
python3 skills/session-feedback-analyzer/scripts/analyze.py \
  --output feedback.jsonl
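
If the Quick Start commands are driven from Python rather than a shell, it can help to build the argument list explicitly so it can be inspected before anything runs. The sketch below uses only the script path and the two flags shown above; the helper name `build_self_improve_cmd` is hypothetical, and nothing is executed.

```python
import shlex
import sys
from pathlib import Path
from typing import List

# Script path as shown in the Quick Start, relative to the repository root.
SELF_IMPROVE = Path("skills/improvement-learner/scripts/self_improve.py")


def build_self_improve_cmd(skill_path: str, max_iterations: int = 1) -> List[str]:
    """Build the argv list for self_improve.py without running it."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    return [
        sys.executable,
        str(SELF_IMPROVE),
        "--skill-path", skill_path,
        "--max-iterations", str(max_iterations),
    ]


cmd = build_self_improve_cmd("/your/skill", max_iterations=5)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems when the skill path contains spaces.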

Architecture

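The 11 pipeline skills are organized into three layers:

Evaluation layer: learner (9-dimension structural scoring), evaluator (execution tests), session-feedback (user feedback)
Improvement layer: generator → discriminator → evaluator → executor → gate (6-layer gating)
Control layer: autoloop (continuous runs), benchmark-store (Pareto front), execution-harness (separate repository)

Helper tools: skill-forge (creates skills), skill-distill (merges skills)
Verification targets: prompt-hardening, deslop

execution-harness: agent execution reliability (38 patterns × 6 axes)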
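
The improvement layer is described as an ordered chain (generator → discriminator → evaluator → executor → gate). As a rough illustration of that shape only, the sketch below runs stages in that order and stops at the first one that reports failure. The stage signature, the `run_pipeline` helper, and the stop-on-failure behavior are assumptions made for the example, not the repository's actual interfaces.

```python
from typing import Callable, Dict, List, Tuple

# Stage order as listed in the architecture summary.
STAGES = ["generator", "discriminator", "evaluator", "executor", "gate"]

# Hypothetical stage signature: take the shared state, return (ok, new_state).
Stage = Callable[[Dict], Tuple[bool, Dict]]


def run_pipeline(stages: Dict[str, Stage], state: Dict) -> Tuple[List[str], Dict]:
    """Run the stages in order and stop at the first one that reports failure."""
    completed: List[str] = []
    for name in STAGES:
        ok, state = stages[name](state)
        if not ok:
            break
        completed.append(name)
    return completed, state


def passthrough(state: Dict) -> Tuple[bool, Dict]:
    return True, state


def reject(state: Dict) -> Tuple[bool, Dict]:
    return False, {**state, "rejected": True}


demo = {name: passthrough for name in STAGES}
demo["discriminator"] = reject  # simulate a candidate that is filtered out early
completed, final_state = run_pipeline(demo, {"candidate": "draft"})
print(completed, final_state)
```

Because the rejected candidate never reaches the executor in this arrangement, no file would be modified for it.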

Safe usage recommendations

What to consider before installing or running this skill:
- Missing declarations: The registry says 'no required env vars / config paths', but the code and docs expect LLM credentials (e.g., ANTHROPIC_API_KEY or a 'claude -p' CLI) and access to your local Claude session logs (~/.claude/projects). Ask the publisher to document required secrets and file access clearly before use.
- Sensitive reads: The session-feedback analyzer reads ~/.claude/projects JSONL files. Those logs may contain private prompts, code, or data. Do not run this skill on a machine with sensitive Claude sessions unless you audit exactly what it reads and stores.
- File modifications: The executor can apply edits to SKILL.md files and will create backups/receipts. Review the execute/rollback code and test in a sandbox or a copy of your skills directory to confirm behavior and backup integrity before running on real skill files.
- LLM/network calls: The evaluator/discriminator call LLM backends (claude CLI or Anthropic API). Evaluate privacy/cost implications and consider using mock backends or offline testing first.
- Prompt-injection indicator: The SKILL.md contains unicode control characters flagged by a static check. Inspect the SKILL.md for invisible characters and remove or justify them; such characters can cause unexpected parsing when documents are used as prompts.
- Least privilege: Provide only the minimum credentials and limit filesystem scope (run in a container or on a copy) until you validate the behavior. Prefer read-only dry-run modes and use '--dry-run' where available.
- Review and test: Because the repository is large and powerful, run the pipeline on non-production examples, read the code paths that perform subprocess calls and file writes (improvement-executor, autoloop-controller, session-feedback-analyzer), and ask the maintainer for an explicit list of env vars, required binaries/CLIs, and config paths.
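
For the prompt-injection indicator above, one way to inspect a SKILL.md for invisible characters is to scan for Unicode control (Cc) and format (Cf) code points. This is a minimal sketch, not the static check the registry used; the function name `find_invisible_chars` is hypothetical, and ordinary newlines, carriage returns, and tabs are deliberately ignored.

```python
import unicodedata
from typing import List, Tuple

# Whitespace controls that are normal in text files and not worth reporting.
ALLOWED = {"\n", "\r", "\t"}


def find_invisible_chars(text: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, code point) for each control or format character."""
    hits = []
    line, col = 1, 0
    for ch in text:
        if ch == "\n":
            line, col = line + 1, 0
            continue
        col += 1
        if ch in ALLOWED:
            continue
        if unicodedata.category(ch) in ("Cc", "Cf"):
            hits.append((line, col, f"U+{ord(ch):04X}"))
    return hits


# Sample text with a zero-width space and a right-to-left override.
sample = "name: demo\u200b\nrun \u202eevil"
print(find_invisible_chars(sample))  # [(1, 11, 'U+200B'), (2, 5, 'U+202E')]
```

To check a real file, read it with `Path("SKILL.md").read_text(encoding="utf-8")` and pass the result to the function.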

Functional analysis

Type: OpenClaw Skill
Name: auto-improvement-orchestrator-skill
Version: 1.0.3

The bundle is a sophisticated orchestration system for the automated evaluation and iterative improvement of AI agent skills. It implements a five-stage pipeline (Propose, Discriminate, Evaluate, Execute, and Gate) with features such as Pareto front regression protection (lib/pareto.py), multi-role blind panel scoring (score.py), and Karpathy-style self-improvement loops (self_improve.py). The system uses high-privilege capabilities, including modifying local skill files (execute.py), reading Claude session logs for implicit feedback (analyze.py), and executing AI-generated prompts via the 'claude' CLI (task_runner.py). These behaviors are transparently documented and strictly necessary for the stated goal of skill optimization. The presence of security-conscious design patterns, such as path traversal validation in the PytestJudge (judges.py) and explicit opt-out checks for log analysis, confirms that the bundle is a legitimate developer tool rather than malicious software.

Capability tags

crypto · can-make-purchases · requires-oauth-token

Capability assessment

⚠ Purpose & Capability

The skill claims no required env vars or config paths in registry metadata, yet the code and README/SKILL.md clearly expect access to LLM credentials (e.g., ANTHROPIC_API_KEY / claude CLI), local Claude session logs (~/.claude/projects/*.jsonl), and the ability to apply edits to SKILL.md files (executor/rollback). Those capabilities are coherent with an auto-improvement orchestrator in principle, but the metadata omission is a mismatch that could lead to unexpected access requests at runtime.

⚠ Instruction Scope

SKILL.md and README instruct the agent to parse ~/.claude/projects JSONL, extract user feedback, run evaluation loops that call 'claude -p' or use ANTHROPIC API, and to apply changes to skills (with backup/rollback). That means the runtime will read user session files, run external LLM calls (network), and write/modify skill files — all beyond a minimal 'lint-only' scope. The SKILL.md is also relatively terse compared to the complex scripts (many CLIs/flags exist in code but are undocumented), increasing risk of unintended actions.
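
Before letting any tool parse session logs, it can be useful to see which files are in scope. The sketch below only lists `.jsonl` files and their sizes under a given directory and never opens them; the helper name `list_session_logs` is hypothetical, and whether the analyzer reads exactly this set of files is an assumption based on the path named in the review.

```python
from pathlib import Path
from typing import List, Tuple


def list_session_logs(root: Path) -> List[Tuple[str, int]]:
    """Return (relative path, size in bytes) for each .jsonl file under root."""
    if not root.is_dir():
        return []
    return sorted(
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*.jsonl")
        if p.is_file()
    )


# Path taken from the review text; the directory may not exist on every machine.
logs = list_session_logs(Path.home() / ".claude" / "projects")
print(f"{len(logs)} session log file(s) in scope")
for rel_path, size in logs:
    print(f"{size:>10}  {rel_path}")
```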

ℹ Install Mechanism

There is no formal install spec (instruction-only at registry level), which is a low install risk. However, the bundle includes a large Python codebase and the README suggests pip-installing pyyaml/pytest. The absence of an explicit install script or declared dependency list in the registry metadata is a documentation gap but not a direct high-risk installer (no external downloads or archives are referenced).

⚠ Credentials

Although registry metadata lists no required env vars, multiple code modules and docs reference LLM credentials / CLIs (ANTHROPIC_API_KEY, usage of 'claude -p' and optional mock backends). The skill also expects read access to user session logs (~/.claude/projects) and to write backups/executions directories. Requesting LLM API keys and local session access is proportionate to an evaluator/orchestrator only if explicitly declared and justified; here that mapping is missing, which is a coherence and least-privilege concern.
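
Since the registry metadata does not declare these requirements, a small preflight check can at least show what the skill would find on a given machine. This is a sketch under the assumption that the three items named in the review (the ANTHROPIC_API_KEY variable, a `claude` executable, and the ~/.claude/projects directory) are the relevant ones; the `preflight` helper is hypothetical and reports only presence, never the key's value.

```python
import os
import shutil
from pathlib import Path
from typing import Dict, Mapping, Optional


def preflight(env: Optional[Mapping[str, str]] = None) -> Dict[str, bool]:
    """Report which of the undeclared requirements are present on this machine."""
    env = os.environ if env is None else env
    return {
        "ANTHROPIC_API_KEY set": bool(env.get("ANTHROPIC_API_KEY")),
        "claude CLI on PATH": shutil.which("claude") is not None,
        "~/.claude/projects exists": (Path.home() / ".claude" / "projects").is_dir(),
    }


for check, present in preflight().items():
    print(f"{'yes' if present else 'no'}\t{check}")
```

Running it inside the container or account intended for the skill shows how much access that environment actually grants.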

ℹ Persistence & Privilege

The skill does not set always:true and is user-invocable (normal). It has code to modify SKILL.md and write backups/receipts within execution/backups — that is expected for an auto-editing pipeline. This is a meaningful privilege (ability to change local skill files), but it is scoped to its own operation rather than claiming system-wide persistent privileges. Still, because the skill can autonomously apply changes, users should treat it as powerful and run it under controlled conditions.
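
One way to run an auto-editing pipeline "under controlled conditions" is to point it at a disposable copy of the skill directory. The sketch below copies a directory into a fresh temporary location; `sandbox_copy` is a hypothetical helper, and the demo uses a throwaway skill directory so that it runs anywhere.

```python
import shutil
import tempfile
from pathlib import Path


def sandbox_copy(skill_dir: Path) -> Path:
    """Copy skill_dir into a fresh temporary directory and return the copy's path."""
    target = Path(tempfile.mkdtemp(prefix="skill-sandbox-")) / skill_dir.name
    shutil.copytree(skill_dir, target)
    return target


# Demo: build a tiny stand-in skill, copy it, and edit only the copy.
original = Path(tempfile.mkdtemp()) / "demo-skill"
original.mkdir()
(original / "SKILL.md").write_text("v1", encoding="utf-8")

copy = sandbox_copy(original)
(copy / "SKILL.md").write_text("edited", encoding="utf-8")
print((original / "SKILL.md").read_text(encoding="utf-8"))  # v1
```

The copy's path can then be passed as the skill path, leaving the real skill files untouched until the result has been reviewed.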

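How to use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install auto-improvement-orchestrator-skill
Once installed, invoke the Skill by name or trigger it with /auto-improvement-orchestrator-skill
Provide the required inputs according to the Skill's parameter documentation to get structured output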

Version history

v1.0.3

v2.0: 9-dim evaluation, category modifiers, enriched docs

v1.0.2

v2.0: 9-dim evaluation, category modifiers, per-dim Pareto tolerances, enriched docs

v1.0.1

v2.0: 9-dim evaluation, category modifiers, per-dim Pareto tolerances, enriched docs

v1.0.0

Major update to skill orchestration and self-improvement pipeline.
- Introduces structured 9-dimensional scoring and 4-role evaluation (including LLM-as-Judge).
- Adds correction factors for different categories and Pareto front regression safeguards.
- Implements trace-aware failure retries and batch/looped skill improvement.
- Now includes 11 pipeline skills, 2 helper tools, and 2 verification targets.
- Provides clear separation of use cases and integration points (see execution-harness for agent reliability).

Metadata

Slug auto-improvement-orchestrator-skill

Version 1.0.3

License MIT-0

Total installs 1

Current installs 0

Version count 4

FAQ