harrylabsj

SkillOpt

Author: haidong · GitHub · v0.1.0 · MIT-0
cross-platform · ⚠ flagged as suspicious
Total downloads: 36
Favorites: 0
Current installs: 1
Versions: 1
Install in OpenClaw
/install skillopt
Description
Train, evaluate, and improve Agent skill files as reusable external capabilities. Use when a user wants to optimize SKILL.md, prompt procedures, OpenClaw/Her...
Usage (SKILL.md)

SkillOpt

Operating Idea

Treat a skill document as trainable external state. Keep the target model, tools, and runtime fixed; optimize only the skill text through measured task rollouts, failure reflection, small edits, validation gating, and versioned export.

Default output is a deployable best_skill.md plus a short optimization report. Training may use many traces and candidate files; deployment should require only the final skill file.

Invariants

  • Preserve the original skill before editing.
  • Separate train and validation tasks. Never accept an edit based only on the examples used to propose it.
  • Prefer small, reviewable edits over full rewrites. Keep the skill's public contract stable unless the task suite proves the contract is wrong.
  • Score behavior, not eloquence. A prettier skill that does not improve validation is rejected.
  • Record rejected edits and the reason, then consult that buffer before proposing another edit.
  • Do not add model-specific hacks unless the target deployment is explicitly model-specific.
  • Do not leak validation answers into the skill. Validation data may guide accept/reject decisions, not become memorized instructions.

Run Directory

Create a run directory near the skill being optimized unless the user specifies another path:

skillopt_runs/<target-skill-slug>/
  source_skill.md
  candidates/
    candidate_000.md
    candidate_001.md
  tasks/
    train.jsonl
    val.jsonl
  rollouts/
    train/
    val/
  rejected_edits.md
  best_skill.md
  report.md

Use scripts/skillopt.py for deterministic run setup, JSONL validation, simple command-backed rollouts, score aggregation, validation gates, and report generation. Read references/evaluation.md when defining task schemas or scorers.
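Where the helper script is unavailable, the layout above can be reproduced by hand. The following is a minimal sketch under that assumption; it is not the implementation of scripts/skillopt.py, and the function name init_run_dir is hypothetical. It copies the source skill only once, so re-running setup never overwrites the preserved original.

```python
import shutil
from pathlib import Path


def init_run_dir(skill_path, out_dir):
    """Create the run directory layout and preserve the original skill.

    Hypothetical helper for illustration; not part of the SkillOpt bundle.
    """
    out = Path(out_dir)
    # Subdirectories taken from the layout described in the text.
    for sub in ("candidates", "tasks", "rollouts/train", "rollouts/val"):
        (out / sub).mkdir(parents=True, exist_ok=True)

    # Copy the skill only if no preserved copy exists yet, so that a
    # second call cannot replace the original with an edited version.
    source_copy = out / "source_skill.md"
    if not source_copy.exists():
        shutil.copyfile(skill_path, source_copy)

    # Start an empty rejected-edits buffer if there is none.
    (out / "rejected_edits.md").touch()
    return out
```

A call such as init_run_dir("path/to/SKILL.md", "skillopt_runs/my-skill") is safe to repeat because every step is idempotent.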

Workflow

1. Define the Optimization Contract

Identify:

  • target skill path and deployment agents
  • target model/runtime/tool constraints to keep fixed during evaluation
  • success metric and acceptance threshold
  • task distribution the skill should serve
  • allowed edit budget, such as max 3 sections or max 25% changed lines per round
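An edit budget such as "max 25% changed lines" is only useful if it can be measured. One way to do so, sketched here with Python's standard difflib, is to count non-equal diff regions against the original line count. The function names and the exact counting rule (the larger side of each changed region) are assumptions for illustration, not something the skill prescribes.

```python
import difflib


def changed_line_fraction(original: str, candidate: str) -> float:
    """Changed lines in the candidate, as a fraction of the original's length."""
    old = original.splitlines()
    new = candidate.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Count the larger side so inserts and deletes both register.
            changed += max(i2 - i1, j2 - j1)
    return changed / max(len(old), 1)


def within_budget(original: str, candidate: str, max_fraction: float = 0.25) -> bool:
    """True when the candidate stays inside the per-round edit budget."""
    return changed_line_fraction(original, candidate) <= max_fraction
```

For a four-line skill, replacing one line gives a fraction of 0.25, which sits exactly on the example budget.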

If no task suite exists, create a small proxy suite first, label it as proxy data, and tell the user that real production traces are needed for stronger conclusions.

2. Build Train and Validation Sets

Represent each task as JSONL with an id, prompt, optional inputs, and a scorer. Keep validation examples independent and representative.

Minimum split:

  • train.jsonl: failure discovery and edit proposal
  • val.jsonl: accept/reject gate

For fragile or high-stakes skills, add a hidden or holdout split outside the optimization loop and use it only for final reporting.
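The split described above can be checked mechanically before any rollout. This is a rough sketch, assuming only the fields named in the text (id, prompt, scorer); the real schema lives in references/evaluation.md and may require more. Both function names are hypothetical, and the scorer's internal structure is deliberately not inspected.

```python
import json

# Fields the text names as required; "inputs" is optional.
REQUIRED_FIELDS = ("id", "prompt", "scorer")


def validate_task_lines(lines):
    """Parse JSONL lines and return (tasks, errors)."""
    tasks, errors, seen = [], [], set()
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            task = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {n}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(task, dict):
            errors.append(f"line {n}: expected a JSON object")
            continue
        missing = [f for f in REQUIRED_FIELDS if f not in task]
        if missing:
            errors.append(f"line {n}: missing {', '.join(missing)}")
            continue
        task_id = str(task["id"])
        if task_id in seen:
            errors.append(f"line {n}: duplicate id {task_id}")
            continue
        seen.add(task_id)
        tasks.append(task)
    return tasks, errors


def split_overlap(train_tasks, val_tasks):
    """Return task ids that appear in both splits (should be empty)."""
    train_ids = {str(t["id"]) for t in train_tasks}
    val_ids = {str(t["id"]) for t in val_tasks}
    return sorted(train_ids & val_ids)
```

A non-empty result from split_overlap means the validation gate would be judging edits on examples that also helped propose them.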

3. Run Baseline Rollouts

Evaluate the unmodified skill on train and validation tasks using the same target agent that will later deploy it.

Examples:

python3 scripts/skillopt.py init --skill path/to/SKILL.md --out skillopt_runs/my-skill
python3 scripts/skillopt.py validate-tasks skillopt_runs/my-skill/tasks/train.jsonl
python3 scripts/skillopt.py run --tasks skillopt_runs/my-skill/tasks/val.jsonl --skill skillopt_runs/my-skill/source_skill.md --out skillopt_runs/my-skill/rollouts/val_baseline --agent-command "hermes -s {skill_path} -z {prompt}"

For OpenClaw or any other agent, replace --agent-command with a command template that accepts {skill_path}, {prompt}, {task_id}, and optionally {output_path}.
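To make the template contract concrete, here is one way the placeholders could be filled. It is a sketch only: how scripts/skillopt.py actually substitutes and quotes values is not documented here, and render_agent_command is a hypothetical name. Each value is shell-quoted because prompts routinely contain spaces and quotes.

```python
import shlex


def render_agent_command(template, skill_path, prompt, task_id, output_path=""):
    """Fill an --agent-command style template with shell-quoted values.

    Assumes the template uses only the four placeholders named in the
    text and contains no other literal braces.
    """
    values = {
        "skill_path": shlex.quote(skill_path),
        "prompt": shlex.quote(prompt),
        "task_id": shlex.quote(task_id),
        "output_path": shlex.quote(output_path),
    }
    # Placeholders missing from the template are simply ignored.
    return template.format(**values)


command = render_agent_command(
    "hermes -s {skill_path} -z {prompt}",
    skill_path="runs/my skill/SKILL.md",
    prompt="say 'hi'",
    task_id="val_001",
)
```

Splitting the rendered string with shlex.split recovers the original skill path and prompt as single arguments, which is the property the quoting is there to guarantee.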

4. Reflect on Traces

Analyze successful and failed rollouts separately.

For each failure, classify the root cause:

  • missing procedure
  • wrong tool order
  • weak verification
  • ambiguous output contract
  • missing edge case
  • over-broad instruction
  • environment assumption
  • scoring mismatch

Extract patterns across failures before editing. Do not chase one-off errors unless they reveal a generalizable instruction.
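One lightweight way to separate patterns from one-off errors is to tally labeled failures and keep only the causes that recur. The sketch below assumes each failure has already been labeled by hand with one of the root causes listed above; the record shape (a dict with a root_cause key) and the min_count threshold are illustrative assumptions.

```python
from collections import Counter

# The root-cause labels listed in the text.
ROOT_CAUSES = {
    "missing procedure",
    "wrong tool order",
    "weak verification",
    "ambiguous output contract",
    "missing edge case",
    "over-broad instruction",
    "environment assumption",
    "scoring mismatch",
}


def recurring_causes(failures, min_count=2):
    """Return (cause, count) pairs seen at least min_count times, most common first."""
    counts = Counter()
    for failure in failures:
        cause = failure["root_cause"]
        if cause not in ROOT_CAUSES:
            raise ValueError(f"unknown root cause: {cause!r}")
        counts[cause] += 1
    return [(cause, n) for cause, n in counts.most_common() if n >= min_count]
```

Causes that fall below the threshold are not discarded forever; they stay in the rollout records in case a later round shows them recurring.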

5. Propose a Controlled Edit

Generate one candidate skill with a concise edit rationale:

  • add: missing guardrail, checklist, or workflow step
  • delete: harmful or distracting instruction
  • replace: ambiguous wording with operational criteria
  • reorder: move high-leverage instructions earlier

Keep the candidate deployable as a normal skill. Avoid embedding run logs, benchmark answers, private traces, or optimizer notes in the final skill text.

6. Gate on Validation

Run the same validation set on the candidate. Accept only when the candidate beats the baseline by the configured threshold and does not introduce unacceptable regressions.

Default acceptance:

  • validation average improves by at least 0.02
  • no critical task regresses from pass to fail
  • skill does not grow, or grows only for a clear procedural reason
  • output format and trigger metadata remain valid
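The first two default criteria are numeric and can be sketched as a small gate function. This is an illustration under stated assumptions, not the gate in scripts/skillopt.py: scores are assumed to be per-task values in [0, 1], a task "passes" at pass_mark, and the caller supplies which task ids are critical. The length and metadata criteria are left to separate checks.

```python
def accept_candidate(baseline, candidate, critical_ids=(), min_gain=0.02, pass_mark=1.0):
    """Apply the numeric part of the validation gate.

    baseline and candidate map task id -> score on the same validation set.
    Returns (accepted, gain, regressed_critical_ids).
    """
    if not baseline or baseline.keys() != candidate.keys():
        raise ValueError("baseline and candidate must score the same non-empty task set")

    base_avg = sum(baseline.values()) / len(baseline)
    cand_avg = sum(candidate.values()) / len(candidate)
    gain = cand_avg - base_avg

    # A critical regression is a pass that turned into a fail.
    regressed = sorted(
        task_id
        for task_id in critical_ids
        if baseline[task_id] >= pass_mark and candidate[task_id] < pass_mark
    )

    # Small tolerance so a gain of exactly min_gain survives float rounding.
    accepted = gain >= min_gain - 1e-9 and not regressed
    return accepted, gain, regressed
```

Returning the regressed ids alongside the decision gives the rejection note something specific to record.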

If rejected, append a short note to rejected_edits.md:

## candidate_003
Rejected because validation avg +0.00 and task val_docx_04 regressed.
Avoid adding broad "always rewrite" instructions; they caused format drift.

7. Iterate

Repeat rollout, reflection, candidate edit, and validation gate until:

  • validation score plateaus for 2 rounds
  • edit budget is exhausted
  • regressions become persistent
  • the skill is good enough for the user's target use

Track the best candidate, not merely the latest candidate.
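Tracking the best candidate and detecting a plateau can share one pass over the per-round validation averages. The sketch below assumes index 0 holds the baseline score and that a round counts as progress only if it beats the current best by min_gain; both the function name and that interpretation of "plateau" are assumptions.

```python
def best_and_plateau(val_scores, patience=2, min_gain=0.02):
    """Return (best_index, plateaued) for a list of validation averages.

    val_scores[0] is the baseline; later entries are candidates in order.
    plateaued is True once `patience` consecutive rounds fail to beat
    the best score so far by at least min_gain.
    """
    best_index = 0
    rounds_since_best = 0
    for i in range(1, len(val_scores)):
        if val_scores[i] >= val_scores[best_index] + min_gain:
            best_index = i
            rounds_since_best = 0
        else:
            rounds_since_best += 1
    return best_index, rounds_since_best >= patience
```

With scores of 0.50, 0.60, 0.60, and 0.61, the best candidate is round 1 and the last two rounds count as a plateau, so round 1 is what gets exported rather than the latest round.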

8. Export

Copy the best accepted candidate to best_skill.md. If the user wants installation, replace or install the deployed skill only after showing the report summary.

The final report should include:

  • baseline train/validation scores
  • best candidate train/validation scores
  • accepted edits
  • rejected edit patterns
  • known overfitting risks
  • deployment instructions for OpenClaw, Hermes, or the current agent

Cross-Agent Notes

  • For Codex-style skills, limit the required YAML frontmatter to name and description.
  • For Hermes, prefer standard SKILL.md folders and invoke with hermes -s <skill-or-path> when testing locally.
  • For OpenClaw, keep the same folder portable and install from the local directory when needed.
  • For unknown agents, use the skill as plain Markdown instructions plus any bundled scripts. The only required contract is: load the candidate skill, run the task prompt, capture output, score it, and compare against the baseline.

Quality Bar

A good SkillOpt run feels like engineering, not vibes:

  • claims are backed by recorded rollouts
  • edits are small enough to review
  • validation decides acceptance
  • rejected edits teach the next round
  • final deployment is one clean skill file
Safe-Use Recommendations
Install only if you plan to use it in a trusted workspace and understand that its helper can run local shell commands through --agent-command and command scorers in task files. Treat imported task suites as executable code, review command fields before running, and avoid using it on sensitive prompts or secrets unless you control where rollout logs are written.
Capability Assessment
Purpose & Capability
The file-writing, rollout, scoring, gating, and report-generation behavior fits the stated purpose of optimizing agent skill files.
Instruction Scope
The skill activates for broad optimization and workflow-improvement requests, then can lead the agent to run external commands and create local run artifacts without strong scope or trust-boundary instructions.
Install Mechanism
No hidden installer, package dependency, network bootstrap, or persistence hook was found; the bundle contains markdown/yaml references and a local Python helper script.
Credentials
The helper intentionally supports caller-supplied agent command templates and task-defined command scorers with shell execution, which is powerful but not clearly limited to trusted task suites or approved executables.
Persistence & Privilege
The skill writes run directories, rollout outputs, summaries, reports, and exported best_skill.md files; this is purpose-aligned but may capture stdout/stderr and task outputs on disk.
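How to Use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install skillopt
  3. After installation, invoke the skill by name or trigger it with /skillopt.
  4. Provide the inputs described in the skill's parameter notes to get structured output.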
Version History
v0.1.0
Initial release: SkillOpt workflow for train/validation skill optimization, rollout scoring, validation gates, and best_skill.md export.
Metadata
Slug: skillopt
Version: 0.1.0
License: MIT-0
Total installs: 1
Current installs: 1
Versions: 1
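FAQ

What is SkillOpt?

Train, evaluate, and improve Agent skill files as reusable external capabilities. Use when a user wants to optimize SKILL.md, prompt procedures, OpenClaw/Her... It is an AI agent skill plugin for Claude Code / OpenClaw, with 36 total downloads so far.

How do I install SkillOpt?

Run the command "/install skillopt" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is needed.

Is SkillOpt free?

Yes. SkillOpt is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does SkillOpt support?

SkillOpt is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops SkillOpt?

It is developed and maintained by haidong (@harrylabsj); the current version is v0.1.0.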
