Description

Train, evaluate, and improve Agent skill files as reusable external capabilities. Use when a user wants to optimize SKILL.md, prompt procedures, OpenClaw/Her...

README (SKILL.md)

SkillOpt

Name: SkillOpt
Author: harrylabsj

Operating Idea

Treat a skill document as trainable external state. Keep the target model, tools, and runtime fixed; optimize only the skill text through measured task rollouts, failure reflection, small edits, validation gating, and versioned export.

Default output is a deployable best_skill.md plus a short optimization report. Training may use many traces and candidate files; deployment should require only the final skill file.

Invariants

Preserve the original skill before editing.
Separate train and validation tasks. Never accept an edit based only on the examples used to propose it.
Prefer small, reviewable edits over full rewrites. Keep the skill's public contract stable unless the task suite proves the contract is wrong.
Score behavior, not eloquence. A prettier skill that does not improve validation is rejected.
Record rejected edits and the reason, then consult that buffer before proposing another edit.
Do not add model-specific hacks unless the target deployment is explicitly model-specific.
Do not leak validation answers into the skill. Validation data may guide accept/reject decisions, not become memorized instructions.

Run Directory

Create a run directory near the skill being optimized unless the user specifies another path:

skillopt_runs/<target-skill-slug>/
  source_skill.md
  candidates/
    candidate_000.md
    candidate_001.md
  tasks/
    train.jsonl
    val.jsonl
  rollouts/
    train/
    val/
  rejected_edits.md
  best_skill.md
  report.md
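The layout above can be created with a few lines of standard-library Python. This is a minimal sketch, not the code in scripts/skillopt.py: the function name `init_run_dir` and the choice to seed `candidate_000.md` as a copy of the source skill are assumptions made for illustration.

```python
from pathlib import Path
import shutil


def init_run_dir(skill_path: Path, runs_root: Path, slug: str) -> Path:
    """Create the run directory layout and snapshot the original skill.

    Hypothetical helper; the real `skillopt.py init` may behave differently.
    """
    run_dir = runs_root / slug
    for sub in ("candidates", "tasks", "rollouts/train", "rollouts/val"):
        (run_dir / sub).mkdir(parents=True, exist_ok=True)

    # Preserve the original skill; never overwrite an existing snapshot.
    source_copy = run_dir / "source_skill.md"
    if not source_copy.exists():
        shutil.copyfile(skill_path, source_copy)

    # Assumption: candidate_000 starts as an exact copy of the source skill.
    candidate0 = run_dir / "candidates" / "candidate_000.md"
    if not candidate0.exists():
        shutil.copyfile(source_copy, candidate0)

    (run_dir / "rejected_edits.md").touch()
    return run_dir
```

Calling it twice on the same directory leaves the first snapshot in place, which keeps the "preserve the original skill" invariant even if the live skill file has changed in between.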

Use scripts/skillopt.py for deterministic run setup, JSONL validation, simple command-backed rollouts, score aggregation, validation gates, and report generation. Read references/evaluation.md when defining task schemas or scorers.

Workflow

1. Define the Optimization Contract

Identify:

target skill path and deployment agents
target model/runtime/tool constraints to keep fixed during evaluation
success metric and acceptance threshold
task distribution the skill should serve
allowed edit budget, such as max 3 sections or max 25% changed lines per round
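One way to make the contract checkable is to hold it in a small record and measure each candidate against the edit budget. The sketch below is illustrative only: `OptimizationContract`, `changed_line_ratio`, and `within_budget` are hypothetical names, and counting changed lines with `difflib` opcodes is just one reasonable reading of "max 25% changed lines per round".

```python
import difflib
from dataclasses import dataclass


@dataclass(frozen=True)
class OptimizationContract:
    """Hypothetical record of the items listed above."""
    skill_path: str
    metric: str = "validation_average"
    min_improvement: float = 0.02
    max_changed_ratio: float = 0.25  # "max 25% changed lines per round"


def changed_line_ratio(old: str, new: str) -> float:
    """Fraction of lines touched, relative to the original length."""
    old_lines, new_lines = old.splitlines(), new.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Count the larger side so inserts and deletes both register.
            changed += max(i2 - i1, j2 - j1)
    return changed / max(len(old_lines), 1)


def within_budget(contract: OptimizationContract, old: str, new: str) -> bool:
    return changed_line_ratio(old, new) <= contract.max_changed_ratio
```

For a four-line skill, replacing one line gives a ratio of 0.25 and passes the default budget; replacing two lines gives 0.5 and fails.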

If no task suite exists, create a small proxy suite first, label it as proxy data, and tell the user that real production traces are needed for stronger conclusions.

2. Build Train and Validation Sets

Represent each task as one JSON object per line (JSONL) with an id, a prompt, optional inputs, and a scorer. Keep validation examples independent of the training examples and representative of the task distribution.

Minimum split:

train.jsonl: failure discovery and edit proposal
val.jsonl: accept/reject gate

For fragile or high-stakes skills, add a hidden or holdout split outside the optimization loop and use it only for final reporting.
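A quick structural check before any rollout can catch malformed lines and leakage between splits. The following is a sketch under stated assumptions: the required field names follow the description above, but the shape of the `scorer` object is defined in references/evaluation.md and is not reproduced here, so the example values used with it are hypothetical. `parse_tasks` and `check_split` are illustrative names, not functions from scripts/skillopt.py.

```python
import json

REQUIRED_FIELDS = ("id", "prompt", "scorer")  # "inputs" is optional


def parse_tasks(jsonl_text: str) -> list[dict]:
    """Parse JSONL text and check required fields and id uniqueness."""
    tasks = []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        task = json.loads(line)
        if not isinstance(task, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        missing = [name for name in REQUIRED_FIELDS if name not in task]
        if missing:
            raise ValueError(f"line {lineno}: missing fields {missing}")
        tasks.append(task)
    ids = [task["id"] for task in tasks]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate task ids")
    return tasks


def check_split(train: list[dict], val: list[dict]) -> None:
    """Reject splits that share ids or identical prompts."""
    shared_ids = {t["id"] for t in train} & {t["id"] for t in val}
    shared_prompts = {t["prompt"] for t in train} & {t["prompt"] for t in val}
    if shared_ids or shared_prompts:
        raise ValueError(
            f"train/val overlap: ids={sorted(shared_ids)}, "
            f"prompts={len(shared_prompts)}"
        )
```

Exact-match overlap is only a floor. Near-duplicate prompts can still leak information between splits and need a manual review.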

3. Run Baseline Rollouts

Evaluate the unmodified skill on train and validation tasks using the same target agent that will later deploy it.

Examples:

python3 scripts/skillopt.py init --skill path/to/SKILL.md --out skillopt_runs/my-skill
python3 scripts/skillopt.py validate-tasks skillopt_runs/my-skill/tasks/train.jsonl
python3 scripts/skillopt.py run --tasks skillopt_runs/my-skill/tasks/val.jsonl --skill skillopt_runs/my-skill/source_skill.md --out skillopt_runs/my-skill/rollouts/val_baseline --agent-command "hermes -s {skill_path} -z {prompt}"

For OpenClaw or any other agent, replace the --agent-command value with a command template that accepts {skill_path}, {prompt}, {task_id}, and optionally {output_path}.
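Because prompts are free text, a template like this is only safe if each substituted value is quoted before it reaches a shell. The sketch below shows one way to fill the placeholders with `shlex.quote`; it is an assumption about how such a template could be rendered, not a description of what scripts/skillopt.py does, and `render_agent_command` is a hypothetical name.

```python
import shlex


def render_agent_command(template: str, **values: str) -> str:
    """Fill {placeholders} in a command template with shell-quoted values.

    Unused values are ignored, so {output_path} stays optional.
    A placeholder with no matching value raises KeyError.
    """
    quoted = {key: shlex.quote(str(value)) for key, value in values.items()}
    return template.format(**quoted)


command = render_agent_command(
    "hermes -s {skill_path} -z {prompt}",
    skill_path="skillopt_runs/my-skill/source_skill.md",
    prompt="Summarize 'report.docx'; then stop",
    task_id="val_01",
)
```

`shlex.quote` targets POSIX shells. On other shells, passing an argument list to `subprocess.run` without `shell=True` avoids the quoting question entirely.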

4. Reflect on Traces

Analyze successful and failed rollouts separately.

For each failure, classify the root cause:

missing procedure
wrong tool order
weak verification
ambiguous output contract
missing edge case
over-broad instruction
environment assumption
scoring mismatch

Extract patterns across failures before editing. Do not chase one-off errors unless they reveal a generalizable instruction.
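Counting root causes across failures is a simple way to separate patterns from one-off errors. This sketch assumes each failed rollout has already been labeled with one of the categories above in a `root_cause` field; that field name, the `recurring_causes` function, and the `min_count` cutoff of 2 are all illustrative choices.

```python
from collections import Counter

ROOT_CAUSES = {
    "missing procedure",
    "wrong tool order",
    "weak verification",
    "ambiguous output contract",
    "missing edge case",
    "over-broad instruction",
    "environment assumption",
    "scoring mismatch",
}


def recurring_causes(failures: list[dict], min_count: int = 2) -> list[tuple[str, int]]:
    """Return root causes seen at least `min_count` times, most common first."""
    counts: Counter = Counter()
    for failure in failures:
        cause = failure["root_cause"]
        if cause not in ROOT_CAUSES:
            raise ValueError(f"unknown root cause: {cause!r}")
        counts[cause] += 1
    return [(cause, n) for cause, n in counts.most_common() if n >= min_count]


failures = [
    {"task_id": "train_02", "root_cause": "weak verification"},
    {"task_id": "train_05", "root_cause": "missing edge case"},
    {"task_id": "train_07", "root_cause": "weak verification"},
]
patterns = recurring_causes(failures)
```

Here only "weak verification" recurs, so it is the candidate for the next edit; the single "missing edge case" failure is noted but not acted on unless it points at a generalizable instruction.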

5. Propose a Controlled Edit

Generate one candidate skill with a concise edit rationale:

add: missing guardrail, checklist, or workflow step
delete: harmful or distracting instruction
replace: ambiguous wording with operational criteria
reorder: move high-leverage instructions earlier

Keep the candidate deployable as a normal skill. Avoid embedding run logs, benchmark answers, private traces, or optimizer notes in the final skill text.

6. Gate on Validation

Run the same validation set on the candidate. Accept only when the candidate beats the baseline by the configured threshold and does not introduce unacceptable regressions.

Default acceptance:

validation average improves by at least 0.02
no critical task regresses from pass to fail
skill does not grow, or grows only for a clear procedural reason
output format and trigger metadata remain valid
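The first two default criteria are mechanical and can be expressed as a small gate function. The sketch below assumes per-task scores in the range 0 to 1 and treats a score at or above `pass_score` as a pass; both are assumptions, as is the function name `validation_gate`. The length and format checks are left to review and are not modeled here.

```python
from statistics import mean


def validation_gate(
    baseline: dict[str, float],
    candidate: dict[str, float],
    critical_ids: set[str],
    min_gain: float = 0.02,
    pass_score: float = 1.0,
) -> tuple[bool, list[str]]:
    """Return (accepted, reasons); reasons is empty when accepted."""
    if baseline.keys() != candidate.keys():
        raise ValueError("baseline and candidate must cover the same tasks")

    reasons = []
    gain = mean(list(candidate.values())) - mean(list(baseline.values()))
    if gain < min_gain:
        reasons.append(
            f"validation avg {gain:+.2f} is below the +{min_gain:.2f} threshold"
        )
    for task_id in sorted(critical_ids):
        if baseline[task_id] >= pass_score and candidate[task_id] < pass_score:
            reasons.append(f"critical task {task_id} regressed from pass to fail")
    return (not reasons, reasons)
```

Returning the reasons alongside the decision makes it straightforward to copy them into rejected_edits.md when a candidate fails.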

If rejected, append a short note to rejected_edits.md:

## candidate_003
Rejected because validation avg +0.00 and task val_docx_04 regressed.
Avoid adding broad "always rewrite" instructions; they caused format drift.
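Notes in this shape are easy to append and to read back before the next proposal. The helpers below are a hypothetical sketch that follows the note format shown above; none of these function names come from scripts/skillopt.py.

```python
from pathlib import Path


def format_rejection(candidate_id: str, reason: str, lesson: str) -> str:
    """Render one note in the rejected_edits.md format shown above."""
    return f"## {candidate_id}\nRejected because {reason}\n{lesson}\n\n"


def append_rejection(path: Path, candidate_id: str, reason: str, lesson: str) -> None:
    """Append a note without touching earlier entries."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(format_rejection(candidate_id, reason, lesson))


def rejected_ids(path: Path) -> list[str]:
    """List candidate ids already rejected, in file order."""
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line[3:].strip() for line in lines if line.startswith("## ")]
```

Reading the whole file back, not just the ids, is what lets the next proposal avoid repeating a rejected pattern.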

7. Iterate

Repeat rollout, reflection, candidate edit, and validation gate until:

validation score plateaus for 2 rounds
edit budget is exhausted
regressions become persistent
the skill is good enough for the user's target use

Track the best candidate, not merely the latest candidate.
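The plateau condition and best-candidate tracking can both be computed from the list of validation scores per round. This is a sketch with assumed semantics: index 0 is the baseline, and a round counts as "no progress" when it fails to beat the earlier best by `min_gain`. The function names are hypothetical.

```python
def should_stop(val_scores: list[float], patience: int = 2, min_gain: float = 0.02) -> bool:
    """True when none of the last `patience` rounds beat the earlier best by `min_gain`."""
    if len(val_scores) <= patience:
        return False  # not enough rounds to judge a plateau
    best_before = max(val_scores[:-patience])
    recent = val_scores[-patience:]
    return all(score < best_before + min_gain for score in recent)


def best_round(val_scores: list[float]) -> int:
    """Index of the highest validation score; ties go to the earliest round."""
    return max(range(len(val_scores)), key=val_scores.__getitem__)


history = [0.60, 0.70, 0.70, 0.69]  # baseline, then three candidate rounds
```

With this history the loop stops after round 3, and the exported skill is the candidate from round 1, not the latest one.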

8. Export

Copy the best accepted candidate to best_skill.md. If the user wants installation, replace or install the deployed skill only after showing the report summary.

The final report should include:

baseline train/validation scores
best candidate train/validation scores
accepted edits
rejected edit patterns
known overfitting risks
deployment instructions for OpenClaw, Hermes, or the current agent

Cross-Agent Notes

For Codex-style skills, limit the required YAML frontmatter to name and description.
For Hermes, prefer standard SKILL.md folders and invoke with hermes -s <skill-or-path> when testing locally.
For OpenClaw, keep the same folder portable and install from the local directory when needed.
For unknown agents, use the skill as plain Markdown instructions plus any bundled scripts. The only required contract is: load the candidate skill, run the task prompt, capture output, score it, and compare against the baseline.
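The required contract for unknown agents can be written as a function that takes the agent and the scorer as plain callables, so nothing in it depends on a specific runtime. The sketch below uses stand-in callables to show the shape; `evaluate_skill`, `fake_agent`, and `exact_match` are hypothetical names, and the `expected` field is an assumed scorer input.

```python
from typing import Callable

Agent = Callable[[str, str], str]      # (skill_text, prompt) -> output
Scorer = Callable[[dict, str], float]  # (task, output) -> score in [0, 1]


def evaluate_skill(skill_text: str, tasks: list[dict], run_agent: Agent, score: Scorer) -> float:
    """Load the skill, run each prompt, capture the output, and average the scores."""
    if not tasks:
        raise ValueError("no tasks to evaluate")
    scores = [score(task, run_agent(skill_text, task["prompt"])) for task in tasks]
    return sum(scores) / len(scores)


# Stand-ins for a real agent and scorer, for illustration only.
def fake_agent(skill_text: str, prompt: str) -> str:
    return "checked" if "verify" in skill_text else "unchecked"


def exact_match(task: dict, output: str) -> float:
    return 1.0 if output == task["expected"] else 0.0


tasks = [{"id": "val_01", "prompt": "Run the check", "expected": "checked"}]
baseline_score = evaluate_skill("Do the task.", tasks, fake_agent, exact_match)
candidate_score = evaluate_skill("Do the task, then verify.", tasks, fake_agent, exact_match)
```

Keeping the agent behind a callable means the same comparison works whether the rollout is a subprocess call, an HTTP request, or a manual run.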

Quality Bar

A good SkillOpt run feels like engineering, not vibes:

claims are backed by recorded rollouts
edits are small enough to review
validation decides acceptance
rejected edits teach the next round
final deployment is one clean skill file

Usage Guidance

Install only if you plan to use it in a trusted workspace and understand that its helper can run local shell commands through --agent-command and command scorers in task files. Treat imported task suites as executable code, review command fields before running, and avoid using it on sensitive prompts or secrets unless you control where rollout logs are written.
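Reviewing command fields is easier with a listing of every command string in a task file before anything runs. The sketch below walks each JSON object and reports string values stored under keys whose name contains "command". That key-name convention is an assumption, since the actual task schema lives in references/evaluation.md, and `find_command_fields` is a hypothetical helper.

```python
import json
from typing import Iterator


def _walk(node, path: str = "") -> Iterator[tuple[str, str]]:
    """Yield (field path, value) for string values under command-like keys."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if "command" in key.lower() and isinstance(value, str):
                yield child, value
            yield from _walk(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from _walk(item, f"{path}[{index}]")


def find_command_fields(jsonl_text: str) -> list[tuple[str, str, str]]:
    """Return (task id, field path, command string) for manual review."""
    found = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        task = json.loads(line)
        task_id = str(task.get("id", "?")) if isinstance(task, dict) else "?"
        for field_path, value in _walk(task):
            found.append((task_id, field_path, value))
    return found
```

The listing is an aid for a human reviewer, not a filter. A scorer could still reach a shell through a field with a different name, so an imported suite deserves a full read.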

Capability Assessment

ℹ Purpose & Capability

The file-writing, rollout, scoring, gating, and report-generation behavior fits the stated purpose of optimizing agent skill files.

⚠ Instruction Scope

The skill activates for broad optimization and workflow-improvement requests, then can lead the agent to run external commands and create local run artifacts without strong scope or trust-boundary instructions.

✓ Install Mechanism

No hidden installer, package dependency, network bootstrap, or persistence hook was found; the bundle contains markdown/yaml references and a local Python helper script.

⚠ Credentials

The helper intentionally supports caller-supplied agent command templates and task-defined command scorers with shell execution, which is powerful but not clearly limited to trusted task suites or approved executables.

ℹ Persistence & Privilege

The skill writes run directories, rollout outputs, summaries, reports, and exported best_skill.md files; this is purpose-aligned but may capture stdout/stderr and task outputs on disk.

Version History

v0.1.0

Initial release: SkillOpt workflow for train/validation skill optimization, rollout scoring, validation gates, and best_skill.md export.

Metadata

Slug skillopt

Version 0.1.0

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 1

Frequently Asked Questions

What is SkillOpt?

Train, evaluate, and improve Agent skill files as reusable external capabilities. Use when a user wants to optimize SKILL.md, prompt procedures, OpenClaw/Her... It is an AI Agent Skill for Claude Code / OpenClaw, with 36 downloads so far.

How do I install SkillOpt?

Run "/install skillopt" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is SkillOpt free?

Yes, SkillOpt is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does SkillOpt support?

SkillOpt is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created SkillOpt?

It is built and maintained by haidong (@harrylabsj); the current version is v0.1.0.
