ExpertPack Eval
/install expertpack-eval
Measure and evaluate ExpertPack quality. Companion to the core expertpack skill.
Note: This skill makes external API calls to OpenRouter for blind probing and LLM-as-judge scoring. Requires an API key.
1. Measure EK Ratio
Blind-probe frontier models to measure what percentage of a pack's propositions they cannot answer without the pack loaded:
python3 {skill_dir}/scripts/eval-ek.py <pack-path> [--models model1,model2] [--sample N] [--output FILE]
- Default models: GPT-4.1-mini, Claude Sonnet 4.6, Gemini 2.0 Flash (via OpenRouter)
- API key: Auto-resolves from OpenClaw auth profiles or the OPENROUTER_API_KEY env var
- Judge model: Claude Sonnet (GPT-4.1-mini is unreliable as a judge: it defaults to "partial")
- Output: YAML with per-proposition scores and aggregate ratio
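As a rough illustration of how an aggregate ratio could be derived from per-proposition judge verdicts, here is a minimal sketch. The verdict labels (`correct`, `partial`, `incorrect`), the function name `ek_ratio`, and the rule that a proposition counts as esoteric only when no probed model answers it correctly are all assumptions made for illustration; the logic in eval-ek.py may differ.

```python
from typing import Dict, List


def ek_ratio(verdicts: Dict[str, List[str]]) -> float:
    """Return the share of propositions that no probed model answered correctly.

    `verdicts` maps a proposition ID to one judge verdict per model.
    Labels and aggregation rule are assumptions, not the script's real schema.
    """
    if not verdicts:
        raise ValueError("no propositions to score")
    esoteric = sum(
        1
        for labels in verdicts.values()
        if all(label != "correct" for label in labels)
    )
    return esoteric / len(verdicts)


sample = {
    "p1": ["incorrect", "incorrect", "partial"],  # no model knew it: esoteric
    "p2": ["correct", "correct", "correct"],      # general knowledge
    "p3": ["partial", "incorrect", "incorrect"],  # esoteric
    "p4": ["incorrect", "correct", "incorrect"],  # one model knew it: not esoteric
}
print(ek_ratio(sample))  # → 0.5
```

Treating "partial" as a miss is the stricter choice; counting it as half credit would lower the ratio for packs whose content is partly guessable.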
Interpretation:
| EK Ratio | Meaning |
|---|---|
| 0.80+ | Exceptional — almost entirely esoteric |
| 0.60–0.79 | Strong — majority esoteric |
| 0.40–0.59 | Mixed — significant GK padding |
| 0.20–0.39 | Weak — most content already in weights |
| < 0.20 | Minimal value-add |
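The bands in the table can be applied programmatically, for example when gating a pack release on its measured ratio. The sketch below is one possible encoding; the function name `interpret_ek_ratio` is hypothetical, and it assumes the bands are contiguous (so a value such as 0.795 falls into "Strong").

```python
def interpret_ek_ratio(ratio: float) -> str:
    """Map an EK ratio in [0, 1] to the band label from the interpretation table."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError(f"EK ratio must be between 0 and 1, got {ratio}")
    if ratio >= 0.80:
        return "Exceptional"
    if ratio >= 0.60:
        return "Strong"
    if ratio >= 0.40:
        return "Mixed"
    if ratio >= 0.20:
        return "Weak"
    return "Minimal value-add"


print(interpret_ek_ratio(0.72))  # → Strong
```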
Add measured ratio to manifest.yaml:
ek_ratio:
  value: 0.72
  measured: "2026-03-12"
  models: ["gpt-4.1-mini", "claude-sonnet-4-6", "gemini-2.0-flash"]
  propositions_tested: 142
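If you record the ratio after every measurement, generating the manifest fragment avoids copy-paste errors. The following is a minimal sketch using only the standard library; the helper name `render_ek_ratio_block` is hypothetical, and it assumes the four fields are nested under `ek_ratio` with two-space indentation. It only renders text and leaves merging into manifest.yaml to you.

```python
import json
from datetime import date
from typing import List, Optional


def render_ek_ratio_block(
    value: float,
    models: List[str],
    propositions_tested: int,
    measured: Optional[date] = None,
) -> str:
    """Render the ek_ratio manifest fragment as YAML text."""
    measured = measured or date.today()
    lines = [
        "ek_ratio:",
        f"  value: {value:.2f}",
        f'  measured: "{measured.isoformat()}"',
        # A JSON array of strings is also a valid YAML flow sequence.
        f"  models: {json.dumps(models)}",
        f"  propositions_tested: {propositions_tested}",
    ]
    return "\n".join(lines) + "\n"


block = render_ek_ratio_block(
    0.72,
    ["gpt-4.1-mini", "claude-sonnet-4-6", "gemini-2.0-flash"],
    142,
    measured=date(2026, 3, 12),
)
print(block)
```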
2. Run Quality Eval
Automated eval against a pack-powered agent endpoint:
python3 {skill_dir}/scripts/run-eval.py \
  --questions <eval-set.yaml> \
  --endpoint <ws://host:port/path> \
  --output <results.yaml> \
  --label "baseline"
- Build eval set: 30+ questions (basic, intermediate, advanced, out-of-scope)
- Fix one dimension at a time: structure → agent training → model
- Re-run after each change to verify improvement
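Before running an eval, it can help to confirm that the question set meets the guidance above: at least 30 questions covering all four difficulty categories. The sketch below assumes each question is a dict with `question` and `category` keys; that schema and the name `check_eval_set` are hypothetical, since the eval-set format expected by run-eval.py is not described here.

```python
from collections import Counter
from typing import Dict, List

REQUIRED_CATEGORIES = ("basic", "intermediate", "advanced", "out-of-scope")
MIN_QUESTIONS = 30


def check_eval_set(questions: List[Dict[str, str]]) -> List[str]:
    """Return a list of problems; an empty list means the set looks usable."""
    problems = []
    if len(questions) < MIN_QUESTIONS:
        problems.append(
            f"only {len(questions)} questions; need at least {MIN_QUESTIONS}"
        )
    counts = Counter(q.get("category") for q in questions)
    for category in REQUIRED_CATEGORIES:
        if counts[category] == 0:
            problems.append(f"no '{category}' questions")
    return problems


# 32 placeholder questions cycling through the four categories.
demo = [
    {"question": f"Q{i}", "category": REQUIRED_CATEGORIES[i % 4]}
    for i in range(32)
]
print(check_eval_set(demo))      # → []
print(check_eval_set(demo[:3]))  # too few questions, and no out-of-scope ones
```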
Learn more: expertpack.ai · GitHub
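- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: /install expertpack-eval
- Once installed, invoke the Skill by name or trigger it with /expertpack-eval
- Provide the inputs described in the Skill's parameter notes to get structured output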
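What is ExpertPack Eval?
Measure ExpertPack EK (Esoteric Knowledge) ratio and run automated quality evals. Use when: (1) Measuring what percentage of a pack's content frontier LLMs c... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 254 downloads to date.
How do I install ExpertPack Eval?
Run the command "/install expertpack-eval" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.
Is ExpertPack Eval free?
Yes. ExpertPack Eval is completely free and released under the MIT-0 license, so you can download, install, and use it freely.
Which platforms does ExpertPack Eval support?
ExpertPack Eval is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who develops ExpertPack Eval?
It is developed and maintained by Brian Hearn (@brianhearn). The current version is v1.1.0.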