brianhearn

ExpertPack Eval

By Brian Hearn · GitHub ↗ · v1.1.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 254
Favorites: 1
Current installs: 0
Versions: 3
Install in OpenClaw
/install expertpack-eval
Description
Measure ExpertPack EK (Esoteric Knowledge) ratio and run automated quality evals. Use when: (1) Measuring what percentage of a pack's content frontier LLMs c...
Usage (SKILL.md)

ExpertPack Eval

Measure and evaluate ExpertPack quality. Companion to the core expertpack skill.

Note: This skill makes external API calls to OpenRouter for blind probing and LLM-as-judge scoring. Requires an API key.

1. Measure EK Ratio

Blind-probe frontier models to measure what percentage of a pack's propositions they cannot answer without the pack loaded:

python3 {skill_dir}/scripts/eval-ek.py <pack-path> [--models model1,model2] [--sample N] [--output FILE]
  • Default models: GPT-4.1-mini, Claude Sonnet 4.6, Gemini 2.0 Flash (via OpenRouter)
  • API key: Auto-resolves from OpenClaw auth profiles or OPENROUTER_API_KEY env var
  • Judge model: Claude Sonnet (GPT-4.1-mini is unreliable as judge — defaults to "partial")
  • Output: YAML with per-proposition scores and aggregate ratio
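
How the script combines per-proposition scores into the aggregate ratio is not documented here. As a rough illustration only, the sketch below assumes each model's answer is judged "known", "partial", or "unknown", and that the ratio is the mean of hypothetical weights for those verdicts. The verdict labels (apart from "partial", mentioned above), the weights, and the function name are all assumptions.

```python
from statistics import mean

# Hypothetical verdict weights: how esoteric a proposition looks to one model.
# These labels and weights are assumptions, not the script's documented scheme.
VERDICT_WEIGHT = {"known": 0.0, "partial": 0.5, "unknown": 1.0}


def ek_ratio(verdicts_by_proposition: dict[str, list[str]]) -> float:
    """Average the esoteric weight over all propositions and models."""
    per_proposition = [
        mean(VERDICT_WEIGHT[v] for v in verdicts)
        for verdicts in verdicts_by_proposition.values()
        if verdicts  # skip propositions that no model was probed on
    ]
    if not per_proposition:
        raise ValueError("no scored propositions")
    return round(mean(per_proposition), 2)


# Three propositions, each judged against three models.
sample = {
    "prop-001": ["unknown", "unknown", "partial"],
    "prop-002": ["known", "known", "known"],
    "prop-003": ["unknown", "partial", "partial"],
}
print(ek_ratio(sample))
```

Averaging per proposition first keeps a proposition probed on fewer models from being under-weighted. Whether the real script does this is unknown.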

Interpretation:

EK Ratio     Meaning
0.80+        Exceptional: almost entirely esoteric
0.60–0.79    Strong: majority esoteric
0.40–0.59    Mixed: significant GK padding
0.20–0.39    Weak: most content already in weights
< 0.20       Minimal value-add
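
The interpretation bands can be applied programmatically. The following is a minimal sketch, not part of the skill. The function name is hypothetical, and it treats each band as a half-open interval so that values such as 0.795, which fall between the printed ranges, still map to a band.

```python
def interpret_ek_ratio(ratio: float) -> str:
    """Map an EK ratio in [0, 1] to the band label from the table above."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError(f"EK ratio must be between 0 and 1, got {ratio}")
    # Lower bounds in descending order; the first one the ratio reaches wins.
    bands = [
        (0.80, "Exceptional"),
        (0.60, "Strong"),
        (0.40, "Mixed"),
        (0.20, "Weak"),
    ]
    for lower_bound, label in bands:
        if ratio >= lower_bound:
            return label
    return "Minimal value-add"


print(interpret_ek_ratio(0.72))
```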

Add measured ratio to manifest.yaml:

ek_ratio:
  value: 0.72
  measured: "2026-03-12"
  models: ["gpt-4.1-mini", "claude-sonnet-4-6", "gemini-2.0-flash"]
  propositions_tested: 142
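
Before committing the block above, it may help to sanity-check it. The sketch below assumes the entry has already been loaded into a dict (for example with a YAML parser). The manifest schema is not specified here, so the constraints it checks and the function name are assumptions inferred from the example values.

```python
from datetime import date


def check_ek_ratio_entry(entry: dict) -> list[str]:
    """Return a list of problems found in an ek_ratio entry (empty if none)."""
    problems = []

    value = entry.get("value")
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if not is_number or not 0.0 <= value <= 1.0:
        problems.append("value must be a number between 0 and 1")

    try:
        date.fromisoformat(str(entry.get("measured")))
    except ValueError:
        problems.append("measured must be an ISO date such as 2026-03-12")

    models = entry.get("models")
    if not isinstance(models, list) or not models:
        problems.append("models must be a non-empty list")

    tested = entry.get("propositions_tested")
    is_count = isinstance(tested, int) and not isinstance(tested, bool)
    if not is_count or tested <= 0:
        problems.append("propositions_tested must be a positive integer")

    return problems


entry = {
    "value": 0.72,
    "measured": "2026-03-12",
    "models": ["gpt-4.1-mini", "claude-sonnet-4-6", "gemini-2.0-flash"],
    "propositions_tested": 142,
}
print(check_ek_ratio_entry(entry))
```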

2. Run Quality Eval

Automated eval against a pack-powered agent endpoint:

python3 {skill_dir}/scripts/run-eval.py \
  --questions <eval-set.yaml> \
  --endpoint <ws://host:port/path> \
  --output <results.yaml> \
  --label "baseline"
  • Build eval set: 30+ questions (basic, intermediate, advanced, out-of-scope)
  • Fix one dimension at a time: structure → agent training → model
  • Re-run after each change to verify improvement
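
The eval-set guidance above (30+ questions spanning four difficulty categories) can be checked before a run. The sketch below is illustrative only. The eval-set file format is not documented here, so it assumes each question is a dict with a hypothetical "category" key holding one of the four names listed above.

```python
from collections import Counter

REQUIRED_CATEGORIES = ("basic", "intermediate", "advanced", "out-of-scope")
MIN_QUESTIONS = 30


def eval_set_gaps(questions: list[dict]) -> list[str]:
    """Describe how a question list falls short of the guidance, if at all."""
    # "category" is an assumed field name, not a documented part of the schema.
    counts = Counter(q.get("category") for q in questions)
    gaps = []
    if len(questions) < MIN_QUESTIONS:
        gaps.append(f"only {len(questions)} questions; aim for {MIN_QUESTIONS}+")
    for category in REQUIRED_CATEGORIES:
        if counts[category] == 0:
            gaps.append(f"no {category} questions")
    return gaps


# Eight questions per category gives 32 in total, which satisfies both checks.
balanced = [{"category": c} for c in REQUIRED_CATEGORIES for _ in range(8)]
print(eval_set_gaps(balanced))
```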

Learn more: expertpack.ai · GitHub

Security recommendations
Before installing or running this skill:
  1. Expect the skill to send pack content, generated probe questions, and agent responses to OpenRouter (openrouter.ai) for both probing and judge scoring. Do not run it on packs that contain proprietary or sensitive data unless you trust OpenRouter and the judge models.
  2. The scripts look for an OPENROUTER_API_KEY env var and also try to read OpenClaw config files under ~/.openclaw to auto-resolve a key. The registry metadata does not declare this, so verify you are comfortable with the skill reading those files.
  3. The run-eval tool connects to any endpoint you pass. Ensure the endpoint is trusted and that you understand what data will be sent.
  4. Consider running the scripts in an isolated environment or with a scoped API key (limited quota/permissions), and review the included Python files yourself.
If you need the skill but want less data exposure, modify the scripts so they no longer auto-read ~/.openclaw and instead require the API key to be passed explicitly at runtime.
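
One way to approach the last suggestion is a key lookup that reads only the environment and never falls back to files on disk. This is a minimal sketch of that idea, not the skill's actual code. The function name is hypothetical, and how it would be wired into eval-ek.py or run-eval.py is left open.

```python
import os
from collections.abc import Mapping


def require_api_key(environ: Mapping[str, str] = os.environ) -> str:
    """Return the OpenRouter key from the environment or fail loudly.

    Unlike the auto-resolve behavior described above, this never reads
    ~/.openclaw or any other file.
    """
    key = environ.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENROUTER_API_KEY is not set; refusing to search config files"
        )
    return key


# Usage with an explicit mapping, so the example does not depend on the real
# environment. The key value is a placeholder.
print(require_api_key({"OPENROUTER_API_KEY": "sk-example"}))
```

A scoped key with a low quota can then be set for a single invocation, which limits exposure if the key leaks.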
Functional analysis
Type: OpenClaw Skill
Name: expertpack-eval
Version: 1.1.0
The skill bundle contains scripts (scripts/eval-ek.py and scripts/run-eval.py) that automatically search for and read OpenRouter API keys from sensitive local configuration files in the user's home directory (~/.openclaw/agents/main/agent/auth-profiles.json and ~/.openclaw/.env). While this behavior is documented as a convenience feature for the OpenClaw environment, programmatic access to credential files is a high-risk pattern. The scripts use these keys to perform LLM-based evaluations and blind-probing via the OpenRouter API (openrouter.ai) and allow connections to arbitrary user-defined agent endpoints.
Capability assessment
Purpose & Capability
The skill legitimately needs an OpenRouter API key and access to pack files to perform blind probing and judge scoring, and the scripts implement that. However, the registry metadata lists no required environment variables or config paths, even though SKILL.md and the scripts explicitly attempt to resolve an OPENROUTER_API_KEY and read OpenClaw auth/config files under ~/.openclaw. This omission is an inconsistency that affects informed consent.
Instruction Scope
SKILL.md and the included scripts operate within the stated scope: they read proposition files from the provided pack path, generate probe questions, blind-probe frontier models via OpenRouter, and run evals against a user-supplied agent endpoint. They do not appear to scan unrelated system files, but they do attempt to read OpenClaw auth/config files (~/.openclaw/agents/main/agent/auth-profiles.json and models.json) to auto-resolve an API key — this is not declared in the registry metadata and merits user awareness.
Install Mechanism
No install spec; this is instruction-plus-scripts only. The only runtime requirement is python3 and (optionally) common Python packages like pyyaml, httpx, websockets. No external binary downloads or archive extraction are performed by the skill itself.
Credentials
The scripts require an OpenRouter API key (OPENROUTER_API_KEY) and will auto-resolve it from OpenClaw auth files in the user's home directory. The registry metadata did not declare this env var or config-path dependency. Because the skill transmits pack content and generated questions to OpenRouter (and sends eval questions/responses to whatever endpoint the user supplies), this credential and access are directly relevant and potentially sensitive — the metadata should explicitly declare them and users should understand what data will be sent to external services.
Persistence & Privilege
The skill does not set always: true, does not request persistent platform privileges, and does not modify other skills or global settings. It runs on demand and requires user-supplied paths and endpoints. Beyond reading the OpenClaw config to auto-resolve credentials, it requests no broader system privileges.
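How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install expertpack-eval
  3. Once installed, invoke the skill by name or trigger it with /expertpack-eval
  4. Provide the required inputs per the skill's parameter documentation to get structured output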
Version history
v1.1.0
Core 2.8: Obsidian compatibility
v2.0.0
Updated for Schema 2.7. Multi-model blind probing defaults (GPT-4.1-mini, Claude Sonnet 4.6, Gemini 2.0 Flash). EK ratio measurement and LLM-as-judge eval scoring.
v1.0.0
Initial release — EK ratio measurement via blind probing and automated quality eval runner. Companion to the core expertpack skill. Requires OpenRouter API key.
Metadata
Slug expertpack-eval
Version 1.1.0
License MIT-0
Total installs 0
Current installs 0
Versions 3
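FAQ

What is ExpertPack Eval?

Measure ExpertPack EK (Esoteric Knowledge) ratio and run automated quality evals. Use when: (1) Measuring what percentage of a pack's content frontier LLMs c... It is an AI agent skill plugin for Claude Code / OpenClaw, with 254 total downloads to date.

How do I install ExpertPack Eval?

Run the command "/install expertpack-eval" in the OpenClaw or Claude Code chat box for a one-click install. Installation itself needs no extra configuration, though running the skill requires an OpenRouter API key.

Is ExpertPack Eval free?

Yes. ExpertPack Eval is completely free under the MIT-0 license and can be downloaded, installed, and used freely.

Which platforms does ExpertPack Eval support?

ExpertPack Eval is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who develops ExpertPack Eval?

It is developed and maintained by Brian Hearn (@brianhearn). The current version is v1.1.0.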
