← Back to Skills Marketplace
brianhearn

ExpertPack Eval

by Brian Hearn · GitHub ↗ · v1.1.0 · MIT-0
cross-platform ⚠ suspicious
254
Downloads
1
Stars
0
Active Installs
3
Versions
Install in OpenClaw
/install expertpack-eval
Description
Measure ExpertPack EK (Esoteric Knowledge) ratio and run automated quality evals. Use when: (1) Measuring what percentage of a pack's content frontier LLMs c...
README (SKILL.md)

ExpertPack Eval

Measure and evaluate ExpertPack quality. Companion to the core expertpack skill.

Note: This skill makes external API calls to OpenRouter for blind probing and LLM-as-judge scoring. Requires an API key.

1. Measure EK Ratio

Blind-probe frontier models to measure what percentage of a pack's propositions they cannot answer without the pack loaded:

python3 {skill_dir}/scripts/eval-ek.py \x3Cpack-path> [--models model1,model2] [--sample N] [--output FILE]
  • Default models: GPT-4.1-mini, Claude Sonnet 4.6, Gemini 2.0 Flash (via OpenRouter)
  • API key: Auto-resolves from OpenClaw auth profiles or OPENROUTER_API_KEY env var
  • Judge model: Claude Sonnet (GPT-4.1-mini is unreliable as judge — defaults to "partial")
  • Output: YAML with per-proposition scores and aggregate ratio

Interpretation:

EK Ratio Meaning
0.80+ Exceptional — almost entirely esoteric
0.60–0.79 Strong — majority esoteric
0.40–0.59 Mixed — significant GK padding
0.20–0.39 Weak — most content already in weights
\x3C 0.20 Minimal value-add

Add measured ratio to manifest.yaml:

ek_ratio:
  value: 0.72
  measured: "2026-03-12"
  models: ["gpt-4.1-mini", "claude-sonnet-4-6", "gemini-2.0-flash"]
  propositions_tested: 142

2. Run Quality Eval

Automated eval against a pack-powered agent endpoint:

python3 {skill_dir}/scripts/run-eval.py \
  --questions \x3Ceval-set.yaml> \
  --endpoint \x3Cws://host:port/path> \
  --output \x3Cresults.yaml> \
  --label "baseline"
  • Build eval set: 30+ questions (basic, intermediate, advanced, out-of-scope)
  • Fix one dimension at a time: structure → agent training → model
  • Re-run after each change to verify improvement

Learn more: expertpack.ai · GitHub

Usage Guidance
Before installing or running this skill: (1) Expect the skill to send pack content, generated probe questions, and agent responses to OpenRouter (openrouter.ai) for both probing and judge scoring — do not run it on packs that contain proprietary or sensitive data unless you trust OpenRouter and the judge models. (2) The scripts will look for an OPENROUTER_API_KEY env var and will also try to read OpenClaw config files under ~/.openclaw to auto-resolve a key; the registry metadata did not declare this — verify you are comfortable with the skill reading those files. (3) The run-eval tool will connect to any endpoint you pass; ensure the endpoint is trusted and that you understand what data will be sent. (4) Consider running the scripts in an isolated environment or with a scoped API key (limited quota/permissions) and review the included Python files yourself. If you need the skill but want less data exposure, modify the scripts to avoid auto-reading ~/.openclaw and require the API key be passed explicitly at runtime.
Capability Analysis
Type: OpenClaw Skill Name: expertpack-eval Version: 1.1.0 The skill bundle contains scripts (scripts/eval-ek.py and scripts/run-eval.py) that automatically search for and read OpenRouter API keys from sensitive local configuration files in the user's home directory (~/.openclaw/agents/main/agent/auth-profiles.json and ~/.openclaw/.env). While this behavior is documented as a convenience feature for the OpenClaw environment, programmatic access to credential files is a high-risk pattern. The scripts use these keys to perform LLM-based evaluations and blind-probing via the OpenRouter API (openrouter.ai) and allow connections to arbitrary user-defined agent endpoints.
Capability Assessment
Purpose & Capability
The skill legitimately needs an OpenRouter API key and access to pack files to perform blind probing and judge scoring, and the scripts implement that. However the registry metadata lists no required environment variables or config paths even though SKILL.md and the scripts explicitly require/attempt to resolve an OPENROUTER_API_KEY and will read OpenClaw auth/config files under ~/.openclaw. The omission in metadata is an inconsistency that affects informed consent.
Instruction Scope
SKILL.md and the included scripts operate within the stated scope: they read proposition files from the provided pack path, generate probe questions, blind-probe frontier models via OpenRouter, and run evals against a user-supplied agent endpoint. They do not appear to scan unrelated system files, but they do attempt to read OpenClaw auth/config files (~/.openclaw/agents/main/agent/auth-profiles.json and models.json) to auto-resolve an API key — this is not declared in the registry metadata and merits user awareness.
Install Mechanism
No install spec; this is instruction-plus-scripts only. The only runtime requirement is python3 and (optionally) common Python packages like pyyaml, httpx, websockets. No external binary downloads or archive extraction are performed by the skill itself.
Credentials
The scripts require an OpenRouter API key (OPENROUTER_API_KEY) and will auto-resolve it from OpenClaw auth files in the user's home directory. The registry metadata did not declare this env var or config-path dependency. Because the skill transmits pack content and generated questions to OpenRouter (and sends eval questions/responses to whatever endpoint the user supplies), this credential and access are directly relevant and potentially sensitive — the metadata should explicitly declare them and users should understand what data will be sent to external services.
Persistence & Privilege
The skill is not always:true, does not request persistent platform privileges, and does not modify other skills or global settings. It runs on-demand and requires user-supplied paths/endpoints; it doesn't request broader system privileges beyond reading the OpenClaw config to auto-resolve credentials.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install expertpack-eval
  3. After installation, invoke the skill by name or use /expertpack-eval
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.1.0
Core 2.8: Obsidian compatibility
v2.0.0
Updated for Schema 2.7. Multi-model blind probing defaults (GPT-4.1-mini, Claude Sonnet 4.6, Gemini 2.0 Flash). EK ratio measurement and LLM-as-judge eval scoring.
v1.0.0
Initial release — EK ratio measurement via blind probing and automated quality eval runner. Companion to the core expertpack skill. Requires OpenRouter API key.
Metadata
Slug expertpack-eval
Version 1.1.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 3
Frequently Asked Questions

What is ExpertPack Eval?

Measure ExpertPack EK (Esoteric Knowledge) ratio and run automated quality evals. Use when: (1) Measuring what percentage of a pack's content frontier LLMs c... It is an AI Agent Skill for Claude Code / OpenClaw, with 254 downloads so far.

How do I install ExpertPack Eval?

Run "/install expertpack-eval" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is ExpertPack Eval free?

Yes, ExpertPack Eval is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does ExpertPack Eval support?

ExpertPack Eval is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created ExpertPack Eval?

It is built and maintained by Brian Hearn (@brianhearn); the current version is v1.1.0.

💬 Comments