Evalpal
/install evalpal
EvalPal Skill
Run AI agent evaluations inline. Trigger eval runs, poll for results, and list available evaluation definitions — all from chat.
Prerequisites
Set the following environment variables in your OpenClaw skill configuration:
| Variable | Required | Description |
|---|---|---|
| `EVALPAL_API_KEY` | Yes | Your EvalPal API key (starts with `sk_`) |
| `EVALPAL_API_URL` | No | Base URL (defaults to https://evalpal.dev) |
Get your API key from Settings → API Keys at evalpal.dev.
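Before running any command, it can help to confirm the environment is set up as the table describes. The following is a minimal sketch, not part of the skill: the function name `evalpal_check_env` is hypothetical, and treating a key without the `sk_` prefix as invalid is an assumption based only on the description above.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper; not shipped with the EvalPal skill.
# Returns 0 when the environment looks usable, 1 otherwise.
# The key itself is never printed.
evalpal_check_env() {
  if [ -z "${EVALPAL_API_KEY:-}" ]; then
    echo "Error: EVALPAL_API_KEY is not set" >&2
    return 1
  fi

  # Assumption: a valid key always starts with sk_.
  case "$EVALPAL_API_KEY" in
    sk_*) ;;
    *)
      echo "Warning: EVALPAL_API_KEY does not start with sk_" >&2
      return 1
      ;;
  esac

  # Fall back to the documented default base URL when none is set.
  : "${EVALPAL_API_URL:=https://evalpal.dev}"
  return 0
}
```

Calling `evalpal_check_env || exit 1` at the top of a wrapper script would stop early with a readable message instead of failing on the first API call.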
Commands
/evalpal run --eval-id <ID>
Trigger an evaluation run and wait for results.
Usage:
bash scripts/run-eval.sh --eval-id <EVAL_DEFINITION_ID>
What it does:
- Triggers a new eval run via the EvalPal API
- Polls for completion with exponential backoff (up to 5 minutes)
- Fetches and formats results as readable markdown
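The polling step in the list above can be sketched as follows. This is an illustration of exponential backoff under a fixed time budget, not the actual contents of `scripts/run-eval.sh`: the function name `poll_with_backoff`, the 1-second starting delay, and the 30-second cap are all assumptions. Only the 300-second limit and the timeout message come from this page.

```bash
# Hypothetical helper: poll_with_backoff CHECK_CMD [TIMEOUT_SECONDS]
# Runs CHECK_CMD repeatedly until it succeeds or the time budget is spent.
# Only time spent sleeping is counted against the budget.
poll_with_backoff() {
  local check_cmd=$1
  local timeout=${2:-300}   # 5 minutes, the documented limit
  local delay=1             # assumed initial delay in seconds
  local max_delay=30        # assumed upper bound for a single wait
  local elapsed=0

  while [ "$elapsed" -lt "$timeout" ]; do
    if "$check_cmd"; then
      return 0
    fi
    # Shorten the last wait so we never sleep past the deadline.
    if [ $((elapsed + delay)) -gt "$timeout" ]; then
      delay=$((timeout - elapsed))
    fi
    sleep "$delay"
    elapsed=$((elapsed + delay))
    # Double the delay after every failed check, up to the cap.
    delay=$((delay * 2))
    if [ "$delay" -gt "$max_delay" ]; then
      delay=$max_delay
    fi
  done

  echo "Error: Evaluation timed out after ${timeout}s" >&2
  return 1
}
```

Here `CHECK_CMD` would be a function that returns 0 once the run has finished. Doubling the wait keeps early checks responsive for short runs while reducing the number of requests made during long ones.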
Example output:
✅ Episode Quality — PASSED (15/16)
├── Test Case tc_001: ✓ PASS
├── Test Case tc_002: ✓ PASS
├── Test Case tc_003: ✗ FAIL
└── 12 more passed...
Run ID: run_abc123 · 16 test cases · 47s
Exit codes: 0 = all passed, 1 = failures or error.
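Because the script reports success or failure through its exit code, it can gate other steps in a shell script or CI job. The sketch below is one way to do that. `run_gate` and `evalpal_runner` are hypothetical names, and the eval IDs in the usage note are borrowed from the example output on this page.

```bash
# Hypothetical helper: run_gate RUNNER EVAL_ID...
# Calls RUNNER once per eval ID and succeeds only if every call exits 0.
run_gate() {
  local runner=$1
  shift
  local failed=0
  local eval_id
  for eval_id in "$@"; do
    if "$runner" "$eval_id"; then
      echo "ok: $eval_id"
    else
      echo "failed: $eval_id"
      failed=$((failed + 1))
    fi
  done
  # Exit status of the function: 0 only when nothing failed.
  [ "$failed" -eq 0 ]
}

# Runner that invokes the documented script for a single eval ID.
evalpal_runner() {
  bash scripts/run-eval.sh --eval-id "$1"
}
```

For example, `run_gate evalpal_runner abc123 def456 || exit 1` would stop a pipeline if either evaluation fails or errors out. Since exit code 1 covers both failed test cases and errors, the gate cannot tell the two apart without reading the script's output.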
/evalpal status --run-id <ID>
Check the current status of a running evaluation.
Usage:
bash scripts/check-status.sh --run-id <RUN_ID>
Example output:
📊 Run Status: run_abc123
Status: running
Started: 2026-03-26T20:00:00Z
/evalpal list
List available evaluation definitions across your projects.
Usage:
bash scripts/list-evals.sh [--project-id <PROJECT_ID>]
If --project-id is omitted, lists evals for all projects.
Example output:
📋 Evaluation Definitions
Project: AI Workforce Lab
abc123 Episode Quality Check
def456 Factual Accuracy Eval
Project: Customer Support Bot
ghi789 Response Quality
Error Handling
All scripts handle common error cases:
| Scenario | Output | Exit Code |
|---|---|---|
| No API key set | `Error: EVALPAL_API_KEY is not set` | 1 |
| Invalid API key | `Error: Authentication failed (401)` | 1 |
| Eval not found | `Error: Eval definition not found (404)` | 1 |
| Rate limited | `Error: Rate limited — retry after Xs (429)` | 1 |
| Timeout (5 min) | `Error: Evaluation timed out after 300s` | 1 |
| Network error | `Error: Could not reach EvalPal API` | 1 |
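The mapping from HTTP status to error message in the table can be sketched as a small shell function. This is an assumption about how such handling might look, not the code the skill ships: `evalpal_error_for_status` is a hypothetical name, and the fallback branch for unlisted statuses is not documented on this page.

```bash
# Hypothetical helper: evalpal_error_for_status HTTP_STATUS [RETRY_AFTER_SECONDS]
# Prints the documented error message to stderr and returns 1,
# or returns 0 silently for any 2xx status.
evalpal_error_for_status() {
  local status=$1
  local retry_after=${2:-X}
  case "$status" in
    2??) return 0 ;;
    401) echo "Error: Authentication failed (401)" >&2 ;;
    404) echo "Error: Eval definition not found (404)" >&2 ;;
    429) echo "Error: Rate limited — retry after ${retry_after}s (429)" >&2 ;;
    # curl's --write-out '%{http_code}' reports 000 when no response arrived.
    000) echo "Error: Could not reach EvalPal API" >&2 ;;
    # Assumption: anything else is reported generically.
    *)   echo "Error: Unexpected HTTP status ${status}" >&2 ;;
  esac
  return 1
}
```

A caller would capture the status code of a request and pass it in, for example `evalpal_error_for_status "$status" || exit 1`, which keeps every failure path on exit code 1 as the table specifies.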
Security
- The API key is read from the `EVALPAL_API_KEY` environment variable only
- Scripts never echo or log the API key
- All API calls use HTTPS
Installation and Usage
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install evalpal
- After installation, invoke the skill by name or use /evalpal
- Provide the required inputs per the skill's parameter spec and get structured output
What is Evalpal?
Evalpal is an AI agent skill for Claude Code / OpenClaw that runs AI agent evaluations via EvalPal: it triggers eval runs, checks results, and lists available evaluations. It has 147 downloads so far.
How do I install Evalpal?
Run "/install evalpal" in the OpenClaw or Claude Code chat to install it in one step. To run evaluations you also need to set EVALPAL_API_KEY in the skill configuration, as described under Prerequisites.
Is Evalpal free?
Yes, Evalpal is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Evalpal support?
Evalpal is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Evalpal?
It is built and maintained by MatthewEngman (@matthewengman); the current version is v1.0.1.