
Evalpal

by MatthewEngman · GitHub ↗ · v1.0.1 · MIT-0
cross-platform · ✓ Security Clean
147 downloads · 0 stars · 1 active install · 2 versions
Install in OpenClaw
/install evalpal
Description
Run AI agent evaluations via EvalPal — trigger eval runs, check results, and list available evaluations
README (SKILL.md)

EvalPal Skill

Run AI agent evaluations inline. Trigger eval runs, poll for results, and list available evaluation definitions — all from chat.

Prerequisites

Set the following environment variables in your OpenClaw skill configuration:

| Variable | Required | Description |
| --- | --- | --- |
| EVALPAL_API_KEY | Yes | Your EvalPal API key (starts with sk_) |
| EVALPAL_API_URL | No | Base URL (defaults to https://evalpal.dev) |

Get your API key from Settings → API Keys at evalpal.dev.
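
A preflight check along these lines can catch a missing key before any API call is made. This is a minimal sketch, not the skill's actual code: `check_evalpal_env` is a hypothetical helper name, and only the two variable names, the default URL, and the error message come from this README.

```shell
# Sketch of an environment preflight check (hypothetical helper, not part of the skill).
check_evalpal_env() {
  if [ -z "${EVALPAL_API_KEY:-}" ]; then
    # Message text matches the "No API key set" row under Error Handling.
    echo "Error: EVALPAL_API_KEY is not set" >&2
    return 1
  fi
  # Fall back to the documented default when no override is configured.
  EVALPAL_API_URL="${EVALPAL_API_URL:-https://evalpal.dev}"
  export EVALPAL_API_URL
  return 0
}
```

Calling `check_evalpal_env || exit 1` at the top of a wrapper script would stop early with exit code 1, which is the code this README documents for errors.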

Commands

/evalpal run --eval-id <ID>

Trigger an evaluation run and wait for results.

Usage:

bash scripts/run-eval.sh --eval-id <EVAL_DEFINITION_ID>

What it does:

  1. Triggers a new eval run via the EvalPal API
  2. Polls for completion with exponential backoff (up to 5 minutes)
  3. Fetches and formats results as readable markdown
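
The polling step can be sketched as a loop that doubles its wait between checks while staying inside a total time budget. Everything below is illustrative: `poll_until_done` and `SLEEP_CMD` are hypothetical names, and the 2-second initial delay, the 30-second cap, and the `completed`/`failed` status strings are assumptions rather than values taken from run-eval.sh.

```shell
# Sketch of polling with exponential backoff (hypothetical; not the skill's code).
# $1: command that prints the current run status
# $2: total time budget in seconds (defaults to the documented 5 minutes)
poll_until_done() {
  poll_check_cmd=$1
  poll_budget=${2:-300}
  poll_delay=2        # assumed initial delay
  poll_elapsed=0
  while [ "$poll_elapsed" -lt "$poll_budget" ]; do
    poll_status=$($poll_check_cmd) || return 1
    case "$poll_status" in
      completed|failed)   # assumed terminal status strings
        echo "$poll_status"
        return 0
        ;;
    esac
    # Never sleep past the remaining budget.
    poll_remaining=$((poll_budget - poll_elapsed))
    if [ "$poll_delay" -gt "$poll_remaining" ]; then
      poll_delay=$poll_remaining
    fi
    # SLEEP_CMD lets a test swap in a no-op instead of really sleeping.
    ${SLEEP_CMD:-sleep} "$poll_delay"
    poll_elapsed=$((poll_elapsed + poll_delay))
    # Double the delay for the next round, capped at an assumed 30 seconds.
    poll_delay=$((poll_delay * 2))
    if [ "$poll_delay" -gt 30 ]; then
      poll_delay=30
    fi
  done
  echo "Error: Evaluation timed out after ${poll_budget}s" >&2
  return 1
}
```

Capping the delay keeps late checks from drifting too far apart, and clamping the final sleep to the remaining budget means the loop gives up at the budget rather than overshooting it.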

Example output:

✅ Episode Quality — PASSED (15/16)
├── Test Case tc_001: ✓ PASS
├── Test Case tc_002: ✓ PASS
├── Test Case tc_003: ✗ FAIL
└── 13 more passed...

Run ID: run_abc123 · 16 test cases · 47s

Exit codes: 0 = all passed, 1 = failures or error.
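
Because the exit code is 0 only when every test case passes, the script can gate a later step in a pipeline. A minimal sketch, assuming a hypothetical wrapper named `gate_on_eval`; the `EVAL_RUNNER` override is also hypothetical and exists only so the wrapper can be exercised without calling the API.

```shell
# Sketch: gate a follow-up step on the eval result (hypothetical wrapper).
# $1: eval definition ID
gate_on_eval() {
  # Defaults to the documented script; EVAL_RUNNER is an illustrative override.
  if ${EVAL_RUNNER:-bash scripts/run-eval.sh} --eval-id "$1"; then
    echo "eval passed"
  else
    # Exit code 1 covers both test failures and errors, so they look the same here.
    echo "eval failed or errored" >&2
    return 1
  fi
}
```

Since failures and errors share exit code 1, a caller that needs to tell them apart would have to inspect the script's output as well.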

/evalpal status --run-id <ID>

Check the current status of a running evaluation.

Usage:

bash scripts/check-status.sh --run-id <RUN_ID>

Example output:

📊 Run Status: run_abc123
Status: running
Started: 2026-03-26T20:00:00Z

/evalpal list

List available evaluation definitions across your projects.

Usage:

bash scripts/list-evals.sh [--project-id <PROJECT_ID>]

If --project-id is omitted, lists evals for all projects.
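
Handling an optional flag like this usually comes down to a small argument loop. The sketch below shows one way to do it; `parse_list_args`, the `PROJECT_ID` variable, and both error messages are hypothetical and may not match what list-evals.sh actually does.

```shell
# Sketch of parsing an optional --project-id flag (hypothetical; not the skill's code).
# Leaves PROJECT_ID empty when the flag is omitted, meaning "all projects".
parse_list_args() {
  PROJECT_ID=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --project-id)
        if [ "$#" -lt 2 ]; then
          echo "Error: --project-id requires a value" >&2
          return 1
        fi
        PROJECT_ID=$2
        shift 2
        ;;
      *)
        echo "Error: Unknown argument: $1" >&2
        return 1
        ;;
    esac
  done
  return 0
}
```

After `parse_list_args "$@"`, an empty `PROJECT_ID` signals that evals for every project should be listed.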

Example output:

📋 Evaluation Definitions

Project: AI Workforce Lab
  abc123  Episode Quality Check
  def456  Factual Accuracy Eval

Project: Customer Support Bot
  ghi789  Response Quality

Error Handling

All scripts handle common error cases:

| Scenario | Output | Exit Code |
| --- | --- | --- |
| No API key set | Error: EVALPAL_API_KEY is not set | 1 |
| Invalid API key | Error: Authentication failed (401) | 1 |
| Eval not found | Error: Eval definition not found (404) | 1 |
| Rate limited | Error: Rate limited — retry after Xs (429) | 1 |
| Timeout (5 min) | Error: Evaluation timed out after 300s | 1 |
| Network error | Error: Could not reach EvalPal API | 1 |
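
The HTTP rows of this table map naturally onto a `case` statement over the status code. This is a sketch under assumptions: `describe_http_error` is a hypothetical helper, the fallback message for other codes is invented for illustration, and treating `000` as a network failure relies on curl printing `000` for `%{http_code}` when no response was received.

```shell
# Sketch: map an HTTP status code to the documented error text (hypothetical helper).
# $1: HTTP status code, e.g. captured with curl -w '%{http_code}'
# $2: optional retry delay in seconds, used only for 429
describe_http_error() {
  case "$1" in
    401) echo "Error: Authentication failed (401)" ;;
    404) echo "Error: Eval definition not found (404)" ;;
    429) echo "Error: Rate limited — retry after ${2:-?}s (429)" ;;
    000) echo "Error: Could not reach EvalPal API" ;;
    # Fallback wording is an assumption; the table does not cover other codes.
    *)   echo "Error: Unexpected HTTP status $1" ;;
  esac
}
```

A caller would print the message to stderr and exit with code 1, matching the Exit Code column.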

Security

  • The API key is read from the EVALPAL_API_KEY environment variable only
  • Scripts never echo or log the API key
  • All API calls use HTTPS
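
When EVALPAL_API_URL is overridden, a guard like the following could reject a non-HTTPS base URL before the key is ever sent. It is only a sketch: `require_https_url` and its error message are hypothetical, and nothing in this README confirms that the bundled scripts perform this check.

```shell
# Sketch: refuse a base URL that is not HTTPS (hypothetical guard, not the skill's code).
# $1: base URL to validate
require_https_url() {
  case "$1" in
    # Require the https:// scheme followed by at least one character.
    https://?*) return 0 ;;
    *)
      echo "Error: EVALPAL_API_URL must start with https://" >&2
      return 1
      ;;
  esac
}
```

For example, `require_https_url "${EVALPAL_API_URL:-https://evalpal.dev}" || exit 1` would accept the default and stop on an `http://` override.
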
Usage Guidance
This skill appears to do exactly what it says: call the EvalPal API to list evals, start runs, and fetch results. Before installing: ensure you trust the https://evalpal.dev service and create an API key with the least privileges needed; avoid supplying a high-privilege or long-lived key if possible. Confirm you are comfortable allowing your agent to call the API (agent invocation is allowed by default). If you override EVALPAL_API_URL, verify the custom domain is trusted. As a routine precaution, rotate the API key if you suspect exposure and review logs for unexpected activity. Finally, you can sanity-check the included scripts locally (they're plain shell) to confirm they meet your operational and security expectations.
Capability Analysis
Type: OpenClaw Skill · Name: evalpal · Version: 1.0.1
The evalpal skill is a legitimate integration for the EvalPal AI evaluation platform, allowing users to trigger and monitor AI agent evaluations. The bundle contains shell scripts (run-eval.sh, check-status.sh, list-evals.sh) that interact with the official EvalPal API (https://evalpal.dev) using curl and jq. The implementation follows secure practices, such as using environment variables for API keys, employing HTTPS for all communications, and properly quoting variables to prevent shell injection. No evidence of malicious intent, data exfiltration, or prompt injection was found.
Capability Assessment
Purpose & Capability
Name/description describe running EvalPal evaluations and the bundle contains three scripts that call EvalPal API endpoints. The declared required binary/tools (curl, jq) and required env var (EVALPAL_API_KEY) are appropriate for this purpose. The SKILL.md documents an optional EVALPAL_API_URL (defaults to https://evalpal.dev), which is reasonable.
Instruction Scope
All instructions and included scripts consistently perform only API calls to the EvalPal service, poll for run status, and format results. The scripts do not read unrelated files, do not reference other environment secrets, and do not send data to external endpoints beyond the configured API_URL. They explicitly avoid printing the API key and use HTTPS.
Install Mechanism
No install spec; this is instruction-only plus shell scripts. That is low-risk: nothing is downloaded or installed by the skill itself. The only runtime dependencies are standard system tools (curl, jq).
Credentials
The skill requires a single API key (EVALPAL_API_KEY) which matches its stated need to authenticate to EvalPal. The optional EVALPAL_API_URL is documented but not required. No unrelated credentials or config paths are requested.
Persistence & Privilege
The skill is not set to always:true and does not request persistent or cross-skill configuration changes. It can be invoked by the agent (normal default); that autonomous invocation is expected for skills but does not add unusual privileges here.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install evalpal
  3. After installation, invoke the skill by name or use /evalpal
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.1
Declare EVALPAL_API_KEY env var and curl/jq binary requirements in registry metadata
v1.0.0
- Initial release of EvalPal skill.
- Run AI agent evaluations via EvalPal directly from chat.
- Trigger evaluation runs, poll for results, and display outcomes in markdown.
- Check status of evaluation runs by ID.
- List all available evaluation definitions across your projects.
- Handles authentication, error cases, and API limits securely.
Metadata
Slug evalpal
Version 1.0.1
License MIT-0
All-time Installs 1
Active Installs 1
Total Versions 2
Frequently Asked Questions

What is Evalpal?

Evalpal is an AI agent skill for Claude Code / OpenClaw that runs AI agent evaluations via EvalPal: it triggers eval runs, checks results, and lists available evaluations. It has 147 downloads so far.

How do I install Evalpal?

Run "/install evalpal" in the OpenClaw or Claude Code chat to install it in one step. Before the first run, set EVALPAL_API_KEY in your skill configuration and make sure curl and jq are available.

Is Evalpal free?

Yes, Evalpal is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Evalpal support?

Evalpal is cross-platform and runs anywhere OpenClaw / Claude Code is available, provided curl and jq are installed.

Who created Evalpal?

It is built and maintained by MatthewEngman (@matthewengman); the current version is v1.0.1.
