
Evalpal

Name: Evalpal
Author: matthewengman

by MatthewEngman · GitHub ↗ · v1.0.1 · MIT-0

cross-platform ✓ Security Clean

Downloads 147

Install in OpenClaw

/install evalpal

Description

Run AI agent evaluations via EvalPal — trigger eval runs, check results, and list available evaluations

README (SKILL.md)

EvalPal Skill

Run AI agent evaluations inline. Trigger eval runs, poll for results, and list available evaluation definitions — all from chat.

Prerequisites

Set the following environment variables in your OpenClaw skill configuration:

Variable	Required	Description
`EVALPAL_API_KEY`	Yes	Your EvalPal API key (starts with `sk_`)
`EVALPAL_API_URL`	No	Base URL (defaults to `https://evalpal.dev`)

Get your API key from Settings → API Keys at evalpal.dev.
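The configuration rules above could be checked at the top of a script along these lines. This is a minimal sketch, not the skill's actual code: `load_evalpal_config` is a hypothetical helper name, and the prefix warning is an added assumption based on the `sk_` note in the table.

```shell
#!/usr/bin/env bash
# Sketch only: load_evalpal_config is a hypothetical helper, not part of the skill.
load_evalpal_config() {
  # Fall back to the documented default base URL when none is provided.
  EVALPAL_API_URL="${EVALPAL_API_URL:-https://evalpal.dev}"

  # Refuse to continue without a key (message taken from the Error Handling table).
  if [ -z "${EVALPAL_API_KEY:-}" ]; then
    echo "Error: EVALPAL_API_KEY is not set" >&2
    return 1
  fi

  # Assumption: warn, without printing the key, if the documented sk_ prefix is missing.
  case "$EVALPAL_API_KEY" in
    sk_*) ;;
    *) echo "Warning: EVALPAL_API_KEY does not start with sk_" >&2 ;;
  esac
}
```

Calling `load_evalpal_config || exit 1` before any API request would stop early when the key is missing.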

Commands

`/evalpal run --eval-id <ID>`

Trigger an evaluation run and wait for results.

Usage:

bash scripts/run-eval.sh --eval-id <EVAL_DEFINITION_ID>

What it does:

Triggers a new eval run via the EvalPal API
Polls for completion with exponential backoff (up to 5 minutes)
Fetches and formats results as readable markdown
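The polling step can be pictured as a schedule of waits that grows until it hits a cap and never exceeds the 5-minute budget. The sketch below is illustrative only: `backoff_schedule` is a hypothetical helper, and the 2-second starting delay, doubling factor, and 30-second cap are assumptions, since the README does not state the values the script uses.

```shell
#!/usr/bin/env bash
# Sketch only: backoff_schedule is a hypothetical helper. The initial delay,
# doubling factor, and per-wait cap are assumptions, not the skill's real values.
# Usage: backoff_schedule [initial_delay] [max_delay] [total_budget_seconds]
backoff_schedule() {
  local delay="${1:-2}" max_delay="${2:-30}" budget="${3:-300}"
  local elapsed=0
  while [ "$elapsed" -lt "$budget" ]; do
    # Shorten the final wait so the total never exceeds the budget.
    if [ $((elapsed + delay)) -gt "$budget" ]; then
      delay=$((budget - elapsed))
    fi
    echo "$delay"
    elapsed=$((elapsed + delay))
    # Double the wait for the next round, capped at max_delay.
    delay=$((delay * 2))
    if [ "$delay" -gt "$max_delay" ]; then
      delay="$max_delay"
    fi
  done
}
```

A polling loop could then sleep for each emitted delay between status checks, for example `for d in $(backoff_schedule); do sleep "$d"; done`, and report a timeout once the schedule is exhausted.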

Example output:

✅ Episode Quality — PASSED (15/16)
├── Test Case tc_001: ✓ PASS
├── Test Case tc_002: ✓ PASS
├── Test Case tc_003: ✗ FAIL
└── 12 more passed...

Run ID: run_abc123 · 16 test cases · 47s

Exit codes: 0 = all passed, 1 = failures or error.
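Because the exit code is binary, a caller such as a CI step can branch on it directly. The following is a rough sketch: `eval_gate` is a hypothetical wrapper, not something shipped with the skill.

```shell
#!/usr/bin/env bash
# Sketch only: eval_gate is a hypothetical wrapper, not part of the skill.
eval_gate() {
  # Run the given command; exit code 0 means every test case passed.
  if "$@"; then
    echo "evals passed"
  else
    # Exit code 1 covers both failed test cases and errors, so the two
    # cannot be told apart from the status alone.
    echo "evals failed or errored"
    return 1
  fi
}
```

It might be used as `eval_gate bash scripts/run-eval.sh --eval-id "$EVAL_ID"`, where `EVAL_ID` is a placeholder for one of your eval definition IDs.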

`/evalpal status --run-id <ID>`

Check the current status of a running evaluation.

Usage:

bash scripts/check-status.sh --run-id <RUN_ID>

Example output:

📊 Run Status: run_abc123
Status: running
Started: 2026-03-26T20:00:00Z

`/evalpal list`

List available evaluation definitions across your projects.

Usage:

bash scripts/list-evals.sh [--project-id <PROJECT_ID>]

If --project-id is omitted, lists evals for all projects.
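Handling an optional flag like this usually comes down to a small argument loop. The sketch below shows one way it might look: `parse_project_id` and its error messages are hypothetical, and the real script's parsing is not shown in this README.

```shell
#!/usr/bin/env bash
# Sketch only: parse_project_id and its error messages are hypothetical.
# Prints the project ID, or an empty line meaning "all projects".
parse_project_id() {
  local project_id=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --project-id)
        # The flag must be followed by a value.
        if [ "$#" -lt 2 ]; then
          echo "Error: --project-id requires a value" >&2
          return 1
        fi
        project_id="$2"
        shift 2
        ;;
      *)
        echo "Error: unknown argument: $1" >&2
        return 1
        ;;
    esac
  done
  echo "$project_id"
}
```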

Example output:

📋 Evaluation Definitions

Project: AI Workforce Lab
  abc123  Episode Quality Check
  def456  Factual Accuracy Eval

Project: Customer Support Bot
  ghi789  Response Quality

Error Handling

All scripts handle common error cases:

Scenario	Output	Exit Code
No API key set	`Error: EVALPAL_API_KEY is not set`	1
Invalid API key	`Error: Authentication failed (401)`	1
Eval not found	`Error: Eval definition not found (404)`	1
Rate limited	`Error: Rate limited — retry after Xs (429)`	1
Timeout (5 min)	`Error: Evaluation timed out after 300s`	1
Network error	`Error: Could not reach EvalPal API`	1
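One plausible way to arrive at the rows above is to classify the HTTP status code before choosing a message. This is a sketch under assumptions: `classify_http_status` and its category names are hypothetical, and the README does not show how the scripts actually detect each case. The `000` branch relies on curl's `%{http_code}` write-out variable, which reports `000` when no HTTP response was received.

```shell
#!/usr/bin/env bash
# Sketch only: classify_http_status and its category names are hypothetical.
classify_http_status() {
  case "$1" in
    2??) echo "ok" ;;             # any 2xx response
    401) echo "auth_failed" ;;    # invalid API key
    404) echo "not_found" ;;      # eval definition not found
    429) echo "rate_limited" ;;   # caller should wait before retrying
    000) echo "network_error" ;;  # curl received no HTTP response
    *)   echo "unexpected" ;;
  esac
}
```

The status itself could be captured with something like `curl -s -o body.json -w '%{http_code}' "$url"`, with every category other than `ok` leading to exit code 1.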

Security

The API key is read only from the EVALPAL_API_KEY environment variable
Scripts never echo or log the API key
All API calls use HTTPS
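Since EVALPAL_API_URL can be overridden, the HTTPS guarantee depends on the override also using HTTPS. A guard like the one below could enforce that. It is a sketch only: `require_https` and its error message are hypothetical, and whether the shipped scripts perform such a check is not stated here.

```shell
#!/usr/bin/env bash
# Sketch only: require_https and its error message are hypothetical.
require_https() {
  case "$1" in
    # Accept only URLs that start with https:// and have something after it.
    https://?*) return 0 ;;
    *)
      echo "Error: EVALPAL_API_URL must use https://" >&2
      return 1
      ;;
  esac
}
```

For example, `require_https "${EVALPAL_API_URL:-https://evalpal.dev}" || exit 1` would reject a plain `http://` override before the API key is ever sent.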

Usage Guidance

This skill appears to do exactly what it says: call the EvalPal API to list evals, start runs, and fetch results. Before installing: ensure you trust the https://evalpal.dev service and create an API key with the least privileges needed; avoid supplying a high-privilege or long-lived key if possible. Confirm you are comfortable allowing your agent to call the API (agent invocation is allowed by default). If you override EVALPAL_API_URL, verify the custom domain is trusted. As a routine precaution, rotate the API key if you suspect exposure and review logs for unexpected activity. Finally, you can sanity-check the included scripts locally (they're plain shell) to confirm they meet your operational and security expectations.

Capability Analysis

Type: OpenClaw Skill
Name: evalpal
Version: 1.0.1

The evalpal skill is a legitimate integration for the EvalPal AI evaluation platform, allowing users to trigger and monitor AI agent evaluations. The bundle contains shell scripts (run-eval.sh, check-status.sh, list-evals.sh) that interact with the official EvalPal API (https://evalpal.dev) using curl and jq. The implementation follows secure practices, such as using environment variables for API keys, employing HTTPS for all communications, and properly quoting variables to prevent shell injection. No evidence of malicious intent, data exfiltration, or prompt injection was found.

Capability Assessment

✓ Purpose & Capability

Name/description describe running EvalPal evaluations and the bundle contains three scripts that call EvalPal API endpoints. The declared required binary/tools (curl, jq) and required env var (EVALPAL_API_KEY) are appropriate for this purpose. The SKILL.md documents an optional EVALPAL_API_URL (defaults to https://evalpal.dev), which is reasonable.

✓ Instruction Scope

All instructions and included scripts consistently perform only API calls to the EvalPal service, poll for run status, and format results. The scripts do not read unrelated files, do not reference other environment secrets, and do not send data to external endpoints beyond the configured API_URL. They explicitly avoid printing the API key and use HTTPS.

✓ Install Mechanism

No install spec; this is instruction-only plus shell scripts. That is low-risk: nothing is downloaded or installed by the skill itself. The only runtime dependencies are standard system tools (curl, jq).

✓ Credentials

The skill requires a single API key (EVALPAL_API_KEY) which matches its stated need to authenticate to EvalPal. The optional EVALPAL_API_URL is documented but not required. No unrelated credentials or config paths are requested.

✓ Persistence & Privilege

The skill is not set to always:true and does not request persistent or cross-skill configuration changes. It can be invoked by the agent (normal default); that autonomous invocation is expected for skills but does not add unusual privileges here.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install evalpal
After installation, invoke the skill by name or use /evalpal
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.1

Declare EVALPAL_API_KEY env var and curl/jq binary requirements in registry metadata

v1.0.0

- Initial release of EvalPal skill.
- Run AI agent evaluations via EvalPal directly from chat.
- Trigger evaluation runs, poll for results, and display outcomes in markdown.
- Check status of evaluation runs by ID.
- List all available evaluation definitions across your projects.
- Handles authentication, error cases, and API limits securely.

Metadata

Slug evalpal

Version 1.0.1

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 2

Frequently Asked Questions

What is Evalpal?

Evalpal is an AI agent skill for Claude Code / OpenClaw that runs AI agent evaluations via EvalPal: it triggers eval runs, checks results, and lists available evaluations. It has 147 downloads so far.

How do I install Evalpal?

Run "/install evalpal" in the OpenClaw or Claude Code chat to install it in one step. Before running evaluations, set EVALPAL_API_KEY as described under Prerequisites.

Is Evalpal free?

Yes. The Evalpal skill is free and licensed under MIT-0, so you can download, install, and use it at no cost.

Which platforms does Evalpal support?

Evalpal is cross-platform and runs anywhere OpenClaw / Claude Code is available, provided curl and jq are installed.

Who created Evalpal?

It is built and maintained by MatthewEngman (@matthewengman); the current version is v1.0.1.
