/install agent-evaluation
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install agent-evaluation
- After installation, invoke the skill by name or use /agent-evaluation
- Provide the required inputs per the skill's parameter spec and get structured output
What is Agent Evaluation?
Agent Evaluation covers testing and benchmarking of LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring, an area where even top agents achieve less than 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent. It is an AI Agent Skill for Claude Code / OpenClaw, with 5372 downloads so far.
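To make "reliability metrics" concrete, here is a minimal sketch of one common measure: the pass rate of an agent over repeated trials of the same task. This is not the skill's own implementation. The names pass_rate and make_flaky_agent are hypothetical, and the "agent" is a stand-in function rather than a real LLM call.

```python
from typing import Callable


def pass_rate(agent: Callable[[str], str], prompt: str,
              check: Callable[[str], bool], trials: int = 5) -> float:
    """Run the agent `trials` times on one prompt; return the fraction that pass `check`."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    passes = sum(1 for _ in range(trials) if check(agent(prompt)))
    return passes / trials


def make_flaky_agent() -> Callable[[str], str]:
    """Hypothetical stand-in agent: correct on odd-numbered calls, wrong on even ones."""
    calls = {"n": 0}

    def agent(prompt: str) -> str:
        calls["n"] += 1
        return "4" if calls["n"] % 2 == 1 else "5"

    return agent


# Assumed example task: the checker accepts only the exact answer "4".
rate = pass_rate(make_flaky_agent(), "What is 2 + 2?",
                 lambda out: out.strip() == "4", trials=4)
print(rate)  # 0.5
```

Repeating the same task matters because a single successful run says little about an agent that behaves non-deterministically. A real harness would swap the stand-in for an actual agent call and a task-specific checker.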
How do I install Agent Evaluation?
Run "/install agent-evaluation" in the OpenClaw or Claude Code chat to install it in one step; no extra setup is required.
Is Agent Evaluation free?
Yes, Agent Evaluation is completely free (open-source). You can download, install and use it at no cost.
Which platforms does Agent Evaluation support?
Agent Evaluation is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Agent Evaluation?
It is built and maintained by rustyorb (@rustyorb); the current version is v1.0.0.