
Agent Eval Suite

Name: Agent Eval Suite
Author: yuyonghao-123

by yuyonghao-123 · GitHub · v0.1.0 · MIT-0

cross-platform ⚠ suspicious

Downloads: 143

Install in OpenClaw

/install yuyonghao-agent-eval-suite

Description

Provides benchmark testing, A/B testing, performance regression detection, and simulation environment testing for agent evaluation.

Usage Guidance

This package appears to implement the evaluation features it claims, but review it before use:

1) Simulator.loadScenario reads files from disk and accepts absolute paths. Do not pass untrusted scenario names, and do not run it as a privileged user, because it can read arbitrary JSON files from the host.

2) Chaos/fault injection can allocate memory and sleep for long periods. Running large simulations or untrusted scenarios could exhaust resources or cause timeouts.

3) The code makes no remote network calls, so the risk is limited to local file access and resource exhaustion. Run the tool in a sandbox or container, inspect the simulator's scenario-loading and fault-injection code, and avoid supplying scenario names or files from untrusted sources.

If you want higher assurance, ask the author to sanitize scenario path handling and to make the fault-injection limits configurable and documented. For paths, that means either disallowing absolute paths or restricting loading to a safe fixtures directory.
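The path restriction recommended in the guidance could look roughly like the following Node.js sketch. It assumes that scenarios are JSON files kept in a single fixtures directory. `resolveScenarioPath` and `loadScenarioSafely` are hypothetical names and are not the package's Simulator API.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: maps a scenario name to a JSON file inside `fixturesDir`
// and refuses absolute paths or names that escape the directory via "..".
function resolveScenarioPath(fixturesDir, scenarioName) {
  if (typeof scenarioName !== "string" || scenarioName.length === 0) {
    throw new Error("scenario name must be a non-empty string");
  }
  if (path.isAbsolute(scenarioName)) {
    throw new Error(`absolute scenario paths are not allowed: ${scenarioName}`);
  }
  const root = path.resolve(fixturesDir);
  const fileName = scenarioName.endsWith(".json") ? scenarioName : `${scenarioName}.json`;
  const resolved = path.resolve(root, fileName);
  // A relative path from root that starts with ".." (or is absolute, e.g. a
  // different drive on Windows) means `resolved` lies outside the fixtures dir.
  const rel = path.relative(root, resolved);
  if (rel === ".." || rel.startsWith(`..${path.sep}`) || path.isAbsolute(rel)) {
    throw new Error(`scenario escapes the fixtures directory: ${scenarioName}`);
  }
  return resolved;
}

// Hypothetical loader built on the check above.
function loadScenarioSafely(fixturesDir, scenarioName) {
  const file = resolveScenarioPath(fixturesDir, scenarioName);
  return JSON.parse(fs.readFileSync(file, "utf8"));
}
```

This check does not follow symbolic links. Running the resolved path through `fs.realpathSync` before the containment test would also catch a symlink inside the fixtures directory that points elsewhere.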

Capability Assessment

✓ Purpose & Capability

Name/description match the provided code: Benchmark, ABTester, RegressionDetector, Simulator, and ReportGenerator are present and implement the advertised testing and analysis features.

⚠ Instruction Scope

SKILL.md usage examples are limited and appropriate, but the Simulator.loadScenario implementation reads JSON from the filesystem using fs.readFileSync and accepts absolute paths (path.isAbsolute(scenarioName) ? scenarioName : ...). Given an absolute path, the skill can therefore read arbitrary files. SKILL.md does not document this, and it goes beyond the advertised sandbox/simulation behavior. The Simulator also offers fault injection that can allocate large buffers (Buffer.alloc(10MB)) and introduce long sleeps. Either could be used to consume host resources.
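One way to keep fault injection from exhausting the host is to clamp allocation sizes and sleep durations to configurable limits. The sketch below is hypothetical. `createFaultInjector`, `maxAllocBytes` and `maxDelayMs` are illustrative names, and the default limits are arbitrary assumptions rather than values taken from the package.

```javascript
// Hypothetical defaults; the package's actual limits are not documented.
const DEFAULT_LIMITS = { maxAllocBytes: 1024 * 1024, maxDelayMs: 2000 };

// Rejects non-finite input and clamps the value to the range [0, max].
function clamp(value, max) {
  if (!Number.isFinite(value)) {
    throw new RangeError(`expected a finite number, got ${value}`);
  }
  return Math.min(Math.max(0, value), max);
}

function createFaultInjector(limits = {}) {
  const { maxAllocBytes, maxDelayMs } = { ...DEFAULT_LIMITS, ...limits };
  return {
    // Simulates memory pressure with a zero-filled buffer no larger than maxAllocBytes.
    allocate(requestedBytes) {
      return Buffer.alloc(Math.floor(clamp(requestedBytes, maxAllocBytes)));
    },
    // Simulates latency; resolves with the delay actually applied (at most maxDelayMs).
    delay(requestedMs) {
      const ms = clamp(requestedMs, maxDelayMs);
      return new Promise((resolve) => setTimeout(() => resolve(ms), ms));
    },
  };
}

// Example: a 10 MB request is capped at 1 MB with the defaults.
const injector = createFaultInjector();
console.log(injector.allocate(10 * 1024 * 1024).length); // prints 1048576
```

Passing the limits in at construction time makes them explicit and easy to document, which is what the usage guidance asks of the author.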

✓ Install Mechanism

This is an instruction-only skill with no declared install spec in the registry. SKILL.md suggests running npm install locally. package.json is included, and there are no remote download URLs or install hooks, so no high-risk install mechanism was detected.

✓ Credentials

The skill does not request environment variables, credentials, or config paths. No code reads environment variables or external secrets. Requested access is proportional to the described functionality.

✓ Persistence & Privilege

The skill is not always-enabled and does not request persistent privileges or modify other skills or agent-wide configs. It does not escalate autonomous invocation beyond platform defaults.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install yuyonghao-agent-eval-suite
After installation, invoke the skill by name or use /yuyonghao-agent-eval-suite
Provide required inputs per the skill's parameter spec and get structured output

Version History

v0.1.0

Initial release of Agent Eval Suite:
- Provides a standardized benchmarking framework with multi-dimensional metrics and a scoring system.
- Supports A/B testing with control/treatment groups, randomization, and statistical significance analysis.
- Includes performance regression detection with historical comparison, trend analysis, and root-cause exploration.
- Offers sandboxed simulation for scenario testing, boundary conditions, and fault injection.
- Includes clear installation and usage instructions with code samples.
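The release notes mention statistical significance analysis for A/B tests and historical comparison for regression detection, but they do not say which methods the package uses. The sketch below is rough and rests on two assumptions. A/B tests use a binary success metric, checked with a two-sided two-proportion z-test. Regression checks use a higher-is-worse metric such as latency, checked with a z-score against past runs. `twoProportionZTest` and `detectRegression` are hypothetical names, not the package's ABTester or RegressionDetector API.

```javascript
// Sketch only: hypothetical helpers, not the package's ABTester/RegressionDetector API.

// Abramowitz & Stegun formula 7.1.26 approximation of erf (abs error < 1.5e-7).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) *
    t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Standard normal CDF.
function normalCdf(z) {
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

// Two-sided two-proportion z-test: control (A) vs treatment (B) success rates.
function twoProportionZTest(successA, totalA, successB, totalB) {
  const pA = successA / totalA;
  const pB = successB / totalB;
  const pooled = (successA + successB) / (totalA + totalB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / totalA + 1 / totalB));
  const z = se > 0 ? (pB - pA) / se : 0;
  const pValue = 2 * (1 - normalCdf(Math.abs(z)));
  return { pA, pB, z, pValue };
}

// Flags a run whose metric (higher = worse, e.g. latency in ms) sits more than
// `threshold` sample standard deviations above the historical mean.
function detectRegression(history, current, threshold = 3) {
  const n = history.length;
  if (n < 2) throw new Error("need at least two historical runs");
  const mean = history.reduce((sum, v) => sum + v, 0) / n;
  const variance = history.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (n - 1);
  const sd = Math.sqrt(variance);
  const z = sd > 0 ? (current - mean) / sd : current > mean ? Infinity : 0;
  return { mean, sd, z, regressed: z > threshold };
}

// Example usage with made-up numbers.
const ab = twoProportionZTest(100, 1000, 150, 1000);
console.log(`A/B: z=${ab.z.toFixed(2)}, p=${ab.pValue.toFixed(4)}`);
const reg = detectRegression([100, 102, 98, 101, 99], 130);
console.log(`regression: z=${reg.z.toFixed(1)}, regressed=${reg.regressed}`);
```

A fixed z-score threshold is the simplest form of historical comparison. Trend analysis, as the release notes describe it, would need more than this, for example a slope over recent runs.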

Metadata

Slug: yuyonghao-agent-eval-suite

Version: 0.1.0

License: MIT-0

All-time Installs: 0

Active Installs: 0

Total Versions: 1

Frequently Asked Questions

What is Agent Eval Suite?

Provides benchmark testing, A/B testing, performance regression detection, and simulation environment testing for agent evaluation. It is an AI Agent Skill for Claude Code / OpenClaw, with 143 downloads so far.

How do I install Agent Eval Suite?

Run "/install yuyonghao-agent-eval-suite" in the OpenClaw or Claude Code chat to install it in one step. SKILL.md also suggests running npm install locally.

Is Agent Eval Suite free?

Yes, Agent Eval Suite is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Agent Eval Suite support?

Agent Eval Suite is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Agent Eval Suite?

It is built and maintained by yuyonghao-123 (@yuyonghao-123); the current version is v0.1.0.
