Agent Eval Suite
by yuyonghao-123 · v0.1.0 · MIT-0
Downloads: 143 · Stars: 0 · Active Installs: 0 · Versions: 1
Install in OpenClaw
/install yuyonghao-agent-eval-suite
Description
Provides benchmark testing, A/B testing, performance regression detection, and simulation environment testing for agent evaluation.
Usage Guidance
This package appears to implement the evaluation features it claims, but review it before use:
1) Simulator.loadScenario reads files from disk and accepts absolute paths, so it can read arbitrary JSON files from the host. Do not pass untrusted scenario names, and do not run it as a privileged user.
2) Chaos/fault injection can allocate memory and sleep for long periods; running large simulations or untrusted scenarios could exhaust resources or cause timeouts.
3) The code performs no remote network activity, so the risk is limited to local file exposure and resource exhaustion. Run the tool in a sandbox or container, inspect the simulator's scenario-loading and fault-injection code, and avoid supplying scenario names or files from untrusted sources.
For higher assurance, ask the author to sanitize scenario path handling (disallow absolute paths or restrict loading to a safe fixtures directory) and to make fault-injection limits configurable and documented.
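The path-handling recommendation above could look roughly like the following. This is a minimal sketch, not the package's code: `resolveScenarioPath` and the `fixturesDir` parameter are hypothetical names, and the `.json` suffix rule is an assumption. It uses only Node's built-in `path` module.

```javascript
const path = require("node:path");

// Hypothetical helper (not part of the package): map a scenario name to a
// file inside a fixed fixtures directory and reject anything that escapes it.
function resolveScenarioPath(fixturesDir, scenarioName) {
  if (typeof scenarioName !== "string" || scenarioName.length === 0) {
    throw new Error("scenario name must be a non-empty string");
  }
  // Disallow absolute paths outright.
  if (path.isAbsolute(scenarioName)) {
    throw new Error("absolute scenario paths are not allowed");
  }
  const base = path.resolve(fixturesDir);
  // Assumption: scenarios are stored as <name>.json.
  const fileName = scenarioName.endsWith(".json")
    ? scenarioName
    : `${scenarioName}.json`;
  const candidate = path.resolve(base, fileName);
  // A relative path that starts with ".." (or is absolute, e.g. another
  // drive on Windows) means the candidate lies outside the base directory.
  const relative = path.relative(base, candidate);
  const escapes =
    relative === "" ||
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative);
  if (escapes) {
    throw new Error("scenario path escapes the fixtures directory");
  }
  return candidate;
}
```

This check is purely lexical and does not follow symlinks. A stricter variant would also compare real paths (for example with `fs.realpathSync`) before reading the file.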
Capability Assessment
Purpose & Capability
Name/description match the provided code: Benchmark, ABTester, RegressionDetector, Simulator, and ReportGenerator are present and implement the advertised testing and analysis features.
Instruction Scope
SKILL.md usage examples are limited and appropriate, but the Simulator.loadScenario implementation reads JSON from the filesystem using fs.readFileSync and accepts absolute paths (path.isAbsolute(scenarioName) ? scenarioName : ...). That lets the skill read arbitrary files if given an absolute path, which is not documented in SKILL.md and goes beyond the advertised sandbox/simulation behavior. The Simulator also offers fault injection that can allocate memory (Buffer.alloc(10MB)) and sleep for long periods, which could be used to consume host resources.
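One way to bound the resource use described here is to clamp every fault request against configurable limits before acting on it. The sketch below is illustrative only: `DEFAULT_LIMITS`, `clampFault`, `injectFault`, and the `allocBytes`/`sleepMs` field names are hypothetical and do not reflect the package's actual fault-injection API, and the limit values are examples rather than its defaults.

```javascript
// Hypothetical limits; the values are illustrative, not the package's defaults.
const DEFAULT_LIMITS = { maxAllocBytes: 10 * 1024 * 1024, maxSleepMs: 5000 };

// Clamp a requested fault to the configured limits. Invalid or negative
// values become 0; oversized values are capped at the maximum.
function clampFault(fault, limits = DEFAULT_LIMITS) {
  const clamp = (value, max) =>
    Math.floor(Math.min(Math.max(0, Number(value) || 0), max));
  return {
    allocBytes: clamp(fault.allocBytes, limits.maxAllocBytes),
    sleepMs: clamp(fault.sleepMs, limits.maxSleepMs),
  };
}

// Apply the clamped fault: hold a buffer for the duration of the sleep,
// then report what was actually allocated and slept.
async function injectFault(fault, limits = DEFAULT_LIMITS) {
  const { allocBytes, sleepMs } = clampFault(fault, limits);
  const pressure = allocBytes > 0 ? Buffer.alloc(allocBytes) : null;
  await new Promise((resolve) => setTimeout(resolve, sleepMs));
  return { allocatedBytes: pressure ? pressure.length : 0, sleptMs: sleepMs };
}
```

Keeping the clamping step separate from the injection step makes the limits easy to unit-test without allocating memory or waiting on timers.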
Install Mechanism
This is an instruction-only skill with no declared install spec in the registry. SKILL.md suggests running npm install locally; package.json is included and there are no remote download URLs or install hooks—no high-risk install mechanism detected.
Credentials
The skill does not request environment variables, credentials, or config paths. No code reads environment variables or external secrets. Requested access is proportional to the described functionality.
Persistence & Privilege
Skill is not always-enabled and does not request persistent privileges or modify other skills or agent-wide configs. No autonomous invocation flag escalation beyond platform defaults.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install yuyonghao-agent-eval-suite
- After installation, invoke the skill by name or use /yuyonghao-agent-eval-suite
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.1.0
Initial release of Agent Eval Suite:
- Provides a standardized benchmarking framework with multi-dimensional metrics and scoring system.
- Supports A/B testing with control/treatment groups, randomization, and statistical significance analysis.
- Includes performance regression detection with historical comparison, trend analysis, and root cause exploration.
- Offers sandboxed simulation for scenario testing, boundary conditions, and fault injection.
- Includes clear installation and usage instructions with code samples.
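To make the "historical comparison" idea in the regression-detection item concrete, here is a minimal sketch of one common approach: compare the latest measurement with the mean of past runs and flag it when it is worse by more than a tolerance. The function name `detectRegression`, its parameters, and the assumption that higher values are worse (as with latency) are all hypothetical; the package's RegressionDetector may work differently.

```javascript
// Hypothetical sketch: flag a regression when `current` exceeds the mean of
// `history` by more than `tolerance` (a fraction, so 0.1 means 10%).
// Assumes a "higher is worse" metric such as latency in milliseconds.
function detectRegression(history, current, tolerance = 0.1) {
  if (!Array.isArray(history) || history.length === 0) {
    throw new Error("history must be a non-empty array");
  }
  const baseline = history.reduce((sum, v) => sum + v, 0) / history.length;
  if (baseline <= 0) {
    throw new Error("baseline must be positive to compute a relative change");
  }
  const change = (current - baseline) / baseline; // relative change vs. baseline
  return { baseline, change, regressed: change > tolerance };
}

// Example: three past runs averaging 100 ms, latest run at 120 ms.
const result = detectRegression([100, 110, 90], 120);
console.log(result.regressed); // true: 20% slower than the baseline
```

A mean baseline is sensitive to outliers; a median or a statistical test over multiple current samples would be more robust for noisy benchmarks.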
Frequently Asked Questions
What is Agent Eval Suite?
Agent Eval Suite provides benchmark testing, A/B testing, performance regression detection, and simulation environment testing for agent evaluation. It is an AI agent skill for Claude Code / OpenClaw, with 143 downloads so far.
How do I install Agent Eval Suite?
Run "/install yuyonghao-agent-eval-suite" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Agent Eval Suite free?
Yes, Agent Eval Suite is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Agent Eval Suite support?
Agent Eval Suite is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Agent Eval Suite?
It is built and maintained by yuyonghao-123 (@yuyonghao-123); the current version is v0.1.0.