Agent Eval Suite
Author: yuyonghao-123 · v0.1.0 · MIT-0
Total downloads: 143
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install yuyonghao-agent-eval-suite
Description
Provides benchmark testing, A/B testing, performance regression detection, and simulation environment testing for agent evaluation.
Security Recommendations
This package appears to implement the evaluation features it claims, but review it before use:
1) Simulator.loadScenario reads files from disk and accepts absolute paths, so it can read arbitrary JSON files from the host. Do not pass it untrusted scenario names, and do not run it as a privileged user.
2) Chaos/fault injection can allocate memory and sleep for long periods; running large simulations or untrusted scenarios could exhaust resources or cause timeouts.
3) The code performs no remote network activity, so the risk is limited to local file and resource exposure. Run the tool in a sandbox or container, inspect the simulator's scenario-loading and fault-injection code, and avoid supplying scenario names or files from untrusted sources.
For higher assurance, ask the author to sanitize scenario path handling (disallow absolute paths or restrict loading to a safe fixtures directory) and to make fault-injection limits configurable and documented.
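The path restriction suggested above could look roughly like the following. This is a minimal sketch, not the package's actual code: the helper name `resolveScenarioPath`, the `fixturesDir` parameter, and the `.json` suffix convention are all assumptions made for illustration.

```javascript
const path = require("node:path");

// Hypothetical helper (not part of the package): resolve a scenario name to a
// file inside a fixed fixtures directory, rejecting anything that escapes it.
function resolveScenarioPath(fixturesDir, scenarioName) {
  if (typeof scenarioName !== "string" || scenarioName.length === 0) {
    throw new TypeError("scenarioName must be a non-empty string");
  }
  // Reject absolute paths outright instead of passing them through.
  if (path.isAbsolute(scenarioName)) {
    throw new Error("absolute scenario paths are not allowed");
  }
  const base = path.resolve(fixturesDir);
  const candidate = path.resolve(base, `${scenarioName}.json`);
  // A relative path that starts with ".." means the candidate left the base.
  const rel = path.relative(base, candidate);
  const escapes =
    rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (escapes) {
    throw new Error("scenario path escapes the fixtures directory");
  }
  return candidate;
}
```

A caller would pass the returned path to its file reader; names such as `../secrets` or `/etc/passwd` throw before any file is opened.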
Capability Assessment
Purpose & Capability
Name/description match the provided code: Benchmark, ABTester, RegressionDetector, Simulator, and ReportGenerator are present and implement the advertised testing and analysis features.
Instruction Scope
SKILL.md usage examples are limited and appropriate, but the Simulator.loadScenario implementation reads JSON from the filesystem using fs.readFileSync and accepts absolute paths (path.isAbsolute(scenarioName) ? scenarioName : ...). That lets the skill read arbitrary files if given an absolute path, which is not documented in SKILL.md and goes beyond the advertised sandbox/simulation behavior. The Simulator also offers fault injection that can allocate memory (Buffer.alloc(10MB)) and long sleeps, which could be used to consume host resources.
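One way to bound the resource use described above is to clamp every requested fault against explicit limits before acting on it. The sketch below is illustrative only: `DEFAULT_LIMITS`, `clampFault`, `injectFault`, and the `allocBytes` / `sleepMs` field names are hypothetical and do not come from the package.

```javascript
// Hypothetical limits; the package's real option names and defaults may differ.
const DEFAULT_LIMITS = { maxAllocBytes: 1024 * 1024, maxSleepMs: 1000 };

// Coerce a requested amount to a non-negative integer no larger than max.
function clampAmount(value, max) {
  const n = Number(value);
  if (!Number.isFinite(n) || n < 0) return 0;
  return Math.min(Math.floor(n), max);
}

// Reduce a requested fault to values that respect the configured limits.
function clampFault(fault, limits = DEFAULT_LIMITS) {
  return {
    allocBytes: clampAmount(fault.allocBytes, limits.maxAllocBytes),
    sleepMs: clampAmount(fault.sleepMs, limits.maxSleepMs),
  };
}

// Apply a bounded fault: allocate a buffer, then wait, never beyond the limits.
async function injectFault(fault, limits = DEFAULT_LIMITS) {
  const { allocBytes, sleepMs } = clampFault(fault, limits);
  const pressure = Buffer.alloc(allocBytes);
  await new Promise((resolve) => setTimeout(resolve, sleepMs));
  return { allocatedBytes: pressure.length, sleptMs: sleepMs };
}
```

With these defaults, a request for a 10 MB allocation and a 60-second sleep is reduced to 1 MiB and 1 second; passing a different `limits` object keeps the bounds configurable.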
Install Mechanism
This is an instruction-only skill with no declared install spec in the registry. SKILL.md suggests running npm install locally; package.json is included and there are no remote download URLs or install hooks—no high-risk install mechanism detected.
Credentials
The skill does not request environment variables, credentials, or config paths. No code reads environment variables or external secrets. Requested access is proportional to the described functionality.
Persistence & Privilege
Skill is not always-enabled and does not request persistent privileges or modify other skills or agent-wide configs. No autonomous invocation flag escalation beyond platform defaults.
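How to Use
- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box:
/install yuyonghao-agent-eval-suite
- Once installation finishes, call the Skill by name or trigger it with /yuyonghao-agent-eval-suite
- Provide the required inputs as described in the Skill's parameter documentation to receive structured output.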
Version History
v0.1.0
Initial release of Agent Eval Suite:
- Provides a standardized benchmarking framework with multi-dimensional metrics and scoring system.
- Supports A/B testing with control/treatment groups, randomization, and statistical significance analysis.
- Includes performance regression detection with historical comparison, trend analysis, and root cause exploration.
- Offers sandboxed simulation for scenario testing, boundary conditions, and fault injection.
- Includes clear installation and usage instructions with code samples.
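To make the "historical comparison" idea in the regression-detection bullet concrete, here is a generic sketch of one common approach: flag a new measurement when it exceeds the historical mean by more than k sample standard deviations. It is not taken from the package; `detectRegression`, its parameters, and the assumption that a higher value is worse (for example latency) are all illustrative.

```javascript
// Hypothetical sketch: compare a new measurement against historical samples.
// Assumes a "higher is worse" metric such as latency in milliseconds.
function detectRegression(history, current, k = 3) {
  if (history.length < 2) {
    throw new Error("need at least two historical samples");
  }
  const mean = history.reduce((sum, x) => sum + x, 0) / history.length;
  // Sample variance (divides by n - 1).
  const variance =
    history.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (history.length - 1);
  const threshold = mean + k * Math.sqrt(variance);
  return { mean, threshold, regressed: current > threshold };
}
```

For a history of [100, 102, 98, 100] the mean is 100 and the threshold is roughly 104.9, so a new value of 110 is flagged while 101 is not.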
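FAQ
What is Agent Eval Suite?
It provides benchmark testing, A/B testing, performance regression detection, and simulation environment testing for agent evaluation. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has 143 total downloads to date.
How do I install Agent Eval Suite?
Run the command "/install yuyonghao-agent-eval-suite" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.
Is Agent Eval Suite free?
Yes. Agent Eval Suite is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.
Which platforms does Agent Eval Suite support?
Agent Eval Suite is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who develops Agent Eval Suite?
It is developed and maintained by yuyonghao-123 (@yuyonghao-123); the current version is v0.1.0.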