Total downloads: 5372
Favorites: 8
Current installs: 60
Versions: 1
Install in OpenClaw
/install agent-evaluation
Description
Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring, in a field where even top agents achieve less than 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.
Safe usage advice
Reasonable to install as an evaluation aid. Treat it as high-level guidance, not an automated evaluator, and avoid placing confidential benchmark, prompt, or production data into agent prompts unless your normal data-handling controls allow it.
Functional analysis
Type: OpenClaw Skill
Name: agent-evaluation
Version: 1.0.0
The skill bundle contains standard metadata and a markdown file describing 'Agent Evaluation'. The markdown content sets a persona for the AI agent and provides extensive information about evaluating LLM agents, including capabilities, requirements, patterns, and anti-patterns. There are no instructions for malicious execution, data exfiltration, persistence, or prompt injection aiming to subvert the agent's intended purpose or security boundaries. All content is descriptive and aligns with the stated goal of agent evaluation.
Capability assessment
Purpose & Capability
The artifact purpose and content align: it describes agent testing, benchmark design, capability assessment, reliability metrics, regression testing, and evaluation anti-patterns.
Instruction Scope
Instructions are advisory and domain-scoped; they do not ask the agent to override user intent, access private data, run commands, or perform high-impact actions.
Install Mechanism
The package contains a single non-executable SKILL.md file with metadata frontmatter and no scripts, binaries, dependencies, or install hooks.
Credentials
No environment variables, credentials, local files, network services, accounts, or external APIs are requested.
Persistence & Privilege
No persistence, privilege escalation, background execution, memory storage, credential handling, or session/profile use is described.
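How to use
- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: /install agent-evaluation
- Once installation completes, invoke the Skill by name or trigger it with /agent-evaluation
- Provide the required inputs according to the Skill's parameter description to get structured output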
Version history
v1.0.0
- Initial release of agent-evaluation skill for testing and benchmarking LLM agents.
- Supports behavioral testing, capability assessment, reliability metrics, and production monitoring.
- Includes practical testing patterns: statistical test evaluation, behavioral contract testing, and adversarial testing.
- Highlights common anti-patterns and sharp edges in LLM agent evaluation.
- Designed for use alongside related skills such as multi-agent orchestration and autonomous agents.
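The release notes name "statistical test evaluation" as one of the testing patterns. The listing does not show how the skill describes it, so the following is only a minimal sketch of the general idea, assuming it means running the same task many times and reporting a pass rate with a confidence interval instead of a single pass/fail. The names `wilson_interval`, `evaluate`, and `flaky_agent` are hypothetical and do not come from the skill itself.

```python
import math
import random
from typing import Callable


def wilson_interval(passes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial pass rate (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = passes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


def evaluate(
    agent: Callable[[str], str],
    task: str,
    check: Callable[[str], bool],
    trials: int = 50,
) -> dict:
    """Run the same task repeatedly and summarize how often the output passes."""
    passes = sum(1 for _ in range(trials) if check(agent(task)))
    low, high = wilson_interval(passes, trials)
    return {
        "passes": passes,
        "trials": trials,
        "pass_rate": passes / trials,
        "ci95": (low, high),
    }


# Hypothetical stand-in for a non-deterministic agent: answers correctly
# about 80% of the time. A real agent call would replace this function.
def flaky_agent(task: str, _rng: random.Random = random.Random(0)) -> str:
    return "4" if _rng.random() < 0.8 else "5"


if __name__ == "__main__":
    report = evaluate(flaky_agent, "What is 2 + 2?", lambda out: out.strip() == "4")
    print(report)
```

Reporting an interval rather than a single run makes it harder to mistake one lucky pass for reliability; a regression check could then compare the lower bound against a chosen threshold.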
FAQ
What is Agent Evaluation?
Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring, in a field where even top agents achieve less than 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 5372 cumulative downloads to date.
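How do I install Agent Evaluation?
Run the command "/install agent-evaluation" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.
Is Agent Evaluation free?
Yes. Agent Evaluation is completely free (free and open source) and can be freely downloaded, installed, and used.
Which platforms does Agent Evaluation support?
Agent Evaluation is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed Agent Evaluation?
It is developed and maintained by rustyorb (@rustyorb); the current version is v1.0.0.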