Total downloads: 113
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install skylv-evaluation-benchmark
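Description
An assistant for evaluating and testing agents. It designs evaluation metrics, builds test sets, and generates reports. Use cases: (1) designing evaluation metrics, (2) building test sets, (3) running evaluation tests, (4) analyzing evaluation results.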
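To make the four use cases concrete, here is a minimal Python sketch of the kind of workflow the skill helps you design: a metric, a small test set, an evaluation run, and a summary report. It is an illustration only and not part of the skill, which ships no code. Every name in it (TestCase, exact_match, run_evaluation, summarize, toy_agent) is hypothetical, and exact match is just one possible metric.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TestCase:
    """One test-set entry: an input prompt and the expected answer."""
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Metric: 1.0 if output equals expected (ignoring case and outer whitespace), else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_evaluation(agent: Callable[[str], str], cases: list[TestCase]) -> list[dict]:
    """Run the agent on every test case and record its output and score."""
    results = []
    for case in cases:
        output = agent(case.prompt)
        results.append({
            "prompt": case.prompt,
            "expected": case.expected,
            "output": output,
            "score": exact_match(output, case.expected),
        })
    return results


def summarize(results: list[dict]) -> dict:
    """Aggregate per-case scores into a small report."""
    total = len(results)
    passed = sum(1 for r in results if r["score"] == 1.0)
    return {
        "total": total,
        "passed": passed,
        "accuracy": passed / total if total else 0.0,
        "failures": [r["prompt"] for r in results if r["score"] < 1.0],
    }


def toy_agent(prompt: str) -> str:
    """Hypothetical stand-in for a real agent: answers from a fixed lookup table."""
    answers = {"2 + 2": "4", "capital of France": "Paris"}
    return answers.get(prompt, "unknown")


cases = [
    TestCase("2 + 2", "4"),
    TestCase("capital of France", "paris"),
    TestCase("largest planet", "Jupiter"),
]
report = summarize(run_evaluation(toy_agent, cases))
print(report)
```

In practice you would replace toy_agent with a call to the agent under test and choose a metric that fits the task, since exact match is too strict for free-form answers.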
Safe-use recommendations
This skill appears coherent and low-risk because it is instruction-only and requests no credentials or installs. Before using it, note that: (1) it provides high-level prompts/examples only — it won't actually run tests or produce artifacts by itself; (2) any test data or model outputs you feed into the agent may contain sensitive information, so avoid submitting secrets or proprietary datasets; and (3) because there is no source/homepage or code, verify results manually and treat outputs as advisory rather than authoritative.
Capability assessment
Purpose & Capability
Name, description, and SKILL.md all describe evaluation/benchmark tasks; there are no unrelated environment variables, binaries, or installs requested that would be inconsistent with an evaluation helper.
Instruction Scope
SKILL.md contains only high-level prompts/examples for designing metrics, building test sets, running evaluations, and analyzing results — it does not instruct the agent to read system files, exfiltrate data, or call external endpoints beyond normal conversational behavior.
Install Mechanism
No install spec and no code files are provided (instruction-only), so nothing is written to disk or fetched during install; this is the lowest-risk pattern.
Credentials
No environment variables, credentials, or config paths are required; requested privileges are proportionate to the stated purpose.
Persistence & Privilege
The skill's "always" setting is false and the skill is user-invocable; it does not request persistent presence or modify other skills or system settings.
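How to use
- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install skylv-evaluation-benchmark
- Once installation completes, invoke the Skill by name or trigger it with /skylv-evaluation-benchmark
- Provide the inputs described in the Skill's parameter notes to get structured output.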
Version history
v1.0.0
Auto-publish
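FAQ
What is Evaluation Benchmark?
An assistant for evaluating and testing agents. It designs evaluation metrics, builds test sets, and generates reports. Use cases: (1) designing evaluation metrics, (2) building test sets, (3) running evaluation tests, (4) analyzing evaluation results. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 113 times to date.
How do I install Evaluation Benchmark?
Run the command "/install skylv-evaluation-benchmark" in the OpenClaw or Claude Code chat box. Installation takes a single step and needs no additional configuration.
Is Evaluation Benchmark free?
Yes. Evaluation Benchmark is completely free and released under the MIT-0 license, so you can download, install, and use it freely.
Which platforms does Evaluation Benchmark support?
Evaluation Benchmark is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed Evaluation Benchmark?
It is developed and maintained by SKY-lv (@sky-lv). The current version is v1.0.0.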