
Evaluation Benchmark

Name: Evaluation Benchmark
Author: sky-lv

Author: SKY-lv · GitHub · v1.0.0 · MIT-0

cross-platform ✓ Security check passed

Total downloads: 113

Install in OpenClaw

/install skylv-evaluation-benchmark

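Description

An assistant for agent evaluation and testing. It helps design evaluation metrics, build test sets, and generate reports. Use cases: (1) designing evaluation metrics, (2) building test sets, (3) running evaluation tests, (4) analyzing evaluation results.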
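The four use cases above map onto a simple evaluation loop. The following is a minimal sketch of that loop in Python, not code shipped with the skill (the skill is instruction-only). All names here (TestCase, exact_match, run_evaluation, summarize, toy_agent) are hypothetical and chosen for illustration; a real evaluation would replace toy_agent with an actual agent call and would likely need a richer metric than exact match.

```python
from dataclasses import dataclass


@dataclass
class TestCase:
    """One entry in the test set: an input prompt and the expected answer."""
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Metric: 1.0 if output equals expected (ignoring case/whitespace), else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_evaluation(agent, test_set, metric):
    """Run every test case through the agent and score each output."""
    results = []
    for case in test_set:
        output = agent(case.prompt)
        results.append(
            {
                "prompt": case.prompt,
                "output": output,
                "score": metric(output, case.expected),
            }
        )
    return results


def summarize(results):
    """Aggregate per-case scores into a small report dictionary."""
    scores = [r["score"] for r in results]
    return {
        "cases": len(scores),
        "mean_score": sum(scores) / len(scores) if scores else 0.0,
        "failures": [r["prompt"] for r in results if r["score"] < 1.0],
    }


def toy_agent(prompt: str) -> str:
    """Hypothetical stand-in for a real agent; answers from a fixed lookup table."""
    answers = {"2 + 2 = ?": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "unknown")


# (2) Build a test set, (3) run the evaluation, (4) analyze the results.
test_set = [
    TestCase("2 + 2 = ?", "4"),
    TestCase("Capital of France?", "paris"),
    TestCase("Largest planet?", "Jupiter"),
]
report = summarize(run_evaluation(toy_agent, test_set, exact_match))
print(report)
```

Keeping the metric as a plain function passed into run_evaluation makes it easy to swap in a different scoring rule without touching the loop or the report code.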

Safe usage advice

This skill appears coherent and low-risk because it is instruction-only and requests no credentials or installs. Before using it, note that: (1) it provides high-level prompts/examples only — it won't actually run tests or produce artifacts by itself; (2) any test data or model outputs you feed into the agent may contain sensitive information, so avoid submitting secrets or proprietary datasets; and (3) because there is no source/homepage or code, verify results manually and treat outputs as advisory rather than authoritative.

Capability assessment

✓ Purpose & Capability

Name, description, and SKILL.md all describe evaluation/benchmark tasks; there are no unrelated environment variables, binaries, or installs requested that would be inconsistent with an evaluation helper.

✓ Instruction Scope

SKILL.md contains only high-level prompts/examples for designing metrics, building test sets, running evaluations, and analyzing results — it does not instruct the agent to read system files, exfiltrate data, or call external endpoints beyond normal conversational behavior.

✓ Install Mechanism

No install spec and no code files are provided (instruction-only), so nothing is written to disk or fetched during install; this is the lowest-risk pattern.

✓ Credentials

No environment variables, credentials, or config paths are required; requested privileges are proportionate to the stated purpose.

✓ Persistence & Privilege

The always setting is false and the skill is user-invocable; it does not request persistent presence or modify other skills or system settings.

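How to use

Make sure OpenClaw is installed (locally or via Docker).
Enter the install command in the chat box: /install skylv-evaluation-benchmark
Once installation finishes, invoke the Skill by name or trigger it with /skylv-evaluation-benchmark
Provide the required inputs as described in the Skill's parameter documentation to get structured output.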

Version history

v1.0.0

Auto-publish

Metadata

Slug skylv-evaluation-benchmark

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Historical versions 1

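FAQ

What is Evaluation Benchmark?

An assistant for agent evaluation and testing. It helps design evaluation metrics, build test sets, and generate reports. Use cases: (1) designing evaluation metrics, (2) building test sets, (3) running evaluation tests, (4) analyzing evaluation results. It is an AI Agent Skill plugin for Claude Code / OpenClaw and currently has 113 total downloads.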

How do I install Evaluation Benchmark?

Run the command "/install skylv-evaluation-benchmark" in the OpenClaw or Claude Code chat box to install it in one step. No additional configuration is required.

Is Evaluation Benchmark free?

Yes. Evaluation Benchmark is completely free and is released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Evaluation Benchmark support?

Evaluation Benchmark is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Evaluation Benchmark?

It is developed and maintained by SKY-lv (@sky-lv). The current version is v1.0.0.