
Evaluation Benchmark

Author: sky-lv

by SKY-lv · v1.0.0 · MIT-0

cross-platform · ✓ Security Clean

Downloads 113

Install in OpenClaw

/install skylv-evaluation-benchmark

Description

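Agent evaluation and testing assistant. It designs evaluation metrics, builds test sets, and generates reports. Use cases: (1) designing evaluation metrics, (2) building test sets, (3) running evaluation tests, (4) analyzing evaluation results.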

Usage Guidance

This skill appears coherent and low-risk because it is instruction-only and requests no credentials or installs. Before using it, note that: (1) it provides high-level prompts/examples only — it won't actually run tests or produce artifacts by itself; (2) any test data or model outputs you feed into the agent may contain sensitive information, so avoid submitting secrets or proprietary datasets; and (3) because there is no source/homepage or code, verify results manually and treat outputs as advisory rather than authoritative.
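Because the skill does not run tests itself, the workflow it describes (define a metric, build a test set, run the evaluation, analyze the results) has to be carried out separately. The following is a minimal Python sketch of what that could look like. It is not part of the skill: `EvalCase`, `exact_match`, `run_evaluation`, and `toy_agent` are hypothetical names chosen for illustration, and exact-match accuracy is only one possible metric.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One test-set entry: an input prompt and the answer we expect."""
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Metric: 1.0 if the normalized output equals the expected answer, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_evaluation(agent: Callable[[str], str], cases: list[EvalCase]) -> dict:
    """Run every case through the agent and summarize the scores."""
    results = []
    for case in cases:
        output = agent(case.prompt)
        results.append({
            "prompt": case.prompt,
            "output": output,
            "score": exact_match(output, case.expected),
        })
    accuracy = sum(r["score"] for r in results) / len(results) if results else 0.0
    failures = [r for r in results if r["score"] == 0.0]
    return {"total": len(results), "accuracy": accuracy, "failures": failures}


def toy_agent(prompt: str) -> str:
    """Hypothetical stand-in for a real agent call (assumption, not the skill's API)."""
    answers = {"2 + 2": "4", "capital of France": "Paris"}
    return answers.get(prompt, "unknown")


# A tiny illustrative test set; real test sets should avoid secrets or proprietary data.
cases = [
    EvalCase("2 + 2", "4"),
    EvalCase("capital of France", "paris"),
    EvalCase("largest planet", "Jupiter"),
]

report = run_evaluation(toy_agent, cases)
print(f"accuracy: {report['accuracy']:.2f} over {report['total']} cases")
for failure in report["failures"]:
    print(f"FAILED: {failure['prompt']!r} -> {failure['output']!r}")
```

Keeping the metric as a separate function makes it easy to swap in a different scoring rule without touching the run loop, and the `failures` list gives a starting point for manual review of the results.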

Capability Assessment

✓ Purpose & Capability

Name, description, and SKILL.md all describe evaluation/benchmark tasks; there are no unrelated environment variables, binaries, or installs requested that would be inconsistent with an evaluation helper.

✓ Instruction Scope

SKILL.md contains only high-level prompts/examples for designing metrics, building test sets, running evaluations, and analyzing results — it does not instruct the agent to read system files, exfiltrate data, or call external endpoints beyond normal conversational behavior.

✓ Install Mechanism

No install spec and no code files are provided (instruction-only), so nothing is written to disk or fetched during install; this is the lowest-risk pattern.

✓ Credentials

No environment variables, credentials, or config paths are required; requested privileges are proportionate to the stated purpose.

✓ Persistence & Privilege

The always setting is false and the skill is user-invocable; it does not request persistent presence or modify other skills or system settings.

How to Use

1. Make sure OpenClaw is installed (local or Docker).
2. Run the install command in chat: /install skylv-evaluation-benchmark
3. After installation, invoke the skill by name or use /skylv-evaluation-benchmark
4. Provide the required inputs per the skill's parameter spec and get structured output.

Version History

v1.0.0

Auto-publish

Metadata

Slug skylv-evaluation-benchmark

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Evaluation Benchmark?

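Evaluation Benchmark is an agent evaluation and testing assistant. It designs evaluation metrics, builds test sets, and generates reports. Use cases: (1) designing evaluation metrics, (2) building test sets, (3) running evaluation tests, (4) analyzing evaluation results. It is an AI Agent Skill for Claude Code / OpenClaw, with 113 downloads so far.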

How do I install Evaluation Benchmark?

Run "/install skylv-evaluation-benchmark" in the OpenClaw or Claude Code chat to install it in one step; no extra setup is required.

Is Evaluation Benchmark free?

Yes, Evaluation Benchmark is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Evaluation Benchmark support?

Evaluation Benchmark is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Evaluation Benchmark?

It is built and maintained by SKY-lv (@sky-lv); the current version is v1.0.0.
