Downloads: 113
Stars: 0
Active Installs: 0
Versions: 1
Install in OpenClaw
/install skylv-evaluation-benchmark
Description
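An assistant for evaluating and testing agents. It designs evaluation metrics, builds test sets, and generates reports. Use cases: (1) designing evaluation metrics, (2) building test sets, (3) running evaluation tests, (4) analyzing evaluation results.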
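To make the four use cases concrete, here is a minimal Python sketch of what such an evaluation loop might look like. It is not part of the skill, which is instruction-only and ships no code. Every name in it (TestCase, exact_match, run_evaluation, summarize, toy_agent) is hypothetical and chosen only for illustration, and the exact-match metric is just one simple choice among many.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TestCase:
    """One test-set entry: a prompt and the answer we expect (hypothetical)."""
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Metric: 1.0 if the answers match ignoring case and outer whitespace."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_evaluation(
    agent: Callable[[str], str],
    cases: list[TestCase],
    metric: Callable[[str, str], float] = exact_match,
) -> list[dict]:
    """Run every test case through the agent and score each output."""
    results = []
    for case in cases:
        output = agent(case.prompt)
        results.append(
            {
                "prompt": case.prompt,
                "output": output,
                "score": metric(output, case.expected),
            }
        )
    return results


def summarize(results: list[dict]) -> dict:
    """Aggregate per-case scores into a small report."""
    scores = [r["score"] for r in results]
    return {
        "n": len(scores),
        "mean_score": sum(scores) / len(scores) if scores else 0.0,
        "failures": [r["prompt"] for r in results if r["score"] < 1.0],
    }


def toy_agent(prompt: str) -> str:
    """Stand-in for a real agent; answers from a fixed lookup table."""
    answers = {"2 + 2 = ?": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "unknown")


cases = [
    TestCase("2 + 2 = ?", "4"),
    TestCase("Capital of France?", "paris"),
    TestCase("Largest planet?", "Jupiter"),
]
report = summarize(run_evaluation(toy_agent, cases))
print(report)
```

In this sketch the metric, the test set, the evaluation run, and the summary map onto use cases (1) through (4) respectively. A real agent would replace toy_agent, and a task-specific metric would replace exact_match.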
Usage Guidance
This skill appears coherent and low-risk because it is instruction-only and requests no credentials or installs. Before using it, note that: (1) it provides high-level prompts/examples only — it won't actually run tests or produce artifacts by itself; (2) any test data or model outputs you feed into the agent may contain sensitive information, so avoid submitting secrets or proprietary datasets; and (3) because there is no source/homepage or code, verify results manually and treat outputs as advisory rather than authoritative.
Capability Assessment
Purpose & Capability
Name, description, and SKILL.md all describe evaluation/benchmark tasks; there are no unrelated environment variables, binaries, or installs requested that would be inconsistent with an evaluation helper.
Instruction Scope
SKILL.md contains only high-level prompts/examples for designing metrics, building test sets, running evaluations, and analyzing results — it does not instruct the agent to read system files, exfiltrate data, or call external endpoints beyond normal conversational behavior.
Install Mechanism
No install spec and no code files are provided (instruction-only), so nothing is written to disk or fetched during install; this is the lowest-risk pattern.
Credentials
No environment variables, credentials, or config paths are required; requested privileges are proportionate to the stated purpose.
Persistence & Privilege
The skill's always setting is false and the skill is user-invocable; it does not request persistent presence or modify other skills or system settings.
How to Use
- Make sure OpenClaw is installed (local or Docker).
- Run the install command in chat: /install skylv-evaluation-benchmark
- After installation, invoke the skill by name or use /skylv-evaluation-benchmark
- Provide the required inputs per the skill's parameter spec and get structured output.
Version History
v1.0.0
Auto-publish
Frequently Asked Questions
What is Evaluation Benchmark?
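Evaluation Benchmark is an assistant for evaluating and testing agents. It designs evaluation metrics, builds test sets, and generates reports. Use cases: (1) designing evaluation metrics, (2) building test sets, (3) running evaluation tests, (4) analyzing evaluation results. It is an AI Agent Skill for Claude Code / OpenClaw, with 113 downloads so far.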
How do I install Evaluation Benchmark?
Run "/install skylv-evaluation-benchmark" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Evaluation Benchmark free?
Yes, Evaluation Benchmark is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Evaluation Benchmark support?
Evaluation Benchmark is cross-platform and runs anywhere OpenClaw or Claude Code is available.
Who created Evaluation Benchmark?
It is built and maintained by SKY-lv (@sky-lv); the current version is v1.0.0.