EvalScope

Name: EvalScope
Author: yunnglin

By Yunlin Mao · v1.0.1 · MIT-0

cross-platform ✓ Security check passed

Total downloads: 166

Install in OpenClaw

/install skill-evalscope

Description

Translates natural language requests into evalscope CLI commands. Core capabilities: (1) Model accuracy evaluation (eval) — runs 156+ benchmarks (Math, Codin...

Safe usage recommendations

This skill appears to do what it says (build evalscope CLI commands). Before installing or running commands:
(1) Verify the evalscope package or source you will install, preferring an official PyPI or GitHub release, and install it in a virtualenv or container.
(2) Be cautious when providing API keys or endpoint URLs: only supply credentials to endpoints you trust, and avoid posting keys to remote or untrusted services.
(3) When running perf tests, ensure the endpoints are intended targets, since benchmarks can generate heavy traffic.
(4) Use mock_llm or sandbox modes if you want to test without contacting external models.
(5) Review outputs and reports before sharing, and do not expose sensitive logs.
If you want a deeper review, provide the evalscope PyPI project URL or the package code so it can be inspected.

Functional analysis

Type: OpenClaw Skill
Name: skill-evalscope
Version: 1.0.1

The skill bundle provides a legitimate integration for the EvalScope LLM evaluation framework, allowing an agent to perform model benchmarking, performance stress testing, and result visualization via the `evalscope` CLI. While the skill involves high-risk capabilities such as executing shell commands, installing Python packages, and handling API keys, these actions are transparently documented and strictly aligned with the tool's stated purpose. No evidence of data exfiltration, malicious obfuscation, or harmful prompt injection was found across the documentation or command references (SKILL.md, eval-reference.md, perf-reference.md).

Capability assessment

✓ Purpose & Capability

The name and description match the instructions: the SKILL.md converts natural‑language requests into evalscope CLI commands for evaluation, perf, discovery, and visualization. There are no unrelated environment variables, binaries, or config paths declared.
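As a rough illustration of the translation step described above, the sketch below turns an already-parsed request into an argument list for `evalscope eval`. The helper name `build_eval_command` and the example model ID are hypothetical, and the flags `--model`, `--datasets`, and `--limit` are assumed from EvalScope's public CLI documentation rather than taken from this skill's SKILL.md, so check `evalscope eval --help` before relying on them.

```python
import shlex
from typing import Optional, Sequence


def build_eval_command(
    model: str, datasets: Sequence[str], limit: Optional[int] = None
) -> list[str]:
    """Build an argv list for `evalscope eval` (flag names are assumptions)."""
    if not datasets:
        raise ValueError("at least one dataset is required")
    argv = ["evalscope", "eval", "--model", model, "--datasets", *datasets]
    if limit is not None:
        # Cap the number of samples per dataset for a quick smoke test.
        argv += ["--limit", str(limit)]
    return argv


if __name__ == "__main__":
    # Hypothetical request: "evaluate this model on gsm8k, 5 samples only".
    cmd = build_eval_command("Qwen/Qwen2.5-0.5B-Instruct", ["gsm8k"], limit=5)
    # shlex.join quotes each argument so the printed line is safe to paste into a shell.
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps each argument separate, which avoids shell-quoting mistakes if the command is later passed to `subprocess.run`.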

ℹ Instruction Scope

Instructions stay within evaluation, performance, and visualization workflows. They direct the agent to run evalscope CLI commands, read/write output directories (./outputs), and optionally launch a local Gradio UI. The doc also contains examples showing use of API endpoints and API keys—so the agent may be instructed to send requests to network endpoints and to accept user-provided secrets for those endpoints.

ℹ Install Mechanism

The skill is instruction‑only (no install spec), but SKILL.md recommends installing evalscope via pip (pip install evalscope or extras). Installing an external PyPI package (and extras) can pull many dependencies; that is expected for a CLI tool but is a moderate operational risk if you don't trust the upstream package or want to avoid installing packages system‑wide.
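One way to avoid a system-wide install is to put evalscope in a dedicated virtual environment. The sketch below only builds the two commands involved (create the environment, then install with that environment's own interpreter) and does not run them; the function name `isolated_install_commands` and the default directory `.venv` are illustrative choices, not part of the skill.

```python
import sys
from pathlib import Path


def isolated_install_commands(
    env_dir: str = ".venv", package: str = "evalscope"
) -> list[list[str]]:
    """Return the commands that create a venv and pip-install a package into it."""
    env = Path(env_dir)
    is_windows = sys.platform == "win32"
    # venv places the interpreter under Scripts/ on Windows and bin/ elsewhere.
    python_in_env = env / ("Scripts" if is_windows else "bin") / (
        "python.exe" if is_windows else "python"
    )
    return [
        [sys.executable, "-m", "venv", str(env)],
        [str(python_in_env), "-m", "pip", "install", package],
    ]


if __name__ == "__main__":
    for command in isolated_install_commands():
        print(" ".join(command))
```

Each command can be passed to `subprocess.run(command, check=True)` once you have confirmed the package name and source you intend to install.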

✓ Credentials

The registry metadata requests no environment variables or credentials. The runtime instructions, however, include many optional flags that accept API URLs and API keys (e.g., --api-key, judge-model-args, wandb API keys). These are reasonable for a benchmarking tool but mean the agent or user may be prompted to provide secrets when evaluating remote/API‑served models.
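Because commands may carry secrets, it can help to mask them before a command line is logged or shared. The following is a minimal sketch, assuming secrets are passed via the `--api-key` flag mentioned above in either `--api-key VALUE` or `--api-key=VALUE` form; the helper `redact_argv` and the `SECRET_FLAGS` set are hypothetical and would need extending for other secret-bearing options.

```python
SECRET_FLAGS = {"--api-key"}  # flag named in the text above; extend as needed


def redact_argv(argv: list[str]) -> list[str]:
    """Return a copy of argv with the values of secret flags replaced by '***'."""
    redacted: list[str] = []
    hide_next = False
    for arg in argv:
        if hide_next:
            # This argument is the value of the preceding secret flag.
            redacted.append("***")
            hide_next = False
            continue
        flag, sep, _value = arg.partition("=")
        if flag in SECRET_FLAGS and sep:
            # Handles the combined form, e.g. --api-key=VALUE.
            redacted.append(f"{flag}=***")
        else:
            redacted.append(arg)
            hide_next = arg in SECRET_FLAGS
    return redacted


if __name__ == "__main__":
    print(redact_argv(["evalscope", "eval", "--api-key", "example-secret", "--model", "demo"]))
```

The original list is left unmodified, so the real command can still be executed while only the redacted copy is written to logs.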

✓ Persistence & Privilege

The skill is not always‑enabled and does not request persistent privileges. It does not instruct modifying other skills or global agent config. Running evalscope commands may create output directories and logs under ./outputs, which is normal.

How to use

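Make sure OpenClaw is installed (locally or via Docker).
Enter the install command in the chat box: /install skill-evalscope
After installation, invoke the skill by name or trigger it with /skill-evalscope.
Provide the inputs required by the skill's parameter documentation to get structured output.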

Version history

v1.0.1

No code changes; skill description clarified for accuracy and scope.
- Expanded the skill description to concisely enumerate core EvalScope capabilities: model evaluation, performance benchmarking, benchmark discovery, and results visualization.
- Clarified trigger scenarios for when this skill should be invoked.
- No changes to CLI guidance, workflows, or example commands.

v1.0.0

Initial release of EvalScope skill: Natural language to evalscope CLI for LLM evaluation and benchmarking.
- Translates requests into evalscope CLI commands for model eval, perf, and result visualization.
- Discovers and filters benchmarks using tags and detailed queries.
- Supports local checkpoints, API endpoints, and mock pipelines.
- Guides through setup, model selection, benchmark selection, and parameterization.
- Summarizes and points to results and reports after runs.

Metadata

Slug: skill-evalscope

Version: 1.0.1

License: MIT-0

Total installs: 0

Current installs: 0

Versions released: 2

FAQ

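What is EvalScope?

Translates natural language requests into evalscope CLI commands. Core capabilities: (1) Model accuracy evaluation (eval) — runs 156+ benchmarks (Math, Codin... It is an AI agent skill plugin for Claude Code / OpenClaw, with 166 total downloads to date.

How do I install EvalScope?

Run the command /install skill-evalscope in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is EvalScope free?

Yes. EvalScope is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does EvalScope support?

EvalScope is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops EvalScope?

It is developed and maintained by Yunlin Mao (@yunnglin); the current version is v1.0.1.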