
Ml Model Eval Benchmark

Name: Ml Model Eval Benchmark
Author: 0x-professor

By Muhammad Mazhar Saeed · GitHub · v0.1.0

cross-platform ✓ Security check passed

Total downloads: 420

Install in OpenClaw

/install ml-model-eval-benchmark

Description

Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions.
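The bundled scripts/benchmark_models.py is not reproduced on this page, so the following is only a minimal sketch of what weighted scoring with a deterministic ranking can look like. The input shape (a list of {"name", "metrics"} objects plus a weights mapping), the function name rank_candidates, and the name-ascending tie-break are all assumptions for illustration, not the skill's actual schema or rules.

```python
from typing import Dict, List, Tuple


def rank_candidates(
    candidates: List[Dict],
    weights: Dict[str, float],
) -> List[Tuple[int, str, float]]:
    """Return (rank, name, score) rows, best first (hypothetical schema)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    scored = []
    for cand in candidates:
        metrics = cand["metrics"]
        # Weighted mean over the declared metrics; a missing metric raises KeyError
        # instead of silently counting as zero.
        score = sum(w * metrics[m] for m, w in weights.items()) / total
        # Round so float noise cannot reorder candidates whose scores are equal.
        scored.append((cand["name"], round(score, 6)))
    # Deterministic order: higher score first, then name ascending for ties.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [(rank, name, score) for rank, (name, score) in enumerate(scored, start=1)]


board = rank_candidates(
    [
        {"name": "model-b", "metrics": {"accuracy": 0.90, "f1": 0.80}},
        {"name": "model-a", "metrics": {"accuracy": 0.80, "f1": 0.90}},
        {"name": "model-c", "metrics": {"accuracy": 0.70, "f1": 0.70}},
    ],
    weights={"accuracy": 0.5, "f1": 0.5},
)
for row in board:
    print(row)
```

With equal weights, model-a and model-b tie at 0.85, and the name tie-break puts model-a first on every run regardless of input order. Rounding before sorting is a design choice that keeps floating-point noise from flipping tied candidates.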

Safe-use recommendations

This skill appears low-risk and does what it claims: it runs the bundled script on a JSON input to produce a leaderboard. Before installing or using it: (1) review the script, or run it locally on non-sensitive sample data, to confirm its behavior; (2) make sure the input JSON and the requested output path are trusted, because the script creates parent directories and may overwrite the specified output file; (3) note that the script makes no network calls and accesses no credentials, so it has no apparent path for exfiltrating data, but it also performs only minimal validation of metric values and tie-break behavior, so verify that its weighting and tie-break rules meet your policy for model promotion decisions.
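Because the script reportedly does only minimal validation of metric values, a pre-flight check on the input can catch problems before a promotion decision depends on them. This is a hypothetical helper, not part of the skill; it reuses an assumed {"name", "metrics"} input shape and additionally assumes every metric is on a 0 to 1, higher-is-better scale.

```python
import math
from typing import Dict, List


def validate_inputs(candidates: List[Dict], weights: Dict[str, float]) -> List[str]:
    """Return human-readable problems; an empty list means the input passed."""
    problems: List[str] = []
    numeric = (int, float)

    # Weights: each must be a finite, non-negative number, and they must not all be zero.
    for metric, w in weights.items():
        if not isinstance(w, numeric) or not math.isfinite(w) or w < 0:
            problems.append(f"weight for {metric!r} must be finite and non-negative")
    if sum(w for w in weights.values() if isinstance(w, numeric)) <= 0:
        problems.append("weights must sum to a positive value")

    seen = set()
    for cand in candidates:
        name = cand.get("name")
        # Duplicate names would make a name-based tie-break ambiguous.
        if name in seen:
            problems.append(f"duplicate candidate name {name!r}")
        seen.add(name)
        metrics = cand.get("metrics", {})
        for metric in weights:
            value = metrics.get(metric)
            if value is None:
                problems.append(f"{name!r} is missing metric {metric!r}")
            elif (
                not isinstance(value, numeric)
                or not math.isfinite(value)
                or not 0.0 <= value <= 1.0
            ):
                problems.append(f"{name!r} has out-of-range {metric!r}={value!r}")
    return problems
```

Returning a list instead of raising on the first issue lets a reviewer see every problem in one pass before deciding whether to run the benchmark at all.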

Analysis

Type: OpenClaw Skill
Name: ml-model-eval-benchmark
Version: 0.1.0

The skill bundle is designed to benchmark ML models and appears benign. `SKILL.md` provides standard instructions to execute the `scripts/benchmark_models.py` script and read the documentation, with no evidence of prompt-injection attempts. The Python script performs local file I/O (reading JSON input and writing JSON, Markdown, or CSV output) as expected for its function, without network calls, arbitrary command execution, or attempts to access sensitive data. Potential weaknesses such as path traversal or resource exhaustion exist because the script handles user-specified file paths and input sizes, but these are common in command-line utilities and do not indicate malicious intent.
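The analysis notes that the script writes JSON, Markdown, or CSV output. As a rough sketch of how one leaderboard can be rendered into the two text formats with the Python standard library, assuming rows of (rank, model, score) tuples; the column names and the four-decimal Markdown precision are hypothetical choices, not the script's documented output format.

```python
import csv
import io
from typing import List, Tuple

Row = Tuple[int, str, float]  # hypothetical (rank, model, score) row shape


def to_csv(rows: List[Row]) -> str:
    """Render rows as CSV text with a header line."""
    buf = io.StringIO()
    # Fixed "\n" line endings keep the output byte-identical across platforms.
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["rank", "model", "score"])
    writer.writerows(rows)
    return buf.getvalue()


def to_markdown(rows: List[Row]) -> str:
    """Render rows as a Markdown table with right-aligned numeric columns."""
    lines = ["| rank | model | score |", "| ---: | --- | ---: |"]
    for rank, name, score in rows:
        # Fixed precision keeps the rendered table stable across runs.
        lines.append(f"| {rank} | {name} | {score:.4f} |")
    return "\n".join(lines) + "\n"


rows = [(1, "model-a", 0.85), (2, "model-b", 0.7)]
print(to_csv(rows))
print(to_markdown(rows))
```

The csv module handles quoting if a model name ever contains a comma; the Markdown renderer does not escape pipe characters, so it assumes plain model names.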

Capability assessment

✓ Purpose & Capability

Name and description match the included files: SKILL.md, a benchmarking guide, and a Python script that computes weighted scores and rankings. Nothing in the bundle requests unrelated capabilities or credentials.

✓ Instruction Scope

The runtime instructions direct the agent to run the bundled script and consult the guide. The script only reads a user-supplied JSON input (size-limited), computes scores, and writes an output artifact. The instructions do not ask the agent to read other system files or environment variables, or to transmit data externally.
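The input is described as size-limited, but the actual limit is not stated here. A minimal sketch of a bounded JSON read, assuming a hypothetical 5 MiB cap, a hypothetical load_input function, and a top-level JSON object; reading at most one byte past the cap avoids relying on a separate file-size check that could go stale before the read.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Hypothetical cap; the real script's limit is not documented on this page.
MAX_INPUT_BYTES = 5 * 1024 * 1024


def load_input(path: str, max_bytes: int = MAX_INPUT_BYTES) -> Dict[str, Any]:
    """Read a JSON object from path, refusing files larger than max_bytes."""
    with Path(path).open("rb") as fh:
        # Read one byte past the cap: if that byte arrives, the file is too large.
        raw = fh.read(max_bytes + 1)
    if len(raw) > max_bytes:
        raise ValueError(f"{path} exceeds the {max_bytes}-byte input limit")
    data = json.loads(raw.decode("utf-8"))
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value must be an object")
    return data
```

Memory use is bounded by the cap even if the file is much larger, because the oversized remainder is never read.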

✓ Install Mechanism

No install spec is provided (instruction-only with a bundled script). No downloads, package installs, or external package registry usage are present.

✓ Credentials

The skill declares no environment variables, credentials, or config paths. The script operates solely on an explicit input file and an explicit output path; there are no hidden secret requirements.

✓ Persistence & Privilege

The always flag is false, and the skill neither requests persistent system presence nor modifies other skills. The script writes only to the user-specified output path, creating parent directories as needed.
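The script creates parent directories and may overwrite the output file. One way to make that overwrite safer, sketched here as a hypothetical write_output helper rather than the script's real behavior, is to write to a temporary file in the same directory and swap it into place with os.replace, which is atomic on POSIX systems when source and target are on the same filesystem.

```python
import json
import os
import tempfile
from pathlib import Path


def write_output(path: str, payload: dict) -> Path:
    """Write payload as JSON to path, replacing any existing file atomically."""
    out = Path(path)
    # Create missing parent directories, as the assessment says the script does.
    out.parent.mkdir(parents=True, exist_ok=True)
    # Temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=str(out.parent), prefix=out.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2, sort_keys=True)
            fh.write("\n")
        os.replace(tmp_name, out)
    except BaseException:
        # Remove the partial temp file; any existing target is left untouched.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return out
```

Note that tempfile.mkstemp creates the file readable and writable only by the current user, so the replaced output keeps those permissions; adjust them afterward if other users need to read the leaderboard.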

How to use

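Make sure OpenClaw is installed (local or Docker deployment).
In the chat box, enter the install command: /install ml-model-eval-benchmark
After installation, invoke the skill by name or trigger it with /ml-model-eval-benchmark.
Provide the required inputs as described in the skill's parameter documentation to receive structured output.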

Version history

v0.1.0

- Initial release of ml-model-eval-benchmark.
- Supports weighted metric evaluation and deterministic model ranking.
- Enables benchmark leaderboard generation and model promotion decisions.
- Includes scripts and guides for consistent evaluation workflows.
- Enforces standardized metric names, scales, and explicit weighting documentation.
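The release notes mention standardized metric scales without saying how scales are standardized. One common approach, shown here as an assumption rather than the skill's method, is min-max scaling each metric across candidates to the range 0 to 1 and flipping lower-is-better metrics such as latency so that 1 is always best. The normalize function is hypothetical.

```python
from typing import Dict


def normalize(values: Dict[str, float], higher_is_better: bool = True) -> Dict[str, float]:
    """Min-max scale one metric across candidates to [0, 1], oriented so 1 is best."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Every candidate ties on this metric, so give all of them the same score.
        return {name: 1.0 for name in values}
    span = hi - lo
    if higher_is_better:
        return {name: (v - lo) / span for name, v in values.items()}
    # Lower-is-better metrics (e.g. latency in ms) are flipped: the minimum maps to 1.
    return {name: (hi - v) / span for name, v in values.items()}


latency_ms = {"model-a": 100.0, "model-b": 200.0, "model-c": 150.0}
print(normalize(latency_ms, higher_is_better=False))
```

Min-max scaling depends on which candidates are in the comparison, so adding or removing a model can shift everyone's normalized values; fixed reference bounds per metric avoid that at the cost of choosing them up front.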

Metadata

Slug: ml-model-eval-benchmark

Version: 0.1.0

License: not specified

Total installs: 3

Current installs: 2

Version count: 1

FAQ

What is Ml Model Eval Benchmark?

Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions. It is an AI agent skill plugin for Claude Code / OpenClaw and has been downloaded 420 times so far.

How do I install Ml Model Eval Benchmark?

Run the command /install ml-model-eval-benchmark in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Ml Model Eval Benchmark free?

Yes. Ml Model Eval Benchmark is completely free (open source) and can be freely downloaded, installed, and used.

Which platforms does Ml Model Eval Benchmark support?

Ml Model Eval Benchmark is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Ml Model Eval Benchmark?

It is developed and maintained by Muhammad Mazhar Saeed (@0x-professor); the current version is v0.1.0.