
Ml Model Eval Benchmark

Name: Ml Model Eval Benchmark
Author: 0x-professor

by Muhammad Mazhar Saeed · v0.1.0

cross-platform ✓ Security Clean

Downloads: 420

Install in OpenClaw

/install ml-model-eval-benchmark

Description

Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions.
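The listing does not show the script's actual formula, so the following is only a minimal sketch of what weighted scoring with a deterministic ranking could look like. The function name `rank_models`, the input shapes, the weight normalization, the rounding, and the name-based tie-break are all assumptions made for illustration, not the skill's implementation.

```python
from typing import Dict, List, Tuple


def rank_models(
    candidates: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Return (model, score) pairs sorted best-first (hypothetical helper)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    scored = []
    for name, metrics in candidates.items():
        missing = [m for m in weights if m not in metrics]
        if missing:
            raise ValueError(f"{name} is missing metrics: {missing}")
        # Weighted average of the metrics named in `weights`.
        score = sum(weights[m] * metrics[m] for m in weights) / total_weight
        # Rounding is an assumed choice so tiny float noise cannot reorder ties.
        scored.append((name, round(score, 6)))

    # Sort by score descending, then by name ascending, so ties always
    # resolve the same way regardless of input order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


ranked = rank_models(
    {
        "model-b": {"accuracy": 0.9, "f1": 0.8},
        "model-a": {"accuracy": 0.8, "f1": 0.9},
        "model-c": {"accuracy": 0.7, "f1": 0.7},
    },
    {"accuracy": 1.0, "f1": 1.0},
)
print([name for name, _ in ranked])  # ['model-a', 'model-b', 'model-c']
```

Here `model-a` and `model-b` tie on score, and the name-based tie-break places `model-a` first on every run. Whatever rule the real script uses, it is worth confirming that it is equally explicit.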

Usage Guidance

This skill appears low-risk and does what it says: it runs the bundled script on a JSON input to produce a leaderboard. Before installing or using it:

(1) Review the script, or run it locally on non-sensitive sample data, to confirm its behavior.
(2) Make sure the input JSON and the requested output path are trusted: the script creates parent directories and may overwrite the specified output file.
(3) Note that the script makes no network calls and accesses no credentials, so it has no path to exfiltrate data. It also does minimal validation of metric values and tie-break behavior, so verify that the weighting and tie-break rules meet your policy for model promotion decisions.
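Because the script reportedly does minimal validation of metric values, one option is to check the input yourself before running it. The sketch below assumes a hypothetical input shape (a mapping of model name to a mapping of metric name to value); the real input schema is not shown in this listing, and `find_invalid_metrics` is an illustrative name.

```python
import math
from numbers import Real
from typing import Dict, List


def find_invalid_metrics(candidates: Dict[str, Dict[str, object]]) -> List[str]:
    """Return human-readable problems found in metric values (hypothetical helper)."""
    problems = []
    # Iterate in sorted order so the report itself is deterministic.
    for model, metrics in sorted(candidates.items()):
        for metric, value in sorted(metrics.items()):
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, Real):
                problems.append(f"{model}.{metric}: not a number ({value!r})")
            elif not math.isfinite(value):
                problems.append(f"{model}.{metric}: not finite ({value!r})")
    return problems
```

An empty list means every metric value is a finite number. Range checks (for example, requiring values in a fixed scale) depend on your own metric conventions and are left out here.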

Capability Analysis

Type: OpenClaw Skill
Name: ml-model-eval-benchmark
Version: 0.1.0

The skill bundle is designed to benchmark ML models and appears benign. The `SKILL.md` provides standard instructions to execute the `scripts/benchmark_models.py` script and read documentation, with no evidence of prompt injection attempts. The Python script performs local file I/O (reading JSON input, writing JSON/MD/CSV output) as expected for its function, without network calls, arbitrary command execution, or attempts to access sensitive data. While potential vulnerabilities like path traversal or resource exhaustion exist due to handling user-specified file paths and input sizes, these are common in command-line utilities and do not demonstrate malicious intent.
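If path traversal through the user-specified output path is a concern, a caller can resolve the path against an allowed directory before handing it to the script. This is a sketch of a wrapper you might write yourself, not something the bundle is known to include; `resolve_output_path` and the idea of a fixed base directory are assumptions.

```python
from pathlib import Path


def resolve_output_path(base_dir: str, requested: str) -> Path:
    """Resolve `requested` under `base_dir`, rejecting paths that escape it."""
    base = Path(base_dir).resolve()
    # Joining with an absolute `requested` discards `base`, and ".." segments
    # are collapsed by resolve(), so both cases are caught by the check below.
    target = (base / requested).resolve()
    try:
        target.relative_to(base)
    except ValueError:
        raise ValueError(f"output path escapes {base}: {requested!r}") from None
    return target
```

A relative path such as `reports/leaderboard.json` resolves inside the base directory, while `../escape.json` or an absolute path outside it raises `ValueError`.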

Capability Assessment

✓ Purpose & Capability

Name and description match the included files: SKILL.md, a benchmarking guide, and a Python script that computes weighted scores and rankings. Nothing in the bundle requests unrelated capabilities or credentials.

✓ Instruction Scope

Runtime instructions direct the agent to run the bundled script and consult the guide. The script only reads a user-supplied JSON input (size-limited), computes scores, and writes an output artifact. The instructions do not ask the agent to read other system files, environment variables, or transmit data externally.
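The listing says the JSON input is size-limited but does not give the limit or how it is enforced. As a rough illustration of the idea only, a reader could cap the bytes read before parsing. The constant `MAX_INPUT_BYTES`, its value, and the function name are hypothetical.

```python
import json

# Hypothetical cap; the skill's real limit is not stated in this listing.
MAX_INPUT_BYTES = 1_000_000


def load_limited_json(path: str, max_bytes: int = MAX_INPUT_BYTES):
    """Parse a JSON file, refusing inputs larger than `max_bytes`."""
    with open(path, "rb") as fh:
        # Read one byte past the limit so an oversized file is detected
        # without loading the whole thing into memory.
        data = fh.read(max_bytes + 1)
    if len(data) > max_bytes:
        raise ValueError(f"input exceeds {max_bytes} bytes: {path}")
    return json.loads(data)
```

Reading a bounded number of bytes, rather than checking the file size first, avoids a gap between the check and the read.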

✓ Install Mechanism

No install spec is provided (instruction-only with a bundled script). No downloads, package installs, or external package registry usage are present.

✓ Credentials

The skill declares no environment variables, credentials, or config paths. The script operates solely on an explicit input file and an explicit output path; there are no hidden secret requirements.

✓ Persistence & Privilege

`always` is false, and the skill does not request persistent system presence or modify other skills. The script writes only to the user-specified output path and creates parent directories as needed.

How to Use

1. Make sure OpenClaw is installed (local or Docker).
2. Run the install command in chat: /install ml-model-eval-benchmark
3. After installation, invoke the skill by name or use /ml-model-eval-benchmark
4. Provide the required inputs per the skill's parameter spec to get structured output.

Version History

v0.1.0

- Initial release of ml-model-eval-benchmark.
- Supports weighted metric evaluation and deterministic model ranking.
- Enables benchmark leaderboard generation and model promotion decisions.
- Includes scripts and guides for consistent evaluation workflows.
- Enforces standardized metric names, scales, and explicit weighting documentation.

Metadata

Slug ml-model-eval-benchmark

Version 0.1.0

License —

All-time Installs 3

Active Installs 2

Total Versions 1

Frequently Asked Questions

What is Ml Model Eval Benchmark?

Ml Model Eval Benchmark is an AI agent skill for Claude Code / OpenClaw that compares model candidates using weighted metrics and deterministic ranking outputs. Use it for benchmark leaderboards and model promotion decisions. It has 420 downloads so far.

How do I install Ml Model Eval Benchmark?

Run "/install ml-model-eval-benchmark" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Ml Model Eval Benchmark free?

Yes, Ml Model Eval Benchmark is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Ml Model Eval Benchmark support?

Ml Model Eval Benchmark is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Ml Model Eval Benchmark?

It is built and maintained by Muhammad Mazhar Saeed (@0x-professor); the current version is v0.1.0.
