0x-professor

Ml Model Eval Benchmark

cross-platform ✓ Security Clean
420 Downloads · 0 Stars · 2 Active Installs · 1 Version
Install in OpenClaw
/install ml-model-eval-benchmark
Description
Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions.
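The description does not spell out the scoring formula or the input schema. The following is a minimal sketch of a weighted score plus an explicit tie-breaker that yields a deterministic ranking. It assumes each candidate is a hypothetical `{metric: value}` dict already on a common higher-is-better scale, and that weights are a dict keyed by metric name. Neither shape is confirmed as the format read by `scripts/benchmark_models.py`, and the names `weighted_score` and `rank` are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical input shape: {model_name: {metric_name: value}}, with every
# metric already scaled so that higher is better.
Candidates = Dict[str, Dict[str, float]]


def weighted_score(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of the metrics named in `weights`; KeyError if one is missing."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(metrics[name] * w for name, w in weights.items()) / total_weight


def rank(candidates: Candidates, weights: Dict[str, float]) -> List[Tuple[int, str, float]]:
    """Return (rank, model, score) rows, best first.

    Ties on score are broken by model name, so the ordering never depends on
    dict insertion order: the same input always yields the same leaderboard.
    """
    scored = [(name, weighted_score(m, weights)) for name, m in candidates.items()]
    # Sort key: descending score, then ascending model name as the tie-breaker.
    scored.sort(key=lambda row: (-row[1], row[0]))
    return [(i + 1, name, round(score, 6)) for i, (name, score) in enumerate(scored)]


if __name__ == "__main__":
    candidates = {
        "model-b": {"accuracy": 0.90, "f1": 0.80},
        "model-a": {"accuracy": 0.80, "f1": 0.90},
        "model-c": {"accuracy": 0.70, "f1": 0.70},
    }
    weights = {"accuracy": 0.5, "f1": 0.5}
    for row in rank(candidates, weights):
        print(row)
```

In this example model-a and model-b tie at 0.85, and the name-based tie-breaker places model-a first on every run. Sorting on a composite key is only one simple way to get determinism. Whatever rule the real script applies should be checked against your promotion policy.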
Usage Guidance
This skill appears low-risk and does what it says: it runs the bundled script on a JSON input to produce a leaderboard. Before installing or using it:
  1. Review the script, or run it locally on non-sensitive sample data, to confirm its behavior.
  2. Make sure the input JSON and the requested output path are trusted. The script creates parent directories and may overwrite the specified output file.
  3. Note that the script makes no network calls and accesses no credentials, so it will not exfiltrate data. It does, however, perform minimal validation of metric values and tie-break behavior, so verify that the weighting and tie-break rules meet your policy for model promotion decisions.
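Because the script does minimal validation of metric values, you can run a pre-flight check before benchmarking. The sketch below assumes candidates arrive as a hypothetical `{model: {metric: value}}` mapping with weights keyed by metric name. It also assumes a common [0, 1] metric scale. None of these shapes or ranges is confirmed by the skill's documentation, and `validate_inputs` is an illustrative name.

```python
import math
from typing import Dict, List


def _is_finite_number(x: object) -> bool:
    # bool is a subclass of int, so reject it explicitly.
    return isinstance(x, (int, float)) and not isinstance(x, bool) and math.isfinite(x)


def validate_inputs(candidates: Dict[str, Dict[str, float]],
                    weights: Dict[str, float]) -> List[str]:
    """Collect human-readable problems; an empty list means no issues were found."""
    problems: List[str] = []

    for name, w in weights.items():
        if not _is_finite_number(w) or w < 0:
            problems.append(f"weight {name!r} must be a finite, non-negative number")
    if sum(w for w in weights.values() if _is_finite_number(w) and w > 0) <= 0:
        problems.append("weights must include at least one positive value")

    for model, metrics in candidates.items():
        # Every weighted metric must be present, or the weighted score is undefined.
        missing = sorted(set(weights) - set(metrics))
        if missing:
            problems.append(f"{model}: missing metrics {missing}")
        for metric, value in metrics.items():
            if not _is_finite_number(value):
                problems.append(f"{model}.{metric}: non-finite or non-numeric value {value!r}")
            elif not 0.0 <= value <= 1.0:
                # Assumed common scale; adjust if your metrics use another range.
                problems.append(f"{model}.{metric}: {value} is outside the assumed [0, 1] scale")
    return problems
```

Refusing to benchmark when this returns a non-empty list surfaces bad inputs (a NaN, a missing metric, a percent-scale value) before they can silently shift the leaderboard.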
Capability Analysis
Type: OpenClaw Skill · Name: ml-model-eval-benchmark · Version: 0.1.0
The skill bundle is designed to benchmark ML models and appears benign. The `SKILL.md` file provides standard instructions to execute the `scripts/benchmark_models.py` script and read documentation, with no evidence of prompt-injection attempts. The Python script performs local file I/O (reading JSON input, writing JSON/MD/CSV output) as expected for its function, without network calls, arbitrary command execution, or attempts to access sensitive data. Potential weaknesses such as path traversal or resource exhaustion exist because the script handles user-specified file paths and input sizes. These are common in command-line utilities and do not indicate malicious intent.
Capability Assessment
Purpose & Capability
Name and description match the included files: SKILL.md, a benchmarking guide, and a Python script that computes weighted scores and rankings. Nothing in the bundle requests unrelated capabilities or credentials.
Instruction Scope
The runtime instructions direct the agent to run the bundled script and consult the guide. The script only reads a user-supplied JSON input (with a size limit), computes scores, and writes an output artifact. The instructions do not ask the agent to read other system files or environment variables, or to transmit data externally.
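The analysis says the JSON input is size-limited but states neither the limit nor how it is enforced. The following is a minimal sketch of one enforcement approach. It uses a hypothetical `MAX_INPUT_BYTES` of 5 MiB and an illustrative `load_input` function, neither taken from the actual script. It reads at most one byte past the cap, so an oversized file is rejected without being loaded in full.

```python
import json
from pathlib import Path

# Hypothetical cap; the actual limit in scripts/benchmark_models.py is not documented here.
MAX_INPUT_BYTES = 5 * 1024 * 1024


def load_input(path: str, max_bytes: int = MAX_INPUT_BYTES) -> dict:
    """Parse a JSON object from `path`, refusing files larger than `max_bytes`."""
    with Path(path).open("rb") as fh:
        # Read one byte past the cap: if that byte exists, the file is too large.
        raw = fh.read(max_bytes + 1)
    if len(raw) > max_bytes:
        raise ValueError(f"{path} exceeds the {max_bytes}-byte input limit")
    data = json.loads(raw.decode("utf-8"))
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value must be an object")
    return data
```

Bounding the read itself, rather than checking the file size first and then reading, also guards against a file that grows between the check and the read.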
Install Mechanism
No install spec is provided (instruction-only with a bundled script). No downloads, package installs, or external package registry usage are present.
Credentials
The skill declares no environment variables, credentials, or config paths. The script operates solely on an explicit input file and an explicit output path; there are no hidden secret requirements.
Persistence & Privilege
The `always` flag is false, and the skill does not request persistent system presence or modify other skills. The script writes only to the user-specified output path, creating parent directories as needed.
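The analysis states that the script writes only to the user-specified output path and creates parent directories as needed. Usage Guidance adds that it may overwrite an existing file. The sketch below illustrates that write path for JSON, CSV, and Markdown output with one hypothetical addition: an `overwrite` flag that refuses by default to replace an existing file. The `(rank, model, score)` row shape and the name `write_leaderboard` are assumptions, not the script's actual interface.

```python
import csv
import json
from pathlib import Path
from typing import List, Tuple

Row = Tuple[int, str, float]  # hypothetical (rank, model, score) row
SUPPORTED = {".json", ".csv", ".md"}


def write_leaderboard(rows: List[Row], out_path: str, overwrite: bool = False) -> Path:
    """Write rows in the format implied by the file suffix and return the path."""
    out = Path(out_path)
    suffix = out.suffix.lower()
    # Validate before touching the filesystem so a bad path creates nothing.
    if suffix not in SUPPORTED:
        raise ValueError(f"unsupported output format: {suffix!r}")
    if out.exists() and not overwrite:
        raise FileExistsError(f"{out} already exists; pass overwrite=True to replace it")
    # Create missing parent directories, as the skill's analysis describes.
    out.parent.mkdir(parents=True, exist_ok=True)

    if suffix == ".json":
        records = [{"rank": r, "model": m, "score": s} for r, m, s in rows]
        out.write_text(json.dumps(records, indent=2) + "\n", encoding="utf-8")
    elif suffix == ".csv":
        with out.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(["rank", "model", "score"])
            writer.writerows(rows)
    else:  # ".md"
        lines = ["| Rank | Model | Score |", "| ---: | --- | ---: |"]
        lines += [f"| {r} | {m} | {s} |" for r, m, s in rows]
        out.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out
```

Making overwrites opt-in is a design choice for shared artifact directories, where silently replacing a previous leaderboard could lose the record behind an earlier promotion decision.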
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install ml-model-eval-benchmark
  3. After installation, invoke the skill by name or use /ml-model-eval-benchmark
  4. Provide the inputs required by the skill's parameter spec to get structured output
Version History
v0.1.0
- Initial release of ml-model-eval-benchmark.
- Supports weighted metric evaluation and deterministic model ranking.
- Enables benchmark leaderboard generation and model promotion decisions.
- Includes scripts and guides for consistent evaluation workflows.
- Enforces standardized metric names, scales, and explicit weighting documentation.
Metadata
Slug ml-model-eval-benchmark
Version 0.1.0
License
All-time Installs 3
Active Installs 2
Total Versions 1
Frequently Asked Questions

What is Ml Model Eval Benchmark?

Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions. It is an AI Agent Skill for Claude Code / OpenClaw, with 420 downloads so far.

How do I install Ml Model Eval Benchmark?

Run "/install ml-model-eval-benchmark" in the OpenClaw or Claude Code chat to install it in one step. No extra setup is required beyond a working OpenClaw or Claude Code installation.

Is Ml Model Eval Benchmark free?

Yes, Ml Model Eval Benchmark is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Ml Model Eval Benchmark support?

Ml Model Eval Benchmark is cross-platform and runs anywhere OpenClaw or Claude Code is available.

Who created Ml Model Eval Benchmark?

It is built and maintained by Muhammad Mazhar Saeed (@0x-professor); the current version is v0.1.0.
