nandorocker

Model Tester

by Nando Rossi · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
376 Downloads · 0 Stars · 3 Active Installs · 1 Version
Install in OpenClaw: /install model-tester
Description
Test agents or models against predefined test cases to validate model routing, performance, and output quality. Use when: (1) verifying a specific agent or m...
Usage Guidance
Before installing or running this skill:
  1. Ensure the 'openclaw' CLI is installed and accessible (the skill does not declare this dependency in its metadata).
  2. Run it in a safe or sandboxed environment, because it tails OpenClaw logs, which may contain unrelated or sensitive information from your deployment.
  3. Verify that your OpenClaw config and gateway credentials are appropriate for testing (the script uses whatever local config the CLI has).
  4. Review, and if desired edit, references/test-cases.json so the test prompts contain no sensitive data.
  5. Consider running a single case with verbose output to confirm the tool parses only the expected model/token fields before using it on broader logs or in CI.
If you need higher assurance, ask the skill author to (a) declare 'openclaw' as a required binary in metadata, (b) add an option to limit log scope or time window, and (c) avoid reading unrelated log lines, or optionally write raw logs only to a user-specified local file for manual review.
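Recommendation (b) could take the form of a time-window filter applied to JSON log records before any further parsing. The sketch below is illustrative and is not part of the skill: the function name `within_window` and the timestamp field `time` are hypothetical, since the actual OpenClaw log schema is not documented on this page, and it assumes ISO 8601 timestamps.

```python
import json
from datetime import datetime, timezone
from typing import Iterable, Iterator

# Hypothetical field name: the real OpenClaw JSON log schema is not
# documented on this page, so "time" is an assumption.
TIME_FIELD = "time"


def within_window(lines: Iterable[str], start: datetime, end: datetime) -> Iterator[dict]:
    """Yield parsed JSON log records whose timestamp falls in [start, end].

    start and end must be timezone-aware. Lines that are not JSON objects
    or lack a parseable ISO 8601 timestamp are skipped, not inspected.
    """
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore non-JSON output instead of scanning it
        if not isinstance(record, dict):
            continue
        raw = record.get(TIME_FIELD)
        if not isinstance(raw, str):
            continue
        try:
            ts = datetime.fromisoformat(raw)
        except ValueError:
            continue
        if ts.tzinfo is None:
            # Assumption: timestamps without an offset are UTC.
            ts = ts.replace(tzinfo=timezone.utc)
        if start <= ts <= end:
            yield record
```

Filtering by time before any field extraction keeps lines from unrelated earlier or later activity out of the tool's view entirely.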
Capability Analysis
Type: OpenClaw Skill · Name: model-tester · Version: 1.0.0
The model-tester skill is a legitimate benchmarking and diagnostic tool designed to verify OpenClaw agent routing and performance. It uses the 'openclaw' CLI to execute predefined test cases from 'references/test-cases.json' and monitors OpenClaw logs via 'openclaw logs' to extract model usage and token statistics. The Python script 'scripts/model_tester.py' uses standard subprocess handling with no shell-injection risk, and no evidence of data exfiltration, persistence, or malicious prompt injection was found.
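For readers unfamiliar with the shell-injection point: in Python, passing a command to `subprocess.run` as a list of arguments (without `shell=True`) hands each element to the child process verbatim, so shell metacharacters inside a prompt are never interpreted. This is a generic sketch of that pattern, not the skill's own code; the helper `run_argv` is hypothetical, and the current Python interpreter stands in for a CLI because the exact 'openclaw agent' arguments are not shown on this page.

```python
import subprocess
import sys
from typing import Sequence


def run_argv(argv: Sequence[str], timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run a command from an argument list without invoking a shell.

    Each element of argv reaches the child process as exactly one argument,
    so characters such as ';', '|' or '$(...)' are passed through literally.
    """
    return subprocess.run(
        list(argv),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # let the caller decide how to treat non-zero exits
    )


if __name__ == "__main__":
    # The child just echoes its first argument, showing the prompt
    # arrives unchanged instead of being split or expanded by a shell.
    prompt = "test; echo injected | cat $(whoami)"
    result = run_argv([sys.executable, "-c", "import sys; print(sys.argv[1])", prompt])
    print(result.stdout.strip() == prompt)
```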
Capability Assessment
Purpose & Capability
The skill's stated purpose (testing agents/models) matches the included code: scripts/model_tester.py runs predefined prompts and checks routing via OpenClaw logs. However, the SKILL metadata declares no required binaries, while the code clearly requires the 'openclaw' CLI (used for both 'openclaw logs --follow --json' and 'openclaw agent ...'). This undeclared dependency is an inconsistency that should be fixed or verified before installation. The code also implicitly requires a valid OpenClaw configuration (gateway/credentials) available to the 'openclaw' binary.
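Until the dependency is declared, a script like this can at least fail early and clearly when the CLI is missing. A minimal sketch of such a preflight check, assuming only that the binary is looked up on PATH; `require_binary` is a hypothetical helper, not something the skill ships.

```python
import shutil
import sys


def require_binary(name: str = "openclaw") -> str:
    """Return the path to `name` as resolved via PATH, or exit with an error.

    Exiting before any test case runs turns an undeclared dependency into
    a single clear message instead of a failure partway through a run.
    """
    path = shutil.which(name)
    if path is None:
        sys.exit(f"error: required binary '{name}' not found on PATH")
    return path
```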
Instruction Scope
The runtime instructions and the script explicitly tail OpenClaw logs and run 'openclaw agent' subprocesses. SKILL.md asserts that only structured fields are captured and that no user data is sent to models, which the script largely enforces by using fixed test prompts. However, tailing logs with '--follow' collects arbitrary log lines from the OpenClaw runtime, and the script inspects those lines with regexes, which can inadvertently match or expose unrelated log content. The tool does not transmit logs externally, but it does read them and includes the parsed token and model fields in its output; if the logs contain unexpected sensitive fields, the parsing may capture them. Otherwise, the instructions are scoped to the testing task and do not request unrelated files or environment variables.
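One way to narrow this exposure is to parse each log line as JSON and keep only an allowlist of fields, rather than regex-scanning raw text. This is a sketch under assumptions: it presumes 'openclaw logs --json' emits one JSON object per line, and the field names in `ALLOWED_FIELDS` (like the function `extract_allowed`) are hypothetical placeholders for whatever the real schema uses.

```python
import json
from typing import Iterable, Iterator, Tuple

# Hypothetical field names: the actual OpenClaw JSON log schema is not given
# on this page, so adjust ALLOWED_FIELDS to match real records.
ALLOWED_FIELDS = ("model", "input_tokens", "output_tokens")


def extract_allowed(
    lines: Iterable[str], allowed: Tuple[str, ...] = ALLOWED_FIELDS
) -> Iterator[dict]:
    """Yield only allowlisted fields from JSON log lines.

    Every other key is dropped, and lines that are not JSON objects are
    skipped entirely instead of being scanned with a regex.
    """
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        kept = {key: record[key] for key in allowed if key in record}
        if kept:
            yield kept
```

Because unknown keys are discarded at parse time, a sensitive field that later appears in the logs would not reach the report unless it is explicitly allowlisted.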
Install Mechanism
There is no install spec (instruction-only plus a script file). That is low-risk in that nothing is downloaded or executed at install time, but the packaged script will execute subprocesses at runtime. No external archives or network installers are used.
Credentials
The skill declares no required environment variables or credentials, which is reasonable. However, it relies on the local 'openclaw' CLI and therefore implicitly on whatever credentials/config the user's OpenClaw installation uses (gateway keys, local config). That implicit access is proportional to the tool's purpose but should be understood by the user: running this script will cause the agent/CLI to execute and may read the user's OpenClaw config.
Persistence & Privilege
The skill does not request persistent presence (always:false) and does not modify other skills or system settings. It runs as a normal, user-invoked tool and does not autonomously enable itself or persist new credentials.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install model-tester
  3. After installation, invoke the skill by name or use /model-tester
  4. Provide the required inputs per the skill's parameter spec (for example `--agent`, `--model`, or `--case`) and review the structured JSON output
Version History
v1.0.0
Initial release of model-tester.
- Provides a command-line tool to validate agents or models against predefined test cases.
- Supports testing model routing, performance, and output quality with structured JSON reporting.
- Allows targeting specific agents or models using `--agent`, `--model`, and `--case` parameters.
- Extracts actual model usage, token counts, and runtime from OpenClaw logs for verification.
- Ensures privacy by using only static prompts and structured log fields; no user data is involved.
Metadata
Slug: model-tester
Version: 1.0.0
License: MIT-0
All-time Installs: 3
Active Installs: 3
Total Versions: 1
Frequently Asked Questions

What is Model Tester?

Test agents or models against predefined test cases to validate model routing, performance, and output quality. Use when: (1) verifying a specific agent or m... It is an AI Agent Skill for Claude Code / OpenClaw, with 376 downloads so far.

How do I install Model Tester?

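Run "/install model-tester" in the OpenClaw or Claude Code chat to install it in one step. Note that the bundled script also needs the 'openclaw' CLI and a valid OpenClaw configuration, neither of which is declared in the skill's metadata.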

Is Model Tester free?

Yes, Model Tester is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Model Tester support?

Model Tester is cross-platform and runs anywhere OpenClaw / Claude Code is available, provided the 'openclaw' CLI is installed.

Who created Model Tester?

It is built and maintained by Nando Rossi (@nandorocker); the current version is v1.0.0.
