Skylv Agent Evaluator

Name: Skylv Agent Evaluator
Author: sky-lv

by SKY-lv · GitHub · v1.0.2 · MIT-0

cross-platform ⚠ suspicious

Install in OpenClaw

/install skylv-agent-evaluator

Description

Evaluate AI agent behavior on accuracy, efficiency, clarity, safety, and helpfulness, providing scores, grades, and improvement suggestions.

Usage Guidance

This package appears to be a local, heuristic-based evaluator: it reads a file and applies regex rules. Before installing or using it, note that SKILL.md claims 'LLM-as-judge' scoring and lists a different set of evaluation dimensions and weights than the code actually implements. Ask the author which implementation is authoritative. If you plan to use it:

(1) run it on non-sensitive sample logs in a sandbox to confirm its behavior;
(2) verify which criteria and weights are used by inspecting the code (CRITERIA in agent_evaluator.js);
(3) if you expect LLM-based scoring, do not trust the current code as-is, because it makes no external calls;
(4) consider forking or adjusting the script if you need LLM judgement or different metrics.

The tool does not request secrets or network access, so the direct security risk is low. However, the mismatch between documentation and implementation could lead to misplaced trust in its results.

Capability Analysis

Type: OpenClaw Skill
Name: skylv-agent-evaluator
Version: 1.0.2

The skill is a utility for evaluating AI agent logs against predefined metrics such as accuracy and safety. The core logic in `agent_evaluator.js` uses simple regex-based scoring and local file reading, with no network calls, shell execution, or credential access. The instructions in `SKILL.md` and `README.md` are consistent with the tool's stated purpose and contain no malicious prompt injections.
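To make "regex-based scoring and local file reading" concrete, here is a minimal sketch of how such a heuristic scorer can work. Each dimension starts from a baseline, every matching pattern adds or subtracts points, and the result is clamped to 0-100. The rule set, the baseline of 70, and the names HYPOTHETICAL_RULES, scoreText, and scoreFile are illustrative assumptions. They are not the contents of agent_evaluator.js.

```javascript
// Minimal sketch of local, regex-based heuristic scoring.
// HYPOTHETICAL: these rules, the baseline, and the function names are
// illustrative only; they are NOT the CRITERIA shipped in agent_evaluator.js.
const fs = require("fs");

// Each rule adds (positive) or removes (negative) points when its pattern matches.
const HYPOTHETICAL_RULES = {
  safety: [
    { pattern: /rm\s+-rf\s+\//i, points: -50 },     // destructive shell command
    { pattern: /api[_-]?key\s*[:=]/i, points: -30 }, // possible leaked credential
  ],
  clarity: [
    { pattern: /^\s*[-*]\s+/m, points: 10 },  // uses bullet points
    { pattern: /^\s*\d+\.\s+/m, points: 10 }, // uses numbered steps
  ],
};

// Score a log's text per dimension: start at a baseline, apply every
// matching rule once, then clamp to the 0-100 range.
function scoreText(text, rules = HYPOTHETICAL_RULES, baseline = 70) {
  const scores = {};
  for (const [dimension, dimensionRules] of Object.entries(rules)) {
    let score = baseline;
    for (const { pattern, points } of dimensionRules) {
      if (pattern.test(text)) score += points;
    }
    scores[dimension] = Math.max(0, Math.min(100, score));
  }
  return scores;
}

// Read a user-supplied log file from disk and score it; no network access.
function scoreFile(path) {
  return scoreText(fs.readFileSync(path, "utf8"));
}
```

A scorer like this only pattern-matches surface features of the text. It cannot tell whether an agent's answer is actually correct, so it should not be mistaken for LLM-as-judge scoring.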

Capability Assessment

⚠ Purpose & Capability

The declared purpose (evaluating agent behavior across five dimensions) aligns with the included code, which implements a scoring engine. However, SKILL.md and README name different dimensions and weights (SKILL.md: Accuracy, Efficiency, Safety, Coherence, Adaptability; README: Accuracy 25% etc.), while the code defines accuracy, efficiency, clarity, safety, and helpfulness with different weights. This mismatch between documentation and implementation is misleading.
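Point (2) of the usage guidance, checking which dimensions the code actually uses, can be partly automated. The sketch below compares the dimension names documented in SKILL.md with the names this review reports in the code, ignoring case. Both lists are copied from this review. The function name diffDimensions is hypothetical, and a real check would populate the second list from CRITERIA in agent_evaluator.js.

```javascript
// Dimension names as reported in this review.
// In a real check, "implemented" would be read from CRITERIA in agent_evaluator.js.
const documented = ["Accuracy", "Efficiency", "Safety", "Coherence", "Adaptability"];
const implemented = ["accuracy", "efficiency", "clarity", "safety", "helpfulness"];

// Hypothetical helper: case-insensitive set difference in both directions.
function diffDimensions(docs, code) {
  const normalize = (names) => new Set(names.map((n) => n.toLowerCase()));
  const docSet = normalize(docs);
  const codeSet = normalize(code);
  return {
    onlyInDocs: [...docSet].filter((d) => !codeSet.has(d)),
    onlyInCode: [...codeSet].filter((c) => !docSet.has(c)),
  };
}

// Reports coherence and adaptability as documented-only,
// and clarity and helpfulness as code-only.
console.log(diffDimensions(documented, implemented));
```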

⚠ Instruction Scope

SKILL.md states 'Analysis: Score each dimension using LLM-as-judge', but agent_evaluator.js performs local regex/heuristic scoring with no LLM calls or external network activity. The runtime instructions imply behavior (LLM judgement) that the code does not perform — a substantive divergence in scope.

✓ Install Mechanism

No install spec or external downloads; the skill is instruction-only with a bundled JS file. No packages are fetched, and the code writes nothing to disk; it only reads user-supplied files, so installation risk is low.

✓ Credentials

The skill requests no environment variables, credentials, or special config paths. The code reads only a user-supplied file path and uses no secrets or external services.

✓ Persistence & Privilege

The skill's `always` setting is false, and the skill does not modify other skills or system settings. It does not persist credentials or enable itself automatically, so it holds no elevated persistence privileges.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install skylv-agent-evaluator
After installation, invoke the skill by name or use /skylv-agent-evaluator
Provide the required inputs as described in the skill's parameter spec to get structured output

Version History

v1.0.2

- Completely rewrote and reformatted SKILL.md for clarity and usability
- Updated evaluation criteria: changed from 5 named "criteria" to 5 "dimensions" (Accuracy, Efficiency, Safety, Coherence, Adaptability), adjusting weights and definitions
- Expanded output documentation: added a sample evaluation report and actionable suggestions
- Added explicit use cases and quick start instructions
- Clarified evaluation process and trigger usage
- Switched to structured YAML frontmatter for metadata

v1.0.1

- No changes detected from the previous version.
- Version updated without any modifications to files or documentation.

v1.0.0

- Initial release of skylv-agent-evaluator.
- Evaluates AI agent actions based on 5 criteria: accuracy, efficiency, clarity, safety, and helpfulness.
- Provides a weighted score (0-100), letter grade, and improvement suggestions for low-performing areas.
- Designed for quick assessment of agent quality using trigger keywords like "evaluate," "score," and "behavior check."
- Competes with "eval" in the agent evaluation market.
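The v1.0.0 notes describe three outputs: a weighted 0-100 score, a letter grade, and suggestions for low-scoring areas. A minimal sketch of that pipeline follows. The weights, the grade cutoffs (90/80/70/60), the suggestion threshold of 60, and the function names are assumptions for illustration. The actual values are whatever CRITERIA in agent_evaluator.js defines.

```javascript
// HYPOTHETICAL weights over the five dimensions the code reportedly uses;
// the real weights are defined in CRITERIA in agent_evaluator.js.
const WEIGHTS = { accuracy: 0.3, efficiency: 0.2, clarity: 0.2, safety: 0.2, helpfulness: 0.1 };

// Weighted average of per-dimension scores (each 0-100), rounded to an integer.
// Missing dimensions count as 0.
function weightedScore(scores, weights = WEIGHTS) {
  let total = 0;
  let weightSum = 0;
  for (const [dimension, weight] of Object.entries(weights)) {
    total += (scores[dimension] ?? 0) * weight;
    weightSum += weight;
  }
  return Math.round(total / weightSum);
}

// HYPOTHETICAL grade cutoffs.
function letterGrade(score) {
  if (score >= 90) return "A";
  if (score >= 80) return "B";
  if (score >= 70) return "C";
  if (score >= 60) return "D";
  return "F";
}

// One suggestion per dimension scoring below the (hypothetical) threshold.
function suggestions(scores, threshold = 60) {
  return Object.entries(scores)
    .filter(([, score]) => score < threshold)
    .map(([dimension, score]) => `Improve ${dimension} (scored ${score}, below ${threshold})`);
}

// Combine the three steps into one report object.
function evaluate(scores) {
  const score = weightedScore(scores);
  return { score, grade: letterGrade(score), suggestions: suggestions(scores) };
}
```

Dividing by the sum of the weights keeps the result on the 0-100 scale even when the weights do not add up to exactly 1.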

Metadata

Slug skylv-agent-evaluator

Version 1.0.2

License MIT-0

All-time Installs 1

Active Installs 0

Total Versions 3

Frequently Asked Questions

What is Skylv Agent Evaluator?

Evaluate AI agent behavior on accuracy, efficiency, clarity, safety, and helpfulness, providing scores, grades, and improvement suggestions. It is an AI Agent Skill for Claude Code / OpenClaw, with 90 downloads so far.

How do I install Skylv Agent Evaluator?

Run "/install skylv-agent-evaluator" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Skylv Agent Evaluator free?

Yes, Skylv Agent Evaluator is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Skylv Agent Evaluator support?

Skylv Agent Evaluator is cross-platform and runs anywhere OpenClaw or Claude Code is available.

Who created Skylv Agent Evaluator?

It is built and maintained by SKY-lv (@sky-lv); the current version is v1.0.2.
