Downloads: 90
Stars: 0
Active Installs: 0
Versions: 3
Install in OpenClaw
/install skylv-agent-evaluator
Description
Evaluate AI agent behavior on accuracy, efficiency, clarity, safety, and helpfulness, providing scores, grades, and improvement suggestions.
Usage Guidance
This package appears to be a local, heuristic-based evaluator: it reads a file and applies regex rules. Before installing or using it, note that SKILL.md claims 'LLM-as-judge' scoring and a different set of evaluation dimensions and weights than the code actually implements. Ask the author which implementation is authoritative. If you plan to use it:
- Run it on non-sensitive sample logs in a sandbox to confirm behavior.
- Verify which criteria and weightings are used by inspecting the code (CRITERIA in agent_evaluator.js).
- If you expect LLM-based scoring, do not trust the current code as-is; it makes no external calls.
- Consider forking or adjusting the script if you need LLM judgement or different metrics.
The tool does not request secrets or network access, so the direct security risk is low, but the documentation/implementation mismatch could lead to mistaken trust in its results.
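One way to sanity-check the "no external calls" observation before running the script is a quick text scan of its source. The sketch below is a heuristic under stated assumptions: `findNetworkHints` and the `HINTS` list are hypothetical names introduced here, the patterns cover only a few common Node.js entry points, and a clean result is not proof that the code makes no network or shell calls.

```javascript
// Hypothetical helper: flags source text that mentions common Node.js
// network or process-spawning entry points. A text scan is only a hint;
// it can miss dynamic requires and can match text inside comments.
const HINTS = [
  { name: "http(s) module", pattern: /require\(\s*['"](?:node:)?https?['"]\s*\)/ },
  { name: "fetch call", pattern: /\bfetch\s*\(/ },
  { name: "child_process module", pattern: /require\(\s*['"](?:node:)?child_process['"]\s*\)/ },
];

function findNetworkHints(sourceText) {
  // Keep the names of every hint whose pattern appears in the source.
  return HINTS
    .filter(({ pattern }) => pattern.test(sourceText))
    .map(({ name }) => name);
}

// Example input resembling a purely local script (illustrative only).
const localOnly = "const fs = require('fs');\nconst text = fs.readFileSync(p, 'utf8');";
console.log(findNetworkHints(localOnly)); // prints []
```

In practice the source text would come from reading the bundled script (for example with `fs.readFileSync`), and any non-empty result is a cue to read that part of the code by hand.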
Capability Analysis
Type: OpenClaw Skill
Name: skylv-agent-evaluator
Version: 1.0.2
The skill is a utility for evaluating AI agent logs based on predefined metrics like accuracy and safety. The core logic in `agent_evaluator.js` uses simple regex-based scoring and local file reading without any network calls, shell execution, or credential access. The instructions in `SKILL.md` and `README.md` are consistent with the tool's stated purpose and do not contain malicious prompt injections.
Capability Assessment
Purpose & Capability
The declared purpose (evaluate agent behavior across five dimensions) aligns with the included code, which implements a scoring engine. However the SKILL.md/README claim different dimension names and weights (SKILL.md: Accuracy, Efficiency, Safety, Coherence, Adaptability; README: Accuracy 25% etc.) while the code defines accuracy, efficiency, clarity, safety, helpfulness with different weights. This mismatch between documentation and implementation is misleading.
Instruction Scope
SKILL.md states 'Analysis: Score each dimension using LLM-as-judge', but agent_evaluator.js performs local regex/heuristic scoring with no LLM calls or external network activity. The runtime instructions imply behavior (LLM judgement) that the code does not perform — a substantive divergence in scope.
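To make the distinction concrete, here is a minimal sketch of what local regex/heuristic scoring generally looks like. It is not the code in agent_evaluator.js: the `RULES` table, the patterns, the point deltas, and the base score of 70 are all illustrative assumptions. The point is that every score comes from pattern matches on the log text, with no model or network call involved.

```javascript
// Illustrative assumption: each dimension starts from a base score and is
// adjusted by fixed deltas when a pattern matches the log text.
const RULES = {
  safety: [
    { pattern: /rm\s+-rf/i, delta: -40 },        // destructive shell command appears in the log
    { pattern: /\bconfirm(ed)?\b/i, delta: 10 }, // log mentions a confirmation step
  ],
  clarity: [
    { pattern: /\bstep \d+\b/i, delta: 10 },     // log uses numbered steps
  ],
};

function scoreDimension(logText, rules, base = 70) {
  let score = base;
  for (const { pattern, delta } of rules) {
    if (pattern.test(logText)) score += delta;
  }
  // Clamp to the 0-100 range.
  return Math.max(0, Math.min(100, score));
}

function scoreLog(logText) {
  const result = {};
  for (const [dimension, rules] of Object.entries(RULES)) {
    result[dimension] = scoreDimension(logText, rules);
  }
  return result;
}

console.log(scoreLog("Step 1: run rm -rf /tmp/x")); // prints { safety: 30, clarity: 80 }
```

A scorer of this kind is deterministic and cheap, but it only sees surface patterns, which is why it should not be read as an LLM judgement.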
Install Mechanism
No install spec or external downloads; the skill is instruction-only with a bundled JS file. No packages are fetched and nothing is written to disk; the only file access is reading user-supplied files, so installation risk is low.
Credentials
The skill requests no environment variables, credentials, or special config paths. The code reads only a user-supplied file path and uses no secrets or external services.
Persistence & Privilege
The skill's `always` setting is false, and the skill does not modify other skills or system settings. It does not persist credentials or enable itself automatically, so there are no elevated persistence privileges.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install skylv-agent-evaluator
- After installation, invoke the skill by name or use /skylv-agent-evaluator
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.2
- Completely rewrote and reformatted SKILL.md for clarity and usability
- Updated evaluation criteria: changed from 5 named "criteria" to 5 "dimensions" (Accuracy, Efficiency, Safety, Coherence, Adaptability), adjusting weights and definitions
- Expanded output documentation: added sample evaluation report and actionable suggestions
- Added explicit use cases and quick start instructions
- Clarified evaluation process and trigger usage
- Switched to structured YAML frontmatter for metadata
v1.0.1
- No changes detected from the previous version.
- Version updated without any modifications to files or documentation.
v1.0.0
- Initial release of skylv-agent-evaluator.
- Evaluates AI agent actions based on 5 criteria: accuracy, efficiency, clarity, safety, and helpfulness.
- Provides a weighted score (0-100), letter grade, and improvement suggestions for low-performing areas.
- Designed for quick assessment of agent quality using trigger keywords like "evaluate," "score," and "behavior check."
- Competes with "eval" in the agent evaluation market.
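The v1.0.0 output (a weighted 0-100 score, a letter grade, and suggestions for low-performing areas) can be sketched as follows. The five dimension names come from the listing; the equal weights, the grade cutoffs, and the suggestion threshold of 60 are assumptions made for illustration, since the actual values in agent_evaluator.js are not shown here.

```javascript
// Assumed weights in percent (must sum to 100). The real weights may differ.
const WEIGHTS = { accuracy: 20, efficiency: 20, clarity: 20, safety: 20, helpfulness: 20 };

// Assumed grade cutoffs and suggestion threshold, for illustration only.
const GRADE_CUTOFFS = [[90, "A"], [80, "B"], [70, "C"], [60, "D"]];
const SUGGESTION_THRESHOLD = 60;

function evaluate(scores) {
  // Weighted average of the per-dimension scores (each 0-100).
  let weightedSum = 0;
  for (const [dimension, weight] of Object.entries(WEIGHTS)) {
    weightedSum += weight * (scores[dimension] ?? 0);
  }
  const overall = Math.round(weightedSum / 100);

  // First cutoff the overall score reaches wins; otherwise "F".
  const match = GRADE_CUTOFFS.find(([cutoff]) => overall >= cutoff);
  const grade = match ? match[1] : "F";

  // Suggest improvements only for dimensions below the threshold.
  const suggestions = Object.keys(WEIGHTS)
    .filter((dimension) => (scores[dimension] ?? 0) < SUGGESTION_THRESHOLD)
    .map((dimension) => `Improve ${dimension}`);

  return { overall, grade, suggestions };
}

const report = evaluate({ accuracy: 90, efficiency: 80, clarity: 70, safety: 100, helpfulness: 50 });
console.log(report); // prints { overall: 78, grade: 'C', suggestions: [ 'Improve helpfulness' ] }
```

Keeping the weights in one table makes it easy to compare them against whatever the documentation claims, which is the mismatch the review above flags.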
Frequently Asked Questions
What is Skylv Agent Evaluator?
Skylv Agent Evaluator is an AI Agent Skill for Claude Code / OpenClaw that evaluates AI agent behavior on accuracy, efficiency, clarity, safety, and helpfulness, providing scores, grades, and improvement suggestions. It has 90 downloads so far.
How do I install Skylv Agent Evaluator?
Run "/install skylv-agent-evaluator" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Skylv Agent Evaluator free?
Yes, Skylv Agent Evaluator is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Skylv Agent Evaluator support?
Skylv Agent Evaluator is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Skylv Agent Evaluator?
It is built and maintained by SKY-lv (@sky-lv); the current version is v1.0.2.