Skylv Agent Evaluator

Name: Skylv Agent Evaluator
Author: sky-lv

Author SKY-lv · GitHub · v1.0.2 · MIT-0

cross-platform ⚠ suspicious

Install in OpenClaw

/install skylv-agent-evaluator

Description

Evaluate AI agent behavior on accuracy, efficiency, clarity, safety, and helpfulness, providing scores, grades, and improvement suggestions.

Safe usage recommendations

This package appears to be a local, heuristic-based evaluator (reads a file and applies regex rules). Before installing or using it, note that the SKILL.md claims 'LLM-as-judge' and a different set of evaluation dimensions/weights than the code actually implements — ask the author to explain which implementation is authoritative. If you plan to use it: (1) run it on non-sensitive sample logs in a sandbox to confirm behavior; (2) verify which criteria and weightings are used by inspecting the code (CRITERIA in agent_evaluator.js); (3) if you expect LLM-based scoring, do not trust the current code as-is — it makes no external calls; (4) consider forking or adjusting the script if you need LLM judgement or different metrics. The tool does not request secrets or network access, so the direct security risk is low, but the documentation/implementation mismatch could lead to mistaken trust in its results.

Functional analysis

Type: OpenClaw Skill
Name: skylv-agent-evaluator
Version: 1.0.2

The skill is a utility for evaluating AI agent logs based on predefined metrics like accuracy and safety. The core logic in `agent_evaluator.js` uses simple regex-based scoring and local file reading without any network calls, shell execution, or credential access. The instructions in `SKILL.md` and `README.md` are consistent with the tool's stated purpose and do not contain malicious prompt injections.

Capability assessment

⚠ Purpose & Capability

The declared purpose (evaluate agent behavior across five dimensions) aligns with the included code, which implements a scoring engine. However the SKILL.md/README claim different dimension names and weights (SKILL.md: Accuracy, Efficiency, Safety, Coherence, Adaptability; README: Accuracy 25% etc.) while the code defines accuracy, efficiency, clarity, safety, helpfulness with different weights. This mismatch between documentation and implementation is misleading.
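The mismatch described above can be checked mechanically. The following is a minimal sketch, not part of the skill: it compares the dimension names quoted from SKILL.md with those quoted from the code, ignoring case. The function name `diffDimensions` and the result variable are hypothetical.

```javascript
// Dimension names as quoted in the assessment above.
const documented = ["Accuracy", "Efficiency", "Safety", "Coherence", "Adaptability"];
const implemented = ["accuracy", "efficiency", "clarity", "safety", "helpfulness"];

// Hypothetical helper: report names that appear on only one side.
function diffDimensions(docNames, codeNames) {
  // Normalize to lowercase so "Accuracy" and "accuracy" compare equal.
  const norm = (names) => new Set(names.map((n) => n.toLowerCase()));
  const doc = norm(docNames);
  const code = norm(codeNames);
  return {
    onlyInDocs: [...doc].filter((n) => !code.has(n)),
    onlyInCode: [...code].filter((n) => !doc.has(n)),
  };
}

const dimensionDiff = diffDimensions(documented, implemented);
console.log(dimensionDiff);
```

Two names differ on each side; the weights would need the same kind of comparison, which requires reading the actual values from the README and the code.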

⚠ Instruction Scope

SKILL.md states 'Analysis: Score each dimension using LLM-as-judge', but agent_evaluator.js performs local regex/heuristic scoring with no LLM calls or external network activity. The runtime instructions imply behavior (LLM judgement) that the code does not perform — a substantive divergence in scope.
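To make the distinction concrete, here is a minimal sketch of what local regex/heuristic scoring generally looks like. It is not the contents of agent_evaluator.js: the rule list, patterns, and penalty values are illustrative assumptions. The point is that such a scorer is deterministic and makes no model or network calls.

```javascript
// Hypothetical rules: patterns and penalties are assumptions for illustration,
// not taken from agent_evaluator.js.
const SAFETY_RULES = [
  { pattern: /rm\s+-rf/i, penalty: 40, note: "destructive shell command" },
  { pattern: /(api[_-]?key|password)\s*[:=]/i, penalty: 30, note: "possible secret in log" },
];

// Score one dimension by subtracting a penalty for each rule that matches the log text.
function scoreSafety(logText) {
  let score = 100;
  const findings = [];
  for (const rule of SAFETY_RULES) {
    if (rule.pattern.test(logText)) {
      score -= rule.penalty;
      findings.push(rule.note);
    }
  }
  // Clamp so the score never goes below zero.
  return { score: Math.max(0, score), findings };
}

console.log(scoreSafety("agent ran: rm -rf /tmp/build"));
```

A scorer of this kind only sees surface patterns, so its numbers should not be read as the semantic judgement that "LLM-as-judge" implies.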

✓ Install Mechanism

No install spec or external downloads; the skill is instruction-only with a bundled JS file. No packages are fetched and nothing is written to disk; the only file access is reading user-supplied files, so installation risk is low.

✓ Credentials

The skill requests no environment variables, credentials, or special config paths. The code reads only a user-supplied file path and uses no secrets or external services.

✓ Persistence & Privilege

`always` is false and the skill does not modify other skills or system settings. It does not persist credentials or enable itself automatically, so there are no elevated persistence privileges.

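How to use

Make sure OpenClaw is installed (local or Docker deployment)
Enter the install command in the chat box: /install skylv-agent-evaluator
Once installed, call the Skill by name or trigger it with /skylv-agent-evaluator
Provide the required input according to the Skill's parameter description to get structured output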

Version history

v1.0.2

- Completely rewrote and reformatted SKILL.md for clarity and usability
- Updated evaluation criteria: changed from 5 named "criteria" to 5 "dimensions" (Accuracy, Efficiency, Safety, Coherence, Adaptability), adjusting weights and definitions
- Expanded output documentation: added sample evaluation report and actionable suggestions
- Added explicit use cases and quick start instructions
- Clarified evaluation process and trigger usage
- Switched to structured YAML frontmatter for metadata

v1.0.1

- No changes detected from the previous version.
- Version updated without any modifications to files or documentation.

v1.0.0

- Initial release of skylv-agent-evaluator.
- Evaluates AI agent actions based on 5 criteria: accuracy, efficiency, clarity, safety, and helpfulness.
- Provides a weighted score (0-100), letter grade, and improvement suggestions for low-performing areas.
- Designed for quick assessment of agent quality using trigger keywords like "evaluate," "score," and "behavior check."
- Competes with "eval" in the agent evaluation market.
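The weighted score, letter grade, and suggestions described in the v1.0.0 notes can be sketched as follows. This is an illustration under stated assumptions, not the skill's code: the equal weights, the grade cutoffs (90/80/70/60), and the suggestion threshold (70) are all hypothetical, since the listing does not state the actual values.

```javascript
// Hypothetical weights in percent (must sum to 100); the real values live in
// CRITERIA in agent_evaluator.js and are not given in this listing.
const WEIGHTS = { accuracy: 20, efficiency: 20, clarity: 20, safety: 20, helpfulness: 20 };
const SUGGESTION_THRESHOLD = 70; // assumed cutoff for a "low-performing" dimension

// Combine per-dimension scores (each 0-100) into a weighted total, grade, and suggestions.
function evaluate(scores) {
  let weightedSum = 0;
  const suggestions = [];
  for (const [dim, weight] of Object.entries(WEIGHTS)) {
    const s = scores[dim] ?? 0; // a missing dimension counts as 0
    weightedSum += s * weight;
    if (s < SUGGESTION_THRESHOLD) suggestions.push(`Improve ${dim} (scored ${s})`);
  }
  const total = Math.round(weightedSum / 100);
  // Assumed letter-grade cutoffs.
  const grade = total >= 90 ? "A" : total >= 80 ? "B" : total >= 70 ? "C" : total >= 60 ? "D" : "F";
  return { total, grade, suggestions };
}

console.log(evaluate({ accuracy: 90, efficiency: 80, clarity: 70, safety: 60, helpfulness: 100 }));
```

Because the weights determine the final grade, anyone relying on the output should confirm the real weights in the code before comparing scores across agents.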

Metadata

Slug skylv-agent-evaluator

Version 1.0.2

License MIT-0

Total installs 1

Current installs 0

Versions 3

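FAQ

What is Skylv Agent Evaluator?

Skylv Agent Evaluator evaluates AI agent behavior on accuracy, efficiency, clarity, safety, and helpfulness, providing scores, grades, and improvement suggestions. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 90 total downloads to date.

How do I install Skylv Agent Evaluator?

Run the command "/install skylv-agent-evaluator" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Skylv Agent Evaluator free?

Yes. Skylv Agent Evaluator is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.

Which platforms does Skylv Agent Evaluator support?

Skylv Agent Evaluator is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Skylv Agent Evaluator?

It is developed and maintained by SKY-lv (@sky-lv). The current version is v1.0.2.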