
Multi-Agent Skill Evaluator

Name: Multi-Agent Skill Evaluator
Author: 54lynnn

By 54Lynnn · GitHub · v1.4.0 · MIT-0

cross-platform ✓ Security check passed

Install in OpenClaw

/install multi-agent-skill-evaluator

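Description

"Help me evaluate this skill." (original trigger phrase: 帮我评估一下这个 skill。)

Usage (SKILL.md)

Skill Evaluator: Multi-Agent Skill Evaluation

Performs a structured, multi-dimensional evaluation of a target skill. Three isolated sub-agents act as independent examiners; each evaluates the skill in full, and their results are then aggregated.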

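Workflow

Step 1: Read the target skill

Read every file in the target skill's directory, skipping binary and other non-text files:

SKILL.md
scripts/* (.sh, .py, .js, etc.)
references/* (.md, etc.)
Other text configuration files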
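
The file-gathering step above could be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation (the package ships no scripts). The helper names `read_text_or_none` and `collect_skill_files` are hypothetical, and the text check (no NUL byte, valid UTF-8) is an assumed heuristic for "non-text".

```python
from pathlib import Path


def read_text_or_none(path: Path):
    """Return the file's text, or None if it looks binary (assumed heuristic)."""
    data = path.read_bytes()
    if b"\x00" in data:  # NUL bytes almost never occur in text files
        return None
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def collect_skill_files(skill_dir):
    """Map each text file's relative path to its contents, skipping non-text files."""
    root = Path(skill_dir)
    files = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        text = read_text_or_none(path)
        if text is not None:
            files[path.relative_to(root).as_posix()] = text
    return files
```

Note that `rglob("*")` also walks hidden directories, so a real run would likely want an exclusion list; that is left out here to keep the sketch short.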

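Step 2: Launch 3 sub-agents in parallel

Using the evaluation protocol in references/evaluation-protocol.md, fill in the target skill's information and the full contents of all its files, then spawn 3 sub-agents at the same time (using mode="run").

Each sub-agent's task must include:

Role declaration (you are independent examiner A/B/C)
Information about the skill under evaluation
All evaluation materials (complete file contents)
Evaluation criteria (the 8 dimension definitions, quoted directly from evaluation-protocol.md)
Output format requirements (including the ===SCORE_SUMMARY=== marker line)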
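
As a rough sketch, the five required parts could be assembled into one task string per examiner like this. The function names and the exact wording of each section are assumptions made for illustration; the real wording lives in references/evaluation-protocol.md, which is not reproduced here.

```python
SCORE_MARKER = "===SCORE_SUMMARY==="


def build_task(examiner, skill_info, files, criteria):
    """Assemble one examiner's task text (hypothetical layout)."""
    materials = "\n\n".join(
        f"--- {name} ---\n{body}" for name, body in files.items()
    )
    return "\n\n".join([
        f"You are independent examiner {examiner}.",
        f"Skill under evaluation:\n{skill_info}",
        f"Evaluation materials:\n{materials}",
        f"Evaluation criteria:\n{criteria}",
        # Assumed phrasing; only the marker itself comes from the source.
        f"Output format: finish with a line containing {SCORE_MARKER} "
        "followed by your scores.",
    ])


def build_all_tasks(skill_info, files, criteria):
    """One task per examiner; all three receive identical materials."""
    return {e: build_task(e, skill_info, files, criteria) for e in ("A", "B", "C")}
```

Giving every examiner the same materials and criteria, and varying only the role letter, keeps the three evaluations comparable.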

Note: send all three with sessions_spawn in parallel rather than waiting on each one in turn, then call sessions_yield to wait for all of them to finish.
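
The "start everything, then wait once" pattern can be illustrated with Python's asyncio. In this sketch `fake_examiner` is a hypothetical stand-in for a sub-agent run; it is not sessions_spawn, sessions_yield, or any other OpenClaw call, whose signatures the source does not show.

```python
import asyncio


async def fake_examiner(name, task):
    """Hypothetical stand-in for one sub-agent run (NOT the OpenClaw API)."""
    await asyncio.sleep(0.05)  # simulate evaluation time
    return f"examiner {name} done ({len(task)} chars read)"


async def run_all(tasks):
    """Start every examiner first, then wait once for the whole batch."""
    names = list(tasks)
    results = await asyncio.gather(
        *(fake_examiner(n, tasks[n]) for n in names),
        return_exceptions=True,  # a failed examiner yields an exception object
    )
    return dict(zip(names, results))


if __name__ == "__main__":
    print(asyncio.run(run_all({"A": "task a", "B": "task b", "C": "task c"})))
```

With `return_exceptions=True`, one failing examiner does not cancel the others, which fits the later rule of marking an unfinished examiner as N/A.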

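Step 2.5 (optional): Disagreement follow-up

When aggregating scores, if the highest score minus the lowest score on any dimension is ≥ 3, spawn a follow-up sub-agent dedicated to analyzing it:

You are the Skill Evaluator's follow-up examiner. Regarding the "Security" dimension of skill xxx:
Examiner A (9 points) reasoning: ...
Examiner B (4 points) reasoning: ...

Please analyze the disagreement: whose argument is stronger? Is there a blind spot that neither side noticed?

Add the follow-up result to the final report.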
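
The trigger condition (highest minus lowest ≥ 3) and the follow-up prompt could be sketched as below. The data layout, a dict of per-examiner scores for each dimension, and both function names are assumptions for illustration; the prompt text mirrors the template above in English.

```python
def divergent_dimensions(scores, threshold=3):
    """Return {dimension: (highest_examiner, lowest_examiner)} where the gap >= threshold.

    scores: {dimension: {examiner: score or None}}; None means N/A (assumed layout).
    """
    flagged = {}
    for dimension, by_examiner in scores.items():
        valid = {e: s for e, s in by_examiner.items() if s is not None}
        if len(valid) < 2:
            continue  # a gap needs at least two scores
        high = max(valid, key=valid.get)
        low = min(valid, key=valid.get)
        if valid[high] - valid[low] >= threshold:
            flagged[dimension] = (high, low)
    return flagged


def follow_up_prompt(skill, dimension, scores, reasons, high, low):
    """Fill the follow-up template for one disputed dimension."""
    return (
        "You are the Skill Evaluator's follow-up examiner. "
        f'Regarding the "{dimension}" dimension of skill {skill}:\n'
        f"Examiner {high} ({scores[high]} points) reasoning: {reasons[high]}\n"
        f"Examiner {low} ({scores[low]} points) reasoning: {reasons[low]}\n\n"
        "Please analyze the disagreement: whose argument is stronger? "
        "Is there a blind spot that neither side noticed?"
    )
```

For example, scores of 9, 4, and 7 on one dimension give a gap of 5 and trigger a follow-up, while 8, 7, and 6 give a gap of 2 and do not.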

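Step 3: Aggregate results

From each sub-agent's output, extract the score summary (by parsing the section marked ===SCORE_SUMMARY===) and the detailed comments.

If a sub-agent did not finish or its output is malformed, mark it as N/A and note this in the report.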
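
A tolerant parser for the marker section might look like the sketch below. The source does not show what follows the ===SCORE_SUMMARY=== line; this example assumes a single JSON object mapping dimension names to scores (the v1.2.0 changelog mentions JSON structured output) and assumes the marker appears only once. `parse_score_summary` is a hypothetical name.

```python
import json

SCORE_MARKER = "===SCORE_SUMMARY==="


def parse_score_summary(output):
    """Return the score dict after the marker, or None (reported as N/A)."""
    if not isinstance(output, str) or SCORE_MARKER not in output:
        return None  # sub-agent did not finish or omitted the marker
    _, _, tail = output.partition(SCORE_MARKER)
    try:
        # raw_decode parses the first JSON value and ignores trailing text
        scores, _ = json.JSONDecoder().raw_decode(tail.lstrip())
    except json.JSONDecodeError:
        return None  # malformed summary
    return scores if isinstance(scores, dict) else None
```

Returning None instead of raising lets the aggregation step fill that examiner's column with N/A and still produce a report from the remaining examiners.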

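Aggregated output (follow this structure strictly):

══════════════════════════════════════
  Skill Evaluation Report: <skill name> v<version>
══════════════════════════════════════

📊 Scores by dimension
┌─────────────────────────────┬────┬────┬────┬──────┐
│ Dimension                   │ A  │ B  │ C  │ Mean │
├─────────────────────────────┼────┼────┼────┼──────┤
│ 1. Functional completeness  │    │    │    │      │
│ 2. Code quality             │    │    │    │      │
│ 3. Robustness               │    │    │    │      │
│ 4. Security                 │    │    │    │      │
│ 5. Documentation quality    │    │    │    │      │
│ 6. Dependency soundness     │    │    │    │      │
│ 7. Expected runtime results │    │    │    │      │
│ 8. Overall rating           │    │    │    │      │
└─────────────────────────────┴────┴────┴────┴──────┘

Note: dimension mean = (A+B+C)/3, rounded to one decimal place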

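🔍 Major disagreements

List the dimensions where the highest score minus the lowest score is ≥ 3 (if any), with each side's arguments and an analysis.

✅ Consensus strengths

Strengths explicitly mentioned by at least 2 examiners (quote key terms from their original text)

⚠️ Consensus issues

Problems explicitly pointed out by at least 2 examiners (quote key terms from their original text)

📝 Overall assessment

- Overall quality positioning
- The 1-2 points most worth improving
- Suggested rating: Recommended / Usable with pitfalls / Not recommended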
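
The mean column in the report template, (A+B+C)/3 to one decimal place, could be computed as in this sketch. The source does not say how an N/A examiner affects the mean; averaging over the available scores is an assumption made here, and both function names are hypothetical.

```python
def dimension_mean(scores):
    """Mean of the available scores, formatted to one decimal place.

    scores: [A, B, C], with None for an N/A examiner. Excluding N/A from the
    average is an assumption; the source only defines (A+B+C)/3.
    """
    valid = [s for s in scores if s is not None]
    if not valid:
        return "N/A"
    return f"{sum(valid) / len(valid):.1f}"


def score_cells(scores):
    """Render the A, B, C, and mean cells for one table row."""
    cells = ["N/A" if s is None else str(s) for s in scores]
    return cells + [dimension_mean(scores)]
```

For instance, scores of 9, 4, and 7 yield a mean of 6.7, and a row with one N/A examiner is averaged over the remaining two under the stated assumption.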

Safe usage recommendations

Install this only if you want a Chinese-language, multi-agent evaluator for other OpenClaw skills. Avoid using it on directories containing private notes, secrets, or unrelated project files, because it is designed to read and pass the target skill’s text files to evaluator sub-agents.

Capability assessment

✓ Purpose & Capability

The stated purpose is to evaluate a target skill, and the requested capabilities fit that purpose: read the target skill files, send them to three isolated evaluator agents, optionally ask a follow-up agent about large scoring disagreements, and aggregate a report.

ℹ Instruction Scope

The Chinese description is broad and conversational, which could cause accidental activation for users asking generally about skill evaluation; the body still clearly limits behavior to evaluating a target skill.

✓ Install Mechanism

The package contains only Markdown instructions and one Markdown reference protocol, with no executable scripts, install hooks, declared dependencies, or static-scan findings.

ℹ Credentials

Reading all text files in the target skill and forwarding their contents to sub-agents is proportionate for a multi-agent evaluator, but users should only point it at skill directories they intend to review.

✓ Persistence & Privilege

No credential use, privilege escalation, file mutation, deletion, network exfiltration, or background persistence is shown; the sub-agent spawning is finite and disclosed as part of the evaluation workflow.

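How to use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install multi-agent-skill-evaluator
Once installed, invoke the skill by name or trigger it with /multi-agent-skill-evaluator
Provide the required input according to the skill's parameter description to get structured output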

Version history

v1.4.0

v1.4.0 - description shortened to '帮我评估一下这个 skill' ("Help me evaluate this skill")

v1.3.0

v1.3.0 - description rewritten: opens with the user trigger scenario and supports two uses, human evaluation and an agent's self-check before download

v1.2.0

v1.2.0 - multi-agent independent evaluation: 3 sub-agents score separately, plus JSON structured output and a disagreement follow-up mechanism

Metadata

Slug multi-agent-skill-evaluator

Version 1.4.0

License MIT-0

Total installs 0

Current installs 0

Number of versions 3

FAQ