Multi-Skill-Eval | Integrated Skill Evaluation System | AI Agent Skill plugin | 101 downloads

Name: Multi-Skill-Eval | Integrated Skill Evaluation System
Author: wangzairong

Description

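An integrated, multi-method skill evaluation system. It combines static analysis (skill-assessment), rubric-based quality scoring (skill-evaluator), and autonomous benchmarking (skill-eval). Use it to comprehensively evaluate, compare, audit, or improve OpenClaw skills. Coverage includes documentation completeness, code quality, a 25-item rubric score, and multi-model benchmarking. Trigger words (Chinese): 评估技...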

Security recommendations

This package appears coherent for its stated purpose: it runs local static checks, rubric grading, and an agent-driven benchmark. Before installing or running it: (1) review the scripts (especially the truncated ones) for any unexpected network calls, subprocess calls, or file-write operations; (2) run it in an isolated environment (container or VM) if you plan to evaluate untrusted skills; (3) in benchmark mode, be aware that the agent will read the full target skill directory and spawn subagents, so do not point it at directories containing credentials or other secrets. For a higher-confidence assessment, provide the remaining script contents (static-analyze.py and the truncated parts) so they can be checked for external network calls, subprocess execution or shell injection, and hardcoded secrets.
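The manual review in step (1) can be partly automated. Below is a minimal sketch, not part of this skill, that uses Python's standard `ast` module to flag imports and calls that commonly indicate network access, subprocess execution, or file access. The watch lists `SUSPICIOUS_MODULES` and `SUSPICIOUS_CALLS` and the helper names are hypothetical; static pattern matching misses dynamically constructed calls, so an empty result does not prove a script is safe.

```python
import ast
from pathlib import Path
from typing import List

# Hypothetical watch lists; extend or trim them for your own review.
SUSPICIOUS_MODULES = {"socket", "urllib", "http", "requests", "subprocess", "ftplib"}
# Note: "open" flags any file access (reads too), and "run" is noisy (e.g. asyncio.run).
SUSPICIOUS_CALLS = {"eval", "exec", "__import__", "open", "system", "popen",
                    "run", "Popen", "check_output"}


def audit_source(source: str, filename: str = "<string>") -> List[str]:
    """Parse (without executing) one Python source string and list findings."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        # `import x` / `import x.y` where the top-level package is watched.
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in SUSPICIOUS_MODULES:
                    findings.append(f"{filename}:{node.lineno}: import {alias.name}")
        # `from x.y import z` where the top-level package is watched.
        elif isinstance(node, ast.ImportFrom) and node.module:
            if node.module.split(".")[0] in SUSPICIOUS_MODULES:
                findings.append(f"{filename}:{node.lineno}: from {node.module} import ...")
        # Plain calls like open(...) and attribute calls like subprocess.run(...).
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                name = func.id
            elif isinstance(func, ast.Attribute):
                name = func.attr
            else:
                name = None
            if name in SUSPICIOUS_CALLS:
                findings.append(f"{filename}:{node.lineno}: call {name}()")
    return findings


def audit_directory(root: str) -> List[str]:
    """Audit every .py file under `root`, reporting files that fail to parse."""
    findings = []
    for path in sorted(Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8", errors="replace")
        try:
            findings.extend(audit_source(source, str(path)))
        except SyntaxError as exc:
            findings.append(f"{path}: could not parse ({exc.msg})")
    return findings
```

For example, `audit_directory("multi-skill-eval/scripts")` would list findings for each script; the path is a placeholder for wherever the skill is unpacked. Each finding is a prompt for manual reading, not a verdict.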

Functional analysis

Type: OpenClaw Skill
Name: multi-skill-eval
Version: 1.0.2

The bundle is a comprehensive toolkit designed for auditing, benchmarking, and improving OpenClaw skills. It contains several Python scripts (e.g., static-analyze.py, eval-skill.py, grade-assertions.py) that perform legitimate static analysis, such as checking for hardcoded secrets, verifying Python syntax, and ensuring documentation completeness. The logic is transparent, uses standard libraries, and aligns with the stated purpose of a skill evaluation system. While the documentation mentions a 'Self-Evolution' feature for rewriting skill instructions, it is explicitly marked as 'not yet implemented,' and the existing code contains no evidence of data exfiltration, unauthorized remote execution, or malicious intent.
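The static checks described above (Python syntax verification and hardcoded-secret detection) can be approximated with the standard library alone. The actual implementation in static-analyze.py is not shown here, so this is a hypothetical sketch: `check_syntax`, `find_secrets`, and the regexes in `SECRET_PATTERNS` are illustrative assumptions, not the skill's rules.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical patterns; real secret scanners use far larger rule sets.
SECRET_PATTERNS = {
    # AWS access key IDs start with "AKIA" followed by 16 uppercase letters/digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # A sensitive-looking name assigned a quoted literal of 8+ characters.
    "generic_assignment": re.compile(
        r"""(?i)\b(api_key|secret|token|password)\s*=\s*["'][^"']{8,}["']"""
    ),
}


def check_syntax(source: str, filename: str = "<string>") -> Optional[str]:
    """Return an error message if `source` is not valid Python, else None."""
    try:
        # compile() only parses and compiles; it never runs the code.
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: SyntaxError: {exc.msg}"
    return None


def find_secrets(source: str) -> List[Tuple[int, str]]:
    """Return (line number, pattern name) pairs for lines that look like hardcoded secrets."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

Using `compile()` rather than importing the file checks syntax without executing any of the target skill's code, which matters when the skill under evaluation is untrusted. Values read from the environment (e.g. `os.environ["PW"]`) are deliberately not flagged.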

Capability tags

requires-sensitive-credentials

Capability assessment

✓ Purpose & Capability

Name/description (skill evaluation, static analysis, rubric scoring, benchmark) match the included files and CLI instructions: scripts perform static analysis, generate cards/leaderboards, and run benchmarks. There are no unrelated required env vars or declared binaries that would be incoherent with the stated purpose.

ℹ Instruction Scope

SKILL.md instructs running the local Python CLI scripts against target skill directories and — for the benchmark method — to have the AI agent read SKILL.md, source files, and spawn subagents to execute tests. This is coherent for an evaluator but means the skill (and you, when running it) will read the full contents of whatever skill path you point it at; review inputs before evaluating sensitive code. I saw no instructions that attempt to exfiltrate data to unknown endpoints, but parts of the scripts were truncated so full review would be prudent.

✓ Install Mechanism

No install spec (instruction-only, with accompanying scripts), which is the lowest-risk delivery model. The package contains Python scripts; the provided fragments show no downloads from remote URLs or unusual install behavior. Run the scripts in a controlled Python environment and review any third-party requirements not listed here.

✓ Credentials

The skill declares no required environment variables, no primary credential, and no config paths. The rubric and scripts reference handling of credentials and dependency-gating conceptually, but there are no hardcoded secrets visible in the provided files. If you plan to run benchmarks that require external APIs or models, those credentials would be supplied by your agent environment — not by this skill.

✓ Persistence & Privilege

The flags show always:false and allow default autonomous invocation, which is normal. In the visible files, the skill does not request permanent presence or attempt to modify other skills' configs. The benchmark method deliberately relies on agent orchestration (spawning subagents), which widens the operational blast radius if misused; this is expected for a benchmarking/evaluation tool but worth noting.

Version history

v1.0.2

- Added the CLAWHUB_PUBLISH.md file to the project.
- No changes to core functionality or evaluation methods.
- Documentation and usage instructions remain unchanged.

v1.0.1

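- Added Chinese trigger words and Chinese usage scenarios.
- Fixed the same issue in both grade-assertions.py and eval-skill.py.
- Added the missing generate_skill_card.py and generate_leaderboard.py.
- Removed the dangerous-function detection in static-analyze.py, which made the tool contradict itself.
- Marked the benchmark as requiring an AI agent to execute it, and self-evolution as a planned feature.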

v1.0.0

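Integrates skill-evaluator (25-item scoring), skill-assessment (static analysis), and skill-eval (autonomous benchmark evaluation plus a self-evolution engine). Supports three modes (quick scan, full review, and benchmark comparison) and includes self-improvement capabilities.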

Metadata

Slug: multi-skill-eval

Version: 1.0.2

License: MIT-0

Total installs: 0

Current installs: 0

Number of released versions: 3

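FAQ

What is Multi-Skill-Eval | Integrated Skill Evaluation System?

An integrated, multi-method system that combines static analysis (skill-assessment), rubric-based quality scoring (skill-evaluator), and autonomous benchmarking (skill-eval) to evaluate, compare, audit, or improve OpenClaw skills. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 101 times to date.

How do I install Multi-Skill-Eval | Integrated Skill Evaluation System?

Run the command "/install multi-skill-eval" in an OpenClaw or Claude Code chat to install it in one step; no additional configuration is required.

Is Multi-Skill-Eval | Integrated Skill Evaluation System free?

Yes. It is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.

Which platforms does Multi-Skill-Eval | Integrated Skill Evaluation System support?

It is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Multi-Skill-Eval | Integrated Skill Evaluation System?

It is developed and maintained by wangzairong (@wangzairong). The current version is v1.0.2.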
