
Multi-Skill-Eval | Integrated Skill Evaluation System

Author: wangzairong · GitHub · v1.0.2 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 101
Favorites: 1
Current installs: 0
Versions: 3
Install in OpenClaw
/install multi-skill-eval
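Description
An integrated, multi-method skill evaluation system. It combines static analysis (skill-assessment), rubric-based quality scoring (skill-evaluator), and autonomous benchmarking (skill-eval). Use it to evaluate, compare, audit, or improve OpenClaw skills. Coverage includes documentation completeness, code quality, a 25-item rubric score, and multi-model benchmarking. Trigger words (Chinese): 评估技...
Safe usage recommendations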
This package appears coherent for its stated purpose: it runs local static checks, rubric grading, and an agent-driven benchmark. Before installing or running it: (1) review the scripts (especially the truncated ones) for any network calls, subprocess calls, or file-write operations you don't expect; (2) run it in an isolated environment (container or VM) if you plan to evaluate untrusted skills; (3) when using the benchmark mode, be aware that it causes the agent to read the full target skill directory and spawn subagents, so do not point it at directories containing credentials or sensitive secrets. If you want a higher-confidence assessment, provide the remaining script contents (static-analyze.py and the truncated parts) so I can check for external network calls, subprocess execution, shell injection, or hardcoded secrets.
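Recommendation (1) can be partly automated before reading the scripts by hand. The following is a minimal sketch, not part of multi-skill-eval: the function name find_risky_nodes and both watch lists are hypothetical, and the heuristic only surfaces lines worth a manual look (for example, it flags every open() call regardless of mode).

```python
import ast

# Hypothetical watch lists (assumptions, not taken from the package).
RISKY_MODULES = {"socket", "urllib", "http", "requests", "subprocess", "ftplib"}
RISKY_CALLS = {"eval", "exec", "system", "popen", "Popen", "run", "open"}


def find_risky_nodes(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, description) pairs for imports and calls to review."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # Compare only the top-level package, e.g. "urllib" in "urllib.request".
                if alias.name.split(".")[0] in RISKY_MODULES:
                    findings.append((node.lineno, f"import {alias.name}"))
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in RISKY_MODULES:
                findings.append((node.lineno, f"from {node.module} import ..."))
        elif isinstance(node, ast.Call):
            func = node.func
            # Plain names (eval) and attribute calls (subprocess.run) are both covered.
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if name in RISKY_CALLS:
                findings.append((node.lineno, f"call {name}()"))
    return sorted(findings)
```

Running it over each script's text gives a short list of line numbers to inspect; an empty list does not prove a script is safe, since dynamic imports and obfuscated calls are not detected.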
Functional analysis
Type: OpenClaw Skill
Name: multi-skill-eval
Version: 1.0.2
The bundle is a comprehensive toolkit designed for auditing, benchmarking, and improving OpenClaw skills. It contains several Python scripts (e.g., static-analyze.py, eval-skill.py, grade-assertions.py) that perform legitimate static analysis, such as checking for hardcoded secrets, verifying Python syntax, and ensuring documentation completeness. The logic is transparent, uses standard libraries, and aligns with the stated purpose of a skill evaluation system. While the documentation mentions a 'Self-Evolution' feature for rewriting skill instructions, it is explicitly marked as 'not yet implemented,' and the existing code contains no evidence of data exfiltration, unauthorized remote execution, or malicious intent.
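To make two of the static checks named above concrete, here is a rough sketch of a syntax check and a hardcoded-secret scan using only the standard library. It is an illustration under stated assumptions and not the code in static-analyze.py: the helper names and the regular expression are hypothetical, and the pattern only catches the simplest case of a secret-like name assigned a quoted literal.

```python
import ast
import re

# Illustrative pattern (an assumption): a secret-like name followed by
# ':' or '=' and a quoted literal of at least 8 characters.
SECRET_PATTERN = re.compile(
    r"""\b(api[_-]?key|secret|token|password)\b\s*[:=]\s*["'][^"']{8,}["']""",
    re.IGNORECASE,
)


def has_valid_syntax(source: str) -> bool:
    """Return True if the text parses as Python source."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def find_secret_lines(source: str) -> list[int]:
    """Return 1-based line numbers that look like hardcoded secrets."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if SECRET_PATTERN.search(line)
    ]
```

Values read from the environment, such as os.environ["PASSWORD"], are not reported because no quoted literal follows the assignment directly. A real scanner would need broader patterns and entropy checks; this sketch trades recall for readability.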
Capability tags
requires-sensitive-credentials
Capability assessment
Purpose & Capability
Name/description (skill evaluation, static analysis, rubric scoring, benchmark) match the included files and CLI instructions: scripts perform static analysis, generate cards/leaderboards, and run benchmarks. There are no unrelated required env vars or declared binaries that would be incoherent with the stated purpose.
Instruction Scope
SKILL.md instructs running the local Python CLI scripts against target skill directories and — for the benchmark method — to have the AI agent read SKILL.md, source files, and spawn subagents to execute tests. This is coherent for an evaluator but means the skill (and you, when running it) will read the full contents of whatever skill path you point it at; review inputs before evaluating sensitive code. I saw no instructions that attempt to exfiltrate data to unknown endpoints, but parts of the scripts were truncated so full review would be prudent.
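Because the benchmark method reads everything under the target path, a quick pre-flight listing of credential-like files can help with reviewing inputs. This is a sketch only: find_sensitive_files and the filename patterns are hypothetical and should be adapted to whatever counts as sensitive in your environment.

```python
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical filename patterns (assumptions); extend for your own setup.
SENSITIVE_PATTERNS = (
    ".env", ".env.*", "*.pem", "*.key", "*.p12", "id_rsa*", "credentials*",
)


def find_sensitive_files(skill_dir: str) -> list[str]:
    """Return sorted relative paths under skill_dir whose names look sensitive."""
    root = Path(skill_dir)
    hits = []
    # rglob("*") walks the whole tree, including dotfiles such as .env.
    for path in root.rglob("*"):
        if path.is_file() and any(fnmatch(path.name, p) for p in SENSITIVE_PATTERNS):
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)
```

If the returned list is non-empty, move or exclude those files before pointing the evaluator at the directory. The check matches on filenames only, so secrets embedded inside ordinary source files still need a content scan.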
Install Mechanism
No install spec (instruction-only with accompanying scripts) — lowest-risk delivery model. The package contains Python scripts; there is no evidence in the provided fragments of downloads from remote URLs or unusual install behavior. You should ensure you run scripts with a controlled Python environment and review any third-party requirements not listed here.
Credentials
The skill declares no required environment variables, no primary credential, and no config paths. The rubric and scripts reference handling of credentials and dependency-gating conceptually, but there are no hardcoded secrets visible in the provided files. If you plan to run benchmarks that require external APIs or models, those credentials would be supplied by your agent environment — not by this skill.
Persistence & Privilege
Flags show always:false and default autonomous invocation allowed (normal). The skill does not request permanent presence or attempt to modify other skills' configs in the visible files. The benchmark method deliberately relies on agent orchestration (spawning subagents), which increases the operational blast radius if misused — this is expected for a benchmarking/evaluation tool but worth noting.
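How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install multi-skill-eval
  3. Once installation finishes, invoke the Skill by name or trigger it with /multi-skill-eval
  4. Provide the inputs described in the Skill's parameter documentation to get structured output.
Version history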
v1.0.2
- Added CLAWHUB_PUBLISH.md file to the project.
- No changes to core functionality or evaluation methods.
- Documentation and usage instructions remain unchanged.
v1.0.1
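- Added Chinese trigger words and Chinese usage scenarios.
- Fixed the issue in grade-assertions.py that was the same as the one in eval-skill.py.
- Added the missing generate_skill_card.py and generate_leaderboard.py.
- Removed the dangerous-function detection from static-analyze.py, which made the tool contradict itself.
- Marked benchmark as requiring an AI agent to run, and self-evolution as a planned feature.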
v1.0.0
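Integrates skill-evaluator (25-item scoring), skill-assessment (static analysis), and skill-eval (autonomous benchmark evaluation plus a self-evolution engine). Supports three modes: quick scan, full review, and benchmark comparison. Includes a self-improvement capability.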
Metadata
Slug: multi-skill-eval
Version: 1.0.2
License: MIT-0
Total installs: 0
Current installs: 0
Versions released: 3
FAQ

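What is Multi-Skill-Eval | Integrated Skill Evaluation System?

An integrated, multi-method skill evaluation system. It combines static analysis (skill-assessment), rubric-based quality scoring (skill-evaluator), and autonomous benchmarking (skill-eval). Use it to evaluate, compare, audit, or improve OpenClaw skills. Coverage includes documentation completeness, code quality, a 25-item rubric score, and multi-model benchmarking. Trigger words (Chinese): 评估技... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 101 total downloads to date.

How do I install Multi-Skill-Eval | Integrated Skill Evaluation System?

Run the command "/install multi-skill-eval" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Multi-Skill-Eval | Integrated Skill Evaluation System free?

Yes. Multi-Skill-Eval | Integrated Skill Evaluation System is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does Multi-Skill-Eval | Integrated Skill Evaluation System support?

Multi-Skill-Eval | Integrated Skill Evaluation System is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Multi-Skill-Eval | Integrated Skill Evaluation System?

It is developed and maintained by wangzairong (@wangzairong); the current version is v1.0.2.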
