Multi-Skill-Eval | 集成化技能评估系统 — AI Agent Skill | 101 Downloads

Name: Multi-Skill-Eval | 集成化技能评估系统
Author: wangzairong

Description

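An integrated, multi-method skill evaluation system. It combines static analysis (skill-assessment), rubric-based quality scoring (skill-evaluator), and autonomous benchmarking (skill-eval). Use it to comprehensively evaluate, compare, audit, or improve OpenClaw skills. It covers documentation completeness, code quality, 25-item rubric scoring, and multi-model benchmarking. Trigger words (Chinese): 评估技...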

Usage Guidance

This package appears coherent for its stated purpose: it runs local static checks, rubric grading, and an agent-driven benchmark. Before installing or running it: (1) review the scripts (especially the truncated ones) for any network calls, subprocess calls, or file-write operations you don't expect; (2) run it in an isolated environment (container or VM) if you plan to evaluate untrusted skills; (3) when using the benchmark mode, be aware that it causes the agent to read the full target skill directory and spawn subagents, so do not point it at directories containing credentials or other secrets. If you want a higher-confidence assessment, provide the remaining script contents (static-analyze.py and the truncated parts) so I can check for external network calls, subprocess execution or shell injection, and hardcoded secrets.
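Step (1) can be partly automated. The following is a minimal sketch, not part of this package: it parses a script with Python's standard `ast` module and reports imports of modules that usually indicate network or subprocess activity. The `WATCHED_MODULES` list and the function name `find_watched_imports` are assumptions chosen for illustration, and a clean result does not replace reading the code.

```python
import ast

# Hypothetical watch list: modules whose use in a skill script merits a manual look.
WATCHED_MODULES = {"socket", "urllib", "http", "requests", "subprocess", "ftplib"}


def find_watched_imports(source: str) -> list[tuple[int, str]]:
    """Return (line number, module name) for each import of a watched module."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            # Compare only the top-level package, so "urllib.request" counts as "urllib".
            if name.split(".")[0] in WATCHED_MODULES:
                hits.append((node.lineno, name))
    return sorted(hits)


sample = "import os\nimport subprocess\nfrom urllib.request import urlopen\n"
print(find_watched_imports(sample))
```

Because the scan works on the syntax tree, it only catches static imports; dynamic imports such as `__import__("socket")` would need a separate check.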

Capability Analysis

Type: OpenClaw Skill
Name: multi-skill-eval
Version: 1.0.2

The bundle is a comprehensive toolkit designed for auditing, benchmarking, and improving OpenClaw skills. It contains several Python scripts (e.g., static-analyze.py, eval-skill.py, grade-assertions.py) that perform legitimate static analysis, such as checking for hardcoded secrets, verifying Python syntax, and ensuring documentation completeness. The logic is transparent, uses standard libraries, and aligns with the stated purpose of a skill evaluation system. While the documentation mentions a 'Self-Evolution' feature for rewriting skill instructions, it is explicitly marked as 'not yet implemented,' and the existing code contains no evidence of data exfiltration, unauthorized remote execution, or malicious intent.
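To make the two static checks named above concrete, here is a rough sketch of a syntax check and a hardcoded-secret check using only the standard library. It is not the logic of static-analyze.py, whose contents are not shown here; the `SECRET_PATTERN` regex and the `check_source` function are hypothetical and deliberately simple.

```python
import ast
import re

# Hypothetical pattern: a quoted literal of 8+ characters assigned to a secret-like name.
SECRET_PATTERN = re.compile(
    r"""(?i)\b(api_key|secret|token|password)\b\s*=\s*["'][^"']{8,}["']"""
)


def check_source(source: str) -> dict:
    """Report whether the source parses and which lines look like hardcoded secrets."""
    try:
        ast.parse(source)
        syntax_ok = True
    except SyntaxError:
        syntax_ok = False
    secret_lines = [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if SECRET_PATTERN.search(line)
    ]
    return {"syntax_ok": syntax_ok, "secret_lines": secret_lines}


print(check_source('API_KEY = "abcd1234efgh"\nx = 1\n'))
```

A regex of this kind will miss secrets stored under other names and will flag placeholder values, so its findings are prompts for manual review rather than verdicts.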

Capability Tags

requires-sensitive-credentials

Capability Assessment

✓ Purpose & Capability

Name/description (skill evaluation, static analysis, rubric scoring, benchmark) match the included files and CLI instructions: scripts perform static analysis, generate cards/leaderboards, and run benchmarks. There are no unrelated required env vars or declared binaries that would be incoherent with the stated purpose.

ℹ Instruction Scope

SKILL.md instructs running the local Python CLI scripts against target skill directories and — for the benchmark method — to have the AI agent read SKILL.md, source files, and spawn subagents to execute tests. This is coherent for an evaluator but means the skill (and you, when running it) will read the full contents of whatever skill path you point it at; review inputs before evaluating sensitive code. I saw no instructions that attempt to exfiltrate data to unknown endpoints, but parts of the scripts were truncated so full review would be prudent.
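Since the benchmark method reads everything under the target path, it can help to list files that look sensitive before running it. The sketch below is an illustration under stated assumptions, not something the skill provides: the `SENSITIVE_PATTERNS` list and the `find_sensitive_files` function are hypothetical, and the match is on file names only, not contents.

```python
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical name patterns for files that often hold credentials.
SENSITIVE_PATTERNS = [".env", "*.pem", "*.key", "id_rsa*", "credentials*"]


def find_sensitive_files(skill_dir: str) -> list[str]:
    """Return relative paths under skill_dir whose file names match a sensitive pattern."""
    root = Path(skill_dir)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
        and any(fnmatch(path.name, pattern) for pattern in SENSITIVE_PATTERNS)
    )
```

Calling `find_sensitive_files("path/to/target-skill")` (a placeholder path) returns an empty list when nothing matches; any hits are worth moving out of the directory before the agent reads it.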

✓ Install Mechanism

No install spec (instruction-only with accompanying scripts) — lowest-risk delivery model. The package contains Python scripts; there is no evidence in the provided fragments of downloads from remote URLs or unusual install behavior. You should ensure you run scripts with a controlled Python environment and review any third-party requirements not listed here.

✓ Credentials

The skill declares no required environment variables, no primary credential, and no config paths. The rubric and scripts reference handling of credentials and dependency-gating conceptually, but there are no hardcoded secrets visible in the provided files. If you plan to run benchmarks that require external APIs or models, those credentials would be supplied by your agent environment — not by this skill.

✓ Persistence & Privilege

Flags show always:false and default autonomous invocation allowed (normal). The skill does not request permanent presence or attempt to modify other skills' configs in the visible files. The benchmark method deliberately relies on agent orchestration (spawning subagents), which increases the operational blast radius if misused — this is expected for a benchmarking/evaluation tool but worth noting.

Version History

v1.0.2

- Added CLAWHUB_PUBLISH.md file to the project.
- No changes to core functionality or evaluation methods.
- Documentation and usage instructions remain unchanged.

v1.0.1

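- Added Chinese trigger words and Chinese usage scenarios.
- Fixed the issue in grade-assertions.py that also affected eval-skill.py.
- Added the missing generate_skill_card.py and generate_leaderboard.py.
- Removed the dangerous-function detection in static-analyze.py, which caused the tool to contradict itself.
- Noted that the benchmark must be run by an AI agent and that self-evolution is a planned feature.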

v1.0.0

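Integrates skill-evaluator (25-item scoring), skill-assessment (static analysis), and skill-eval (autonomous benchmark evaluation plus a self-evolution engine). Supports three modes: quick scan, full review, and benchmark comparison. Includes a self-improvement capability.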

Metadata

Slug multi-skill-eval

Version 1.0.2

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 3

Frequently Asked Questions

What is Multi-Skill-Eval | 集成化技能评估系统?

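An integrated, multi-method skill evaluation system. It combines static analysis (skill-assessment), rubric-based quality scoring (skill-evaluator), and autonomous benchmarking (skill-eval). Use it to comprehensively evaluate, compare, audit, or improve OpenClaw skills. It covers documentation completeness, code quality, 25-item rubric scoring, and multi-model benchmarking. Trigger words (Chinese): 评估技... It is an AI Agent Skill for Claude Code / OpenClaw, with 101 downloads so far.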

How do I install Multi-Skill-Eval | 集成化技能评估系统?

Run "/install multi-skill-eval" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Multi-Skill-Eval | 集成化技能评估系统 free?

Yes, Multi-Skill-Eval | 集成化技能评估系统 is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Multi-Skill-Eval | 集成化技能评估系统 support?

Multi-Skill-Eval | 集成化技能评估系统 is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Multi-Skill-Eval | 集成化技能评估系统?

It is built and maintained by wangzairong (@wangzairong); the current version is v1.0.2.
