Model Benchmark

Name: Model Benchmark
Author: leosheep821-debug

Author: leosheep821-debug · GitHub · v0.1.0 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 131 · Current installs: 0 · Versions: 1

Install in OpenClaw

/install model-benchmark

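Description

An in-depth evaluation of how various models actually perform on OpenClaw, with multi-dimensional assessment of Chinese comprehension, coding, reasoning, and tool use.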

Security Recommendations

This skill appears to be a legitimate benchmarking instruction set, but it refers to obtaining and using API keys for multiple external providers without declaring them in its metadata or describing how to supply or store them. Before installing or using it: (1) Decide how you will supply provider keys; prefer short-lived or least-privilege keys and avoid pasting long-lived secrets into third-party UIs. (2) Understand where the keys will be stored (models.json), check that file's permissions, and back up the original config. (3) Verify that the local proxy address (127.0.0.1:8766) is expected in your environment and does not redirect traffic to an unfamiliar service. (4) If the agent will be given keys, make sure its prompt and storage behavior is acceptable, i.e. that it will not exfiltrate them. For complete assurance, ask the skill author to declare the required environment variables and document exactly how credentials are used and persisted.
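As a rough illustration of point (2), the following Python sketch checks whether a provider config file such as models.json is closed to group and other users, and writes a backup copy before you edit it. This is a sketch under stated assumptions: the listing does not say where OpenClaw keeps models.json, so the path is yours to supply, and the function name `check_and_backup` and the `.bak` suffix are hypothetical. It inspects POSIX permission bits only and does not evaluate Windows ACLs.

```python
import shutil
import stat
from pathlib import Path


def check_and_backup(config_path: str) -> tuple[bool, Path]:
    """Hypothetical helper: report whether a config file is private, then back it up.

    Returns (is_private, backup_path). is_private is True when neither the
    group nor "others" hold any read/write/execute bit on the file.
    """
    path = Path(config_path)
    mode = path.stat().st_mode
    # Any group or other permission bit means the keys inside may be readable by other users.
    is_private = (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
    # Assumed naming scheme: models.json -> models.json.bak in the same directory.
    backup = path.with_name(path.name + ".bak")
    # copy2 copies contents plus metadata, including permission bits,
    # so a private original yields an equally private backup.
    shutil.copy2(path, backup)
    return is_private, backup
```

Typical use: call it with the real path to models.json before adding a provider; if `is_private` is False, tighten the file's mode (for example to owner-only read/write) before storing keys in it.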

Functional Analysis

Type: OpenClaw Skill · Name: model-benchmark · Version: 0.1.0

The skill bundle is a benchmarking framework designed to evaluate AI models on the OpenClaw platform across dimensions such as code generation, reasoning, and tool use. Its files (SKILL.md and skill.md) contain only descriptive instructions, test cases, and reporting templates, with no executable code, data-exfiltration logic, or malicious prompt injections. It correctly directs the user to configure API keys through the platform's standard models.json configuration.

Capability Assessment

ℹ Purpose & Capability

The name, description, and SKILL.md consistently describe a model benchmarking framework and include sensible test cases and report format. The SKILL.md also legitimately references adding providers to OpenClaw's models.json and using provider API keys for GLM-5, Qwen, etc., which is expected for a benchmarking skill that talks to external models.

ℹ Instruction Scope

The instructions stay within benchmarking scope (test items, scoring, report format). They reference specific operational items: editing OpenClaw models.json to add providers, using a local proxy at 127.0.0.1:8766, and acquiring provider API keys. They do not instruct the agent to read unrelated system files or exfiltrate data, but they do not specify safe handling or storage of credentials.
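The local proxy at 127.0.0.1:8766 mentioned above can be sanity-checked before any keys are sent through it. The Python sketch below confirms that a configured base URL points at a loopback address on the expected port. It rests on assumptions: the `http://127.0.0.1:8766` URL form and the helper name `is_expected_local_proxy` are hypothetical, since the skill does not document how the proxy URL appears in config. It only shows that the address is local; it cannot tell you whether the process listening there is trustworthy.

```python
import ipaddress
from urllib.parse import urlsplit

EXPECTED_PORT = 8766  # port quoted in the listing; change it if your proxy differs


def is_expected_local_proxy(base_url: str, expected_port: int = EXPECTED_PORT) -> bool:
    """Hypothetical check: True only for a literal loopback IP on the expected port."""
    parts = urlsplit(base_url)
    host = parts.hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Hostnames such as "localhost" are rejected; only literal IPs are accepted,
        # so a tampered hosts file cannot silently redirect the check.
        return False
    # Covers 127.0.0.0/8 and ::1; a missing port (None) never matches.
    return addr.is_loopback and parts.port == expected_port
```

Requiring a literal IP is a deliberately strict design choice; relax it only if your setup intentionally uses a hostname.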

✓ Install Mechanism

No install spec and no code files are provided (instruction-only), so nothing will be written to disk or installed by the skill itself. This is the lowest-risk install model.

⚠ Credentials

The SKILL.md explicitly lists provider API key requirements (GLM-5, Qwen, etc.), but the skill metadata declares no required environment variables or primary credential. This mismatch means the skill may expect the user or agent to supply secrets at runtime via models.json or prompts, yet it gives no guidance on where keys are stored, what permissions they need, or whether they will be transmitted to other endpoints. Requiring multiple external API keys is proportionate for benchmarking, but the missing credential declarations and storage guidance are a privacy and operational concern.

✓ Persistence & Privilege

The skill is not always-included and does not request system-level persistence. It does mention editing OpenClaw configuration (models.json) which is a normal and limited config change for integrating providers; there is no indication it modifies other skills or system-wide settings beyond provider config advice.

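How to Use

Make sure OpenClaw is installed (local or Docker deployment).
In the chat box, enter the install command: /install model-benchmark
Once installed, invoke the skill by name or trigger it with /model-benchmark.
Provide the required inputs described in the skill's parameter documentation to receive structured output.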

Version History

v0.1.0

- Initial release of the model-benchmark skill for in-depth evaluation of models on OpenClaw.
- Supports multi-dimensional assessment: Chinese comprehension, coding, reasoning, and tool use.
- Includes a standardized test set and scoring rubrics for consistent benchmarking.
- Documents the required APIs and how to configure new model providers.
- Provides a detailed report template for presenting evaluation results.

Metadata

Slug: model-benchmark

Version: 0.1.0

License: MIT-0

Cumulative installs: 0

Current installs: 0

Version count: 1

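FAQ

What is Model Benchmark?

It evaluates in depth how various models actually perform on OpenClaw, with multi-dimensional assessment of Chinese comprehension, coding, reasoning, and tool use. It is an AI agent skill plugin for Claude Code / OpenClaw and has been downloaded 131 times to date.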

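How do I install Model Benchmark?

Run the command "/install model-benchmark" in the OpenClaw or Claude Code chat box to install it in one step; the installation itself requires no additional configuration.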

Is Model Benchmark free?

Yes. Model Benchmark is completely free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does Model Benchmark support?

Model Benchmark is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Model Benchmark?

It is developed and maintained by leosheep821-debug (@leosheep821-debug); the current version is v0.1.0.