ClawBrain Benchmark

Author: michaelfeng

By michaelfeng · v1.0.2 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 159

Install in OpenClaw

/install clawbrain-pro-benchmark

Description

Test how your OpenClaw setup performs across 205 real-world scenarios, and see how much the ClawBrain v1.0 orchestration engine improves on it.

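Instructions (SKILL.md)

ClawBrain Benchmark

Test how your AI actually performs in OpenClaw: whether it handles simple tasks reliably, and whether it falls apart on complex ones.

Usage

Just say "跑一下 benchmark" ("run the benchmark") or "测试一下模型效果" ("test how well the model performs").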

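What it tests

10 categories, 205 real-world scenarios:

Category	What it tests	Why it matters
File operations	Reading, writing, and editing files	The fundamentals
Search	Looking up information, fetching web pages	Everyday needs
Messaging	Sending messages via WeChat and DingTalk	Communication and collaboration
Terminal	Running commands, managing services	Development and operations
Multi-step tasks	Search → organize → save → notify	The ability to get real work done
Error recovery	What to do when something goes wrong	Reliability
Vague instructions	"Help me get things ready"	Intelligence
Visual understanding	Reading images, recognizing screenshots	Multimodal capability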

Benchmark results (v1.0)

Model	Overall	Files	Search	Terminal	Error recovery	Vague instructions	Multi-step
ClawBrain Auto	90%	100%	100%	100%	100%	100%	80%
ClawBrain Pro	86%	100%	100%	100%	100%	100%	80%
Single model A	83%	95%	100%	90%	80%	65%	73%
Single model B	81%	85%	100%	90%	76%	55%	73%
Single model C	73%	100%	100%	90%	56%	65%	80%

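Through its orchestration engine, ClawBrain follows a pipeline of proactive thinking → multi-model collaboration → output verification → error recovery. In the results above, both ClawBrain configurations score higher overall (90% and 86%) than any single model (at most 83%).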
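To see where the reported gap comes from, the per-category figures can be compared directly. The following is a minimal sketch that uses only the numbers from the v1.0 results table; the model names are English renderings of the table's row labels, and the function name and data layout are illustrative assumptions, not part of the skill.

```python
# Scores (percent) copied from the v1.0 results table; layout is illustrative.
CATEGORIES = ["files", "search", "terminal", "error_recovery",
              "vague_instructions", "multi_step"]

RESULTS = {
    "ClawBrain Auto": {"overall": 90, "scores": [100, 100, 100, 100, 100, 80]},
    "ClawBrain Pro": {"overall": 86, "scores": [100, 100, 100, 100, 100, 80]},
    "Single model A": {"overall": 83, "scores": [95, 100, 90, 80, 65, 73]},
    "Single model B": {"overall": 81, "scores": [85, 100, 90, 76, 55, 73]},
    "Single model C": {"overall": 73, "scores": [100, 100, 90, 56, 65, 80]},
}


def margin_over_best_single(orchestrated: str) -> dict:
    """Per-category lead, in percentage points, over the best single model."""
    singles = [row["scores"] for name, row in RESULTS.items()
               if name.startswith("Single model")]
    # Best single-model score in each category (column-wise maximum).
    best = [max(column) for column in zip(*singles)]
    mine = RESULTS[orchestrated]["scores"]
    return {cat: m - b for cat, m, b in zip(CATEGORIES, mine, best)}


margins = margin_over_best_single("ClawBrain Auto")
```

On these figures the lead is concentrated in terminal, error recovery, and vague instructions; files, search, and multi-step are matched by at least one single model. The table shows only six category columns, so the overall score cannot be recomputed from it.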

Full report: https://clawbrain.dev/blog/openclaw-model-comparison

Security recommendations

This skill is ambiguous about what it will actually run. Before installing or invoking it:
1) Ask the developer for the exact commands/scripts the skill will execute and any network endpoints it will contact.
2) If the skill must run benchmarks that interact with your system (files, shell, messaging), require a safe, sandboxed mode and explicit allowed paths.
3) If you allow it to run, disable autonomous invocation or run it in a restricted/test agent first.
4) Avoid providing credentials or sensitive files until you understand the exact behavior.
5) If you can't get a concrete command list or a vetted install script, treat this as risky and prefer not to install in production.
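As an illustration of what "explicit allowed paths" could mean in practice, here is a minimal sketch of a path allowlist check. It is not something the skill or OpenClaw provides; the function name `is_path_allowed` and the example directory `/tmp/bench` are hypothetical, and a real sandbox would need more than this (process isolation, network restrictions).

```python
from pathlib import Path


def is_path_allowed(candidate: str, allowed_roots: list[str]) -> bool:
    """Return True only if `candidate` resolves inside one of `allowed_roots`.

    Resolving first collapses `..` segments and symlinks, so a path cannot
    escape an allowed directory through traversal. Requires Python 3.9+.
    """
    resolved = Path(candidate).resolve()
    return any(
        resolved.is_relative_to(Path(root).resolve()) for root in allowed_roots
    )


# Hypothetical usage: only files under /tmp/bench may be touched.
inside = is_path_allowed("/tmp/bench/out.txt", ["/tmp/bench"])
escaped = is_path_allowed("/tmp/bench/../../etc/passwd", ["/tmp/bench"])
```

The comparison is component-wise, so a sibling directory such as `/tmp/benchmark-evil` is not treated as being inside `/tmp/bench`.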

Functional analysis

Type: OpenClaw Skill
Name: clawbrain-pro-benchmark
Version: 1.0.2

The skill bundle is a benchmark tool designed to evaluate AI performance across various scenarios like file operations and terminal usage. While it requests 'exec' and 'curl' permissions, the provided files (SKILL.md and _meta.json) contain only descriptive documentation and metadata without any executable code, malicious instructions, or evidence of data exfiltration. The skill appears to be a legitimate informational or self-testing module for the ClawBrain/OpenClaw ecosystem.

Capability assessment

⚠ Purpose & Capability

The skill claims to run extensive benchmarks across file, terminal, messaging, and multi-step scenarios. The metadata lists curl and sets command-dispatch to exec, but SKILL.md contains no concrete commands, no scripts, and no explanation of what will actually be executed. Requiring a shell exec capability and curl is disproportionate unless the skill documents what network calls or shell commands it will run.

⚠ Instruction Scope

The SKILL.md is high-level and open-ended: it tells the agent to 'run the benchmark' but provides no step-by-step commands, no allowed file paths, and no constraints. The benchmark categories (file ops, terminal commands, messaging) imply actions that could read/write files, run arbitrary shell commands, or send messages — yet the skill does not limit or document those actions. That vagueness grants broad discretion to any agent invocation.

✓ Install Mechanism

No install spec and no code files—this is instruction-only, so nothing is written to disk by an installer. That keeps install risk low.

✓ Credentials

No environment variables, credentials, or config paths are requested. The only declared dependency is curl, which could be reasonable for fetching reports — but because commands are unspecified, it's unclear why curl is required.

ℹ Persistence & Privilege

always is false and there are no special persistence requests. However, the skill is configured for exec-style command dispatch, and model-invocation is allowed (platform default). Combined with the skill's vagueness, autonomous or poorly constrained invocations could perform wide-ranging actions if the agent decides to run shell commands.
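The checks described in this assessment can be expressed as a small pre-install review function. This is a sketch only: the actual schema of _meta.json is not shown in this listing, so the keys used here (`command-dispatch`, `requires`, `always`, `model-invocation`) and their value shapes are assumptions based on the wording above, and `review_flags` is a hypothetical helper.

```python
def review_flags(meta: dict, documents_commands: bool) -> list[str]:
    """Return human-readable warnings for a skill's metadata.

    `meta` uses ASSUMED key names; adapt them to the real metadata schema.
    `documents_commands` says whether the skill's docs list concrete commands.
    """
    flags = []
    wants_shell = meta.get("command-dispatch") == "exec"
    needs_curl = "curl" in meta.get("requires", [])

    # Shell or network capability without documented commands is disproportionate.
    if (wants_shell or needs_curl) and not documents_commands:
        flags.append("exec/curl requested but no concrete commands documented")

    # Exec dispatch plus autonomous invocation widens what an agent may do.
    if wants_shell and meta.get("model-invocation", True):
        flags.append("exec-dispatch skill can be invoked autonomously")

    # Always-on skills persist across sessions.
    if meta.get("always", False):
        flags.append("skill requests always-on persistence")
    return flags


# Hypothetical metadata mirroring the findings described above.
example = {"command-dispatch": "exec", "requires": ["curl"], "always": False}
warnings = review_flags(example, documents_commands=False)
```

With the example metadata, the first two warnings fire and the persistence warning does not, which matches the ⚠/ℹ findings listed in this assessment.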

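How to use

1) Make sure OpenClaw is installed (locally or via Docker).
2) Enter the install command in the chat box: /install clawbrain-pro-benchmark
3) Once installed, trigger the Skill by calling its name directly or with /clawbrain-pro-benchmark.
4) Provide the inputs described in the Skill's parameter documentation to get structured output.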

Version history

v1.0.2

Fix display name

v1.0.1

Updated benchmark results (Auto 90% / Pro 86%); removed backend model names

v0.9.4

Unified the brand name

v0.9.3

Updated the summary description

v0.9.2

Updated benchmark results

v1.0.0

- Initial release of clawbrain-pro-benchmark.
- Benchmark your OpenClaw performance across 205 real-world scenarios.
- Directly compare results with ClawBrain Pro orchestration engine.
- Simple commands to start: "跑一下 benchmark" or "测试一下模型效果".
- Detailed test across 10 categories, including file operations, search, messaging, terminal usage, multi-step tasks, error recovery, and handling of vague instructions.
- View comprehensive evaluation and model comparison in the results.

Metadata

Slug clawbrain-pro-benchmark

Version 1.0.2

License MIT-0

Total installs 0

Current installs 0

Versions 6

FAQ