
ClawBrain Benchmark

Name: ClawBrain Benchmark
Author: michaelfeng

by michaelfeng · GitHub ↗ · v1.0.2 · MIT-0

cross-platform ⚠ suspicious

Downloads: 159

Install in OpenClaw

/install clawbrain-pro-benchmark

Description

Test how your OpenClaw performs across 205 real-world scenarios, and compare the improvement delivered by the ClawBrain v1.0 orchestration engine.

README (SKILL.md)

ClawBrain Benchmark

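Test how your AI really performs in OpenClaw. See whether it handles simple tasks and whether it falls apart on complex ones.

Usage

Just say "跑一下 benchmark" ("run the benchmark") or "测试一下模型效果" ("test the model's performance").

What it tests

10 categories, 205 real-world scenarios (eight of the categories are listed here):

Category	What is tested	Why it matters
File operations	Reading, writing, and editing files	Fundamentals
Search	Looking up information, fetching web pages	Everyday needs
Messaging	Sending messages via WeChat and DingTalk	Communication and collaboration
Terminal	Running commands, managing services	Development and operations
Multi-step tasks	Search → organize → save → notify	The ability to actually get work done
Error recovery	What to do when something goes wrong	Reliability
Vague instructions	"Help me get ready"	Intelligence
Visual understanding	Reading images, recognizing screenshots	Multimodal capability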

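Evaluation results (v1.0)

Model	Overall	File	Search	Terminal	Error recovery	Vague instructions	Multi-step
ClawBrain Auto	90%	100%	100%	100%	100%	100%	80%
ClawBrain Pro	86%	100%	100%	100%	100%	100%	80%
Single model A	83%	95%	100%	90%	80%	65%	73%
Single model B	81%	85%	100%	90%	76%	55%	73%
Single model C	73%	100%	100%	90%	56%	65%	80%

Through its orchestration engine, ClawBrain performs proactive reasoning → multi-model collaboration → output validation → error recovery, and its overall performance surpasses that of any single model.

Full report: https://clawbrain.dev/blog/openclaw-model-comparison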
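
The README does not say how the overall column is derived, and the results table above shows only six of the ten categories. As a minimal sketch that assumes nothing beyond the numbers printed in that table, the following compares each reported overall score with the unweighted mean of the six shown columns. The function names and English row labels are illustrative and are not part of the skill.

```python
# Scores copied from the results table, in percent.
# Each entry: (reported overall, [file, search, terminal,
#              error recovery, vague instructions, multi-step]).
SCORES = {
    "ClawBrain Auto": (90, [100, 100, 100, 100, 100, 80]),
    "ClawBrain Pro": (86, [100, 100, 100, 100, 100, 80]),
    "Single model A": (83, [95, 100, 90, 80, 65, 73]),
    "Single model B": (81, [85, 100, 90, 76, 55, 73]),
    "Single model C": (73, [100, 100, 90, 56, 65, 80]),
}


def unweighted_mean(values):
    """Plain arithmetic mean; an assumption, not the benchmark's own formula."""
    return sum(values) / len(values)


def compare(scores):
    """Map each model to (reported overall, mean of the shown columns)."""
    rows = {}
    for model, (overall, per_category) in scores.items():
        rows[model] = (overall, round(unweighted_mean(per_category), 1))
    return rows


if __name__ == "__main__":
    for model, (overall, mean) in compare(SCORES).items():
        print(f"{model}: reported overall {overall}%, mean of shown columns {mean}%")
```

In this table, ClawBrain Auto and ClawBrain Pro have identical scores in all six shown columns but different overall scores, so the overall figure presumably depends on categories or weights that the table does not show.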

Usage Guidance

This skill is ambiguous about what it will actually run. Before installing or invoking it:
1) Ask the developer for the exact commands/scripts the skill will execute and any network endpoints it will contact.
2) If the skill must run benchmarks that interact with your system (files, shell, messaging), require a safe, sandboxed mode and explicit allowed paths.
3) If you allow it to run, disable autonomous invocation or run it in a restricted/test agent first.
4) Avoid providing credentials or sensitive files until you understand the exact behavior.
5) If you can't get a concrete command list or a vetted install script, treat this as risky and prefer not to install in production.
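
The "explicit allowed paths" recommendation above can be illustrated with a small check. This is a minimal sketch, not an OpenClaw feature: `is_path_allowed` and the example directories are hypothetical, and a real sandbox would also need OS-level isolation rather than a path check alone.

```python
from pathlib import Path


def is_path_allowed(candidate: str, allowed_roots: list[str]) -> bool:
    """Return True if `candidate` resolves to a location inside an allowed root.

    Hypothetical helper: paths are resolved first so that `..` segments
    and symlinks cannot be used to step outside the allowed directories.
    """
    resolved = Path(candidate).resolve()
    for root in allowed_roots:
        root_path = Path(root).resolve()
        # Accept the root itself or anything nested beneath it.
        if resolved == root_path or root_path in resolved.parents:
            return True
    return False


if __name__ == "__main__":
    roots = ["/tmp/bench"]  # hypothetical scratch directory for benchmark runs
    for path in ["/tmp/bench/out.txt", "/tmp/bench/../secret", "/tmp/benchmark/x"]:
        print(path, is_path_allowed(path, roots))
```

Comparing resolved path components, rather than string prefixes, keeps a sibling directory such as `/tmp/benchmark` from being mistaken for a child of `/tmp/bench`.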

Capability Analysis

Type: OpenClaw Skill
Name: clawbrain-pro-benchmark
Version: 1.0.2

The skill bundle is a benchmark tool designed to evaluate AI performance across various scenarios like file operations and terminal usage. While it requests 'exec' and 'curl' permissions, the provided files (SKILL.md and _meta.json) contain only descriptive documentation and metadata without any executable code, malicious instructions, or evidence of data exfiltration. The skill appears to be a legitimate informational or self-testing module for the ClawBrain/OpenClaw ecosystem.

Capability Assessment

⚠ Purpose & Capability

The skill claims to run extensive benchmarks across file, terminal, messaging, and multi-step scenarios. The metadata lists curl and sets command-dispatch to exec, but SKILL.md contains no concrete commands, no scripts, and no explanation of what will actually be executed. Requiring a shell exec capability and curl is disproportionate unless the skill documents what network calls or shell commands it will run.

⚠ Instruction Scope

The SKILL.md is high-level and open-ended: it tells the agent to 'run the benchmark' but provides no step-by-step commands, no allowed file paths, and no constraints. The benchmark categories (file ops, terminal commands, messaging) imply actions that could read/write files, run arbitrary shell commands, or send messages — yet the skill does not limit or document those actions. That vagueness grants broad discretion to any agent invocation.
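
One way to narrow the discretion described above is to check every proposed shell command against an explicit allowlist before it runs. The following is a minimal sketch under stated assumptions: `ALLOWED_COMMANDS` and `is_command_allowed` are hypothetical names, the allowlist contents are placeholders, and nothing here reflects how OpenClaw dispatches commands.

```python
import shlex

# Hypothetical allowlist; a real deployment would choose its own entries.
ALLOWED_COMMANDS = frozenset({"ls", "cat", "echo"})

# Characters that let a shell chain, redirect, or substitute commands.
SHELL_METACHARACTERS = ";|&><`$"


def is_command_allowed(command_line: str, allowed=ALLOWED_COMMANDS) -> bool:
    """Return True only for a single, plainly written allowlisted command."""
    # Deliberately conservative: metacharacters are rejected even inside quotes.
    if any(ch in command_line for ch in SHELL_METACHARACTERS):
        return False
    try:
        tokens = shlex.split(command_line)
    except ValueError:
        # Raised by shlex on malformed input such as an unterminated quote.
        return False
    if not tokens:
        return False
    return tokens[0] in allowed


if __name__ == "__main__":
    for cmd in ["ls -la", "rm -rf /", "ls; rm -rf /", "echo 'unterminated"]:
        print(repr(cmd), is_command_allowed(cmd))
```

The check rejects anything it cannot parse or does not recognize, which matches the assessment's concern: an undocumented skill should not be able to widen what gets executed.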

✓ Install Mechanism

No install spec and no code files—this is instruction-only, so nothing is written to disk by an installer. That keeps install risk low.

✓ Credentials

No environment variables, credentials, or config paths are requested. The only declared dependency is curl, which could be reasonable for fetching reports — but because commands are unspecified, it's unclear why curl is required.

ℹ Persistence & Privilege

always is false and there are no special persistence requests. However, the skill is configured for exec-style command dispatch, and model invocation is allowed (the platform default). Combined with the skill's vagueness, autonomous or poorly constrained invocations could perform wide-ranging actions if the agent decides to run shell commands.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install clawbrain-pro-benchmark
After installation, invoke the skill by name or use /clawbrain-pro-benchmark
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.2

Fix display name

v1.0.1

v1.0.1: Updated benchmark data (Auto 90% / Pro 86%); removed backend model names

v0.9.4

Unified the brand name

v0.9.3

Updated the summary description

v0.9.2

v0.9.2: Updated benchmark data

v1.0.0

- Initial release of clawbrain-pro-benchmark.
- Benchmark your OpenClaw performance across 205 real-world scenarios.
- Directly compare results with ClawBrain Pro orchestration engine.
- Simple commands to start: "跑一下 benchmark" or "测试一下模型效果".
- Detailed test across 10 categories, including file operations, search, messaging, terminal usage, multi-step tasks, error recovery, and handling of vague instructions.
- View comprehensive evaluation and model comparison in the results.

Metadata

Slug clawbrain-pro-benchmark

Version 1.0.2

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 6

Frequently Asked Questions

What is ClawBrain Benchmark?

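ClawBrain Benchmark tests how your OpenClaw performs across 205 real-world scenarios and compares the improvement delivered by the ClawBrain v1.0 orchestration engine. It is an AI Agent Skill for Claude Code / OpenClaw, with 159 downloads so far.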

How do I install ClawBrain Benchmark?

Run "/install clawbrain-pro-benchmark" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is ClawBrain Benchmark free?

Yes, ClawBrain Benchmark is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does ClawBrain Benchmark support?

ClawBrain Benchmark is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created ClawBrain Benchmark?

It is built and maintained by michaelfeng (@michaelfeng); the current version is v1.0.2.
