
ClawBrain Benchmark

by michaelfeng · GitHub ↗ · v1.0.2 · MIT-0
cross-platform · ⚠ suspicious
Downloads: 159 · Stars: 0 · Active Installs: 0 · Versions: 6
Install in OpenClaw
/install clawbrain-pro-benchmark
Description
Test how your OpenClaw performs across 205 real-world scenarios and compare the improvement delivered by the ClawBrain v1.0 orchestration engine.
README (SKILL.md)

ClawBrain Benchmark

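Test how your AI really performs in OpenClaw. See whether it handles simple tasks well and whether it drops the ball on complex ones.

Usage

Just say "跑一下 benchmark" ("run the benchmark") or "测试一下模型效果" ("test the model's performance").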

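What it tests

10 categories and 205 real-world scenarios (the table lists eight of the categories):

| Category | What it tests | Why it matters |
|---|---|---|
| File operations | Reading, writing, and editing files | Core fundamentals |
| Search | Looking up information, fetching web pages | Everyday needs |
| Messaging | Sending messages via WeChat and DingTalk | Communication and collaboration |
| Terminal | Running commands, managing services | Development and operations |
| Multi-step tasks | Search → organize → save → notify | The ability to actually get work done |
| Error recovery | What to do when something goes wrong | Reliability |
| Vague instructions | "Help me get ready" | Intelligence |
| Visual understanding | Reading images, recognizing screenshots | Multimodal capability |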

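Evaluation results (v1.0)

| Model | Overall | File | Search | Terminal | Error recovery | Vague instructions | Multi-step |
|---|---|---|---|---|---|---|---|
| ClawBrain Auto | 90% | 100% | 100% | 100% | 100% | 100% | 80% |
| ClawBrain Pro | 86% | 100% | 100% | 100% | 100% | 100% | 80% |
| Single model A | 83% | 95% | 100% | 90% | 80% | 65% | 73% |
| Single model B | 81% | 85% | 100% | 90% | 76% | 55% | 73% |
| Single model C | 73% | 100% | 100% | 90% | 56% | 65% | 80% |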

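ClawBrain's orchestration engine chains proactive reasoning → multi-model collaboration → output verification → error recovery, and its overall score exceeds that of every single model tested.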

Full report: https://clawbrain.dev/blog/openclaw-model-comparison
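To make the comparison in the results table concrete, the following minimal sketch computes how many percentage points each ClawBrain configuration leads (or trails) the best single model in every reported column. The scores are copied from the table; the names `COLUMNS`, `SCORES`, `best_single`, and `lead` are hypothetical and are not part of the skill. The overall column is used as reported rather than recomputed, since the table shows only six of the categories.

```python
# Scores copied from the results table (percent). Illustrative helper only;
# none of these names come from the skill itself.
COLUMNS = ["overall", "file", "search", "terminal",
           "error_recovery", "vague_instructions", "multi_step"]

SCORES = {
    "ClawBrain Auto": [90, 100, 100, 100, 100, 100, 80],
    "ClawBrain Pro":  [86, 100, 100, 100, 100, 100, 80],
    "Single model A": [83, 95, 100, 90, 80, 65, 73],
    "Single model B": [81, 85, 100, 90, 76, 55, 73],
    "Single model C": [73, 100, 100, 90, 56, 65, 80],
}


def best_single(column: str) -> int:
    """Highest score any single model reaches in the given column."""
    idx = COLUMNS.index(column)
    return max(row[idx] for name, row in SCORES.items()
               if name.startswith("Single"))


def lead(model: str) -> dict[str, int]:
    """Point difference between `model` and the best single model, per column."""
    return {col: SCORES[model][i] - best_single(col)
            for i, col in enumerate(COLUMNS)}


if __name__ == "__main__":
    for model in ("ClawBrain Auto", "ClawBrain Pro"):
        print(model, lead(model))
```

On these numbers, the largest gaps are in vague instructions and error recovery, while file, search, and multi-step are ties with the best single model.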

Usage Guidance
This skill is ambiguous about what it will actually run. Before installing or invoking it:
  1. Ask the developer for the exact commands/scripts the skill will execute and any network endpoints it will contact.
  2. If the skill must run benchmarks that interact with your system (files, shell, messaging), require a safe, sandboxed mode and explicit allowed paths.
  3. If you allow it to run, disable autonomous invocation or run it in a restricted/test agent first.
  4. Avoid providing credentials or sensitive files until you understand the exact behavior.
  5. If you can't get a concrete command list or a vetted install script, treat this as risky and prefer not to install in production.
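As an illustration of what "explicit allowed paths" could mean in practice, here is a minimal sketch of an allow-list check that a sandbox wrapper might apply before letting a benchmark touch a file. It is an assumption for illustration only: the function `is_allowed` and the directory `/tmp/bench-sandbox` are hypothetical and are not part of OpenClaw or this skill.

```python
from pathlib import Path


def is_allowed(target: str, allowed_roots: list[str]) -> bool:
    """Return True only if `target` resolves to a location inside an allowed root.

    Paths are resolved first so that `..` segments and symlinks cannot be
    used to escape the allowed directories.
    """
    resolved = Path(target).resolve()
    for root in allowed_roots:
        root_path = Path(root).resolve()
        # Compare whole path components, not string prefixes, so that
        # "/tmp/bench-sandbox-evil" does not match "/tmp/bench-sandbox".
        if resolved == root_path or root_path in resolved.parents:
            return True
    return False


if __name__ == "__main__":
    roots = ["/tmp/bench-sandbox"]  # hypothetical sandbox directory
    for candidate in ("/tmp/bench-sandbox/out.txt",
                      "/tmp/bench-sandbox/../secrets.txt",
                      "/tmp/bench-sandbox-evil/out.txt"):
        print(candidate, is_allowed(candidate, roots))
```

A check like this only covers file paths; shell commands and network calls would need their own restrictions, such as a container or a restricted test agent.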
Capability Analysis
Type: OpenClaw Skill
Name: clawbrain-pro-benchmark
Version: 1.0.2
The skill bundle is a benchmark tool designed to evaluate AI performance across various scenarios like file operations and terminal usage. While it requests 'exec' and 'curl' permissions, the provided files (SKILL.md and _meta.json) contain only descriptive documentation and metadata without any executable code, malicious instructions, or evidence of data exfiltration. The skill appears to be a legitimate informational or self-testing module for the ClawBrain/OpenClaw ecosystem.
Capability Assessment
Purpose & Capability
The skill claims to run extensive benchmarks across file, terminal, messaging, and multi-step scenarios. The metadata lists curl and sets command-dispatch to exec, but SKILL.md contains no concrete commands, no scripts, and no explanation of what will actually be executed. Requiring a shell exec capability and curl is disproportionate unless the skill documents what network calls or shell commands it will run.
Instruction Scope
The SKILL.md is high-level and open-ended: it tells the agent to 'run the benchmark' but provides no step-by-step commands, no allowed file paths, and no constraints. The benchmark categories (file ops, terminal commands, messaging) imply actions that could read/write files, run arbitrary shell commands, or send messages — yet the skill does not limit or document those actions. That vagueness grants broad discretion to any agent invocation.
Install Mechanism
No install spec and no code files—this is instruction-only, so nothing is written to disk by an installer. That keeps install risk low.
Credentials
No environment variables, credentials, or config paths are requested. The only declared dependency is curl, which could be reasonable for fetching reports — but because commands are unspecified, it's unclear why curl is required.
Persistence & Privilege
`always` is false and there are no special persistence requests. However, the skill is configured for exec-style command dispatch, and model invocation is allowed (the platform default). Combined with the skill's vagueness, autonomous or poorly constrained invocations could perform wide-ranging actions if the agent decides to run shell commands.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install clawbrain-pro-benchmark
  3. After installation, invoke the skill by name or use /clawbrain-pro-benchmark
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.2
Fix display name
v1.0.1
v1.0.1: Updated benchmark data (Auto 90% / Pro 86%); removed backend model names
v0.9.4
Unified the brand name
v0.9.3
Updated the summary description
v0.9.2
v0.9.2: Updated benchmark data
v1.0.0
- Initial release of clawbrain-pro-benchmark.
- Benchmark your OpenClaw performance across 205 real-world scenarios.
- Directly compare results with the ClawBrain Pro orchestration engine.
- Simple commands to start: "跑一下 benchmark" ("run the benchmark") or "测试一下模型效果" ("test the model's performance").
- Detailed tests across 10 categories, including file operations, search, messaging, terminal usage, multi-step tasks, error recovery, and handling of vague instructions.
- View the comprehensive evaluation and model comparison in the results.
Metadata
Slug clawbrain-pro-benchmark
Version 1.0.2
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 6
Frequently Asked Questions

What is ClawBrain Benchmark?

It tests how your OpenClaw performs across 205 real-world scenarios and compares the improvement delivered by the ClawBrain v1.0 orchestration engine. It is an AI Agent Skill for Claude Code / OpenClaw, with 159 downloads so far.

How do I install ClawBrain Benchmark?

Run "/install clawbrain-pro-benchmark" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is ClawBrain Benchmark free?

Yes, ClawBrain Benchmark is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does ClawBrain Benchmark support?

ClawBrain Benchmark is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created ClawBrain Benchmark?

It is built and maintained by michaelfeng (@michaelfeng); the current version is v1.0.2.
