
ClawBrain Benchmark

Author: michaelfeng · GitHub ↗ · v1.0.2 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 159
Favorites: 0
Current installs: 0
Versions: 6
Install in OpenClaw
/install clawbrain-pro-benchmark
Description
Test how your OpenClaw performs across 205 real-world scenarios and compare the improvement delivered by the ClawBrain v1.0 orchestration engine.
Usage (SKILL.md)

ClawBrain Benchmark

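Test how your AI actually performs in OpenClaw: see whether it handles simple tasks reliably and whether it falls apart on complex ones.

How to use

Just say "跑一下 benchmark" ("run the benchmark") or "测试一下模型效果" ("test the model's performance").

What it tests

10 categories, 205 real-world scenarios: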

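| Category | What it tests | Why it matters |
| --- | --- | --- |
| File operations | Reading, writing, and editing files | The fundamentals |
| Search | Looking things up, fetching web pages | Everyday needs |
| Messaging | Sending messages via WeChat and DingTalk | Communication and collaboration |
| Terminal | Running commands, managing services | Development and operations |
| Multi-step tasks | Search → organize → save → notify | The ability to actually get work done |
| Error recovery | What to do when something goes wrong | Reliability |
| Vague instructions | "帮我准备下" ("help me get things ready") | Intelligence |
| Visual understanding | Reading images, recognizing screenshots | Multimodal capability |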

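Evaluation results (v1.0)

| Model | Overall | Files | Search | Terminal | Error recovery | Vague instructions | Multi-step |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ClawBrain Auto | 90% | 100% | 100% | 100% | 100% | 100% | 80% |
| ClawBrain Pro | 86% | 100% | 100% | 100% | 100% | 100% | 80% |
| Single model A | 83% | 95% | 100% | 90% | 80% | 65% | 73% |
| Single model B | 81% | 85% | 100% | 90% | 76% | 55% | 73% |
| Single model C | 73% | 100% | 100% | 90% | 56% | 65% | 80% |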

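Through its orchestration engine (proactive thinking → multi-model collaboration → output verification → error recovery), ClawBrain achieves a higher overall score than any of the single models tested.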
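When reading the v1.0 results, note that the overall score cannot be reproduced from the six category columns that are shown. The sketch below copies the published numbers and compares an unweighted mean of the visible columns with the reported overall figure. The function names are illustrative, and the unweighted mean is an assumption: the listing does not say how the overall score is computed, and it presumably also covers categories that are not shown, such as messaging and visual understanding.

```python
# Scores copied from the v1.0 results table, in percent.
# Category order: files, search, terminal, error recovery,
# vague instructions, multi-step.
RESULTS = {
    "ClawBrain Auto": {"overall": 90, "categories": [100, 100, 100, 100, 100, 80]},
    "ClawBrain Pro": {"overall": 86, "categories": [100, 100, 100, 100, 100, 80]},
    "Single model A": {"overall": 83, "categories": [95, 100, 90, 80, 65, 73]},
    "Single model B": {"overall": 81, "categories": [85, 100, 90, 76, 55, 73]},
    "Single model C": {"overall": 73, "categories": [100, 100, 90, 56, 65, 80]},
}


def visible_mean(model: str) -> float:
    """Unweighted mean of the six category scores shown in the table."""
    scores = RESULTS[model]["categories"]
    return sum(scores) / len(scores)


def gap_to_overall(model: str) -> float:
    """Visible-category mean minus the reported overall score, in points."""
    return visible_mean(model) - RESULTS[model]["overall"]


if __name__ == "__main__":
    for name, row in RESULTS.items():
        print(f"{name}: visible mean {visible_mean(name):.1f}%, "
              f"reported overall {row['overall']}%")
```

ClawBrain Auto and ClawBrain Pro have identical scores in all six visible columns yet report 90% and 86% overall, so the difference between them must come from categories or weighting not shown in the table.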

Full report: https://clawbrain.dev/blog/openclaw-model-comparison

Security recommendations
This skill is ambiguous about what it will actually run. Before installing or invoking it:
  1. Ask the developer for the exact commands/scripts the skill will execute and any network endpoints it will contact.
  2. If the skill must run benchmarks that interact with your system (files, shell, messaging), require a safe, sandboxed mode and explicit allowed paths.
  3. If you allow it to run, disable autonomous invocation or run it in a restricted/test agent first.
  4. Avoid providing credentials or sensitive files until you understand the exact behavior.
  5. If you can't get a concrete command list or a vetted install script, treat this as risky and prefer not to install in production.
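The "explicit allowed paths" recommendation can be made concrete with a small allowlist check. The following is a minimal sketch, not part of OpenClaw or of this skill: `is_path_allowed` and the `sandbox` directory name are hypothetical, and a real sandbox would also need to restrict shell and network access.

```python
from pathlib import Path


def is_path_allowed(candidate: str, allowed_roots: list[str]) -> bool:
    """Return True only if `candidate` resolves to a location inside an allowed root.

    Resolving first collapses `..` segments and follows symlinks, so a path
    such as `sandbox/../secrets.txt` is judged by where it really points.
    """
    resolved = Path(candidate).resolve()
    # Path.is_relative_to compares whole path components (Python 3.9+),
    # so `sandbox-evil` is not mistaken for a child of `sandbox`.
    return any(
        resolved.is_relative_to(Path(root).resolve()) for root in allowed_roots
    )


if __name__ == "__main__":
    roots = ["sandbox"]  # hypothetical directory the benchmark may touch
    for p in ["sandbox/out/report.txt", "sandbox/../secrets.txt"]:
        print(p, "->", "allowed" if is_path_allowed(p, roots) else "blocked")
```

Comparing resolved paths rather than raw strings is the design choice that matters here: a plain prefix check would accept both traversal paths and sibling directories whose names merely start with an allowed root.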
Functional analysis
Type: OpenClaw Skill
Name: clawbrain-pro-benchmark
Version: 1.0.2
The skill bundle is a benchmark tool designed to evaluate AI performance across various scenarios like file operations and terminal usage. While it requests 'exec' and 'curl' permissions, the provided files (SKILL.md and _meta.json) contain only descriptive documentation and metadata without any executable code, malicious instructions, or evidence of data exfiltration. The skill appears to be a legitimate informational or self-testing module for the ClawBrain/OpenClaw ecosystem.
Capability assessment
Purpose & Capability
The skill claims to run extensive benchmarks across file, terminal, messaging, and multi-step scenarios. The metadata lists curl and sets command-dispatch to exec, but SKILL.md contains no concrete commands, no scripts, and no explanation of what will actually be executed. Requiring a shell exec capability and curl is disproportionate unless the skill documents what network calls or shell commands it will run.
Instruction Scope
The SKILL.md is high-level and open-ended: it tells the agent to 'run the benchmark' but provides no step-by-step commands, no allowed file paths, and no constraints. The benchmark categories (file ops, terminal commands, messaging) imply actions that could read/write files, run arbitrary shell commands, or send messages — yet the skill does not limit or document those actions. That vagueness grants broad discretion to any agent invocation.
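One way to check this kind of finding yourself before installing is to scan the skill's Markdown text for the concrete things it names: URLs and fenced code blocks. The sketch below is a rough, text-only audit under stated assumptions; `audit_skill_text` is a hypothetical helper, not an OpenClaw tool, and an empty result only means the document names nothing concrete, not that the skill is safe.

```python
import re

# A Markdown code-fence marker, built at runtime so this example
# does not contain a literal fence inside its own code block.
FENCE = "`" * 3

# ASCII-only URL characters, so adjacent non-ASCII punctuation is not swallowed.
URL_RE = re.compile(r"https?://[\w.~:/?#@!$&*+,;=%-]+", re.ASCII)

# Opening fence (with optional language tag), body, closing fence.
BLOCK_RE = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def audit_skill_text(text: str) -> dict:
    """List the URLs and fenced code blocks that appear in a skill document."""
    urls = sorted({u.rstrip(".,;:!?") for u in URL_RE.findall(text)})
    code_blocks = [body.strip() for body in BLOCK_RE.findall(text)]
    return {"urls": urls, "code_blocks": code_blocks}


if __name__ == "__main__":
    sample = "完整报告:https://clawbrain.dev/blog/openclaw-model-comparison\n"
    print(audit_skill_text(sample))
```

Run against the SKILL.md text shown on this page, a scan like this would surface the report link and no code blocks, which matches the review's observation that no concrete commands are documented.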
Install Mechanism
No install spec and no code files—this is instruction-only, so nothing is written to disk by an installer. That keeps install risk low.
Credentials
No environment variables, credentials, or config paths are requested. The only declared dependency is curl, which could be reasonable for fetching reports — but because commands are unspecified, it's unclear why curl is required.
Persistence & Privilege
always is false and there are no special persistence requests. However, the skill is configured for exec-style command dispatch and model-invocation is allowed (platform default). Combined with the skill's vagueness, autonomous or poorly constrained invocations could perform wide-ranging actions if the agent decides to run shell commands.
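How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install clawbrain-pro-benchmark
  3. Once installation finishes, invoke the Skill by name or trigger it with /clawbrain-pro-benchmark
  4. Provide the required inputs according to the Skill's parameter description to get structured output.
Version history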
v1.0.2
Fix display name
v1.0.1
v1.0.1: Updated evaluation data (Auto 90% / Pro 86%); removed backend model names
v0.9.4
Unified the brand name
v0.9.3
Updated the summary description
v0.9.2
v0.9.2: Updated evaluation data
v1.0.0
- Initial release of clawbrain-pro-benchmark.
- Benchmark your OpenClaw performance across 205 real-world scenarios.
- Directly compare results with ClawBrain Pro orchestration engine.
- Simple commands to start: "跑一下 benchmark" or "测试一下模型效果".
- Detailed test across 10 categories, including file operations, search, messaging, terminal usage, multi-step tasks, error recovery, and handling of vague instructions.
- View comprehensive evaluation and model comparison in the results.
Metadata
Slug: clawbrain-pro-benchmark
Version: 1.0.2
License: MIT-0
Total installs: 0
Current installs: 0
Historical versions: 6
FAQ

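What is ClawBrain Benchmark?

It tests how your OpenClaw performs across 205 real-world scenarios and compares the improvement delivered by the ClawBrain v1.0 orchestration engine. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 159 total downloads to date.

How do I install ClawBrain Benchmark?

Run the command "/install clawbrain-pro-benchmark" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is ClawBrain Benchmark free?

Yes. ClawBrain Benchmark is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.

Which platforms does ClawBrain Benchmark support?

ClawBrain Benchmark is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed ClawBrain Benchmark?

It is developed and maintained by michaelfeng (@michaelfeng). The current version is v1.0.2.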
