← 返回 Skills 市场
wanyview1

mayubench

作者 wanyview1 · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ 安全检测通过
56
总下载
0
收藏
0
当前安装
1
版本数
在 OpenClaw 中安装
/install mayubench
功能描述
AI原生行为基准测试 — 48场景×3难度=144道题,8维度评分,测的是AI该不该做而非能不能做
安全使用建议
MayuBench is an instruction-only benchmark document (no code, no install, no secrets). It appears coherent and appropriate for testing AI behavior. Before you use it: 1) If you plan to run the 'automatic' pseudocode or integrate with a judge model, inspect that automation carefully — ensure the judge is trusted and that you do not send sensitive user data to external endpoints. 2) The scenarios mention reading/exporting memory and user data as test cases; do not run those tests against real private data without explicit consent and safe storage controls. 3) Because the package is documentation-only, there is no runtime code to audit here; however if you or your platform implement the pseudocode, do so in a sandbox and verify network calls and logging. 4) The license is MIT-0 (permissive), and the homepage points to a GitHub repo — consider reviewing the repo for updates or additional scripts before large-scale automated use.
功能分析
Type: OpenClaw Skill Name: mayubench Version: 1.0.0 The MayuBench skill bundle is a comprehensive benchmark dataset designed to evaluate AI behavioral decision-making and alignment across 144 scenarios. The bundle consists entirely of documentation and markdown files (SKILL.md, MayuBench_v1.0.md) containing test cases, scoring rubrics, and pseudo-code for automated evaluation. It lacks any executable code, network activity, or malicious instructions; instead, it includes specific test cases (e.g., E18-1 in MayuBench_v1.0.md) designed to verify an agent's ability to resist prompt injection and maintain safety boundaries.
能力评估
Purpose & Capability
The name/description (AI behavior benchmark) matches the included files (MayuBench_v1.0.md and SKILL.md containing test cases, rubrics, and usage). There are no environment variables, binaries, or config paths required that would be unrelated to a benchmarking tool.
Instruction Scope
SKILL.md instructs manual testing, references an automation pseudocode and a 'ClawFight Arena' mode that can trigger evaluations. The instructions themselves do not direct reading of arbitrary system files or transmitting data to external endpoints. Note: if you enable automated runs or use a remote 'judge' model, review the pseudocode and any runtime wiring to ensure no sensitive user data is sent to untrusted services.
Install Mechanism
No install specification or code files are present; this is instruction-only, so nothing will be downloaded or written to disk by the skill itself.
Credentials
The skill declares no required environment variables, credentials, or config paths. The content mentions model memory/exports as hypothetical test scenarios, but the skill does not request access to any secrets.
Persistence & Privilege
always is false and the skill does not request persistent system privileges. Autonomous invocation is allowed by platform default but the skill does not request elevated or permanent presence.
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install mayubench
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /mayubench 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v1.0.0
MayuBench v1.0.0 — Initial Release - Launches the first behavior-focused benchmark for AI: 48 native scenarios × 3 difficulty levels, totaling 144 questions. - Evaluates "should AI do it" (behavior) rather than "can AI do it" (capability). - Assesses across 8 dimensions, including safety, knowledge uncertainty, ethical refusal, metacognition, and agent boundaries. - Features standardized 6-level scoring, with clear rubrics and automated/arena testing options. - Fully open-source under MIT-0 license for community collaboration and adaptation.
元数据
Slug mayubench
版本 1.0.0
许可证 MIT-0
累计安装 0
当前安装数 0
历史版本数 1
常见问题

mayubench 是什么?

AI原生行为基准测试 — 48场景×3难度=144道题,8维度评分,测的是AI该不该做而非能不能做. 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 56 次。

如何安装 mayubench?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install mayubench」即可一键安装,无需额外配置。

mayubench 是免费的吗?

是的,mayubench 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。

mayubench 支持哪些平台?

mayubench 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。

谁开发了 mayubench?

由 wanyview1(@wanyview1)开发并维护,当前版本 v1.0.0。

💬 留言讨论