mayubench

by wanyview1 · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ✓ Security Clean

Install in OpenClaw

/install mayubench

Description

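AI-native behavior benchmark: 48 scenarios × 3 difficulty levels = 144 questions, scored on 8 dimensions. It tests whether an AI should do something, not whether it can.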
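The arithmetic in the description can be sketched as a grid of question keys. This is only an illustration: the function name, the 1-based numbering, and the (scenario, difficulty) tuple shape are assumptions and do not reflect the ID scheme actually used in MayuBench_v1.0.md.

```python
from itertools import product

# Counts taken from the skill description.
NUM_SCENARIOS = 48
NUM_DIFFICULTIES = 3


def question_grid(scenarios=NUM_SCENARIOS, difficulties=NUM_DIFFICULTIES):
    """Return every (scenario, difficulty) pair, numbered from 1.

    The numbering is a hypothetical scheme used for illustration only.
    """
    return list(product(range(1, scenarios + 1), range(1, difficulties + 1)))


grid = question_grid()
print(len(grid))  # 144
```

Iterating over such a grid is one way to confirm that a test run covered every scenario at every difficulty level.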

Usage Guidance

MayuBench is an instruction-only benchmark document (no code, no install, no secrets). It appears coherent and appropriate for testing AI behavior. Before you use it:
1) If you plan to run the 'automatic' pseudocode or integrate with a judge model, inspect that automation carefully: ensure the judge is trusted and that you do not send sensitive user data to external endpoints.
2) The scenarios mention reading/exporting memory and user data as test cases; do not run those tests against real private data without explicit consent and safe storage controls.
3) Because the package is documentation-only, there is no runtime code to audit here; however, if you or your platform implement the pseudocode, do so in a sandbox and verify network calls and logging.
4) The license is MIT-0 (permissive), and the homepage points to a GitHub repo; consider reviewing the repo for updates or additional scripts before large-scale automated use.
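If you do wire the benchmark to a judge model, one precaution is to mask obvious personal data before anything leaves the sandbox. The following is a minimal sketch under stated assumptions: the function names, the two regex patterns, and the payload shape are all hypothetical and are not part of the skill. Real deployments would need patterns matched to the data they actually handle.

```python
import re

# Hypothetical patterns; extend them for the data your platform handles.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LONG_DIGITS_RE = re.compile(r"\b\d{7,}\b")


def redact(text: str) -> str:
    """Mask e-mail addresses and long digit runs (7+ digits)."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return LONG_DIGITS_RE.sub("[NUMBER]", text)


def build_judge_payload(question_id: str, response: str) -> dict:
    """Assemble the only fields a judge would receive (assumed shape)."""
    return {"question_id": question_id, "response": redact(response)}


payload = build_judge_payload("E18-1", "Reach me at bob@example.com")
print(payload["response"])  # Reach me at [EMAIL]
```

Keeping payload construction in a single function makes it easier to log and review exactly what would be sent to a judge.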

Capability Analysis

Type: OpenClaw Skill
Name: mayubench
Version: 1.0.0

The MayuBench skill bundle is a benchmark dataset designed to evaluate AI behavioral decision-making and alignment across 144 questions (48 scenarios × 3 difficulty levels). The bundle consists entirely of documentation and markdown files (SKILL.md, MayuBench_v1.0.md) containing test cases, scoring rubrics, and pseudocode for automated evaluation. It contains no executable code, performs no network activity, and includes no malicious instructions. Some of its test cases (e.g., E18-1 in MayuBench_v1.0.md) are designed to verify an agent's ability to resist prompt injection and maintain safety boundaries.

Capability Assessment

✓ Purpose & Capability

The name/description (AI behavior benchmark) matches the included files (MayuBench_v1.0.md and SKILL.md containing test cases, rubrics, and usage). There are no environment variables, binaries, or config paths required that would be unrelated to a benchmarking tool.

ℹ Instruction Scope

SKILL.md describes manual testing and references automation pseudocode and a 'ClawFight Arena' mode that can trigger evaluations. The instructions themselves do not direct the agent to read arbitrary system files or transmit data to external endpoints. Note: if you enable automated runs or use a remote 'judge' model, review the pseudocode and any runtime wiring to ensure no sensitive user data is sent to untrusted services.

✓ Install Mechanism

No install specification or code files are present; this is instruction-only, so nothing will be downloaded or written to disk by the skill itself.

✓ Credentials

The skill declares no required environment variables, credentials, or config paths. The content mentions model memory/exports as hypothetical test scenarios, but the skill does not request access to any secrets.

✓ Persistence & Privilege

The skill's 'always' setting is false, and the skill does not request persistent system privileges. Autonomous invocation is allowed by platform default, but the skill does not request elevated or permanent presence.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install mayubench
After installation, invoke the skill by name or use /mayubench
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

MayuBench v1.0.0: Initial Release
- Launches the first behavior-focused benchmark for AI: 48 native scenarios × 3 difficulty levels, totaling 144 questions.
- Evaluates "should AI do it" (behavior) rather than "can AI do it" (capability).
- Assesses across 8 dimensions, including safety, knowledge uncertainty, ethical refusal, metacognition, and agent boundaries.
- Features standardized 6-level scoring, with clear rubrics and automated/arena testing options.
- Fully open-source under MIT-0 license for community collaboration and adaptation.
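One way to tally results from a 6-level, multi-dimension rubric is to average the levels per dimension. This is a minimal sketch, not the benchmark's own scoring procedure: it assumes the six levels are the integers 0 to 5, the function name is hypothetical, and the dimension labels in the example are only those named in the release notes.

```python
from collections import defaultdict
from statistics import mean

# Assumption: the six scoring levels are represented as integers 0..5.
MIN_LEVEL, MAX_LEVEL = 0, 5


def summarize(scores):
    """Average per-question levels by dimension.

    `scores` is an iterable of (dimension, level) pairs; the dimension
    labels are whatever the rubric in use defines.
    """
    by_dimension = defaultdict(list)
    for dimension, level in scores:
        if not MIN_LEVEL <= level <= MAX_LEVEL:
            raise ValueError(f"level {level} outside {MIN_LEVEL}..{MAX_LEVEL}")
        by_dimension[dimension].append(level)
    return {dim: mean(levels) for dim, levels in by_dimension.items()}


example = [("safety", 5), ("safety", 3), ("metacognition", 4)]
print(summarize(example))
```

Rejecting out-of-range levels up front keeps a mis-parsed judge response from silently skewing a dimension's average.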

Metadata

Slug mayubench

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is mayubench?

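mayubench is an AI-native behavior benchmark: 48 scenarios × 3 difficulty levels = 144 questions, scored on 8 dimensions, testing whether an AI should do something rather than whether it can. It is an AI Agent Skill for Claude Code / OpenClaw, with 56 downloads so far.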

How do I install mayubench?

Run "/install mayubench" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is mayubench free?

Yes, mayubench is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does mayubench support?

mayubench is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created mayubench?

It is built and maintained by wanyview1 (@wanyview1); the current version is v1.0.0.