wanyview1

mayubench

by wanyview1 · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security Clean
Downloads 56
Stars 0
Active Installs 0
Versions 1
Install in OpenClaw
/install mayubench
Description
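AI-native behavior benchmark: 48 scenarios × 3 difficulty levels = 144 questions, scored on 8 dimensions. It tests whether an AI should do something, not whether it can.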
Usage Guidance
MayuBench is an instruction-only benchmark document (no code, no install, no secrets). It appears coherent and appropriate for testing AI behavior. Before you use it:
  1. If you plan to run the 'automatic' pseudocode or integrate with a judge model, inspect that automation carefully: ensure the judge is trusted and that you do not send sensitive user data to external endpoints.
  2. The scenarios mention reading/exporting memory and user data as test cases; do not run those tests against real private data without explicit consent and safe storage controls.
  3. Because the package is documentation-only, there is no runtime code to audit here; however, if you or your platform implement the pseudocode, do so in a sandbox and verify network calls and logging.
  4. The license is MIT-0 (permissive), and the homepage points to a GitHub repo; consider reviewing the repo for updates or additional scripts before large-scale automated use.
Capability Analysis
Type: OpenClaw Skill
Name: mayubench
Version: 1.0.0
The MayuBench skill bundle is a benchmark dataset designed to evaluate AI behavioral decision-making and alignment across 144 questions (48 scenarios × 3 difficulty levels). The bundle consists entirely of documentation and markdown files (SKILL.md, MayuBench_v1.0.md) containing test cases, scoring rubrics, and pseudocode for automated evaluation. It contains no executable code, network activity, or malicious instructions; instead, it includes specific test cases (e.g., E18-1 in MayuBench_v1.0.md) designed to verify an agent's ability to resist prompt injection and maintain safety boundaries.
Capability Assessment
Purpose & Capability
The name/description (AI behavior benchmark) matches the included files (MayuBench_v1.0.md and SKILL.md containing test cases, rubrics, and usage). There are no environment variables, binaries, or config paths required that would be unrelated to a benchmarking tool.
Instruction Scope
SKILL.md describes manual testing and references automation pseudocode and a 'ClawFight Arena' mode that can trigger evaluations. The instructions themselves do not direct the agent to read arbitrary system files or transmit data to external endpoints. Note: if you enable automated runs or use a remote 'judge' model, review the pseudocode and any runtime wiring to ensure no sensitive user data is sent to untrusted services.
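If you do wire the pseudocode to a judge model, one way to act on the note above is to gate every outgoing request on a host allowlist and to send only the minimum fields. The following is a minimal sketch, not part of the skill: TRUSTED_JUDGE_HOSTS, is_trusted_judge, and build_judge_payload are hypothetical names, and the payload shape is an assumption, since the listing does not describe the judge interface.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: hosts you have decided to trust as judge endpoints.
TRUSTED_JUDGE_HOSTS = {"localhost", "127.0.0.1"}


def is_trusted_judge(url: str) -> bool:
    """Return True only for http(s) URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    return (
        parsed.scheme in {"http", "https"}
        and parsed.hostname in TRUSTED_JUDGE_HOSTS
    )


def build_judge_payload(question_id: str, model_answer: str) -> dict:
    """Build a request body holding only the item ID and the answer under test.

    The field names are assumptions; no user memory, files, or other
    context is included.
    """
    return {"question_id": question_id, "answer": model_answer}


if __name__ == "__main__":
    print(is_trusted_judge("http://localhost:8080/judge"))
    print(is_trusted_judge("https://example.com/judge"))
```

Keeping the allowlist as a plain set makes the trust decision easy to audit; anything not explicitly listed, including malformed URLs, is rejected.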
Install Mechanism
No install specification or code files are present; this is instruction-only, so nothing will be downloaded or written to disk by the skill itself.
Credentials
The skill declares no required environment variables, credentials, or config paths. The content mentions model memory/exports as hypothetical test scenarios, but the skill does not request access to any secrets.
Persistence & Privilege
The always flag is set to false, and the skill does not request persistent system privileges. Autonomous invocation is allowed by platform default, but the skill does not request elevated or permanent presence.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install mayubench
  3. After installation, invoke the skill by name or use /mayubench
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
MayuBench v1.0.0: Initial Release
- Launches the first behavior-focused benchmark for AI: 48 native scenarios × 3 difficulty levels, totaling 144 questions.
- Evaluates "should AI do it" (behavior) rather than "can AI do it" (capability).
- Assesses across 8 dimensions, including safety, knowledge uncertainty, ethical refusal, metacognition, and agent boundaries.
- Features standardized 6-level scoring, with clear rubrics and automated/arena testing options.
- Fully open-source under the MIT-0 license for community collaboration and adaptation.
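The arithmetic in the release notes (48 × 3 = 144) and the per-dimension scoring can be illustrated with a short sketch. This is not the benchmark's own pseudocode. It assumes the 6 score levels are the integers 0 to 5, that difficulties are numbered 1 to 3, and that each graded result carries a single dimension index from 0 to 7; the listing specifies none of these details, and all function names are hypothetical.

```python
from itertools import product
from statistics import mean

NUM_SCENARIOS = 48         # from the listing
DIFFICULTIES = (1, 2, 3)   # three levels; the numbering is an assumption
NUM_DIMENSIONS = 8         # from the listing
SCORE_LEVELS = range(6)    # assumption: the 6 levels are 0..5


def question_grid():
    """Enumerate every (scenario, difficulty) pair in the benchmark grid."""
    return list(product(range(1, NUM_SCENARIOS + 1), DIFFICULTIES))


def mean_by_dimension(results):
    """Average scores per dimension.

    results is an iterable of (dimension_index, score) pairs.
    """
    buckets = {}
    for dim, score in results:
        if dim not in range(NUM_DIMENSIONS):
            raise ValueError(f"unknown dimension index: {dim}")
        if score not in SCORE_LEVELS:
            raise ValueError(f"score outside the assumed 0..5 scale: {score}")
        buckets.setdefault(dim, []).append(score)
    return {dim: mean(scores) for dim, scores in sorted(buckets.items())}


if __name__ == "__main__":
    print(len(question_grid()))
    print(mean_by_dimension([(0, 5), (0, 3), (1, 4)]))
```

Rejecting out-of-range scores and dimensions up front keeps a misconfigured judge from silently skewing the averages.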
Metadata
Slug mayubench
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is mayubench?

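mayubench is an AI-native behavior benchmark: 48 scenarios × 3 difficulty levels = 144 questions, scored on 8 dimensions. It tests whether an AI should do something, not whether it can. It is an AI Agent Skill for Claude Code / OpenClaw, with 56 downloads so far.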

How do I install mayubench?

Run "/install mayubench" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is mayubench free?

Yes, mayubench is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does mayubench support?

mayubench is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created mayubench?

It is built and maintained by wanyview1 (@wanyview1); the current version is v1.0.0.
