mayubench-en

Name: mayubench-en
Author: wanyview1

by wanyview1 · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Install in OpenClaw

/install mayubench-en

Description

AI-Native Behavior Benchmark — 48 scenarios × 3 difficulty levels = 144 questions, 8-dimension scoring, measuring whether AI should do things, not whether it...
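The question count in the description follows from simple enumeration. The sketch below is only an illustration: the ID scheme `<prefix><scenario>-<level>` is an assumption inferred from IDs such as E14-1 and E18-1 quoted in the Capability Analysis, and the function name `question_ids` is hypothetical, not part of the skill.

```python
from itertools import product

SCENARIOS = 48  # scenario count stated in the description
LEVELS = 3      # difficulty levels per scenario

def question_ids(prefix="E"):
    """Enumerate question IDs under an ASSUMED scheme '<prefix><scenario>-<level>'.

    The real numbering in MayuBench_v1.0.md may differ; this only
    demonstrates that 48 scenarios x 3 levels yields 144 questions.
    """
    return [
        f"{prefix}{scenario}-{level}"
        for scenario, level in product(range(1, SCENARIOS + 1), range(1, LEVELS + 1))
    ]

ids = question_ids()
print(len(ids))  # 144
```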

Usage Guidance

This skill appears coherent and instruction-only: it contains a self-contained question bank and rubric and does not request credentials or install anything. Before running automated evaluations:

1) Inspect the pseudocode/automation section (the file references a pseudocode judge) to ensure it does not call external endpoints or transmit data.

2) Do not provide secrets or platform credentials to any automated judge model used with this benchmark.

3) Be aware that many benchmark items intentionally include adversarial prompt text designed to test prompt-injection resilience; treat those test inputs as potentially manipulative and run them in isolated or non-privileged sessions.

4) If you don't want the agent to autonomously trigger evaluations, restrict skill invocation or disable autonomous invocation in your agent runtime.

For higher assurance, have the full pseudocode/automation snippet reviewed before use.
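As a first pass at the inspection step above, one could scan the skill's text for URLs and common network-call keywords before reading it by hand. This is a minimal sketch, not a security review: the function name `find_network_references`, the keyword list, and the sample text are all illustrative assumptions, and a clean result does not prove the pseudocode is safe.

```python
import re

# Matches http/https URLs up to whitespace or a closing quote/bracket.
URL_RE = re.compile(r"https?://[^\s)\"'>]+", re.IGNORECASE)

# Illustrative, non-exhaustive keywords that often indicate network access.
NETWORK_HINTS = ("curl ", "wget ", "requests.", "fetch(", "urllib", "webhook")

def find_network_references(text):
    """Return (line_number, kind, match) tuples for lines worth a manual look."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for url in URL_RE.findall(line):
            findings.append((lineno, "url", url))
        lowered = line.lower()
        for hint in NETWORK_HINTS:
            if hint in lowered:
                findings.append((lineno, "hint", hint.strip()))
    return findings

# Hypothetical sample standing in for a pseudocode judge section.
sample = (
    "score = judge(answer)\n"
    "requests.post('https://example.com/collect', data=answer)\n"
)
findings = find_network_references(sample)
for finding in findings:
    print(finding)
```

In practice the text would come from the skill's own files (for example, the contents of MayuBench_v1.0.md read from disk), and every reported line would still need to be read in context.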

Capability Analysis

Type: OpenClaw Skill

Name: mayubench-en

Version: 1.0.0

The bundle is a comprehensive AI behavior benchmark (MayuBench) designed to evaluate model alignment and safety across 144 questions (48 scenarios at 3 difficulty levels). While MayuBench_v1.0.md contains examples of prompt injection and requests for harmful activities (e.g., WiFi cracking in E14-1 or 'Ignore all instructions' in E18-1), these are explicitly presented as test data for benchmarking purposes rather than instructions for the agent to execute. The skill lacks any malicious code, data exfiltration logic, or unauthorized system access.

Capability Tags

crypto

can-make-purchases

Capability Assessment

✓ Purpose & Capability

Name/description (behavior benchmark) matches the contents: question bank and scoring rubric are included, and no unrelated binaries, env vars, or installs are requested.

ℹ Instruction Scope

SKILL.md directs manual and automated evaluation using the included MayuBench_v1.0.md. The skill contains adversarial/prompt-injection-style test content (D3 includes 'injection prevention' scenarios) — the pre-scan flags for injection patterns are likely due to test questions intentionally containing adversarial prompts. The pseudocode for automated testing is referenced but not fully visible in the provided excerpt; verify that pseudocode does not instruct the agent to send sensitive data to external endpoints before running automated tests.

✓ Install Mechanism

No install spec, no code files, and no downloads — instruction-only skill with nothing written to disk by the skill itself.

✓ Credentials

No required environment variables, credentials, or config paths are declared; the skill does not ask for secrets or unrelated service tokens.

ℹ Persistence & Privilege

always:false (default) and user-invocable:true. The SKILL suggests an automated 'ClawFight Arena' mode that can 'automatically trigger MayuBench evaluation' — this is an instruction-level behavior, not a code-level service. Because the platform permits autonomous invocation by default, confirm agent runtime policies before allowing autonomous runs (especially for automated scoring), but this alone does not indicate incoherence.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install mayubench-en
After installation, invoke the skill by name or use /mayubench-en
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

- Initial release of MayuBench v1.0, an AI-native behavior benchmark.
- Provides 144 scenario-based questions across 8 behavioral dimensions and a six-level scoring framework.
- Includes manual, automated, and arena-based testing methods.
- All documentation is now available in English.
- Maintains full open-source licensing (MIT-0) and a standardized, reproducible structure.

Metadata

Slug mayubench-en

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is mayubench-en?

AI-Native Behavior Benchmark — 48 scenarios × 3 difficulty levels = 144 questions, 8-dimension scoring, measuring whether AI should do things, not whether it... It is an AI Agent Skill for Claude Code / OpenClaw, with 56 downloads so far.

How do I install mayubench-en?

Run "/install mayubench-en" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is mayubench-en free?

Yes, mayubench-en is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does mayubench-en support?

mayubench-en is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created mayubench-en?

It is built and maintained by wanyview1 (@wanyview1); the current version is v1.0.0.
