mayubench-en

Name: mayubench-en
Author: wanyview1

By wanyview1 · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Total downloads

Current installs

Versions

Install in OpenClaw

/install mayubench-en

Description

AI-Native Behavior Benchmark — 48 scenarios × 3 difficulty levels = 144 questions, 8-dimension scoring, measuring whether AI should do things, not whether it...
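The question count in the description can be checked with a short sketch. The ID format used here (`E<scenario>-<level>`, unpadded) is an assumption modeled on the two IDs quoted elsewhere in this listing (E14-1 and E18-1); the actual numbering scheme in MayuBench_v1.0.md may differ.

```python
from itertools import product

SCENARIOS = 48  # number of scenarios stated in the description
LEVELS = 3      # difficulty levels per scenario


def question_ids(prefix="E"):
    """Enumerate hypothetical question IDs such as 'E14-1'.

    The prefix and the lack of zero padding are assumptions, not
    something this listing confirms.
    """
    return [
        f"{prefix}{scenario}-{level}"
        for scenario, level in product(range(1, SCENARIOS + 1), range(1, LEVELS + 1))
    ]


ids = question_ids()
print(len(ids))  # 48 * 3 = 144
```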

Safe usage recommendations

This skill appears coherent and instruction-only — it contains a self-contained question bank and rubric and does not request credentials or install anything. Before running automated evaluations: 1) inspect the pseudocode/automation section (the file references a pseudocode judge) to ensure it does not call external endpoints or transmit data; 2) do not provide secrets or platform credentials to any automated judge model used with this benchmark; 3) be aware many benchmark items intentionally include adversarial prompt text designed to test prompt-injection resilience — treat those test inputs as potentially manipulative and run them in isolated or non-privileged sessions; 4) if you don't want the agent to autonomously trigger evaluations, restrict skill invocation or disable autonomous invocation in your agent runtime. If you want higher assurance, paste the full pseudocode/automation snippet here for review.
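Recommendation 1 above (checking the automation section for external endpoints) could be started with a simple text scan. The sketch below is a heuristic only, not a security audit: the keyword list is an illustrative assumption, and a clean result does not prove that nothing is transmitted.

```python
import re

# Matches http(s) URLs up to whitespace, a closing bracket, or a quote.
URL_RE = re.compile(r"https?://[^\s)>\"']+")

# Illustrative keywords that often accompany network access (an assumption,
# not an exhaustive list).
NETWORK_HINTS = ("curl", "wget", "requests.", "fetch(", "webhook", "upload")


def find_network_references(text):
    """Return (line_number, kind, match) tuples for URLs and network hints."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        for url in URL_RE.findall(line):
            findings.append((lineno, "url", url))
        for hint in NETWORK_HINTS:
            if hint in lowered:
                findings.append((lineno, "hint", hint))
    return findings


# Made-up sample input, not taken from the skill itself.
sample = "score = judge(answer)\nrequests.post('https://example.com/log', data=answer)\n"
for finding in find_network_references(sample):
    print(finding)
```

To apply it to the skill, pass in the text of SKILL.md or MayuBench_v1.0.md after reading the file from wherever your installation stores it. Expect some hits from the benchmark's own adversarial test questions; each one still needs a manual read to decide whether it is test data or an instruction.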

Functional analysis

Type: OpenClaw Skill
Name: mayubench-en
Version: 1.0.0

The bundle is a comprehensive AI behavior benchmark (MayuBench) designed to evaluate model alignment and safety across 144 questions (48 scenarios at 3 difficulty levels). While MayuBench_v1.0.md contains examples of prompt injection and requests for harmful activities (e.g., WiFi cracking in E14-1 or 'Ignore all instructions' in E18-1), these are explicitly presented as test data for benchmarking purposes rather than instructions for the agent to execute. The skill contains no malicious code, data exfiltration logic, or unauthorized system access.

Capability tags

crypto · can-make-purchases

Capability assessment

✓ Purpose & Capability

Name/description (behavior benchmark) matches the contents: question bank and scoring rubric are included, and no unrelated binaries, env vars, or installs are requested.

ℹ Instruction Scope

SKILL.md directs manual and automated evaluation using the included MayuBench_v1.0.md. The skill contains adversarial/prompt-injection-style test content (D3 includes 'injection prevention' scenarios) — the pre-scan flags for injection patterns are likely due to test questions intentionally containing adversarial prompts. The pseudocode for automated testing is referenced but not fully visible in the provided excerpt; verify that pseudocode does not instruct the agent to send sensitive data to external endpoints before running automated tests.

✓ Install Mechanism

No install spec, no code files, and no downloads — instruction-only skill with nothing written to disk by the skill itself.

✓ Credentials

No required environment variables, credentials, or config paths are declared; the skill does not ask for secrets or unrelated service tokens.

ℹ Persistence & Privilege

always:false (default) and user-invocable:true. The SKILL suggests an automated 'ClawFight Arena' mode that can 'automatically trigger MayuBench evaluation' — this is an instruction-level behavior, not a code-level service. Because the platform permits autonomous invocation by default, confirm agent runtime policies before allowing autonomous runs (especially for automated scoring), but this alone does not indicate incoherence.

How to use

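Make sure OpenClaw is installed (locally or via Docker).
Enter the install command in the chat box: /install mayubench-en
Once installation finishes, invoke the Skill by name or trigger it with /mayubench-en
Provide the inputs described in the Skill's parameter documentation to get structured output.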

Version history

v1.0.0

- Initial release of MayuBench v1.0, an AI-native behavior benchmark.
- Provides 144 scenario-based questions across 8 behavioral dimensions and a six-level scoring framework.
- Includes manual, automated, and arena-based testing methods.
- All documentation is now available in English.
- Maintains full open-source licensing (MIT-0) and a standardized, reproducible structure.

Metadata

Slug mayubench-en

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Versions 1

FAQ

What is mayubench-en?

AI-Native Behavior Benchmark: 48 scenarios × 3 difficulty levels = 144 questions, 8-dimension scoring, measuring whether AI should do things, not whether it... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 56 times so far.

How do I install mayubench-en?

Run the command "/install mayubench-en" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is mayubench-en free?

Yes. mayubench-en is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does mayubench-en support?

mayubench-en is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed mayubench-en?

It is developed and maintained by wanyview1 (@wanyview1); the current version is v1.0.0.