
Agent Benchmark

Name: Agent Benchmark
Author: yuyonghao-123

By yuyonghao-123 · v0.1.1 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 147

Install in OpenClaw

/install yuyonghao-agent-benchmark

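Description

Evaluates AI agent capabilities against 12 standardized tasks covering file operations, data processing, system operations, robustness, and code quality, then scores the results automatically and generates a report.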

Safe usage advice

What to check before installing or running this skill:

- Do not run the skill with sensitive environment variables present. The runner forwards your process.env into any executed task code, and tasks can read files and env vars.
- The SKILL.md instructs running a PowerShell runner that is not present in the package; a Node-based runner (index.js) is included instead. Confirm which runner you intend to use and why documentation and code differ.
- Review index.js and any task files (tasks.json, default-tasks.json, extended tasks) before execution. The runner will write task code to disk and execute it; untrusted tasks could perform arbitrary I/O or network activity.
- Note the runner writes reports to '../../memory/benchmark-results.md' (outside the skill folder). If you don't want persistent copies in agent memory, modify the code or run in an isolated workspace.
- Ensure required interpreters (python/node/go) are either intentionally available or intentionally absent. If you don't want the skill to execute arbitrary interpreters, run it in a sandbox without those binaries.
- Ask the publisher (author/owner) for clarification: why does the documentation reference a PowerShell runner while the code is Node-based, and why are some files (benchmark-runner.ps1) referenced but missing? If you cannot verify the source or intent, avoid running it on sensitive machines.
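One way to act on the first point is to launch the runner from a wrapper that passes only an allowlisted environment. The sketch below is illustrative, not part of the skill: the `ALLOWED_VARS` list and the `scrubbed_env` helper are hypothetical names, and the allowlist is an assumption you should adjust for your own system.

```python
import os

# Hypothetical allowlist: variable names assumed harmless to forward.
# Adjust for your platform; anything not listed here is dropped.
ALLOWED_VARS = ("PATH", "HOME", "LANG", "TMPDIR", "TEMP", "TMP", "SYSTEMROOT")


def scrubbed_env(source=None, allowed=ALLOWED_VARS):
    """Return a copy of the environment containing only allowlisted names.

    Names are compared case-insensitively because Windows treats
    environment variable names that way (for example "Path").
    """
    env = os.environ if source is None else source
    return {name: value for name, value in env.items() if name.upper() in allowed}


if __name__ == "__main__":
    # Show which variable names would survive the filter on this machine.
    for name in sorted(scrubbed_env()):
        print(name)
```

The resulting dictionary can be handed to `subprocess.run(..., env=scrubbed_env())` when starting the runner, so that task code spawned by it never sees API keys or tokens held in the parent shell. How the runner itself is invoked is not specified by the listing, so that command is left to the reader.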

Functional analysis

Type: OpenClaw Skill
Name: yuyonghao-agent-benchmark
Version: 0.1.1

The skill bundle is a benchmarking tool for AI agents, providing a suite of tasks to evaluate capabilities in file operations, data processing, and system interaction. It includes a Node.js runner (index.js) for Python, Node.js, and Go tasks, and references a PowerShell-based runner for a larger set of system tasks. While the tool executes arbitrary code and accesses environment variables as part of its benchmarking process, these actions are consistent with its stated purpose, and no evidence of malicious intent, data exfiltration, or unauthorized persistence was found.

Capability assessment

⚠ Purpose & Capability

The README/SKILL.md describes a PowerShell-based 'benchmark-runner.ps1' and instructs running PowerShell scripts, but the package actually contains a Node CLI (index.js) and Node test harness. The SKILL metadata lists no required binaries, yet index.js spawns external runtimes ('python', 'node', 'go run'). The repository also ships multiple task sets (PowerShell-style tasks and Python tasks) so it's unclear which runner is authoritative. These inconsistencies mean the claimed purpose (PowerShell edition runner) doesn't fully align with the actual code and runtime assumptions.

⚠ Instruction Scope

SKILL.md tells users to run a local PowerShell runner path that is not present in the file manifest. The instructions encourage providing custom task files; index.js will write task code to disk and execute it with child processes, forwarding full process.env to children and allowing those tasks to read environment variables and the filesystem. The runtime behavior (create temp dirs, execute arbitrary code from tasks, and write reports) is broader than the missing/mismatched documentation implies and could run arbitrary user-supplied code.

ℹ Install Mechanism

There is no install spec (instruction-only), which is low-risk. However, the Node program expects external interpreters (python/node/go) to be available. Those required binaries are not declared in the skill metadata or SKILL.md, so runtime failures or hidden execution of local interpreters are possible if binaries exist. No remote downloads or unusual install steps are present.
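Because the required interpreters are undeclared, it can help to check which of them are actually reachable before running anything. The following is a minimal sketch using the executable names quoted in the assessment; the helper names are hypothetical, and the runner's own lookup may differ (for example, it may resolve `python3` or a full path).

```python
import shutil

# Executable names as quoted in the assessment above.
RUNTIMES = ("python", "node", "go")


def find_runtimes(names=RUNTIMES, which=shutil.which):
    """Map each runtime name to its resolved path, or None if not on PATH."""
    return {name: which(name) for name in names}


def missing_runtimes(found):
    """Return the sorted names of runtimes that could not be resolved."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    found = find_runtimes()
    for name, path in found.items():
        print(f"{name}: {path or 'not found'}")
```

A runtime reported as found will be available to any task the runner executes; one reported as missing will likely cause tasks in that language to fail rather than run.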

⚠ Credentials

The skill requests no explicit credentials, but index.js passes the agent's full environment into spawned child processes, and some default tasks read environment variables (e.g., task-011). The runner also writes reports to '../../memory/benchmark-results.md' (outside the skill folder), which persists data into an agent memory area. Arbitrary task code can therefore read environment variables and exfiltrate data if malicious task definitions are provided. This capability is proportionate if you only run trusted tasks, but risky otherwise.

⚠ Persistence & Privilege

always: false (good), but index.js writes a generated report into a path two levels up ('../../memory/benchmark-results.md'), which is outside the skill directory and likely into the agent's persistent memory area. The skill therefore persists output in a global location without declaring that behavior in metadata or prompting the user.
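To see where that relative path lands for a given install location, the resolution can be worked through directly. This sketch assumes the path is resolved against the skill directory; the listing does not say whether the runner uses the script directory or the current working directory, and the directory in the example is hypothetical. POSIX path rules are used so the result is the same on any platform.

```python
import posixpath

# Relative report path as quoted in the assessment above.
REPORT_RELATIVE_PATH = "../../memory/benchmark-results.md"


def resolve_report_path(base_dir, relative=REPORT_RELATIVE_PATH):
    """Join the relative report path onto base_dir and collapse '..' parts."""
    return posixpath.normpath(posixpath.join(base_dir, relative))


def escapes(base_dir, target):
    """Return True if target lies outside base_dir (both absolute paths)."""
    base = posixpath.normpath(base_dir)
    return posixpath.commonpath([base, target]) != base


if __name__ == "__main__":
    skill_dir = "/workspace/skills/agent-benchmark"  # hypothetical location
    target = resolve_report_path(skill_dir)
    print(target)
    print("outside skill directory:", escapes(skill_dir, target))
```

Running the runner from a throwaway working tree keeps a report written this way from landing in a real agent memory directory.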

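How to use

Make sure OpenClaw is installed (locally or via Docker).
Enter the install command in the chat box: /install yuyonghao-agent-benchmark
Once installation completes, invoke the skill by name or trigger it with /yuyonghao-agent-benchmark
Provide the inputs described in the skill's parameter documentation to receive structured output.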

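Version history

v0.1.1

- Updated package.json; version bumped from 0.1.0 to 0.1.1
- No changes to SKILL.md or the core feature documentation
- Minor package metadata update only; no functional or documentation changes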

v0.1.0

AI Agent benchmark with 4 tests passing

Metadata

Slug yuyonghao-agent-benchmark

Version 0.1.1

License MIT-0

Total installs 0

Current installs 0

Versions 2

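FAQ

What is Agent Benchmark?

It evaluates AI agent capabilities against 12 standardized tasks covering file operations, data processing, system operations, robustness, and code quality, then scores the results automatically and generates a report. It is an AI agent skill plugin for Claude Code / OpenClaw and has been downloaded 147 times to date.

How do I install Agent Benchmark?

Run the command "/install yuyonghao-agent-benchmark" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Agent Benchmark free?

Yes. Agent Benchmark is completely free and released under the MIT-0 license; you may download, install, and use it freely.

Which platforms does Agent Benchmark support?

Agent Benchmark is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Agent Benchmark?

It is developed and maintained by yuyonghao-123 (@yuyonghao-123); the current version is v0.1.1.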