
Skill Eval

Name: Skill Eval
Author: xiaoxing9

By Xiaoxing9 · GitHub · v1.1.1 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 230

Install in OpenClaw

/install openclaw-skill-eval

Description

Skill evaluation framework. Use when: testing trigger rate, quality compare (with/without skill), or model comparison. Runs via sessions_spawn + sessions_his...

Safe usage recommendations

This skill is internally consistent with its stated purpose, but take these precautions before running it:
- Review SKILL.md and the bundled scripts (especially anything that writes files) so you understand what will be stored under eval-workspace/.
- Don't run evaluations against skills or prompts that will surface sensitive credentials or personal data; persisted histories include tool calls and tool results and may capture secrets.
- Because the workflow uses sandbox="inherit" and cleanup="keep", spawned subagents inherit the main agent environment and histories are retained. Consider running in a disposable/test account or environment if you have any sensitive registrations or tokens available to your agent.
- If you need to test locally, create a clean ~/.openclaw/openclaw.json or ensure skills.load.extraDirs points only to safe directories; the resolver reads that file to find skill paths.
- The skill does not auto-install anything (fake-tool requires manual copy + gateway restart), and there are no remote downloads in SKILL.md. Still, inspect any scripts before executing them in your environment.

If you want to proceed safely: run evaluations on a non-production agent, delete eval-workspace/ after reviewing results, and avoid exposing real credentials during tests.
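The extraDirs precaution can be checked with a few lines of Python. This is a minimal sketch and not part of the skill: it assumes skills.load.extraDirs is stored as nested keys (skills, then load, then extraDirs) holding a list of paths, which this listing does not confirm. The helper name unsafe_extra_dirs and the allowed-roots list are hypothetical.

```python
from pathlib import Path


def unsafe_extra_dirs(config: dict, allowed_roots: list[str]) -> list[str]:
    """Return extraDirs entries that are not inside any allowed root.

    Assumes the layout {"skills": {"load": {"extraDirs": [...]}}}; that
    nesting is a guess based on the dotted name skills.load.extraDirs.
    """
    extra_dirs = config.get("skills", {}).get("load", {}).get("extraDirs", [])
    roots = [Path(r).expanduser().resolve() for r in allowed_roots]
    flagged = []
    for entry in extra_dirs:
        path = Path(entry).expanduser().resolve()
        # A path is acceptable if it is an allowed root or lies beneath one.
        if not any(path == root or root in path.parents for root in roots):
            flagged.append(entry)
    return flagged
```

To apply it to a real configuration, load ~/.openclaw/openclaw.json into a dict first (for example with json.loads, assuming the file is plain JSON) and pass the directories you trust as allowed_roots. Any entry the function returns deserves a look before running an evaluation.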

Functional analysis

Type: OpenClaw Skill
Name: openclaw-skill-eval
Version: 1.1.1

The bundle is a comprehensive evaluation framework that utilizes high-risk capabilities, including spawning autonomous sub-sessions via 'sessions_spawn' and executing local Python scripts through 'subprocess' and 'exec' (e.g., in scripts/legacy/run_orchestrator.py). While these behaviors are aligned with the stated purpose of benchmarking AI skills, the framework lacks input sanitization, creating a risk of Remote Code Execution (RCE) if the agent processes untrusted evaluation metadata. Additionally, viewer/generate_review.py starts a local HTTP server and uses 'os.kill' to manage ports, which is aggressive behavior for a skill bundle. No evidence of intentional data exfiltration or backdoors was found, but the broad permissions required for operation warrant a suspicious classification.
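To illustrate the kind of input sanitization the analysis finds missing, here is a minimal sketch of validating a metadata field before it reaches a subprocess call. It is not the skill's code: the name pattern, the --skill flag, and the function names are all hypothetical, and a real fix would need to cover every field the scripts consume.

```python
import re
import subprocess
import sys

# Hypothetical rule: lowercase letters, digits, and hyphens, at most 64 chars.
SKILL_NAME_RE = re.compile(r"[a-z0-9][a-z0-9-]{0,63}")


def validate_skill_name(name: str) -> str:
    """Return the name unchanged if it matches the allowlist, else raise."""
    if not isinstance(name, str) or not SKILL_NAME_RE.fullmatch(name):
        raise ValueError(f"rejected skill name: {name!r}")
    return name


def run_analysis(script_path: str, skill_name: str) -> subprocess.CompletedProcess:
    """Run a local script, passing the validated name as one argument.

    Using an argument list (no shell=True) means the name is never parsed
    by a shell, so characters such as ';' cannot start a second command.
    """
    safe_name = validate_skill_name(skill_name)
    return subprocess.run(
        [sys.executable, script_path, "--skill", safe_name],  # "--skill" is a made-up flag
        capture_output=True,
        text=True,
        check=False,
        timeout=300,
    )
```

An allowlist pattern is used rather than a blocklist because it rejects anything unexpected by default, which suits values that come from evaluation files the agent did not write.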

Capability assessment

✓ Purpose & Capability

Name/description match the included files and runtime instructions: the repository contains resolver and analysis scripts, example evals, and a SKILL.md that instructs the agent to spawn subagents and run local Python analysis. The requested actions (reading skill paths, running trigger/quality/model workflows, writing per-iteration workspaces) are coherent with an evaluation framework.

ℹ Instruction Scope

Runtime instructions explicitly tell the agent to read ~/.openclaw/openclaw.json to locate skill directories, call sessions_spawn and sessions_history, run local Python scripts via exec, and write full evaluation data to eval-workspace/. Those actions are expected for an eval tool, but they grant the skill access to user config and full conversation histories (including tool calls/results).

✓ Install Mechanism

No install spec is present (instruction-only skill). Scripts are bundled in the repo and meant to be run locally via exec. There are no remote downloads or installers referenced in SKILL.md; requirements.txt lists requests but analysis scripts are documented as offline. This is low install risk.

ℹ Credentials

The skill declares no required env vars or credentials (good). However, it reads ~/.openclaw/openclaw.json and requires subagents to use sandbox="inherit" in spawn calls, which means the spawned sessions may inherit the main agent's registration environment/skill context. While not an explicit credential request, this can expose the same runtime environment to subagents — the behavior is explainable by the tool's purpose but worth noting.

⚠ Persistence & Privilege

Workflows require cleanup="keep" and saving full_history.json / raw transcripts to eval-workspace/<skill>/iter-N/. Persisting full session histories (tool_use + tool_result) can retain sensitive data (API keys, tokens, user-provided secrets) if any eval touches them. Combined with sandbox="inherit", retained histories may contain environment-derived data. This is expected for an evaluation tool but represents a real privacy/storage risk that users must manage.
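One way to reduce the storage risk described above is to mask secret-looking values before a history is written to disk. The sketch below is an illustration only and is not something the skill is documented to do: the patterns, the sensitive key names, and the record fields in the usage note are assumptions, and pattern matching will miss secrets in other formats.

```python
import re

# Illustrative patterns only; real secrets take many more forms.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{16,}"),              # key-like strings with an "sk-" prefix
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._~+/-]+=*"),  # bearer tokens
]
SENSITIVE_KEYS = {"api_key", "token", "password", "authorization"}


def redact(value):
    """Recursively copy a JSON-like value, masking secret-looking content."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if str(k).lower() in SENSITIVE_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        for pattern in SECRET_PATTERNS:
            value = pattern.sub("[REDACTED]", value)
        return value
    return value
```

For example, redact([{"type": "tool_result", "content": "key is sk-abcdefghijklmnop1234"}]) returns a copy whose content field reads "key is [REDACTED]", leaving the input untouched. Redaction complements, and does not replace, running evaluations in an environment that holds no real credentials.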

How to use

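1. Make sure OpenClaw is installed (locally or via Docker).
2. Enter the install command in the chat box: /install openclaw-skill-eval
3. Once installation finishes, invoke the skill by name or trigger it with /openclaw-skill-eval.
4. Provide the inputs described in the skill's parameter documentation to get structured output.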

Version history

v1.1.1

Security: Add runtime actions disclosure, change fake-tool setup to manual (no auto gateway restart or skill install).

v1.1.0

v1.1: Negative trigger detection (precision, F1), scenario tiering (Tier 1 core / Tier 2 optional / Tier 3 roadmap), false positive diagnosis. 26 unit tests.

v1.0.0

Initial release: trigger rate detection (positive + negative), quality compare, description diagnosis, model comparison. 26 unit tests.
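The precision and F1 figures introduced in v1.1.0 are not defined on this page. Assuming they follow the standard definitions, with positive prompts being those that should trigger the skill and negative prompts those that should not, they can be computed as in this sketch. The function name and the (should_trigger, did_trigger) input shape are hypothetical.

```python
def trigger_metrics(results: list[tuple[bool, bool]]) -> dict[str, float]:
    """Compute precision, recall, and F1 from (should_trigger, did_trigger) pairs."""
    tp = sum(1 for should, did in results if should and did)      # correct triggers
    fp = sum(1 for should, did in results if not should and did)  # false positives
    fn = sum(1 for should, did in results if should and not did)  # missed triggers
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

With four positive prompts of which three trigger, and four negative prompts of which one triggers, this gives precision 0.75, recall 0.75, and F1 0.75. Precision is the figure that negative prompts affect: every false positive lowers it while leaving recall unchanged.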

Metadata

Slug openclaw-skill-eval

Version 1.1.1

License MIT-0

Total installs 0

Current installs 0

Versions released 3

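FAQ

What is Skill Eval?

Skill evaluation framework. Use when: testing trigger rate, quality compare (with/without skill), or model comparison. Runs via sessions_spawn + sessions_his... It is an AI agent skill plugin for Claude Code / OpenClaw, with 230 total downloads to date.

How do I install Skill Eval?

Run the command /install openclaw-skill-eval in the OpenClaw or Claude Code chat box to install it in one step, with no extra configuration.

Is Skill Eval free?

Yes. Skill Eval is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does Skill Eval support?

Skill Eval is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Skill Eval?

It is developed and maintained by Xiaoxing9 (@xiaoxing9). The current version is v1.1.1.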