xiaoxing9

Skill Eval

Author: Xiaoxing9 · GitHub ↗ · v1.1.1 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 230
Favorites: 0
Current installs: 0
Versions: 3
Install in OpenClaw
/install openclaw-skill-eval
Description
Skill evaluation framework. Use when: testing trigger rate, quality compare (with/without skill), or model comparison. Runs via sessions_spawn + sessions_his...
Safe usage recommendations
This skill is internally consistent with its stated purpose, but take these precautions before running it:
- Review SKILL.md and the bundled scripts (especially anything that writes files) so you understand what will be stored under eval-workspace/.
- Don't run evaluations against skills or prompts that will surface sensitive credentials or personal data; persisted histories include tool calls and tool results and may capture secrets.
- Because the workflow uses sandbox="inherit" and cleanup="keep", spawned subagents inherit the main agent environment and histories are retained. Consider running in a disposable/test account or environment if you have any sensitive registrations or tokens available to your agent.
- If you need to test locally, create a clean ~/.openclaw/openclaw.json or ensure skills.load.extraDirs points only to safe directories; the resolver reads that file to find skill paths.
- The skill does not auto-install anything (fake-tool requires manual copy + gateway restart), and there are no remote downloads in SKILL.md. Still, inspect any scripts before executing them in your environment.
If you want to proceed safely: run evaluations on a non-production agent, delete eval-workspace/ after reviewing results, and avoid exposing real credentials during tests.
Functional analysis
Type: OpenClaw Skill
Name: openclaw-skill-eval
Version: 1.1.1
The bundle is a comprehensive evaluation framework that utilizes high-risk capabilities, including spawning autonomous sub-sessions via 'sessions_spawn' and executing local Python scripts through 'subprocess' and 'exec' (e.g., in scripts/legacy/run_orchestrator.py). While these behaviors are aligned with the stated purpose of benchmarking AI skills, the framework lacks input sanitization, creating a risk of Remote Code Execution (RCE) if the agent processes untrusted evaluation metadata. Additionally, viewer/generate_review.py starts a local HTTP server and uses 'os.kill' to manage ports, which is aggressive behavior for a skill bundle. No evidence of intentional data exfiltration or backdoors was found, but the broad permissions required for operation warrant a suspicious classification.
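To illustrate the kind of input sanitization the analysis finds missing, here is a minimal Python sketch. It is not taken from the bundle: the allowlist pattern, the function names, and the `--skill` flag are all hypothetical. The idea is to validate any value that comes from evaluation metadata and to pass arguments as a list, so no shell ever interprets them.

```python
import re
import subprocess
import sys

# Hypothetical allowlist: lowercase letters, digits, and hyphens, 1 to 64 chars.
SAFE_NAME = re.compile(r"[a-z0-9][a-z0-9-]{0,63}")


def validate_skill_name(name):
    """Return name unchanged if it matches the allowlist, else raise ValueError."""
    if not isinstance(name, str) or not SAFE_NAME.fullmatch(name):
        raise ValueError(f"unsafe skill name: {name!r}")
    return name


def build_command(script, skill_name):
    """Build an argument list for a hypothetical analysis script.

    The "--skill" flag is an assumption made for this example only.
    """
    return [sys.executable, script, "--skill", validate_skill_name(skill_name)]


def run_analysis(script, skill_name):
    """Run the script without a shell; the name is passed as a single argument."""
    return subprocess.run(
        build_command(script, skill_name),
        capture_output=True,
        text=True,
        check=False,
    )
```

A value such as `x; rm -rf ~` is rejected by `validate_skill_name` before any process starts, and even an accepted value reaches the child process as one argument, not as shell text.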
Capability assessment
Purpose & Capability
Name/description match the included files and runtime instructions: the repository contains resolver and analysis scripts, example evals, and a SKILL.md that instructs the agent to spawn subagents and run local Python analysis. The requested actions (reading skill paths, running trigger/quality/model workflows, writing per-iteration workspaces) are coherent with an evaluation framework.
Instruction Scope
Runtime instructions explicitly tell the agent to read ~/.openclaw/openclaw.json to locate skill directories, call sessions_spawn and sessions_history, run local Python scripts via exec, and write full evaluation data to eval-workspace/. Those actions are expected for an eval tool, but they grant the skill access to user config and full conversation histories (including tool calls/results).
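Before running the skill, it can help to see which directories the resolver would be pointed at. The sketch below assumes that ~/.openclaw/openclaw.json is plain JSON and that the dotted name skills.load.extraDirs maps to nested objects; neither assumption is confirmed by this listing, and the function names are hypothetical.

```python
import json
from pathlib import Path


def extra_dirs_from_config(config):
    """Return the list stored under skills -> load -> extraDirs, or [] if absent.

    The nested layout is an assumption based on the dotted key name.
    """
    node = config
    for key in ("skills", "load", "extraDirs"):
        if not isinstance(node, dict) or key not in node:
            return []
        node = node[key]
    return [str(entry) for entry in node] if isinstance(node, list) else []


def load_config(path):
    """Parse the config file as strict JSON (raises if it contains comments)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Example use (path taken from the listing):
#   config = load_config(Path.home() / ".openclaw" / "openclaw.json")
#   for directory in extra_dirs_from_config(config):
#       print(directory)
```

If the real file uses a relaxed JSON dialect, `json.loads` will raise an error, and the file should be inspected by hand instead.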
Install Mechanism
No install spec is present (instruction-only skill). Scripts are bundled in the repo and meant to be run locally via exec. There are no remote downloads or installers referenced in SKILL.md; requirements.txt lists requests but analysis scripts are documented as offline. This is low install risk.
Credentials
The skill declares no required env vars or credentials (good). However, it reads ~/.openclaw/openclaw.json and requires subagents to use sandbox="inherit" in spawn calls, which means the spawned sessions may inherit the main agent's registration environment/skill context. While not an explicit credential request, this can expose the same runtime environment to subagents — the behavior is explainable by the tool's purpose but worth noting.
Persistence & Privilege
Workflows require cleanup="keep" and saving full_history.json / raw transcripts to eval-workspace/<skill>/iter-N/. Persisting full session histories (tool_use + tool_result) can retain sensitive data (API keys, tokens, user-provided secrets) if any eval touches them. Combined with sandbox="inherit", retained histories may contain environment-derived data. This is expected for an evaluation tool but represents a real privacy/storage risk that users must manage.
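One way to manage this risk is to scan the retained histories for secret-like strings before sharing or archiving them. The following Python sketch is illustrative only: the two regular expressions are assumptions that catch a few common shapes (key/value pairs named like credentials, and Bearer tokens) and will miss many others. It treats eval-workspace/ as a directory tree containing .json files, as the review describes.

```python
import re
from pathlib import Path

# Illustrative patterns only; real secrets take many other forms.
SECRET_PATTERNS = [
    # Credential-like key followed by ":" or "=" and a value of 8+ characters.
    re.compile(
        r"(?i)\b(api[_-]?key|token|secret|password)\b"
        r"\s*[\"']?\s*[:=]\s*[\"']?[^\s\"',}]{8,}"
    ),
    # HTTP Authorization-style bearer token.
    re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]{16,}"),
]


def find_suspect_lines(text):
    """Return 1-based line numbers whose content matches any pattern."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if any(pattern.search(line) for pattern in SECRET_PATTERNS)
    ]


def scan_workspace(root):
    """Map each .json file under root to its suspect line numbers."""
    hits = {}
    for path in sorted(Path(root).rglob("*.json")):
        text = path.read_text(encoding="utf-8", errors="replace")
        lines = find_suspect_lines(text)
        if lines:
            hits[str(path)] = lines
    return hits
```

Calling `scan_workspace("eval-workspace")` returns an empty dict when nothing matches. A clean result does not prove the histories are free of sensitive data, so deleting the workspace after review remains the safer option.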
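How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install openclaw-skill-eval
  3. Once installation finishes, invoke the Skill by name or trigger it with /openclaw-skill-eval
  4. Provide the required inputs according to the Skill's parameter documentation to get structured output.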
Version history
v1.1.1
Security: Add runtime actions disclosure, change fake-tool setup to manual (no auto gateway restart or skill install).
v1.1.0
v1.1: Negative trigger detection (precision, F1), scenario tiering (Tier 1 core / Tier 2 optional / Tier 3 roadmap), false positive diagnosis. 26 unit tests.
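The precision and F1 figures mentioned for v1.1 can be illustrated with the standard definitions. This is a sketch under assumptions, not the skill's own code: it counts a true positive when the skill triggers on a prompt that should trigger it, a false positive when it triggers on a negative prompt, and a false negative when it stays silent on a positive prompt.

```python
def trigger_metrics(tp, fp, fn):
    """Compute precision, recall, and F1 from trigger counts.

    tp: positive prompts where the skill triggered
    fp: negative prompts where the skill triggered anyway
    fn: positive prompts where the skill did not trigger
    Each ratio is reported as 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, with 8 correct triggers, 2 false triggers, and 2 missed triggers, precision and recall are both 8/10, so F1 is also approximately 0.8.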
v1.0.0
Initial release: trigger rate detection (positive + negative), quality compare, description diagnosis, model comparison. 26 unit tests.
Metadata
Slug: openclaw-skill-eval
Version: 1.1.1
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 3
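FAQ

What is Skill Eval?

Skill evaluation framework. Use when: testing trigger rate, quality compare (with/without skill), or model comparison. Runs via sessions_spawn + sessions_his... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 230 total downloads to date.

How do I install Skill Eval?

Run the command "/install openclaw-skill-eval" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Skill Eval free?

Yes. Skill Eval is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.

Which platforms does Skill Eval support?

Skill Eval is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Skill Eval?

It is developed and maintained by Xiaoxing9 (@xiaoxing9). The current version is v1.1.1.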
