
OpenClaw Smartness Eval

Name: OpenClaw Smartness Eval
Author: yh22e

Author: 圆规 · GitHub · v0.3.2 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 319 · Current installs: 0 · Versions: 5

Install in OpenClaw

/install openclaw-smartness-eval

Description

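A comprehensive smartness evaluation skill for OpenClaw. It outputs an overall score, evidence, risks, and trends across 14 dimensions (including planning ability and hallucination control), aligned with the CLEAR/T-Eval/Anthropic industry standards.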

Security recommendations

This skill is broadly coherent with its stated purpose (a workspace-centered evaluation tool). Before installing it, take the following steps:
1) Review eval.py, especially validate_command(), its subprocess.run usage, and any code paths that could enable network access or write outside its stated output directory.
2) Confirm that you trust the other workspace scripts it invokes. Many test commands call scripts that are not bundled, and those external scripts could read secrets or make network calls.
3) Treat .reasoning/reasoning-store.sqlite and the state logs as sensitive. If you don't want them inspected, do not install or run the skill.
4) If you enable --llm-judge, confirm exactly which summary fields are sent, and test in an isolated environment first.
5) As a safe practice, run python3 scripts/check.py and then run eval.py in a sandboxed workspace (or with --no-probes / dry-run options). Observe its behavior there before granting it unfettered access or allowing autonomous invocation.
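For item 3, one low-risk way to see what the reasoning store contains is to open it read-only before you decide. The sketch below makes two assumptions: that .reasoning/reasoning-store.sqlite is an ordinary SQLite file, and that a helper named list_tables_readonly is acceptable. That name is hypothetical and is not part of the skill.

```python
import sqlite3
from pathlib import Path


def list_tables_readonly(db_path: str) -> list[str]:
    """List table names in a SQLite file without being able to modify it.

    Hypothetical helper, not part of the skill.
    """
    path = Path(db_path).resolve()
    # mode=ro opens the database read-only; connecting fails if the file is missing.
    uri = f"{path.as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, list_tables_readonly(".reasoning/reasoning-store.sqlite") returns only the table names. Because the connection uses mode=ro, any accidental write through it fails instead of changing the file.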

Analysis

Type: OpenClaw Skill
Name: openclaw-smartness-eval
Version: 0.3.2

The openclaw-smartness-eval bundle is a comprehensive framework for measuring AI agent performance across 14 dimensions. The core logic in `scripts/eval.py` uses `subprocess` to execute test commands, but it implements a robust security gate (`validate_command`) that does the following:
- enforces a whitelist of allowed path prefixes
- restricts the interpreter to `python3`
- explicitly blocks inline code execution (`-c`, `exec`), absolute paths, and path traversal

Data collection is limited to local workspace state files and logs. The optional network access for LLM-based scoring is documented and requires user-provided API keys. The bundle shows high transparency and a security-conscious design, including anti-gaming probes and integrity checks.
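The gate described above can be sketched as follows. This is not the code from scripts/eval.py, which is not reproduced on this page. It is a hypothetical reimplementation of the listed rules, and the ALLOWED_PREFIXES value is an assumption: it contains only 'scripts/', the one prefix the review mentions.

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical whitelist: 'scripts/' is the only prefix named in the review.
ALLOWED_PREFIXES = ("scripts/",)


def validate_command(command: str) -> bool:
    """Return True only if `command` passes a whitelist-style gate (illustrative sketch)."""
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    if len(argv) < 2 or argv[0] != "python3":
        return False  # only the python3 interpreter is allowed
    for arg in argv[1:]:
        if arg in ("-c", "exec") or "exec(" in arg:
            return False  # block inline code execution
        if arg.startswith("/") or ".." in PurePosixPath(arg).parts:
            return False  # block absolute paths and path traversal
    # The script itself must sit under a whitelisted prefix.
    return argv[1].startswith(ALLOWED_PREFIXES)
```

A command that passes, such as python3 scripts/check.py, would then be run as an argument list, for example subprocess.run(shlex.split(command), check=False). Running it without a shell means shell metacharacters are never interpreted.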

Capability assessment

ℹ Purpose & Capability

The skill claims to produce a 14-dimension evaluation and to read runtime state and logs. Its commands and listed state files align with that purpose. However, many task commands reference other workspace scripts (message-analyzer-v5.py, security-config-audit.py, etc.) that are not bundled with the skill, so it requires a full OpenClaw environment. This dependency on host scripts is plausible, but installers should be aware of it.
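Before you run the skill, a quick preflight check can report which host scripts are absent. The two file names come from the paragraph above. Two things are guesses: that the scripts live in a scripts/ directory of the workspace, and the helper name missing_host_scripts.

```python
from pathlib import Path

# Names taken from the review; the scripts/ location is an assumption.
HOST_SCRIPTS = ("message-analyzer-v5.py", "security-config-audit.py")


def missing_host_scripts(
    workspace: str,
    names: tuple[str, ...] = HOST_SCRIPTS,
    subdir: str = "scripts",
) -> list[str]:
    """Return the host scripts that are not present under <workspace>/<subdir>/ (hypothetical helper)."""
    base = Path(workspace) / subdir
    return [name for name in names if not (base / name).is_file()]
```

An empty result means the referenced scripts at least exist. Whether you trust their contents is a separate review step.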

⚠ Instruction Scope

SKILL.md and the docs state that the tool is read-only (it reads many state/*.json files and .reasoning/reasoning-store.sqlite) and writes only to state/smartness-eval/. The runtime also spawns subprocesses to run tests. The manifest claims a validate_command() gate, but it still allows other workspace scripts to run (via allowed prefixes such as 'scripts/'). Those scripts can access the network, read secrets, or modify state. The skill's safety therefore depends on two things:
- its validate_command implementation
- the trustworthiness of the other workspace scripts

Verify validate_command and inspect eval.py before granting execution privileges.
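When reviewing eval.py, you could use a check like the one below to confirm that a write target stays inside state/smartness-eval/. It is an illustrative sketch, not the skill's code. The name is_inside_output_dir is hypothetical, and Path.is_relative_to requires Python 3.9 or later.

```python
from pathlib import Path

OUTPUT_DIR = "state/smartness-eval"  # write scope stated in the skill's docs


def is_inside_output_dir(workspace: str, target: str) -> bool:
    """True if `target` (relative to the workspace) resolves inside the output directory."""
    root = (Path(workspace) / OUTPUT_DIR).resolve()
    # resolve() collapses ".." and follows symlinks before the comparison.
    candidate = (Path(workspace) / target).resolve()
    return candidate.is_relative_to(root)
```

Because both paths are resolved first, a target such as "state/smartness-eval/../../secrets.txt" or an absolute path like "/etc/passwd" is rejected.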

✓ Install Mechanism

There is no external install spec and no remote download; the package is instruction/code-only and uses only bundled Python scripts. This is low-risk from a supply-chain/download perspective.

ℹ Credentials

The skill declares no required environment variables. The optional LLM judge requires DEEPSEEK_API_KEY or OPENAI_API_KEY, but only when explicitly enabled. However, the skill reads potentially sensitive local artifacts (.reasoning/reasoning-store.sqlite, message-analyzer logs, etc.). Those reads are coherent for an evaluator but still sensitive, so make sure you are comfortable exposing the reasoning DB and logs to the skill runtime.
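The opt-in behavior described here amounts to a small gate: no key lookup, and therefore no network call, unless the judge is explicitly enabled. The two variable names come from the skill's description. Checking DEEPSEEK_API_KEY before OPENAI_API_KEY is an assumption, as is the helper name judge_api_key.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Variable names from the skill description; the preference order is an assumption.
JUDGE_KEY_VARS = ("DEEPSEEK_API_KEY", "OPENAI_API_KEY")


def judge_api_key(
    llm_judge_enabled: bool, env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Return the name of the env var to use for the LLM judge, or None for no external call."""
    if not llm_judge_enabled:
        return None  # judge disabled: never inspect keys, never call out
    for var in JUDGE_KEY_VARS:
        if env.get(var):
            return var
    return None  # enabled, but no key configured
```

Returning the variable name rather than the key value keeps the secret itself out of logs and return values.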

ℹ Persistence & Privilege

The skill is set to always:false, and the docs state that it writes only to its own state/smartness-eval/ directory. Autonomous invocation is enabled by default, which is normal for the platform. However, the skill can also read internal logs and run workspace scripts, so autonomous invocation increases the blast radius. Consider whether you want the agent to run this skill without manual approval each time.

How to use

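Make sure OpenClaw is installed (local or Docker deployment).
Enter the install command in the chat box: /install openclaw-smartness-eval
After installation, invoke the skill by name or trigger it with /openclaw-smartness-eval
Provide the required inputs as described in the skill's parameter documentation to get structured output.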

Version history

v0.3.2

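v0.3.2: Added a complete Security Declaration that explicitly lists the read-only files, the write scope, the command whitelist mechanism, the network access policy, and a no-side-effects guarantee, to reduce the ClawHub suspicious flag.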

v0.3.1

fix: Fixed rendering problems caused by CDATA tags in all Markdown files.

v0.3.0

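v0.3.0: This release made the following changes:
- added planning ability and hallucination control dimensions (12→14 dimensions)
- fixed normalization in all scoring formulas
- expanded anti-gaming probes (7→15)
- added 6 tests (28→34)
- aligned with the CLEAR/T-Eval/Anthropic industry standards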

v0.2.1

Version 0.2.1
- Added scripts/state_probe.py for new state probing and reliability checks.
- Made minor updates to existing scripts and configuration files for improved robustness.
- Documented that the LLM Judge option now only triggers external API calls when explicitly enabled with --llm-judge.
- No breaking changes; all previous usage and report formats remain supported.

v0.2.0

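v0.2.0: This release introduced the following features:
- an independent scoring formula for each of 12 dimensions
- 28 tests
- multi-source data fusion
- LLM Judge
- pass@k
- anti-gaming probes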
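The changelog lists pass@k without giving the skill's formula. A common choice is the unbiased estimator introduced with the HumanEval benchmark. With n samples per task, of which c pass, it is pass@k = 1 - C(n-c, k) / C(n, k). The sketch below implements that estimator, but the page does not say whether the skill uses exactly this form.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    # 1 minus the probability that a random size-k subset contains no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With k = 1 the estimator reduces to the plain pass rate c / n. For example, pass_at_k(5, 2, 1) is 0.4.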

Metadata

Slug openclaw-smartness-eval

Version 0.3.2

License MIT-0

Total installs 0

Current installs 0

Version count 5

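FAQ

What is OpenClaw Smartness Eval?

It is a comprehensive smartness evaluation skill for OpenClaw. It outputs an overall score, evidence, risks, and trends across 14 dimensions (including planning ability and hallucination control), aligned with the CLEAR/T-Eval/Anthropic industry standards. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 319 times so far.

How do I install OpenClaw Smartness Eval?

Run the command "/install openclaw-smartness-eval" in the OpenClaw or Claude Code chat box to install it in one step. No additional configuration is needed.

Is OpenClaw Smartness Eval free?

Yes. OpenClaw Smartness Eval is completely free under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does OpenClaw Smartness Eval support?

OpenClaw Smartness Eval is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed OpenClaw Smartness Eval?

It is developed and maintained by 圆规 (@yh22e); the current version is v0.3.2.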