
Smartness Eval Open Source

Name: Smartness Eval Open Source
Author: yh22e

By 圆规 · GitHub · v0.3.3 · MIT-0

cross-platform ✓ Security check passed

Total downloads: 148

Install in OpenClaw

/install smartness-eval-open-source

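Description

A comprehensive smartness evaluation skill for OpenClaw. It assesses the agent across 14 dimensions (including planning ability and hallucination control) and outputs an overall score, evidence, risks, and trends. Aligned with the CLEAR / T-Eval / Anthropic industry standards.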

Security Recommendations

What to check before installing or running:
- Review scripts/eval.py and scripts/state_probe.py (the run-time engine) to confirm validate_command() actually blocks inline execution, absolute paths, path traversal, shell=True, and network access unless --llm-judge is explicitly used.
- Understand that the test commands call other workspace scripts (e.g., scripts/cognitive-kernel-v6.py, scripts/api-fallback-v5.py) which are NOT bundled here; inspect those external scripts (in your OpenClaw workspace) to ensure they don't read or send data beyond what you expect.
- The skill will read many state files and a reasoning SQLite DB. Ensure those files do not contain secrets or sensitive user data you don't want evaluated tools to access.
- The LLM-judge feature is opt-in and requires setting DEEPSEEK_API_KEY or OPENAI_API_KEY; do not set those env vars unless you accept sending the aggregated summary described in docs.
- If you need stronger assurance, provide the full eval.py source (runtime portion that builds/validates/executes commands) for review; that would raise confidence to high.
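The state-file check recommended above can be partly automated. The following is a minimal sketch, not part of the skill: the function names `secret_like_paths` and `scan_state_dir`, the key pattern, and the default `state` directory layout are all assumptions chosen for illustration. It only flags JSON keys whose names look secret-like and does not inspect values or the SQLite DB.

```python
import json
import re
from pathlib import Path
from typing import Any, Iterator, List

# Hypothetical pattern: key names that commonly hold credentials.
SECRET_KEY_PATTERN = re.compile(r"(api[_-]?key|token|secret|password)", re.IGNORECASE)


def secret_like_paths(obj: Any, prefix: str = "") -> Iterator[str]:
    """Yield dotted paths of keys whose names look like credentials."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            if SECRET_KEY_PATTERN.search(str(key)):
                yield path
            # Recurse so nested objects are checked too.
            yield from secret_like_paths(value, path)
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            yield from secret_like_paths(item, f"{prefix}[{index}]")


def scan_state_dir(state_dir: str = "state") -> List[str]:
    """Scan top-level *.json files in state_dir (assumed layout) and list findings."""
    findings: List[str] = []
    for json_file in sorted(Path(state_dir).glob("*.json")):
        try:
            data = json.loads(json_file.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or malformed files are skipped, not reported
        findings.extend(f"{json_file}: {p}" for p in secret_like_paths(data))
    return findings
```

Running `scan_state_dir()` from the workspace root before an evaluation gives a quick list of entries worth a manual look; an empty list does not prove the files are free of sensitive data.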

Functional Analysis

Type: OpenClaw Skill
Name: smartness-eval-open-source
Version: 0.3.3

The skill bundle is a comprehensive evaluation and benchmarking framework for the OpenClaw agent. The core logic in `scripts/eval.py` collects system metrics from local workspace files (e.g., `state/error-tracker.json`, `.reasoning/reasoning-store.sqlite`) and executes functional tests defined in `config/task-suite.json`. While the skill executes subprocesses, it implements a `validate_command` function to mitigate risks by enforcing a whitelist (only `python3`), blocking inline code execution (`-c`, `exec(`), and preventing path traversal. External network communication is limited to an optional 'LLM Judge' feature that requires explicit user opt-in and API keys. The code and extensive documentation (README.md, SKILL.md) are consistent with the stated purpose of measuring agent performance across 14 dimensions.
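To make the described safeguards concrete, here is a minimal sketch of what a command gate of this kind could look like. It is not the skill's actual `validate_command` implementation, which is not shown on this page; the return type, the constant names, and the exact rules are assumptions derived only from the behaviours listed above (python3 whitelist, no `-c` or `exec(`, no absolute paths, no path traversal).

```python
import shlex
from pathlib import PurePosixPath
from typing import Tuple

ALLOWED_EXECUTABLES = {"python3"}   # whitelist named in the analysis
BLOCKED_FLAGS = {"-c"}              # inline code execution flag
BLOCKED_SUBSTRINGS = ("exec(",)     # inline code marker


def validate_command(command: str) -> Tuple[bool, str]:
    """Return (ok, reason) for a test command string. Illustrative only."""
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. unbalanced quotes
        return False, f"unparseable command: {exc}"
    if not argv:
        return False, "empty command"
    if argv[0] not in ALLOWED_EXECUTABLES:
        return False, f"executable not whitelisted: {argv[0]}"
    for arg in argv[1:]:
        if arg in BLOCKED_FLAGS:
            return False, f"inline execution flag blocked: {arg}"
        if any(marker in arg for marker in BLOCKED_SUBSTRINGS):
            return False, "inline code marker blocked"
        if arg.startswith("/"):
            return False, f"absolute path blocked: {arg}"
        if ".." in PurePosixPath(arg).parts:
            return False, f"path traversal blocked: {arg}"
    return True, "ok"
```

A caller would then pass the parsed argument list to `subprocess.run(argv, shell=False)` only when the first element of the result is `True`. Working on the token list rather than the raw string is a deliberate choice in this sketch, since it keeps the check aligned with what would actually be executed.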

Capability Assessment

✓ Purpose & Capability

Name/description match what the package actually contains: a local evaluation framework that reads runtime state files and runs capability tests. The listed inputs (state/*.json, .reasoning/*.sqlite, logs) are reasonable data sources for an evaluation tool; the declared optional API keys are only for an opt-in LLM-judge feature. Dependencies on other OpenClaw core scripts (cognitive-kernel-v6.py, api-fallback-v5.py, etc.) are expected for in-workspace testing.

ℹ Instruction Scope

SKILL.md explicitly states the tool is read-only for specific state files and writes only under state/smartness-eval/. It also documents a validate_command() gate and that network access is off by default. This is coherent; however, the test suite executes external workspace scripts (e.g., scripts/cognitive-kernel-v6.py) that are not bundled here, and those scripts determine the ultimate runtime behavior (they could read additional files or call network endpoints). To be fully confident, review eval.py (the validate_command implementation) and the external test scripts that will be executed.

✓ Install Mechanism

No install spec and no external downloads; the skill is instruction/code-only and runs from the workspace. That minimizes supply-chain risk from arbitrary downloads. The repo includes its own Python scripts and JSON configs; nothing in the manifest pulls code from external URLs.

✓ Credentials

No required env vars, and the only optional credentials are DEEPSEEK_API_KEY or OPENAI_API_KEY for the explicitly opt-in --llm-judge feature. That is proportionate to the documented behavior. The skill does not request unrelated cloud credentials or broad tokens in its metadata.
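The opt-in rule described here (flag plus key, otherwise no network) can be sketched as follows. This is an illustration under assumptions, not the skill's code: the function name `resolve_llm_judge` is hypothetical, and the order in which the two keys are tried is a guess, since the page does not say which one takes precedence.

```python
import argparse
from typing import Mapping, Optional, Sequence

# Assumed lookup order; the listing does not state which key is preferred.
JUDGE_KEY_VARS = ("DEEPSEEK_API_KEY", "OPENAI_API_KEY")


def resolve_llm_judge(argv: Sequence[str], env: Mapping[str, str]) -> Optional[str]:
    """Return the env var name to use, or None when the judge stays off."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--llm-judge", action="store_true")
    args, _unknown = parser.parse_known_args(list(argv))
    if not args.llm_judge:
        # Without the explicit flag, keys present in the environment are ignored.
        return None
    for name in JUDGE_KEY_VARS:
        if env.get(name):
            return name
    raise SystemExit("--llm-judge requires DEEPSEEK_API_KEY or OPENAI_API_KEY")
```

In a real entry point this would be called as `resolve_llm_judge(sys.argv[1:], os.environ)`. Passing the environment in as a parameter keeps the gate easy to test without touching real credentials.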

✓ Persistence & Privilege

The always flag is false; the skill claims to only write outputs to state/smartness-eval/ and not modify config or other skills. This matches the manifest and docs. It does execute subprocesses (expected for tests) but does not request permanent installation or elevated platform privileges.
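One way to verify a write-confinement claim like this is to resolve every output path and check that it stays under the declared directory. The sketch below assumes the workspace root is the current directory; the helper name `is_allowed_write` and the example file names are hypothetical and not taken from the skill.

```python
from pathlib import Path

# Output directory named in the listing, relative to the workspace root.
OUTPUT_DIR = Path("state") / "smartness-eval"


def is_allowed_write(target: str, workspace: str = ".") -> bool:
    """True if target resolves to a location inside the output directory."""
    root = (Path(workspace) / OUTPUT_DIR).resolve()
    # resolve() collapses ".." segments and follows existing symlinks,
    # so traversal tricks are normalised before the comparison.
    candidate = (Path(workspace) / target).resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        return False
    return True
```

For example, `is_allowed_write("state/smartness-eval/report.json")` passes, while a path such as `state/smartness-eval/../error-tracker.json` is rejected because it resolves outside the output directory.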

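How to Use

1. Make sure OpenClaw is installed (locally or via Docker).
2. Enter the install command in the chat box: /install smartness-eval-open-source
3. Once installation completes, call the Skill by name or trigger it with /smartness-eval-open-source
4. Provide the inputs required by the Skill's parameter description to receive structured output.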

Version History

v0.3.3

- Added detailed SKILL.md including full usage guide, evaluation modes, output fields, data sources, and security declarations.
- Documented all read/write file operations and command execution safeguards.
- Outlined supported evaluation modes (quick, standard, deep) and their purposes.
- Clarified new output fields (dimension_spread, trend_vs_last, pass_at_k, llm_judge).
- Specified all data input sources and output file locations.
- Added a Security Declaration section detailing only-read/only-write directories, limited command execution, and API key requirements for LLM Judge mode.

Metadata

Slug: smartness-eval-open-source

Version: 0.3.3

License: MIT-0

Total installs: 0

Current installs: 0

Versions: 1

FAQ

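What is Smartness Eval Open Source?

A comprehensive smartness evaluation skill for OpenClaw. It assesses the agent across 14 dimensions (including planning ability and hallucination control) and outputs an overall score, evidence, risks, and trends. Aligned with the CLEAR / T-Eval / Anthropic industry standards. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 148 times to date.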

How do I install Smartness Eval Open Source?

Run the command /install smartness-eval-open-source in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Smartness Eval Open Source free?

Yes. Smartness Eval Open Source is completely free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does Smartness Eval Open Source support?

Smartness Eval Open Source is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Smartness Eval Open Source?

It is developed and maintained by 圆规 (@yh22e). The current version is v0.3.3.