yh22e

Smartness Eval Open Source

by 圆规 · GitHub · v0.3.3 · MIT-0
cross-platform ✓ Security Clean
Downloads: 148 · Stars: 0 · Active Installs: 0 · Versions: 1
Install in OpenClaw
/install smartness-eval-open-source
Description
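A comprehensive smartness evaluation skill for OpenClaw. It scores the agent across 14 dimensions (including planning ability and hallucination control) and reports an overall score, evidence, risks, and trends. Aligned with the CLEAR / T-Eval / Anthropic industry standards.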
Usage Guidance
What to check before installing or running:
- Review scripts/eval.py and scripts/state_probe.py (the run-time engine) to confirm validate_command() actually blocks inline execution, absolute paths, path traversal, shell=True, and network unless --llm-judge is explicitly used.
- Understand that the test commands call other workspace scripts (e.g., scripts/cognitive-kernel-v6.py, scripts/api-fallback-v5.py) which are NOT bundled here; inspect those external scripts (in your OpenClaw workspace) to ensure they don't read or send data beyond what you expect.
- The skill will read many state files and a reasoning SQLite DB. Ensure those files do not contain secrets or sensitive user data you don't want evaluated tools to access.
- The LLM-judge feature is opt-in and requires setting DEEPSEEK_API_KEY or OPENAI_API_KEY; do not set those env vars unless you accept sending the aggregated summary described in docs.
- If you need stronger assurance, provide the full eval.py source (runtime portion that builds/validates/executes commands) for review; that would raise confidence to high.
Capability Analysis
Type: OpenClaw Skill
Name: smartness-eval-open-source
Version: 0.3.3
The skill bundle is a comprehensive evaluation and benchmarking framework for the OpenClaw agent. The core logic in `scripts/eval.py` collects system metrics from local workspace files (e.g., `state/error-tracker.json`, `.reasoning/reasoning-store.sqlite`) and executes functional tests defined in `config/task-suite.json`. While the skill executes subprocesses, it implements a `validate_command` function to mitigate risks by enforcing a whitelist (only `python3`), blocking inline code execution (`-c`, `exec(`), and preventing path traversal. External network communication is limited to an optional 'LLM Judge' feature that requires explicit user opt-in and API keys. The code and extensive documentation (README.md, SKILL.md) are consistent with the stated purpose of measuring agent performance across 14 dimensions.
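To make the description of the command gate concrete, here is a minimal sketch of what a validator with those properties could look like. It is not the skill's actual `validate_command` implementation, which is not reproduced in this listing; the constant names, the return shape, and the exact checks are assumptions chosen for illustration.

```python
import shlex
from pathlib import PurePosixPath
from typing import Tuple

# Hypothetical rule set, modelled on the checks named in the listing.
ALLOWED_EXECUTABLES = {"python3"}   # whitelist: only python3
BLOCKED_FLAGS = {"-c"}              # inline code execution flag
BLOCKED_SUBSTRINGS = ("exec(",)     # inline code marker


def validate_command(command: str) -> Tuple[bool, str]:
    """Return (ok, reason) for a test command string (illustrative only)."""
    try:
        tokens = shlex.split(command)
    except ValueError as exc:
        return False, f"unparseable command: {exc}"
    if not tokens:
        return False, "empty command"
    if tokens[0] not in ALLOWED_EXECUTABLES:
        return False, f"executable not whitelisted: {tokens[0]}"
    for token in tokens[1:]:
        if token in BLOCKED_FLAGS:
            return False, f"inline execution flag blocked: {token}"
        if any(marker in token for marker in BLOCKED_SUBSTRINGS):
            return False, "inline code marker blocked"
        path = PurePosixPath(token)
        if path.is_absolute():
            return False, f"absolute path blocked: {token}"
        if ".." in path.parts:
            return False, f"path traversal blocked: {token}"
    return True, "ok"


if __name__ == "__main__":
    for cmd in (
        "python3 scripts/cognitive-kernel-v6.py",
        "python3 -c 'print(1)'",
        "python3 ../outside.py",
        "bash run.sh",
    ):
        print(cmd, "->", validate_command(cmd))
```

A caller following this pattern would pass the already-split argument list to `subprocess.run` without `shell=True`. The sketch does not catch every evasion (for example a path embedded in a `--flag=value` argument), which is one reason the review recommends reading the real implementation.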
Capability Assessment
Purpose & Capability
Name/description match what the package actually contains: a local evaluation framework that reads runtime state files and runs capability tests. The listed inputs (state/*.json, .reasoning/*.sqlite, logs) are reasonable data sources for an evaluation tool; the declared optional API keys are only for an opt-in LLM-judge feature. Dependencies on other OpenClaw core scripts (cognitive-kernel-v6.py, api-fallback-v5.py, etc.) are expected for in-workspace testing.
Instruction Scope
SKILL.md explicitly states the tool is read-only for specific state files and writes only under state/smartness-eval/. It also documents a validate_command() gate and that network access is off by default. This is coherent; however, the test suite executes external workspace scripts (e.g., scripts/cognitive-kernel-v6.py) which are not bundled here, and those scripts determine the ultimate runtime behavior (they could read additional files or call network endpoints). To be fully confident you should review eval.py (validate_command implementation) and the external test scripts that will be executed.
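The write-scope claim above (outputs only under state/smartness-eval/) can be checked mechanically. The sketch below shows one way to confine output paths to that directory; the function name `resolve_output_path` and its arguments are hypothetical and are not taken from the skill's code.

```python
from pathlib import Path


def resolve_output_path(workspace: Path, relative_name: str) -> Path:
    """Resolve relative_name under <workspace>/state/smartness-eval/ or raise.

    Hypothetical helper: rejects any name that would escape the output root,
    including absolute paths and names containing "..".
    """
    output_root = (workspace / "state" / "smartness-eval").resolve()
    candidate = (output_root / relative_name).resolve()
    # candidate must sit strictly below the output root
    if output_root not in candidate.parents:
        raise ValueError(f"refusing to write outside {output_root}: {relative_name}")
    return candidate


if __name__ == "__main__":
    import tempfile

    ws = Path(tempfile.mkdtemp())
    print(resolve_output_path(ws, "report.json"))
    try:
        resolve_output_path(ws, "../error-tracker.json")
    except ValueError as exc:
        print("blocked:", exc)
```

Resolving both paths before comparing them means symlinks and `..` segments are normalised first, so the check operates on the real destination rather than on the string the caller supplied.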
Install Mechanism
No install spec and no external downloads; the skill is instruction/code-only and runs from the workspace. That minimizes supply-chain risk from arbitrary downloads. The repo includes its own Python scripts and JSON configs; nothing in the manifest pulls code from external URLs.
Credentials
No required env vars, and the only optional credentials are DEEPSEEK_API_KEY or OPENAI_API_KEY for the explicitly opt-in --llm-judge feature. That is proportionate to the documented behavior. The skill does not request unrelated cloud credentials or broad tokens in its metadata.
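As an illustration of the opt-in behavior described here, the following sketch only looks for an API key when the `--llm-judge` flag is present. The flag and the two environment variable names come from the listing; the function name `select_judge_key`, the lookup order, and the error handling are assumptions.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence

# Variable names from the listing; the precedence order is an assumption.
JUDGE_KEY_VARS = ("DEEPSEEK_API_KEY", "OPENAI_API_KEY")


def select_judge_key(argv: Sequence[str], environ: Mapping[str, str]) -> Optional[str]:
    """Return the name of the env var to use for the LLM judge, or None.

    Without --llm-judge the environment is never consulted, so a key that
    happens to be set does not enable network access on its own.
    """
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--llm-judge", action="store_true")
    args, _unknown = parser.parse_known_args(list(argv))
    if not args.llm_judge:
        return None
    for name in JUDGE_KEY_VARS:
        if environ.get(name):
            return name
    raise SystemExit("--llm-judge requires DEEPSEEK_API_KEY or OPENAI_API_KEY")


if __name__ == "__main__":
    import sys

    print(select_judge_key(sys.argv[1:], os.environ))
```

Returning the variable name rather than the key itself keeps the secret out of logs and return values until the point where a request is actually made.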
Persistence & Privilege
always is false; the skill claims to only write outputs to state/smartness-eval/ and not modify config or other skills. This matches the manifest and docs. It does execute subprocesses (expected for tests) but does not request permanent installation or elevated platform privileges.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install smartness-eval-open-source
  3. After installation, invoke the skill by name or use /smartness-eval-open-source
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.3.3
- Added detailed SKILL.md including full usage guide, evaluation modes, output fields, data sources, and security declarations.
- Documented all read/write file operations and command execution safeguards.
- Outlined supported evaluation modes (quick, standard, deep) and their purposes.
- Clarified new output fields (dimension_spread, trend_vs_last, pass_at_k, llm_judge).
- Specified all data input sources and output file locations.
- Added a Security Declaration section detailing only-read/only-write directories, limited command execution, and API key requirements for LLM Judge mode.
Metadata
Slug smartness-eval-open-source
Version 0.3.3
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Smartness Eval Open Source?

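A comprehensive smartness evaluation skill for OpenClaw. It scores the agent across 14 dimensions (including planning ability and hallucination control) and reports an overall score, evidence, risks, and trends, aligned with the CLEAR / T-Eval / Anthropic industry standards. It is an AI Agent Skill for Claude Code / OpenClaw, with 148 downloads so far.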

How do I install Smartness Eval Open Source?

Run "/install smartness-eval-open-source" in the OpenClaw or Claude Code chat to install it in one step; no extra setup is required.

Is Smartness Eval Open Source free?

Yes, Smartness Eval Open Source is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Smartness Eval Open Source support?

Smartness Eval Open Source is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Smartness Eval Open Source?

It is built and maintained by 圆规 (@yh22e); the current version is v0.3.3.
