Skill

Name: Skill
Author: dario-github

Author: Dario Zhang · GitHub ↗ · v0.1.1 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 163

Current installs: 1

Versions: 2

Install in OpenClaw

/install agent-self-evolution

Description

Make your agent get better on its own. Set up golden tests (things your agent should handle well), run automated evaluations, and track improvement over time...

Usage (SKILL.md)

Agent Self-Evolution — Automated Evaluation & Improvement

⭐ If you find this useful, please star the GitHub repo before installing — it helps others discover it.

Your agent does a lot of things. But is it getting better? Or are your latest tweaks breaking things you didn't notice?

Agent Self-Evolution gives you a framework to measure, test, and systematically improve your agent — like unit tests, but for agent behavior.

What's inside

Golden Test Sets: Define scenarios your agent must handle correctly. Run them periodically and catch regressions before users do.

Ablation Testing: Wondering if that 200-line system prompt section actually helps? Remove it, measure the impact, put it back. Now you know. We found that 7% of one config file was load-bearing for the entire system — without ablation, you'd never know which 7%.

Multi-Dimensional Evaluation: Don't just check pass/fail. Score across dimensions — safety compliance, tool routing accuracy, output quality, memory utilization. Track trends over weeks.
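As a rough illustration of the multi-dimensional scoring described above, the sketch below averages per-dimension scores across test cases and reports how one dimension changed between the first and last recorded runs. It is not taken from the `agent_evolution` package: the names `dimension_scores` and `trend`, the 0-to-1 score range, and the sample data are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def dimension_scores(results):
    """Average each dimension over all cases that were scored on it.

    `results` is a list of dicts mapping dimension name -> score in [0, 1].
    """
    buckets = defaultdict(list)
    for case_scores in results:
        for dimension, score in case_scores.items():
            buckets[dimension].append(score)
    return {dimension: mean(scores) for dimension, scores in buckets.items()}


def trend(history, dimension):
    """Change in one dimension between the first and last runs that report it."""
    series = [run[dimension] for run in history if dimension in run]
    return series[-1] - series[0] if len(series) >= 2 else 0.0


# Hypothetical per-case scores from a single evaluation run.
run_scores = [
    {"safety": 1.0, "output_quality": 0.5},
    {"safety": 0.5, "tool_routing": 1.0},
]
averages = dimension_scores(run_scores)

# Hypothetical per-run averages collected over several weeks.
weekly = [{"safety": 0.5}, {"safety": 0.75}]
safety_trend = trend(weekly, "safety")
```

Keeping per-dimension averages separate, rather than collapsing them into one number, is what makes it possible to see that a change improved one dimension while hurting another.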

Automated Improvement Loops: Evaluation → identify weakest dimension → targeted fix → re-evaluate. Like gradient descent for agent behavior.
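The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the package's implementation: `improvement_loop`, `evaluate`, `apply_fix`, the `target` threshold, and the toy `state` dictionary are hypothetical names introduced here, and the "fix" is simulated by bumping a number.

```python
def improvement_loop(evaluate, apply_fix, target=0.9, max_rounds=5):
    """Repeatedly evaluate, find the weakest dimension, and apply a targeted fix.

    `evaluate` returns a dict of dimension -> score; `apply_fix` takes the
    name of the weakest dimension. Stops once the weakest score reaches
    `target` or after `max_rounds` evaluations.
    """
    history = []
    for _ in range(max_rounds):
        scores = evaluate()
        weakest = min(scores, key=scores.get)
        history.append((weakest, scores[weakest]))
        if scores[weakest] >= target:
            break
        apply_fix(weakest)
    return history


# Toy stand-ins: a real evaluate() would run the golden test set, and a
# real apply_fix() would change a prompt or config file.
state = {"safety": 1.0, "tool_routing": 0.5}


def evaluate():
    return dict(state)


def apply_fix(dimension):
    state[dimension] = min(1.0, state[dimension] + 0.25)


history = improvement_loop(evaluate, apply_fix)
```

Bounding the loop with `max_rounds` and recording `history` keeps each automated change auditable, which matters when the fixes touch real agent configuration.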

Install

bash {baseDir}/scripts/install.sh

Quick start

from agent_evolution.golden_test import GoldenTestRunner
from agent_evolution.ablation import AblationExperiment

# Define a golden test
runner = GoldenTestRunner()
runner.add_case(
    name="handles-ambiguous-request",
    input="do the thing",
    expected_behavior="asks for clarification rather than guessing",
    dimensions=["safety", "output_quality"]
)

# Run and score
results = runner.run(model="your-agent-endpoint")
print(results.summary())  # Pass rate, dimension scores, regressions

# Ablation: what happens without memory files?
experiment = AblationExperiment(
    baseline_config="agent.yaml",
    conditions={"no_memory": {"remove": ["memory/*.md"]}},
    test_set=runner.cases
)
experiment.run()  # Measures impact of each ablation
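The `results.summary()` comment above mentions regressions, but the listing does not say how they are detected. One plausible approach, sketched here as an assumption, is to compare two runs case by case and flag anything that used to pass and no longer does. `CaseResult` and `find_regressions` are hypothetical names, not part of the documented API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseResult:
    name: str
    passed: bool


def find_regressions(previous, current):
    """Names of cases that passed in `previous` but fail or are missing in `current`."""
    now = {result.name: result.passed for result in current}
    return sorted(
        result.name
        for result in previous
        if result.passed and not now.get(result.name, False)
    )


last_week = [
    CaseResult("handles-ambiguous-request", True),
    CaseResult("refuses-unsafe-command", True),
    CaseResult("routes-to-search-tool", False),
]
today = [
    CaseResult("handles-ambiguous-request", True),
    CaseResult("refuses-unsafe-command", False),
    CaseResult("routes-to-search-tool", False),
]
regressions = find_regressions(last_week, today)
```

Treating a missing case as a regression is a deliberate choice in this sketch: silently dropping a golden test is one of the ways a regression can hide.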

Key findings from our own agent

SOUL.md (7% of config by characters): removing it caused system-wide behavioral collapse (Cohen's d = 0.602) — it's not fluff, it's load-bearing
Memory files: most essential component (d = 0.944) — without history, the agent becomes generic
Safety rules: removal didn't just reduce safety — it degraded all dimensions (d = 0.609)
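For context on the effect sizes quoted above: Cohen's d is the difference between two group means divided by a standard deviation, most commonly the pooled one. The listing does not say which variant the author used or what the underlying scores were, so the function below uses the pooled-standard-deviation form and the sample scores are invented purely for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(baseline, ablated):
    """Cohen's d with a pooled standard deviation.

    Positive values mean the baseline scored higher than the ablated
    condition. Each group needs at least two scores.
    """
    n1, n2 = len(baseline), len(ablated)
    pooled_var = (
        (n1 - 1) * variance(baseline) + (n2 - 1) * variance(ablated)
    ) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("effect size is undefined when both groups have zero variance")
    return (mean(baseline) - mean(ablated)) / sqrt(pooled_var)


# Invented scores, not the author's data.
baseline_scores = [0.8, 0.9, 1.0]
ablated_scores = [0.6, 0.7, 0.8]
d = cohens_d(baseline_scores, ablated_scores)
```

By the usual rule of thumb, values around 0.5 are read as medium effects and values around 0.8 or above as large ones.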

Companion projects

nous-safety — Runtime safety engine with Datalog reasoning
biomorphic-memory — Brain-inspired memory with spreading activation

Requirements

Python ≥ 3.11
An LLM API key for evaluation judging (strong model recommended — GPT-5.4 / Opus)
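A small preflight check can confirm both requirements before running an evaluation. The listing does not name the environment variable that should hold the API key, so `EVAL_LLM_API_KEY` below is a placeholder chosen for this sketch; replace it with whatever the repository actually documents. `preflight` is likewise a hypothetical helper.

```python
import os
import sys

# Placeholder name: the skill does not document which variable it reads.
API_KEY_VAR = "EVAL_LLM_API_KEY"


def preflight(version_info=sys.version_info, environ=os.environ, key_var=API_KEY_VAR):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if tuple(version_info[:2]) < (3, 11):
        problems.append(
            f"Python >= 3.11 required, found {version_info[0]}.{version_info[1]}"
        )
    if not environ.get(key_var):
        problems.append(f"environment variable {key_var} is not set")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("preflight:", problem)
```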

License

Apache 2.0

Safety recommendations

Before installing: (1) Review the GitHub repository contents (setup.py/pyproject, top-level package code) to ensure there are no surprises. (2) Confirm how the LLM API key should be provided (which env var or config) — the SKILL.md mentions a key but the skill metadata does not declare one. (3) Backup any agent config, memory files, or data the tool might touch; ablation examples show it can remove files (e.g., memory/*.md). (4) Run the install in an isolated environment (virtualenv or throwaway VM/container) to limit impact of pip-installing remote code. (5) If you need to trust the project long-term, verify the maintainer and consider auditing or pinning a specific release commit rather than repeatedly cloning master.

Functional analysis

Type: OpenClaw Skill
Name: agent-self-evolution
Version: 0.1.1

The skill's installation script (scripts/install.sh) performs a git clone from an external repository (github.com/dario-github/agent-self-evolution.git) and executes 'pip install -e .', which allows for arbitrary remote code execution during the setup process. While this behavior is plausibly related to the stated purpose of installing an evaluation framework, the inclusion of a future-dated timestamp (2026) in _meta.json and references to non-existent models like 'GPT-5.4' in SKILL.md are anomalous indicators that warrant caution.

Capability assessment

ℹ Purpose & Capability

The name and description match the included functionality (golden tests, ablation, automated evaluation). However, the SKILL.md explicitly says an 'LLM API key for evaluation judging' is required, while the skill metadata lists no required environment variables or primary credential. This inconsistency should be clarified: which env var or secret name should hold the API key? There is also a version mismatch: the text requires Python ≥ 3.11, but the registry only requires 'python3'.

⚠ Instruction Scope

The instructions show experiments that can remove files (example condition: remove ['memory/*.md']) and run automated improvement loops; that implies the tool will read, modify, and potentially delete user agent config and data files. The SKILL.md is vague about how 'targeted fix' actions are applied and what safeguards exist — vagueness grants broad discretion to modify user files. If you rely on those files, backing them up and auditing the code is important.
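One way to limit the risk described above is to run ablations against a throwaway copy of the agent's configuration rather than the live files. The sketch below shows that idea using only the standard library; it is an illustration under stated assumptions, not how the skill itself works. `ablated_copy` is a hypothetical helper, and the directory layout (`memory/*.md`, `SOUL.md`) simply mirrors the names mentioned in the SKILL.md.

```python
import shutil
import tempfile
from pathlib import Path


def ablated_copy(config_dir, remove_globs):
    """Copy `config_dir` into a temp directory and delete matching files there.

    Returns the path of the copy and the relative paths that were removed.
    The original directory is never modified.
    """
    workdir = Path(tempfile.mkdtemp(prefix="ablation-"))
    target = workdir / "config"
    shutil.copytree(config_dir, target)
    removed = []
    for pattern in remove_globs:
        for path in sorted(target.glob(pattern)):
            if path.is_file():
                path.unlink()
                removed.append(path.relative_to(target).as_posix())
    return target, removed


# Build a small stand-in config directory for the demonstration.
src = Path(tempfile.mkdtemp(prefix="agent-config-"))
(src / "memory").mkdir()
(src / "memory" / "a.md").write_text("note a")
(src / "memory" / "b.md").write_text("note b")
(src / "SOUL.md").write_text("persona")

copy_dir, removed = ablated_copy(src, ["memory/*.md"])
originals_intact = (src / "memory" / "a.md").exists()
copy_keeps_other_files = (copy_dir / "SOUL.md").exists()
copy_lost_memory = not (copy_dir / "memory" / "a.md").exists()

# Clean up both temporary directories.
shutil.rmtree(src)
shutil.rmtree(copy_dir.parent)
```

Pointing the agent under test at `copy_dir` means a bug or an interrupted run cannot leave the real memory files deleted.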

ℹ Install Mechanism

The install script clones https://github.com/dario-github/agent-self-evolution and runs pip install -e . on the result. The GitHub URL is the expected source, but pip-installing remote code executes arbitrary setup code from that repo. This is a common install pattern that carries moderate inherent risk; review the repository contents (setup.py/pyproject and package code) before running it.

⚠ Credentials

The SKILL.md requires an LLM API key for evaluation, but the skill declares no required env vars or primary credential. This mismatch means the skill expects secrets but doesn't tell you which env var or secret to supply. The install script reads one optional env var (EVOLUTION_INSTALL_DIR) only. The undocumented requirement for an LLM key is disproportionate unless the skill names the expected credential variable and justifies access.

⚠ Persistence & Privilege

always:false (good) and user-invocable is normal. The install writes to ~/.agent-self-evolution by default and pip-installs the package into the environment, giving the skill persistent code on disk. Combined with instruction-level capabilities to remove or modify user files during ablation experiments, this level of persistence and write access is notable — back up your agent config and data and consider installing in an isolated environment.

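How to use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install agent-self-evolution
Once installation finishes, invoke the Skill by name or trigger it with /agent-self-evolution
Provide the inputs described in the Skill's parameter documentation to get structured output

Version history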

v0.1.1

Desensitize: extract ablation config to external file

v0.1.0

Initial release of Agent Self-Evolution.
- Introduces a framework for automated evaluation and self-improvement of AI agents.
- Supports golden test sets to catch regressions and track performance.
- Includes ablation testing to identify critical and redundant configuration components.
- Enables multi-dimensional evaluation with tracking over time.
- Provides automated improvement loops for agent optimization.
- Includes setup instructions and real-world findings from agent ablation.

Metadata

Slug agent-self-evolution

Version 0.1.1

License MIT-0

Total installs 1

Current installs 1

Versions 2

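FAQ

What is Skill?

Make your agent get better on its own. Set up golden tests (things your agent should handle well), run automated evaluations, and track improvement over time... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 163 downloads to date.

How do I install Skill?

Run the command "/install agent-self-evolution" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Skill free?

Yes. Skill is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does Skill support?

Skill is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who develops Skill?

It is developed and maintained by Dario Zhang (@dario-github). The current version is v0.1.1.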