sky-lv

Skylv Prompt Evaluation

Author: SKY-lv · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 47
Favorites: 0
Current installs: 1
Versions: 1
Install in OpenClaw
/install skylv-prompt-evaluation
Description
Evaluate and benchmark AI prompts for quality, consistency, and performance. Triggers: prompt evaluation, prompt testing, prompt quality, prompt benchmark, p...
Usage (SKILL.md)

Prompt Evaluation

Evaluate and benchmark AI prompts for quality, consistency, and performance. Score, compare, and optimize your prompts systematically.

Overview

A prompt evaluation framework that helps agents measure prompt quality across multiple dimensions: clarity, specificity, robustness, cost-efficiency, and output consistency. Compare prompt variants and find the optimal version.

Capabilities

1. Quality Scoring

node evaluate.js score --prompt "Summarize the article" --dimensions clarity,specificity,robustness
node evaluate.js score --prompt-file ./prompts/ --output scores.json

Scores prompts on clarity (0-10), specificity (0-10), robustness (0-10), and cost-efficiency (0-10).

2. A/B Comparison

node evaluate.js compare --prompt-a "Summarize" --prompt-b "Write a 3-bullet summary" --trials 50
node evaluate.js compare --config ab-test-config.json

Run statistical A/B tests between prompt variants with significance analysis.
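
The package does not document how significance is computed, so the following is only a minimal sketch of one common approach: a two-sided two-proportion z-test, assuming each trial is reduced to a pass/fail outcome. The function names (`compareProportions`, `normalCdf`, `erf`) and the pass/fail framing are assumptions for illustration and are not taken from `evaluate.js`.

```javascript
// Hypothetical sketch, not the skill's implementation.
// Assumes every trial of a prompt variant is judged as pass or fail.

// Abramowitz-Stegun 7.1.26 approximation of the error function.
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Cumulative distribution function of the standard normal distribution.
function normalCdf(z) {
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

// Two-sided z-test for the difference between two pass rates.
function compareProportions(passesA, trialsA, passesB, trialsB, significanceLevel = 0.05) {
  const pA = passesA / trialsA;
  const pB = passesB / trialsB;
  const pooled = (passesA + passesB) / (trialsA + trialsB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / trialsA + 1 / trialsB));
  if (se === 0) {
    // Both variants passed (or failed) every trial: no detectable difference.
    return { pA, pB, z: 0, pValue: 1, significant: false };
  }
  const z = (pB - pA) / se;
  const pValue = 2 * (1 - normalCdf(Math.abs(z)));
  return { pA, pB, z, pValue, significant: pValue < significanceLevel };
}
```

With 50 trials per variant, `compareProportions(30, 50, 42, 50)` reports a significant difference at the 0.05 level, while identical pass counts do not. A z-test is a reasonable default at this sample size; for much smaller trial counts an exact test would be more appropriate.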

3. Consistency Check

node evaluate.js consistency --prompt "Translate to French" --runs 100 --variance-threshold 0.15
node evaluate.js consistency --temperature 0.7 --top-p 0.9

Measures output consistency across multiple runs to find the most stable prompts.
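
What `--variance-threshold` measures is not specified. One plausible reading, sketched below, is that each run yields a numeric score and the prompt counts as stable when the variance of those scores stays at or below the threshold. The helper names and the choice of population variance are assumptions.

```javascript
// Hypothetical sketch: treat consistency as low variance of per-run scores.
// How evaluate.js derives a score from each run is not documented.

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Population variance: the mean squared deviation from the mean.
function populationVariance(values) {
  if (values.length === 0) {
    throw new Error("need at least one score");
  }
  const m = mean(values);
  return mean(values.map((v) => (v - m) ** 2));
}

// A prompt is considered stable when its score variance is within the threshold.
function checkConsistency(scores, varianceThreshold = 0.15) {
  const variance = populationVariance(scores);
  return { variance, stable: variance <= varianceThreshold };
}
```

For example, `checkConsistency([0, 1, 0, 1])` has a variance of 0.25 and is flagged as unstable under the default threshold, whereas a run of identical scores has zero variance and passes.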

4. Regression Testing

node evaluate.js regression --baseline v1.0 --current v1.1 --test-suite golden-set.jsonl
node evaluate.js regression --fail-on-degradation 5%

Detects quality regressions between prompt versions using golden test sets.
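
As a rough illustration of a degradation check, the sketch below assumes that both prompt versions are scored on the same golden set and that `5%` means a relative drop in the mean score. Both the interpretation and the names `parsePercent` and `detectRegression` are assumptions, not documented behavior.

```javascript
// Hypothetical sketch: flag a regression when the mean score drops by more
// than a relative threshold such as "5%".

// Convert a string like "5%" into a fraction (0.05).
function parsePercent(text) {
  const match = /^(\d+(?:\.\d+)?)%$/.exec(String(text).trim());
  if (!match) {
    throw new Error(`not a percentage: ${text}`);
  }
  return Number(match[1]) / 100;
}

// Compare mean scores of the baseline and current prompt versions.
function detectRegression(baselineScores, currentScores, threshold = "5%") {
  const avg = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;
  const baseline = avg(baselineScores);
  const current = avg(currentScores);
  // Relative drop; a zero baseline cannot degrade further.
  const drop = baseline === 0 ? 0 : (baseline - current) / baseline;
  return { baseline, current, drop, degraded: drop > parsePercent(threshold) };
}
```

A mean that falls from 8 to 7 is a 12.5% relative drop and would be reported as degraded, while unchanged or improved scores would not.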

5. Cost Analysis

node evaluate.js cost --prompt "Long prompt..." --model gpt-4 --estimate-tokens
node evaluate.js cost --compare-prompts --output cost-report.csv

Estimates token usage and costs for different prompt variants and models.
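
The tokenizer and price table behind `--estimate-tokens` are not described. The sketch below uses the common rule of thumb of roughly four characters per token for English text and takes the price per 1,000 tokens as a caller-supplied parameter. Both are assumptions: a real estimate would use the target model's tokenizer and its current pricing.

```javascript
// Hypothetical sketch: rough token and cost estimate for a prompt.
// The 4-characters-per-token ratio is a heuristic, not an exact tokenizer.

function estimateTokens(text, charsPerToken = 4) {
  return Math.ceil(text.length / charsPerToken);
}

// pricePer1kTokens is supplied by the caller; no real price is assumed here.
function estimateCost(text, pricePer1kTokens) {
  const tokens = estimateTokens(text);
  return { tokens, cost: (tokens / 1000) * pricePer1kTokens };
}
```

Under this heuristic the 21-character prompt "Summarize the article" is estimated at 6 tokens. The estimate is mainly useful for comparing prompt variants against each other, not for predicting an exact bill.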

Configuration

{
  "evaluation": {
    "dimensions": ["clarity", "specificity", "robustness", "cost"],
    "scoringModel": "gpt-4",
    "abTest": {
      "trials": 50,
      "significanceLevel": 0.05
    },
    "consistency": {
      "runs": 100,
      "varianceThreshold": 0.15
    },
    "regression": {
      "degradationThreshold": "5%",
      "goldenSet": "./golden-set.jsonl"
    }
  }
}
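
Because the package does not say how this file is validated, the following is a minimal sketch of a sanity check for the shape shown above. The function name `validateConfig` and the specific rules (positive integers, a significance level strictly between 0 and 1, a percentage string) are assumptions inferred from the example values.

```javascript
// Hypothetical validator for the configuration shape shown above.
// Returns a list of human-readable problems; an empty list means no issues found.
function validateConfig(config) {
  const errors = [];
  const evaluation = config && config.evaluation;
  if (!evaluation) {
    return ["missing 'evaluation' section"];
  }
  if (!Array.isArray(evaluation.dimensions) || evaluation.dimensions.length === 0) {
    errors.push("dimensions must be a non-empty array");
  }
  const { abTest = {}, consistency = {}, regression = {} } = evaluation;
  if (!Number.isInteger(abTest.trials) || abTest.trials < 1) {
    errors.push("abTest.trials must be a positive integer");
  }
  if (!(abTest.significanceLevel > 0 && abTest.significanceLevel < 1)) {
    errors.push("abTest.significanceLevel must be between 0 and 1");
  }
  if (!Number.isInteger(consistency.runs) || consistency.runs < 1) {
    errors.push("consistency.runs must be a positive integer");
  }
  if (!(consistency.varianceThreshold >= 0)) {
    errors.push("consistency.varianceThreshold must be a non-negative number");
  }
  if (!/^\d+(\.\d+)?%$/.test(String(regression.degradationThreshold))) {
    errors.push("regression.degradationThreshold must look like '5%'");
  }
  return errors;
}
```

Passing the parsed JSON from the example above returns an empty list, while a configuration with `"trials": 0` returns one error describing the invalid field.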

Use Cases

  • Prompt Engineering: Systematically improve prompt quality
  • Quality Assurance: Ensure prompts meet quality standards before production
  • Cost Optimization: Find prompts that achieve goals with fewer tokens
  • Version Control: Track prompt quality across versions
  • Agent Tuning: Optimize agent system prompts for consistency
Security guidance
This skill appears benign as an instruction-only prompt-evaluation description. Before installing or using it, be aware that the reviewed package does not include the `evaluate.js` code shown in examples, and do not run any external evaluator or submit confidential prompts until you know exactly what code and model provider will process them.
Functional analysis
Type: OpenClaw Skill
Name: skylv-prompt-evaluation
Version: 1.0.0
The skill bundle describes a prompt evaluation framework for benchmarking AI prompts. The documentation in SKILL.md and metadata in _meta.json outline standard CLI-based functionalities such as quality scoring, A/B testing, and cost analysis. There are no indicators of malicious intent, prompt injection attacks, or suspicious behaviors in the provided files.
Capability assessment
Purpose & Capability
The stated purpose—evaluating, comparing, and optimizing prompts—is coherent and low-risk, but the advertised CLI capabilities are not verifiable because the package contains only SKILL.md.
Instruction Scope
The instructions are user-directed examples for scoring, comparison, consistency, regression, and cost analysis; they do not override user intent, demand hidden behavior, or encourage destructive actions.
Install Mechanism
There is no install spec and no code, yet the documentation shows commands such as `node evaluate.js`; users would need a separate implementation that was not included in the reviewed artifacts.
Credentials
The examples read prompt files and golden test sets and write reports, which is proportionate for prompt evaluation, but users should avoid running it on confidential prompts unless they understand where evaluations are processed.
Persistence & Privilege
The artifacts declare no credentials, required environment variables, background services, persistent memory, privileged paths, or autonomous long-running behavior.
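How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install skylv-prompt-evaluation
  3. Once installation finishes, invoke the Skill by name or trigger it with /skylv-prompt-evaluation
  4. Provide the inputs described in the Skill's parameter documentation to get structured output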
Version history
v1.0.0
Initial release of the prompt-evaluation skill.
  • Evaluate and benchmark AI prompts for clarity, specificity, robustness, and cost-efficiency.
  • Score prompts, compare variants with A/B tests, and measure output consistency.
  • Run regression testing to detect quality changes across prompt versions.
  • Estimate and compare token usage and cost for different prompts and models.
  • Designed for prompt engineering, quality assurance, and cost optimization.
Metadata
Slug skylv-prompt-evaluation
Version 1.0.0
License MIT-0
Total installs 1
Current installs 1
Version count 1
FAQ

What is Skylv Prompt Evaluation?

Evaluate and benchmark AI prompts for quality, consistency, and performance. Triggers: prompt evaluation, prompt testing, prompt quality, prompt benchmark, p... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 47 total downloads to date.

How do I install Skylv Prompt Evaluation?

Run the command "/install skylv-prompt-evaluation" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Skylv Prompt Evaluation free?

Yes. Skylv Prompt Evaluation is completely free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does Skylv Prompt Evaluation support?

Skylv Prompt Evaluation is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Skylv Prompt Evaluation?

It is developed and maintained by SKY-lv (@sky-lv); the current version is v1.0.0.
