sky-lv

Skylv Prompt Evaluation

by SKY-lv · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security Clean
47 Downloads · 0 Stars · 1 Active Installs · 1 Versions
Install in OpenClaw
/install skylv-prompt-evaluation
Description
Evaluate and benchmark AI prompts for quality, consistency, and performance. Triggers: prompt evaluation, prompt testing, prompt quality, prompt benchmark, p...
README (SKILL.md)

Prompt Evaluation

Evaluate and benchmark AI prompts for quality, consistency, and performance. Score, compare, and optimize your prompts systematically.

Overview

A prompt evaluation framework that helps agents measure prompt quality across multiple dimensions: clarity, specificity, robustness, cost-efficiency, and output consistency. Compare prompt variants and find the optimal version.

Capabilities

1. Quality Scoring

node evaluate.js score --prompt "Summarize the article" --dimensions clarity,specificity,robustness
node evaluate.js score --prompt-file ./prompts/ --output scores.json

Scores prompts on clarity (0-10), specificity (0-10), robustness (0-10), and cost-efficiency (0-10).
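The listing does not say how the per-dimension scores are produced or combined. As a minimal sketch, assuming each dimension has already been scored on the 0-10 scale (for example by a scoring model), the scores could be validated and folded into one weighted overall figure. The helper name `overallScore` and the weighting scheme are hypothetical and are not part of the skill.

```javascript
// Hypothetical helper: the reviewed package ships no evaluate.js,
// so this only illustrates one way to combine 0-10 dimension scores.
function overallScore(scores, weights = {}) {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const [dimension, score] of Object.entries(scores)) {
    if (!Number.isFinite(score) || score < 0 || score > 10) {
      throw new RangeError(`${dimension} must be a number between 0 and 10`);
    }
    const weight = weights[dimension] ?? 1; // unlisted dimensions count equally
    weightedSum += score * weight;
    totalWeight += weight;
  }
  if (totalWeight === 0) {
    throw new Error("no weighted dimensions to score");
  }
  return weightedSum / totalWeight;
}

// Example: robustness counts double.
const weighted = overallScore(
  { clarity: 8, specificity: 6, robustness: 7 },
  { robustness: 2 }
);
console.log(weighted.toFixed(2));
```

A weighted mean keeps the result on the same 0-10 scale as the inputs, which makes overall scores comparable across prompts scored on different dimension sets.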

2. A/B Comparison

node evaluate.js compare --prompt-a "Summarize" --prompt-b "Write a 3-bullet summary" --trials 50
node evaluate.js compare --config ab-test-config.json

Runs statistical A/B tests between prompt variants, with significance analysis.
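The listing does not name the statistical test it uses. One common choice for this kind of comparison, assuming every trial is judged pass/fail, is a two-sided two-proportion z-test. The sketch below is an illustration under that assumption; `twoProportionZTest` and `normalCdf` are hypothetical names, and the normal CDF uses the Abramowitz-Stegun approximation of the error function.

```javascript
// Standard normal CDF via the Abramowitz-Stegun erf approximation (formula 7.1.26).
function normalCdf(z) {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    t * (0.254829592 +
    t * (-0.284496736 +
    t * (1.421413741 +
    t * (-1.453152027 +
    t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Hypothetical helper: two-sided z-test on the pass rates of two prompt variants.
function twoProportionZTest(successesA, trialsA, successesB, trialsB, significanceLevel = 0.05) {
  const rateA = successesA / trialsA;
  const rateB = successesB / trialsB;
  const pooled = (successesA + successesB) / (trialsA + trialsB);
  const standardError = Math.sqrt(pooled * (1 - pooled) * (1 / trialsA + 1 / trialsB));
  if (standardError === 0) {
    // Both variants passed (or failed) every trial: no detectable difference.
    return { rateA, rateB, z: 0, pValue: 1, significant: false };
  }
  const z = (rateB - rateA) / standardError;
  const pValue = 2 * (1 - normalCdf(Math.abs(z)));
  return { rateA, rateB, z, pValue, significant: pValue < significanceLevel };
}

// Example: variant A passes 25 of 50 trials, variant B passes 40 of 50.
const result = twoProportionZTest(25, 50, 40, 50);
console.log(result.significant, result.pValue.toFixed(4));
```

With only 50 trials per variant, small differences in pass rate will rarely reach significance, so the trial count is worth choosing with the expected effect size in mind.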

3. Consistency Check

node evaluate.js consistency --prompt "Translate to French" --runs 100 --variance-threshold 0.15
node evaluate.js consistency --temperature 0.7 --top-p 0.9

Measures output consistency across multiple runs to find the most stable prompts.
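The listing does not define the quantity that `--variance-threshold` is compared against. As a sketch under one possible interpretation, the example below treats it as the mean pairwise Jaccard distance between the word sets of the collected outputs, where 0 means every run produced the same words. The metric choice and the function names are assumptions, not the skill's documented behavior.

```javascript
// Lowercased set of whitespace-separated tokens in a model output.
function tokenSet(text) {
  return new Set(text.toLowerCase().split(/\s+/).filter(Boolean));
}

// Jaccard distance: 0 for identical token sets, 1 for disjoint ones.
function jaccardDistance(a, b) {
  const setA = tokenSet(a);
  const setB = tokenSet(b);
  if (setA.size === 0 && setB.size === 0) return 0;
  let shared = 0;
  for (const token of setA) {
    if (setB.has(token)) shared += 1;
  }
  const union = setA.size + setB.size - shared;
  return 1 - shared / union;
}

// Hypothetical report: mean distance over all output pairs vs. a threshold.
function consistencyReport(outputs, threshold = 0.15) {
  if (outputs.length < 2) {
    throw new Error("need at least two outputs to measure consistency");
  }
  let total = 0;
  let pairs = 0;
  for (let i = 0; i < outputs.length; i += 1) {
    for (let j = i + 1; j < outputs.length; j += 1) {
      total += jaccardDistance(outputs[i], outputs[j]);
      pairs += 1;
    }
  }
  const meanDistance = total / pairs;
  return { meanDistance, stable: meanDistance <= threshold };
}

// Example with three collected outputs for the same prompt.
const report = consistencyReport([
  "bonjour le monde",
  "bonjour le monde",
  "salut tout le monde",
]);
console.log(report);
```

A word-set distance ignores word order and paraphrase, so for tasks like translation an embedding-based or task-specific similarity may be a better fit.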

4. Regression Testing

node evaluate.js regression --baseline v1.0 --current v1.1 --test-suite golden-set.jsonl
node evaluate.js regression --fail-on-degradation 5%

Detects quality regressions between prompt versions using golden test sets.
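How degradation is computed is not specified in the listing. A minimal sketch, assuming each golden-set case yields a numeric quality score for both versions and that degradation means the relative drop in the mean score, could look like the following. `parsePercent` and `checkRegression` are hypothetical helpers.

```javascript
// Parse a threshold written as a percentage string, e.g. "5%" -> 0.05.
function parsePercent(value) {
  const match = /^(\d+(?:\.\d+)?)%$/.exec(String(value).trim());
  if (!match) {
    throw new Error(`expected a percentage like "5%", got ${value}`);
  }
  return Number(match[1]) / 100;
}

function mean(values) {
  if (values.length === 0) throw new Error("cannot average an empty list");
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Hypothetical check: fail when the mean score drops by more than the threshold.
function checkRegression(baselineScores, currentScores, threshold = "5%") {
  const baselineMean = mean(baselineScores);
  const currentMean = mean(currentScores);
  const degradation =
    baselineMean === 0 ? 0 : (baselineMean - currentMean) / baselineMean;
  return {
    baselineMean,
    currentMean,
    degradation,
    failed: degradation > parsePercent(threshold),
  };
}

// Example: per-case scores for the same golden set under two prompt versions.
const outcome = checkRegression([8, 8, 8, 8], [8, 7, 8, 7], "5%");
console.log(outcome.failed, outcome.degradation);
```

In a CI setting, the `failed` flag would typically be mapped to a non-zero exit code so that a degraded prompt version blocks the release.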

5. Cost Analysis

node evaluate.js cost --prompt "Long prompt..." --model gpt-4 --estimate-tokens
node evaluate.js cost --compare-prompts --output cost-report.csv

Estimates token usage and costs for different prompt variants and models.
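The listing does not describe how tokens are counted or priced. The sketch below uses the rough rule of thumb of about four characters per token for English text, which is only an approximation of any real tokenizer, and takes the price as a caller-supplied parameter rather than assuming a real model price. Both function names are hypothetical.

```javascript
// Rough token estimate; real tokenizers vary by model and language.
function estimateTokens(text, charsPerToken = 4) {
  return Math.ceil(text.length / charsPerToken);
}

// Hypothetical cost estimate for the prompt (input) side only.
// pricePer1kInputTokens must come from the provider's current price list.
function estimateCost(prompt, { pricePer1kInputTokens, callsPerDay = 1 }) {
  const tokens = estimateTokens(prompt);
  const costPerCall = (tokens / 1000) * pricePer1kInputTokens;
  return { tokens, costPerCall, costPerDay: costPerCall * callsPerDay };
}

// Example: compare two prompt variants at an illustrative price.
const illustrativePrice = 0.01; // placeholder value, not a real model price
for (const prompt of ["Summarize", "Write a 3-bullet summary"]) {
  console.log(prompt, estimateCost(prompt, { pricePer1kInputTokens: illustrativePrice }));
}
```

For billing-grade numbers, the estimate should be replaced with the tokenizer published for the target model, and output tokens should be counted as well.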

Configuration

{
  "evaluation": {
    "dimensions": ["clarity", "specificity", "robustness", "cost"],
    "scoringModel": "gpt-4",
    "abTest": {
      "trials": 50,
      "significanceLevel": 0.05
    },
    "consistency": {
      "runs": 100,
      "varianceThreshold": 0.15
    },
    "regression": {
      "degradationThreshold": "5%",
      "goldenSet": "./golden-set.jsonl"
    }
  }
}
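Since the package includes no code that reads this file, the accepted values are not documented. The sketch below shows how such a configuration could be sanity-checked before use, assuming the four dimension names shown above are the only valid ones and that thresholds follow the formats in the example. `validateConfig` is a hypothetical helper.

```javascript
// Assumed set of valid dimensions, taken from the example configuration.
const KNOWN_DIMENSIONS = ["clarity", "specificity", "robustness", "cost"];

// Hypothetical validator: returns a list of human-readable problems.
function validateConfig(config) {
  const errors = [];
  const evaluation = config?.evaluation ?? {};

  for (const dimension of evaluation.dimensions ?? []) {
    if (!KNOWN_DIMENSIONS.includes(dimension)) {
      errors.push(`unknown dimension: ${dimension}`);
    }
  }

  const level = evaluation.abTest?.significanceLevel;
  if (level !== undefined && !(level > 0 && level < 1)) {
    errors.push("abTest.significanceLevel must be between 0 and 1");
  }

  const counts = [
    ["abTest.trials", evaluation.abTest?.trials],
    ["consistency.runs", evaluation.consistency?.runs],
  ];
  for (const [path, value] of counts) {
    if (value !== undefined && !(Number.isInteger(value) && value > 0)) {
      errors.push(`${path} must be a positive integer`);
    }
  }

  const threshold = evaluation.regression?.degradationThreshold;
  if (threshold !== undefined && !/^\d+(\.\d+)?%$/.test(threshold)) {
    errors.push('regression.degradationThreshold must look like "5%"');
  }
  return errors;
}

// Example: a config with two mistakes.
console.log(validateConfig({
  evaluation: { dimensions: ["clarity", "tone"], abTest: { trials: 0 } },
}));
```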

Use Cases

  • Prompt Engineering: Systematically improve prompt quality
  • Quality Assurance: Ensure prompts meet quality standards before production
  • Cost Optimization: Find prompts that achieve goals with fewer tokens
  • Version Control: Track prompt quality across versions
  • Agent Tuning: Optimize agent system prompts for consistency
Usage Guidance
This skill appears benign as an instruction-only prompt-evaluation description. Before installing or using it, be aware that the reviewed package does not include the `evaluate.js` code shown in examples, and do not run any external evaluator or submit confidential prompts until you know exactly what code and model provider will process them.
Capability Analysis
Type: OpenClaw Skill
Name: skylv-prompt-evaluation
Version: 1.0.0
The skill bundle describes a prompt evaluation framework for benchmarking AI prompts. The documentation in SKILL.md and metadata in _meta.json outline standard CLI-based functionalities such as quality scoring, A/B testing, and cost analysis. There are no indicators of malicious intent, prompt injection attacks, or suspicious behaviors in the provided files.
Capability Assessment
Purpose & Capability
The stated purpose—evaluating, comparing, and optimizing prompts—is coherent and low-risk, but the advertised CLI capabilities are not verifiable because the package contains only SKILL.md.
Instruction Scope
The instructions are user-directed examples for scoring, comparison, consistency, regression, and cost analysis; they do not override user intent, demand hidden behavior, or encourage destructive actions.
Install Mechanism
There is no install spec and no code, yet the documentation shows commands such as `node evaluate.js`; users would need a separate implementation that was not included in the reviewed artifacts.
Credentials
The examples read prompt files and golden test sets and write reports, which is proportionate for prompt evaluation, but users should avoid running it on confidential prompts unless they understand where evaluations are processed.
Persistence & Privilege
The artifacts declare no credentials, required environment variables, background services, persistent memory, privileged paths, or autonomous long-running behavior.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install skylv-prompt-evaluation
  3. After installation, invoke the skill by name or use /skylv-prompt-evaluation
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release of the prompt-evaluation skill.
  • Evaluate and benchmark AI prompts for clarity, specificity, robustness, and cost-efficiency.
  • Score prompts, compare variants with A/B tests, and measure output consistency.
  • Run regression testing to detect quality changes across prompt versions.
  • Estimate and compare token usage and cost for different prompts and models.
  • Designed for prompt engineering, quality assurance, and cost optimization.
Metadata
Slug skylv-prompt-evaluation
Version 1.0.0
License MIT-0
All-time Installs 1
Active Installs 1
Total Versions 1
Frequently Asked Questions

What is Skylv Prompt Evaluation?

Skylv Prompt Evaluation is an AI agent skill for Claude Code / OpenClaw that evaluates and benchmarks AI prompts for quality, consistency, and performance. Triggers: prompt evaluation, prompt testing, prompt quality, prompt benchmark, p... It has 47 downloads so far.

How do I install Skylv Prompt Evaluation?

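Run "/install skylv-prompt-evaluation" in the OpenClaw or Claude Code chat to install it in one step. The reviewed package contains only SKILL.md, so the `node evaluate.js` commands shown in the README require a separate implementation.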

Is Skylv Prompt Evaluation free?

Yes, Skylv Prompt Evaluation is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Skylv Prompt Evaluation support?

Skylv Prompt Evaluation is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Skylv Prompt Evaluation?

It is built and maintained by SKY-lv (@sky-lv); the current version is v1.0.0.
