
Skylv Prompt Evaluation

Name: Skylv Prompt Evaluation
Author: sky-lv

by SKY-lv · v1.0.0 · MIT-0

cross-platform ✓ Security Clean

Install in OpenClaw

/install skylv-prompt-evaluation

Description

Evaluate and benchmark AI prompts for quality, consistency, and performance. Triggers: prompt evaluation, prompt testing, prompt quality, prompt benchmark, p...

README (SKILL.md)

Prompt Evaluation

Evaluate and benchmark AI prompts for quality, consistency, and performance. Score, compare, and optimize your prompts systematically.

Overview

A prompt evaluation framework that helps agents measure prompt quality across multiple dimensions: clarity, specificity, robustness, cost-efficiency, and output consistency. Compare prompt variants and find the optimal version.

Capabilities

1. Quality Scoring

node evaluate.js score --prompt "Summarize the article" --dimensions clarity,specificity,robustness
node evaluate.js score --prompt-file ./prompts/ --output scores.json

Scores prompts on clarity (0-10), specificity (0-10), robustness (0-10), and cost-efficiency (0-10).
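
The listing does not say how the per-dimension scores are combined, so the following is only a minimal sketch of one way to do it: a weighted average of the 0-10 scores. The function name `aggregateScore` and the weights are hypothetical and are not taken from the skill.

```javascript
// Hypothetical helper: combine per-dimension scores (each 0-10) into one number.
// The weighting scheme is an illustrative assumption, not the skill's documented behavior.
function aggregateScore(scores, weights = {}) {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const [dimension, score] of Object.entries(scores)) {
    if (!Number.isFinite(score) || score < 0 || score > 10) {
      throw new RangeError(`Score for "${dimension}" must be a number from 0 to 10`);
    }
    const weight = weights[dimension] ?? 1; // dimensions without a weight count once
    weightedSum += score * weight;
    totalWeight += weight;
  }
  if (totalWeight === 0) {
    throw new Error("At least one dimension with a positive weight is required");
  }
  return weightedSum / totalWeight;
}

// Robustness counts double in this example.
const overallScore = aggregateScore(
  { clarity: 8, specificity: 6, robustness: 7, "cost-efficiency": 9 },
  { robustness: 2 }
);
console.log(overallScore.toFixed(2)); // 7.40
```

A weighted average keeps the result on the same 0-10 scale as the inputs, which makes it easy to compare against the individual dimensions.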

2. A/B Comparison

node evaluate.js compare --prompt-a "Summarize" --prompt-b "Write a 3-bullet summary" --trials 50
node evaluate.js compare --config ab-test-config.json

Run statistical A/B tests between prompt variants with significance analysis.
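
The listing does not specify which statistical test is used. As a sketch under the assumption that each trial is judged pass or fail, a two-sided two-proportion z-test at the 0.05 level could look like this. The function name `compareSuccessRates` is hypothetical.

```javascript
// Hypothetical sketch: two-sided two-proportion z-test.
// Assumes every trial yields a pass/fail outcome; the skill's actual test is not documented.
const Z_CRITICAL_95 = 1.959964; // two-sided critical value for a 0.05 significance level

function compareSuccessRates(successesA, trialsA, successesB, trialsB) {
  const rateA = successesA / trialsA;
  const rateB = successesB / trialsB;
  // Pooled success rate under the null hypothesis that both prompts perform the same.
  const pooled = (successesA + successesB) / (trialsA + trialsB);
  const standardError = Math.sqrt(pooled * (1 - pooled) * (1 / trialsA + 1 / trialsB));
  // When every trial passed or every trial failed, the rates are identical, so z is 0.
  const z = standardError === 0 ? 0 : (rateB - rateA) / standardError;
  return { rateA, rateB, z, significant: Math.abs(z) > Z_CRITICAL_95 };
}

// Prompt A passed 30 of 50 trials, prompt B passed 42 of 50.
const abResult = compareSuccessRates(30, 50, 42, 50);
console.log(abResult.z.toFixed(2), abResult.significant); // 2.67 true
```

With only 50 trials per variant, small differences in pass rate will usually not reach significance, so the trial count deserves as much attention as the prompts themselves.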

3. Consistency Check

node evaluate.js consistency --prompt "Translate to French" --runs 100 --variance-threshold 0.15
node evaluate.js consistency --temperature 0.7 --top-p 0.9

Measures output consistency across multiple runs to find the most stable prompts.
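
What the variance threshold is measured against is not stated. The sketch below assumes each run has already been reduced to a numeric score and checks the sample variance of those scores against the threshold. The names `sampleVariance` and `isConsistent` are hypothetical.

```javascript
// Hypothetical sketch: flag a prompt as consistent when the sample variance
// of its per-run scores stays at or below the threshold.
// Assumes each run has been reduced to one numeric score beforehand.
function sampleVariance(values) {
  if (values.length < 2) {
    throw new Error("At least two runs are needed to compute a variance");
  }
  const mean = values.reduce((sum, value) => sum + value, 0) / values.length;
  const squaredDeviations = values.reduce((sum, value) => sum + (value - mean) ** 2, 0);
  return squaredDeviations / (values.length - 1); // unbiased (n - 1) estimator
}

function isConsistent(runScores, varianceThreshold = 0.15) {
  return sampleVariance(runScores) <= varianceThreshold;
}

console.log(isConsistent([0.8, 0.9, 0.85, 0.9])); // true
console.log(isConsistent([0.1, 0.9, 0.2, 1.0])); // false
```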

4. Regression Testing

node evaluate.js regression --baseline v1.0 --current v1.1 --test-suite golden-set.jsonl
node evaluate.js regression --fail-on-degradation 5%

Detects quality regressions between prompt versions using golden test sets.
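
How degradation is computed is not documented. One plausible reading, sketched here as an assumption, is the relative drop in mean score on the golden set between the baseline and the current version. The names `parsePercent` and `checkRegression` are hypothetical.

```javascript
// Hypothetical sketch: fail when the mean golden-set score drops by more than
// the given percentage relative to the baseline. The metric is an assumption.
function parsePercent(text) {
  const match = /^(\d+(?:\.\d+)?)%$/.exec(String(text).trim());
  if (!match) {
    throw new Error(`Expected a percentage such as "5%", got "${text}"`);
  }
  return Number(match[1]) / 100;
}

function meanScore(values) {
  if (values.length === 0) throw new Error("Score list must not be empty");
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

function checkRegression(baselineScores, currentScores, threshold = "5%") {
  const baselineMean = meanScore(baselineScores);
  const currentMean = meanScore(currentScores);
  // Positive degradation means the current version scores lower than the baseline.
  const degradation = baselineMean === 0 ? 0 : (baselineMean - currentMean) / baselineMean;
  return { degradation, failed: degradation > parsePercent(threshold) };
}

console.log(checkRegression([8, 9, 7], [7, 8, 7]).failed); // true (about 8.3% lower)
console.log(checkRegression([8, 9, 7], [8, 9, 6.8]).failed); // false (about 0.8% lower)
```

In a CI job, a `failed` result would typically be turned into a non-zero exit code so that the degraded prompt version is blocked.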

5. Cost Analysis

node evaluate.js cost --prompt "Long prompt..." --model gpt-4 --estimate-tokens
node evaluate.js cost --compare-prompts --output cost-report.csv

Estimates token usage and costs for different prompt variants and models.
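
The estimation method is not described. A rough sketch, assuming the common rule of thumb of about four characters per token for English text and a caller-supplied price, might look like this. The names `estimateTokens` and `estimateCost` and the price in the example are hypothetical; a real tokenizer and the provider's current price list would give more accurate figures.

```javascript
// Hypothetical sketch: rough token and cost estimate.
// The 4-characters-per-token ratio is only a rule of thumb for English text.
function estimateTokens(text, charsPerToken = 4) {
  return Math.ceil(text.length / charsPerToken);
}

// pricePerThousandTokens must be supplied by the caller; no real price is assumed here.
function estimateCost(text, pricePerThousandTokens) {
  return (estimateTokens(text) / 1000) * pricePerThousandTokens;
}

const shortPrompt = "Summarize the article";
console.log(estimateTokens(shortPrompt)); // 6
console.log(estimateCost("a".repeat(4000), 0.5)); // 0.5 (1000 estimated tokens at a made-up price)
```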

Configuration

{
  "evaluation": {
    "dimensions": ["clarity", "specificity", "robustness", "cost"],
    "scoringModel": "gpt-4",
    "abTest": {
      "trials": 50,
      "significanceLevel": 0.05
    },
    "consistency": {
      "runs": 100,
      "varianceThreshold": 0.15
    },
    "regression": {
      "degradationThreshold": "5%",
      "goldenSet": "./golden-set.jsonl"
    }
  }
}
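
Since no implementation ships with the package, how this configuration is validated is unknown. The sketch below shows one way a loader could sanity-check the fields shown above before running anything. The function name `validateEvaluationConfig` and the specific rules are assumptions.

```javascript
// Hypothetical sketch: sanity-check an evaluation config object.
// Returns a list of problems; an empty list means the config passed these checks.
function validateEvaluationConfig(config) {
  const evaluation = config?.evaluation;
  if (!evaluation) return ['missing "evaluation" section'];

  const errors = [];
  if (!Array.isArray(evaluation.dimensions) || evaluation.dimensions.length === 0) {
    errors.push("dimensions must be a non-empty array");
  }
  const level = evaluation.abTest?.significanceLevel;
  if (level !== undefined && !(level > 0 && level < 1)) {
    errors.push("abTest.significanceLevel must be between 0 and 1");
  }
  const runs = evaluation.consistency?.runs;
  if (runs !== undefined && !(Number.isInteger(runs) && runs > 0)) {
    errors.push("consistency.runs must be a positive integer");
  }
  const threshold = evaluation.regression?.degradationThreshold;
  if (threshold !== undefined && !/^\d+(\.\d+)?%$/.test(threshold)) {
    errors.push('regression.degradationThreshold must look like "5%"');
  }
  return errors;
}

const exampleConfig = {
  evaluation: {
    dimensions: ["clarity", "specificity", "robustness", "cost"],
    abTest: { trials: 50, significanceLevel: 0.05 },
    consistency: { runs: 100, varianceThreshold: 0.15 },
    regression: { degradationThreshold: "5%", goldenSet: "./golden-set.jsonl" },
  },
};
console.log(validateEvaluationConfig(exampleConfig)); // []
```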

Use Cases

Prompt Engineering: Systematically improve prompt quality
Quality Assurance: Ensure prompts meet quality standards before production
Cost Optimization: Find prompts that achieve goals with fewer tokens
Version Control: Track prompt quality across versions
Agent Tuning: Optimize agent system prompts for consistency

Usage Guidance

This skill appears benign as an instruction-only prompt-evaluation description. Before installing or using it, be aware that the reviewed package does not include the `evaluate.js` code shown in examples, and do not run any external evaluator or submit confidential prompts until you know exactly what code and model provider will process them.

Capability Analysis

Type: OpenClaw Skill
Name: skylv-prompt-evaluation
Version: 1.0.0

The skill bundle describes a prompt evaluation framework for benchmarking AI prompts. The documentation in SKILL.md and metadata in _meta.json outline standard CLI-based functionalities such as quality scoring, A/B testing, and cost analysis. There are no indicators of malicious intent, prompt injection attacks, or suspicious behaviors in the provided files.

Capability Assessment

ℹ Purpose & Capability

The stated purpose—evaluating, comparing, and optimizing prompts—is coherent and low-risk, but the advertised CLI capabilities are not verifiable because the package contains only SKILL.md.

✓ Instruction Scope

The instructions are user-directed examples for scoring, comparison, consistency, regression, and cost analysis; they do not override user intent, demand hidden behavior, or encourage destructive actions.

ℹ Install Mechanism

There is no install spec and no code, yet the documentation shows commands such as `node evaluate.js`; users would need a separate implementation that was not included in the reviewed artifacts.

ℹ Credentials

The examples read prompt files and golden test sets and write reports, which is proportionate for prompt evaluation, but users should avoid running it on confidential prompts unless they understand where evaluations are processed.

✓ Persistence & Privilege

The artifacts declare no credentials, required environment variables, background services, persistent memory, privileged paths, or autonomous long-running behavior.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install skylv-prompt-evaluation
After installation, invoke the skill by name or use /skylv-prompt-evaluation
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release of the prompt-evaluation skill.

- Evaluate and benchmark AI prompts for clarity, specificity, robustness, and cost-efficiency.
- Score prompts, compare variants with A/B tests, and measure output consistency.
- Run regression testing to detect quality changes across prompt versions.
- Estimate and compare token usage and cost for different prompts and models.
- Designed for prompt engineering, quality assurance, and cost optimization.

Metadata

Slug skylv-prompt-evaluation

Version 1.0.0

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 1

Frequently Asked Questions

What is Skylv Prompt Evaluation?

Skylv Prompt Evaluation is an AI Agent Skill for Claude Code / OpenClaw that evaluates and benchmarks AI prompts for quality, consistency, and performance. It has 47 downloads so far.

How do I install Skylv Prompt Evaluation?

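Run "/install skylv-prompt-evaluation" in the OpenClaw or Claude Code chat to install it in one step. Note that the reviewed package contains only SKILL.md, so the `node evaluate.js` commands shown in the README require an implementation that is supplied separately.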

Is Skylv Prompt Evaluation free?

Yes, Skylv Prompt Evaluation is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Skylv Prompt Evaluation support?

Skylv Prompt Evaluation is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Skylv Prompt Evaluation?

It is built and maintained by SKY-lv (@sky-lv); the current version is v1.0.0.
