
prompt-eval

Name: prompt-eval
Author: rivin-dong

by Rivin-Dong · GitHub ↗ · v1.0.1 · MIT-0

cross-platform ✓ Security Clean

203 Downloads

Install in OpenClaw

/install prompt-eval

Description

Evaluate and optimize any AI prompt (`prompt_a`) with a 6-step pipeline: test plan, ~50 test cases, prompt execution, evaluator prompt (`prompt_b`), automate...

Usage Guidance

This skill is internally consistent with its purpose: it is an instruction-only prompt-evaluation pipeline that asks for no credentials and performs no installs. Before installing or running it, review the SKILL.md and the included evaluator templates (references/prompt_b_guide.md) because they contain role-style instructions and example evaluator prompts that could influence an agent's behavior. If you'll evaluate prompts that contain secrets or highly sensitive data, run the skill first on innocuous prompts to confirm outputs, and consider running it with autonomous invocation disabled (or restrict the agent's ability to call the skill) until you're comfortable. Also confirm you are okay with the skill writing results to ./prompt-eval-results/ or specify a different output directory.

Capability Analysis

Type: OpenClaw Skill

Name: prompt-eval

Version: 1.0.1

The `prompt-eval` skill is a comprehensive framework for benchmarking and optimizing AI prompts through a multi-step pipeline involving test generation, execution via subagents, and automated scoring. It includes detailed instructions in `SKILL.md` for safety evaluations (e.g., checking for prohibited content and prompt injection) and provides transparent reporting via CSV and JSON files as documented in `references/json_schema.md` and `references/prompt_b_guide.md`. No indicators of data exfiltration, malicious execution, or unauthorized persistence were found; the logic is entirely consistent with its stated purpose of prompt quality assurance and includes user-confirmation gates for safety.

Capability Tags

crypto

can-make-purchases

Capability Assessment

✓ Purpose & Capability

Name/description describe a prompt-evaluation pipeline and the SKILL.md plus reference docs fully implement that pipeline (test-plan generation, test-case templates, evaluator prompt, CSV/JSON output conventions). The skill asks for no unrelated binaries, env vars, or config paths — nothing requested is disproportionate to prompt-evaluation.

ℹ Instruction Scope

The SKILL.md gives detailed runtime instructions (generate ~50 test cases, run prompt_a, score outputs, write CSV/JSON to ./prompt-eval-results/). Those actions are appropriate for the stated purpose. The skill instructs the agent to write and persist results in a local folder, which is expected for a QA pipeline, so plan for local disk writes. A prompt-injection pattern ('you-are-now') was detected in the SKILL.md: the skill contains embedded role-style instructions and examples that could influence model behavior. This is not inherently inconsistent with an evaluator, which often includes role templates, but it is worth reviewing before use, especially if you will evaluate sensitive prompts.
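As a rough aid for the manual review suggested above, the following minimal sketch flags lines that contain role-style phrases. The phrase list and the function name `find_role_phrases` are assumptions made for illustration; the scanner that reported the 'you-are-now' pattern is not documented on this page, and this sketch does not reproduce it.

```python
import re

# Hypothetical phrase list, chosen for illustration only.
ROLE_PATTERNS = [
    r"\byou are now\b",
    r"\bignore (?:all |any )?(?:previous|prior) instructions\b",
    r"\bact as\b",
]


def find_role_phrases(text):
    """Return (line_number, stripped_line) pairs that match any pattern."""
    compiled = [re.compile(p, re.IGNORECASE) for p in ROLE_PATTERNS]
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(rx.search(line) for rx in compiled):
            hits.append((number, line.strip()))
    return hits


# Made-up sample text standing in for a skill file.
sample = (
    "Step 1: build a test plan.\n"
    "You are now a strict evaluator.\n"
    "Write results to CSV."
)
for number, line in find_role_phrases(sample):
    print(f"line {number}: {line}")
```

To check a real file, read it first (for example with `pathlib.Path("SKILL.md").read_text()`, assuming that is where the file lives locally) and pass the text to `find_role_phrases`. A match only marks a line for human review; it does not indicate that the line is malicious.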

✓ Install Mechanism

Instruction-only skill with no install spec and no code files. Lowest-risk install profile: nothing is downloaded or written by an installer beyond what the agent itself may write during normal operation.

✓ Credentials

The skill declares no required environment variables, no credentials, and no config paths. That is appropriate for a purely instruction-based prompt-evaluation tool and matches its described functionality.

✓ Persistence & Privilege

always:false and no special privileges are requested. The skill writes its own results directory (./prompt-eval-results/), which is reasonable for an evaluation tool. The skill does not modify other skills or request system-wide configuration changes.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install prompt-eval
After installation, invoke the skill by name or use /prompt-eval
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.1

**Changelog for prompt-eval v1.0.1:**
- Pipeline upgraded to 6 steps: added an optimization-validation loop after scoring, moving from “evaluate + suggest” to “evaluate -> optimize -> validate -> finalize”.
- New Step 6 (Prompt_A Optimization Loop): generate evidence-backed change_id edits, build prompt_a_candidate, run 15-20 case validation subset, apply gates, allow max one extra iteration, output prompt_a_final.
- Final Report expanded to 6 sections:
  - Section 4 changed to prompt_a_candidate (not final yet)
  - Added Section 5 Iteration Validation (baseline vs candidate + gate results)
  - Added Section 6 Final Deliverable Prompt (prompt_a_final + traceability table)
- Validation gates added: require P0-related score=1 to be zero, core TP avg improvement, pass-rate lift, and no safety regression (when safety TP exists).
- New output artifacts:
  - prompt-eval-results/prompt_change_spec.csv
  - prompt-eval-results/prompt_iteration_summary.csv
  - prompt-eval-results/prompt_a_final.txt
- Metadata updated: frontmatter description now reflects optimization loop + final validated prompt output, and was shortened to satisfy validator limits.
- Validation status: skill passes quick_validate.py after installing PyYAML in an isolated virtual environment.
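The validation gates listed in this changelog can be pictured with a small sketch. Everything named here is hypothetical: the `RunSummary` fields, the `check_gates` function, and the choice of strict improvement as the pass condition are assumptions, since the changelog states neither the skill's actual data layout nor its thresholds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunSummary:
    """Hypothetical per-run aggregates; field names are illustrative."""
    p0_score1_count: int      # P0-related cases that scored 1
    core_tp_avg: float        # average score over core TPs
    pass_rate: float          # fraction of cases that passed
    safety_tp_avg: Optional[float] = None  # None when no safety TP exists


def check_gates(baseline, candidate):
    """Return a dict of gate name -> bool for the candidate prompt."""
    gates = {
        # Assumption: the candidate must have no P0-related score=1 cases.
        "p0_score1_zero": candidate.p0_score1_count == 0,
        # Assumption: any strict increase counts as an improvement.
        "core_tp_avg_improved": candidate.core_tp_avg > baseline.core_tp_avg,
        "pass_rate_lift": candidate.pass_rate > baseline.pass_rate,
    }
    # The safety gate applies only when both runs include a safety TP.
    if baseline.safety_tp_avg is not None and candidate.safety_tp_avg is not None:
        gates["no_safety_regression"] = (
            candidate.safety_tp_avg >= baseline.safety_tp_avg
        )
    return gates


# Made-up numbers for illustration only.
baseline = RunSummary(p0_score1_count=2, core_tp_avg=3.4, pass_rate=0.70, safety_tp_avg=4.5)
candidate = RunSummary(p0_score1_count=0, core_tp_avg=3.9, pass_rate=0.80, safety_tp_avg=4.5)

gates = check_gates(baseline, candidate)
print(gates)
print("promote candidate:", all(gates.values()))
```

Returning one boolean per gate, rather than a single pass/fail value, keeps the result easy to report next to the baseline vs candidate comparison that the changelog describes for Section 5.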

v1.0.0

**prompt-eval v1.0.0 Changelog**
- Initial release of the skill.
- Automatically evaluates and scores any AI prompt using a structured 5-step pipeline.
- Covers both quantitative (format, logic, rules) and qualitative (engagement, persuasiveness, appeal) evaluation.
- Mandatory safety evaluation included for every run.
- Runs 200+ test cases per evaluation; outputs results as CSV files plus a final report with actionable insights.
- Designed for prompt benchmarking, quality measurement, test case generation, and automated prompt QA.

Metadata

Slug prompt-eval

Version 1.0.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is prompt-eval?

Evaluate and optimize any AI prompt (`prompt_a`) with a 6-step pipeline: test plan, ~50 test cases, prompt execution, evaluator prompt (`prompt_b`), automate... It is an AI Agent Skill for Claude Code / OpenClaw, with 203 downloads so far.

How do I install prompt-eval?

Run "/install prompt-eval" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is prompt-eval free?

Yes, prompt-eval is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does prompt-eval support?

prompt-eval is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created prompt-eval?

It is built and maintained by Rivin-Dong (@rivin-dong); the current version is v1.0.1.
