prompt-eval
Author: Rivin-Dong · v1.0.1 · MIT-0
Total downloads: 203
Favorites: 3
Current installs: 0
Versions: 2
Install in OpenClaw
/install prompt-eval
Description
Evaluate and optimize any AI prompt (`prompt_a`) with a 6-step pipeline: test plan, ~50 test cases, prompt execution, evaluator prompt (`prompt_b`), automate...
Safe usage recommendations
This skill is internally consistent with its purpose: it is an instruction-only prompt-evaluation pipeline that asks for no credentials and performs no installs. Before installing or running it, review the SKILL.md and the included evaluator templates (references/prompt_b_guide.md) because they contain role-style instructions and example evaluator prompts that could influence an agent's behavior. If you'll evaluate prompts that contain secrets or highly sensitive data, run the skill first on innocuous prompts to confirm outputs, and consider running it with autonomous invocation disabled (or restrict the agent's ability to call the skill) until you're comfortable. Also confirm you are okay with the skill writing results to ./prompt-eval-results/ or specify a different output directory.
Functional analysis
Type: OpenClaw Skill
Name: prompt-eval
Version: 1.0.1
The `prompt-eval` skill is a comprehensive framework for benchmarking and optimizing AI prompts through a multi-step pipeline involving test generation, execution via subagents, and automated scoring. It includes detailed instructions in `SKILL.md` for safety evaluations (e.g., checking for prohibited content and prompt injection) and provides transparent reporting via CSV and JSON files as documented in `references/json_schema.md` and `references/prompt_b_guide.md`. No indicators of data exfiltration, malicious execution, or unauthorized persistence were found; the logic is entirely consistent with its stated purpose of prompt quality assurance and includes user-confirmation gates for safety.
Capability assessment
Purpose & Capability
Name/description describe a prompt-evaluation pipeline and the SKILL.md plus reference docs fully implement that pipeline (test-plan generation, test-case templates, evaluator prompt, CSV/JSON output conventions). The skill asks for no unrelated binaries, env vars, or config paths — nothing requested is disproportionate to prompt-evaluation.
Instruction Scope
The SKILL.md gives detailed runtime instructions (generate ~50 test cases, run prompt_a, score outputs, write CSV/JSON to ./prompt-eval-results/). Those actions are appropriate for the stated purpose. It does instruct the agent to write files to a local folder and to persist results; this is expected for a QA pipeline but users should expect local disk writes. A prompt-injection pattern ('you-are-now') was detected in the SKILL.md — the skill contains embedded role-style instructions/examples that could attempt to influence model behavior; this is not inherently inconsistent with an evaluator (which often includes role templates), but it is worth reviewing before use, especially if you will evaluate sensitive prompts.
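One way to do the suggested review is to scan the skill's text for role-style phrases before running it. The sketch below is illustrative only: the pattern list and the `find_role_phrases` helper are assumptions, not the rules the marketplace scanner actually uses.

```python
import re

# Hypothetical phrase list for illustration; the real scanner's rules are not documented here.
ROLE_PATTERNS = [
    r"\byou are now\b",
    r"\bignore (?:all |any )?(?:previous|prior) instructions\b",
    r"\bact as\b",
]


def find_role_phrases(text: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each line matching any pattern."""
    compiled = [re.compile(p, re.IGNORECASE) for p in ROLE_PATTERNS]
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(rx.search(line) for rx in compiled):
            hits.append((number, line.strip()))
    return hits


# Stand-in text; in practice you would pass the contents of SKILL.md.
sample = (
    "Step 1: build the test plan.\n"
    "You are now a strict evaluator.\n"
    "Score each output."
)
for number, line in find_role_phrases(sample):
    print(f"line {number}: {line}")
```

A match is not proof of a problem. Evaluator templates legitimately contain role instructions, so the output is a list of lines to read, not a verdict.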
Install Mechanism
Instruction-only skill with no install spec and no code files. Lowest-risk install profile: nothing is downloaded or written by an installer beyond what the agent itself may write during normal operation.
Credentials
The skill declares no required environment variables, no credentials, and no config paths. That is appropriate for a purely instruction-based prompt-evaluation tool and matches its described functionality.
Persistence & Privilege
always:false and no special privileges are requested. The skill writes its own results directory (./prompt-eval-results/), which is reasonable for an evaluation tool. The skill does not modify other skills or request system-wide configuration changes.
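After a run, you can check what landed in the results directory. This minimal sketch uses the three file names listed in the v1.0.1 changelog; the `missing_artifacts` helper is hypothetical, and the skill may write other files that are not checked here.

```python
from pathlib import Path

# File names taken from the v1.0.1 changelog; other outputs may exist.
EXPECTED_ARTIFACTS = [
    "prompt_change_spec.csv",
    "prompt_iteration_summary.csv",
    "prompt_a_final.txt",
]


def missing_artifacts(results_dir: str = "./prompt-eval-results") -> list[str]:
    """Return the expected artifact names that are not present as files."""
    base = Path(results_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (base / name).is_file()]


missing = missing_artifacts()
print("missing:", missing if missing else "none")
```

If you specified a different output directory, pass that path instead of the default.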
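How to use
- Make sure OpenClaw is installed (local or Docker deployment)
- Enter the install command in the chat box: /install prompt-eval
- Once installed, call the skill by name or trigger it with /prompt-eval
- Provide the inputs described in the skill's parameter notes to get structured output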
Version history
v1.0.1
**Changelog for prompt-eval v1.0.1:**
Pipeline upgraded to 6 steps: added an optimization-validation loop after scoring, moving from “evaluate + suggest” to “evaluate -> optimize -> validate -> finalize”.
New Step 6 — Prompt_A Optimization Loop: generate evidence-backed change_id edits, build prompt_a_candidate, run 15-20 case validation subset, apply gates, allow max one extra iteration, output prompt_a_final.
Final Report expanded to 6 sections:
Section 4 changed to prompt_a_candidate (not final yet)
Added Section 5 Iteration Validation (baseline vs candidate + gate results)
Added Section 6 Final Deliverable Prompt (prompt_a_final + traceability table)
Validation gates added: require P0-related score=1 to be zero, core TP avg improvement, pass-rate lift, and no safety regression (when safety TP exists).
New output artifacts:
prompt-eval-results/prompt_change_spec.csv
prompt-eval-results/prompt_iteration_summary.csv
prompt-eval-results/prompt_a_final.txt
Metadata updated: frontmatter description now reflects optimization loop + final validated prompt output, and was shortened to satisfy validator limits.
Validation status: skill passes quick_validate.py after installing PyYAML in an isolated virtual environment.
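The validation gates listed in this changelog can be pictured as a comparison between a baseline run and a candidate run. This is a rough sketch under stated assumptions: the `RunSummary` fields, the `check_gates` function, and the strict "greater than" thresholds are all hypothetical, since the changelog names the gates but not how the skill computes them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunSummary:
    """Hypothetical per-run summary; field names are illustrative."""
    p0_score1_count: int                   # P0-related cases that scored 1
    core_tp_avg: float                     # average score across core TPs
    pass_rate: float                       # fraction of cases passing, 0.0 to 1.0
    safety_tp_avg: Optional[float] = None  # None when no safety TP exists


def check_gates(baseline: RunSummary, candidate: RunSummary) -> dict[str, bool]:
    """Evaluate each gate; thresholds here are assumptions, not the skill's values."""
    gates = {
        "no_p0_score1": candidate.p0_score1_count == 0,
        "core_tp_avg_improved": candidate.core_tp_avg > baseline.core_tp_avg,
        "pass_rate_lifted": candidate.pass_rate > baseline.pass_rate,
    }
    # The safety gate applies only when a safety TP exists in both runs.
    if baseline.safety_tp_avg is not None and candidate.safety_tp_avg is not None:
        gates["no_safety_regression"] = (
            candidate.safety_tp_avg >= baseline.safety_tp_avg
        )
    return gates


baseline = RunSummary(p0_score1_count=2, core_tp_avg=3.4, pass_rate=0.70, safety_tp_avg=4.5)
candidate = RunSummary(p0_score1_count=0, core_tp_avg=3.9, pass_rate=0.85, safety_tp_avg=4.5)
results = check_gates(baseline, candidate)
print(results, "all passed:", all(results.values()))
```

Returning a per-gate dictionary instead of a single boolean makes it easy to see which gate failed when a candidate is rejected.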
v1.0.0
**prompt-eval v1.0.0 Changelog**
- Initial release of the skill.
- Automatically evaluates and scores any AI prompt using a structured 5-step pipeline.
- Covers both quantitative (format, logic, rules) and qualitative (engagement, persuasiveness, appeal) evaluation.
- Mandatory safety evaluation included for every run.
- Runs 200+ test cases per evaluation; outputs results as CSV files plus a final report with actionable insights.
- Designed for prompt benchmarking, quality measurement, test case generation, and automated prompt QA.
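FAQ
What is prompt-eval?
Evaluate and optimize any AI prompt (`prompt_a`) with a 6-step pipeline: test plan, ~50 test cases, prompt execution, evaluator prompt (`prompt_b`), automate... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 203 total downloads to date.
How do I install prompt-eval?
Run the command `/install prompt-eval` in the OpenClaw or Claude Code chat box for a one-step install; no extra configuration is required.
Is prompt-eval free?
Yes. prompt-eval is completely free and released under the MIT-0 license, so you can download, install, and use it freely.
Which platforms does prompt-eval support?
prompt-eval is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who developed prompt-eval?
It is developed and maintained by Rivin-Dong (@rivin-dong); the current version is v1.0.1.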