LLM Evaluator Pro
/install llm-evaluator-pro
LLM Evaluator ⚖️
LLM-as-a-Judge evaluation system powered by Langfuse. Uses GPT-5-nano to score AI outputs.
When to Use
- Evaluating quality of search results or AI responses
- Scoring traces for relevance, accuracy, hallucination detection
- Batch scoring recent unscored traces
- Quality assurance on agent outputs
Usage
# Test with sample cases
python3 {baseDir}/scripts/evaluator.py test
# Score a specific Langfuse trace
python3 {baseDir}/scripts/evaluator.py score <trace_id>
# Score with specific evaluator only
python3 {baseDir}/scripts/evaluator.py score <trace_id> --evaluators relevance
# Backfill scores on recent unscored traces
python3 {baseDir}/scripts/evaluator.py backfill --limit 20
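The backfill command above scores recent traces that have no scores yet. The selection step might look roughly like the sketch below. This is an illustration only, not the skill's actual code: the `select_unscored` helper and the trace dictionary shape (`id`, `timestamp`, `scores`) are hypothetical, and a real run would fetch traces from Langfuse rather than from an in-memory list.

```python
from datetime import datetime
from typing import Dict, List


def select_unscored(traces: List[Dict], limit: int = 20) -> List[Dict]:
    """Return up to `limit` of the most recent traces that have no scores.

    Assumes (hypothetically) that each trace is a dict with a `timestamp`
    and a `scores` list; an empty or missing list means "unscored".
    """
    unscored = [t for t in traces if not t.get("scores")]
    # Newest first, so the limit keeps the most recent traces.
    unscored.sort(key=lambda t: t["timestamp"], reverse=True)
    return unscored[:limit]


# Hypothetical sample data standing in for traces fetched from Langfuse.
traces = [
    {"id": "a", "timestamp": datetime(2025, 1, 1), "scores": []},
    {"id": "b", "timestamp": datetime(2025, 1, 3),
     "scores": [{"name": "relevance", "value": 0.9}]},
    {"id": "c", "timestamp": datetime(2025, 1, 2), "scores": []},
]

print([t["id"] for t in select_unscored(traces, limit=20)])  # ['c', 'a']
```

Sorting before truncating means `--limit 20` would cover the 20 newest unscored traces rather than an arbitrary 20.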
Evaluators
| Evaluator | Measures | Scale |
|---|---|---|
| relevance | Response relevance to query | 0–1 |
| accuracy | Factual correctness | 0–1 |
| hallucination | Made-up information detection | 0–1 |
| helpfulness | Overall usefulness | 0–1 |
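To make the table concrete, here is a minimal sketch of how an LLM-as-a-judge loop could produce one 0–1 score per evaluator. It is an assumption-laden illustration, not the skill's implementation: the prompt wording, the JSON reply format, and the `build_judge_prompt`, `parse_score`, and `evaluate` helpers are all hypothetical, and the judge is passed in as a plain callable so no model or Langfuse call is made here.

```python
import json
from typing import Callable, Dict, Iterable, Optional

# Evaluator names and descriptions, taken from the table above.
EVALUATORS: Dict[str, str] = {
    "relevance": "Response relevance to query",
    "accuracy": "Factual correctness",
    "hallucination": "Made-up information detection",
    "helpfulness": "Overall usefulness",
}


def build_judge_prompt(evaluator: str, query: str, response: str) -> str:
    """Build a (hypothetical) judge prompt for one evaluator."""
    if evaluator not in EVALUATORS:
        raise ValueError(f"unknown evaluator: {evaluator}")
    return (
        f"Criterion: {evaluator} ({EVALUATORS[evaluator]}).\n"
        f"Query: {query}\n"
        f"Response: {response}\n"
        'Reply with JSON only: {"score": <number between 0 and 1>}'
    )


def parse_score(raw: str) -> float:
    """Parse the judge's JSON reply and clamp the score to the 0-1 scale."""
    score = float(json.loads(raw)["score"])
    return max(0.0, min(1.0, score))


def evaluate(
    query: str,
    response: str,
    judge: Callable[[str], str],
    evaluators: Optional[Iterable[str]] = None,
) -> Dict[str, float]:
    """Score a response with each requested evaluator (all four by default)."""
    names = list(evaluators) if evaluators else list(EVALUATORS)
    return {
        name: parse_score(judge(build_judge_prompt(name, query, response)))
        for name in names
    }


# A stub judge standing in for the real model call.
def fake_judge(prompt: str) -> str:
    return '{"score": 0.8}'


print(evaluate("What is Langfuse?", "An LLM observability tool.",
               fake_judge, evaluators=["relevance"]))  # {'relevance': 0.8}
```

Passing a subset of names mirrors the `--evaluators relevance` option. The source does not say whether a higher hallucination score means more or less hallucination, so the sketch leaves that direction unspecified.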
Credits
Built by M. Abidi | agxntsix.ai | YouTube | GitHub
Part of the AgxntSix Skill Suite for OpenClaw agents.
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install llm-evaluator-pro
- After installation, invoke the skill by name or use /llm-evaluator-pro
- Provide required inputs per the skill's parameter spec and get structured output
What is LLM Evaluator Pro?
LLM-as-a-Judge evaluator via Langfuse. Scores traces on relevance, accuracy, hallucination, and helpfulness using GPT-5-nano as judge. Supports single trace... It is an AI Agent Skill for Claude Code / OpenClaw, with 739 downloads so far.
How do I install LLM Evaluator Pro?
Run "/install llm-evaluator-pro" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is LLM Evaluator Pro free?
Yes, LLM Evaluator Pro is completely free (open-source). You can download, install and use it at no cost.
Which platforms does LLM Evaluator Pro support?
LLM Evaluator Pro is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created LLM Evaluator Pro?
It is built and maintained by aiwithabidi (@aiwithabidi); the current version is v1.0.0.