Llm Evaluator
/install llm-evaluator
LLM Evaluator ⚖️
LLM-as-a-Judge evaluation system powered by Langfuse. Uses GPT-5-nano to score AI outputs.
When to Use
- Evaluating quality of search results or AI responses
- Scoring traces for relevance, accuracy, hallucination detection
- Batch scoring recent unscored traces
- Quality assurance on agent outputs
Usage
# Test with sample cases
python3 {baseDir}/scripts/evaluator.py test
# Score a specific Langfuse trace
python3 {baseDir}/scripts/evaluator.py score <trace_id>
# Score with specific evaluator only
python3 {baseDir}/scripts/evaluator.py score <trace_id> --evaluators relevance
# Backfill scores on recent unscored traces
python3 {baseDir}/scripts/evaluator.py backfill --limit 20
Evaluators
| Evaluator | Measures | Scale |
|---|---|---|
| relevance | Response relevance to query | 0–1 |
| accuracy | Factual correctness | 0–1 |
| hallucination | Made-up information detection | 0–1 |
| helpfulness | Overall usefulness | 0–1 |
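To illustrate how an LLM-as-a-Judge evaluator can produce a 0–1 score for each row in the table, here is a rough sketch. It is not the skill's actual implementation: the prompt template, the function names, and the `judge` callable (a stand-in for whatever sends the prompt to the judge model) are all assumptions made for illustration. Only the evaluator names and descriptions come from the table.

```python
import re
from typing import Callable, Dict

# Evaluator names and what they measure, copied from the table above.
EVALUATORS: Dict[str, str] = {
    "relevance": "Response relevance to query",
    "accuracy": "Factual correctness",
    "hallucination": "Made-up information detection",
    "helpfulness": "Overall usefulness",
}


def build_judge_prompt(evaluator: str, query: str, response: str) -> str:
    """Hypothetical prompt template; the skill's real prompts are not shown here."""
    criterion = EVALUATORS[evaluator]
    return (
        f"You are grading an AI response on: {criterion}.\n"
        f"Query: {query}\n"
        f"Response: {response}\n"
        "Reply with a single number between 0 and 1."
    )


def parse_score(reply: str) -> float:
    """Take the first number in the judge's reply and clamp it to the 0-1 scale."""
    match = re.search(r"-?\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no numeric score in judge reply: {reply!r}")
    return min(1.0, max(0.0, float(match.group())))


def evaluate(query: str, response: str,
             judge: Callable[[str], str]) -> Dict[str, float]:
    """Score one query/response pair with every evaluator.

    `judge` is an assumed callable that sends a prompt to the judge model
    and returns its text reply.
    """
    return {
        name: parse_score(judge(build_judge_prompt(name, query, response)))
        for name in EVALUATORS
    }
```

Clamping in `parse_score` keeps every result on the 0–1 scale even when the judge replies with an out-of-range number, and raising on a reply with no number makes malformed judge output visible instead of silently recording a default score.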
Credits
Built by M. Abidi | agxntsix.ai | YouTube | GitHub. Part of the AgxntSix Skill Suite for OpenClaw agents.
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install llm-evaluator
- After installation, invoke the skill by name or use /llm-evaluator
- Provide required inputs per the skill's parameter spec and get structured output
What is Llm Evaluator?
LLM-as-a-Judge evaluation system using Langfuse. Score AI outputs on relevance, accuracy, hallucination, and helpfulness. Backfill scoring on historical trac... It is an AI Agent Skill for Claude Code / OpenClaw, with 375 downloads so far.
How do I install Llm Evaluator?
Run "/install llm-evaluator" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Llm Evaluator free?
Yes, Llm Evaluator is completely free (open-source). You can download, install and use it at no cost.
Which platforms does Llm Evaluator support?
Llm Evaluator is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Llm Evaluator?
It is built and maintained by aiwithabidi (@aiwithabidi); the current version is v1.0.0.