← Back to Skills Marketplace
rag-eval
by
Jonathan Jing
· GitHub ↗
· v1.2.1
700
Downloads
2
Stars
0
Active Installs
6
Versions
Install in OpenClaw
/install rag-eval
Description
Evaluate your RAG pipeline quality using Ragas metrics (faithfulness, answer relevancy, context precision).
Usage Guidance
This skill appears to do what it claims, but take these precautions before installing or running it: 1) Inspect the included scripts locally (scripts/run_eval.py, scripts/batch_eval.py, scripts/setup.sh) — don't run arbitrary shell scripts without review. 2) Use a Python virtual environment (python -m venv .venv; source .venv/bin/activate) before running setup.sh to avoid global pip installs. 3) Protect your LLM keys — the skill uses OPENAI_API_KEY/ANTHROPIC_API_KEY to call remote LLMs; grant least-privilege keys where possible and monitor usage. 4) The tool writes evaluation files to memory/eval-results in the working directory; verify this location suits your data-retention policies. 5) There is a truncated/possibly buggy section in the provided run_eval.py excerpt (the sample here was truncated) — ensure you have the complete, reviewed script before running explain/advanced features. 6) Expect runtime costs for LLM judge calls. If you need higher assurance, ask for a line-by-line code review or a reproducible test run in an isolated environment.
Capability Analysis
Type: OpenClaw Skill
Name: rag-eval
Version: 1.2.1
The skill is classified as suspicious due to a potential prompt injection vulnerability against the LLM judge in `scripts/run_eval.py`. The `explain_faithfulness` function constructs a prompt using user-controlled `answer` and `contexts`, which could allow an attacker to manipulate the LLM judge's behavior or extract information from it. While the skill developers have demonstrated good security practices by explicitly fixing a previous shell injection vulnerability (noted in `CHANGELOG.md`) and implementing secure input handling (JSON parsing instead of shell interpolation, as instructed in `SKILL.md`), the LLM interaction pattern remains a vulnerability inherent to its design.
Capability Assessment
Purpose & Capability
Name/description (RAG evaluation with Ragas) aligns with code and instructions. Declared required binaries (python3, pip), optional env vars (OPENAI/ANTHROPIC/RAGAS_LLM), and the included scripts (run_eval.py, batch_eval.py, setup.sh) are all appropriate for performing LLM-judged RAG evals.
Instruction Scope
SKILL.md instructs the agent to accept question/answer/contexts, write a temp JSON file, and call the provided Python scripts; it explicitly warns against shell-injecting user content. The scripts only reference expected files/paths (memory/eval-results) and expected env vars. No instructions request unrelated system data or unrelated credentials.
Install Mechanism
No registry install spec is provided; the included scripts/setup.sh installs dependencies via pip from public PyPI packages (ragas, datasets, langchain integrations). This is expected for a Python tool but may modify system Python if a virtualenv isn't used. No downloads from untrusted URLs or URL-shortened installers were found.
Credentials
Requested environment access is limited to LLM-related keys and optional RAGAS_* tuning variables. These are justified by the skill's need to call an LLM judge and (optionally) embeddings. No unrelated secrets or multiple unrelated service credentials are requested.
Persistence & Privilege
The skill does not request always:true and does not modify other skills. It persists evaluation outputs under memory/eval-results (expected for a reporting tool). The setup script may install packages on the host environment but does not request elevated system privileges.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat:
/install rag-eval - After installation, invoke the skill by name or use
/rag-eval - Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.2.1
Fixed versioning regression and added simplified installation instructions.
v0.1.1
Added simplified installation instructions to SKILL.md and README.md.
v1.2.0
SECURITY FIX: Shell injection vulnerability in SKILL.md instructions. Agent was instructed to echo user content directly into shell — replaced with safe file-based input pattern. Added --input-file flag to run_eval.py. Updated examples to recommend file input over stdin piping.
v1.1.1
Fix: setup.sh warns about global pip install and recommends virtualenv. Clean up PRD (remove unimplemented Discord/Notion claims). No code bugs found in explain_faithfulness (response var is consistent).
v1.1.0
Fix: declare LLM API key requirement in skill metadata (anyEnv: OPENAI_API_KEY | ANTHROPIC_API_KEY | RAGAS_LLM). Document all optional RAGAS_* env vars. Clearer prerequisites in SKILL.md and README.
v1.0.0
Initial release: Ragas-based RAG pipeline quality testing (faithfulness, relevancy, context precision). Single + batch eval modes.
Metadata
Frequently Asked Questions
What is rag-eval?
Evaluate your RAG pipeline quality using Ragas metrics (faithfulness, answer relevancy, context precision). It is an AI Agent Skill for Claude Code / OpenClaw, with 700 downloads so far.
How do I install rag-eval?
Run "/install rag-eval" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is rag-eval free?
Yes, rag-eval is completely free (open-source). You can download, install and use it at no cost.
Which platforms does rag-eval support?
rag-eval is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).
Who created rag-eval?
It is built and maintained by Jonathan Jing (@jonathanjing); the current version is v1.2.1.
More Skills