← 返回 Skills 市场

rag-eval

Name: rag-eval
Author: jonathanjing

作者 Jonathan Jing · GitHub ↗ · v1.2.1

cross-platform ⚠ suspicious

700

总下载

当前安装

版本数

在 OpenClaw 中安装

/install rag-eval

功能描述

Evaluate your RAG pipeline quality using Ragas metrics (faithfulness, answer relevancy, context precision).

安全使用建议

This skill appears to do what it claims, but take these precautions before installing or running it: 1) Inspect the included scripts locally (scripts/run_eval.py, scripts/batch_eval.py, scripts/setup.sh) — don't run arbitrary shell scripts without review. 2) Use a Python virtual environment (python -m venv .venv; source .venv/bin/activate) before running setup.sh to avoid global pip installs. 3) Protect your LLM keys — the skill uses OPENAI_API_KEY/ANTHROPIC_API_KEY to call remote LLMs; grant least-privilege keys where possible and monitor usage. 4) The tool writes evaluation files to memory/eval-results in the working directory; verify this location suits your data-retention policies. 5) There is a truncated/possibly buggy section in the provided run_eval.py excerpt (the sample here was truncated) — ensure you have the complete, reviewed script before running explain/advanced features. 6) Expect runtime costs for LLM judge calls. If you need higher assurance, ask for a line-by-line code review or a reproducible test run in an isolated environment.

功能分析

Type: OpenClaw Skill Name: rag-eval Version: 1.2.1 The skill is classified as suspicious due to a potential prompt injection vulnerability against the LLM judge in `scripts/run_eval.py`. The `explain_faithfulness` function constructs a prompt using user-controlled `answer` and `contexts`, which could allow an attacker to manipulate the LLM judge's behavior or extract information from it. While the skill developers have demonstrated good security practices by explicitly fixing a previous shell injection vulnerability (noted in `CHANGELOG.md`) and implementing secure input handling (JSON parsing instead of shell interpolation, as instructed in `SKILL.md`), the LLM interaction pattern remains a vulnerability inherent to its design.

能力评估

✓ Purpose & Capability

Name/description (RAG evaluation with Ragas) aligns with code and instructions. Declared required binaries (python3, pip), optional env vars (OPENAI/ANTHROPIC/RAGAS_LLM), and the included scripts (run_eval.py, batch_eval.py, setup.sh) are all appropriate for performing LLM-judged RAG evals.

✓ Instruction Scope

SKILL.md instructs the agent to accept question/answer/contexts, write a temp JSON file, and call the provided Python scripts; it explicitly warns against shell-injecting user content. The scripts only reference expected files/paths (memory/eval-results) and expected env vars. No instructions request unrelated system data or unrelated credentials.

ℹ Install Mechanism

No registry install spec is provided; the included scripts/setup.sh installs dependencies via pip from public PyPI packages (ragas, datasets, langchain integrations). This is expected for a Python tool but may modify system Python if a virtualenv isn't used. No downloads from untrusted URLs or URL-shortened installers were found.

✓ Credentials

Requested environment access is limited to LLM-related keys and optional RAGAS_* tuning variables. These are justified by the skill's need to call an LLM judge and (optionally) embeddings. No unrelated secrets or multiple unrelated service credentials are requested.

✓ Persistence & Privilege

The skill does not request always:true and does not modify other skills. It persists evaluation outputs under memory/eval-results (expected for a reporting tool). The setup script may install packages on the host environment but does not request elevated system privileges.

如何使用

确保已安装 OpenClaw（本地或 Docker 部署）
在对话框中输入安装命令：/install rag-eval
安装完成后，直接呼叫该 Skill 的名称或使用 /rag-eval 触发
根据 Skill 的参数说明提供必要输入，即可获得结构化输出

版本历史

v1.2.1

Fixed versioning regression and added simplified installation instructions.

v0.1.1

Added simplified installation instructions to SKILL.md and README.md.

v1.2.0

SECURITY FIX: Shell injection vulnerability in SKILL.md instructions. Agent was instructed to echo user content directly into shell — replaced with safe file-based input pattern. Added --input-file flag to run_eval.py. Updated examples to recommend file input over stdin piping.

v1.1.1

Fix: setup.sh warns about global pip install and recommends virtualenv. Clean up PRD (remove unimplemented Discord/Notion claims). No code bugs found in explain_faithfulness (response var is consistent).

v1.1.0

Fix: declare LLM API key requirement in skill metadata (anyEnv: OPENAI_API_KEY | ANTHROPIC_API_KEY | RAGAS_LLM). Document all optional RAGAS_* env vars. Clearer prerequisites in SKILL.md and README.

v1.0.0

Initial release: Ragas-based RAG pipeline quality testing (faithfulness, relevancy, context precision). Single + batch eval modes.

元数据

Slug rag-eval

版本 1.2.1

许可证 —

累计安装 0

当前安装数 0

历史版本数 6

常见问题

rag-eval 是什么？

Evaluate your RAG pipeline quality using Ragas metrics (faithfulness, answer relevancy, context precision). 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件，目前累计下载 700 次。

如何安装 rag-eval？

在 OpenClaw 或 Claude Code 对话框中运行命令「/install rag-eval」即可一键安装，无需额外配置。

rag-eval 是免费的吗？

是的，rag-eval 完全免费（开源免费），可自由下载、安装和使用。

rag-eval 支持哪些平台？

rag-eval 跨平台运行，可在任意部署了 OpenClaw / Claude Code 的环境中使用（cross-platform）。

谁开发了 rag-eval？

由 Jonathan Jing（@jonathanjing）开发并维护，当前版本 v1.2.1。