aiwithabidi

LLM Evaluator Pro

By aiwithabidi · GitHub · v1.0.0
cross-platform ⚠ suspicious
Total downloads: 739
Favorites: 1
Current installs: 1
Versions: 1
Install in OpenClaw
/install llm-evaluator-pro
Description
LLM-as-a-Judge evaluator via Langfuse. Scores traces on relevance, accuracy, hallucination, and helpfulness using GPT-5-nano as judge. Supports single trace...
Usage instructions (SKILL.md)

LLM Evaluator ⚖️

LLM-as-a-Judge evaluation system powered by Langfuse. Uses GPT-5-nano to score AI outputs.

When to Use

  • Evaluating quality of search results or AI responses
  • Scoring traces for relevance, accuracy, hallucination detection
  • Batch scoring recent unscored traces
  • Quality assurance on agent outputs

Usage

# Test with sample cases
python3 {baseDir}/scripts/evaluator.py test

# Score a specific Langfuse trace
python3 {baseDir}/scripts/evaluator.py score <trace_id>

# Score with specific evaluator only
python3 {baseDir}/scripts/evaluator.py score <trace_id> --evaluators relevance

# Backfill scores on recent unscored traces
python3 {baseDir}/scripts/evaluator.py backfill --limit 20
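
The commands above imply a three-subcommand interface. As a rough sketch only (this is not the actual `evaluator.py`, whose source is not shown here), such a CLI could be laid out with `argparse` as follows. The `build_parser` helper and the default of 20 for `--limit` are assumptions made for illustration; the evaluator names come from the Evaluators list in this document.

```python
import argparse

# Evaluator names as listed in this document.
EVALUATORS = ["relevance", "accuracy", "hallucination", "helpfulness"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the test / score / backfill commands."""
    parser = argparse.ArgumentParser(prog="evaluator.py")
    sub = parser.add_subparsers(dest="command", required=True)

    # `test`: run the built-in sample cases, no arguments.
    sub.add_parser("test", help="run sample cases")

    # `score <trace_id> [--evaluators ...]`: score one trace.
    score = sub.add_parser("score", help="score a single Langfuse trace")
    score.add_argument("trace_id")
    score.add_argument(
        "--evaluators",
        nargs="+",
        choices=EVALUATORS,
        default=list(EVALUATORS),
        help="subset of evaluators to run (default: all)",
    )

    # `backfill --limit N`: score recent unscored traces.
    backfill = sub.add_parser("backfill", help="score recent unscored traces")
    backfill.add_argument("--limit", type=int, default=20)  # default is an assumption

    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Restricting `--evaluators` with `choices` means a misspelled evaluator name fails at parse time rather than after a judge call has already been paid for.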

Evaluators

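| Evaluator     | Measures                        | Scale |
|---------------|---------------------------------|-------|
| relevance     | Response relevance to query     | 0–1   |
| accuracy      | Factual correctness             | 0–1   |
| hallucination | Made-up information detection   | 0–1   |
| helpfulness   | Overall usefulness              | 0–1   |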
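
Since every evaluator reports on a 0–1 scale, a judge reply has to be validated before it is stored as a score. The sketch below assumes the judge model returns a JSON object mapping evaluator names to numbers; that reply format and the `parse_judge_scores` helper are assumptions for illustration, not the skill's documented behavior.

```python
import json

# Evaluator names as listed in this document.
EVALUATORS = ("relevance", "accuracy", "hallucination", "helpfulness")


def parse_judge_scores(raw: str, evaluators=EVALUATORS) -> dict[str, float]:
    """Parse a judge reply (assumed JSON object) into clamped 0-1 scores.

    Unknown keys are ignored; evaluators missing from the reply are skipped.
    """
    data = json.loads(raw)
    scores: dict[str, float] = {}
    for name in evaluators:
        if name not in data:
            continue
        value = float(data[name])
        # Clamp to the documented 0-1 scale in case the judge overshoots.
        scores[name] = min(1.0, max(0.0, value))
    return scores


if __name__ == "__main__":
    print(parse_judge_scores('{"relevance": 0.8, "accuracy": 1.4, "other": 0.2}'))
```

Clamping is one possible policy; rejecting out-of-range values outright would be stricter and may be preferable when the scores feed quality gates.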

Credits

Built by M. Abidi (agxntsix.ai). Part of the AgxntSix Skill Suite for OpenClaw agents.


Security recommendations
This skill largely does what its README says, but there are several red flags you should resolve before running it in a production environment:

  1. The script contains hardcoded Langfuse API keys and a hardcoded Langfuse host and uses those values directly. That could send your trace data to somebody else's account, or let the script act under that account. Treat those embedded keys as suspicious and do not rely on them.
  2. The script will attempt to read ~/.openclaw/workspace/.env for an OPENROUTER_API_KEY if you don't set one in the environment; that file may contain unrelated secrets. The skill metadata did not declare that config path.
  3. Dependencies (requests, openai, langfuse) are not declared; running without knowing what will be installed is fragile.

Recommended actions before installing or using:

  • Inspect the evaluator.py file fully (remove or rotate any embedded keys).
  • Replace hardcoded LF_AUTH/LF_API with explicit env-based configuration and ensure the host points to a Langfuse instance you control.
  • Avoid running the script as-is on systems with sensitive ~/.openclaw/workspace/.env files; run it in an isolated test environment or container first.
  • If you need to trust this skill, ask the publisher to provide a version that reads credentials only from declared env vars (no defaults), documents required Python packages, and documents exactly which endpoints will receive data.

If the publisher confirms the embedded keys are inert placeholders and the code is changed to respect environment values only, the concerns would be reduced.
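
The recommendation to read credentials only from declared environment variables, with no defaults, can be sketched as below. This is a minimal illustration rather than a patch to the skill: `load_config` is a hypothetical helper, and `LANGFUSE_HOST` is an assumed variable name for the host setting (the review only names `OPENROUTER_API_KEY`, `LANGFUSE_PUBLIC_KEY`, and `LANGFUSE_SECRET_KEY`).

```python
import os

# The first three names come from the review; LANGFUSE_HOST is an assumption.
REQUIRED_VARS = (
    "OPENROUTER_API_KEY",
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_SECRET_KEY",
    "LANGFUSE_HOST",
)


def load_config(env=None) -> dict[str, str]:
    """Return required settings from the environment, or fail loudly.

    No embedded defaults and no fallback to any .env file: a missing or
    empty variable raises instead of being silently replaced.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in REQUIRED_VARS}


if __name__ == "__main__":
    config = load_config()
    print("Langfuse host:", config["LANGFUSE_HOST"])
```

Failing fast on a missing variable makes it obvious which account and host will receive trace data before any request is sent.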
Functional analysis
Type: OpenClaw Skill
Name: llm-evaluator-pro
Version: 1.0.0

The skill is classified as suspicious due to two key vulnerabilities found in `scripts/evaluator.py`. First, it attempts to read the `OPENROUTER_API_KEY` from `~/.openclaw/workspace/.env` if not found in environment variables, which is a local file inclusion/information disclosure risk, granting the skill access to a potentially sensitive file within the OpenClaw workspace. Second, it contains hardcoded Langfuse API keys (`sk-lf-115cb6b4-7153-4fe6-9255-bf28f8b115de`, `pk-lf-8a9322b9-5eb1-4e8b-815e-b3428dc69bc4`) and an internal host (`http://langfuse-web:3000`), which, while potentially overridden by environment variables, represents poor security practice and a potential credential leak if these keys were sensitive and used in an unintended context.
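
Before running a script like this, embedded Langfuse-style keys can be found with a simple scan. The sketch below assumes keys follow the shape shown in the analysis above (`sk-lf-` or `pk-lf-` followed by a UUID); `find_embedded_keys` is a hypothetical helper, and other key formats would need their own pattern.

```python
import re
import sys

# Matches sk-lf-<uuid> and pk-lf-<uuid>, the key shape quoted in the analysis.
KEY_PATTERN = re.compile(
    r"\b[sp]k-lf-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"
)


def find_embedded_keys(source: str) -> list[tuple[int, str]]:
    """Return (line_number, key) pairs for every key-like literal in source."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in KEY_PATTERN.finditer(line):
            hits.append((line_no, match.group(0)))
    return hits


if __name__ == "__main__":
    # Usage: python scan_keys.py path/to/evaluator.py
    with open(sys.argv[1], encoding="utf-8") as handle:
        for line_no, key in find_embedded_keys(handle.read()):
            # Print only a prefix so the scan output does not leak the key.
            print(f"line {line_no}: {key[:10]}...")
```

A scan like this only catches literal keys; credentials assembled at runtime or base64-encoded (for example inside a prebuilt auth header) still require reading the code.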
Capability assessment
Purpose & Capability
Name/description match the code: it uses OpenRouter (GPT judge) and Langfuse to score traces. Requesting OPENROUTER_API_KEY and Langfuse keys is consistent with the described function. However the code contains hardcoded Langfuse keys and host values, which undermines the declared requirement model (the skill claims to require env vars but will fall back to embedded credentials).
Instruction Scope
SKILL.md instructs running the included Python script. The script, however, attempts to read ~/.openclaw/workspace/.env for the OpenRouter key (a config path not declared in metadata) and uses hardcoded Langfuse credentials/host to call the Langfuse API. Reading an undeclared workspace .env can access other secrets; always-posting scores to a hardcoded Langfuse endpoint (with embedded keys) could transmit data to an unexpected/third-party account.
Install Mechanism
There is no install spec. The skill includes a Python script but does not declare Python package dependencies (requests, openai, langfuse). That is a coherence/usability issue (script may fail). Lack of an install step lowers installation auditability, but is not itself malicious — still increases risk because it's unclear what packages will be installed by users to run it.
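
Because the skill declares no dependencies, a pre-flight check can at least report which of the packages named above are absent before the script is run. This is a sketch using only the standard library; `missing_packages` is a hypothetical helper, and it checks importability only, not versions.

```python
import importlib.util

# Packages the review says the script uses but does not declare.
REQUIRED_PACKAGES = ("requests", "openai", "langfuse")


def missing_packages(names=REQUIRED_PACKAGES) -> list[str]:
    """Return the subset of names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Installing the missing packages into a dedicated virtual environment, with versions pinned by hand, keeps the audit trail that the missing install spec would otherwise have provided.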
Credentials
Declared env vars (OPENROUTER_API_KEY, LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY) are appropriate for the stated purpose. However the script: (1) sets default LANGFUSE keys in code, (2) hardcodes LF_AUTH and LF_API values rather than reading the environment, and (3) attempts to parse ~/.openclaw/workspace/.env if OPENROUTER_API_KEY is not set. These behaviors mean the skill can use embedded credentials and read an undeclared local .env file, which is disproportionate and suspicious.
Persistence & Privilege
The skill is not force-included (always=false) and does not request persistent platform privileges. It does not attempt to modify other skills or global agent configuration. Autonomy is enabled by default but is not an additional red flag here.
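How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install llm-evaluator-pro
  3. Once installation finishes, invoke the Skill by name or trigger it with /llm-evaluator-pro
  4. Provide the required inputs according to the Skill's parameter documentation to get structured output.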
Version history
v1.0.0
LLM-as-a-Judge evaluator via Langfuse
Metadata
Slug llm-evaluator-pro
Version 1.0.0
License
Total installs 1
Current installs 1
Version count 1
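FAQ

What is LLM Evaluator Pro?

LLM-as-a-Judge evaluator via Langfuse. Scores traces on relevance, accuracy, hallucination, and helpfulness using GPT-5-nano as judge. Supports single trace... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 739 total downloads to date.

How do I install LLM Evaluator Pro?

Run the command "/install llm-evaluator-pro" in the OpenClaw or Claude Code chat box to install it in one step, with no additional configuration required.

Is LLM Evaluator Pro free?

Yes, LLM Evaluator Pro is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does LLM Evaluator Pro support?

LLM Evaluator Pro is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who develops LLM Evaluator Pro?

It is developed and maintained by aiwithabidi (@aiwithabidi); the current version is v1.0.0.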
