
Dataset Evaluation

Author: levey · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 243
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install dataset-evaluation
Description
Evaluate a submission by scoring content consistency of texts and quality of structured data based on completeness, accuracy, type correctness, and informati...
Usage (SKILL.md)

SKILL.md --- dataset_evaluation

Skill Name

dataset_evaluation

Description

Evaluate a miner submission by performing two evaluation steps:

  1. Content Consistency Evaluation
  2. Structured Data Quality Evaluation

The evaluator receives 5 cleaned data samples, the structured JSON, and the dataset schema, then computes a final score for the miner.


Input

{
  "cleaned_data_list": [
    "cleaned_text_1",
    "cleaned_text_2",
    "cleaned_text_3",
    "cleaned_text_4",
    "cleaned_text_5"
  ],
  "structured_data": {
    "field1": "value",
    "field2": "value"
  },
  "dataset_schema": {
    "fields": [
      {"name": "title", "type": "string", "required": true},
      {"name": "author", "type": "string", "required": false},
      {"name": "date", "type": "string", "required": false},
      {"name": "url", "type": "string", "required": true}
    ]
  }
}

Evaluation Procedure

Step 1 --- Content Consistency Evaluation (Weight 40%)

Goal: determine whether the 5 cleaned texts represent the same underlying content.

Method

  1. Normalize text
  • remove HTML
  • lowercase
  • remove excessive whitespace
  2. Compute pairwise similarity across the 5 texts

Recommended metrics:

  • cosine similarity (embedding based)
  • OR Jaccard similarity
  3. Compute the average similarity score.
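
The three method steps can be sketched in Python as follows. This is a minimal illustration that assumes the Jaccard option over whitespace-separated tokens; the helper names (`normalize`, `jaccard`, `average_pairwise_similarity`) and the regex-based HTML stripping are assumptions for this example, not part of the skill.

```python
import re
from itertools import combinations


def normalize(text: str) -> str:
    """Strip HTML tags, lowercase, and collapse runs of whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)  # crude tag removal (assumption)
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over whitespace-separated token sets."""
    set_a, set_b = set(a.split()), set(b.split())
    if not set_a and not set_b:
        return 1.0  # assumption: two empty texts count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def average_pairwise_similarity(texts: list[str]) -> float:
    """Mean Jaccard similarity over all unordered pairs of texts."""
    cleaned = [normalize(t) for t in texts]
    pairs = list(combinations(cleaned, 2))
    if not pairs:
        return 1.0  # assumption: a single text is trivially consistent
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

With 5 texts this averages over 10 pairs. Token-level Jaccard is deterministic and needs no external model, which fits the reproducibility rule, but it may score paraphrases lower than an embedding-based metric would.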

Output

content_consistency_score (0-100)

Suggested mapping:

avg_similarity >= 0.9 → 100
0.8 – 0.9 → 80 – 100
0.6 – 0.8 → 60 – 80
0.4 – 0.6 → 40 – 60
< 0.4 → < 40
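
The suggested mapping gives score bands rather than an exact function. One possible reading, sketched below, interpolates linearly inside each band; the interpolation rule and the function name `similarity_to_score` are assumptions, and an evaluator that needs strict reproducibility should fix whichever rule it adopts.

```python
def similarity_to_score(avg_similarity: float) -> float:
    """Map an average similarity in [0, 1] to a 0-100 score.

    Assumption: linear interpolation inside each suggested band.
    """
    s = min(max(avg_similarity, 0.0), 1.0)  # clamp, e.g. negative cosine values
    if s >= 0.9:
        return 100.0
    if s >= 0.8:
        # 0.8 -> 80 and 0.9 -> 100, so this band rises twice as fast
        return 80.0 + (s - 0.8) * 200.0
    # the bands below 0.8 (60-80, 40-60, < 40) all coincide with s * 100
    return s * 100.0
```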

Step 2 --- Structured Data Quality Evaluation (Weight 60%)

Using the verified cleaned content, evaluate the structured JSON.

Compute four sub-scores.


2.1 Field Completeness (30%)

Evaluate whether all required fields exist.

Formula:

completeness_score =
    (# required fields present / total required fields) * 100
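
A minimal sketch of this formula, using the `dataset_schema` layout from the Input section. Treating `None` and empty strings as missing, and returning 100 when the schema has no required fields, are assumptions of this example.

```python
def completeness_score(structured_data: dict, dataset_schema: dict) -> float:
    """Percentage of required schema fields present in the structured data."""
    required = [
        field["name"]
        for field in dataset_schema.get("fields", [])
        if field.get("required")
    ]
    if not required:
        return 100.0  # assumption: nothing required, so nothing can be missing
    present = sum(
        1 for name in required
        if structured_data.get(name) not in (None, "")  # assumption: empty counts as missing
    )
    return present / len(required) * 100
```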

2.2 Value Accuracy (40%)

Evaluate whether each field value is consistent with the cleaned data.

Examples:

  • title appears in cleaned text
  • author name appears in text
  • url matches source

Scoring guideline:

exact match → 100
partially correct → 60-80
inconsistent → < 50
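
The guideline leaves the matching method open. The sketch below is one hypothetical heuristic: a case-insensitive substring hit counts as an exact match, and token overlap decides between "partially correct" and "inconsistent". The thresholds, the function names, and the per-field averaging are all assumptions for illustration.

```python
def field_accuracy(value, cleaned_texts: list[str]) -> float:
    """Score one field value against the cleaned texts (illustrative heuristic)."""
    needle = " ".join(str(value).lower().split())
    haystack = " ".join(" ".join(t.lower().split()) for t in cleaned_texts)
    if not needle:
        return 0.0
    if needle in haystack:
        return 100.0  # exact match
    tokens = needle.split()
    hay_tokens = set(haystack.split())
    overlap = sum(tok in hay_tokens for tok in tokens) / len(tokens)
    if overlap >= 0.5:
        return 60.0 + (overlap - 0.5) * 40.0  # partially correct: 60-80
    return overlap * 80.0  # inconsistent: stays below 50


def value_accuracy_score(structured_data: dict, cleaned_texts: list[str]) -> float:
    """Average the per-field accuracy over all fields in the structured data."""
    if not structured_data:
        return 0.0
    scores = [field_accuracy(v, cleaned_texts) for v in structured_data.values()]
    return sum(scores) / len(scores)
```

A production evaluator would likely need field-specific checks (for example URL normalization for `url`), which this sketch does not attempt.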

2.3 Type Correctness (15%)

Evaluate whether values match schema types.

Examples:

string
number
boolean
array

Formula:

type_score =
    (# correct types / total fields) * 100
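
A sketch of the type check for the four example types. It assumes "total fields" means the schema fields that actually appear in the structured data, and that JSON booleans should not count as numbers; both are interpretations, not something the skill specifies.

```python
# Maps schema type names to Python checks on JSON-decoded values.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "array": lambda v: isinstance(v, list),
}


def type_score(structured_data: dict, dataset_schema: dict) -> float:
    """Percentage of present schema fields whose value matches the declared type."""
    fields = [
        f for f in dataset_schema.get("fields", [])
        if f["name"] in structured_data
    ]
    if not fields:
        return 0.0  # assumption: no checkable fields earns no credit
    correct = 0
    for f in fields:
        check = TYPE_CHECKS.get(f["type"])
        if check is not None and check(structured_data[f["name"]]):
            correct += 1
    return correct / len(fields) * 100
```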

2.4 Information Sufficiency (15%)

Evaluate whether the structured data misses obvious information present in the cleaned text.

Example:

Cleaned text contains:

title
author
date

But structured JSON only includes:

title

Then deduct score.

Guideline:

complete extraction → 100
minor missing info → 70–90
major missing info → < 60

Structuring Quality Score

structuring_quality_score =
    completeness_score * 0.30
  + value_accuracy_score * 0.40
  + type_score * 0.15
  + information_sufficiency_score * 0.15

Range:

0 – 100
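
The weighted sum can be written directly from the formula. In this sketch the `WEIGHTS` table and the rounding to two decimals are assumptions added for readability and reproducibility.

```python
# Sub-score weights from sections 2.1 to 2.4; they sum to 1.0.
WEIGHTS = {
    "completeness_score": 0.30,
    "value_accuracy_score": 0.40,
    "type_score": 0.15,
    "information_sufficiency_score": 0.15,
}


def structuring_quality_score(details: dict) -> float:
    """Weighted sum of the four sub-scores, clamped to the 0-100 range."""
    total = sum(details[name] * weight for name, weight in WEIGHTS.items())
    return round(min(max(total, 0.0), 100.0), 2)  # rounding is an assumption
```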

Step 3 --- Final Miner Score

miner_score =
    content_consistency_score * 0.4
  + structuring_quality_score * 0.6

Range:

0 – 100
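
A small sketch of the final combination. Rejecting out-of-range inputs with `ValueError` and rounding to two decimals are assumptions of this example; the skill only states the weights and the 0 to 100 range.

```python
def miner_score(content_consistency_score: float,
                structuring_quality_score: float) -> float:
    """Combine the two step scores with the 40% / 60% weights."""
    for score in (content_consistency_score, structuring_quality_score):
        if not 0 <= score <= 100:
            raise ValueError(f"score out of range 0-100: {score}")
    combined = content_consistency_score * 0.4 + structuring_quality_score * 0.6
    return round(combined, 2)  # rounding is an assumption
```

Because both inputs are constrained to 0 to 100 and the weights sum to 1, the result needs no further clamping.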

Output Format

The evaluator must return:

{
  "content_consistency_score": 92,
  "structuring_quality_score": 89.2,
  "miner_score": 90.32,
  "details": {
    "completeness_score": 90,
    "value_accuracy_score": 88,
    "type_score": 100,
    "information_sufficiency_score": 80
  }
}

Evaluator Rules

The evaluator must follow these principles:

  1. Be deterministic and reproducible
  2. Base judgments only on provided inputs
  3. Avoid hallucination
  4. Penalize missing or inconsistent data
  5. Return scores strictly in the 0–100 range
Security recommendations
This skill is internally coherent and low-risk in terms of installation and secrets. Consider these operational points before installing:

  • Determine how embeddings/similarity will be computed (which model/library/API) and whether that sends data to an external service. Redact or avoid sensitive PII if you must send data out.
  • If you need strict reproducibility, specify deterministic embedding/model versions or a seed, and document similarity thresholds (the SKILL.md gives ranges but not exact mappings).
  • Test the evaluator on representative examples to ensure the mapping from similarity to scores matches your expectations (edge cases like partial matches, paraphrases, or noisy text).
  • If you want to limit autonomous runs, note that always:false is set but the platform default allows autonomous invocation; change agent permissions if needed.
Functional analysis
Type: OpenClaw Skill
Name: dataset-evaluation
Version: 1.0.0

The skill bundle contains instructions for an AI agent to evaluate dataset submissions based on content consistency and structured data quality. The logic defined in SKILL.md is focused entirely on data processing, similarity scoring, and schema validation, with no evidence of malicious intent, data exfiltration, or unauthorized command execution.
Capability assessment
Purpose & Capability
Name/description (dataset evaluation: content consistency + structured-data quality) match the SKILL.md instructions and required inputs. No unrelated credentials, binaries, or config paths are requested.
Instruction Scope
Instructions confine work to the provided 5 cleaned texts, structured JSON, and schema. They recommend embedding-based cosine similarity or Jaccard and give clear scoring formulas. Note: the skill references embeddings/similarity algorithms but does not specify exact model, library, or deterministic settings; this can affect reproducibility and whether data must be sent to external model APIs.
Install Mechanism
No install spec and no code files (instruction-only). Nothing will be written to disk or downloaded as part of the skill itself.
Credentials
The skill declares no environment variables, credentials, or config paths. The requested inputs are precisely the data items needed for the stated evaluation.
Persistence & Privilege
always is false and there is no indication the skill requests persistent system privileges or modifies other skills/config. Autonomous invocation is allowed by default but is not elevated here.
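How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install dataset-evaluation
  3. Once installed, invoke the Skill by name or trigger it with /dataset-evaluation
  4. Provide the inputs described in the Skill's parameter documentation to receive structured output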
Version history
v1.0.0
Initial release of dataset evaluation skill.

  • Implements a two-step evaluation: Content Consistency and Structured Data Quality.
  • Calculates a weighted final miner score based on both content and structuring assessments.
  • Evaluates JSON structure for field completeness, value accuracy, type correctness, and information sufficiency.
  • Provides a standardized output with detailed sub-scores.
Metadata
Slug dataset-evaluation
Version 1.0.0
License MIT-0
Total installs 1
Current installs 0
Number of versions 1
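FAQ

What is Dataset Evaluation?

Evaluate a submission by scoring content consistency of texts and quality of structured data based on completeness, accuracy, type correctness, and informati... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 243 total downloads so far.

How do I install Dataset Evaluation?

Run the command "/install dataset-evaluation" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Dataset Evaluation free?

Yes. Dataset Evaluation is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Dataset Evaluation support?

Dataset Evaluation is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Dataset Evaluation?

It is developed and maintained by levey (@levey); the current version is v1.0.0.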
