/install dataset-evaluation
SKILL.md --- dataset_evaluation
Skill Name
dataset_evaluation
Description
Evaluate a miner submission by performing two evaluation steps:
- Content Consistency Evaluation
- Structured Data Quality Evaluation
The evaluator receives 5 cleaned data samples, the structured JSON, and the dataset schema, then computes a final score for the miner.
Input
{
"cleaned_data_list": [
"cleaned_text_1",
"cleaned_text_2",
"cleaned_text_3",
"cleaned_text_4",
"cleaned_text_5"
],
"structured_data": {
"field1": "value",
"field2": "value"
},
"dataset_schema": {
"fields": [
{"name": "title", "type": "string", "required": true},
{"name": "author", "type": "string", "required": false},
{"name": "date", "type": "string", "required": false},
{"name": "url", "type": "string", "required": true}
]
}
}
Evaluation Procedure
Step 1 --- Content Consistency Evaluation (Weight 40%)
Goal: determine whether the 5 cleaned texts represent the same underlying content.
Method
- Normalize text
  - remove HTML
  - lowercase
  - remove excessive whitespace
- Compute pairwise similarity across the 5 texts
  Recommended metrics:
  - cosine similarity (embedding based)
  - OR Jaccard similarity
- Compute the average similarity score.
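The method above can be sketched in Python roughly as follows. This is a minimal illustration that picks the Jaccard option over whitespace-delimited tokens; the function names (`normalize`, `jaccard`, `average_pairwise_similarity`), the regex-based tag stripping, and the treatment of two empty texts as identical are assumptions, not part of the skill definition.

```python
import html
import re
from itertools import combinations

# Assumption: a simple regex is enough to strip tags from already-cleaned text.
TAG_RE = re.compile(r"<[^>]+>")


def normalize(text: str) -> str:
    """Remove HTML tags, unescape entities, lowercase, collapse whitespace."""
    text = TAG_RE.sub(" ", text)
    text = html.unescape(text)
    return " ".join(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over whitespace-delimited token sets."""
    set_a, set_b = set(a.split()), set(b.split())
    if not set_a and not set_b:
        return 1.0  # assumption: two empty texts count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def average_pairwise_similarity(texts: list[str]) -> float:
    """Mean Jaccard similarity over all unordered pairs (10 pairs for 5 texts)."""
    cleaned = [normalize(t) for t in texts]
    pairs = list(combinations(cleaned, 2))
    if not pairs:
        return 1.0  # assumption: a single text is trivially consistent
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Token-set Jaccard is deterministic and needs no model, which fits the reproducibility requirement; an embedding-based cosine similarity would replace only the `jaccard` function.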
Output
content_consistency_score (0-100)
Suggested mapping:
avg_similarity >= 0.9 → 100
0.8 – 0.9 → 80 – 100
0.6 – 0.8 → 60 – 80
0.4 – 0.6 → 40 – 60
< 0.4 → < 40
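One way to read the suggested mapping is as a piecewise-linear function: the score equals the similarity times 100 below 0.8, rises from 80 to 100 between 0.8 and 0.9, and is capped at 100 from 0.9 upward. The sketch below encodes that reading; the interpolation inside each band, the clamping of out-of-range inputs, and the name `similarity_to_score` are assumptions.

```python
def similarity_to_score(avg_similarity: float) -> float:
    """Map an average similarity in [0, 1] to a 0-100 consistency score."""
    # Assumption: clamp inputs, since cosine similarity can be negative.
    s = min(max(avg_similarity, 0.0), 1.0)
    if s >= 0.9:
        return 100.0
    if s >= 0.8:
        # Linear ramp: 0.8 maps to 80, 0.9 maps to 100.
        return round(80.0 + (s - 0.8) * 200.0, 2)
    # Below 0.8 the listed bands coincide with similarity * 100.
    return round(s * 100.0, 2)
```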
Step 2 --- Structured Data Quality Evaluation (Weight 60%)
Using the verified cleaned content, evaluate the structured JSON.
Compute four sub-scores.
2.1 Field Completeness (30%)
Evaluate whether all required fields exist.
Formula:
completeness_score =
(# required fields present / total required fields) * 100
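A minimal Python sketch of this formula, using the `dataset_schema` layout from the Input section. Treating `None` and empty strings as missing, and returning 100 when the schema declares no required fields, are assumptions rather than rules stated by the skill.

```python
def completeness_score(structured_data: dict, dataset_schema: dict) -> float:
    """Percentage of required schema fields present in the structured data."""
    required = [
        field["name"]
        for field in dataset_schema.get("fields", [])
        if field.get("required")
    ]
    if not required:
        return 100.0  # assumption: nothing required means nothing missing
    # Assumption: None and "" count as missing values.
    present = sum(
        1 for name in required if structured_data.get(name) not in (None, "")
    )
    return present / len(required) * 100
```

With the example schema (required fields `title` and `url`), a submission that only provides `title` would score 50.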
2.2 Value Accuracy (40%)
Evaluate whether each field value is consistent with the cleaned data.
Examples:
- title appears in cleaned text
- author name appears in text
- url matches source
Scoring guideline:
exact match → 100
partially correct → 60-80
inconsistent → < 50
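The guideline leaves the matching method open. The sketch below shows one deterministic heuristic that stays inside the three bands: a value found verbatim in any cleaned text scores 100, a value whose tokens mostly appear scores 60-80, and anything else scores below 50. The 0.5 overlap threshold, the linear scaling inside each band, and the function names are all assumptions for illustration.

```python
def field_accuracy(value, cleaned_texts: list[str]) -> float:
    """Score one field value against the cleaned texts (heuristic sketch)."""
    needle = " ".join(str(value).lower().split())
    if not needle:
        return 0.0
    haystacks = [" ".join(t.lower().split()) for t in cleaned_texts]
    if any(needle in h for h in haystacks):
        return 100.0  # exact match: value appears verbatim
    tokens = set(needle.split())
    # Best fraction of the value's tokens found in any single text.
    best = max(
        (len(tokens & set(h.split())) / len(tokens) for h in haystacks),
        default=0.0,
    )
    if best >= 0.5:
        # Partially correct: map overlap 0.5..1.0 onto 60..80.
        return 60.0 + (best - 0.5) * 40.0
    # Inconsistent: at most 40, so always below 50.
    return best * 80.0


def value_accuracy_score(structured_data: dict, cleaned_texts: list[str]) -> float:
    """Mean per-field accuracy; an empty submission scores 0 (assumption)."""
    if not structured_data:
        return 0.0
    scores = [field_accuracy(v, cleaned_texts) for v in structured_data.values()]
    return sum(scores) / len(scores)
```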
2.3 Type Correctness (15%)
Evaluate whether values match schema types.
Example schema types:
- string
- number
- boolean
- array
Formula:
type_score =
(# correct types / total fields) * 100
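A possible Python sketch of the type check for the four listed types. It assumes that "total fields" means the submitted fields that are declared in the schema, that booleans do not count as numbers, and that an unrecognized schema type counts as incorrect; none of these choices is fixed by the skill text.

```python
def matches_type(value, type_name: str) -> bool:
    """Check a JSON-decoded value against a schema type name."""
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "array":
        return isinstance(value, list)
    return False  # assumption: unknown schema types count as incorrect


def type_score(structured_data: dict, dataset_schema: dict) -> float:
    """Percentage of schema-declared fields whose values have the right type."""
    declared = {f["name"]: f["type"] for f in dataset_schema.get("fields", [])}
    checked = [(n, v) for n, v in structured_data.items() if n in declared]
    if not checked:
        return 0.0  # assumption: nothing to check earns no credit
    correct = sum(1 for n, v in checked if matches_type(v, declared[n]))
    return correct / len(checked) * 100
```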
2.4 Information Sufficiency (15%)
Evaluate whether the structured data misses obvious information present in the cleaned text.
Example:
If the cleaned text contains a title, author, and date but the structured JSON only includes the title, deduct points.
Guideline:
complete extraction → 100
minor missing info → 70–90
major missing info → < 60
Structuring Quality Score
structuring_quality_score =
completeness_score * 0.30
+ value_accuracy_score * 0.40
+ type_score * 0.15
+ information_sufficiency_score * 0.15
Range:
0 – 100
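The weighted sum can be written as a small helper. The dictionary keys mirror the names used in the formula; the constant `SUB_SCORE_WEIGHTS`, the rounding to two decimals, and the clamp to the 0-100 range are assumptions added for illustration.

```python
SUB_SCORE_WEIGHTS = {
    "completeness_score": 0.30,
    "value_accuracy_score": 0.40,
    "type_score": 0.15,
    "information_sufficiency_score": 0.15,
}


def structuring_quality_score(details: dict) -> float:
    """Weighted sum of the four sub-scores, clamped to the 0-100 range."""
    total = sum(details[name] * weight for name, weight in SUB_SCORE_WEIGHTS.items())
    return round(min(max(total, 0.0), 100.0), 2)
```

For sub-scores of 90, 88, 100, and 80 the weighted sum is 27 + 35.2 + 15 + 12 = 89.2.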
Step 3 --- Final Miner Score
miner_score =
content_consistency_score * 0.4
+ structuring_quality_score * 0.6
Range:
0 – 100
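As a sketch, the final combination might look like this in Python. Rejecting out-of-range inputs with `ValueError` and rounding to two decimals are assumptions; the skill only states the 40/60 weighting and the 0-100 range.

```python
def miner_score(content_consistency_score: float,
                structuring_quality_score: float) -> float:
    """Combine the two step scores with the 40% / 60% weights."""
    for score in (content_consistency_score, structuring_quality_score):
        if not 0 <= score <= 100:
            # Assumption: inputs outside 0-100 indicate an upstream bug.
            raise ValueError(f"score out of range: {score}")
    return round(content_consistency_score * 0.4 + structuring_quality_score * 0.6, 2)
```

For example, a consistency score of 92 and a structuring quality score of 89.2 give 36.8 + 53.52 = 90.32.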
Output Format
The evaluator must return:
{
"content_consistency_score": 92,
"structuring_quality_score": 89.2,
"miner_score": 90.32,
"details": {
"completeness_score": 90,
"value_accuracy_score": 88,
"type_score": 100,
"information_sufficiency_score": 80
}
}
Evaluator Rules
The evaluator must follow these principles:
- Be deterministic and reproducible
- Base judgments only on provided inputs
- Avoid hallucination
- Penalize missing or inconsistent data
- Return scores strictly in the 0-100 range
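Installation and Usage
- Make sure OpenClaw is installed (local or Docker deployment)
- Enter the install command in the chat box: /install dataset-evaluation
- After installation, call the Skill by name or use /dataset-evaluation to trigger it
- Provide the required inputs according to the Skill's parameter description to get structured output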
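What is Dataset Evaluation?
Evaluate a submission by scoring content consistency of texts and quality of structured data based on completeness, accuracy, type correctness, and informati... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 243 cumulative downloads to date.
How do I install Dataset Evaluation?
Run the command "/install dataset-evaluation" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.
Is Dataset Evaluation free?
Yes. Dataset Evaluation is completely free, is released under the MIT-0 license, and can be freely downloaded, installed, and used.
Which platforms does Dataset Evaluation support?
Dataset Evaluation is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed Dataset Evaluation?
It is developed and maintained by levey (@levey); the current version is v1.0.0.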