Feature Description

Intelligent toolkit for annotating images, text, audio, and video, with active learning, quality control, and export of labeled datasets.

Usage Instructions (SKILL.md)

Data Labeling Studio

Name: Data Labeling Studio
Author: kaiyuelv

Metadata

Name: data-labeling-studio
Display Name: Data Labeling Studio | 数据标注工作室
Description:
- EN: Intelligent data labeling and annotation toolkit supporting image, text, audio, and video with active learning and quality control.
- ZH: 智能数据标注和注释工具包，支持图像、文本、音频和视频，包含主动学习和质量控制。
Version: 1.0.0
Author: Kimi Claw
Tags: data-labeling, annotation, image-annotation, text-annotation, active-learning, quality-control, dataset, ml-training
Category: Data Processing
Icon: 🏷️

Capabilities

Actions

image_annotate

Perform image annotation

image_dir: Image directory path (string, required)
annotation_type: Type of annotation (string, required) - bounding_box, polygon, keypoint, segmentation
labels: Label categories (array, required)
output_format: Output format (string) - coco, pascal_voc, yolo
active_learning: Enable active learning suggestions (boolean, default: true)
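The three output_format values store boxes differently. COCO uses [x_min, y_min, width, height] in pixels, Pascal VOC uses [x_min, y_min, x_max, y_max] in pixels, and YOLO uses a center point plus a size, normalized by the image dimensions. The sketch below converts a single box between them. The helper names are hypothetical and are not part of this skill's documented API.

```python
# Hypothetical helpers for the three bounding-box conventions behind output_format.
# COCO:       [x_min, y_min, width, height] in pixels
# Pascal VOC: [x_min, y_min, x_max, y_max] in pixels
# YOLO:       [x_center, y_center, width, height], each divided by image size


def coco_to_voc(box):
    """COCO [x, y, w, h] -> Pascal VOC [x_min, y_min, x_max, y_max]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def coco_to_yolo(box, img_w, img_h):
    """COCO [x, y, w, h] in pixels -> YOLO normalized [xc, yc, w, h]."""
    x, y, w, h = box
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]


def yolo_to_coco(box, img_w, img_h):
    """YOLO normalized [xc, yc, w, h] -> COCO [x, y, w, h] in pixels."""
    xc, yc, w, h = box
    abs_w, abs_h = w * img_w, h * img_h
    return [xc * img_w - abs_w / 2, yc * img_h - abs_h / 2, abs_w, abs_h]


if __name__ == "__main__":
    box = [50, 40, 100, 80]  # COCO box inside a 200x160 image
    print(coco_to_voc(box))             # [50, 40, 150, 120]
    print(coco_to_yolo(box, 200, 160))  # [0.5, 0.5, 0.5, 0.5]
```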

text_annotate

Perform text annotation

text_data: Text data source (string/object, required)
annotation_task: Task type (string, required) - classification, ner, sentiment, summarization
labels: Label categories (array, required)
output_format: Output format (string) - json, csv, spacy
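For the ner task, one common intermediate representation is a list of character-offset spans of the form (start, end, label). This is also the shape of the entities list in spaCy's classic (text, {"entities": [...]}) training tuples. The sketch below turns such spans into per-token BIO tags. It assumes whitespace tokenization and exclusive end offsets. spans_to_bio is a hypothetical helper; the skill does not document which representation it uses internally.

```python
import re


def spans_to_bio(text, spans):
    """Tag whitespace tokens with BIO labels from character-offset entity spans.

    spans: list of (start, end, label) tuples with exclusive end offsets.
    Tokens that only partially overlap a span are tagged "O" in this sketch.
    """
    tagged = []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.start(), match.end()
        tag = "O"
        for start, end, label in spans:
            if start <= tok_start and tok_end <= end:
                prefix = "B-" if tok_start == start else "I-"
                tag = prefix + label
                break
        tagged.append((match.group(), tag))
    return tagged


if __name__ == "__main__":
    text = "Alice Chen joined Acme in 2021"
    spans = [(0, 10, "PERSON"), (18, 22, "ORG"), (26, 30, "DATE")]
    for token, tag in spans_to_bio(text, spans):
        print(token, tag)
```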

audio_annotate

Perform audio annotation

audio_dir: Audio directory path (string, required)
annotation_type: Type (string, required) - transcription, speaker_id, emotion, event
segment_duration: Segment duration in seconds (float, default: 5.0)
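segment_duration implies that each audio file is cut into fixed-length windows before labeling. Below is a minimal sketch for computing those window boundaries. It assumes the final window is simply truncated at the end of the file, because the skill does not say how partial segments are handled.

```python
import math


def segment_bounds(total_duration, segment_duration=5.0):
    """Split [0, total_duration] seconds into consecutive fixed-length windows.

    Assumption: the last window is truncated at total_duration.
    """
    if segment_duration <= 0:
        raise ValueError("segment_duration must be positive")
    n_segments = math.ceil(total_duration / segment_duration)
    return [
        (i * segment_duration, min((i + 1) * segment_duration, total_duration))
        for i in range(n_segments)
    ]


if __name__ == "__main__":
    print(segment_bounds(12.0))  # [(0.0, 5.0), (5.0, 10.0), (10.0, 12.0)]
```

For audio loaded at sample rate sr, a window (start, end) covers samples int(start * sr) up to int(end * sr).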

video_annotate

Perform video annotation

video_path: Video file path (string, required)
annotation_type: Type (string, required) - object_tracking, action_recognition, scene_detection
frame_sample_rate: Frame sampling rate (int, default: 1)
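The documentation does not specify what frame_sample_rate means. One plausible reading is a stride: 1 keeps every frame, and 3 keeps every third frame. Under that assumption, the kept frame indices are:

```python
def sampled_frame_indices(total_frames, frame_sample_rate=1):
    """Frame indices kept when every frame_sample_rate-th frame is sampled.

    Assumption: frame_sample_rate acts as a stride (1 = keep every frame).
    """
    if frame_sample_rate < 1:
        raise ValueError("frame_sample_rate must be >= 1")
    return list(range(0, total_frames, frame_sample_rate))


if __name__ == "__main__":
    print(sampled_frame_indices(10, 3))  # [0, 3, 6, 9]
```

With OpenCV, total_frames could be read with cap.get(cv2.CAP_PROP_FRAME_COUNT). If the parameter actually means frames sampled per second, the stride would instead be derived from the video's FPS.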

quality_check

Check annotation quality and consistency

annotations: Annotation file path (string, required)
ground_truth: Ground truth file path (string, optional)
metrics: Quality metrics (array) - iou, accuracy, consistency, coverage
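The iou metric presumably compares annotated boxes against ground_truth boxes. For two COCO-style [x, y, width, height] boxes, Intersection over Union is the overlap area divided by the area of their union. The sketch below computes it for one pair. How the skill matches boxes before averaging is not documented, so matching is left out.

```python
def iou_xywh(box_a, box_b):
    """Intersection over Union for two COCO-style [x, y, width, height] boxes."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    # Overlap along each axis, clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Overlap 5x5 = 25, union 100 + 100 - 25 = 175, so IoU = 1/7.
    print(iou_xywh([0, 0, 10, 10], [5, 5, 10, 10]))
```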

dataset_export

Export labeled dataset to ML format

annotations: Annotation source (string, required)
format: Target format (string, required) - coco, yolo, tfrecord, huggingface
output_dir: Output directory (string, required)
split_ratios: Train/val/test split (object) - {train: 0.8, val: 0.1, test: 0.1}
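A split such as {train: 0.8, val: 0.1, test: 0.1} can be applied by shuffling with a fixed seed and then slicing. The sketch below shows one way to do this with a hypothetical split_dataset helper. Integer truncation sends any rounding remainder to the test split.

```python
import random


def split_dataset(items, split_ratios=None, seed=42):
    """Deterministically shuffle items and cut them into train/val/test lists."""
    ratios = split_ratios or {"train": 0.8, "val": 0.1, "test": 0.1}
    if abs(sum(ratios.values()) - 1.0) > 1e-6:
        raise ValueError("split ratios must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_train = int(len(shuffled) * ratios["train"])
    n_val = int(len(shuffled) * ratios["val"])
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # remainder goes to test
    }


if __name__ == "__main__":
    splits = split_dataset([f"img_{i:03d}.jpg" for i in range(100)])
    print({name: len(part) for name, part in splits.items()})
```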

Requirements

Python 3.8+
Pillow >= 10.0.0 (for image processing)
OpenCV >= 4.8.0 (for image/video annotation)
NumPy >= 1.24.0
Pandas >= 2.0.0
LabelImg >= 1.8.0 (optional)
Librosa >= 0.10.0 (for audio processing)
scikit-learn >= 1.3.0 (for active learning)

Examples

Image Annotation

from labeling_studio import ImageAnnotator

# Initialize annotator
annotator = ImageAnnotator(
    annotation_type="bounding_box",
    labels=["person", "car", "dog", "cat"],
    output_format="coco"
)

# Annotate images with active learning
annotator.annotate(
    image_dir="./images",
    output_file="./annotations/coco.json",
    active_learning=True  # AI suggests uncertain samples
)

# Export to YOLO format
annotator.export("./annotations", format="yolo")
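The comment on active_learning=True says the tool suggests uncertain samples. Entropy-based uncertainty sampling is one common way to do this. Unlabeled samples are ranked by the entropy of a model's predicted class distribution, and the most uncertain ones go to annotators first. This is a generic sketch of that technique, not the skill's implementation (the bundled annotator reportedly produces simulated annotations). The probabilities would come from whatever model you plug in, and select_uncertain is a hypothetical name.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_uncertain(predictions, k):
    """Return the k sample ids whose predictions have the highest entropy.

    predictions: dict mapping sample id -> list of class probabilities.
    """
    ranked = sorted(predictions, key=lambda sid: entropy(predictions[sid]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    preds = {
        "img_001.jpg": [0.97, 0.01, 0.01, 0.01],  # confident prediction
        "img_002.jpg": [0.30, 0.30, 0.20, 0.20],  # very uncertain
        "img_003.jpg": [0.60, 0.30, 0.05, 0.05],
    }
    print(select_uncertain(preds, k=2))  # ['img_002.jpg', 'img_003.jpg']
```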

Text Annotation

from labeling_studio import TextAnnotator

# NER annotation
annotator = TextAnnotator(
    annotation_task="ner",
    labels=["PERSON", "ORG", "LOC", "DATE"]
)

# Annotate from file
annotations = annotator.annotate(
    text_data="./data/corpus.txt",
    output_file="./annotations/ner.json"
)

Quality Check

from labeling_studio import QualityChecker

# Check annotation quality
checker = QualityChecker()
report = checker.check(
    annotations="./annotations/coco.json",
    ground_truth="./annotations/ground_truth.json",
    metrics=["iou", "consistency", "coverage"]
)

print(f"Average IoU: {report['iou']:.2f}")
print(f"Consistency Score: {report['consistency']:.2f}")
print(f"Coverage: {report['coverage']:.2f}")
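The skill does not define how its consistency score is computed. If consistency means agreement between two annotators who labeled the same items, Cohen's kappa is a standard chance-corrected measure. The sketch below illustrates that swapped-in metric; it is not the QualityChecker's own formula.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled at random with their own label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used the same single label everywhere
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    a = ["cat", "cat", "dog", "dog", "cat", "dog"]
    b = ["cat", "dog", "dog", "dog", "cat", "dog"]
    print(round(cohens_kappa(a, b), 3))  # 0.667
```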

Scripts

scripts/annotate_images.py: Image annotation tool
scripts/annotate_text.py: Text annotation tool
scripts/annotate_audio.py: Audio annotation tool
scripts/annotate_video.py: Video annotation tool
scripts/quality_check.py: Quality check tool
scripts/export_dataset.py: Dataset export tool

Installation

pip install -r requirements.txt

Usage

# Image annotation with active learning
python scripts/annotate_images.py --input ./images --type bbox --labels person,car --format coco

# Text NER annotation
python scripts/annotate_text.py --input ./texts.txt --task ner --labels PERSON,ORG,LOC

# Quality check
python scripts/quality_check.py --annotations ./coco.json --ground-truth ./gt.json

# Export to YOLO
python scripts/export_dataset.py --input ./coco.json --format yolo --output ./yolo_dataset
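The CLI flags above map naturally onto argparse. A hypothetical parser for annotate_images.py is sketched below. The comma-separated --labels handling and the accepted --type and --format values are inferred from the examples and the parameter list; they are not taken from the script itself.

```python
import argparse


def comma_list(value):
    """Parse 'person,car' into ['person', 'car'], dropping empty entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical annotate_images.py CLI sketch")
    parser.add_argument("--input", required=True, help="image directory")
    parser.add_argument(
        "--type",
        default="bbox",
        choices=["bbox", "bounding_box", "polygon", "keypoint", "segmentation"],
    )
    parser.add_argument("--labels", type=comma_list, required=True, help="e.g. person,car")
    parser.add_argument("--format", default="coco", choices=["coco", "pascal_voc", "yolo"])
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--input", "./images", "--type", "bbox", "--labels", "person,car", "--format", "coco"]
    )
    print(args.labels)  # ['person', 'car']
```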

License

MIT License

Security Recommendations

This package looks internally inconsistent rather than blatantly malicious. It promises a full multi-modal 'labeling_studio' with many helper scripts and model integrations, but the archive contains only an image annotator script, a quality checker, example/test mocks, and a requirements.txt. Before installing or running anything:

- Don't pip install the requirements into your main environment. Use a disposable virtualenv or container to avoid pulling in heavy packages unnecessarily.
- Inspect or run the included scripts locally to confirm their behavior. The image annotator produces mocked/simulated (random) annotations rather than using real models; 'active learning' appears not to be implemented here.
- Note that the examples import a module (labeling_studio) that isn't included. This may mean the published bundle is incomplete or that the real implementation is fetched from elsewhere (ask the author or source). If the package were intended to download or fetch code at runtime, that would be higher risk, but no such downloader is present in the files.
- If you need multi-modal capabilities, request the missing source files or a packaged release (e.g., on GitHub) and verify the code that integrates models or remote endpoints. If you don't get clear answers, prefer an alternative with a complete source/release.

Overall: don't run or install this in a production environment until the mismatches are resolved. Treat it as incomplete/misleading, and proceed in a sandbox if you want to experiment.

Functional Analysis

Type: OpenClaw Skill
Name: data-labeling-studio
Version: 1.0.0

The data-labeling-studio skill bundle is a legitimate toolkit for data annotation tasks across multiple modalities. The provided Python scripts (scripts/annotate_images.py, scripts/quality_check.py) and documentation (SKILL.md, README.md) contain standard data processing logic, such as image scanning, IoU calculation, and JSON-based annotation management, without any evidence of malicious intent, data exfiltration, or unauthorized execution.
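As an illustration only, here is what image scanning plus JSON-based annotation management could look like: a directory scan that writes a minimal COCO-style images list. The extension set, the recursive search, and the manifest layout are assumptions; they do not describe the bundled script's actual behavior.

```python
import json
import tempfile
from pathlib import Path

# Assumed extension list; the real script's list is not documented.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".webp"}


def scan_images(image_dir):
    """Recursively collect image files under image_dir, matched by extension only."""
    root = Path(image_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{image_dir} is not a directory")
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def write_manifest(image_dir, output_file):
    """Write a minimal COCO-like {"images": [...]} JSON listing and return its records."""
    records = [
        {"id": i, "file_name": p.relative_to(image_dir).as_posix()}
        for i, p in enumerate(scan_images(image_dir), start=1)
    ]
    Path(output_file).write_text(json.dumps({"images": records}, indent=2))
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.jpg", "b.PNG", "notes.txt"):
            (Path(tmp) / name).write_bytes(b"")  # empty placeholder files
        print(write_manifest(tmp, Path(tmp) / "manifest.json"))
```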

Capability Tags

crypto

Capability Assessment

⚠ Purpose & Capability

The skill claims multi‑modal support (image, text, audio, video) and an importable package 'labeling_studio', but the bundle only includes scripts for image annotation and quality checks. Several scripts referenced in SKILL.md (annotate_text.py, annotate_audio.py, annotate_video.py, export_dataset.py) and the labeling_studio module used in examples are not present. Declared requirements (librosa, OpenCV, Pillow, scikit‑learn) are heavier than what the included scripts actually use.

⚠ Instruction Scope

SKILL.md instructs running scripts and doing pip install -r requirements.txt, which is expected, but many example commands and APIs reference missing files/modules (the labeling_studio import, scripts that aren't in the manifest). The runtime instructions also enable 'active learning' and 'pre_annotate', but the included code contains only mock/simulated behavior rather than actual model integration. This is scope creep: a mismatch between the promised capabilities and the real instructions.

ℹ Install Mechanism

There is no formal install spec (instruction-only), which is low risk. However, SKILL.md and README suggest running 'pip install -r requirements.txt', which will pull in several heavy third-party packages. Because the project is incomplete, installing those dependencies may be unnecessary; if attempted, it should be done in an isolated environment.

✓ Credentials

The skill requests no environment variables, no credentials, and no config paths. The code reads only local file paths supplied by the user. There is no evidence of attempts to access unrelated secrets or network endpoints in the provided files.

✓ Persistence & Privilege

The skill is not always-enabled and does not request persistent system privileges or modify other skills. It does not include an installer that writes to system locations; it is run on demand as scripts.

Version History

v1.0.0

Initial release of Data Labeling Studio.
- Supports intelligent data labeling and annotation for images, text, audio, and video
- Includes active learning suggestions and quality control checks
- Multiple annotation formats supported: COCO, YOLO, Pascal VOC, TFRecord, HuggingFace, and more
- Tools provided for annotation, quality checking, and dataset export
- Example usage and script files included for all major features

Metadata

Slug data-labeling-studio

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Version count 1

FAQ