
Clinical Data Cleaner

Name: Clinical Data Cleaner
Author: renhaosu2024

Version: v0.1.1 · License: MIT-0

cross-platform ✓ Security check passed

Total downloads: 402

Install in OpenClaw

/install clinical-data-cleaner

Description

Use when cleaning clinical trial data, preparing data for FDA/EMA submission, standardizing SDTM datasets, handling missing values in clinical studies, detec...

Usage (SKILL.md)

Clinical Data Cleaner

Clean, validate, and standardize clinical trial data to meet CDISC SDTM standards for regulatory submissions to FDA or EMA.

Quick Start

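import pandas as pd

from scripts.main import ClinicalDataCleaner

# Load the raw data (assumption: a CSV read into a pandas DataFrame; the file name is illustrative)
raw_data = pd.read_csv('dm_raw.csv')

# Initialize for Demographics domain
cleaner = ClinicalDataCleaner(domain='DM')

# Clean data with default settings
cleaned = cleaner.clean(raw_data)

# Save the cleaned data together with its audit trail
cleaner.save_report('output.csv')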

Core Capabilities

1. SDTM Domain Validation

cleaner = ClinicalDataCleaner(domain='DM')  # or 'LB', 'VS'
is_valid, missing = cleaner.validate_domain(data)

Required Fields:

DM: STUDYID, USUBJID, SUBJID, RFSTDTC, RFENDTC, SITEID, AGE, SEX, RACE
LB: STUDYID, USUBJID, LBTESTCD, LBCAT, LBORRES, LBORRESU, LBSTRESC, LBDTC
VS: STUDYID, USUBJID, VSTESTCD, VSORRES, VSORRESU, VSSTRESC, VSDTC
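
To make the (is_valid, missing) return shape concrete, here is a minimal plain-Python sketch of a required-field check. The helper name check_required_fields is hypothetical, and the logic is an assumption about what validate_domain does, not a copy of scripts/main.py. The field lists are taken from the list above.

```python
# Required columns per domain, copied from the list above.
REQUIRED_FIELDS = {
    "DM": ["STUDYID", "USUBJID", "SUBJID", "RFSTDTC", "RFENDTC",
           "SITEID", "AGE", "SEX", "RACE"],
    "LB": ["STUDYID", "USUBJID", "LBTESTCD", "LBCAT", "LBORRES",
           "LBORRESU", "LBSTRESC", "LBDTC"],
    "VS": ["STUDYID", "USUBJID", "VSTESTCD", "VSORRES", "VSORRESU",
           "VSSTRESC", "VSDTC"],
}


def check_required_fields(columns, domain):
    """Hypothetical helper: return (is_valid, missing) for the given column names."""
    present = set(columns)
    # Preserve the order of the specification so the report is stable.
    missing = [field for field in REQUIRED_FIELDS[domain] if field not in present]
    return (not missing, missing)


is_valid, missing = check_required_fields(
    ["STUDYID", "USUBJID", "VSTESTCD", "VSORRES"], "VS"
)
print(is_valid, missing)  # False ['VSORRESU', 'VSSTRESC', 'VSDTC']
```

With a pandas DataFrame, the same check could be called as check_required_fields(df.columns, 'VS').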

2. Missing Value Handling

cleaner = ClinicalDataCleaner(
    domain='DM',
    missing_strategy='median'  # mean, median, mode, forward, drop
)
cleaned = cleaner.handle_missing_values(data)
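
The five strategy names can be illustrated on a single column of values. The sketch below uses only the standard library and a hypothetical helper named impute. How the skill itself treats edge cases (for example a leading gap under forward fill) is not documented here, so that behavior is an assumption.

```python
from statistics import mean, median, mode

STRATEGIES = ("mean", "median", "mode", "forward", "drop")


def impute(values, strategy="median"):
    """Hypothetical helper: handle None entries in a list of numbers."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    observed = [v for v in values if v is not None]
    if strategy == "drop":
        return observed
    if strategy == "forward":
        filled, last = [], None
        for v in values:
            last = v if v is not None else last
            filled.append(last)  # a leading gap stays None: nothing to carry forward
        return filled
    if not observed:
        return list(values)  # nothing to estimate a fill value from
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]


ages = [34, None, 51, 47, None]  # made-up AGE values
print(impute(ages, "median"))   # [34, 47, 51, 47, 47]
print(impute(ages, "forward"))  # [34, 34, 51, 47, 47]
print(impute(ages, "drop"))     # [34, 51, 47]
```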

3. Outlier Detection

cleaner = ClinicalDataCleaner(
    domain='LB',
    outlier_method='domain',  # iqr, zscore, domain
    outlier_action='flag'     # flag, remove, cap
)
flagged = cleaner.detect_outliers(data)
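
The iqr option presumably refers to the usual interquartile-range fence rule. A minimal standard-library sketch follows, assuming fences at 1.5 × IQR and linearly interpolated quartiles. The helper name iqr_flags is hypothetical and the code is not taken from scripts/main.py.

```python
from statistics import quantiles


def iqr_flags(values, k=1.5):
    """Hypothetical helper: True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # method="inclusive" interpolates linearly between sorted data points.
    q1, _q2, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [not (low <= v <= high) for v in values]


hemoglobin = [13.1, 12.8, 13.4, 12.9, 13.3, 13.0, 18.9]  # made-up readings, g/dL
print(iqr_flags(hemoglobin))  # only the last value (18.9) is flagged
```

Flagging rather than removing keeps every original observation in the dataset, so a reviewer can still inspect each flagged value.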

Clinical Thresholds:

Parameter	Range	Unit
Glucose	50-500	mg/dL
Hemoglobin	5-20	g/dL
Systolic BP	70-220	mmHg
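
As an illustration of how the domain method and the three actions could interact, the sketch below applies the ranges from the table to single values. The helper name apply_threshold is hypothetical, and treating the range endpoints as inclusive is an assumption; the actual rules live in references/outlier_thresholds.json.

```python
# Plausibility ranges copied from the table above: (lower, upper, unit).
THRESHOLDS = {
    "Glucose": (50, 500, "mg/dL"),
    "Hemoglobin": (5, 20, "g/dL"),
    "Systolic BP": (70, 220, "mmHg"),
}


def apply_threshold(parameter, value, action="flag"):
    """Hypothetical helper: return (resulting_value, is_outlier)."""
    low, high, _unit = THRESHOLDS[parameter]
    is_outlier = not (low <= value <= high)  # assumption: bounds are inclusive
    if action == "flag":
        return value, is_outlier             # keep the value, only mark it
    if action == "cap":
        return min(max(value, low), high), is_outlier  # clamp into the range
    if action == "remove":
        return (None if is_outlier else value), is_outlier
    raise ValueError(f"unknown action: {action!r}")


print(apply_threshold("Glucose", 620, "cap"))       # (500, True)
print(apply_threshold("Hemoglobin", 13.5, "flag"))  # (13.5, False)
print(apply_threshold("Systolic BP", 40, "remove")) # (None, True)
```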

4. Date Standardization

standardized = cleaner.standardize_dates(data)
# Converts to ISO 8601: 2023-01-15T09:30:00
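
A rough sketch of the conversion, using only the standard library. The list of accepted input formats is an assumption (the formats the skill actually recognizes are not listed here), and the helper name to_iso8601 is hypothetical. Note that a value such as 01/02/2023 is ambiguous between month-first and day-first; this sketch assumes month-first.

```python
from datetime import datetime

# Assumed input formats, tried in order; adjust to the source system.
CANDIDATE_FORMATS = [
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%m/%d/%Y %H:%M",   # assumption: month-first
    "%Y-%m-%d",
]


def to_iso8601(raw):
    """Hypothetical helper: return an ISO 8601 string, or None if nothing matches."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        # Keep date-only inputs at date precision instead of inventing a time.
        return parsed.isoformat() if "%H" in fmt else parsed.date().isoformat()
    return None


print(to_iso8601("01/15/2023 09:30"))  # 2023-01-15T09:30:00
print(to_iso8601("2023-01-15"))        # 2023-01-15
print(to_iso8601("not a date"))        # None
```

Returning None for unparseable values, rather than raising, lets the caller record them in the audit trail and continue.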

5. Complete Pipeline

cleaner = ClinicalDataCleaner(
    domain='DM',
    missing_strategy='median',
    outlier_method='iqr',
    outlier_action='flag'
)
cleaned_data = cleaner.clean(data)
cleaner.save_report('output.csv')

Output Files:

output.csv - Cleaned SDTM data
output.report.json - Audit trail for regulatory submission
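
The naming of the two files suggests that the report path is derived from the CSV path. The sketch below shows one way to do that and to write a JSON audit trail. Both helper names and the JSON layout (output_file, actions) are hypothetical; the real schema produced by save_report may differ.

```python
import json
import tempfile
from pathlib import Path


def report_path_for(output_csv):
    """Hypothetical helper: output.csv -> output.report.json."""
    return Path(output_csv).with_suffix(".report.json")


def write_audit_trail(output_csv, actions):
    """Hypothetical helper: write the list of cleaning actions next to the CSV."""
    path = report_path_for(output_csv)
    payload = {"output_file": Path(output_csv).name, "actions": actions}
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    report = write_audit_trail(
        Path(tmp) / "output.csv",
        [{"step": "impute", "column": "AGE", "strategy": "median", "rows_changed": 2}],
    )
    loaded = json.loads(report.read_text(encoding="utf-8"))
    print(report.name)                   # output.report.json
    print(loaded["actions"][0]["step"])  # impute
```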

CLI Usage

# Clean demographics
python scripts/main.py \
  --input dm_raw.csv \
  --domain DM \
  --output dm_clean.csv \
  --missing-strategy median \
  --outlier-method iqr \
  --outlier-action flag

# Clean lab data with clinical thresholds
python scripts/main.py \
  --input lb_raw.csv \
  --domain LB \
  --output lb_clean.csv \
  --outlier-method domain
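
For readers who want to wrap or test the command line, the flags above map naturally onto argparse. The sketch below mirrors the flag names and choices shown in this document. The default values are assumptions, and the parser is illustrative rather than the one in scripts/main.py.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the documented flags; defaults are assumed."""
    parser = argparse.ArgumentParser(description="Clean clinical trial data")
    parser.add_argument("--input", required=True, help="raw CSV file")
    parser.add_argument("--domain", required=True, choices=["DM", "LB", "VS"])
    parser.add_argument("--output", required=True, help="cleaned CSV file")
    parser.add_argument("--missing-strategy", default="median",
                        choices=["mean", "median", "mode", "forward", "drop"])
    parser.add_argument("--outlier-method", default="iqr",
                        choices=["iqr", "zscore", "domain"])
    parser.add_argument("--outlier-action", default="flag",
                        choices=["flag", "remove", "cap"])
    return parser


# Parse the second example command from above without running any cleaning.
args = build_parser().parse_args(
    ["--input", "lb_raw.csv", "--domain", "LB",
     "--output", "lb_clean.csv", "--outlier-method", "domain"]
)
print(args.domain, args.outlier_method, args.outlier_action)  # LB domain flag
```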

Common Patterns

See references/common-patterns.md for detailed examples:

Regulatory Submission Preparation
Interim Analysis Data Preparation
Database Migration Cleanup
External Lab Data Integration

Troubleshooting

See references/troubleshooting.md for solutions to:

Validation failures
Date parsing errors
Memory errors with large datasets
Outlier detection issues

Quality Checklist

Pre-Cleaning:

IRB/IEC approval obtained (human studies)
Sample size adequately powered
Randomization method documented

Post-Cleaning:

Validate against CDISC SDTM IG
Review all cleaning actions in audit trail
Test import to analysis software

References

references/sdtm_ig_guide.md - CDISC SDTM Implementation Guide
references/domain_specs.json - Domain-specific field requirements
references/outlier_thresholds.json - Clinical outlier thresholds
references/common-patterns.md - Detailed usage patterns
references/troubleshooting.md - Problem-solving guide

Skill ID: 189 | Version: 2.0 | License: MIT

Safe usage recommendations

This skill appears internally consistent and implements the documented SDTM cleaning functionality. Before installing or running it:

1) Review scripts/main.py locally to confirm there are no hidden network calls or unexpected behavior (the code provided looks local-file oriented).
2) Run it only in a secure environment when processing regulated patient data (PHI); keep backups of the raw data and do not overwrite the originals.
3) Install dependencies in an isolated environment (virtualenv/container) per requirements.txt.
4) Inspect any config files you pass via config_path to avoid accidentally exposing secrets or pointing at sensitive locations.
5) Note the minor metadata mismatches (SKILL.md/version/tile.json) and verify you're using the intended release before relying on it for regulatory submissions.

If you need higher assurance, ask the author for provenance, tests, and a signed release, or run the tool on synthetic data to validate its behavior first.

Functional analysis

Type: OpenClaw Skill
Name: clinical-data-cleaner
Version: 0.1.1

The clinical-data-cleaner skill is a legitimate tool designed to standardize clinical trial data into CDISC SDTM formats. The core logic in scripts/main.py uses standard data processing libraries (pandas, numpy) to perform validation, missing value imputation, and outlier detection based on clinical thresholds defined in the references/ directory. There is no evidence of data exfiltration, malicious command execution, or prompt injection; the requested permissions (Read, Write, Bash, Edit) are consistent with a tool intended to process and modify data files locally.

Capability assessment

✓ Purpose & Capability

Name/description (SDTM cleaning, outlier detection, imputation) match the provided code, JSON reference files, and CLI. Required dependencies (numpy, pandas, scipy) are appropriate for data processing. No unrelated credentials, binaries, or config paths are requested.

✓ Instruction Scope

SKILL.md directs the agent to load local files, run cleaning functions, and write output and audit trail files—actions consistent with a data-cleaning tool. The instructions do not ask the agent to read environment variables, network endpoints, or unrelated system paths. The skill will read arbitrary input files supplied by the user (expected behavior for this tool).

✓ Install Mechanism

There is no install spec (instruction-only skill plus included script). Dependencies are declared in requirements.txt and scripts/requirements.txt (numpy, pandas, scipy), which is proportionate for numeric/data tasks. No downloads from arbitrary URLs or archive extraction are present.

✓ Credentials

The skill does not declare or require environment variables, credentials, or remote tokens. It can optionally load a local JSON config file (config_path) — reading a local config file is reasonable but means the skill can access arbitrary files the user points it to.

✓ Persistence & Privilege

always is false and the skill does not request persistent platform privileges. It writes output and audit trail files as expected for a cleaner and does not modify other skills or system-wide settings.

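How to use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install clinical-data-cleaner
After installation, invoke the skill by name or trigger it with /clinical-data-cleaner
Provide the required inputs as described in the skill's parameter documentation to get structured output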

Version history

v0.1.1

No changes detected in this version.
- No file or documentation updates since the previous release.
- Functionality and usage remain unchanged.

v0.1.0

Initial release of clinical-data-cleaner.
- Cleans and standardizes clinical trial data for FDA/EMA submission using CDISC SDTM domains (DM, LB, VS)
- Handles missing values with multiple strategies (mean, median, mode, forward, drop)
- Detects and handles outliers using IQR, z-score, or clinical domain thresholds
- Standardizes date formats to ISO 8601
- Generates audit trails for regulatory compliance
- Includes CLI and Python API for data cleaning workflows

Metadata

Slug: clinical-data-cleaner

Version: 0.1.1

License: MIT-0

Total installs: 1

Current installs: 1

Number of versions: 2
