Clinical Data Cleaner
/install clinical-data-cleaner
Clean, validate, and standardize clinical trial data to meet CDISC SDTM standards for regulatory submissions to FDA or EMA.
Quick Start
from scripts.main import ClinicalDataCleaner
# Initialize for Demographics domain
cleaner = ClinicalDataCleaner(domain='DM')
# Clean data with default settings
cleaned = cleaner.clean(raw_data)
# Save with audit trail
cleaner.save_report('output.csv')
Core Capabilities
1. SDTM Domain Validation
cleaner = ClinicalDataCleaner(domain='DM') # or 'LB', 'VS'
is_valid, missing = cleaner.validate_domain(data)
Required Fields:
- DM: STUDYID, USUBJID, SUBJID, RFSTDTC, RFENDTC, SITEID, AGE, SEX, RACE
- LB: STUDYID, USUBJID, LBTESTCD, LBCAT, LBORRES, LBORRESU, LBSTRESC, LBDTC
- VS: STUDYID, USUBJID, VSTESTCD, VSORRES, VSORRESU, VSSTRESC, VSDTC
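The required-field check can be pictured as a comparison between a domain's required variables and the columns actually present. This is a minimal sketch, not the skill's own implementation: `REQUIRED_FIELDS` and `find_missing_fields` are hypothetical names, and only the field lists come from this document.

```python
# Hypothetical sketch of a required-field check per SDTM domain.
# The field lists are copied from the "Required Fields" list above;
# the constant and function names are illustrative assumptions.
REQUIRED_FIELDS = {
    "DM": ["STUDYID", "USUBJID", "SUBJID", "RFSTDTC", "RFENDTC",
           "SITEID", "AGE", "SEX", "RACE"],
    "LB": ["STUDYID", "USUBJID", "LBTESTCD", "LBCAT", "LBORRES",
           "LBORRESU", "LBSTRESC", "LBDTC"],
    "VS": ["STUDYID", "USUBJID", "VSTESTCD", "VSORRES", "VSORRESU",
           "VSSTRESC", "VSDTC"],
}


def find_missing_fields(domain, columns):
    """Return (is_valid, missing) for the given column names."""
    present = {c.upper() for c in columns}
    missing = [f for f in REQUIRED_FIELDS[domain] if f not in present]
    return (not missing, missing)


is_valid, missing = find_missing_fields(
    "VS", ["STUDYID", "USUBJID", "VSTESTCD", "VSDTC"]
)
print(is_valid, missing)  # False ['VSORRES', 'VSORRESU', 'VSSTRESC']
```

Returning the missing names in the order they are listed keeps the result easy to compare against the domain specification.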
2. Missing Value Handling
cleaner = ClinicalDataCleaner(
    domain='DM',
    missing_strategy='median'  # mean, median, mode, forward, drop
)
cleaned = cleaner.handle_missing_values(data)
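To make the strategy names concrete, here is a rough sketch of what each one might do to a single numeric column. The semantics are assumptions rather than documented behavior: `forward` is taken to mean carrying the last observed value forward, and `drop` to mean discarding records with a missing value. `impute` is a hypothetical helper.

```python
import statistics

# Assumed meaning of the summary-statistic strategies named in this document.
_SUMMARY = {
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
}


def impute(values, strategy="median"):
    """Fill or drop None entries in one numeric column (hypothetical helper)."""
    observed = [v for v in values if v is not None]
    if strategy == "drop":
        return observed
    if strategy == "forward":
        filled, last = [], None
        for v in values:
            last = v if v is not None else last
            filled.append(last)  # a leading None stays None
        return filled
    if strategy not in _SUMMARY:
        raise ValueError(f"unknown strategy: {strategy}")
    fill = _SUMMARY[strategy](observed)
    return [fill if v is None else v for v in values]


ages = [34, None, 51, 47, None, 62]
print(impute(ages, "median"))   # [34, 49.0, 51, 47, 49.0, 62]
print(impute(ages, "forward"))  # [34, 34, 51, 47, 47, 62]
```

Whichever strategy is used, imputed values in a submission dataset should be traceable, which is why the audit trail matters for this step.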
3. Outlier Detection
cleaner = ClinicalDataCleaner(
    domain='LB',
    outlier_method='domain',  # iqr, zscore, domain
    outlier_action='flag'     # flag, remove, cap
)
flagged = cleaner.detect_outliers(data)
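For the `iqr` method, a common convention is Tukey's fences: values more than 1.5 interquartile ranges below the first quartile or above the third are treated as outliers. The sketch below assumes that convention; the multiplier and quartile method the skill actually uses are not stated here, and `iqr_flags` is a hypothetical name.

```python
import statistics


def iqr_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v < low or v > high for v in values]


hemoglobin = [12.1, 13.4, 14.0, 13.8, 12.9, 14.3, 13.1, 29.5]
print(iqr_flags(hemoglobin))  # only the last value (29.5) is flagged
```

Unlike fixed clinical ranges, these fences are derived from the data itself, so they shift with the sample being cleaned.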
Clinical Thresholds:
| Parameter | Range | Unit |
|---|---|---|
| Glucose | 50-500 | mg/dL |
| Hemoglobin | 5-20 | g/dL |
| Systolic BP | 70-220 | mmHg |
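The `domain` method and the three outlier actions can be illustrated with the ranges from the table. This is a sketch under assumptions: bounds are treated as inclusive, `cap` is taken to mean clamping to the nearest bound, and `THRESHOLDS` and `screen` are hypothetical names.

```python
# Ranges copied from the Clinical Thresholds table above.
THRESHOLDS = {
    "Glucose": (50, 500),      # mg/dL
    "Hemoglobin": (5, 20),     # g/dL
    "Systolic BP": (70, 220),  # mmHg
}


def screen(parameter, values, action="flag"):
    """Return (value, is_outlier) pairs after applying the chosen action."""
    if action not in ("flag", "remove", "cap"):
        raise ValueError(f"unknown action: {action}")
    low, high = THRESHOLDS[parameter]
    results = []
    for v in values:
        outlier = v < low or v > high  # bounds assumed inclusive
        if outlier and action == "remove":
            continue
        if outlier and action == "cap":
            v = min(max(v, low), high)  # clamp to the nearest bound
        results.append((v, outlier))
    return results


glucose = [45, 98, 612]
print(screen("Glucose", glucose, "flag"))    # [(45, True), (98, False), (612, True)]
print(screen("Glucose", glucose, "cap"))     # [(50, True), (98, False), (500, True)]
print(screen("Glucose", glucose, "remove"))  # [(98, False)]
```

Flagging leaves the recorded value intact for later review, whereas `cap` and `remove` change what reaches the analysis.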
4. Date Standardization
standardized = cleaner.standardize_dates(data)
# Converts to ISO 8601: 2023-01-15T09:30:00
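As an illustration of the conversion, the sketch below tries a short list of input formats and emits ISO 8601 text. The format list is an assumption (real raw data may need others, and `01/15/2023` is read as month/day/year), and `to_iso8601` is a hypothetical helper, not the skill's API.

```python
from datetime import datetime

# Assumed input formats, tried in order; each is paired with whether
# it carries a time component.
KNOWN_FORMATS = [
    ("%Y-%m-%dT%H:%M:%S", True),
    ("%Y-%m-%d %H:%M:%S", True),
    ("%m/%d/%Y %H:%M", True),
    ("%Y-%m-%d", False),
    ("%m/%d/%Y", False),
]


def to_iso8601(text):
    """Return an ISO 8601 string, or None if no assumed format matches."""
    for fmt, has_time in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
        # Keep date-only inputs as dates so no time of day is invented.
        return parsed.isoformat() if has_time else parsed.date().isoformat()
    return None


print(to_iso8601("01/15/2023 09:30"))  # 2023-01-15T09:30:00
print(to_iso8601("2023-01-15"))        # 2023-01-15
print(to_iso8601("not a date"))        # None
```

Returning `None` for unparseable input, rather than guessing, leaves those records available for manual review.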
5. Complete Pipeline
cleaner = ClinicalDataCleaner(
    domain='DM',
    missing_strategy='median',
    outlier_method='iqr',
    outlier_action='flag'
)
cleaned_data = cleaner.clean(data)
cleaner.save_report('output.csv')
Output Files:
- output.csv: Cleaned SDTM data
- output.report.json: Audit trail for regulatory submission
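The structure of the audit trail file is not described here. Purely as an illustration of what such a log could capture, the sketch below records one entry per cleaning action and serializes the result to JSON. The `AuditTrail` class and every field name in it are hypothetical.

```python
import json
from datetime import datetime, timezone


class AuditTrail:
    """Hypothetical in-memory log of cleaning actions."""

    def __init__(self, domain):
        self.domain = domain
        self.actions = []

    def record(self, step, variable, rows_affected, detail=""):
        """Append one cleaning action with a UTC timestamp."""
        self.actions.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "step": step,
            "variable": variable,
            "rows_affected": rows_affected,
            "detail": detail,
        })

    def to_json(self):
        """Serialize the log; field names are illustrative only."""
        return json.dumps(
            {"domain": self.domain, "actions": self.actions}, indent=2
        )


trail = AuditTrail("DM")
trail.record("handle_missing_values", "AGE", 2, "median imputation")
trail.record("standardize_dates", "RFSTDTC", 14, "converted to ISO 8601")
print(trail.to_json())
```

Recording the variable, the number of rows touched, and the method gives a reviewer enough to reproduce or question each change.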
CLI Usage
# Clean demographics
python scripts/main.py \
  --input dm_raw.csv \
  --domain DM \
  --output dm_clean.csv \
  --missing-strategy median \
  --outlier-method iqr \
  --outlier-action flag
# Clean lab data with clinical thresholds
python scripts/main.py \
  --input lb_raw.csv \
  --domain LB \
  --output lb_clean.csv \
  --outlier-method domain
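The flags in these commands could be declared with `argparse` roughly as follows. This is a sketch, not the contents of `scripts/main.py`: the flag names and allowed values come from this document, while the defaults and help text are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Clean SDTM domain data")
    parser.add_argument("--input", required=True, help="raw CSV path")
    parser.add_argument("--domain", required=True, choices=["DM", "LB", "VS"])
    parser.add_argument("--output", required=True, help="cleaned CSV path")
    # Defaults below are assumptions; the real defaults are not documented here.
    parser.add_argument("--missing-strategy", default="median",
                        choices=["mean", "median", "mode", "forward", "drop"])
    parser.add_argument("--outlier-method", default="iqr",
                        choices=["iqr", "zscore", "domain"])
    parser.add_argument("--outlier-action", default="flag",
                        choices=["flag", "remove", "cap"])
    return parser


args = build_parser().parse_args([
    "--input", "lb_raw.csv",
    "--domain", "LB",
    "--output", "lb_clean.csv",
    "--outlier-method", "domain",
])
print(args.outlier_method, args.outlier_action)  # domain flag
```

Restricting each option with `choices` rejects a mistyped strategy at the command line instead of partway through a cleaning run.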
Common Patterns
See references/common-patterns.md for detailed examples:
- Regulatory Submission Preparation
- Interim Analysis Data Preparation
- Database Migration Cleanup
- External Lab Data Integration
Troubleshooting
See references/troubleshooting.md for solutions to:
- Validation failures
- Date parsing errors
- Memory errors with large datasets
- Outlier detection issues
Quality Checklist
Pre-Cleaning:
- IACUC approval obtained (animal studies)
- Sample size adequately powered
- Randomization method documented
Post-Cleaning:
- Validate against CDISC SDTM IG
- Review all cleaning actions in audit trail
- Test import to analysis software
References
- references/sdtm_ig_guide.md: CDISC SDTM Implementation Guide
- references/domain_specs.json: Domain-specific field requirements
- references/outlier_thresholds.json: Clinical outlier thresholds
- references/common-patterns.md: Detailed usage patterns
- references/troubleshooting.md: Problem-solving guide
Skill ID: 189 | Version: 2.0 | License: MIT
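- Make sure OpenClaw is installed (local or Docker deployment)
- Enter the install command in the chat box: /install clinical-data-cleaner
- After installation, invoke the Skill by name or trigger it with /clinical-data-cleaner
- Provide the inputs described in the Skill's parameter documentation to get structured output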
What is Clinical Data Cleaner?
Use when cleaning clinical trial data, preparing data for FDA/EMA submission, standardizing SDTM datasets, handling missing values in clinical studies, detec... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 402 downloads to date.
How do I install Clinical Data Cleaner?
Run the command "/install clinical-data-cleaner" in the OpenClaw or Claude Code chat box. Installation takes one step and needs no extra configuration.
Is Clinical Data Cleaner free?
Yes. Clinical Data Cleaner is completely free and released under the MIT-0 license; it can be freely downloaded, installed, and used.
Which platforms does Clinical Data Cleaner support?
Clinical Data Cleaner is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who developed Clinical Data Cleaner?
It is developed and maintained by renhaosu2024 (@renhaosu2024); the current version is v0.1.1.