Clinical Data Cleaner

by renhaosu2024 · GitHub ↗ · v0.1.1 · MIT-0

cross-platform ✓ Security Clean

Downloads: 402

Install in OpenClaw

/install clinical-data-cleaner

Description

Use when cleaning clinical trial data, preparing data for FDA/EMA submission, standardizing SDTM datasets, handling missing values in clinical studies, detec...

README (SKILL.md)

Clinical Data Cleaner

Clean, validate, and standardize clinical trial data to meet CDISC SDTM standards for regulatory submissions to FDA or EMA.

Quick Start

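import pandas as pd

from scripts.main import ClinicalDataCleaner

# Load the raw records (assumption: clean() accepts a pandas DataFrame,
# and dm_raw.csv stands in for your own input file)
raw_data = pd.read_csv('dm_raw.csv')

# Initialize for Demographics domain
cleaner = ClinicalDataCleaner(domain='DM')

# Clean data with default settings
cleaned = cleaner.clean(raw_data)

# Save with audit trail
cleaner.save_report('output.csv')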

Core Capabilities

1. SDTM Domain Validation

cleaner = ClinicalDataCleaner(domain='DM')  # or 'LB', 'VS'
is_valid, missing = cleaner.validate_domain(data)

Required Fields:

DM: STUDYID, USUBJID, SUBJID, RFSTDTC, RFENDTC, SITEID, AGE, SEX, RACE
LB: STUDYID, USUBJID, LBTESTCD, LBCAT, LBORRES, LBORRESU, LBSTRESC, LBDTC
VS: STUDYID, USUBJID, VSTESTCD, VSORRES, VSORRESU, VSSTRESC, VSDTC
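
As a rough illustration of what a required-field check involves, the sketch below compares a dataset's column names against the lists above. It is not the skill's validate_domain implementation: REQUIRED_FIELDS and find_missing_fields are hypothetical names, and the sketch uses only the Python standard library.

```python
# Required SDTM variables per domain, copied from the lists above.
REQUIRED_FIELDS = {
    "DM": ["STUDYID", "USUBJID", "SUBJID", "RFSTDTC", "RFENDTC",
           "SITEID", "AGE", "SEX", "RACE"],
    "LB": ["STUDYID", "USUBJID", "LBTESTCD", "LBCAT", "LBORRES",
           "LBORRESU", "LBSTRESC", "LBDTC"],
    "VS": ["STUDYID", "USUBJID", "VSTESTCD", "VSORRES", "VSORRESU",
           "VSSTRESC", "VSDTC"],
}


def find_missing_fields(columns, domain):
    """Hypothetical helper: return (is_valid, missing) for a list of column names."""
    present = set(columns)
    # Preserve the documented order so the report is easy to read.
    missing = [field for field in REQUIRED_FIELDS[domain] if field not in present]
    return (not missing, missing)


# A vital-signs extract that lacks three required variables.
is_valid, missing = find_missing_fields(
    ["STUDYID", "USUBJID", "VSTESTCD", "VSORRES"], "VS"
)
print(is_valid, missing)
```

Returning the missing names rather than a bare boolean mirrors the (is_valid, missing) pair shown in the snippet above and makes the failure actionable.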

2. Missing Value Handling

cleaner = ClinicalDataCleaner(
    domain='DM',
    missing_strategy='median'  # mean, median, mode, forward, drop
)
cleaned = cleaner.handle_missing_values(data)
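
To make the five strategy names concrete, here is a minimal sketch of how each one could treat a single numeric column. The impute function is a hypothetical helper written with the standard library, not the skill's code, and it assumes that 'drop' discards the missing entries and that 'forward' leaves leading gaps unfilled.

```python
from statistics import mean, median, mode


def impute(values, strategy="median"):
    """Hypothetical helper: handle None entries in one numeric column."""
    observed = [v for v in values if v is not None]
    if strategy == "drop":
        # Assumption: "drop" discards the missing entries themselves.
        return observed
    if strategy == "forward":
        # Carry the last observed value forward; leading gaps stay None.
        filled, last = [], None
        for v in values:
            if v is not None:
                last = v
            filled.append(last)
        return filled
    reducers = {"mean": mean, "median": median, "mode": mode}
    if strategy not in reducers:
        raise ValueError(f"unknown strategy: {strategy}")
    # Raises statistics.StatisticsError if nothing was observed.
    fill = reducers[strategy](observed)
    return [fill if v is None else v for v in values]


ages = [34, None, 51, 47, None, 51]
median_filled = impute(ages, "median")    # gaps become 49.0
forward_filled = impute(ages, "forward")  # gaps repeat the previous age
```

The median is often preferred over the mean for skewed clinical measurements because a single extreme value does not shift it.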

3. Outlier Detection

cleaner = ClinicalDataCleaner(
    domain='LB',
    outlier_method='domain',  # iqr, zscore, domain
    outlier_action='flag'     # flag, remove, cap
)
flagged = cleaner.detect_outliers(data)

Clinical Thresholds:

Parameter	Range	Unit
Glucose	50-500	mg/dL
Hemoglobin	5-20	g/dL
Systolic BP	70-220	mmHg
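
The sketch below illustrates two of the detection methods ('domain' and 'iqr') and the three actions. It is an illustration under stated assumptions rather than the skill's implementation: all function names are hypothetical, range bounds are treated as inclusive, and the IQR rule uses the conventional 1.5 multiplier.

```python
from statistics import quantiles

# Plausibility ranges copied from the table above.
THRESHOLDS = {
    "Glucose": (50, 500),      # mg/dL
    "Hemoglobin": (5, 20),     # g/dL
    "Systolic BP": (70, 220),  # mmHg
}


def out_of_range(values, parameter):
    """'domain' method: flag values outside the clinical range (bounds inclusive)."""
    low, high = THRESHOLDS[parameter]
    return [not (low <= v <= high) for v in values]


def iqr_outliers(values, k=1.5):
    """'iqr' method: flag values beyond k * IQR from the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [not (low <= v <= high) for v in values]


def apply_action(values, flags, action="flag", bounds=None):
    """Apply one of the three documented actions to flagged values."""
    if action == "flag":
        return list(zip(values, flags))
    if action == "remove":
        return [v for v, bad in zip(values, flags) if not bad]
    if action == "cap":
        low, high = bounds  # clamp every value into [low, high]
        return [min(max(v, low), high) for v in values]
    raise ValueError(f"unknown action: {action}")


glucose = [92, 45, 130, 612]
flags = out_of_range(glucose, "Glucose")
flagged = apply_action(glucose, flags, "flag")
capped = apply_action(glucose, flags, "cap", bounds=THRESHOLDS["Glucose"])
iqr_flags = iqr_outliers([10, 12, 11, 13, 12, 11, 95])
```

Flagging keeps every record and leaves the decision to a reviewer, which is usually the safer default when the data feed a regulatory submission.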

4. Date Standardization

standardized = cleaner.standardize_dates(data)
# Converts to ISO 8601: 2023-01-15T09:30:00
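
A conversion like this can be sketched with the standard library alone. The example below is an illustration, not the skill's standardize_dates: to_iso8601 and the KNOWN_FORMATS list are assumptions, and the input formats should be replaced with whatever the source data actually uses.

```python
from datetime import datetime

# (format, has_time) pairs tried in order. These input formats are assumptions.
KNOWN_FORMATS = [
    ("%Y-%m-%dT%H:%M:%S", True),
    ("%Y-%m-%d %H:%M:%S", True),
    ("%m/%d/%Y %H:%M", True),
    ("%d.%m.%Y", False),
    ("%Y-%m-%d", False),
]


def to_iso8601(text):
    """Hypothetical helper: return an ISO 8601 string, or None if unparseable."""
    for fmt, has_time in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # try the next candidate format
        # Keep date-only inputs date-only instead of inventing a time of day.
        return parsed.isoformat() if has_time else parsed.date().isoformat()
    return None


with_time = to_iso8601("01/15/2023 09:30")
date_only = to_iso8601("15.01.2023")
unparsed = to_iso8601("mid-January")
```

Returning None for unparseable values, rather than guessing, lets the caller log the failure. Slash-separated dates are ambiguous between day-first and month-first conventions, so the format list has to be fixed per data source.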

5. Complete Pipeline

cleaner = ClinicalDataCleaner(
    domain='DM',
    missing_strategy='median',
    outlier_method='iqr',
    outlier_action='flag'
)
cleaned_data = cleaner.clean(data)
cleaner.save_report('output.csv')

Output Files:

output.csv - Cleaned SDTM data
output.report.json - Audit trail for regulatory submission
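
The pairing of a data file with a sibling report file can be sketched as follows. This is illustrative only: save_with_report is a hypothetical function, the report fields are invented for the example (the source does not describe the audit trail's contents), the rows are synthetic, and the sketch assumes at least one row.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_with_report(rows, actions, output_path):
    """Hypothetical helper: write rows to CSV and a sibling .report.json file."""
    output_path = Path(output_path)
    # output.csv -> output.report.json
    report_path = output_path.with_suffix(".report.json")
    with output_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    # Assumed report layout: file name, row count, and the list of actions taken.
    report = {
        "output_file": output_path.name,
        "row_count": len(rows),
        "actions": actions,
    }
    report_path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return report_path


with tempfile.TemporaryDirectory() as tmp:
    rows = [{"USUBJID": "S1-001", "AGE": 49}]  # synthetic record
    actions = [{"step": "impute", "column": "AGE", "strategy": "median"}]
    report_path = save_with_report(rows, actions, Path(tmp) / "output.csv")
    report_name = report_path.name
    saved = json.loads(report_path.read_text(encoding="utf-8"))
```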

CLI Usage

# Clean demographics
python scripts/main.py \
  --input dm_raw.csv \
  --domain DM \
  --output dm_clean.csv \
  --missing-strategy median \
  --outlier-method iqr \
  --outlier-action flag

# Clean lab data with clinical thresholds
python scripts/main.py \
  --input lb_raw.csv \
  --domain LB \
  --output lb_clean.csv \
  --outlier-method domain
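
For readers wiring these commands into their own scripts, the flags above could be parsed along the following lines. This is a hypothetical mirror of the interface, not the contents of scripts/main.py: which flags are required is an assumption, and optional flags default to None because the source does not state their defaults.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the examples above."""
    parser = argparse.ArgumentParser(description="Clean SDTM domain data")
    # Assumption: input, domain, and output are mandatory.
    parser.add_argument("--input", required=True)
    parser.add_argument("--domain", required=True, choices=["DM", "LB", "VS"])
    parser.add_argument("--output", required=True)
    # Defaults left as None: the real defaults are not documented here.
    parser.add_argument("--missing-strategy", default=None,
                        choices=["mean", "median", "mode", "forward", "drop"])
    parser.add_argument("--outlier-method", default=None,
                        choices=["iqr", "zscore", "domain"])
    parser.add_argument("--outlier-action", default=None,
                        choices=["flag", "remove", "cap"])
    return parser


# Parse the arguments from the lab-data example above.
args = build_parser().parse_args([
    "--input", "lb_raw.csv",
    "--domain", "LB",
    "--output", "lb_clean.csv",
    "--outlier-method", "domain",
])
```

argparse converts dashes in flag names to underscores, so the parsed values are read as args.outlier_method and args.missing_strategy.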

Common Patterns

See references/common-patterns.md for detailed examples:

Regulatory Submission Preparation
Interim Analysis Data Preparation
Database Migration Cleanup
External Lab Data Integration

Troubleshooting

See references/troubleshooting.md for solutions to:

Validation failures
Date parsing errors
Memory errors with large datasets
Outlier detection issues

Quality Checklist

Pre-Cleaning:

IACUC approval obtained (animal studies)
Sample size adequately powered
Randomization method documented

Post-Cleaning:

Validate against CDISC SDTM IG
Review all cleaning actions in audit trail
Test import to analysis software

References

references/sdtm_ig_guide.md - CDISC SDTM Implementation Guide
references/domain_specs.json - Domain-specific field requirements
references/outlier_thresholds.json - Clinical outlier thresholds
references/common-patterns.md - Detailed usage patterns
references/troubleshooting.md - Problem-solving guide

Skill ID: 189 | Version: 2.0 | License: MIT

Usage Guidance

This skill appears internally consistent and implements the documented SDTM cleaning functionality. Before installing or running it:

1) Review scripts/main.py locally to confirm there are no hidden network calls or unexpected behavior (the code provided looks local-file oriented).
2) Run it only in a secure environment when processing regulated patient data (PHI); keep backups of the raw data and do not overwrite originals.
3) Install dependencies in an isolated environment (virtualenv/container) per requirements.txt.
4) Inspect any config files you pass via config_path to avoid accidentally exposing secrets or pointing at sensitive locations.
5) Note the minor metadata mismatches (SKILL.md/version/tile.json) and verify you're using the intended release before relying on it for regulatory submissions.

If you need higher assurance, ask the author for provenance, tests, and a signed release, or run the tool on synthetic data to validate behavior first.
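
One way to begin the review of scripts/main.py for network calls is a static scan of its imports. The sketch below is a first-pass aid only: find_network_imports and the NETWORK_MODULES list are hypothetical, the list is not exhaustive, and the scan will not catch dynamic imports or shell commands, so it does not replace reading the code.

```python
import ast

# Modules commonly used for network access. This list is an assumption.
NETWORK_MODULES = {
    "socket", "ssl", "http", "urllib", "ftplib", "smtplib",
    "requests", "httpx", "aiohttp",
}


def find_network_imports(source):
    """Hypothetical helper: return sorted top-level network modules imported."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # module is None for "from . import x"
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root in NETWORK_MODULES:
                found.add(root)
    return sorted(found)


# Sample source text; in practice, pass the contents of scripts/main.py.
sample = "import pandas as pd\nimport urllib.request\nfrom http import client\n"
hits = find_network_imports(sample)
clean_hits = find_network_imports("import numpy as np\nimport json\n")
```

An empty result is weak evidence, not proof: a script can still reach the network through subprocess calls or importlib.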

Capability Analysis

Type: OpenClaw Skill
Name: clinical-data-cleaner
Version: 0.1.1

The clinical-data-cleaner skill is a legitimate tool designed to standardize clinical trial data into CDISC SDTM formats. The core logic in scripts/main.py uses standard data processing libraries (pandas, numpy) to perform validation, missing value imputation, and outlier detection based on clinical thresholds defined in the references/ directory. There is no evidence of data exfiltration, malicious command execution, or prompt injection; the requested permissions (Read, Write, Bash, Edit) are consistent with a tool intended to process and modify data files locally.

Capability Assessment

✓ Purpose & Capability

Name/description (SDTM cleaning, outlier detection, imputation) match the provided code, JSON reference files, and CLI. Required dependencies (numpy, pandas, scipy) are appropriate for data processing. No unrelated credentials, binaries, or config paths are requested.

✓ Instruction Scope

SKILL.md directs the agent to load local files, run cleaning functions, and write output and audit trail files—actions consistent with a data-cleaning tool. The instructions do not ask the agent to read environment variables, network endpoints, or unrelated system paths. The skill will read arbitrary input files supplied by the user (expected behavior for this tool).

✓ Install Mechanism

There is no install spec (instruction-only skill plus included script). Dependencies are declared in requirements.txt and scripts/requirements.txt (numpy, pandas, scipy), which is proportionate for numeric/data tasks. No downloads from arbitrary URLs or archive extraction are present.

✓ Credentials

The skill does not declare or require environment variables, credentials, or remote tokens. It can optionally load a local JSON config file (config_path) — reading a local config file is reasonable but means the skill can access arbitrary files the user points it to.

✓ Persistence & Privilege

always is false and the skill does not request persistent platform privileges. It writes output and audit trail files as expected for a cleaner and does not modify other skills or system-wide settings.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install clinical-data-cleaner
After installation, invoke the skill by name or use /clinical-data-cleaner
Provide required inputs per the skill's parameter spec and get structured output

Version History

v0.1.1

No changes detected in this version.
- No file or documentation updates since the previous release.
- Functionality and usage remain unchanged.

v0.1.0

Initial release of clinical-data-cleaner.
- Cleans and standardizes clinical trial data for FDA/EMA submission using CDISC SDTM domains (DM, LB, VS)
- Handles missing values with multiple strategies (mean, median, mode, forward, drop)
- Detects and handles outliers using IQR, z-score, or clinical domain thresholds
- Standardizes date formats to ISO 8601
- Generates audit trails for regulatory compliance
- Includes CLI and Python API for data cleaning workflows

Metadata

Slug clinical-data-cleaner

Version 0.1.1

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 2

Frequently Asked Questions

What is Clinical Data Cleaner?

Use when cleaning clinical trial data, preparing data for FDA/EMA submission, standardizing SDTM datasets, handling missing values in clinical studies, detec... It is an AI Agent Skill for Claude Code / OpenClaw, with 402 downloads so far.

How do I install Clinical Data Cleaner?

Run "/install clinical-data-cleaner" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Clinical Data Cleaner free?

Yes, Clinical Data Cleaner is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Clinical Data Cleaner support?

Clinical Data Cleaner is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Clinical Data Cleaner?

It is built and maintained by renhaosu2024 (@renhaosu2024); the current version is v0.1.1.
