Clinical Data Cleaner

by renhaosu2024 · GitHub ↗ · v0.1.1 · MIT-0
cross-platform ✓ Security Clean
402 Downloads · 0 Stars · 1 Active Installs · 2 Versions
Install in OpenClaw
/install clinical-data-cleaner
Description
Use when cleaning clinical trial data, preparing data for FDA/EMA submission, standardizing SDTM datasets, handling missing values in clinical studies, detec...
README (SKILL.md)

Clinical Data Cleaner

Clean, validate, and standardize clinical trial data to meet CDISC SDTM standards for regulatory submissions to FDA or EMA.

Quick Start

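import pandas as pd

from scripts.main import ClinicalDataCleaner

# Load the raw records (assumption: clean() accepts a pandas DataFrame,
# and dm_raw.csv is a placeholder file name)
raw_data = pd.read_csv('dm_raw.csv')

# Initialize for the Demographics domain
cleaner = ClinicalDataCleaner(domain='DM')

# Clean the data with default settings
cleaned = cleaner.clean(raw_data)

# Save the cleaned data together with its audit trail
cleaner.save_report('output.csv')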

Core Capabilities

1. SDTM Domain Validation

cleaner = ClinicalDataCleaner(domain='DM')  # or 'LB', 'VS'
is_valid, missing = cleaner.validate_domain(data)

Required Fields:

  • DM: STUDYID, USUBJID, SUBJID, RFSTDTC, RFENDTC, SITEID, AGE, SEX, RACE
  • LB: STUDYID, USUBJID, LBTESTCD, LBCAT, LBORRES, LBORRESU, LBSTRESC, LBDTC
  • VS: STUDYID, USUBJID, VSTESTCD, VSORRES, VSORRESU, VSSTRESC, VSDTC
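
The required-field check can be pictured as a comparison between the fields listed above and the columns actually present. The following is a minimal sketch in plain Python, not the skill's own code: `REQUIRED_FIELDS` and `find_missing_fields` are hypothetical names, and only the field lists themselves come from this README.

```python
# Hypothetical sketch: required-field check for the three documented domains.
# The field lists are copied from the README; everything else is illustrative.
REQUIRED_FIELDS = {
    "DM": ["STUDYID", "USUBJID", "SUBJID", "RFSTDTC", "RFENDTC",
           "SITEID", "AGE", "SEX", "RACE"],
    "LB": ["STUDYID", "USUBJID", "LBTESTCD", "LBCAT", "LBORRES",
           "LBORRESU", "LBSTRESC", "LBDTC"],
    "VS": ["STUDYID", "USUBJID", "VSTESTCD", "VSORRES", "VSORRESU",
           "VSSTRESC", "VSDTC"],
}


def find_missing_fields(domain, columns):
    """Return (is_valid, missing) for a domain code and an iterable of column names."""
    present = set(columns)
    # Preserve the documented order so the report is easy to compare with the spec.
    missing = [field for field in REQUIRED_FIELDS[domain] if field not in present]
    return (not missing, missing)


is_valid, missing = find_missing_fields(
    "VS", ["STUDYID", "USUBJID", "VSTESTCD", "VSORRES"]
)
print(is_valid, missing)
```

The return shape mirrors the `is_valid, missing` pair shown for `validate_domain`, which makes it easy to compare the two on the same input.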

2. Missing Value Handling

cleaner = ClinicalDataCleaner(
    domain='DM',
    missing_strategy='median'  # mean, median, mode, forward, drop
)
cleaned = cleaner.handle_missing_values(data)
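
To make the five strategy names concrete, here is a small standard-library sketch of what each one could mean for a single numeric column. It is an illustration under assumptions, not the skill's implementation: `impute` is a hypothetical helper, missing values are represented as `None`, and `forward` is assumed to mean carrying the last observed value forward.

```python
import statistics


def impute(values, strategy="median"):
    """Fill or drop None entries in a list of numbers (hypothetical helper)."""
    observed = [v for v in values if v is not None]
    if strategy == "drop":
        return observed
    if strategy == "forward":
        filled, last = [], None
        for v in values:
            if v is not None:
                last = v
            filled.append(last)  # a leading gap stays None: nothing to carry forward
        return filled
    summarize = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }
    if strategy not in summarize:
        raise ValueError(f"unknown strategy: {strategy}")
    fill = summarize[strategy](observed)
    return [fill if v is None else v for v in values]


ages = [34, None, 51, 47, None, 62]
print(impute(ages, "median"))
print(impute(ages, "forward"))
print(impute(ages, "drop"))
```

Whichever strategy is chosen, imputed values change the dataset, so it is worth confirming in the audit trail which rows were filled or dropped.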

3. Outlier Detection

cleaner = ClinicalDataCleaner(
    domain='LB',
    outlier_method='domain',  # iqr, zscore, domain
    outlier_action='flag'     # flag, remove, cap
)
flagged = cleaner.detect_outliers(data)
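
For readers unfamiliar with the two statistical methods named in the comment, the sketch below shows one common reading of each using only the standard library. The multiplier of 1.5 for the IQR fences and the cutoff of 3 for z-scores are conventional defaults assumed here; the README does not state which values the skill uses, and `iqr_outliers` and `zscore_outliers` are hypothetical names.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation; zero if all values are equal
    return [v for v in values if abs(v - mean) / sd > threshold]


readings = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(readings))
print(zscore_outliers(readings, threshold=2.0))
print(zscore_outliers(readings))
```

On this seven-value sample the IQR method picks out 95, while the z-score method with a cutoff of 3 returns nothing: with a sample standard deviation, no value among seven can reach a z-score of 3. Small batches are therefore a case where the choice of method matters.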

Clinical Thresholds:

Parameter   | Range  | Unit
Glucose     | 50-500 | mg/dL
Hemoglobin  | 5-20   | g/dL
Systolic BP | 70-220 | mmHg
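
The three `outlier_action` values can be illustrated against the ranges listed above. This is a sketch under assumptions rather than the skill's code: the dictionary keys, the `apply_thresholds` helper, and the choice to treat both range bounds as inclusive are all hypothetical.

```python
# Ranges copied from the thresholds listed above; the keys are hypothetical.
CLINICAL_RANGES = {
    "glucose": (50, 500),      # mg/dL
    "hemoglobin": (5, 20),     # g/dL
    "systolic_bp": (70, 220),  # mmHg
}


def apply_thresholds(values, parameter, action="flag"):
    """Flag, remove, or cap values outside the range for a parameter."""
    low, high = CLINICAL_RANGES[parameter]
    if action == "flag":
        # Keep every value and pair it with an out-of-range marker.
        return [(v, not (low <= v <= high)) for v in values]
    if action == "remove":
        return [v for v in values if low <= v <= high]
    if action == "cap":
        # Clamp out-of-range values to the nearest bound.
        return [min(max(v, low), high) for v in values]
    raise ValueError(f"unknown action: {action}")


glucose = [92, 45, 130, 610]
print(apply_thresholds(glucose, "glucose", "flag"))
print(apply_thresholds(glucose, "glucose", "remove"))
print(apply_thresholds(glucose, "glucose", "cap"))
```

Of the three actions, only `flag` leaves the recorded values untouched, which is why it is the one that keeps the original observations available for review.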

4. Date Standardization

standardized = cleaner.standardize_dates(data)
# Converts to ISO 8601: 2023-01-15T09:30:00
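
As a rough picture of what date standardization involves, the sketch below tries a short list of input formats and emits ISO 8601 text. The candidate formats and the `to_iso8601` helper are assumptions for illustration; the README does not say which input formats the skill recognizes.

```python
from datetime import datetime

# Candidate input formats are assumptions; extend the list to match the source data.
# Each entry pairs a strptime format with whether it carries a time component.
CANDIDATE_FORMATS = [
    ("%Y-%m-%d %H:%M:%S", True),
    ("%Y-%m-%dT%H:%M:%S", True),
    ("%d/%m/%Y %H:%M", True),
    ("%Y-%m-%d", False),
    ("%d/%m/%Y", False),
]


def to_iso8601(text):
    """Return the ISO 8601 form of a date string, or None if no format matches."""
    for fmt, has_time in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
        # Date-only inputs stay date-only instead of gaining a midnight time.
        return parsed.isoformat() if has_time else parsed.date().isoformat()
    return None


print(to_iso8601("15/01/2023 09:30"))
print(to_iso8601("2023-01-15"))
print(to_iso8601("not a date"))
```

Day-first versus month-first input (for example `03/04/2023`) cannot be resolved from the string alone, so the format list should reflect how the source system is known to write dates.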

5. Complete Pipeline

cleaner = ClinicalDataCleaner(
    domain='DM',
    missing_strategy='median',
    outlier_method='iqr',
    outlier_action='flag'
)
cleaned_data = cleaner.clean(data)
cleaner.save_report('output.csv')

Output Files:

  • output.csv - Cleaned SDTM data
  • output.report.json - Audit trail for regulatory submission
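
The naming relationship between the two output files can be reproduced with `pathlib`. The sketch below derives the report name from the CSV name and writes an example audit record; the record layout (`output`, `actions`, `step`, `column`, `rows_changed`) is purely illustrative and is not the skill's actual report schema.

```python
import json
import tempfile
from pathlib import Path


def report_path_for(output_csv):
    """Map 'output.csv' to 'output.report.json', following the naming shown above."""
    return Path(output_csv).with_suffix(".report.json")


def write_audit_trail(output_csv, actions):
    """Write a JSON audit trail next to the cleaned CSV (hypothetical layout)."""
    path = report_path_for(output_csv)
    payload = {"output": Path(output_csv).name, "actions": actions}
    path.write_text(json.dumps(payload, indent=2))
    return path


with tempfile.TemporaryDirectory() as tmp:
    actions = [{"step": "handle_missing_values", "column": "AGE", "rows_changed": 2}]
    report = write_audit_trail(Path(tmp) / "output.csv", actions)
    loaded = json.loads(report.read_text())
    print(report.name)
```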

CLI Usage

# Clean demographics
python scripts/main.py \
  --input dm_raw.csv \
  --domain DM \
  --output dm_clean.csv \
  --missing-strategy median \
  --outlier-method iqr \
  --outlier-action flag

# Clean lab data with clinical thresholds
python scripts/main.py \
  --input lb_raw.csv \
  --domain LB \
  --output lb_clean.csv \
  --outlier-method domain
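
For orientation, the flags used in the two commands above could be declared with `argparse` roughly as follows. This is a hypothetical reconstruction, not the contents of scripts/main.py: the flag names and allowed values come from this README, while the defaults and the `build_parser` name are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the documented flags (defaults are assumptions)."""
    parser = argparse.ArgumentParser(description="Illustrative SDTM cleaning CLI.")
    parser.add_argument("--input", required=True, help="path to the raw CSV")
    parser.add_argument("--domain", required=True, choices=["DM", "LB", "VS"])
    parser.add_argument("--output", required=True, help="path for the cleaned CSV")
    parser.add_argument("--missing-strategy", default="median",
                        choices=["mean", "median", "mode", "forward", "drop"])
    parser.add_argument("--outlier-method", default="iqr",
                        choices=["iqr", "zscore", "domain"])
    parser.add_argument("--outlier-action", default="flag",
                        choices=["flag", "remove", "cap"])
    return parser


# Parse the second command shown above from an explicit argument list.
args = build_parser().parse_args([
    "--input", "lb_raw.csv",
    "--domain", "LB",
    "--output", "lb_clean.csv",
    "--outlier-method", "domain",
])
print(args.domain, args.outlier_method, args.missing_strategy, args.outlier_action)
```

Running `python scripts/main.py --help` against the real script is the reliable way to confirm the actual flags and defaults.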

Common Patterns

See references/common-patterns.md for detailed examples:

  • Regulatory Submission Preparation
  • Interim Analysis Data Preparation
  • Database Migration Cleanup
  • External Lab Data Integration

Troubleshooting

See references/troubleshooting.md for solutions to:

  • Validation failures
  • Date parsing errors
  • Memory errors with large datasets
  • Outlier detection issues

Quality Checklist

Pre-Cleaning:

  • IRB/IEC approval obtained (IACUC for animal studies)
  • Sample size adequately powered
  • Randomization method documented

Post-Cleaning:

  • Validate against CDISC SDTM IG
  • Review all cleaning actions in audit trail
  • Test import to analysis software

References

  • references/sdtm_ig_guide.md - CDISC SDTM Implementation Guide
  • references/domain_specs.json - Domain-specific field requirements
  • references/outlier_thresholds.json - Clinical outlier thresholds
  • references/common-patterns.md - Detailed usage patterns
  • references/troubleshooting.md - Problem-solving guide

Skill ID: 189 | Version: 2.0 | License: MIT

Usage Guidance
This skill appears internally consistent and implements the documented SDTM cleaning functionality. Before installing or running it:
  1. Review scripts/main.py locally to confirm there are no hidden network calls or unexpected behavior (the code provided looks local-file oriented).
  2. Run it only in a secure environment when processing regulated patient data (PHI); keep backups of the raw data and do not overwrite originals.
  3. Install dependencies in an isolated environment (virtualenv/container) per requirements.txt.
  4. Inspect any config files you pass via config_path to avoid accidentally exposing secrets or pointing at sensitive locations.
  5. Note the minor metadata mismatches (SKILL.md/version/tile.json) and verify you're using the intended release before relying on it for regulatory submissions.
If you need higher assurance, ask the author for provenance, tests, and a signed release, or run the tool on synthetic data to validate behavior first.
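
One way to follow the synthetic-data suggestion is to generate a small, obviously fake Demographics file and run the tool on that first. The sketch below uses only the standard library; the study identifier, value ranges, and the `synthetic_dm_rows` helper are invented for illustration, and only the column names come from the DM field list in the README.

```python
import csv
import io
import random

DM_FIELDS = ["STUDYID", "USUBJID", "SUBJID", "RFSTDTC", "RFENDTC",
             "SITEID", "AGE", "SEX", "RACE"]


def synthetic_dm_rows(n, seed=0):
    """Generate n fake DM records; a fixed seed keeps the output reproducible."""
    rng = random.Random(seed)
    rows = []
    for i in range(1, n + 1):
        site = f"{rng.randint(1, 5):03d}"
        subj = f"{i:04d}"
        rows.append({
            "STUDYID": "SYNTH01",
            "USUBJID": f"SYNTH01-{site}-{subj}",
            "SUBJID": subj,
            "RFSTDTC": "2023-01-15",
            "RFENDTC": "2023-07-15",
            "SITEID": site,
            # Roughly 1 in 5 ages is left blank to exercise missing-value handling.
            "AGE": "" if rng.random() < 0.2 else rng.randint(18, 85),
            "SEX": rng.choice(["M", "F"]),
            "RACE": rng.choice(["ASIAN", "WHITE", "BLACK OR AFRICAN AMERICAN"]),
        })
    return rows


# Write to an in-memory buffer; swap in open("dm_synthetic.csv", "w", newline="")
# to produce a file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=DM_FIELDS)
writer.writeheader()
writer.writerows(synthetic_dm_rows(10))
lines = buffer.getvalue().splitlines()
print(lines[0])
```

Because the generator is seeded, the same file can be regenerated later to check that a newer release of the skill produces the same cleaned output.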
Capability Analysis
Type: OpenClaw Skill
Name: clinical-data-cleaner
Version: 0.1.1
The clinical-data-cleaner skill is a legitimate tool designed to standardize clinical trial data into CDISC SDTM formats. The core logic in scripts/main.py uses standard data processing libraries (pandas, numpy) to perform validation, missing value imputation, and outlier detection based on clinical thresholds defined in the references/ directory. There is no evidence of data exfiltration, malicious command execution, or prompt injection; the requested permissions (Read, Write, Bash, Edit) are consistent with a tool intended to process and modify data files locally.
Capability Assessment
Purpose & Capability
Name/description (SDTM cleaning, outlier detection, imputation) match the provided code, JSON reference files, and CLI. Required dependencies (numpy, pandas, scipy) are appropriate for data processing. No unrelated credentials, binaries, or config paths are requested.
Instruction Scope
SKILL.md directs the agent to load local files, run cleaning functions, and write output and audit trail files—actions consistent with a data-cleaning tool. The instructions do not ask the agent to read environment variables, network endpoints, or unrelated system paths. The skill will read arbitrary input files supplied by the user (expected behavior for this tool).
Install Mechanism
There is no install spec (instruction-only skill plus included script). Dependencies are declared in requirements.txt and scripts/requirements.txt (numpy, pandas, scipy), which is proportionate for numeric/data tasks. No downloads from arbitrary URLs or archive extraction are present.
Credentials
The skill does not declare or require environment variables, credentials, or remote tokens. It can optionally load a local JSON config file (config_path) — reading a local config file is reasonable but means the skill can access arbitrary files the user points it to.
Persistence & Privilege
always is false and the skill does not request persistent platform privileges. It writes output and audit trail files as expected for a cleaner and does not modify other skills or system-wide settings.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install clinical-data-cleaner
  3. After installation, invoke the skill by name or use /clinical-data-cleaner
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.1.1
No changes detected in this version.
  • No file or documentation updates since the previous release.
  • Functionality and usage remain unchanged.
v0.1.0
Initial release of clinical-data-cleaner.
  • Cleans and standardizes clinical trial data for FDA/EMA submission using CDISC SDTM domains (DM, LB, VS)
  • Handles missing values with multiple strategies (mean, median, mode, forward, drop)
  • Detects and handles outliers using IQR, z-score, or clinical domain thresholds
  • Standardizes date formats to ISO 8601
  • Generates audit trails for regulatory compliance
  • Includes CLI and Python API for data cleaning workflows
Metadata
Slug: clinical-data-cleaner
Version: 0.1.1
License: MIT-0
All-time Installs: 1
Active Installs: 1
Total Versions: 2
Frequently Asked Questions

What is Clinical Data Cleaner?

Clinical Data Cleaner is an AI Agent Skill for Claude Code / OpenClaw, with 402 downloads so far. Its listing describes it as: "Use when cleaning clinical trial data, preparing data for FDA/EMA submission, standardizing SDTM datasets, handling missing values in clinical studies, detec..."

How do I install Clinical Data Cleaner?

Run "/install clinical-data-cleaner" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Clinical Data Cleaner free?

Yes, Clinical Data Cleaner is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Clinical Data Cleaner support?

Clinical Data Cleaner is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Clinical Data Cleaner?

It is built and maintained by renhaosu2024 (@renhaosu2024); the current version is v0.1.1.
