Clinical Data Cleaner

Author: renhaosu2024 · GitHub · v0.1.1 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 402 · Favorites: 0 · Current installs: 1 · Versions: 2
Install in OpenClaw
/install clinical-data-cleaner
Description
Use when cleaning clinical trial data, preparing data for FDA/EMA submission, standardizing SDTM datasets, handling missing values in clinical studies, detec...
Usage (SKILL.md)

Clinical Data Cleaner

Clean, validate, and standardize clinical trial data to meet CDISC SDTM standards for regulatory submissions to FDA or EMA.

Quick Start

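import pandas as pd

from scripts.main import ClinicalDataCleaner

# Load the raw Demographics data (illustrative file name;
# assumes clean() accepts a pandas DataFrame)
raw_data = pd.read_csv('dm_raw.csv')

# Initialize for Demographics domain
cleaner = ClinicalDataCleaner(domain='DM')

# Clean data with default settings
cleaned = cleaner.clean(raw_data)

# Save cleaned data with audit trail
cleaner.save_report('output.csv')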

Core Capabilities

1. SDTM Domain Validation

cleaner = ClinicalDataCleaner(domain='DM')  # or 'LB', 'VS'
is_valid, missing = cleaner.validate_domain(data)

Required Fields:

  • DM: STUDYID, USUBJID, SUBJID, RFSTDTC, RFENDTC, SITEID, AGE, SEX, RACE
  • LB: STUDYID, USUBJID, LBTESTCD, LBCAT, LBORRES, LBORRESU, LBSTRESC, LBDTC
  • VS: STUDYID, USUBJID, VSTESTCD, VSORRES, VSORRESU, VSSTRESC, VSDTC
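To make the required-field check concrete, here is a minimal sketch of how the (is_valid, missing) pair could be computed from a set of column names. The helper name check_required_fields is hypothetical and is not the skill's validate_domain implementation; only the field lists are taken from the section above.

```python
# Required fields per domain, copied from the list above.
REQUIRED_FIELDS = {
    "DM": ["STUDYID", "USUBJID", "SUBJID", "RFSTDTC", "RFENDTC",
           "SITEID", "AGE", "SEX", "RACE"],
    "LB": ["STUDYID", "USUBJID", "LBTESTCD", "LBCAT", "LBORRES",
           "LBORRESU", "LBSTRESC", "LBDTC"],
    "VS": ["STUDYID", "USUBJID", "VSTESTCD", "VSORRES", "VSORRESU",
           "VSSTRESC", "VSDTC"],
}


def check_required_fields(columns, domain):
    """Return (is_valid, missing) for an iterable of column names.

    Hypothetical helper for illustration only.
    """
    if domain not in REQUIRED_FIELDS:
        raise ValueError(f"Unsupported domain: {domain!r}")
    present = set(columns)
    # Keep the order of the specification so the report is stable.
    missing = [name for name in REQUIRED_FIELDS[domain] if name not in present]
    return (not missing, missing)


columns = ["STUDYID", "USUBJID", "VSTESTCD", "VSORRES", "VSDTC"]
is_valid, missing = check_required_fields(columns, "VS")
print(is_valid, missing)
```

With a pandas DataFrame, the same helper could be called with the DataFrame's column names.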

2. Missing Value Handling

cleaner = ClinicalDataCleaner(
    domain='DM',
    missing_strategy='median'  # mean, median, mode, forward, drop
)
cleaned = cleaner.handle_missing_values(data)
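The five strategy names can be read as follows. This is a standard-library sketch of what each option would typically mean for a single numeric column; the function impute is hypothetical, and the skill's actual behavior (for example, how it treats leading gaps under 'forward') may differ.

```python
from statistics import mean, median, mode


def impute(values, strategy="median"):
    """Fill None entries in a list of numbers according to `strategy`.

    Hypothetical helper; strategy names mirror the options listed above.
    """
    observed = [v for v in values if v is not None]
    if strategy == "drop":
        # Discard missing entries entirely.
        return observed
    if strategy == "forward":
        # Carry the last observed value forward; leading gaps stay None.
        filled, last = [], None
        for v in values:
            if v is not None:
                last = v
            filled.append(last)
        return filled
    reducers = {"mean": mean, "median": median, "mode": mode}
    if strategy not in reducers:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    if not observed:
        # Nothing to estimate from; return the input unchanged.
        return list(values)
    fill = reducers[strategy](observed)
    return [fill if v is None else v for v in values]


ages = [34, None, 51, 47, None, 62]
print(impute(ages, "median"))
print(impute(ages, "forward"))
```

Whichever strategy is chosen, imputed values are worth recording in the audit trail so that reviewers can distinguish them from collected data.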

3. Outlier Detection

cleaner = ClinicalDataCleaner(
    domain='LB',
    outlier_method='domain',  # iqr, zscore, domain
    outlier_action='flag'     # flag, remove, cap
)
flagged = cleaner.detect_outliers(data)
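As an illustration of the 'iqr' method combined with the three actions, the sketch below applies Tukey's fences to a list of values. The multiplier k=1.5 and the quartile method are assumptions (the skill's own settings are not documented here), and handle_outliers is a hypothetical name.

```python
from statistics import quantiles


def iqr_fences(values, k=1.5):
    """Return (low, high) fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _q2, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def handle_outliers(values, action="flag", k=1.5):
    """Apply one of the actions named above: flag, remove, or cap."""
    low, high = iqr_fences(values, k)
    is_outlier = [not (low <= v <= high) for v in values]
    if action == "flag":
        # One boolean per input value; the data are left untouched.
        return is_outlier
    if action == "remove":
        return [v for v, out in zip(values, is_outlier) if not out]
    if action == "cap":
        # Clamp values outside the fences to the nearest fence.
        return [min(max(v, low), high) for v in values]
    raise ValueError(f"Unknown action: {action!r}")


results = [4.8, 5.1, 5.0, 5.3, 4.9, 5.2, 12.0]
print(handle_outliers(results, "flag"))
print(handle_outliers(results, "remove"))
```

Flagging is the least destructive choice, since it leaves the original values in place for later review.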

Clinical Thresholds:

Parameter     Range     Unit
Glucose       50-500    mg/dL
Hemoglobin    5-20      g/dL
Systolic BP   70-220    mmHg
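The 'domain' method can be pictured as a range lookup against these thresholds. The sketch below is an assumption about how such a check might work, not the skill's code: the record layout and the function flag_out_of_range are hypothetical, and the bounds are treated as inclusive.

```python
# Thresholds copied from the table above: (low, high, unit).
THRESHOLDS = {
    "Glucose": (50, 500, "mg/dL"),
    "Hemoglobin": (5, 20, "g/dL"),
    "Systolic BP": (70, 220, "mmHg"),
}


def flag_out_of_range(records):
    """Return copies of the records with an added 'outlier' key.

    'outlier' is True/False for known parameters and None when no
    threshold is defined. Bounds are assumed to be inclusive.
    """
    flagged = []
    for rec in records:
        limits = THRESHOLDS.get(rec["parameter"])
        if limits is None:
            outlier = None
        else:
            low, high, _unit = limits
            outlier = not (low <= rec["value"] <= high)
        flagged.append({**rec, "outlier": outlier})
    return flagged


records = [
    {"parameter": "Glucose", "value": 620},
    {"parameter": "Hemoglobin", "value": 13.5},
    {"parameter": "Systolic BP", "value": 70},
]
for rec in flag_out_of_range(records):
    print(rec["parameter"], rec["value"], rec["outlier"])
```

A check like this only makes sense once results are expressed in the units shown in the table, so unit conversion would need to happen first.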

4. Date Standardization

standardized = cleaner.standardize_dates(data)
# Converts to ISO 8601: 2023-01-15T09:30:00
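For readers who want to see what the conversion involves, here is a small standard-library sketch that tries a list of candidate input formats and emits ISO 8601 text. The format list and the function to_iso8601 are assumptions for illustration; the skill's own parsing rules are not documented in this section.

```python
from datetime import datetime

# Candidate input formats (an assumption). Each entry is
# (strptime format, whether the format carries a time part).
CANDIDATE_FORMATS = [
    ("%Y-%m-%dT%H:%M:%S", True),
    ("%Y-%m-%d %H:%M:%S", True),
    ("%d/%m/%Y %H:%M", True),
    ("%Y-%m-%d", False),
    ("%d.%m.%Y", False),
]


def to_iso8601(text):
    """Return an ISO 8601 string, or None if no candidate format matches."""
    text = text.strip()
    for fmt, has_time in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        # Date-only inputs stay date-only rather than gaining a fake time.
        return parsed.isoformat() if has_time else parsed.date().isoformat()
    return None


print(to_iso8601("15/01/2023 09:30"))
print(to_iso8601("15.01.2023"))
print(to_iso8601("not a date"))
```

Returning None for unparseable input, rather than guessing, leaves ambiguous values visible for manual review.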

5. Complete Pipeline

cleaner = ClinicalDataCleaner(
    domain='DM',
    missing_strategy='median',
    outlier_method='iqr',
    outlier_action='flag'
)
cleaned_data = cleaner.clean(data)
cleaner.save_report('output.csv')

Output Files:

  • output.csv - Cleaned SDTM data
  • output.report.json - Audit trail for regulatory submission
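The naming relationship between the two output files can be sketched with pathlib. The report structure below (the keys cleaned_file, n_actions, and actions) is purely hypothetical; only the output.csv to output.report.json naming comes from the list above.

```python
import json
import tempfile
from pathlib import Path


def report_path_for(output_csv):
    """Map output.csv to output.report.json, as in the list above."""
    return Path(output_csv).with_suffix(".report.json")


def write_audit_report(output_csv, actions):
    """Write a JSON audit trail next to the cleaned CSV (hypothetical layout)."""
    report = {
        "cleaned_file": Path(output_csv).name,
        "n_actions": len(actions),
        "actions": actions,
    }
    path = report_path_for(output_csv)
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    actions = [
        {"step": "handle_missing_values", "column": "AGE",
         "strategy": "median", "rows_affected": 2},
    ]
    written = write_audit_report(Path(tmp) / "output.csv", actions)
    print(written.name)
    print(json.loads(written.read_text(encoding="utf-8"))["n_actions"])
```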

CLI Usage

# Clean demographics
python scripts/main.py \
  --input dm_raw.csv \
  --domain DM \
  --output dm_clean.csv \
  --missing-strategy median \
  --outlier-method iqr \
  --outlier-action flag

# Clean lab data with clinical thresholds
python scripts/main.py \
  --input lb_raw.csv \
  --domain LB \
  --output lb_clean.csv \
  --outlier-method domain
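For orientation, the flags used above could be declared with argparse roughly as follows. This is a sketch of a parser that accepts the same options, not the contents of scripts/main.py; the default values in particular are assumptions.

```python
import argparse


def build_parser():
    """Build a parser accepting the flags shown in the CLI examples."""
    parser = argparse.ArgumentParser(description="Clean clinical trial data")
    parser.add_argument("--input", required=True, help="raw CSV file")
    parser.add_argument("--domain", required=True, choices=["DM", "LB", "VS"])
    parser.add_argument("--output", required=True, help="cleaned CSV file")
    # Defaults below are assumptions, not documented behavior.
    parser.add_argument("--missing-strategy", default="median",
                        choices=["mean", "median", "mode", "forward", "drop"])
    parser.add_argument("--outlier-method", default="iqr",
                        choices=["iqr", "zscore", "domain"])
    parser.add_argument("--outlier-action", default="flag",
                        choices=["flag", "remove", "cap"])
    return parser


args = build_parser().parse_args([
    "--input", "lb_raw.csv",
    "--domain", "LB",
    "--output", "lb_clean.csv",
    "--outlier-method", "domain",
])
print(args.domain, args.outlier_method, args.outlier_action)
```

argparse converts the hyphenated flag names to underscored attributes, so --outlier-method is read as args.outlier_method.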

Common Patterns

See references/common-patterns.md for detailed examples:

  • Regulatory Submission Preparation
  • Interim Analysis Data Preparation
  • Database Migration Cleanup
  • External Lab Data Integration

Troubleshooting

See references/troubleshooting.md for solutions to:

  • Validation failures
  • Date parsing errors
  • Memory errors with large datasets
  • Outlier detection issues

Quality Checklist

Pre-Cleaning:

  • IRB/IEC approval obtained (IACUC for animal studies)
  • Sample size adequately powered
  • Randomization method documented

Post-Cleaning:

  • Validate against CDISC SDTM IG
  • Review all cleaning actions in audit trail
  • Test import to analysis software

References

  • references/sdtm_ig_guide.md - CDISC SDTM Implementation Guide
  • references/domain_specs.json - Domain-specific field requirements
  • references/outlier_thresholds.json - Clinical outlier thresholds
  • references/common-patterns.md - Detailed usage patterns
  • references/troubleshooting.md - Problem-solving guide

Skill ID: 189 | Version: 2.0 | License: MIT

Safe-use recommendations
This skill appears internally consistent and implements the documented SDTM cleaning functionality. Before installing or running it:

  1. Review scripts/main.py locally to confirm there are no hidden network calls or unexpected behavior (the code provided looks local-file oriented).
  2. Run it only in a secure environment when processing regulated patient data (PHI); keep backups of the raw data and do not overwrite originals.
  3. Install dependencies in an isolated environment (virtualenv/container) per requirements.txt.
  4. Inspect any config files you pass via config_path to avoid accidentally exposing secrets or pointing at sensitive locations.
  5. Note the minor metadata mismatches (SKILL.md/version/tile.json) and verify that you are using the intended release before relying on it for regulatory submissions.

If you need higher assurance, ask the author for provenance, tests, and a signed release, or run the tool on synthetic data to validate its behavior first.
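One way to act on the suggestion to try the tool on synthetic data first is to generate a small, fake Demographics file containing the required DM fields. The sketch below uses only the standard library; the value ranges, the study identifier, and the USUBJID layout are placeholders chosen for illustration, and make_synthetic_dm is a hypothetical helper.

```python
import csv
import io
import random
from datetime import date, timedelta

DM_FIELDS = ["STUDYID", "USUBJID", "SUBJID", "RFSTDTC", "RFENDTC",
             "SITEID", "AGE", "SEX", "RACE"]


def make_synthetic_dm(n_subjects, seed=0, study_id="DEMO01"):
    """Return a list of fake DM records (no real patient data)."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    rows = []
    for i in range(1, n_subjects + 1):
        site = f"{rng.randint(1, 5):03d}"
        subj = f"{i:04d}"
        start = date(2023, 1, 1) + timedelta(days=rng.randrange(180))
        end = start + timedelta(days=rng.randrange(30, 120))
        rows.append({
            "STUDYID": study_id,
            "USUBJID": f"{study_id}-{site}-{subj}",
            "SUBJID": subj,
            "RFSTDTC": start.isoformat(),
            "RFENDTC": end.isoformat(),
            "SITEID": site,
            "AGE": rng.randint(18, 85),
            "SEX": rng.choice(["M", "F"]),
            "RACE": rng.choice(["WHITE", "ASIAN", "BLACK OR AFRICAN AMERICAN"]),
        })
    return rows


def to_csv_text(rows):
    """Serialize the records as CSV text with the DM column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DM_FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv_text(make_synthetic_dm(3, seed=42)))
```

Writing this text to a file such as dm_raw.csv gives an input that can be passed to the CLI without exposing any real patient records.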
Functional analysis
Type: OpenClaw Skill
Name: clinical-data-cleaner
Version: 0.1.1

The clinical-data-cleaner skill is a legitimate tool designed to standardize clinical trial data into CDISC SDTM formats. The core logic in scripts/main.py uses standard data processing libraries (pandas, numpy) to perform validation, missing value imputation, and outlier detection based on clinical thresholds defined in the references/ directory. There is no evidence of data exfiltration, malicious command execution, or prompt injection; the requested permissions (Read, Write, Bash, Edit) are consistent with a tool intended to process and modify data files locally.
Capability assessment
Purpose & Capability
Name/description (SDTM cleaning, outlier detection, imputation) match the provided code, JSON reference files, and CLI. Required dependencies (numpy, pandas, scipy) are appropriate for data processing. No unrelated credentials, binaries, or config paths are requested.
Instruction Scope
SKILL.md directs the agent to load local files, run cleaning functions, and write output and audit trail files—actions consistent with a data-cleaning tool. The instructions do not ask the agent to read environment variables, network endpoints, or unrelated system paths. The skill will read arbitrary input files supplied by the user (expected behavior for this tool).
Install Mechanism
There is no install spec (instruction-only skill plus included script). Dependencies are declared in requirements.txt and scripts/requirements.txt (numpy, pandas, scipy), which is proportionate for numeric/data tasks. No downloads from arbitrary URLs or archive extraction are present.
Credentials
The skill does not declare or require environment variables, credentials, or remote tokens. It can optionally load a local JSON config file (config_path) — reading a local config file is reasonable but means the skill can access arbitrary files the user points it to.
Persistence & Privilege
always is false and the skill does not request persistent platform privileges. It writes output and audit trail files as expected for a cleaner and does not modify other skills or system-wide settings.
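How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install clinical-data-cleaner
  3. Once installed, invoke the Skill by name or trigger it with /clinical-data-cleaner
  4. Provide the required inputs as described in the Skill's parameter documentation to get structured output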
Version history
v0.1.1
No changes detected in this version.

  • No file or documentation updates since the previous release
  • Functionality and usage remain unchanged

v0.1.0
Initial release of clinical-data-cleaner.

  • Cleans and standardizes clinical trial data for FDA/EMA submission using CDISC SDTM domains (DM, LB, VS)
  • Handles missing values with multiple strategies (mean, median, mode, forward, drop)
  • Detects and handles outliers using IQR, z-score, or clinical domain thresholds
  • Standardizes date formats to ISO 8601
  • Generates audit trails for regulatory compliance
  • Includes CLI and Python API for data cleaning workflows
Metadata
Slug: clinical-data-cleaner
Version: 0.1.1
License: MIT-0
Total installs: 1
Current installs: 1
Versions released: 2
FAQ

What is Clinical Data Cleaner?

Use when cleaning clinical trial data, preparing data for FDA/EMA submission, standardizing SDTM datasets, handling missing values in clinical studies, detec... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 402 total downloads to date.

How do I install Clinical Data Cleaner?

Run the command "/install clinical-data-cleaner" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is Clinical Data Cleaner free?

Yes. Clinical Data Cleaner is completely free and released under the MIT-0 license; it can be downloaded, installed, and used freely.

Which platforms does Clinical Data Cleaner support?

Clinical Data Cleaner is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Clinical Data Cleaner?

It is developed and maintained by renhaosu2024 (@renhaosu2024); the current version is v0.1.1.
