← Back to Skills Marketplace

REDCap Data Dictionary Generator

Name: REDCap Data Dictionary Generator
Author: kenlcj

by kenlcj · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ✓ Security Clean

Downloads

Stars

Active Installs

Versions

Install in OpenClaw

/install redcap-data-dictionary-generator

Description

开发者：邹和建、刘从进 REDCap Data Dictionary Generator - 将 Word/Excel 文档（CRF/方案）转换为 CSV 格式的 REDCap 数据字典。 ⚠️ 原 redcap-crf-generator 已不再更新，请使用本版本。适用场景： - 用户上传临床试验 CRF/方...

README (SKILL.md)

REDCap Data Dictionary Generator

⚠️ 原 redcap-crf-generator 已不再更新，请使用本版本。

概述

本技能将临床试验 CRF/方案文档（Word/Excel/PDF）转换为符合 REDCap 标准的数据字典 CSV 文件。

核心流程

文档解析 → 使用 markitdown 转换为 Markdown，充分理解文档结构
图片识别 → 对文档中的评分表/诊断标准图片进行 OCR 识别
字段生成 → 按照 REDCap 规范生成数据字典
格式修正 → 确保 Section Header、验证类型、计算字段等符合规范

数据字典格式（REDCap CSV）

列名	说明	示例
Variable / Field Name	字段变量名，CDISC规范	`sex`, `ie_1`, `dm_3`
Form Name	表单英文名	`demography`, `inclusion_exclusion`
Section Header	分节标题（仅首字段填写）	`患者基本信息`
Field Type	字段类型	`text`, `dropdown`, `radio`, `checkbox`, `calc`, `notes`, `file`
Field Label	字段中文标签	`性别`, `年龄（岁）`
Choices, Calculations, OR Slider Labels	选项或计算公式	`1, 男 \| 2, 女` 或 `round([weight]/(([height]/100)^2),1)`
Field Note	特殊说明/格式要求	`单位：岁`, `YYYY-MM-DD`
Text Validation Type	验证类型	`date`, `number`, `integer`, `datetime`
Text Validation Min/Max	数值范围	`0`, `120`
Identifier?	是否隐私字段（仅限姓名、身份证等直接身份标识）	`y`（是）或留空
Branching Logic	分支逻辑	`[dm_10] = "7"`
Required Field?	是否必填	`y`（是）或留空

⚠️ 关键规则

1. Section Header 仅首字段填写

同组字段只在第一个字段设置 Section Header，后续字段留空。

Variable / Field Name,Form Name,Section Header,Field Type,Field Label,...
record_id,inclusion_exclusion,,text,Record ID,...
enroll_date,inclusion_exclusion,入排标准判定,text,入组日期,...
ie_1,inclusion_exclusion,,dropdown,纳入标准1：≥18周岁,...
ie_2,inclusion_exclusion,,dropdown,纳入标准2：同种异体肝移植术后,...

2. record_id 必须为第一行

第一个字段必须是 record_id，类型为 text，标签为 "Record ID"。

Variable / Field Name,Form Name,Section Header,Field Type,Field Label,...
record_id,inclusion_exclusion,,text,Record ID,...

3. Identifier? 仅用于直接身份标识字段

仅当字段涉及患者直接身份标识（如姓名、身份证号、住院号、手机号等）时设置 y。一般人口学资料（年龄、性别、体重等）不属于隐私标识，不要设置。

Variable / Field Name,Field Type,Identifier?,...
dm_1,text,y,...  # 编号（含姓名首字母），属于隐私
dm_3,text,,...   # 年龄，不属于隐私，无需设置

4. calc 字段不需要验证类型

计算字段（calc）的 Text Validation Type / Min / Max 留空。

Variable / Field Name,Field Type,Choices, Calculations, OR Slider Labels,Text Validation Type,...
dm_bmi,calc,round([dm_6]/(([dm_5]/100)^2),1),,...

字段类型选择规则

选项数量决定字段类型：

≤4个选项 → 使用 radio（单选按钮），界面更直观

≥5个选项 → 使用 dropdown（下拉选择），避免界面拥挤

多选 → 使用 checkbox

选项数	推荐类型	示例
2-4	`radio`	`1, 是 \| 0, 否`
≥5	`dropdown`	`1, HBV \| 2, HCV \| 3, DILI \| 4, PBC \| 5, 肿瘤 \| 6, 其他`
多选	`checkbox`	`1, 血流 \| 2, 肺部 \| 3, 腹腔 \| 4, 泌尿系统`

支持的字段类型

类型	说明	Choices 格式
`text`	单行文本	无
`notes`	多行文本/备注	无
`radio`	单选按钮（≤4个选项）	`0, 否 \| 1, 是`
`dropdown`	下拉选择（≥5个选项）	`0, 否 \| 1, 是 \| 2, 其他`
`checkbox`	多选框	`1, 选项1 \| 2, 选项2 \| 3, 选项3`
`calc`	计算字段	`round([weight]*10000/([height]^2),1)`
`file`	文件上传	无
`date`	日期（用 text + date 验证）	无
`datetime`	日期时间（用 text + datetime 验证）	无

处理复杂文档的技巧

1. 文档结构识别

使用 markitdown 将文档转为 Markdown
识别 表X： 或 表X 格式的表单标题（注意可能混用全角/半角空格）
段落中的 {}、（） 包含字段定义

2. 图片 OCR 识别

当文档包含评分表图片（如 SOFA、APACHE、GCS、诊断标准）时：

从 docx 中提取图片（word/media/ 目录）
使用 image 工具识别图片内容
将识别结果转换为结构化字段

3. 括号兼容性

文档可能混用 ASCII 和全角括号：

ASCII: {单选，是，否}
全角: ｛单选，是，否｝
处理时需同时检查两种格式

4. 分支逻辑处理

分支逻辑写在 Choices 中，通过 [字段] = "值" 格式标注：

choices = "1, 是 | 0, 否"
branching = '[dm_10] = "7"'  # 当选择"其他"时显示备注文本

CDISC 变量命名建议

前缀	表单	示例
`ie_`	inclusion_exclusion 入排标准	`ie_1`, `ie_2`
`dm_`	demography 患者基本信息	`dm_1`, `dm_3`
`meld_`	pre_meld MELD评分	`meld_inr`, `meld_score`
`sofa_p_`	pre_sofa 术前SOFA	`sofa_p_gcs`, `sofa_p_total`
`apach_`	pre_apache APACHE评分	`apach_p_temp`, `apach_p_total`
`cci_`	pre_cci Charlson合并症	`cci_1`, `cci_total`
`infrf_`	preop_infrf 术前感染因素	`infrf_1`, `infrf_3_detail`
`op_`	op_info 手术信息	`op_date`, `op_blood_rbc`
`don_`	donor_info 供体信息	`don_age`, `don_hbsag`
`inf_`	infection_info 感染信息	`inf_date`, `inf_site`
`sofa_i_`	infection_sofa 感染时SOFA	`sofa_i_pf`, `sofa_i_total`
`apach_i_`	infection_apache 感染时APACHE	`apach_i_gcs`, `apach_i_total`
`bsi_`	bsi_criteria 血流感染标准	`bsi_1`, `bsi_2_symptom`
`abi_`	abi_criteria 腹腔感染标准	`abi_ssi`, `abi_ia_clinical`
`pulm_`	pulm_criteria 肺部感染标准	`pulm_img_1`, `pulm_symptom`
`fu_`	treatment_fu 随访	`fu_date`, `fu_abx`
`out_`	outcome 结局	`out_clinical`, `out_survive_90d`

使用方式

当用户上传文档并要求生成数据字典时：

1. 读取文档（markitdown 转换为 Markdown）
2. 提取并识别文档中的图片（如有评分表）
3. 解析表单结构和字段定义
4. 按上述规则生成数据字典
5. 确保 record_id 为第一行
6. 通过飞书发送 CSV 文件

依赖

pip install python-docx lxml markitdown

Usage Guidance

This skill looks safe for its stated purpose. Before using it with clinical or institutional documents, install dependencies in a controlled environment, choose the CSV output location carefully, and confirm whether sending the generated file through Feishu is acceptable.

Capability Analysis

Type: OpenClaw Skill Name: redcap-data-dictionary-generator Version: 1.0.0 The skill bundle is a utility designed to convert clinical Case Report Form (CRF) documents into REDCap-compatible data dictionary CSV files. The provided Python scripts (generate_datadict.py and process_upload.py) use standard libraries such as python-docx to parse document structure, identify fields, and format them according to REDCap specifications. The SKILL.md file provides clear, task-oriented instructions for the AI agent without any evidence of malicious prompt injection, data exfiltration, or unauthorized command execution. The logic is focused entirely on document parsing and CSV generation.

Capability Assessment

ℹ Purpose & Capability

The stated purpose and included code are coherent: the skill reads user-provided Word/clinical form content and generates a REDCap CSV. One provided helper source artifact is truncated in the review data, so confidence is not high.

ℹ Instruction Scope

Instructions are mostly scoped to document parsing, OCR, field generation, and CSV return. The instruction to send the CSV through Feishu is disclosed, but users should confirm that destination is intended.

ℹ Install Mechanism

The skill declares normal Python/PyPI dependencies for document parsing, but package versions are not pinned and the registry says there is no install spec.

ℹ Credentials

Local reading of uploaded documents and writing an output CSV are proportionate to the purpose. No broad filesystem scan, credential use, or hidden network endpoint is shown.

✓ Persistence & Privilege

No background persistence, autonomous worker, credentials, tokens, or privilege escalation are evidenced. The only persistence shown is the generated CSV output file.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install redcap-data-dictionary-generator
After installation, invoke the skill by name or use /redcap-data-dictionary-generator
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

初版发布。从原 redcap-crf-generator 迁移，后续更新将在此版本进行，原技能不再更新。

Metadata

Slug redcap-data-dictionary-generator

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is REDCap Data Dictionary Generator?

开发者：邹和建、刘从进 REDCap Data Dictionary Generator - 将 Word/Excel 文档（CRF/方案）转换为 CSV 格式的 REDCap 数据字典。 ⚠️ 原 redcap-crf-generator 已不再更新，请使用本版本。适用场景： - 用户上传临床试验 CRF/方... It is an AI Agent Skill for Claude Code / OpenClaw, with 12 downloads so far.

How do I install REDCap Data Dictionary Generator?

Run "/install redcap-data-dictionary-generator" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is REDCap Data Dictionary Generator free?

Yes, REDCap Data Dictionary Generator is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does REDCap Data Dictionary Generator support?

REDCap Data Dictionary Generator is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created REDCap Data Dictionary Generator?

It is built and maintained by kenlcj (@kenlcj); the current version is v1.0.0.

More Skills

REDCap Data Dictionary Generator

REDCap Data Dictionary Generator

概述

核心流程

数据字典格式（REDCap CSV）

⚠️ 关键规则

1. Section Header 仅首字段填写

2. record_id 必须为第一行

3. Identifier? 仅用于直接身份标识字段

4. calc 字段不需要验证类型

字段类型选择规则

支持的字段类型

处理复杂文档的技巧

1. 文档结构识别

2. 图片 OCR 识别

3. 括号兼容性

4. 分支逻辑处理

CDISC 变量命名建议

使用方式

依赖

What is REDCap Data Dictionary Generator?

How do I install REDCap Data Dictionary Generator?

Is REDCap Data Dictionary Generator free?

Which platforms does REDCap Data Dictionary Generator support?

Who created REDCap Data Dictionary Generator?

💬 Comments