← 返回 Skills 市场
alexmayanjun-collab

Etl Generator

作者 alexmayanjun-collab · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ 安全检测通过
161
总下载
0
收藏
1
当前安装
1
版本数
在 OpenClaw 中安装
/install etl-generator
功能描述
大数据 ETL 流程生成器 - 根据源表 DDL 生成标准化 ETL 加工 SQL(HiveSQL/MySQL)
使用说明 (SKILL.md)

ETL 流程生成器 - 大数据专家版

根据源表 DDL 自动生成标准化的 ETL 加工 SQL,支持 HiveSQL、MySQL、ODPS。

🎯 角色定位

大数据专家(20 年经验)

  • 精通 HiveSQL、MySQL、Shell、Python
  • 严格遵守大数据 ETL 加工规范
  • 注意字段类型转换、时区处理、数据质量

🔧 核心功能

1️⃣ 表名规范

  • 源表: ods_[表名]_di
  • 目标表: dwd_[表名]_di

2️⃣ 字段类型转换

  • _at_time 结尾的 TIMESTAMP 字段 → STRING(时区转换)
  • _date 结尾的字段 → STRING(不转换)
  • 其他字段 → 保持原类型

3️⃣ 时区转换

DATE_FORMAT(FROM_UTC_TIMESTAMP(created_at, "${timezone}"), "yyyy-MM-dd HH:mm:ss.SSS") AS created_at
DATE_FORMAT(FROM_UTC_TIMESTAMP(updated_at, "${timezone}"), "yyyy-MM-dd HH:mm:ss.SSS") AS updated_at

4️⃣ 分区字段

DATE_FORMAT(FROM_UTC_TIMESTAMP(created_at, "${timezone}"), "yyyy-MM-dd") AS ds

5️⃣ 增量数据处理

  • 使用 ods_data_base_di
  • 支持 INSERT/UPDATE/DELETE 操作
  • 通过 _operation__after_image_ 识别

6️⃣ 去重逻辑

ROW_NUMBER() OVER(PARTITION BY id ORDER BY updated_at DESC) as rn
WHERE rn = 1

7️⃣ 字段排除 rn

SELECT `(rn)?+.+` FROM (...)

📋 使用方式

方式 1:命令行

# 从文件读取 DDL
python3 skills/etl-generator/etl_generator.py source_table.ddl > etl_sql.sql

# 从标准输入读取
cat source_table.ddl | python3 skills/etl-generator/etl_generator.py > etl_sql.sql

方式 2:直接调用

from etl_generator import parse_table_ddl, generate_target_table_ddl, generate_etl_sql

ddl = """
CREATE TABLE IF NOT EXISTS ods_delivery_attempt_di(
  id STRING COMMENT '主键',
  pno STRING COMMENT '运单号',
  client_id STRING COMMENT '客户 ID',
  returned BIGINT COMMENT '是否退货件',
  delivery_date STRING COMMENT '派送日期',
  marker_id BIGINT COMMENT '标记原因',
  store_id STRING COMMENT '网点 ID',
  created_at TIMESTAMP COMMENT '创建时间',
  updated_at TIMESTAMP COMMENT '更新时间'
) 
PARTITIONED BY (ds STRING) 
STORED AS ALIORC  
TBLPROPERTIES ("columnar.nested.type"="true", "comment"="有效尝试派送详情") 
LIFECYCLE 36500;
"""

table_name, fields, table_comment = parse_table_ddl(ddl)
target_ddl = generate_target_table_ddl(table_name, fields, table_comment)
etl_sql = generate_etl_sql(table_name, fields, table_comment)

📝 输出示例

输入(源表 DDL)

CREATE TABLE IF NOT EXISTS ods_sap_store_cash_pay_info_di(
  id STRING COMMENT "主键",
  store_id STRING COMMENT "网点编号",
  business_date STRING COMMENT "业务日期",
  sap_state BIGINT COMMENT "0:待处理 1:待发送 2:不需要发送 3:已发送 4:异常",
  created_at TIMESTAMP COMMENT "创建时间",
  updated_at TIMESTAMP COMMENT "更新时间'
) 
PARTITIONED BY (ds STRING) 
STORED AS ALIORC 
TBLPROPERTIES ("columnar.nested.type"="true", "comment"="SAP 门店现金支付信息") 
LIFECYCLE 36500;

输出(目标表 DDL + ETL SQL)

-- 目标表 DDL
CREATE TABLE IF NOT EXISTS dwd_sap_store_cash_pay_info_di(
  id STRING COMMENT '主键',
  store_id STRING COMMENT '网点编号',
  business_date STRING COMMENT '业务日期',
  sap_state BIGINT COMMENT '0:待处理 1:待发送 2:不需要发送 3:已发送 4:异常',
  created_at STRING COMMENT '创建时间',
  updated_at STRING COMMENT '更新时间'
) 
PARTITIONED BY (ds STRING) 
STORED AS ALIORC  
TBLPROPERTIES ("columnar.nested.type"="true", "comment"="SAP 门店现金支付信息") 
LIFECYCLE 36500;

-- ETL 加工 SQL
WITH ods_data AS (
  SELECT
    id,
    store_id,
    business_date,
    sap_state,
    DATE_FORMAT(FROM_UTC_TIMESTAMP(created_at, "${timezone}"), "yyyy-MM-dd HH:mm:ss.SSS") AS created_at,
    DATE_FORMAT(FROM_UTC_TIMESTAMP(updated_at, "${timezone}"), "yyyy-MM-dd HH:mm:ss.SSS") AS updated_at,
    DATE_FORMAT(FROM_UTC_TIMESTAMP(created_at, "${timezone}"), "yyyy-MM-dd") AS ds
  FROM ods_sap_store_cash_pay_info_di
  WHERE ds >= "${y-m-d}"
  UNION ALL
  SELECT
    get_json_object(values, "$.id") as id,
    get_json_object(values, "$.store_id") as store_id,
    get_json_object(values, "$.business_date") as business_date,
    get_json_object(values, "$.sap_state") as sap_state,
    DATE_FORMAT(FROM_UTC_TIMESTAMP(get_json_object(values, "$.created_at"), "${timezone}"), "yyyy-MM-dd HH:mm:ss.SSS") AS created_at,
    DATE_FORMAT(FROM_UTC_TIMESTAMP(get_json_object(values, "$.updated_at"), "${timezone}"), "yyyy-MM-dd HH:mm:ss.SSS") AS updated_at,
    DATE_FORMAT(FROM_UTC_TIMESTAMP(get_json_object(values, "$.created_at"), "${timezone}"), "yyyy-MM-dd") AS ds
  FROM ods_data_base_di 
  WHERE (
    (_after_image_ = "Y" AND _operation_ IN ("INSERT", "UPDATE"))
    OR (_operation_ = "DELETE" AND _before_image_ = "Y")
    OR _id_ IS NULL
  )
  AND ds >= "${y-m-d}"
  AND table_name = "sap_store_cash_pay_info"
  AND db_name = "source_db"
)
INSERT OVERWRITE TABLE dwd_sap_store_cash_pay_info_di PARTITION(ds)
SELECT `(rn)?+.+` FROM (
  SELECT 
    *,
    ROW_NUMBER() OVER(PARTITION BY id ORDER BY updated_at DESC) as rn 
  FROM (
    SELECT * FROM dwd_sap_store_cash_pay_info_di WHERE ds IN (
      SELECT DISTINCT ds FROM ods_data
    )
    UNION ALL
    SELECT * FROM ods_data
  ) a
) t1
WHERE rn = 1;

🧪 数据质量检查

自动生成以下检查 SQL:

  1. 主键空值检查
  2. 退货件比例检查(如果有 returned 字段)
  3. 数据量对比(源表 vs 目标表)

📋 字段映射说明

自动生成字段映射文档:

-- ============================================
-- 字段映射说明
-- ============================================
-- 源表字段 (7 个): id, store_id, business_date, sap_state, created_at, updated_at
-- 目标表字段 (7 个): id, store_id, business_date, sap_state, created_at, updated_at
-- 分区字段:ds
-- 
-- 字段转换规则:
-- created_at: TIMESTAMP → STRING, 时区转换
-- updated_at: TIMESTAMP → STRING, 时区转换
-- business_date: 直接映射
-- ============================================

⚙️ 配置参数

参数 说明 默认值
${timezone} 时区 UTC
${y-m-d} 业务日期 ${yyyymmdd-1}
${bizdate} 业务日期(质量检查) ${yyyymmdd-1}

📁 文件结构

skills/etl-generator/
├── SKILL.md              # 技能说明
├── etl_generator.py      # 核心脚本
├── README.md             # 使用文档
└── examples/             # 示例 DDL
    └── delivery_attempt.ddl

🔧 高级用法

1. 批量生成

# 批量处理多个表
for ddl in ddl/*.ddl; do
  python3 skills/etl-generator/etl_generator.py $ddl > etl/$(basename $ddl .ddl)_etl.sql
done

2. 自定义模板

修改 etl_generator.py 中的模板函数,适配特定业务场景。

3. 集成 DataWorks

# 生成 DataWorks 节点配置
python3 skills/etl-generator/etl_generator.py source.ddl | \
  python3 skills/etl-generator/dataworks_adapter.py > node_config.yaml

⚠️ 注意事项

1. 字段顺序

  • 确保输入输出的字段顺序个数一致
  • 使用 (rn)?+.+ 排除 rn 字段

2. 时区处理

  • 所有时间字段必须做时区转换
  • _date 结尾的字段不转换

3. 表名规范

  • 源表:ods_[表名]_di
  • 目标表:dwd_[表名]_di
  • WITH 引用使用原表名

4. 增量数据

  • 使用 ods_data_base_di
  • 正确配置 table_namedb_name

📊 版本历史

v2.0 (2026-03-06)

  • ✅ 优化时区转换逻辑
  • ✅ 支持增量数据处理
  • ✅ 自动生成数据质量检查
  • ✅ 自动生成字段映射说明
  • ✅ 字段排除 rn 字段

v1.0 (2026-02-28)

  • ✅ 基础 ETL 生成功能
  • ✅ 字段类型转换
  • ✅ 目标表 DDL 生成

维护者: 汉克 (Hank)
更新时间: 2026-03-06

安全使用建议
This skill appears to do what it claims: parse a provided CREATE TABLE DDL and emit target DDL, ETL SQL, and quality-check SQL. Before using: (1) provide DDL input yourself (the script does not connect to databases or fetch schemas), (2) review generated SQL for correctness and environment-specific settings (table/db names, partition handling, storage format), and (3) be aware the DDL parser is simple (regex-based) and may not handle complex/edge-case DDL — validate outputs in a safe environment before applying to production.
功能分析
Type: OpenClaw Skill Name: etl-generator Version: 1.0.0 The etl-generator skill is a legitimate utility designed to automate the creation of ETL SQL scripts from database DDL statements. The core logic in `etl_generator.py` uses standard string manipulation and regular expressions to transform table definitions, and it lacks any high-risk behaviors such as network requests, file system modifications, or credential access. The instructions in `SKILL.md` and `README.md` are consistent with the tool's stated purpose and do not contain any prompt-injection attempts or malicious directives.
能力评估
Purpose & Capability
Name/description (ETL SQL generator) match the included README, SKILL.md, and the Python script. The code implements DDL parsing, type/field conversion, ETL SQL and quality-check generation — all expected for the described functionality.
Instruction Scope
SKILL.md and README instruct the agent/user to provide DDL via file or stdin and show how to call the Python functions; etl_generator.py reads a local file or stdin and prints SQL to stdout. The docs mention 'query source table (if accessible)' but there is no code that performs network/database queries or reads system config/credentials — this is a documentation note rather than hidden behavior.
Install Mechanism
No install spec is provided and the skill is instruction-plus-script only. Nothing is downloaded or written by an installer; risk from install mechanism is minimal.
Credentials
The skill requires no environment variables, credentials, or config paths. It only reads DDL from a local file or stdin as documented. There is no use of SECRET/TOKEN/KEY env vars in code or docs.
Persistence & Privilege
Flags show default behavior (not always:true). The skill does not request persistent system privileges, does not modify other skill configs, and contains no autorun/install logic.
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install etl-generator
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /etl-generator 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v1.0.0
- Initial release of etl-generator skill. - Generates standardized ETL SQL from source table DDL, supporting HiveSQL, MySQL, ODPS. - Implements field type conversion, time zone handling, partition generation, and de-duplication logic. - Supports command-line and direct Python usage. - Generates target table DDL, ETL processing SQL, data quality check SQL, and field mapping documentation. - Provides usage instructions, configuration parameters, and guidelines for batch and advanced scenarios.
元数据
Slug etl-generator
版本 1.0.0
许可证 MIT-0
累计安装 1
当前安装数 1
历史版本数 1
常见问题

Etl Generator 是什么?

大数据 ETL 流程生成器 - 根据源表 DDL 生成标准化 ETL 加工 SQL(HiveSQL/MySQL). 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 161 次。

如何安装 Etl Generator?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install etl-generator」即可一键安装,无需额外配置。

Etl Generator 是免费的吗?

是的,Etl Generator 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。

Etl Generator 支持哪些平台?

Etl Generator 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。

谁开发了 Etl Generator?

由 alexmayanjun-collab(@alexmayanjun-collab)开发并维护,当前版本 v1.0.0。

💬 留言讨论