Avro Schema Guide

Basic Avro Schema

{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "doc": "User record",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null},
    {"name": "age", "type": "int", "default": 0},
    {"name": "active", "type": "boolean", "default": true},
    {"name": "created_at", "type": {"type": "long", "logicalType": "timestamp-millis"}}
  ]
}
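Per the Avro specification, a field's "default" is only used when reading data that lacks the field, so a record being written should carry every field explicitly. The stdlib-only sketch below makes that rule concrete: the hypothetical helper `with_defaults` (not part of any Avro library) copies a record, fills gaps from the schema's "default" values, and refuses fields that have none (here `id`, `name` and `created_at`).

```python
import json

# The User schema, with created_at using the nested logical-type form.
USER_SCHEMA = json.loads("""
{"type": "record", "name": "User", "namespace": "com.example",
 "fields": [
   {"name": "id", "type": "int"},
   {"name": "name", "type": "string"},
   {"name": "email", "type": ["null", "string"], "default": null},
   {"name": "age", "type": "int", "default": 0},
   {"name": "active", "type": "boolean", "default": true},
   {"name": "created_at", "type": {"type": "long", "logicalType": "timestamp-millis"}}
 ]}
""")

_MISSING = object()  # sentinel, because None is a legitimate value for "null"


def with_defaults(schema, record):
    """Hypothetical helper: return a copy of `record` containing every schema
    field, taking absent values from the field's "default".
    Raises KeyError when a field has neither a value nor a default."""
    out = {}
    for field in schema["fields"]:
        name = field["name"]
        value = record.get(name, _MISSING)
        if value is _MISSING:
            # Check key presence: a JSON null default parses to None.
            if "default" not in field:
                raise KeyError(f"field {name!r} has no default and must be supplied")
            value = field["default"]
        out[name] = value
    return out


print(with_defaults(USER_SCHEMA, {"id": 1, "name": "Alice", "created_at": 1700000000000}))
```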

Avro Primitive Types

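Type      Description                        Encoded size
null      No value                           0 bytes
boolean   Boolean value                      1 byte
int       32-bit signed integer              Variable (zig-zag varint, 1 to 5 bytes)
long      64-bit signed integer              Variable (zig-zag varint, 1 to 10 bytes)
float     32-bit IEEE 754 floating point     4 bytes
double    64-bit IEEE 754 floating point     8 bytes
bytes     Sequence of 8-bit unsigned bytes   Variable (length prefix + data)
string    Unicode character sequence         Variable (length prefix + UTF-8 data)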
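The variable sizes of int and long come from Avro's binary encoding: the value is zig-zag mapped (so small negative numbers stay small) and then written as a base-128 varint, 7 bits per byte, with the high bit set on every byte except the last. A pure-Python sketch of that scheme, using hypothetical function names `encode_long` and `decode_long`; the printed hex strings match the examples in the Avro specification.

```python
def encode_long(n: int) -> bytes:
    """Encode an Avro int/long: zig-zag map, then base-128 varint (low 7 bits first)."""
    z = 2 * n if n >= 0 else -2 * n - 1  # zig-zag: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)  # last byte has the high bit clear
            return bytes(out)


def decode_long(data: bytes) -> int:
    """Inverse of encode_long: reassemble the varint, then undo the zig-zag map."""
    z = shift = 0
    for b in data:
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            break
    return (z >> 1) ^ -(z & 1)


for n in (0, -1, 1, -2, 2, -64, 64):
    print(n, encode_long(n).hex())
# 0 00 / -1 01 / 1 02 / -2 03 / 2 04 / -64 7f / 64 8001
```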

Complex Types

// Array
{"type": "array", "items": "string"}

// Map (keys are always strings)
{"type": "map", "values": "int"}

// Union (nullable field); per the spec, a union field's default must match
// the first branch, so optional fields list "null" first
["null", "string"]

// Enum
{
  "type": "enum",
  "name": "Status",
  "symbols": ["ACTIVE", "INACTIVE", "PENDING"]
}

// Nested record
{
  "type": "record",
  "name": "Address",
  "fields": [
    {"name": "street", "type": "string"},
    {"name": "city", "type": "string"}
  ]
}
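How a value matches each complex type can be summarized as a recursive check. `conforms` below is a hypothetical, simplified sketch, not an Avro library API: a union accepts a value if any branch does, arrays and maps check every element, enums check symbol membership, and records check each field. It deliberately ignores named-type references, logical types and `fixed`.

```python
def conforms(schema, datum) -> bool:
    """Simplified check of whether a Python value matches an Avro schema."""
    if isinstance(schema, list):  # union: at least one branch must match
        return any(conforms(branch, datum) for branch in schema)
    if isinstance(schema, str):  # bare primitive name such as "string"
        schema = {"type": schema}
    t = schema["type"]
    is_int = isinstance(datum, int) and not isinstance(datum, bool)
    if t == "null":
        return datum is None
    if t == "boolean":
        return isinstance(datum, bool)
    if t == "int":
        return is_int and -2**31 <= datum < 2**31
    if t == "long":
        return is_int and -2**63 <= datum < 2**63
    if t in ("float", "double"):
        return is_int or isinstance(datum, float)
    if t == "bytes":
        return isinstance(datum, bytes)
    if t == "string":
        return isinstance(datum, str)
    if t == "array":
        return isinstance(datum, list) and all(conforms(schema["items"], x) for x in datum)
    if t == "map":  # keys must be strings
        return isinstance(datum, dict) and all(
            isinstance(k, str) and conforms(schema["values"], v) for k, v in datum.items())
    if t == "enum":
        return isinstance(datum, str) and datum in schema["symbols"]
    if t == "record":  # a missing field is treated as None
        return isinstance(datum, dict) and all(
            conforms(f["type"], datum.get(f["name"])) for f in schema["fields"])
    raise ValueError(f"unsupported type: {t!r}")


STATUS = {"type": "enum", "name": "Status", "symbols": ["ACTIVE", "INACTIVE", "PENDING"]}
ADDRESS = {"type": "record", "name": "Address", "fields": [
    {"name": "street", "type": "string"}, {"name": "city", "type": "string"}]}

print(conforms(["null", "string"], None))                   # True
print(conforms({"type": "map", "values": "int"}, {"a": 1}))  # True
print(conforms(STATUS, "DELETED"))                           # False
print(conforms(ADDRESS, {"street": "1 Main St"}))            # False: city is missing
```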

Schema Evolution Rules

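Backward compatible: readers using the new schema can read data written with the old schema.
Forward compatible: readers using the old schema can read data written with the new schema.

Change                                 Compatible?
Add a field with a default             ✓ Backward + forward
Remove a field that has a default      ✓ Backward + forward
Add a field without a default          Forward only
Remove a required field (no default)   Backward only
Change a field's type                  ✗ Breaking, except Avro's type promotions:
                                       int → long/float/double, long → float/double and
                                       float → double are backward only; string ↔ bytes is
                                       backward + forward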
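Each row of the table reduces to one question asked in both directions: can a reader schema decode data written with a writer schema? The sketch below is a simplified, hypothetical checker for flat record fields only (no aliases, unions or nested types): a reader field missing from the written data needs a default, fields only the writer has are skipped, and a changed type must be one of Avro's promotions.

```python
# Avro schema-resolution promotions: writer type -> reader types that may read it.
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def type_ok(writer_type, reader_type) -> bool:
    """Identical types match; otherwise only primitive promotions are allowed."""
    if writer_type == reader_type:
        return True
    return (isinstance(writer_type, str) and isinstance(reader_type, str)
            and reader_type in PROMOTIONS.get(writer_type, set()))


def can_read(writer_fields, reader_fields) -> bool:
    """Can a reader using `reader_fields` decode data written with `writer_fields`?"""
    written = {f["name"]: f for f in writer_fields}
    for field in reader_fields:
        w = written.get(field["name"])
        if w is None:
            if "default" not in field:
                return False  # absent from the data, nothing to fill it with
        elif not type_ok(w["type"], field["type"]):
            return False
    return True  # fields only the writer has are skipped by the reader


def compatibility(old_fields, new_fields):
    """Return (backward, forward) for a change from old_fields to new_fields."""
    backward = can_read(old_fields, new_fields)  # new schema reads old data
    forward = can_read(new_fields, old_fields)   # old schema reads new data
    return backward, forward


BASE = [{"name": "id", "type": "int"}, {"name": "name", "type": "string"}]
print(compatibility(BASE, BASE + [{"name": "age", "type": "int", "default": 0}]))  # (True, True)
print(compatibility(BASE, BASE + [{"name": "email", "type": "string"}]))           # (False, True)
print(compatibility(BASE, [{"name": "id", "type": "long"}, BASE[1]]))              # (True, False)
```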

Python Example

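import datetime
import avro.schema
from avro.datafile import DataFileWriter
from avro.io import DatumWriter

# Requires the avro package (1.10+); user.avsc holds the User schema above.
with open("user.avsc") as f:
    schema = avro.schema.parse(f.read())

writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
# Every field must be supplied; created_at takes a timezone-aware datetime.
writer.append({"id": 1, "name": "Alice", "email": None, "age": 30, "active": True,
               "created_at": datetime.datetime.now(datetime.timezone.utc)})
writer.close()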
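A timestamp-millis field such as `created_at` is still a plain long on the wire: per the Avro specification it counts milliseconds since the Unix epoch, 1970-01-01 00:00:00.000 UTC. Client libraries usually convert to and from native date types for you; the stdlib sketch below shows the arithmetic, using hypothetical helper names `to_timestamp_millis` and `from_timestamp_millis`.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_timestamp_millis(dt: datetime) -> int:
    """Timezone-aware datetime -> milliseconds since the Unix epoch (floors sub-ms)."""
    if dt.tzinfo is None:
        raise ValueError("timestamp-millis needs a timezone-aware datetime")
    return (dt - EPOCH) // timedelta(milliseconds=1)  # exact integer arithmetic


def from_timestamp_millis(ms: int) -> datetime:
    """Milliseconds since the Unix epoch -> UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


moment = datetime(2023, 11, 14, 22, 13, 20, tzinfo=timezone.utc)
print(to_timestamp_millis(moment))  # 1700000000000
```

Rejecting naive datetimes avoids silently interpreting local wall-clock time as UTC.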