Avro Schema Guide

Basic Avro Schema

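    {
      "type": "record",
      "name": "User",
      "namespace": "com.example",
      "doc": "A user record",
      "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": ["null", "string"], "default": null},
        {"name": "age", "type": "int", "default": 0},
        {"name": "active", "type": "boolean", "default": true},
        {"name": "created_at", "type": {"type": "long", "logicalType": "timestamp-millis"}}
      ]
    }

Note: logicalType is an attribute of the type, not of the field, so it is nested inside the "type" object of created_at. Placed directly on the field, it is ignored and the field is read as a plain long.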

Avro Primitive Types

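Type      Description              Encoded size (binary)
null      No value                 0 bytes
boolean   Boolean                  1 byte
int       32-bit signed integer    variable (1 to 5 bytes, zig-zag varint)
long      64-bit signed integer    variable (1 to 10 bytes, zig-zag varint)
float     32-bit IEEE 754 float    4 bytes
double    64-bit IEEE 754 float    8 bytes
bytes     Byte sequence            variable (length prefix + data)
string    Unicode string           variable (length prefix + UTF-8 bytes)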
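
The "variable" size of int and long comes from the zig-zag, variable-length encoding that Avro's binary format uses for integers. The sketch below illustrates the idea using only the Python standard library; the function names zigzag_encode and encode_long are made up for this example and are not part of any Avro library.

```python
def zigzag_encode(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF


def encode_long(n: int) -> bytes:
    """Encode n as a zig-zag varint: 7 data bits per byte, low-order group first."""
    value = zigzag_encode(n)
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)  # high bit set: more bytes follow
        value >>= 7
    out.append(value)  # last byte has the high bit clear
    return bytes(out)


for n in (0, -1, 1, 64):
    print(n, encode_long(n).hex())
# 0 00
# -1 01
# 1 02
# 64 8001
```

Values between -64 and 63 fit in a single byte, which is why small ids and counters are cheap to store even when declared as long.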

Complex Types

    // Array
    {"type": "array", "items": "string"}

    // Map
    {"type": "map", "values": "int"}

    // Union (nullable field)
    ["null", "string"]

    // Enum
    {
      "type": "enum",
      "name": "Status",
      "symbols": ["ACTIVE", "INACTIVE", "PENDING"]
    }

    // Nested record
    {
      "type": "record",
      "name": "Address",
      "fields": [
        {"name": "street", "type": "string"},
        {"name": "city", "type": "string"}
      ]
    }
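
The fragments above are type definitions, so in a real schema they appear as the "type" of a record field. The following is a minimal sketch of how they might be combined, built as a Python dict and serialized with the standard json module. The record name UserProfile and all of its field names are invented for illustration. It also shows that a named type (record or enum) can be referenced by name after its first definition.

```python
import json

# Named types: defined once, then referenced by name elsewhere in the schema.
status = {"type": "enum", "name": "Status", "symbols": ["ACTIVE", "INACTIVE", "PENDING"]}
address = {
    "type": "record",
    "name": "Address",
    "fields": [
        {"name": "street", "type": "string"},
        {"name": "city", "type": "string"},
    ],
}

# Hypothetical record that uses each complex type as a field type.
user_profile = {
    "type": "record",
    "name": "UserProfile",
    "fields": [
        {"name": "tags", "type": {"type": "array", "items": "string"}},
        {"name": "scores", "type": {"type": "map", "values": "int"}},
        {"name": "nickname", "type": ["null", "string"], "default": None},
        {"name": "status", "type": status},
        {"name": "home", "type": address},    # full definition on first use
        {"name": "work", "type": "Address"},  # later uses refer to the name
    ],
}

# json.dumps turns Python's None into JSON null, giving valid .avsc content.
schema_json = json.dumps(user_profile, indent=2)
print(schema_json)
```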

Schema Evolution Rules

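Change                               Compatible?
Add field with default               ✓ Forward + Backward
Remove field with default            ✓ Forward + Backward
Add field without default            Forward only
Remove required field (no default)   Backward only
Change field type                    ✗ Breaking (except type promotions)

Backward means the new schema can read data written with the old one; forward means the old schema can read data written with the new one. The type promotions defined by the Avro specification (int to long, float, or double; long to float or double; float to double; string to and from bytes) are the exception to the last row: a reader using the promoted type can still read data written with the original type.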
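
The field-level rows of the table can be reproduced with a small check, taking "backward" to mean that the new schema can read data written with the old one. This is a deliberately simplified sketch in plain Python: the helpers can_read, compatibility, and record are hypothetical names for this example, and the check looks only at field names and defaults, ignoring types, aliases, and promotions. It is not how any Avro library implements schema resolution.

```python
def can_read(reader: dict, writer: dict) -> bool:
    """True if every reader field was written or has a default to fall back on."""
    written = {f["name"] for f in writer["fields"]}
    return all(f["name"] in written or "default" in f for f in reader["fields"])


def compatibility(old: dict, new: dict) -> str:
    backward = can_read(new, old)  # new schema reads data written with the old one
    forward = can_read(old, new)   # old schema reads data written with the new one
    if backward and forward:
        return "full"
    if backward:
        return "backward"
    if forward:
        return "forward"
    return "none"


def record(*fields: dict) -> dict:
    """Build a record schema named User from the given field definitions."""
    return {"type": "record", "name": "User", "fields": list(fields)}


ID = {"name": "id", "type": "int"}
NAME = {"name": "name", "type": "string"}
EMAIL = {"name": "email", "type": ["null", "string"], "default": None}
AGE = {"name": "age", "type": "int"}  # no default

v1 = record(ID, NAME)
print(compatibility(v1, record(ID, NAME, EMAIL)))  # full
print(compatibility(v1, record(ID, NAME, AGE)))    # forward
print(compatibility(v1, record(ID)))               # backward
```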

Python Example

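    import avro.schema
    from avro.datafile import DataFileWriter
    from avro.io import DatumWriter

    # Parse the schema from its .avsc (JSON) file.
    with open("user.avsc") as f:
        schema = avro.schema.parse(f.read())

    writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
    # Schema defaults apply only when reading, so supply every field when writing.
    writer.append({
        "id": 1, "name": "Alice", "email": None, "age": 30, "active": True,
        # Epoch milliseconds for a plain long. If the schema declares the
        # timestamp-millis logical type, recent versions of the avro package
        # expect a timezone-aware datetime instead.
        "created_at": 1700000000000,
    })
    writer.close()  # flushes the final block and closes the underlying file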