Data Format Comparison

Format       Type    Human Readable  Schema                  Best For
JSON         Text    Yes             Optional (JSON Schema)  APIs, config
XML          Text    Yes             XSD, DTD                Enterprise, documents
CSV          Text    Yes             None                    Tabular data, Excel
YAML         Text    Yes             Optional                Config files, CI/CD
Parquet      Binary  No              Embedded in file        Big data, analytics
Avro         Binary  No              Required                Kafka, streaming
MessagePack  Binary  No              None                    High-performance APIs

JSON (Text)

The universal data interchange format for APIs.

{"name":"Alice","age":28,"active":true}
Pros
  • Universal support
  • Native to JavaScript
Cons
  • Verbose
  • No comment syntax
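A minimal sketch using Python's standard-library json module: it round-trips the record shown above and checks that the strict parser rejects a // comment. The helper parses() is hypothetical, introduced only for illustration.

```python
import json

# The same record as the JSON example above, as a Python dict.
record = {"name": "Alice", "age": 28, "active": True}

# Compact serialization; Python True becomes the JSON literal true.
text = json.dumps(record, separators=(",", ":"))
print(text)  # {"name":"Alice","age":28,"active":true}

# Parsing restores native Python types (str, int, bool).
assert json.loads(text) == record


def parses(s: str) -> bool:
    """Hypothetical helper: True if s is valid standard JSON."""
    try:
        json.loads(s)
        return True
    except json.JSONDecodeError:
        return False


# Standard JSON has no comment syntax, so this input is rejected.
print(parses('{"name": "Alice" // primary user\n}'))  # False
```

The missing comment syntax is one reason YAML, which allows # comments, is often preferred for hand-edited config files.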

CSV (Text)

Simple tabular format, great for spreadsheets.

name,age,active
Alice,28,true
Bob,34,false
Pros
  • Simple
  • Excel compatible
Cons
  • No nested data
  • No types (every value is read back as a string)
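The "No types" drawback is easy to see with Python's standard csv module. In this sketch every field comes back as a str, so the types have to live in your own code; the convert() function and its per-column rules are hypothetical, written for the three columns in the example above.

```python
import csv
import io

# The CSV example above, one record per line.
data = "name,age,active\nAlice,28,true\nBob,34,false\n"

rows = list(csv.DictReader(io.StringIO(data)))

# CSV carries no type information: every field comes back as a str.
print(rows[0]["age"], type(rows[0]["age"]).__name__)  # 28 str


def convert(row: dict) -> dict:
    """Hypothetical per-column converters; the schema lives in your code."""
    return {
        "name": row["name"],
        "age": int(row["age"]),
        # bool("false") is True, so compare the text explicitly.
        "active": row["active"] == "true",
    }


typed = [convert(r) for r in rows]
print(typed[1])  # {'name': 'Bob', 'age': 34, 'active': False}
```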

Parquet (Binary)

Columnar storage format for big data analytics.

# Read with Python (requires the pyarrow or fastparquet package)
import pandas as pd

df = pd.read_parquet('data.parquet')

# Write
df.to_parquet('output.parquet')
Pros
  • Compressed
  • Fast analytics
Cons
  • Not human readable
  • Needs tooling

Avro (Binary)

Binary format that requires a schema to read and write; popular with Kafka.

// Avro Schema (JSON)
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "name", "type": "string"}
  ]
}
Pros
  • Schema evolution
  • Compact
Cons
  • Complex tooling
  • Schema required
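To make "Compact" and "Schema required" concrete, here is a stdlib-only Python sketch of Avro's binary encoding for the User schema above, following the rules in the Avro specification: int and long values are zigzag varints, a string is a length prefix followed by UTF-8 bytes, and a record is just its field values in schema order with no field names. It handles only int, long and string fields and produces a single raw datum, not an Avro container file; the function names are hypothetical. Real projects would use a library such as fastavro or the official avro package.

```python
import json


def zigzag_varint(n: int) -> bytes:
    """Avro int/long encoding: zigzag, then base-128 varint."""
    z = 2 * n if n >= 0 else -2 * n - 1  # 0->0, -1->1, 1->2, -2->3, ...
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one zigzag varint at buf[pos]; return (value, next_pos)."""
    z, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return (z >> 1) ^ -(z & 1), pos


def encode_record(schema: dict, record: dict) -> bytes:
    out = bytearray()
    for field in schema["fields"]:  # values only, in schema order
        value, kind = record[field["name"]], field["type"]
        if kind in ("int", "long"):
            out += zigzag_varint(value)
        elif kind == "string":
            data = value.encode("utf-8")
            out += zigzag_varint(len(data)) + data  # length, then bytes
        else:
            raise NotImplementedError(kind)
    return bytes(out)


def decode_record(schema: dict, buf: bytes) -> dict:
    # Without the schema, the bytes give no hint of field names or types.
    record, pos = {}, 0
    for field in schema["fields"]:
        kind = field["type"]
        if kind in ("int", "long"):
            record[field["name"]], pos = read_varint(buf, pos)
        elif kind == "string":
            length, pos = read_varint(buf, pos)
            record[field["name"]] = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise NotImplementedError(kind)
    return record


# The User schema from above, as a Python dict.
USER_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
    ],
}

user = {"id": 1, "name": "Alice"}
encoded = encode_record(USER_SCHEMA, user)
print(encoded)  # b'\x02\nAlice'
print(len(encoded), len(json.dumps(user, separators=(",", ":"))))  # 7 23
```

The 7-byte datum omits the field names entirely, which is where the size saving over compact JSON comes from, and also why a reader cannot decode it without the writer's schema.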