Data Format Comparison
| Format | Type | Human Readable | Schema | Best For |
|---|---|---|---|---|
| JSON | Text | ✓ | Optional (JSON Schema) | APIs, config |
| XML | Text | ✓ | XSD, DTD | Enterprise, documents |
| CSV | Text | ✓ | ✗ | Tabular data, Excel |
| YAML | Text | ✓ | Optional | Config files, CI/CD |
| Parquet | Binary | ✗ | ✓ | Big data, analytics |
| Avro | Binary | ✗ | ✓ | Kafka, streaming |
| MessagePack | Binary | ✗ | ✗ | High-performance APIs |
JSON Text
The de facto data interchange format for web APIs.
{"name":"Alice","age":28,"active":true}
Pros
- Universal support
- Native to JavaScript
Cons
- Verbose
- No comments
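
As a rough illustration of the points above, here is a minimal Python sketch using only the standard-library `json` module. It round-trips the sample record and shows that a trailing `//` comment is rejected by a strict parser. The `parses` helper and the variable names are made up for this example.

```python
import json

record = {"name": "Alice", "age": 28, "active": True}

# Compact separators reproduce the one-line form of the sample record.
text = json.dumps(record, separators=(",", ":"))

# Parsing restores native Python types (str, int, bool).
parsed = json.loads(text)


def parses(candidate: str) -> bool:
    """Hypothetical helper: True if the text is valid JSON for Python's parser."""
    try:
        json.loads(candidate)
        return True
    except json.JSONDecodeError:
        return False


# A trailing comment is not part of JSON, so strict parsers reject it.
with_comment = '{"age": 28} // years'
comment_ok = parses(with_comment)
```

Some tools accept relaxed dialects with comments, but that is an extension rather than standard JSON.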
CSV Text
Simple tabular format, great for spreadsheets.
name,age,active
Alice,28,true
Bob,34,false
Pros
- Simple
- Excel compatible
Cons
- No nested data
- No types (every value is read as text)
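
The lack of types is easiest to see when parsing. The sketch below reads the sample rows with Python's standard `csv` module: every field comes back as a string, so the caller has to convert. The `to_user` helper and its conversion rules are assumptions for illustration, not part of any CSV standard.

```python
import csv
import io

raw = "name,age,active\nAlice,28,true\nBob,34,false\n"

# DictReader uses the header row for keys; all values are plain strings.
rows = list(csv.DictReader(io.StringIO(raw)))


def to_user(row: dict) -> dict:
    """Hypothetical converter: the caller decides what each column means."""
    return {
        "name": row["name"],
        "age": int(row["age"]),
        "active": row["active"] == "true",
    }


users = [to_user(row) for row in rows]
```

Before conversion `rows[0]["age"]` is the string `"28"`; only after `to_user` does it become an integer.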
Parquet Binary
Columnar storage format for big data analytics.
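# Read with Python (pandas needs the pyarrow or fastparquet package installed)
import pandas as pd

# Load the whole file into a DataFrame
df = pd.read_parquet('data.parquet')

# Write the DataFrame back out as Parquet
df.to_parquet('output.parquet')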
Pros
- Compressed
- Fast analytics
Cons
- Not human readable
- Needs tooling
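
To show why a columnar layout helps analytics, here is a toy in-memory sketch in plain Python. It is not Parquet's actual file layout (real files add row groups, encodings, and compression); it only contrasts storing records row by row with storing one list per column. The sample data and variable names are assumptions.

```python
# Row-oriented: each record keeps all of its fields together.
rows = [
    {"name": "Alice", "age": 28, "active": True},
    {"name": "Bob", "age": 34, "active": False},
]

# Column-oriented: one contiguous list per field.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# An aggregate over one field only needs to touch that column's list,
# rather than walking every field of every record.
ages = columns["age"]
mean_age = sum(ages) / len(ages)
```

Values of one column also share a type and tend to resemble each other, which is part of why columnar files usually compress well.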
Avro Binary
Row-oriented binary format that requires a schema; popular in Kafka pipelines.
Avro schema (JSON):
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "name", "type": "string"}
  ]
}
Pros
- Schema evolution
- Compact
Cons
- Complex tooling
- Schema required
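
The "compact" and "schema required" points are two sides of the same design: the encoded bytes carry no field names, so a reader needs the schema to interpret them. The hand-rolled sketch below encodes one `User` record following the Avro binary encoding rules as I understand them (zig-zag variable-length integers, length-prefixed UTF-8 strings, fields in schema order). It is an illustration only, not a replacement for an Avro library, and it produces just the record body, not a container file. All function names are made up for this example.

```python
def zigzag(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return (n << 1) ^ (n >> 63)


def encode_long(n: int) -> bytes:
    """Zig-zag the value, then emit it 7 bits at a time, low bits first."""
    z = zigzag(n)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Length prefix (encoded as a long) followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_user(user: dict) -> bytes:
    """Write fields in schema order; no field names appear in the output."""
    return encode_long(user["id"]) + encode_string(user["name"])


payload = encode_user({"id": 1, "name": "Alice"})
```

Here `payload` is 7 bytes: one for `id`, one for the string length, and five for `Alice`. Without the schema there is no way to tell where one field ends or what it means.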