How to Handle JSON Data in Python
Python json Module Basics
Python's built-in json module is the standard way to handle JSON and requires no additional dependencies. It provides four core functions: json.loads() (string → Python object), json.dumps() (Python object → string), json.load() (file → Python object), and json.dump() (Python object → file). The s in the function names stands for "string": functions ending in s operate on strings, and those without it operate on file objects. Remembering this rule prevents confusion.
JSON and Python types map as follows: JSON object ({}) ↔ Python dict; JSON array ([]) ↔ Python list; JSON string ↔ Python str; JSON number ↔ Python int or float; JSON true/false ↔ Python True/False; JSON null ↔ Python None.
import json

# String parsing
json_str = '{"name": "Alice", "age": 30, "active": true}'
data = json.loads(json_str)
print(data['name'])    # Alice
print(data['active'])  # True (Python bool, not string)
print(type(data))      # <class 'dict'>

# Serialize to string
obj = {"name": "Bob", "scores": [95, 87, 92]}
json_str = json.dumps(obj)                              # Compact (single line)
pretty_str = json.dumps(obj, indent=2)                  # Pretty-printed
sorted_str = json.dumps(obj, indent=2, sort_keys=True)  # Sorted keys
unicode_str = json.dumps(obj, ensure_ascii=False)       # Keep non-ASCII characters (e.g. CJK) unescaped
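The type mapping above is not perfectly symmetric, which a short round trip makes visible. The following is a minimal sketch using only the standard library; the sample dictionary is made up for illustration.

```python
import json

# A dict with a tuple value and a non-string key (sample data, for illustration only)
original = {"point": (1, 2), 1: "int key", "ratio": 1.0, "nothing": None}

# Serialize and parse back
restored = json.loads(json.dumps(original))

# Tuples come back as lists, and non-string keys come back as strings,
# because JSON has only arrays and string keys.
print(restored["point"])   # a list, not a tuple
print(list(restored))      # every key is now a str
```

If the distinction between tuples and lists, or between int and str keys, matters to your program, convert the values back explicitly after parsing.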
Reading and Writing JSON Files
Reading and writing JSON files is one of the most common tasks in Python. Use json.load() and json.dump() on a file opened in a with statement, and pass encoding='utf-8' explicitly so the result does not depend on the platform default:
import json

# Reading a JSON file
with open('data.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Writing a JSON file
with open('output.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, indent=2, ensure_ascii=False)

# Append to a JSON Lines file (one JSON object per line)
new_record = {"name": "Alice", "age": 30}  # Example record; any JSON-serializable object works
with open('data.jsonl', 'a', encoding='utf-8') as f:
    f.write(json.dumps(new_record, ensure_ascii=False) + '\n')

# Reading a JSON Lines file
records = []
with open('data.jsonl', 'r', encoding='utf-8') as f:
    for line in f:
        if line.strip():  # Skip empty lines
            records.append(json.loads(line))
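The JSON Lines reader above stops with an exception at the first malformed line. One possible way to make it more tolerant is sketched below: it catches json.JSONDecodeError per line and reports the line number. The helper name read_jsonl and the demo file contents are assumptions made for this illustration, not part of the json module.

```python
import json
import os
import tempfile

def read_jsonl(path):
    """Yield (line_number, record) pairs, skipping blank and malformed lines."""
    with open(path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # Skip empty lines
            try:
                yield line_number, json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the problem and keep going instead of aborting the whole read
                print(f"line {line_number}: skipped ({exc.msg})")

# Demo with a temporary file: line 2 is blank, line 3 is missing its closing brace
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        f.write('{"id": 1}\n\n{"id": 2\n{"id": 3}\n')
    records = [record for _, record in read_jsonl(path)]

print(records)
```

Whether to skip, log, or fail on a bad line depends on the application; skipping silently is rarely a good default for data you cannot regenerate.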
Custom Serialization: Handling Special Types
Python's json module doesn't support serializing datetime, Decimal, UUID, custom classes, and other types by default. You can extend it with a custom JSONEncoder or a default function:
import json
from datetime import datetime, date
from decimal import Decimal
from uuid import UUID

# Method 1: pass a default function
def extended_encoder(obj):
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    if isinstance(obj, Decimal):
        return float(obj)  # Convenient, but may lose precision
    if isinstance(obj, UUID):
        return str(obj)
    if hasattr(obj, '__dict__'):
        return obj.__dict__
    raise TypeError(f'Object of type {type(obj).__name__} is not JSON serializable')

json.dumps(data, default=extended_encoder)

# Method 2: subclass JSONEncoder
class ExtendedEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, Decimal):
            return str(obj)  # Exact amounts as strings
        return super().default(obj)

json.dumps(data, cls=ExtendedEncoder, indent=2)

# Custom decoding with object_hook
def datetime_decoder(dct):
    for key, value in dct.items():
        if isinstance(value, str) and 'T' in value:
            try:
                dct[key] = datetime.fromisoformat(value)
            except ValueError:
                pass  # Not an ISO 8601 datetime; keep the string
    return dct

data = json.loads(json_str, object_hook=datetime_decoder)
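For monetary values, the choice between float(obj) and str(obj) matters. The sketch below shows one way to keep Decimal values exact in both directions: a small encoder that writes them as strings, and the parse_float argument of json.loads() to read JSON numbers as Decimal. The class name DecimalEncoder and the sample payload are illustrative assumptions.

```python
import json
from decimal import Decimal

class DecimalEncoder(json.JSONEncoder):
    """Encode Decimal values as strings so no precision is lost."""
    def default(self, obj):
        if isinstance(obj, Decimal):
            return str(obj)
        return super().default(obj)

# Serialize: the Decimal becomes a JSON string
payload = {"price": Decimal("19.99"), "qty": 3}
encoded = json.dumps(payload, cls=DecimalEncoder)
print(encoded)

# Parse: JSON numbers with a fractional part are handed to Decimal instead of float
decoded = json.loads('{"price": 19.99}', parse_float=Decimal)
print(repr(decoded["price"]))
```

Note that the two halves are independent: the encoder produces a JSON string, while parse_float only applies to JSON numbers. Pick one wire representation and use it consistently.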
JSON Data Validation with Pydantic
Pydantic is one of the most widely used data validation libraries in the Python ecosystem. It defines data models through type annotations, validates the types and formats of JSON data automatically, and reports clear error messages. FastAPI is built on it. The example below targets Pydantic v2, with the v1 equivalents noted in comments:
from pydantic import BaseModel, field_validator  # Pydantic v2
from typing import Optional, List
from datetime import datetime

class Address(BaseModel):
    city: str
    country: str
    zip_code: Optional[str] = None

class User(BaseModel):
    id: int
    name: str
    email: str
    age: int
    address: Address
    tags: List[str] = []
    created_at: datetime

    # Pydantic v1 used @validator('age') here instead
    @field_validator('age')
    @classmethod
    def validate_age(cls, v):
        if v < 0 or v > 150:
            raise ValueError('Age must be between 0 and 150')
        return v

# Parse and validate from a JSON string
json_str = '''
{
  "id": 1,
  "name": "Alice",
  "email": "[email protected]",
  "age": 30,
  "address": {"city": "Beijing", "country": "China"},
  "created_at": "2025-01-01T00:00:00"
}
'''
user = User.model_validate_json(json_str)  # Pydantic v2
# or:
# user = User.parse_raw(json_str)  # Pydantic v1

# Serialize back to JSON
print(user.model_dump_json(indent=2))
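The error messages mentioned above surface as a ValidationError. The following sketch assumes Pydantic v2 is installed; the Item model and the invalid input are hypothetical and exist only to trigger two errors at once.

```python
from pydantic import BaseModel, ValidationError

class Item(BaseModel):  # Hypothetical model for illustration
    id: int
    name: str

# "id" is not an integer and "name" is missing entirely
bad_json = '{"id": "not-a-number"}'

errors = []
try:
    Item.model_validate_json(bad_json)
except ValidationError as exc:
    # exc.errors() returns one dict per problem, with the field location and a message
    errors = exc.errors()
    for err in errors:
        print(err["loc"], err["msg"])
```

Pydantic collects all problems in a single exception rather than stopping at the first one, which is convenient when returning validation feedback from an API.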
High-Performance JSON Library: orjson
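orjson is a high-performance JSON library implemented in Rust. Its own benchmarks report serialization roughly 10x faster than the standard json module, with smaller gains for parsing; actual speedups depend on the data. It also natively supports datetime and UUID, and can serialize numpy arrays when the OPT_SERIALIZE_NUMPY option is set, which makes it a strong choice for processing large amounts of JSON data or for high-throughput applications: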
# Install
# pip install orjson
import orjson
from datetime import datetime
from uuid import UUID

# Basic usage is similar to the json module
data = {'name': 'Alice', 'created': datetime.now(), 'id': UUID('...')}  # '...' stands for a real UUID string

# Serialize (returns bytes, not str)
json_bytes = orjson.dumps(data)
json_bytes = orjson.dumps(data, option=orjson.OPT_INDENT_2)  # Pretty-printed

# Deserialize (accepts str or bytes)
data = orjson.loads(json_bytes)

# Performance comparison (same data; illustrative figures)
# json.dumps:   ~1000 µs
# orjson.dumps: ~80 µs (~12x faster)

# ujson is also a fast alternative
# pip install ujson
import ujson
data = ujson.loads(json_str)
output = ujson.dumps(data, ensure_ascii=False, indent=2)
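Because orjson.dumps() returns bytes and orjson is an optional dependency, code that wants to use it opportunistically needs a thin wrapper. A minimal sketch follows; the helper names dumps_compact and loads_any are assumptions for this example, and the fallback uses the standard json module with compact separators so both paths produce the same text for plain data.

```python
import json

try:
    import orjson  # Optional third-party dependency
except ImportError:
    orjson = None

def dumps_compact(obj) -> str:
    """Serialize to a compact JSON str, using orjson when it is installed."""
    if orjson is not None:
        return orjson.dumps(obj).decode("utf-8")  # orjson returns bytes
    # Match orjson's compact output: no spaces, non-ASCII left unescaped
    return json.dumps(obj, ensure_ascii=False, separators=(",", ":"))

def loads_any(text):
    """Parse JSON from str or bytes with whichever library is available."""
    if orjson is not None:
        return orjson.loads(text)
    return json.loads(text)

text = dumps_compact({"name": "Alice", "scores": [1, 2]})
print(text)
print(loads_any(text))
```

The two libraries differ on less common inputs (for example, orjson rejects dict keys that are not str unless OPT_NON_STR_KEYS is set), so a wrapper like this is only a safe swap for plain dict, list, str, number, bool, and None data.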
Handling JSON in API Development
FastAPI and Flask, two of Python's most popular API frameworks, both have built-in JSON support. FastAPI (recommended for new projects) handles JSON serialization and deserialization through Pydantic and generates an OpenAPI schema automatically: request bodies are parsed into Pydantic models, and return values are serialized to JSON. Flask leaves parsing and validation to you through request.get_json() and jsonify():
# FastAPI example
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class UserCreate(BaseModel):
    name: str
    email: str

class UserResponse(BaseModel):
    id: int
    name: str
    email: str

@app.post('/users', response_model=UserResponse)
async def create_user(user: UserCreate):
    # FastAPI automatically parses the request body into a UserCreate object
    new_user = save_to_db(user)  # save_to_db is a placeholder for your persistence layer
    return new_user  # Automatically serialized to JSON

# Flask example (a separate application)
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/users', methods=['POST'])
def create_user():
    # silent=True returns None for a missing or invalid JSON body instead of raising
    data = request.get_json(silent=True)
    if not data:
        return jsonify({'error': 'Invalid JSON'}), 400
    # Process data
    return jsonify({'id': 1, 'name': data['name']}), 201
JSON and Python dataclass Integration
Dataclasses, available since Python 3.7, are a concise way to organize data structures. dataclasses.asdict() converts an instance (including nested dataclasses) to a plain dict that json.dumps() can serialize, and dataclasses.fields() lets you inspect the declared fields when converting in the other direction.
You can add validation in __post_init__, or use libraries such as dataclass-wizard and cattrs for more complete JSON support, including recursive conversion of nested objects and type checking. For projects with complex data models, these tools are usually more robust than manual conversion. The example below does the conversion by hand with the standard library:
from dataclasses import dataclass, asdict, field
from typing import List
import json

@dataclass
class Address:
    city: str
    country: str

@dataclass
class User:
    id: int
    name: str
    address: Address
    tags: List[str] = field(default_factory=list)

# Serialize: dataclass → dict → JSON
user = User(id=1, name="Alice", address=Address(city="Beijing", country="China"))
user_dict = asdict(user)  # Recursively converts to a dict
json_str = json.dumps(user_dict, ensure_ascii=False, indent=2)

# Deserialize: JSON → dict → dataclass (nested objects must be handled manually)
data = json.loads(json_str)
address = Address(**data['address'])
user = User(**{**data, 'address': address})
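The manual nesting step can be generalized with dataclasses.fields(). The helper below is a small sketch, not a replacement for dataclass-wizard or cattrs: the name from_dict is an assumption, and it only recurses into fields whose annotation is itself a dataclass (it does not handle lists of dataclasses, Optional fields, or unions).

```python
import json
from dataclasses import dataclass, field, fields, is_dataclass, asdict
from typing import List, get_type_hints

def from_dict(cls, data):
    """Build dataclass `cls` from a dict, recursing into nested dataclass fields."""
    hints = get_type_hints(cls)
    kwargs = {}
    for f in fields(cls):
        if f.name not in data:
            continue  # Let the dataclass default apply (or raise if there is none)
        value = data[f.name]
        field_type = hints[f.name]
        if is_dataclass(field_type) and isinstance(value, dict):
            value = from_dict(field_type, value)
        kwargs[f.name] = value
    return cls(**kwargs)

@dataclass
class Address:
    city: str
    country: str

@dataclass
class User:
    id: int
    name: str
    address: Address
    tags: List[str] = field(default_factory=list)

user = User(id=1, name="Alice", address=Address(city="Beijing", country="China"))
json_str = json.dumps(asdict(user))
restored = from_dict(User, json.loads(json_str))
print(restored == user)
```

Once a model needs lists of nested objects or optional fields, a dedicated library is usually the better choice than extending a helper like this.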