Cassandra Guide
Data Modeling Rules
- Design tables around your queries, not relationships
- One table per query pattern (denormalization is OK)
- Partition key determines which node stores a row; rows that share a partition key are stored together
- Clustering key sorts rows within a partition
- Avoid large partitions (rule of thumb: keep each under about 100 MB and 100k rows)
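
The partition-size guideline above can be checked with back-of-envelope arithmetic before a table is created. The sketch below is a rough estimate only: the function name `estimate_partition` and all workload numbers are hypothetical, and it ignores storage overhead and compression, so measured sizes on a real cluster will differ.

```python
# Rough partition-size check against the 100 MB / 100k rows rule of thumb.
# All workload numbers below are hypothetical examples, not measurements.

MAX_PARTITION_BYTES = 100 * 1024 * 1024  # 100 MB guideline
MAX_PARTITION_ROWS = 100_000             # 100k rows guideline


def estimate_partition(rows_per_day: int, avg_row_bytes: int, retention_days: int) -> dict:
    """Estimate steady-state rows and raw bytes held by one partition."""
    rows = rows_per_day * retention_days
    size_bytes = rows * avg_row_bytes
    return {
        "rows": rows,
        "bytes": size_bytes,
        "too_large": rows > MAX_PARTITION_ROWS or size_bytes > MAX_PARTITION_BYTES,
    }


# A user writing 50 posts per day at about 2 KB each, kept for 30 days.
light = estimate_partition(rows_per_day=50, avg_row_bytes=2_048, retention_days=30)

# A sensor writing one 200-byte reading per second, kept for 30 days.
heavy = estimate_partition(rows_per_day=86_400, avg_row_bytes=200, retention_days=30)

print(light)  # {'rows': 1500, 'bytes': 3072000, 'too_large': False}
print(heavy)  # {'rows': 2592000, 'bytes': 518400000, 'too_large': True}
```

When an estimate exceeds either limit, one common remedy is to add a time bucket (for example, the month) to the partition key so that each partition covers a bounded time window.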
CQL Examples
-- Create keyspace (the data center name must match the one reported by nodetool status)
CREATE KEYSPACE my_app
WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 3};
-- Create table (query: get user's recent posts by date)
CREATE TABLE posts_by_user (
    user_id UUID,
    created_at TIMESTAMP,
    post_id UUID,
    title TEXT,
    content TEXT,
    tags SET<TEXT>,
    metadata MAP<TEXT, TEXT>,
    PRIMARY KEY ((user_id), created_at, post_id) -- (partition, clustering...)
) WITH CLUSTERING ORDER BY (created_at DESC);
-- Insert
INSERT INTO posts_by_user (user_id, created_at, post_id, title)
VALUES (uuid(), toTimestamp(now()), uuid(), 'Hello World')
USING TTL 2592000; -- TTL is in seconds: 30 days
-- Query (must restrict the full partition key; clustering columns can then be range-filtered)
SELECT * FROM posts_by_user
WHERE user_id = ? AND created_at > '2024-01-01'
LIMIT 20;
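
To see why the query needs the partition key and why results come back newest first, the table can be modeled in plain Python. This is a toy in-memory sketch, not a driver API: the class `PostsByUser` and its methods are hypothetical, and the tie-break on `post_id` uses Python's UUID ordering, which may not match Cassandra's.

```python
import uuid
from collections import defaultdict
from datetime import datetime, timezone


class PostsByUser:
    """Toy in-memory stand-in for the posts_by_user table (hypothetical, for illustration)."""

    def __init__(self):
        # partition key (user_id) -> {(created_at, post_id): title}
        self._partitions = defaultdict(dict)

    def insert(self, user_id, created_at, post_id, title):
        # Writing the same primary key again overwrites the row (upsert).
        self._partitions[user_id][(created_at, post_id)] = title

    def recent(self, user_id, since, limit=20):
        # Only one partition is read: the one selected by user_id.
        rows = self._partitions.get(user_id, {})
        matching = [(c, p, t) for (c, p), t in rows.items() if c > since]
        # Mirrors CLUSTERING ORDER BY (created_at DESC), then post_id ascending.
        matching.sort(key=lambda r: (-r[0].timestamp(), r[1]))
        return matching[:limit]


table = PostsByUser()
alice, bob = uuid.uuid4(), uuid.uuid4()
for day in (1, 15, 28):
    created = datetime(2024, 1, day, tzinfo=timezone.utc)
    table.insert(alice, created, uuid.uuid4(), f"alice post {day}")
table.insert(bob, datetime(2024, 1, 10, tzinfo=timezone.utc), uuid.uuid4(), "bob post")

since = datetime(2024, 1, 1, tzinfo=timezone.utc)
titles = [title for _, _, title in table.recent(alice, since)]
print(titles)  # ['alice post 28', 'alice post 15']
```

The post created exactly at the `since` timestamp is excluded because the comparison is strict, matching `created_at > ...` in the CQL query. Bob's rows are never touched, which is the in-memory analogue of reading a single partition.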
Cassandra vs MongoDB vs DynamoDB
| | Cassandra | MongoDB | DynamoDB |
|---|---|---|---|
| Best for | High-write time series | Flexible documents | AWS serverless |
| Query flexibility | Low (partition key required) | High | Medium |
| Write throughput | Excellent | Good | Excellent (managed) |
| ACID | Lightweight transactions (compare-and-set within a partition) | Multi-document transactions | Single-item ACID, plus multi-item transactions |