Chapter 17
Table Partitioning in Practice
MySQL Table Partitioning Complete Guide
Partitioning divides large tables into smaller, more manageable pieces. This improves query performance, enables efficient archival, and simplifies maintenance for tables with millions of rows.
1. Partitioning Basics
1.1 When to Partition
Benefits:
├─ Faster queries: scan only relevant partitions
├─ Easier maintenance: drop old partitions, not rows
├─ Parallel operations: multiple partitions processed concurrently
├─ Storage efficiency: archive old data separately
└─ Distributed processing: balance load across disks
Challenges:
├─ Complexity: query planning becomes more complex
├─ Limited benefit: small tables don't need partitioning
├─ Overhead: partition pruning checks add cost
└─ Index limitations: some index types not available
RULE OF THUMB:
├─ Table size > 10GB → Consider partitioning
├─ Heavy time-series data → Date-based partitioning
├─ Very high concurrency → Partitioning for parallelism
└─ Frequent large deletes → Partition-based deletion
1.2 Partition Types
RANGE PARTITION (Most Common):
CREATE TABLE orders (
id INT,
order_date DATE,
amount DECIMAL(10,2)
) PARTITION BY RANGE (YEAR(order_date)) (
PARTITION p2020 VALUES LESS THAN (2021),
PARTITION p2021 VALUES LESS THAN (2022),
PARTITION p2022 VALUES LESS THAN (2023),
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION pmax VALUES LESS THAN MAXVALUE
);
Query: SELECT * FROM orders WHERE order_date >= '2023-01-01'
├─ Only scans: p2023, pmax partitions
├─ Skips: p2020, p2021, p2022 (partition pruning)
└─ Performance: 4 partitions instead of all
LIST PARTITION:
CREATE TABLE users (
id INT,
country VARCHAR(2),
name VARCHAR(100)
) PARTITION BY LIST COLUMNS(country) (
PARTITION p_americas VALUES IN ('US', 'CA', 'MX'),
PARTITION p_europe VALUES IN ('GB', 'DE', 'FR'),
PARTITION p_asia VALUES IN ('CN', 'JP', 'IN'),
PARTITION p_other VALUES IN (DEFAULT)
);
HASH PARTITION:
CREATE TABLE events (
id INT,
event_type VARCHAR(50),
data TEXT
) PARTITION BY HASH (id) PARTITIONS 16;
-- Automatically distributes rows across 16 partitions
KEY PARTITION:
CREATE TABLE logs (
id BIGINT AUTO_INCREMENT,
message TEXT
) PARTITION BY KEY () PARTITIONS 8;
-- Works with AUTO_INCREMENT
2. Partition Maintenance
ADDING PARTITIONS (Range):
-- Add new partition for 2025
ALTER TABLE orders ADD PARTITION (
PARTITION p2024 VALUES LESS THAN (2025)
);
DROPPING PARTITIONS (Delete Data):
-- Drop 2020 data (very fast - no row-by-row delete!)
ALTER TABLE orders DROP PARTITION p2020;
-- Result: Entire partition deleted in milliseconds
-- Traditional DELETE: DELETE FROM orders WHERE YEAR(order_date) = 2020
-- Slow, generates undo logs, blocks writes
PERFORMANCE COMPARISON:
DROP PARTITION:
├─ Time: milliseconds (metadata operation)
├─ Undo log: none
├─ Redo log: minimal
└─ Impact: none
DELETE rows:
├─ Time: minutes for millions of rows
├─ Undo log: full history (for rollback)
├─ Redo log: every row change
└─ Impact: blocks other queries, uses memory
REORGANIZING PARTITIONS:
ALTER TABLE orders REORGANIZE PARTITION p2020, p2021
INTO (
PARTITION p2020_q1 VALUES LESS THAN ('2020-04-01'),
PARTITION p2020_q2 VALUES LESS THAN ('2020-07-01'),
PARTITION p2020_q3 VALUES LESS THAN ('2020-10-01'),
PARTITION p2020_q4 VALUES LESS THAN ('2021-01-01')
);
Note: Creates temporary copies, uses disk space
3. Partition Pruning
Partition pruning: Query optimizer skips irrelevant partitions
STATIC PRUNING (Constants):
SELECT * FROM orders WHERE order_date >= '2023-01-01';
Optimizer analyzes WHERE clause:
├─ order_date >= '2023-01-01'
├─ Which partitions have data >= 2023-01-01?
├─ Answer: p2023, pmax
└─ Scan only 2 partitions (skip 3!)
DYNAMIC PRUNING (Runtime):
SELECT * FROM orders WHERE order_date >= DATE_SUB(NOW(), INTERVAL 30 DAY);
Optimizer can't determine at parse time (depends on runtime NOW())
Options:
├─ MySQL 8.0+: Dynamic pruning evaluates at execution time
├─ MySQL 5.7: Doesn't prune (scans all partitions)
└─ Workaround: Rewrite with constants for 5.7
CHECKING PARTITION SELECTION:
EXPLAIN SELECT * FROM orders WHERE order_date >= '2023-01-01'\G
partitions column shows:
├─ NULL: All partitions scanned
├─ p2023, pmax: Only these partitions scanned (good!)
INFORMATION_SCHEMA:
SELECT PARTITION_NAME, PARTITION_EXPRESSION, PARTITION_DESCRIPTION
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_NAME = 'orders';
4. Limitations and Gotchas
UNIQUE CONSTRAINTS:
Unique key must include partition key!
❌ WRONG:
CREATE TABLE orders (
id INT UNIQUE,
order_date DATE
) PARTITION BY RANGE (YEAR(order_date)) (...);
-- Error: Unique constraint without partition key!
✓ CORRECT:
CREATE TABLE orders (
id INT,
order_date DATE,
UNIQUE (id, order_date) -- Includes partition key
) PARTITION BY RANGE (YEAR(order_date)) (...);
FOREIGN KEYS:
❌ Partitioned table can't have foreign keys as parent
❌ Partitioned table can reference foreign key as child (limited support)
Workaround: Use denormalization or handle in application code
FULL-TEXT INDEXES:
❌ Not supported with partitioning
SPATIAL INDEXES:
❌ Not supported with partitioning
5. Real-World Example: Time-Series Data
SCENARIO: Log table, 1 billion rows, 1 year of data
CREATE TABLE access_logs (
id BIGINT AUTO_INCREMENT,
timestamp DATETIME,
user_id INT,
endpoint VARCHAR(255),
response_time INT,
PRIMARY KEY (id, timestamp)
) PARTITION BY RANGE (TO_DAYS(timestamp)) (
PARTITION p_day_1 VALUES LESS THAN (738000), -- Jan 1
PARTITION p_day_2 VALUES LESS THAN (738001), -- Jan 2
... (365 daily partitions)
PARTITION p_max VALUES LESS THAN MAXVALUE
);
BENEFITS:
Daily drops:
-- Very fast - instant metadata operation
ALTER TABLE access_logs DROP PARTITION p_day_1;
Archival:
-- Export old partition to cold storage
ALTER TABLE access_logs EXPORT PARTITION p_day_1;
-- Then drop partition
MAINTENANCE:
Add new partition when new day arrives:
-- Cron job
ALTER TABLE access_logs ADD PARTITION (
PARTITION p_day_366 VALUES LESS THAN (738366)
);
Check partition sizes:
SELECT PARTITION_NAME, PARTITION_EXPRESSION,
ROUND(DATA_LENGTH / 1024 / 1024) AS size_mb
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_NAME = 'access_logs'
ORDER BY PARTITION_ORDINAL_POSITION;
PERFORMANCE:
Query last 30 days:
SELECT * FROM access_logs WHERE timestamp >= '2024-03-24'
-- Scans: ~30 partitions (out of 365)
-- Speed: 30/365 = 8% of full table scan
6. Partition vs Sharding
PARTITIONING (Single Server):
├─ All data in one MySQL instance
├─ Automatic by MySQL (transparent)
├─ Same SQL as non-partitioned
└─ Limited by server resources
SHARDING (Multiple Servers):
├─ Data split across multiple instances
├─ Manual application logic
├─ Different queries per shard
└─ Scales horizontally beyond single server
When partitioning is enough:
├─ Table fits in one server's storage
├─ Read/write QPS handled by one server
└─ Operational complexity acceptable
When sharding needed:
├─ Data > single server capacity
├─ QPS > single server handles
├─ High availability required
└─ Global distribution needed
EXAMPLE:
1 billion rows on single 10TB server:
├─ Partitioning by date: efficient
└─ Single server still handles all reads/writes
1 trillion rows across 1000 servers:
├─ Partitioning alone: insufficient
├─ Sharding required: data across 1000 instances
└─ Consistent hashing: user_id % 1000 = shard
7. Best Practices
- Choose partition key carefully — affects pruning efficiency
- Use date-based partitioning — most common, enables time-series optimization
- Set MAXVALUE partition — catches unexpected values
- Automate partition management — cron job to add/drop partitions
- Monitor partition sizes — ensure balanced distribution
- Test partition pruning — verify EXPLAIN shows partition selection
- Document partition scheme — maintenance burden increases
- Don't partition small tables — overhead exceeds benefits
- Consider before migration — easier than retrofitting
- Use with indexes — partitioning + indexes = best performance
Conclusion
Partitioning is powerful for managing very large tables but not a silver bullet. Use it when tables exceed 10GB, especially for time-series data. Date-based partitioning with daily or monthly partitions typically provides best results. Automate partition lifecycle management to avoid manual overhead.