Chapter 17

Table Partitioning in Practice

MySQL Table Partitioning Complete Guide

Partitioning divides large tables into smaller, more manageable pieces. This improves query performance, enables efficient archival, and simplifies maintenance for tables with millions of rows.

1. Partitioning Basics

1.1 When to Partition


Benefits:
├─ Faster queries: scan only relevant partitions
├─ Easier maintenance: drop old partitions, not rows
├─ Parallel operations: multiple partitions processed concurrently
├─ Storage efficiency: archive old data separately
└─ Distributed processing: balance load across disks

Challenges:
├─ Complexity: query planning becomes more complex
├─ Limited benefit: small tables don't need partitioning
├─ Overhead: partition pruning checks add cost
└─ Index limitations: some index types not available

RULE OF THUMB:
├─ Table size > 10GB → Consider partitioning
├─ Heavy time-series data → Date-based partitioning
├─ Very high concurrency → Partitioning for parallelism
└─ Frequent large deletes → Partition-based deletion

1.2 Partition Types


RANGE PARTITION (Most Common):

CREATE TABLE orders (
  id INT,
  order_date DATE,
  amount DECIMAL(10,2)
) PARTITION BY RANGE (YEAR(order_date)) (
  PARTITION p2020 VALUES LESS THAN (2021),
  PARTITION p2021 VALUES LESS THAN (2022),
  PARTITION p2022 VALUES LESS THAN (2023),
  PARTITION p2023 VALUES LESS THAN (2024),
  PARTITION pmax VALUES LESS THAN MAXVALUE
);

Query: SELECT * FROM orders WHERE order_date >= '2023-01-01'
├─ Only scans: p2023, pmax partitions
├─ Skips: p2020, p2021, p2022 (partition pruning)
└─ Performance: 4 partitions instead of all

LIST PARTITION:

CREATE TABLE users (
  id INT,
  country VARCHAR(2),
  name VARCHAR(100)
) PARTITION BY LIST COLUMNS(country) (
  PARTITION p_americas VALUES IN ('US', 'CA', 'MX'),
  PARTITION p_europe VALUES IN ('GB', 'DE', 'FR'),
  PARTITION p_asia VALUES IN ('CN', 'JP', 'IN'),
  PARTITION p_other VALUES IN (DEFAULT)
);

HASH PARTITION:

CREATE TABLE events (
  id INT,
  event_type VARCHAR(50),
  data TEXT
) PARTITION BY HASH (id) PARTITIONS 16;
-- Automatically distributes rows across 16 partitions

KEY PARTITION:

CREATE TABLE logs (
  id BIGINT AUTO_INCREMENT,
  message TEXT
) PARTITION BY KEY () PARTITIONS 8;
-- Works with AUTO_INCREMENT

2. Partition Maintenance


ADDING PARTITIONS (Range):

-- Add new partition for 2025
ALTER TABLE orders ADD PARTITION (
  PARTITION p2024 VALUES LESS THAN (2025)
);

DROPPING PARTITIONS (Delete Data):

-- Drop 2020 data (very fast - no row-by-row delete!)
ALTER TABLE orders DROP PARTITION p2020;

-- Result: Entire partition deleted in milliseconds
-- Traditional DELETE: DELETE FROM orders WHERE YEAR(order_date) = 2020
--                    Slow, generates undo logs, blocks writes

PERFORMANCE COMPARISON:

DROP PARTITION:
├─ Time: milliseconds (metadata operation)
├─ Undo log: none
├─ Redo log: minimal
└─ Impact: none

DELETE rows:
├─ Time: minutes for millions of rows
├─ Undo log: full history (for rollback)
├─ Redo log: every row change
└─ Impact: blocks other queries, uses memory

REORGANIZING PARTITIONS:

ALTER TABLE orders REORGANIZE PARTITION p2020, p2021
INTO (
  PARTITION p2020_q1 VALUES LESS THAN ('2020-04-01'),
  PARTITION p2020_q2 VALUES LESS THAN ('2020-07-01'),
  PARTITION p2020_q3 VALUES LESS THAN ('2020-10-01'),
  PARTITION p2020_q4 VALUES LESS THAN ('2021-01-01')
);

Note: Creates temporary copies, uses disk space

3. Partition Pruning


Partition pruning: Query optimizer skips irrelevant partitions

STATIC PRUNING (Constants):

SELECT * FROM orders WHERE order_date >= '2023-01-01';

Optimizer analyzes WHERE clause:
├─ order_date >= '2023-01-01'
├─ Which partitions have data >= 2023-01-01?
├─ Answer: p2023, pmax
└─ Scan only 2 partitions (skip 3!)

DYNAMIC PRUNING (Runtime):

SELECT * FROM orders WHERE order_date >= DATE_SUB(NOW(), INTERVAL 30 DAY);

Optimizer can't determine at parse time (depends on runtime NOW())
Options:
├─ MySQL 8.0+: Dynamic pruning evaluates at execution time
├─ MySQL 5.7: Doesn't prune (scans all partitions)
└─ Workaround: Rewrite with constants for 5.7

CHECKING PARTITION SELECTION:

EXPLAIN SELECT * FROM orders WHERE order_date >= '2023-01-01'\G

partitions column shows:
├─ NULL: All partitions scanned
├─ p2023, pmax: Only these partitions scanned (good!)

INFORMATION_SCHEMA:

SELECT PARTITION_NAME, PARTITION_EXPRESSION, PARTITION_DESCRIPTION
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_NAME = 'orders';

4. Limitations and Gotchas


UNIQUE CONSTRAINTS:

Unique key must include partition key!

❌ WRONG:
CREATE TABLE orders (
  id INT UNIQUE,
  order_date DATE
) PARTITION BY RANGE (YEAR(order_date)) (...);
-- Error: Unique constraint without partition key!

✓ CORRECT:
CREATE TABLE orders (
  id INT,
  order_date DATE,
  UNIQUE (id, order_date)  -- Includes partition key
) PARTITION BY RANGE (YEAR(order_date)) (...);

FOREIGN KEYS:

❌ Partitioned table can't have foreign keys as parent
❌ Partitioned table can reference foreign key as child (limited support)

Workaround: Use denormalization or handle in application code

FULL-TEXT INDEXES:

❌ Not supported with partitioning

SPATIAL INDEXES:

❌ Not supported with partitioning

5. Real-World Example: Time-Series Data


SCENARIO: Log table, 1 billion rows, 1 year of data

CREATE TABLE access_logs (
  id BIGINT AUTO_INCREMENT,
  timestamp DATETIME,
  user_id INT,
  endpoint VARCHAR(255),
  response_time INT,
  PRIMARY KEY (id, timestamp)
) PARTITION BY RANGE (TO_DAYS(timestamp)) (
  PARTITION p_day_1 VALUES LESS THAN (738000),  -- Jan 1
  PARTITION p_day_2 VALUES LESS THAN (738001),  -- Jan 2
  ... (365 daily partitions)
  PARTITION p_max VALUES LESS THAN MAXVALUE
);

BENEFITS:

Daily drops:
-- Very fast - instant metadata operation
ALTER TABLE access_logs DROP PARTITION p_day_1;

Archival:
-- Export old partition to cold storage
ALTER TABLE access_logs EXPORT PARTITION p_day_1;
-- Then drop partition

MAINTENANCE:

Add new partition when new day arrives:
-- Cron job
ALTER TABLE access_logs ADD PARTITION (
  PARTITION p_day_366 VALUES LESS THAN (738366)
);

Check partition sizes:
SELECT PARTITION_NAME, PARTITION_EXPRESSION,
       ROUND(DATA_LENGTH / 1024 / 1024) AS size_mb
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_NAME = 'access_logs'
ORDER BY PARTITION_ORDINAL_POSITION;

PERFORMANCE:

Query last 30 days:
SELECT * FROM access_logs WHERE timestamp >= '2024-03-24'
-- Scans: ~30 partitions (out of 365)
-- Speed: 30/365 = 8% of full table scan

6. Partition vs Sharding


PARTITIONING (Single Server):
├─ All data in one MySQL instance
├─ Automatic by MySQL (transparent)
├─ Same SQL as non-partitioned
└─ Limited by server resources

SHARDING (Multiple Servers):
├─ Data split across multiple instances
├─ Manual application logic
├─ Different queries per shard
└─ Scales horizontally beyond single server

When partitioning is enough:
├─ Table fits in one server's storage
├─ Read/write QPS handled by one server
└─ Operational complexity acceptable

When sharding needed:
├─ Data > single server capacity
├─ QPS > single server handles
├─ High availability required
└─ Global distribution needed

EXAMPLE:

1 billion rows on single 10TB server:
├─ Partitioning by date: efficient
└─ Single server still handles all reads/writes

1 trillion rows across 1000 servers:
├─ Partitioning alone: insufficient
├─ Sharding required: data across 1000 instances
└─ Consistent hashing: user_id % 1000 = shard

7. Best Practices

Choose partition key carefully — affects pruning efficiency
Use date-based partitioning — most common, enables time-series optimization
Set MAXVALUE partition — catches unexpected values
Automate partition management — cron job to add/drop partitions
Monitor partition sizes — ensure balanced distribution
Test partition pruning — verify EXPLAIN shows partition selection
Document partition scheme — maintenance burden increases
Don't partition small tables — overhead exceeds benefits
Consider before migration — easier than retrofitting
Use with indexes — partitioning + indexes = best performance

Conclusion

Partitioning is powerful for managing very large tables but not a silver bullet. Use it when tables exceed 10GB, especially for time-series data. Date-based partitioning with daily or monthly partitions typically provides best results. Automate partition lifecycle management to avoid manual overhead.

Rate this chapter

4.9 / 5 (16 ratings)