MySQL Index Optimization Guide
Indexes are the single most impactful lever for MySQL performance. A well-designed index can reduce query time from seconds to milliseconds; a poorly designed one can cripple write throughput. This chapter starts from the physical B+Tree storage structure, dives deep into composite index design methodology, covers 12 common failure scenarios, and includes an interactive index size calculator.
1. B+Tree Data Structure
1.1 Why B+Tree Over Other Structures?
A database index is fundamentally a data structure problem: given a key, locate the value as fast as possible. Let's compare the candidates:
| Data Structure | Lookup | Range Query | Disk I/O | Best For |
|---|---|---|---|---|
| Hash Table | O(1) | Not supported | Random | Memory engine equality lookups |
| Binary Search Tree | O(log n) avg, O(n) worst | In-order traversal | One I/O per level | Small in-memory datasets |
| AVL / Red-Black Tree | O(log n) | In-order traversal | Height = log2n (too tall) | In-memory indexes (Java TreeMap) |
| B-Tree | O(logmn) | Requires backtracking | One I/O per level, very few levels | Filesystem indexes |
| B+Tree | O(logmn) | Leaf linked-list scan | 2-4 I/O ops | Relational databases |
| LSM-Tree | O(log n) multi-level | Merge required | Write-optimized | RocksDB / LevelDB |
| Skip List | O(log n) | Linked-list sequential | Not disk-friendly | Redis ZSET |
Why B+Tree wins for databases:
- Extremely low tree height: With 16KB pages and ~14-byte index entries per key+pointer, each internal node has a fanout of ~1170. A 3-level B+Tree indexes 1170^2 = ~1.37 million rows. Four levels handles ~1.6 billion rows. Most tables need only 2-4 disk I/O operations to locate any row.
- Doubly-linked leaf nodes: B+Tree leaf nodes are connected via forward and backward pointers, enabling efficient range scans (
BETWEEN,>,ORDER BY) by simply following the linked list after finding the starting leaf. - Internal nodes store only keys: Unlike B-Trees, B+Tree internal nodes don't store data rows -- only keys and child-page pointers. This means higher fanout, shorter trees, and fewer I/O operations.
- All data at leaf level: Every lookup traverses to the leaf level, making path length consistent and performance predictable.
1.2 InnoDB B+Tree Physical Structure
InnoDB uses pages as the minimum I/O unit, defaulting to 16KB (innodb_page_size). The physical structure of a B+Tree index:
Intra-page Lookup Process
Within a page, records form a singly-linked list sorted by key. To avoid O(n) traversal, InnoDB maintains a Page Directory -- an array of "slots," each pointing to a record in a group (typically 4-8 records). Lookup first binary-searches the slots, then sequentially scans the small group.
1.3 B+Tree vs B-Tree: Key Differences
| Feature | B-Tree | B+Tree (used by InnoDB) |
|---|---|---|
| Data storage | All nodes (including internal) store data | Only leaf nodes store data |
| Leaf linked list | None | Doubly-linked list |
| Range queries | Requires full in-order traversal | Sequential leaf scan from start point |
| Internal node fanout | Smaller (stores data too) | Larger (keys + pointers only) |
| Lookup path length | Variable (may find data mid-tree) | Fixed (always reaches leaf level) |
| Cache friendliness | Moderate | Better (small internal nodes fit in Buffer Pool) |
1.4 Clustered Index
InnoDB stores table data itself organized as a B+Tree keyed by the primary key. This B+Tree -- with complete row data in its leaf nodes -- is the Clustered Index.
.ibd file contains data organized as a primary-key B+Tree.
Primary key selection rules (InnoDB's automatic behavior):
- If a
PRIMARY KEYis defined, use it - If no PRIMARY KEY, use the first
NOT NULL UNIQUE KEY - If neither exists, InnoDB auto-generates a hidden 6-byte
DB_ROW_IDcolumn as the clustered index key
DB_ROW_ID is globally auto-incremented (not per-table), creating a global mutex bottleneck under high-concurrency inserts. You also cannot use it in WHERE clauses, wasting the clustered index capability entirely.
1.5 Secondary Index
Every index other than the clustered index is a secondary index (also called non-clustered index). Its B+Tree leaf nodes store the indexed column values + the corresponding row's primary key value.
Bookmark Lookup (Back to Clustered Index)
After finding the primary key via a secondary index, if the query needs columns not in the secondary index, InnoDB must look up the full row in the clustered index. This is called a bookmark lookup (or "back to table" lookup).
-- Assume INDEX idx_name (name)
SELECT id, name, email FROM users WHERE name = 'Alice';
-- Execution:
-- 1. Search idx_name B+Tree for name='Alice' -> get id=1
-- 2. Take id=1 to clustered index (PRIMARY KEY) B+Tree
-- 3. Find full row at leaf: id=1, name='Alice', email='[email protected]'
-- 4. Return result
-- This requires 2 B+Tree lookups (2 random I/O paths)
1.6 Page Splits and Merges
Page Split
When inserting a record into a full leaf page, InnoDB must split the page into two:
- Allocate a new empty page
- Move approximately 50% of records to the new page
- Update the parent node's pointers and keys
- If the parent is also full, recursively split upward (rare)
UUID()) as primary keys insert at random positions, triggering page splits on nearly every insert. Auto-increment keys always append to the last leaf page, causing almost no splits. Under heavy writes, UUID primary keys can be 3-5x slower than auto-increment.
Primary key type comparison:
| PK Type | Split Frequency | Space Utilization | Distributed |
|---|---|---|---|
| AUTO_INCREMENT | Very low | ~95% | Poor (needs coordination) |
| UUID v4 (random) | Very high | ~50-70% | Good |
| UUID v7 / ULID (ordered) | Low | ~90% | Good |
| Snowflake ID | Low | ~90% | Good |
Page Merge
When a page's utilization drops below ~50% (controlled by MERGE_THRESHOLD) due to deletions, InnoDB attempts to merge adjacent pages, freeing empty pages.
-- Observe page merge behavior
SELECT * FROM INFORMATION_SCHEMA.INNODB_METRICS
WHERE NAME LIKE '%index_page_merge%';
-- MySQL 8.0.25+: set merge threshold per index
ALTER TABLE orders DROP INDEX idx_status,
ADD INDEX idx_status (status) COMMENT 'MERGE_THRESHOLD=30';
1.7 Index Key Storage Format
Storage overhead key points:
INT= fixed 4 bytes,BIGINT= fixed 8 bytes -- very compact as index columnsVARCHAR(255)= 1 byte length + actual content bytes. If NULLable, add NULL flag bits- Every secondary index record appends the PK columns, so shorter PKs = smaller secondary indexes
CHAR(255) CHARSET utf8mb4= fixed 255*4 = 1020 bytes -- catastrophic waste as an index column
2. MySQL Index Types
2.1 PRIMARY KEY
CREATE TABLE users (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
PRIMARY KEY (id)
) ENGINE=InnoDB;
- Only one per table
- All columns must be
NOT NULL - In InnoDB, this IS the clustered index
- Automatically unique
2.2 UNIQUE KEY
ALTER TABLE users ADD UNIQUE KEY uk_email (email);
- Guarantees uniqueness; allows one NULL (multiple NULLs allowed since MySQL 8.0.16+)
- Lookup performance nearly identical to INDEX -- only difference is stopping after first match
- Adds uniqueness check overhead on INSERT/UPDATE
2.3 INDEX / KEY (Regular Secondary Index)
ALTER TABLE orders ADD INDEX idx_user_id (user_id);
- Most common index type
- Allows duplicate values
- Supports equality, range, sorting, and grouping
2.4 FULLTEXT Index
ALTER TABLE articles ADD FULLTEXT INDEX ft_content (title, body)
WITH PARSER ngram; -- needed for CJK text
SELECT * FROM articles
WHERE MATCH(title, body) AGAINST('MySQL optimization' IN BOOLEAN MODE);
- InnoDB FULLTEXT uses an inverted index structure
- Best for keyword search, not prefix or exact matching
- For large-scale search, consider Elasticsearch or Meilisearch
2.5 SPATIAL Index
CREATE TABLE places (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(100),
location POINT NOT NULL SRID 4326,
SPATIAL INDEX sp_location (location)
) ENGINE=InnoDB;
- Uses R-Tree data structure (not B+Tree)
- Column must be
NOT NULLwith SRID declared - InnoDB spatial index support since MySQL 8.0
2.6 Covering Index
Not a separate index type, but a query optimization effect: when all columns needed by a query are contained in the index, no bookmark lookup is needed. See Section 5.
2.7 Prefix Index
-- Index only first 20 characters of VARCHAR(255)
ALTER TABLE users ADD INDEX idx_email_prefix (email(20));
-- Determine optimal prefix length:
SELECT
COUNT(DISTINCT LEFT(email, 10)) / COUNT(*) AS sel_10,
COUNT(DISTINCT LEFT(email, 15)) / COUNT(*) AS sel_15,
COUNT(DISTINCT LEFT(email, 20)) / COUNT(*) AS sel_20,
COUNT(DISTINCT email) / COUNT(*) AS sel_full
FROM users;
2.8 Composite (Multi-Column) Index
ALTER TABLE orders ADD INDEX idx_user_status_created
(user_id, status, created_at);
Composite indexes are the core of index optimization. Multiple column values are concatenated into a single key in the B+Tree. See Section 3.
2.9 Descending Index -- MySQL 8.0+
-- MySQL 5.7: DESC keyword was parsed but ignored (all stored ASC)
-- MySQL 8.0+: truly supports descending storage
ALTER TABLE events ADD INDEX idx_time_score (created_at DESC, score ASC);
SELECT * FROM events ORDER BY created_at DESC, score ASC LIMIT 20;
-- EXPLAIN Extra: Using index (no filesort)
2.10 Invisible Index -- MySQL 8.0+
-- Make index invisible: optimizer ignores it, but index is still maintained
ALTER TABLE orders ALTER INDEX idx_status INVISIBLE;
-- After observing no performance regression, safely drop
ALTER TABLE orders DROP INDEX idx_status;
-- If problems arise, instantly restore
ALTER TABLE orders ALTER INDEX idx_status VISIBLE;
-- Even invisible indexes can be force-tested
SET optimizer_switch = 'use_invisible_indexes=on';
2.11 Functional Index -- MySQL 8.0.13+
-- WHERE YEAR(created_at) = 2026 can't use a regular index
-- Solution: create a functional index
ALTER TABLE orders ADD INDEX idx_year ((YEAR(created_at)));
SELECT * FROM orders WHERE YEAR(created_at) = 2026;
-- EXPLAIN: type=ref, key=idx_year
-- JSON field functional index
ALTER TABLE products ADD INDEX idx_brand
((CAST(attributes->>'$.brand' AS CHAR(50) CHARSET utf8mb4)));
3. Leftmost Prefix Rule
A composite index's B+Tree sorts entries by declared column order sequentially. The index is only usable when the query conditions start from the leftmost column and match continuously. This is the Leftmost Prefix Rule.
Given composite index:
INDEX idx_abc (a, b, c)
B+Tree sort order: First by a; within same a, by b; within same (a,b), by c.
3.1 12 Query Examples
All examples assume INDEX idx_abc (a, b, c):
1 WHERE a = 1
Uses index. Matches leftmost column. key_len includes only column a.
EXPLAIN SELECT * FROM t WHERE a = 1;
-- type: ref | key: idx_abc | key_len: 4 | ref: const
2 WHERE a = 1 AND b = 2
Uses index. Matches columns a and b.
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2;
-- type: ref | key: idx_abc | key_len: 8 | ref: const,const
3 WHERE a = 1 AND b = 2 AND c = 3
Uses index. All three columns matched.
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3;
-- type: ref | key: idx_abc | key_len: 12 | ref: const,const,const
4 WHERE b = 2
Cannot use index. Skipped leftmost column a. Column b is not globally sorted in the index.
EXPLAIN SELECT * FROM t WHERE b = 2;
-- type: ALL | key: NULL (full table scan)
5 WHERE b = 2 AND c = 3
Cannot use index. Missing leftmost column a.
EXPLAIN SELECT * FROM t WHERE b = 2 AND c = 3;
-- type: ALL | key: NULL
6 WHERE a = 1 AND c = 3
Partially uses index. Uses a for filtering. Since b is skipped, c cannot use the index for filtering (c is not sorted within all a=1 rows). However, MySQL 8.0's ICP can filter c at the index level to reduce bookmark lookups.
EXPLAIN SELECT * FROM t WHERE a = 1 AND c = 3;
-- type: ref | key: idx_abc | key_len: 4 (only a used)
-- Extra: Using index condition (ICP filters c at index level)
7 WHERE a = 1 ORDER BY b
Uses index. After filtering a=1, b is already sorted within that range. No filesort needed.
EXPLAIN SELECT * FROM t WHERE a = 1 ORDER BY b;
-- type: ref | key: idx_abc | Extra: no filesort
8 WHERE a = 1 ORDER BY c
Partially uses index. a used for filtering, but c is not sorted within a=1 (because b varies), requiring filesort.
EXPLAIN SELECT * FROM t WHERE a = 1 ORDER BY c;
-- Extra: Using filesort
9 WHERE a = 1 AND b = 2 ORDER BY c
Uses index. After a=1 AND b=2, c is sorted within that subset.
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2 ORDER BY c;
-- type: ref | key_len: 8 | Extra: no filesort
10 WHERE a > 1 AND b = 2
Partially uses index. a is used for range scan, but the range condition breaks the index for subsequent column b.
EXPLAIN SELECT * FROM t WHERE a > 1 AND b = 2;
-- type: range | key: idx_abc | key_len: 4 (only a)
-- Extra: Using index condition
>, <, BETWEEN, LIKE 'xxx%'), all subsequent index columns cannot be used for precise lookups. Equality conditions (=, IN) do NOT break the chain.
11 WHERE a IN (1, 2) AND b = 2 AND c = 3
Uses full index. IN is treated as multiple equality conditions. It does NOT break subsequent columns.
EXPLAIN SELECT * FROM t WHERE a IN (1, 2) AND b = 2 AND c = 3;
-- type: range | key: idx_abc | key_len: 12 (all three columns)
12 WHERE a = 1 AND b > 5 AND c = 3
Partially uses index. a (equality) + b (range) used. b's range breaks c. ICP can filter c at index level.
EXPLAIN SELECT * FROM t WHERE a = 1 AND b > 5 AND c = 3;
-- type: range | key_len: 8 (a + b) | Extra: Using index condition
3.2 Leftmost Prefix Summary Table
| WHERE Clause | Index Columns Used | Reason |
|---|---|---|
a = 1 | a (1 col) | Matches leftmost |
a = 1 AND b = 2 | a, b (2 cols) | Consecutive from left |
a = 1 AND b = 2 AND c = 3 | a, b, c (3 cols) | All matched |
b = 2 | 0 | Skipped leftmost |
c = 3 | 0 | Skipped leftmost |
b = 2 AND c = 3 | 0 | Skipped leftmost |
a = 1 AND c = 3 | a (1) + ICP | b skipped, c via ICP only |
a > 1 AND b = 2 | a (range) | Range breaks subsequent |
a = 1 AND b > 5 AND c = 3 | a, b (range) | b range breaks c |
a IN (1,2) AND b = 2 AND c = 3 | a, b, c (3 cols) | IN does not break |
3.3 Column Order Design Principles
- Equality columns first (most important)
- Higher selectivity columns first (among equality columns)
- Range columns last (range breaks subsequent columns)
- ORDER BY / GROUP BY columns right after equality columns (avoid filesort)
4. 12 Index Failure Scenarios
Even with indexes created, these 12 common scenarios cause indexes to be unusable (or the optimizer to bypass them), degrading queries to full table scans. Each includes SQL and EXPLAIN output.
1 Function on indexed column
-- โ Index fails: YEAR() function
SELECT * FROM orders WHERE YEAR(created_at) = 2026;
-- EXPLAIN: type=ALL, key=NULL
-- โ
Rewrite as range
SELECT * FROM orders
WHERE created_at >= '2026-01-01' AND created_at < '2027-01-01';
-- EXPLAIN: type=range, key=idx_created
-- โ
MySQL 8.0.13+: functional index
ALTER TABLE orders ADD INDEX idx_year ((YEAR(created_at)));
2 Implicit type conversion
-- phone is VARCHAR(20) with INDEX idx_phone (phone)
-- โ Index fails: integer input, MySQL casts phone to number
SELECT * FROM users WHERE phone = 13800138000;
-- โ
Pass as string
SELECT * FROM users WHERE phone = '13800138000';
-- EXPLAIN: type=ref, key=idx_phone
3 LIKE with leading wildcard
-- โ Leading wildcard: no search starting point in B+Tree
SELECT * FROM users WHERE name LIKE '%son';
-- EXPLAIN: type=ALL
-- โ
Trailing wildcard works
SELECT * FROM users WHERE name LIKE 'John%';
-- EXPLAIN: type=range, key=idx_name
4 OR mixing indexed and non-indexed columns
-- โ Index fails
SELECT * FROM users WHERE name = 'John' OR remark = 'VIP';
-- EXPLAIN: type=ALL (remark has no index)
-- โ
Add index to both, enables Index Merge
ALTER TABLE users ADD INDEX idx_remark (remark);
-- Or rewrite as UNION ALL
5 NOT IN / NOT EXISTS / != / <>
-- โ Often full scan (depends on data distribution)
SELECT * FROM orders WHERE status != 'completed';
-- โ
Rewrite as positive condition
SELECT * FROM orders WHERE status IN ('pending', 'processing', 'shipped');
6 Expression on indexed column
-- โ Index fails
SELECT * FROM orders WHERE id + 1 = 10;
-- โ
Rearrange the expression
SELECT * FROM orders WHERE id = 9;
-- EXPLAIN: type=const, rows=1
7 Charset mismatch in JOIN
-- table_a.user_id: utf8mb4_general_ci
-- table_b.user_id: utf8_general_ci
-- โ Index fails: charset mismatch
SELECT a.*, b.name
FROM table_a a JOIN table_b b ON a.user_id = b.user_id;
-- EXPLAIN (table_b): type=ALL
-- โ
Unify charset
ALTER TABLE table_b MODIFY user_id VARCHAR(50)
CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
8 Low cardinality -- optimizer chooses full scan
-- gender column with index, but only 2 distinct values
SELECT * FROM users WHERE gender = 'M';
-- ~50% of 1M rows match -> full scan is faster than 500K bookmark lookups
9 ORDER BY direction mismatch
-- INDEX (a ASC, b ASC) can't satisfy ORDER BY a ASC, b DESC
-- โ
MySQL 8.0+: create descending index
ALTER TABLE t ADD INDEX idx_a_bdesc (a ASC, b DESC);
10 SELECT * when covering index is available
-- โ SELECT * forces bookmark lookup
SELECT * FROM orders WHERE user_id = 100 AND status = 'pending';
-- โ
Select only indexed columns -> covering index
SELECT user_id, status FROM orders WHERE user_id = 100 AND status = 'pending';
-- Extra: Using index
11 Range condition breaks subsequent index columns
-- INDEX (a, b, c): WHERE a = 1 AND b > 10 AND c = 5
-- Only a and b (range) are used; c cannot narrow the search
-- โ
Reorder: INDEX (a, c, b) if c equality is more selective
12 Small table -- optimizer decides full scan is faster
-- config table with 20 rows
SELECT * FROM config WHERE key_name = 'max_retries';
-- EXPLAIN: type=ALL, rows=20
-- Correct optimizer decision. No fix needed.
4.1 Failure Scenario Quick Reference
| # | Scenario | Root Cause | Fix |
|---|---|---|---|
| 1 | Function: YEAR(col) | B+Tree sorted by raw value | Rewrite as range / functional index |
| 2 | Implicit cast: varchar = 123 | Equivalent to CAST function | Match types |
| 3 | LIKE '%xxx' | No prefix = no start point | FULLTEXT / ES |
| 4 | OR with mixed indexed/non-indexed | Cannot merge scans | Index both / UNION |
| 5 | != / NOT IN | Low selectivity negation | Rewrite as positive IN |
| 6 | Expression: id+1=10 | Equivalent to function | Rearrange |
| 7 | Charset mismatch | Implicit CONVERT | Unify charset/collation |
| 8 | Low cardinality | Lookups costlier than scan | Composite / covering index |
| 9 | Sort direction mismatch | Mixed ASC/DESC | Descending index (8.0+) |
| 10 | SELECT * | Cannot cover | Select only needed columns |
| 11 | Range breaks subsequent | Post-range cols unsorted | Equality first, range last |
| 12 | Small table | Full scan ≤ index lookup | No fix needed |
5. Covering Index Deep Dive
5.1 What Is a Covering Index?
When all columns needed by a query (SELECT, WHERE, ORDER BY, GROUP BY) are contained within a single index, InnoDB can return results directly from the index B+Tree without any bookmark lookup to the clustered index.
-- INDEX idx_cover (user_id, status, amount)
SELECT user_id, status, amount FROM orders
WHERE user_id = 100 AND status = 'pending';
-- EXPLAIN Extra: Using index <-- KEY INDICATOR!
EXPLAIN Extra column meanings:
Using index-- Covering index, no bookmark lookupUsing index condition-- ICP, filtered at index level but still needs bookmark lookupNULL(empty Extra) -- Used index for positioning, bookmark lookup required
5.2 Performance Benefits
| Metric | Non-covering | Covering | Benefit |
|---|---|---|---|
| I/O (1000 row result) | 3-4 index + 1000 random bookmark I/Os | 3-4 index + sequential leaf scan | ~99% fewer I/Os |
| Data pages accessed | Index pages + many scattered data pages | Index pages only | Higher Buffer Pool hit rate |
| Data volume read | Full rows (potentially KB each) | Index entries only (bytes each) | 90%+ less data read |
5.3 Index Condition Pushdown (ICP)
Introduced in MySQL 5.6. Without ICP, the storage engine uses only the index prefix columns to locate records, then passes full rows to the Server layer for further filtering. ICP allows the engine to filter using remaining index columns at the index level, reducing bookmark lookups.
-- INDEX idx_abc (a, b, c)
-- Query: WHERE a = 1 AND c = 5 (skipping b)
-- Without ICP: locate all a=1 rows (10000), bookmark lookup each, filter c=5 at server (100 remain)
-- With ICP: locate all a=1 rows, check c=5 AT INDEX LEVEL, only bookmark lookup 100 rows
5.4 How to Design Covering Indexes
- Identify all columns involved: SELECT + WHERE + ORDER BY + GROUP BY
- WHERE equality columns first
- ORDER BY / GROUP BY columns next
- Display-only SELECT columns last
- Assess width tradeoff: Too-wide covering indexes increase write overhead and storage
6. Index Merge
Normally MySQL uses only one index per query. But with certain OR/AND conditions, the optimizer can use multiple indexes simultaneously and merge the results.
6.1 Three Index Merge Algorithms
| Algorithm | Condition | Operation | EXPLAIN Extra |
|---|---|---|---|
| Intersection | AND with separate indexes | Intersect PK sets | Using intersect(idx1,idx2) |
| Union | OR with separate indexes | Union PK sets | Using union(idx1,idx2) |
| Sort-Union | OR with range conditions | Sort then union | Using sort_union(idx1,idx2) |
Using intersect, creating INDEX (a, b) is almost always more efficient than two separate index lookups + merge.
-- Control Index Merge behavior
SET optimizer_switch = 'index_merge_intersection=off';
-- Via hints
SELECT /*+ NO_INDEX_MERGE(t) */ * FROM t WHERE a = 1 AND b = 2;
SELECT /*+ INDEX_MERGE(t idx_a, idx_b) */ * FROM t WHERE a = 1 OR b = 2;
7. Cardinality & Selectivity
7.1 What Is Cardinality?
Cardinality = estimated number of distinct values in an index column. Selectivity = Cardinality / Total Rows. Higher selectivity (closer to 1) = more effective index.
SHOW INDEX FROM orders;
-- user_id cardinality: 523467 / 9.8M rows = 0.053 (moderate)
-- status cardinality: 5 / 9.8M rows = 0.0000005 (very low)
7.2 How InnoDB Estimates Cardinality
InnoDB uses random sampling: it picks innodb_stats_persistent_sample_pages (default 20) random leaf pages, counts distinct values, and extrapolates.
SHOW VARIABLES LIKE 'innodb_stats%';
-- innodb_stats_persistent_sample_pages = 20
-- innodb_stats_auto_recalc = ON (recalc when >10% DML changes)
7.3 ANALYZE TABLE
-- Trigger statistics recalculation
ANALYZE TABLE orders;
-- Online operation in InnoDB, does not block reads/writes
-- Increase precision for skewed distributions
ALTER TABLE orders STATS_SAMPLE_PAGES = 100;
ANALYZE TABLE orders;
7.4 Histogram Statistics -- MySQL 8.0+
-- Create histogram (no index needed)
ANALYZE TABLE orders UPDATE HISTOGRAM ON status WITH 16 BUCKETS;
ANALYZE TABLE orders UPDATE HISTOGRAM ON total_amount WITH 64 BUCKETS;
-- View histogram info
SELECT SCHEMA_NAME, TABLE_NAME, COLUMN_NAME,
JSON_EXTRACT(histogram, '$.histogram-type') AS type,
JSON_EXTRACT(histogram, '$.number-of-buckets-specified') AS buckets
FROM INFORMATION_SCHEMA.COLUMN_STATISTICS;
Histogram vs Index:
| Aspect | Histogram | Index |
|---|---|---|
| Purpose | Help optimizer estimate row counts | Speed up data lookup |
| Write overhead | None (updated only on ANALYZE) | Every INSERT/UPDATE/DELETE |
| Storage | Minimal (in data dictionary) | Can be very large |
8. Index Design Methodology
8.1 Systematic Design Process
- Collect query patterns from slow query log and
performance_schema - For each query, identify: equality columns, range columns, JOIN columns, ORDER BY/GROUP BY columns, SELECT columns
- Apply the Three-Star System
- Merge indexes: if multiple queries can share one composite index via leftmost prefix
- Evaluate write overhead
- Validate post-deployment with EXPLAIN and monitoring
8.2 Three-Star Index System
From Lahdenmaki & Leach's "Relational Database Index Design and the Optimizers":
| Star | Condition | Meaning |
|---|---|---|
| First ⭐ | Index groups relevant WHERE rows together | Minimize scan range. Equality columns first. |
| Second ⭐⭐ | Index row order matches ORDER BY | Avoid filesort. ORDER BY follows equality cols. |
| Third ⭐⭐⭐ | Index contains all query columns | Covering index. No bookmark lookup. |
-- Query:
SELECT user_id, order_date, total_amount
FROM orders WHERE user_id = 100 AND status = 'completed'
ORDER BY order_date DESC LIMIT 20;
-- โ
โโ One-star: INDEX (user_id)
-- โ
โ
โ Two-star: INDEX (user_id, status, order_date DESC)
-- โ
โ
โ
Three-star: INDEX (user_id, status, order_date DESC, total_amount)
8.3 Index Overhead Analysis
Write Amplification
- INSERT: one clustered index write + one write per secondary index
- DELETE: each secondary index needs a delete marker
- UPDATE of indexed column: effectively DELETE + INSERT per affected index
-- Table with 5 secondary indexes:
-- One INSERT = 1 clustered + 5 secondary = 6 B+Tree insert operations
-- Plus redo log, undo log, doublewrite buffer...
-- Actual amplification can reach 10-20x
8.4 When NOT to Add an Index
- Very small tables (<1000 rows): full scan is one I/O
- Very low selectivity columns (boolean, gender): unless part of composite
- Frequently bulk-updated columns: maintenance cost may exceed query benefit
- Write-only tables (audit logs): indexes are pure overhead
- Already covered by existing index:
INDEX (a,b,c)already coversINDEX (a)andINDEX (a,b)
9. Practical Examples
9.1 E-commerce Orders Table (100M rows)
CREATE TABLE orders (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
user_id BIGINT UNSIGNED NOT NULL,
shop_id INT UNSIGNED NOT NULL,
order_no VARCHAR(32) NOT NULL,
status TINYINT NOT NULL DEFAULT 0,
total_amount DECIMAL(12,2) NOT NULL,
pay_time DATETIME DEFAULT NULL,
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (id),
UNIQUE KEY uk_order_no (order_no),
INDEX idx_user_status_created (user_id, status, created_at),
INDEX idx_shop_created (shop_id, created_at),
INDEX idx_pay_time (pay_time)
) ENGINE=InnoDB;
Deep Pagination Optimization
-- โ Deep pagination disaster
SELECT * FROM orders WHERE user_id = 100
ORDER BY created_at DESC LIMIT 20 OFFSET 100000;
-- โ
Cursor-based pagination (recommended)
SELECT * FROM orders
WHERE user_id = 100
AND (created_at, id) < ('2026-01-15 10:30:00', 99850)
ORDER BY created_at DESC, id DESC
LIMIT 20;
-- โ
Deferred join
SELECT o.* FROM orders o
INNER JOIN (
SELECT id FROM orders
WHERE user_id = 100
ORDER BY created_at DESC
LIMIT 20 OFFSET 100000
) t ON o.id = t.id;
9.2 User Activity Log (Time-Range Queries)
CREATE TABLE activity_log (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
user_id BIGINT UNSIGNED NOT NULL,
action VARCHAR(50) NOT NULL,
target_type VARCHAR(30),
target_id BIGINT UNSIGNED,
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id),
INDEX idx_user_created (user_id, created_at)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(created_at)) (
PARTITION p202601 VALUES LESS THAN (TO_DAYS('2026-02-01')),
PARTITION p202602 VALUES LESS THAN (TO_DAYS('2026-03-01')),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
9.3 Social Feed (Complex JOIN + Pagination)
-- Feed query (pull model)
SELECT p.id, p.user_id, p.content, p.like_count, p.created_at
FROM posts p
INNER JOIN follows f ON f.following_id = p.user_id
WHERE f.follower_id = 12345
AND p.created_at > NOW() - INTERVAL 7 DAY
ORDER BY p.created_at DESC
LIMIT 20;
-- Key: follows PRIMARY KEY (follower_id, following_id) is already optimal
-- posts INDEX (user_id, created_at DESC) perfectly matches
10. Index Monitoring
10.1 Finding Unused Indexes
SELECT * FROM sys.schema_unused_indexes
WHERE object_schema = 'your_db';
-- Note: based on performance_schema, needs sufficient uptime to be accurate
-- UNIQUE KEYs serve data integrity even if never queried
10.2 Finding Redundant Indexes
SELECT * FROM sys.schema_redundant_indexes
WHERE table_schema = 'your_db';
-- Example: idx_user_id (user_id) is redundant with idx_user_status (user_id, status)
-- Also: pt-duplicate-key-checker from Percona Toolkit
10.3 Index Usage Statistics
SELECT OBJECT_SCHEMA, OBJECT_NAME, INDEX_NAME,
COUNT_READ AS reads, COUNT_WRITE AS writes
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE OBJECT_SCHEMA = 'your_db' AND INDEX_NAME IS NOT NULL
ORDER BY COUNT_READ DESC;
10.4 Index Fragmentation
-- Check fragmentation
SELECT TABLE_NAME,
ROUND(DATA_FREE / (DATA_LENGTH + INDEX_LENGTH + DATA_FREE) * 100, 1) AS frag_pct
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'your_db'
ORDER BY DATA_FREE DESC;
-- Rebuild if fragmentation > 20%
ALTER TABLE orders ENGINE=InnoDB; -- online rebuild
11. Index Size Calculator
Estimate storage growth when adding a new index. Enter table row count and index column information to calculate the estimated index size.
12. Frequently Asked Questions
Q1: What is the maximum number of indexes per table?
InnoDB supports up to 64 secondary indexes (plus 1 clustered index), with a maximum of 16 columns per index. In practice, more than 5-7 indexes per table warrants a review for redundancy. More indexes = slower writes.
Q2: How do I decide the column order in a composite index?
Core principle: Equality columns -> Sort columns -> Range columns. Among equality columns, place higher-selectivity ones first. Range columns always go last because range conditions break subsequent column usage. For covering indexes, append display-only columns at the end.
Q3: Should I use auto-increment ID or UUID for primary keys?
Auto-increment BIGINT for most cases. Sequential inserts avoid page splits, 8-byte fixed length keeps secondary indexes small, and comparison is fastest. For distributed systems needing globally unique IDs, use UUID v7 / ULID / Snowflake -- time-ordered, balancing uniqueness and sequentiality. Avoid random UUID v4 as primary keys.
Q4: How is EXPLAIN key_len calculated?
key_len shows bytes actually used from the index. Rules: INT=4, BIGINT=8, DATETIME=5, TIMESTAMP=4, DATE=3, CHAR(n)=n*max_charset_bytes(utf8mb4=4), VARCHAR(n)=n*max_charset_bytes+2. If NULLable, add +1. By calculating key_len you can determine how many composite index columns were actually used.
Q5: Does adding an index lock the table?
MySQL 8.0 Online DDL supports ALGORITHM=INPLACE, LOCK=NONE for most index operations -- no table lock. Always explicitly specify LOCK=NONE; if unsupported, MySQL will error rather than silently locking.
ALTER TABLE orders ADD INDEX idx_new (col1, col2), ALGORITHM=INPLACE, LOCK=NONE;
Q6: Why does EXPLAIN show possible_keys but key is NULL?
The optimizer's cost estimation decided a full scan is cheaper. Causes: (1) low selectivity, (2) inaccurate cardinality -- run ANALYZE TABLE; (3) use FORCE INDEX(idx) for diagnosis only, not in production code.
Q7: How do I handle index fragmentation?
After heavy random INSERT/DELETE, B+Tree page utilization drops. Check: DATA_FREE / (DATA_LENGTH + INDEX_LENGTH + DATA_FREE) from INFORMATION_SCHEMA.TABLES. If >20-30%, rebuild: ALTER TABLE t ENGINE=InnoDB (online) or OPTIMIZE TABLE t. Run during off-peak hours due to I/O impact.
Q8: How do I determine if an index is worth keeping?
Combine: (1) sys.schema_unused_indexes -- is it queried? (2) performance_schema -- read vs write ratio; (3) sys.schema_redundant_indexes -- is it covered by another? Before dropping, set it INVISIBLE for a week (MySQL 8.0+).
Q9: What are the differences between InnoDB and MyISAM indexes?
InnoDB uses a clustered index (data and PK index are one structure); secondary index leaves store PK values. MyISAM indexes are all non-clustered; leaves store physical row pointers. MyISAM secondary lookups don't need a second B+Tree traversal, but MyISAM lacks transactions, row locking, and foreign keys. Modern applications should use InnoDB exclusively.
Q10: What indexing features did MySQL 8.0 introduce?
Key features: (1) Descending indexes -- true DESC storage for mixed-direction ORDER BY; (2) Invisible indexes -- safely test dropping indexes; (3) Functional indexes -- index expressions, solving function-on-column failures; (4) Histogram statistics -- better cost estimation without indexes; (5) Hash Join -- dramatically faster indexless JOINs; (6) Index Skip Scan -- skip the leftmost column in certain scenarios.