MySQL Index Optimization Guide

Indexes are the single most impactful lever for MySQL performance. A well-designed index can reduce query time from seconds to milliseconds; a poorly designed one can cripple write throughput. This chapter starts from the physical B+Tree storage structure, dives deep into composite index design methodology, covers 12 common failure scenarios, and shows how to estimate the size of a planned index.

Target Version: This chapter is based on MySQL 8.0 / 8.4 LTS (InnoDB engine). Features specific to 8.0+ (descending indexes, invisible indexes, functional indexes, histograms) are explicitly noted. Differences with 5.7 and earlier are also mentioned.

1. B+Tree Data Structure

1.1 Why B+Tree Over Other Structures?

A database index is fundamentally a data structure problem: given a key, locate the value as fast as possible. Let's compare the candidates:

| Data Structure | Lookup | Range Query | Disk I/O | Best For |
|---|---|---|---|---|
| Hash Table | O(1) | Not supported | Random | MEMORY engine equality lookups |
| Binary Search Tree | O(log n) avg, O(n) worst | In-order traversal | One I/O per level | Small in-memory datasets |
| AVL / Red-Black Tree | O(log n) | In-order traversal | Height ≈ log2(n) (too tall) | In-memory indexes (Java TreeMap) |
| B-Tree | O(log_m n) | Requires backtracking | One I/O per level, very few levels | Filesystem indexes |
| B+Tree | O(log_m n) | Leaf linked-list scan | 2-4 I/O ops | Relational databases |
| LSM-Tree | O(log n) multi-level | Merge required | Write-optimized | RocksDB / LevelDB |
| Skip List | O(log n) | Linked-list sequential | Not disk-friendly | Redis ZSET |

Why B+Tree wins for databases:

  1. Extremely low tree height: With 16KB pages and ~14 bytes per internal entry (an 8-byte BIGINT key plus a 6-byte child-page pointer), each internal node has a fanout of ~1170. A 3-level B+Tree addresses 1170^2 ≈ 1.37 million leaf pages; at ~16 rows per leaf page (assuming ~1KB rows), that is roughly 20 million rows. Four levels reaches 1170^3 ≈ 1.6 billion leaf pages -- tens of billions of rows. Locating any row in most tables therefore costs only 2-4 page reads.
  2. Doubly-linked leaf nodes: B+Tree leaf nodes are connected via forward and backward pointers, enabling efficient range scans (BETWEEN, >, ORDER BY) by simply following the linked list after finding the starting leaf.
  3. Internal nodes store only keys: Unlike B-Trees, B+Tree internal nodes don't store data rows -- only keys and child-page pointers. This means higher fanout, shorter trees, and fewer I/O operations.
  4. All data at leaf level: Every lookup traverses to the leaf level, making path length consistent and performance predictable.
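The height arithmetic above can be checked in a few lines. This is a back-of-envelope sketch: the 8-byte key, 6-byte child pointer, and ~1KB row size are round-number assumptions, not exact InnoDB accounting.

```python
PAGE_SIZE = 16 * 1024      # default innodb_page_size
ENTRY_SIZE = 8 + 6         # assumed: 8-byte BIGINT key + 6-byte child pointer
ROW_SIZE = 1024            # assumed: ~1KB per row in the clustered leaf

fanout = PAGE_SIZE // ENTRY_SIZE       # children per internal page
rows_per_leaf = PAGE_SIZE // ROW_SIZE  # rows per leaf page

def max_rows(levels):
    # (levels - 1) internal levels multiply the fanout; leaves hold the rows
    return fanout ** (levels - 1) * rows_per_leaf

print(fanout)        # 1170
print(max_rows(3))   # 21902400 -> ~20 million rows in a 3-level tree
print(max_rows(4))   # ~25.6 billion rows in a 4-level tree
```

Under these assumptions a 3-level tree already covers ~20 million rows, which is why most OLTP lookups finish in 2-4 page reads.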

1.2 InnoDB B+Tree Physical Structure

InnoDB uses pages as the minimum I/O unit, defaulting to 16KB (innodb_page_size). The physical structure of a B+Tree index:

                         [Root Page]
                        /     |     \
             [Internal]   [Internal]   [Internal]
             /  |  \       /  |  \       /  |  \
        [Leaf] [Leaf] [Leaf] [Leaf] [Leaf] [Leaf] [Leaf]
          |----->|----->|----->|----->|----->|----->|
          |<-----|<-----|<-----|<-----|<-----|<-----|
     Leaf nodes connected via doubly-linked list (sorted by key)

Each page's internal structure:

+--------------------------------------------------+
| Page Header (38 bytes)                           |
|  - Page Number, Previous Page, Next Page         |
|  - Page Type, Space ID, LSN                      |
+--------------------------------------------------+
| Infimum Record (virtual minimum record)          |
| Supremum Record (virtual maximum record)         |
+--------------------------------------------------+
| User Records (actual index records, sorted)      |
|  Record 1 -> Record 2 -> Record 3 -> ...         |
|  (singly-linked via next_record offset)          |
+--------------------------------------------------+
| Free Space                                       |
+--------------------------------------------------+
| Page Directory (slot array for binary search)    |
+--------------------------------------------------+
| Page Trailer (8 bytes, checksum)                 |
+--------------------------------------------------+

Intra-page Lookup Process

Within a page, records form a singly-linked list sorted by key. To avoid O(n) traversal, InnoDB maintains a Page Directory -- an array of "slots," each pointing to a record in a group (typically 4-8 records). Lookup first binary-searches the slots, then sequentially scans the small group.
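The two-phase lookup (binary search over slots, then a short linear scan) can be sketched with a toy page. The fixed group size of 4 is a simplifying assumption; real InnoDB groups hold a variable 1-8 records, tracked by each owner record's n_owned field.

```python
import bisect

records = list(range(0, 80, 2))    # 40 sorted keys in one toy page
GROUP = 4                          # assumed records per directory slot

# Each slot stores the LAST key of its group (the group's "owner" record)
slots = [records[min(i + GROUP - 1, len(records) - 1)]
         for i in range(0, len(records), GROUP)]

def page_lookup(key):
    s = bisect.bisect_left(slots, key)      # 1. binary search the slot array
    if s == len(slots):
        return None                         # key beyond all records in page
    start = s * GROUP                       # 2. linear scan the small group
    for k in records[start:start + GROUP]:
        if k == key:
            return k
    return None

print(page_lookup(10))   # 10 (found)
print(page_lookup(7))    # None (absent)
```

The binary search costs O(log slots) comparisons and the final scan touches at most a handful of records, which is why intra-page lookups are effectively free compared to the page I/O itself.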

1.3 B+Tree vs B-Tree: Key Differences

| Feature | B-Tree | B+Tree (used by InnoDB) |
|---|---|---|
| Data storage | All nodes (including internal) store data | Only leaf nodes store data |
| Leaf linked list | None | Doubly-linked list |
| Range queries | Requires full in-order traversal | Sequential leaf scan from start point |
| Internal node fanout | Smaller (stores data too) | Larger (keys + pointers only) |
| Lookup path length | Variable (may find data mid-tree) | Fixed (always reaches leaf level) |
| Cache friendliness | Moderate | Better (small internal nodes fit in Buffer Pool) |

1.4 Clustered Index

InnoDB stores table data itself organized as a B+Tree keyed by the primary key. This B+Tree -- with complete row data in its leaf nodes -- is the Clustered Index.

Key insight: In InnoDB, the table IS the clustered index and the clustered index IS the table. They are the same data. The .ibd file contains data organized as a primary-key B+Tree.

Primary key selection rules (InnoDB's automatic behavior):

  1. If a PRIMARY KEY is defined, use it
  2. If no PRIMARY KEY, use the first NOT NULL UNIQUE KEY
  3. If neither exists, InnoDB auto-generates a hidden 6-byte DB_ROW_ID column as the clustered index key
Always define an explicit primary key! The hidden DB_ROW_ID is globally auto-incremented (not per-table), creating a global mutex bottleneck under high-concurrency inserts. You also cannot use it in WHERE clauses, wasting the clustered index capability entirely.

1.5 Secondary Index

Every index other than the clustered index is a secondary index (also called non-clustered index). Its B+Tree leaf nodes store the indexed column values + the corresponding row's primary key value.

Clustered Index (PRIMARY KEY = id):

                [Root: ..., 50, 100, ...]
               /           |            \
  [Leaf: id=1..49]  [Leaf: id=50..99]  [Leaf: id=100..149]

  Each leaf stores full rows:
    id=1, name='Alice', email='alice@...'
    id=2, name='Bob',   email='bob@...'

Secondary Index (INDEX idx_name (name)):

                [Root: ..., 'John', 'Mike', ...]
               /              |               \
  [Leaf: 'A'..'J']   [Leaf: 'J'..'M']   [Leaf: 'M'..'Z']

  Each leaf stores:
    name='Alice', id=1
    name='Bob',   id=2
    (index columns + primary key)

Bookmark Lookup (Back to Clustered Index)

After finding the primary key via a secondary index, if the query needs columns not in the secondary index, InnoDB must look up the full row in the clustered index. This is called a bookmark lookup (or "back to table" lookup).

-- Assume INDEX idx_name (name)
SELECT id, name, email FROM users WHERE name = 'Alice';

-- Execution:
-- 1. Search idx_name B+Tree for name='Alice' -> get id=1
-- 2. Take id=1 to clustered index (PRIMARY KEY) B+Tree
-- 3. Find full row at leaf: id=1, name='Alice', email='[email protected]'
-- 4. Return result

-- This requires 2 B+Tree lookups (2 random I/O paths)
Cost of bookmark lookups: Each row requires an independent random I/O to the clustered index. If the secondary index scan returns 1000 rows, worst case requires 1000 additional random I/Os. This is why the optimizer may choose a full table scan over an index when selectivity is low -- sequential scanning can be faster than thousands of random I/Os.

1.6 Page Splits and Merges

Page Split

When inserting a record into a full leaf page, InnoDB must split the page into two:

  1. Allocate a new empty page
  2. Move approximately 50% of records to the new page
  3. Update the parent node's pointers and keys
  4. If the parent is also full, recursively split upward (rare)
UUID primary keys cause disaster: Random UUIDs (UUID()) as primary keys insert at random positions, triggering page splits on nearly every insert. Auto-increment keys always append to the last leaf page, causing almost no splits. Under heavy writes, UUID primary keys can be 3-5x slower than auto-increment.

Primary key type comparison:

| PK Type | Split Frequency | Space Utilization | Distributed-Friendly |
|---|---|---|---|
| AUTO_INCREMENT | Very low | ~95% | Poor (needs coordination) |
| UUID v4 (random) | Very high | ~50-70% | Good |
| UUID v7 / ULID (ordered) | Low | ~90% | Good |
| Snowflake ID | Low | ~90% | Good |
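A toy simulation makes the utilization gap between sequential and random keys visible. The tiny page capacity and the 50/50 split model are simplifying assumptions; the rightmost-insert shortcut (leaving the old page full and starting a fresh one) mirrors InnoDB's append optimization for monotonically increasing keys.

```python
import bisect
import random

CAPACITY = 4  # assumed tiny page size so splits happen quickly

def insert(pages, k):
    """Insert k into the sorted leaf pages; return 1 if a split occurred."""
    i = next((j for j, p in enumerate(pages) if not p or k <= p[-1]),
             len(pages) - 1)
    page = pages[i]
    if len(page) < CAPACITY:
        bisect.insort(page, k)
        return 0
    if i == len(pages) - 1 and k > page[-1]:
        pages.append([k])            # rightmost insert: old page stays full
        return 1
    mid = CAPACITY // 2              # mid-tree insert: classic 50/50 split
    left, right = page[:mid], page[mid:]
    bisect.insort(left if k <= left[-1] else right, k)
    pages[i:i + 1] = [left, right]
    return 1

def load(keys):
    pages = [[]]
    splits = sum(insert(pages, k) for k in keys)
    fill = sum(map(len, pages)) / (len(pages) * CAPACITY)
    return fill, splits

random.seed(7)
seq_fill, _ = load(range(1000))                        # auto-increment style
rnd_fill, _ = load(random.sample(range(1000), 1000))   # random-UUID style
print(f"sequential fill: {seq_fill:.0%}, random fill: {rnd_fill:.0%}")
```

Sequential loading leaves every page full, while random-order loading settles around the ~70% fill that half-splits produce -- the same effect that inflates a UUIDv4-keyed table.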

Page Merge

When a page's utilization drops below ~50% (controlled by MERGE_THRESHOLD) due to deletions, InnoDB attempts to merge adjacent pages, freeing empty pages.

-- Observe page merge behavior
SELECT * FROM INFORMATION_SCHEMA.INNODB_METRICS
WHERE NAME LIKE '%index_page_merge%';

-- MySQL 5.7.6+: set merge threshold per index via index comment
ALTER TABLE orders DROP INDEX idx_status,
  ADD INDEX idx_status (status) COMMENT 'MERGE_THRESHOLD=30';

1.7 Index Key Storage Format

Index Record Format:

+------------------+------------------+-------------------+
| Variable-length  | NULL flags       | Record header     |
| field lengths    | (1 bit per       | (5 bytes)         |
| (stored reverse) | nullable col)    |  - delete_flag    |
|                  |                  |  - min_rec_flag   |
|                  |                  |  - n_owned        |
|                  |                  |  - heap_no        |
|                  |                  |  - record_type    |
|                  |                  |  - next_record    |
+------------------+------------------+-------------------+
| Col1 | Col2 | ... | PK value (secondary) / Row data (clustered) |
+------+------+-----+---------------------------------------------+

Storage overhead key points:

  • INT = fixed 4 bytes, BIGINT = fixed 8 bytes -- very compact as index columns
  • VARCHAR(255) = 1 byte length + actual content bytes. If NULLable, add NULL flag bits
  • Every secondary index record appends the PK columns, so shorter PKs = smaller secondary indexes
  • CHAR(255) CHARSET utf8mb4 can occupy up to 255 * 4 = 1020 bytes per value -- catastrophic waste as an index column

2. MySQL Index Types

2.1 PRIMARY KEY

CREATE TABLE users (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (id)
) ENGINE=InnoDB;
  • Only one per table
  • All columns must be NOT NULL
  • In InnoDB, this IS the clustered index
  • Automatically unique

2.2 UNIQUE KEY

ALTER TABLE users ADD UNIQUE KEY uk_email (email);
  • Guarantees uniqueness among non-NULL values; multiple NULL rows are allowed (NULL is never equal to NULL in uniqueness checks)
  • Lookup performance nearly identical to INDEX -- only difference is stopping after first match
  • Adds uniqueness check overhead on INSERT/UPDATE

2.3 INDEX / KEY (Regular Secondary Index)

ALTER TABLE orders ADD INDEX idx_user_id (user_id);
  • Most common index type
  • Allows duplicate values
  • Supports equality, range, sorting, and grouping

2.4 FULLTEXT Index

ALTER TABLE articles ADD FULLTEXT INDEX ft_content (title, body)
  WITH PARSER ngram; -- needed for CJK text

SELECT * FROM articles
WHERE MATCH(title, body) AGAINST('MySQL optimization' IN BOOLEAN MODE);
  • InnoDB FULLTEXT uses an inverted index structure
  • Best for keyword search, not prefix or exact matching
  • For large-scale search, consider Elasticsearch or Meilisearch

2.5 SPATIAL Index

CREATE TABLE places (
  id BIGINT PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(100),
  location POINT NOT NULL SRID 4326,
  SPATIAL INDEX sp_location (location)
) ENGINE=InnoDB;
  • Uses R-Tree data structure (not B+Tree)
  • Column must be NOT NULL with SRID declared
  • InnoDB has supported SPATIAL indexes since MySQL 5.7; SRID-restricted columns (used here) require 8.0

2.6 Covering Index

Not a separate index type, but a query optimization effect: when all columns needed by a query are contained in the index, no bookmark lookup is needed. See Section 5.

2.7 Prefix Index

-- Index only first 20 characters of VARCHAR(255)
ALTER TABLE users ADD INDEX idx_email_prefix (email(20));

-- Determine optimal prefix length:
SELECT
  COUNT(DISTINCT LEFT(email, 10)) / COUNT(*) AS sel_10,
  COUNT(DISTINCT LEFT(email, 15)) / COUNT(*) AS sel_15,
  COUNT(DISTINCT LEFT(email, 20)) / COUNT(*) AS sel_20,
  COUNT(DISTINCT email) / COUNT(*) AS sel_full
FROM users;
Prefix index limitations: Cannot be used as a covering index (incomplete values), cannot be used for ORDER BY or GROUP BY. Only use when the column is very long and storage is a concern.
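The same selectivity measurement can be run offline on an exported sample of the column. This is a toy sketch: the email list and the prefix_selectivity helper are illustrative stand-ins for the COUNT(DISTINCT LEFT(...)) query above.

```python
def prefix_selectivity(values, n=None):
    """Share of distinct values after truncating each to n characters."""
    cut = {v[:n] if n else v for v in values}
    return len(cut) / len(values)

emails = ["alice@example.com", "alicia@example.com", "bob@example.org",
          "carol@example.com", "caroline@test.net", "dave@test.net"]

for n in (3, 5, 10, None):
    print(n, round(prefix_selectivity(emails, n), 2))
```

Selectivity climbs with prefix length and plateaus; pick the shortest prefix whose selectivity is close to the full column's, since every extra byte widens the index.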

2.8 Composite (Multi-Column) Index

ALTER TABLE orders ADD INDEX idx_user_status_created
  (user_id, status, created_at);

Composite indexes are the core of index optimization. Multiple column values are concatenated into a single key in the B+Tree. See Section 3.

2.9 Descending Index -- MySQL 8.0+

-- MySQL 5.7: DESC keyword was parsed but ignored (all stored ASC)
-- MySQL 8.0+: truly supports descending storage

ALTER TABLE events ADD INDEX idx_time_score (created_at DESC, score ASC);

SELECT * FROM events ORDER BY created_at DESC, score ASC LIMIT 20;
-- EXPLAIN Extra: Using index (no filesort)
When do you need descending indexes? Only when ORDER BY has multiple columns with different directions (one ASC, one DESC). If all columns share the same direction, a single-direction index works via forward or backward scanning.

2.10 Invisible Index -- MySQL 8.0+

-- Make index invisible: optimizer ignores it, but index is still maintained
ALTER TABLE orders ALTER INDEX idx_status INVISIBLE;

-- After observing no performance regression, safely drop
ALTER TABLE orders DROP INDEX idx_status;

-- If problems arise, instantly restore
ALTER TABLE orders ALTER INDEX idx_status VISIBLE;

-- Even invisible indexes can be force-tested
SET optimizer_switch = 'use_invisible_indexes=on';

2.11 Functional Index -- MySQL 8.0.13+

-- WHERE YEAR(created_at) = 2026 can't use a regular index
-- Solution: create a functional index
ALTER TABLE orders ADD INDEX idx_year ((YEAR(created_at)));

SELECT * FROM orders WHERE YEAR(created_at) = 2026;
-- EXPLAIN: type=ref, key=idx_year

-- JSON field functional index
ALTER TABLE products ADD INDEX idx_brand
  ((CAST(attributes->>'$.brand' AS CHAR(50) CHARSET utf8mb4)));
Implementation: MySQL 8.0 functional indexes are internally hidden virtual generated columns + regular indexes.

3. Leftmost Prefix Rule

A composite index's B+Tree sorts entries by declared column order sequentially. The index is only usable when the query conditions start from the leftmost column and match continuously. This is the Leftmost Prefix Rule.

Given composite index:

INDEX idx_abc (a, b, c)

B+Tree sort order: First by a; within same a, by b; within same (a,b), by c.

idx_abc (a, b, c) leaf node data (illustration):

| a=1,b=1,c=1 | -> | a=1,b=1,c=3 | -> | a=1,b=2,c=1 | -> | a=1,b=2,c=5 | ->
| a=2,b=1,c=2 | -> | a=2,b=1,c=4 | -> | a=2,b=3,c=1 | -> | a=3,b=1,c=1 |

Observations:
  - Column a is globally sorted: 1, 1, 1, 1, 2, 2, 2, 3
  - Column b is sorted only within the same a value
  - Column c is sorted only within the same (a, b) values
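Python tuples sort lexicographically exactly like composite index keys, which makes the "sorted only within" behavior easy to verify on the illustration's data:

```python
# Composite index entries behave like sorted tuples (a, b, c)
entries = sorted([(1, 1, 1), (1, 1, 3), (1, 2, 1), (1, 2, 5),
                  (2, 1, 2), (2, 1, 4), (2, 3, 1), (3, 1, 1)])

# WHERE a = 1 -> matches form ONE contiguous range: a range scan works
pos_a1 = [i for i, e in enumerate(entries) if e[0] == 1]
print(pos_a1)            # [0, 1, 2, 3] -- contiguous

# WHERE b = 1 (leftmost column skipped) -> matches are scattered
pos_b1 = [i for i, e in enumerate(entries) if e[1] == 1]
print(pos_b1)            # [0, 1, 4, 5, 7] -- no single range exists
```

The scattered positions for b = 1 are precisely why skipping the leftmost column defeats the index: there is no contiguous interval of the B+Tree to scan.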

3.1 12 Query Examples

All examples assume INDEX idx_abc (a, b, c):

1 WHERE a = 1

Uses index. Matches leftmost column. key_len includes only column a.

EXPLAIN SELECT * FROM t WHERE a = 1;
-- type: ref | key: idx_abc | key_len: 4 | ref: const

2 WHERE a = 1 AND b = 2

Uses index. Matches columns a and b.

EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2;
-- type: ref | key: idx_abc | key_len: 8 | ref: const,const

3 WHERE a = 1 AND b = 2 AND c = 3

Uses index. All three columns matched.

EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3;
-- type: ref | key: idx_abc | key_len: 12 | ref: const,const,const

4 WHERE b = 2

Cannot use the index for navigation. The leftmost column a is skipped, and b is not globally sorted in the index. (MySQL 8.0.13+ has a Skip Scan optimization that can occasionally rescue this pattern when a has very few distinct values, but it rarely applies.)

EXPLAIN SELECT * FROM t WHERE b = 2;
-- type: ALL | key: NULL (full table scan)

5 WHERE b = 2 AND c = 3

Cannot use index. Missing leftmost column a.

EXPLAIN SELECT * FROM t WHERE b = 2 AND c = 3;
-- type: ALL | key: NULL

6 WHERE a = 1 AND c = 3

Partially uses index. Uses a for filtering. Since b is skipped, c cannot be used to navigate the B+Tree (c is not sorted across all a=1 rows). However, Index Condition Pushdown (ICP, available since MySQL 5.6) can filter c at the index level to reduce bookmark lookups.

EXPLAIN SELECT * FROM t WHERE a = 1 AND c = 3;
-- type: ref | key: idx_abc | key_len: 4 (only a used)
-- Extra: Using index condition (ICP filters c at index level)

7 WHERE a = 1 ORDER BY b

Uses index. After filtering a=1, b is already sorted within that range. No filesort needed.

EXPLAIN SELECT * FROM t WHERE a = 1 ORDER BY b;
-- type: ref | key: idx_abc | Extra: no filesort

8 WHERE a = 1 ORDER BY c

Partially uses index. a used for filtering, but c is not sorted within a=1 (because b varies), requiring filesort.

EXPLAIN SELECT * FROM t WHERE a = 1 ORDER BY c;
-- Extra: Using filesort

9 WHERE a = 1 AND b = 2 ORDER BY c

Uses index. After a=1 AND b=2, c is sorted within that subset.

EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2 ORDER BY c;
-- type: ref | key_len: 8 | Extra: no filesort

10 WHERE a > 1 AND b = 2

Partially uses index. a is used for range scan, but the range condition breaks the index for subsequent column b.

EXPLAIN SELECT * FROM t WHERE a > 1 AND b = 2;
-- type: range | key: idx_abc | key_len: 4 (only a)
-- Extra: Using index condition
Range breakpoint rule: In a composite index, after encountering a range condition (>, <, BETWEEN, LIKE 'xxx%'), all subsequent index columns cannot be used for precise lookups. Equality conditions (=, IN) do NOT break the chain.

11 WHERE a IN (1, 2) AND b = 2 AND c = 3

Uses full index. IN is treated as multiple equality conditions. It does NOT break subsequent columns.

EXPLAIN SELECT * FROM t WHERE a IN (1, 2) AND b = 2 AND c = 3;
-- type: range | key: idx_abc | key_len: 12 (all three columns)

12 WHERE a = 1 AND b > 5 AND c = 3

Partially uses index. a (equality) + b (range) used. b's range breaks c. ICP can filter c at index level.

EXPLAIN SELECT * FROM t WHERE a = 1 AND b > 5 AND c = 3;
-- type: range | key_len: 8 (a + b) | Extra: Using index condition

3.2 Leftmost Prefix Summary Table

| WHERE Clause | Index Columns Used | Reason |
|---|---|---|
| a = 1 | a (1 col) | Matches leftmost |
| a = 1 AND b = 2 | a, b (2 cols) | Consecutive from left |
| a = 1 AND b = 2 AND c = 3 | a, b, c (3 cols) | All matched |
| b = 2 | 0 | Skipped leftmost |
| c = 3 | 0 | Skipped leftmost |
| b = 2 AND c = 3 | 0 | Skipped leftmost |
| a = 1 AND c = 3 | a (1) + ICP | b skipped, c via ICP only |
| a > 1 AND b = 2 | a (range) | Range breaks subsequent |
| a = 1 AND b > 5 AND c = 3 | a, b (range) | b range breaks c |
| a IN (1,2) AND b = 2 AND c = 3 | a, b, c (3 cols) | IN does not break |

3.3 Column Order Design Principles

  1. Equality columns first (most important)
  2. Higher selectivity columns first (among equality columns)
  3. Range columns last (range breaks subsequent columns)
  4. ORDER BY / GROUP BY columns right after equality columns (avoid filesort)

4. 12 Index Failure Scenarios

Even with indexes created, these 12 common scenarios cause indexes to be unusable (or the optimizer to bypass them), degrading queries to full table scans. Each includes SQL and EXPLAIN output.

Root principle: Index failure boils down to two causes: (1) the query condition cannot match the B+Tree's sort structure; (2) the optimizer determines using the index is less efficient than a full scan. Understanding these two covers all scenarios.

1 Function on indexed column

-- ❌ Index fails: YEAR() function
SELECT * FROM orders WHERE YEAR(created_at) = 2026;
-- EXPLAIN: type=ALL, key=NULL

-- ✅ Rewrite as range
SELECT * FROM orders
WHERE created_at >= '2026-01-01' AND created_at < '2027-01-01';
-- EXPLAIN: type=range, key=idx_created

-- ✅ MySQL 8.0.13+: functional index
ALTER TABLE orders ADD INDEX idx_year ((YEAR(created_at)));

2 Implicit type conversion

-- phone is VARCHAR(20) with INDEX idx_phone (phone)

-- ❌ Index fails: integer input, MySQL casts phone to number
SELECT * FROM users WHERE phone = 13800138000;

-- ✅ Pass as string
SELECT * FROM users WHERE phone = '13800138000';
-- EXPLAIN: type=ref, key=idx_phone
MySQL conversion rule: When comparing string column with number, MySQL converts the string to number (not vice versa) -- equivalent to applying CAST on the indexed column.

3 LIKE with leading wildcard

-- ❌ Leading wildcard: no search starting point in B+Tree
SELECT * FROM users WHERE name LIKE '%son';
-- EXPLAIN: type=ALL

-- ✅ Trailing wildcard works
SELECT * FROM users WHERE name LIKE 'John%';
-- EXPLAIN: type=range, key=idx_name

4 OR mixing indexed and non-indexed columns

-- ❌ Index fails
SELECT * FROM users WHERE name = 'John' OR remark = 'VIP';
-- EXPLAIN: type=ALL (remark has no index)

-- ✅ Add index to both, enables Index Merge
ALTER TABLE users ADD INDEX idx_remark (remark);
-- Or rewrite as UNION ALL

5 NOT IN / NOT EXISTS / != / <>

-- ❌ Often full scan (depends on data distribution)
SELECT * FROM orders WHERE status != 'completed';

-- ✅ Rewrite as positive condition
SELECT * FROM orders WHERE status IN ('pending', 'processing', 'shipped');

6 Expression on indexed column

-- ❌ Index fails
SELECT * FROM orders WHERE id + 1 = 10;

-- ✅ Rearrange the expression
SELECT * FROM orders WHERE id = 9;
-- EXPLAIN: type=const, rows=1

7 Charset mismatch in JOIN

-- table_a.user_id: utf8mb4_general_ci
-- table_b.user_id: utf8_general_ci

-- ❌ Index fails: charset mismatch
SELECT a.*, b.name
FROM table_a a JOIN table_b b ON a.user_id = b.user_id;
-- EXPLAIN (table_b): type=ALL

-- ✅ Unify charset
ALTER TABLE table_b MODIFY user_id VARCHAR(50)
  CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
This is one of the most insidious production pitfalls. Tables created at different times often have different charsets. The mismatch only surfaces as performance degrades with growing data.

8 Low cardinality -- optimizer chooses full scan

-- gender column with index, but only 2 distinct values
SELECT * FROM users WHERE gender = 'M';
-- ~50% of 1M rows match -> full scan is faster than 500K bookmark lookups

9 ORDER BY direction mismatch

-- INDEX (a ASC, b ASC) can't satisfy ORDER BY a ASC, b DESC
-- ✅ MySQL 8.0+: create descending index
ALTER TABLE t ADD INDEX idx_a_bdesc (a ASC, b DESC);

10 SELECT * when covering index is available

-- ❌ SELECT * forces bookmark lookup
SELECT * FROM orders WHERE user_id = 100 AND status = 'pending';

-- ✅ Select only indexed columns -> covering index
SELECT user_id, status FROM orders WHERE user_id = 100 AND status = 'pending';
-- Extra: Using index

11 Range condition breaks subsequent index columns

-- INDEX (a, b, c): WHERE a = 1 AND b > 10 AND c = 5
-- Only a and b (range) are used; c cannot narrow the search

-- ✅ Reorder: INDEX (a, c, b) if c equality is more selective

12 Small table -- optimizer decides full scan is faster

-- config table with 20 rows
SELECT * FROM config WHERE key_name = 'max_retries';
-- EXPLAIN: type=ALL, rows=20
-- Correct optimizer decision. No fix needed.

4.1 Failure Scenario Quick Reference

| # | Scenario | Root Cause | Fix |
|---|---|---|---|
| 1 | Function: YEAR(col) | B+Tree sorted by raw value | Rewrite as range / functional index |
| 2 | Implicit cast: varchar = 123 | Equivalent to CAST function | Match types |
| 3 | LIKE '%xxx' | No prefix = no start point | FULLTEXT / ES |
| 4 | OR with mixed indexed/non-indexed | Cannot merge scans | Index both / UNION |
| 5 | != / NOT IN | Low selectivity negation | Rewrite as positive IN |
| 6 | Expression: id+1=10 | Equivalent to function | Rearrange |
| 7 | Charset mismatch | Implicit CONVERT | Unify charset/collation |
| 8 | Low cardinality | Lookups costlier than scan | Composite / covering index |
| 9 | Sort direction mismatch | Mixed ASC/DESC | Descending index (8.0+) |
| 10 | SELECT * | Cannot cover | Select only needed columns |
| 11 | Range breaks subsequent | Post-range cols unsorted | Equality first, range last |
| 12 | Small table | Full scan ≤ index lookup | No fix needed |

5. Covering Index Deep Dive

5.1 What Is a Covering Index?

When all columns needed by a query (SELECT, WHERE, ORDER BY, GROUP BY) are contained within a single index, InnoDB can return results directly from the index B+Tree without any bookmark lookup to the clustered index.

-- INDEX idx_cover (user_id, status, amount)

SELECT user_id, status, amount FROM orders
WHERE user_id = 100 AND status = 'pending';

-- EXPLAIN Extra: Using index    <-- KEY INDICATOR!

EXPLAIN Extra column meanings:

  • Using index -- Covering index, no bookmark lookup
  • Using index condition -- ICP, filtered at index level but still needs bookmark lookup
  • NULL (empty Extra) -- Used index for positioning, bookmark lookup required

5.2 Performance Benefits

| Metric | Non-covering | Covering | Benefit |
|---|---|---|---|
| I/O (1000-row result) | 3-4 index + 1000 random bookmark I/Os | 3-4 index + sequential leaf scan | ~99% fewer I/Os |
| Data pages accessed | Index pages + many scattered data pages | Index pages only | Higher Buffer Pool hit rate |
| Data volume read | Full rows (potentially KB each) | Index entries only (bytes each) | 90%+ less data read |

5.3 Index Condition Pushdown (ICP)

Introduced in MySQL 5.6. Without ICP, the storage engine uses only the index prefix columns to locate records, then passes full rows to the Server layer for further filtering. ICP allows the engine to filter using remaining index columns at the index level, reducing bookmark lookups.

-- INDEX idx_abc (a, b, c)
-- Query: WHERE a = 1 AND c = 5 (skipping b)

-- Without ICP: locate all a=1 rows (10000), bookmark lookup each, filter c=5 at server (100 remain)
-- With ICP: locate all a=1 rows, check c=5 AT INDEX LEVEL, only bookmark lookup 100 rows
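The bookmark-lookup saving can be quantified with a toy model. The counts are made-up illustrations matching the comment above, with index entries modeled as plain tuples:

```python
# Entries of idx_abc(a, b, c), modeled as tuples: 100 b-values x 100 c-values
index_entries = [(1, b, c) for b in range(100) for c in range(100)]

# Query: WHERE a = 1 AND c = 5 (b skipped)
a_matches = [e for e in index_entries if e[0] == 1]

# Without ICP: every a=1 entry costs one bookmark lookup; c filtered at server
lookups_without_icp = len(a_matches)

# With ICP: the engine checks c = 5 inside the index before any lookup
lookups_with_icp = sum(1 for e in a_matches if e[2] == 5)

print(lookups_without_icp, lookups_with_icp)   # 10000 vs 100
```

Same result set, 100x fewer random I/Os -- which is all ICP does: it moves a cheap in-index comparison ahead of an expensive clustered-index visit.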

5.4 How to Design Covering Indexes

  1. Identify all columns involved: SELECT + WHERE + ORDER BY + GROUP BY
  2. WHERE equality columns first
  3. ORDER BY / GROUP BY columns next
  4. Display-only SELECT columns last
  5. Assess width tradeoff: Too-wide covering indexes increase write overhead and storage

6. Index Merge

Normally MySQL uses only one index per query. But with certain OR/AND conditions, the optimizer can use multiple indexes simultaneously and merge the results.

6.1 Three Index Merge Algorithms

| Algorithm | Condition | Operation | EXPLAIN Extra |
|---|---|---|---|
| Intersection | AND with separate indexes | Intersect PK sets | Using intersect(idx1,idx2) |
| Union | OR with separate indexes | Union PK sets | Using union(idx1,idx2) |
| Sort-Union | OR with range conditions | Sort then union | Using sort_union(idx1,idx2) |
Index Merge Intersection usually means a missing composite index. If you see Using intersect, creating INDEX (a, b) is almost always more efficient than two separate index lookups + merge.
-- Control Index Merge behavior
SET optimizer_switch = 'index_merge_intersection=off';

-- Via hints
SELECT /*+ NO_INDEX_MERGE(t) */ * FROM t WHERE a = 1 AND b = 2;
SELECT /*+ INDEX_MERGE(t idx_a, idx_b) */ * FROM t WHERE a = 1 OR b = 2;

7. Cardinality & Selectivity

7.1 What Is Cardinality?

Cardinality = estimated number of distinct values in an index column. Selectivity = Cardinality / Total Rows. Higher selectivity (closer to 1) = more effective index.

SHOW INDEX FROM orders;
-- user_id cardinality: 523467 / 9.8M rows = 0.053 (moderate)
-- status cardinality: 5 / 9.8M rows = 0.0000005 (very low)

7.2 How InnoDB Estimates Cardinality

InnoDB uses random sampling: it picks innodb_stats_persistent_sample_pages (default 20) random leaf pages, counts distinct values, and extrapolates.

SHOW VARIABLES LIKE 'innodb_stats%';
-- innodb_stats_persistent_sample_pages = 20
-- innodb_stats_auto_recalc = ON  (recalc when >10% DML changes)

7.3 ANALYZE TABLE

-- Trigger statistics recalculation
ANALYZE TABLE orders;
-- Online operation in InnoDB, does not block reads/writes

-- Increase precision for skewed distributions
ALTER TABLE orders STATS_SAMPLE_PAGES = 100;
ANALYZE TABLE orders;

7.4 Histogram Statistics -- MySQL 8.0+

-- Create histogram (no index needed)
ANALYZE TABLE orders UPDATE HISTOGRAM ON status WITH 16 BUCKETS;
ANALYZE TABLE orders UPDATE HISTOGRAM ON total_amount WITH 64 BUCKETS;

-- View histogram info
SELECT SCHEMA_NAME, TABLE_NAME, COLUMN_NAME,
       JSON_EXTRACT(histogram, '$.histogram-type') AS type,
       JSON_EXTRACT(histogram, '$.number-of-buckets-specified') AS buckets
FROM INFORMATION_SCHEMA.COLUMN_STATISTICS;

Histogram vs Index:

| Aspect | Histogram | Index |
|---|---|---|
| Purpose | Help optimizer estimate row counts | Speed up data lookup |
| Write overhead | None (updated only on ANALYZE) | Every INSERT/UPDATE/DELETE |
| Storage | Minimal (in data dictionary) | Can be very large |

8. Index Design Methodology

8.1 Systematic Design Process

  1. Collect query patterns from slow query log and performance_schema
  2. For each query, identify: equality columns, range columns, JOIN columns, ORDER BY/GROUP BY columns, SELECT columns
  3. Apply the Three-Star System
  4. Merge indexes: if multiple queries can share one composite index via leftmost prefix
  5. Evaluate write overhead
  6. Validate post-deployment with EXPLAIN and monitoring

8.2 Three-Star Index System

From Lahdenmaki & Leach's "Relational Database Index Design and the Optimizers":

| Star | Condition | Meaning |
|---|---|---|
| First ⭐ | Index groups relevant WHERE rows together | Minimize scan range. Equality columns first. |
| Second ⭐⭐ | Index row order matches ORDER BY | Avoid filesort. ORDER BY follows equality cols. |
| Third ⭐⭐⭐ | Index contains all query columns | Covering index. No bookmark lookup. |

-- Query:
SELECT user_id, order_date, total_amount
FROM orders WHERE user_id = 100 AND status = 'completed'
ORDER BY order_date DESC LIMIT 20;

-- ★☆☆ One-star: INDEX (user_id)
-- ★★☆ Two-star: INDEX (user_id, status, order_date DESC)
-- ★★★ Three-star: INDEX (user_id, status, order_date DESC, total_amount)
Three stars isn't always optimal. Pursuing the third star (covering index) means wider indexes, more write overhead, and more storage. For write-heavy tables, two stars may be the better tradeoff.

8.3 Index Overhead Analysis

Write Amplification

  • INSERT: one clustered index write + one write per secondary index
  • DELETE: each secondary index needs a delete marker
  • UPDATE of indexed column: effectively DELETE + INSERT per affected index
-- Table with 5 secondary indexes:
-- One INSERT = 1 clustered + 5 secondary = 6 B+Tree insert operations
-- Plus redo log, undo log, doublewrite buffer...
-- Actual amplification can reach 10-20x

8.4 When NOT to Add an Index

  • Very small tables (<1000 rows): full scan is one I/O
  • Very low selectivity columns (boolean, gender): unless part of composite
  • Frequently bulk-updated columns: maintenance cost may exceed query benefit
  • Write-only tables (audit logs): indexes are pure overhead
  • Already covered by existing index: INDEX (a,b,c) already covers INDEX (a) and INDEX (a,b)

9. Practical Examples

9.1 E-commerce Orders Table (100M rows)

CREATE TABLE orders (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  user_id BIGINT UNSIGNED NOT NULL,
  shop_id INT UNSIGNED NOT NULL,
  order_no VARCHAR(32) NOT NULL,
  status TINYINT NOT NULL DEFAULT 0,
  total_amount DECIMAL(12,2) NOT NULL,
  pay_time DATETIME DEFAULT NULL,
  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (id),
  UNIQUE KEY uk_order_no (order_no),
  INDEX idx_user_status_created (user_id, status, created_at),
  INDEX idx_shop_created (shop_id, created_at),
  INDEX idx_pay_time (pay_time)
) ENGINE=InnoDB;

Deep Pagination Optimization

-- ❌ Deep pagination disaster
SELECT * FROM orders WHERE user_id = 100
ORDER BY created_at DESC LIMIT 20 OFFSET 100000;

-- ✅ Cursor-based pagination (recommended)
SELECT * FROM orders
WHERE user_id = 100
  AND (created_at, id) < ('2026-01-15 10:30:00', 99850)
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- ✅ Deferred join
SELECT o.* FROM orders o
INNER JOIN (
  SELECT id FROM orders
  WHERE user_id = 100
  ORDER BY created_at DESC
  LIMIT 20 OFFSET 100000
) t ON o.id = t.id;

9.2 User Activity Log (Time-Range Queries)

CREATE TABLE activity_log (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  user_id BIGINT UNSIGNED NOT NULL,
  action VARCHAR(50) NOT NULL,
  target_type VARCHAR(30),
  target_id BIGINT UNSIGNED,
  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id, created_at),  -- every unique key must include the partitioning column
  INDEX idx_user_created (user_id, created_at)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(created_at)) (
  PARTITION p202601 VALUES LESS THAN (TO_DAYS('2026-02-01')),
  PARTITION p202602 VALUES LESS THAN (TO_DAYS('2026-03-01')),
  PARTITION p_future VALUES LESS THAN MAXVALUE
);

9.3 Social Feed (Complex JOIN + Pagination)

-- Feed query (pull model)
SELECT p.id, p.user_id, p.content, p.like_count, p.created_at
FROM posts p
INNER JOIN follows f ON f.following_id = p.user_id
WHERE f.follower_id = 12345
  AND p.created_at > NOW() - INTERVAL 7 DAY
ORDER BY p.created_at DESC
LIMIT 20;

-- Key: follows PRIMARY KEY (follower_id, following_id) is already optimal
-- posts INDEX (user_id, created_at DESC) perfectly matches

10. Index Monitoring

10.1 Finding Unused Indexes

SELECT * FROM sys.schema_unused_indexes
WHERE object_schema = 'your_db';

-- Note: based on performance_schema, needs sufficient uptime to be accurate
-- UNIQUE KEYs serve data integrity even if never queried

10.2 Finding Redundant Indexes

SELECT * FROM sys.schema_redundant_indexes
WHERE table_schema = 'your_db';

-- Example: idx_user_id (user_id) is redundant with idx_user_status (user_id, status)
-- Also: pt-duplicate-key-checker from Percona Toolkit

10.3 Index Usage Statistics

SELECT OBJECT_SCHEMA, OBJECT_NAME, INDEX_NAME,
       COUNT_READ AS reads, COUNT_WRITE AS writes
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE OBJECT_SCHEMA = 'your_db' AND INDEX_NAME IS NOT NULL
ORDER BY COUNT_READ DESC;

10.4 Index Fragmentation

-- Check fragmentation
SELECT TABLE_NAME,
  ROUND(DATA_FREE / (DATA_LENGTH + INDEX_LENGTH + DATA_FREE) * 100, 1) AS frag_pct
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'your_db'
ORDER BY DATA_FREE DESC;

-- Rebuild if fragmentation > 20%
ALTER TABLE orders ENGINE=InnoDB; -- online rebuild

11. Index Size Calculator

Estimate storage growth before adding a new index. Enter the table's row count and the index column definitions to calculate the estimated index size.

Supported types: INT, BIGINT, TINYINT, SMALLINT, MEDIUMINT, DATETIME, TIMESTAMP, DATE, FLOAT, DOUBLE, DECIMAL, CHAR(n), VARCHAR(n), BINARY(n), VARBINARY(n).
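The calculator's underlying arithmetic can be sketched as follows. This is a rough model, not exact InnoDB accounting: each secondary-index leaf entry stores the key columns plus the primary-key value, and the per-record overhead constant and fill factor below are assumptions.

```python
# Rough secondary-index size estimate. Constants are approximations:
# InnoDB leaves ~1/16 of each page free by default (MERGE_THRESHOLD aside),
# and per-record overhead (record header, page directory share) is assumed.
TYPE_BYTES = {
    "TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8,
    "DATE": 3, "TIMESTAMP": 4, "DATETIME": 5, "FLOAT": 4, "DOUBLE": 8,
}
RECORD_OVERHEAD = 13   # assumed per-entry overhead in bytes
PAGE_FILL = 15 / 16    # default fill factor for sequential inserts

def estimate_index_size(row_count, key_columns, pk_bytes=8):
    """Estimate secondary index size in bytes.

    key_columns: list of fixed-length type names from TYPE_BYTES.
    pk_bytes: size of the primary key stored in each leaf entry (BIGINT = 8).
    """
    key_bytes = sum(TYPE_BYTES[t] for t in key_columns)
    entry = key_bytes + pk_bytes + RECORD_OVERHEAD
    return int(row_count * entry / PAGE_FILL)

# e.g. 10M rows, index on (user_id BIGINT, created_at DATETIME), BIGINT PK
size = estimate_index_size(10_000_000, ["BIGINT", "DATETIME"])
print(f"{size / 2**20:.0f} MiB")
```

Variable-length types (VARCHAR, VARBINARY) would need average actual lengths rather than declared maximums, which is why the interactive calculator asks for column information rather than just a type list.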

12. Frequently Asked Questions

Q1: What is the maximum number of indexes per table?

InnoDB supports up to 64 secondary indexes (plus 1 clustered index), with a maximum of 16 columns per index. In practice, more than 5-7 indexes per table warrants a review for redundancy. More indexes = slower writes.

Q2: How do I decide the column order in a composite index?

Core principle: Equality columns -> Sort columns -> Range columns. Among equality columns, place higher-selectivity ones first. Range columns always go last because range conditions break subsequent column usage. For covering indexes, append display-only columns at the end.
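Ranking equality columns by selectivity is simple arithmetic: selectivity = COUNT(DISTINCT col) / COUNT(*). A toy sketch with made-up sample rows (the data and column names are illustrative only):

```python
# Selectivity = distinct values / total rows; among equality columns,
# the higher-selectivity column usually goes first in the composite index.
def selectivity(values):
    return len(set(values)) / len(values)

rows = [
    {"status": s, "user_id": u}
    for s, u in [(1, 101), (1, 102), (2, 103), (1, 104), (2, 101)]
]

cols = ["status", "user_id"]
ranked = sorted(cols, key=lambda c: selectivity([r[c] for r in rows]),
                reverse=True)
print(ranked)   # user_id (4/5 distinct) ranks before status (2/5 distinct)
```

In SQL the same measurement is `SELECT COUNT(DISTINCT col) / COUNT(*) FROM t`; values near 1 indicate a highly selective column.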

Q3: Should I use auto-increment ID or UUID for primary keys?

Auto-increment BIGINT for most cases. Sequential inserts avoid page splits, 8-byte fixed length keeps secondary indexes small, and comparison is fastest. For distributed systems needing globally unique IDs, use UUID v7 / ULID / Snowflake -- time-ordered, balancing uniqueness and sequentiality. Avoid random UUID v4 as primary keys.

Q4: How is EXPLAIN key_len calculated?

key_len shows the number of bytes actually used from the index. Rules: INT = 4, BIGINT = 8, DATETIME = 5, TIMESTAMP = 4, DATE = 3, CHAR(n) = n × charset max bytes (4 for utf8mb4), VARCHAR(n) = n × charset max bytes + 2 (length prefix). Add 1 byte for each NULLable column. From key_len you can work out how many columns of a composite index were actually used.
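The rules above are mechanical enough to recompute by hand or in a few lines. A sketch covering the fixed-length types plus CHAR/VARCHAR under utf8mb4 (the helper and its input format are our own):

```python
# Recompute EXPLAIN key_len from the stated rules (utf8mb4 = 4 bytes/char).
FIXED = {"INT": 4, "BIGINT": 8, "DATETIME": 5, "TIMESTAMP": 4, "DATE": 3}

def key_len(columns):
    """columns: list of (type_name, char_length_or_None, nullable) tuples."""
    total = 0
    for typ, n, nullable in columns:
        if typ in FIXED:
            total += FIXED[typ]
        elif typ == "CHAR":
            total += n * 4
        elif typ == "VARCHAR":
            total += n * 4 + 2          # 2-byte stored length prefix
        else:
            raise ValueError(f"unsupported type: {typ}")
        if nullable:
            total += 1                  # NULL flag byte
    return total

# All three columns of (user_id BIGINT, status INT, created_at DATETIME) used:
print(key_len([("BIGINT", None, False), ("INT", None, False),
               ("DATETIME", None, False)]))   # 17
# A NULLable VARCHAR(50) utf8mb4 column: 50*4 + 2 + 1 = 203
print(key_len([("VARCHAR", 50, True)]))       # 203
```

If EXPLAIN reported key_len = 12 against that three-column index, only BIGINT + INT (8 + 4) were used and the range/sort on created_at fell outside the index.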

Q5: Does adding an index lock the table?

MySQL 8.0 Online DDL supports ALGORITHM=INPLACE, LOCK=NONE for most index operations -- no table lock. Always explicitly specify LOCK=NONE; if unsupported, MySQL will error rather than silently locking.

ALTER TABLE orders ADD INDEX idx_new (col1, col2), ALGORITHM=INPLACE, LOCK=NONE;

Q6: Why does EXPLAIN show possible_keys but key is NULL?

The optimizer's cost estimate concluded that a full table scan is cheaper than using the index. Common causes: (1) low index selectivity, so most rows would match anyway; (2) stale cardinality statistics -- run ANALYZE TABLE to refresh them. FORCE INDEX(idx) can confirm the diagnosis, but use it for debugging only, never in production code.

Q7: How do I handle index fragmentation?

After heavy random INSERT/DELETE, B+Tree page utilization drops. Check: DATA_FREE / (DATA_LENGTH + INDEX_LENGTH + DATA_FREE) from INFORMATION_SCHEMA.TABLES. If >20-30%, rebuild: ALTER TABLE t ENGINE=InnoDB (online) or OPTIMIZE TABLE t. Run during off-peak hours due to I/O impact.
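The fragmentation check is the same arithmetic as the Section 10.4 query; a small sketch applying it to sample statistics (the byte figures are made up for illustration):

```python
# Fragmentation percentage, mirroring the INFORMATION_SCHEMA.TABLES query:
# DATA_FREE / (DATA_LENGTH + INDEX_LENGTH + DATA_FREE) * 100
def frag_pct(data_len, index_len, data_free):
    return round(data_free / (data_len + index_len + data_free) * 100, 1)

def needs_rebuild(data_len, index_len, data_free, threshold=20.0):
    """Flag tables above the rebuild threshold (20-30% is a common cutoff)."""
    return frag_pct(data_len, index_len, data_free) > threshold

# e.g. 6 GiB data, 2 GiB indexes, 3 GiB reclaimable free space
gib = 2**30
print(frag_pct(6 * gib, 2 * gib, 3 * gib))   # 27.3 -> rebuild candidate
```

Note that DATA_FREE also counts pre-allocated tablespace extents, so treat the percentage as a trigger for investigation rather than an exact waste figure.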

Q8: How do I determine if an index is worth keeping?

Combine: (1) sys.schema_unused_indexes -- is it queried? (2) performance_schema -- read vs write ratio; (3) sys.schema_redundant_indexes -- is it covered by another? Before dropping, set it INVISIBLE for a week (MySQL 8.0+).

Q9: What are the differences between InnoDB and MyISAM indexes?

InnoDB uses a clustered index (data and PK index are one structure); secondary index leaves store PK values. MyISAM indexes are all non-clustered; leaves store physical row pointers. MyISAM secondary lookups don't need a second B+Tree traversal, but MyISAM lacks transactions, row locking, and foreign keys. Modern applications should use InnoDB exclusively.

Q10: What indexing features did MySQL 8.0 introduce?

Key features: (1) Descending indexes -- true DESC storage for mixed-direction ORDER BY; (2) Invisible indexes -- safely test dropping indexes; (3) Functional indexes -- index expressions, solving function-on-column failures; (4) Histogram statistics -- better cost estimation without indexes; (5) Hash Join -- dramatically faster indexless JOINs; (6) Index Skip Scan -- skip the leftmost column in certain scenarios.