Chapter 5

MySQL Architecture Overview

MySQL Execution Architecture and Query Path Deep Dive

Understanding MySQL's internal architecture is essential for optimization and troubleshooting. This guide covers the complete query execution flow from network connection to result delivery, including the Server layer, Query Optimizer, and Storage Engine interface.

1. MySQL Architecture Overview

1.1 Layered Architecture

MySQL uses a three-layer architecture:


┌──────────────────────────────────────────────────────────────┐
│                   CLIENT APPLICATIONS                         │
│           (mysql CLI, JDBC, PDO, Node.js, etc.)              │
└──────────────────────────────────────────────────────────────┘
                          ↓ SQL Query
┌──────────────────────────────────────────────────────────────┐
│                    SERVER LAYER (mysqld)                      │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │ Connection Management & Authentication                 │  │
│  │ - TCP/Socket connection handling                       │  │
│  │ - User authentication & privilege checks               │  │
│  │ - Connection pooling & session management              │  │
│  └─────────────────────────────────────────────────────────┘  │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │ SQL Parser                                              │  │
│  │ - Lexical analysis (tokenization)                      │  │
│  │ - Syntax analysis (parse tree construction)            │  │
│  │ - Query validation                                     │  │
│  └─────────────────────────────────────────────────────────┘  │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │ Query Optimizer                                         │  │
│  │ - Preprocessing (schema validation, simplification)    │  │
│  │ - Cost analysis (estimating execution plans)           │  │
│  │ - Plan selection (choosing optimal execution strategy) │  │
│  └─────────────────────────────────────────────────────────┘  │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │ Query Executor                                          │  │
│  │ - Plan execution (following optimization strategy)     │  │
│  │ - Cache management & updates                           │  │
│  │ - Transaction management                               │  │
│  └─────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────┘
                          ↓ Query API
┌──────────────────────────────────────────────────────────────┐
│              STORAGE ENGINE LAYER                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐        │
│  │   InnoDB     │  │     MyISAM    │  │    Memory    │        │
│  │              │  │              │  │              │        │
│  │ Transaction  │  │ Fast reads    │  │ Ultra-fast   │        │
│  │ Support      │  │ Good for logs │  │ Temporary    │        │
│  │ ACID         │  │              │  │              │        │
│  └──────────────┘  └──────────────┘  └──────────────┘        │
└──────────────────────────────────────────────────────────────┘
                          ↓
                    ┌──────────────┐
                    │   Disk I/O   │
                    │   Buffer     │
                    │   Pool       │
                    └──────────────┘

2. Connection Management

2.1 Connection Lifecycle

Each client connection goes through several stages:


1. TCP CONNECTION ESTABLISHMENT
   ↓
2. TCP/UNIX SOCKET HANDSHAKE
   └─ Client: Sends protocol version
   └─ Server: Sends authentication challenge
   ↓
3. AUTHENTICATION
   └─ Client: Sends username, password, database
   └─ Server: Validates against mysql.user table
   └─ Result: Authentication success/failure
   ↓
4. INITIALIZATION
   └─ Load user variables (character_set, sql_mode, etc.)
   └─ Check privileges for default database
   └─ Start new session
   ↓
5. COMMAND PHASE
   ↓
6. TERMINATION
   └─ QUIT command or connection timeout
   └─ Clean up session resources

Connection Timeout Handling:
- interactive_timeout (default 28800s = 8 hours)
- wait_timeout (default 28800s = 8 hours)
- max_connections (default 100, max 100000+)
- max_allowed_packet (default 4MB, limit query size)

      **Monitoring Connection Status:**

-- Show active connections
SHOW PROCESSLIST;

-- Extended process information
SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST
WHERE COMMAND != 'Sleep'
ORDER BY TIME DESC;

-- Connection statistics
SHOW STATUS LIKE 'Threads%';
-- Threads_cached: Number of cached threads
-- Threads_connected: Number of active connections
-- Threads_created: Total threads created since startup
-- Threads_running: Currently executing queries

-- Connection pool configuration
SHOW VARIABLES LIKE 'max_connections';
SHOW VARIABLES LIKE 'max_allowed_packet';
SHOW VARIABLES LIKE 'interactive_timeout';
SHOW VARIABLES LIKE 'wait_timeout';

-- Kill idle or long-running connections
KILL CONNECTION ;
KILL QUERY ;  -- Kill query, not connection

3. SQL Parser

3.1 Parsing Process

The parser converts SQL text into an internal representation.


INPUT: "SELECT * FROM users WHERE age > 18 ORDER BY created_at DESC LIMIT 10"
   ↓ Lexical Analysis (Tokenization)
   └─ Splits into tokens: [SELECT, *, FROM, users, WHERE, age, >, 18, ...]
   ↓ Syntax Analysis (Parsing)
   └─ Validates grammar rules
   └─ Builds Abstract Syntax Tree (AST)
   ├─ SELECT clause (column list, DISTINCT, etc.)
   ├─ FROM clause (tables, joins)
   ├─ WHERE clause (predicates, conditions)
   ├─ GROUP BY clause (grouping)
   ├─ HAVING clause (aggregation filters)
   ├─ ORDER BY clause (sorting)
   └─ LIMIT clause (pagination)
   ↓ Semantic Validation
   └─ Verify table/column existence
   └─ Check user privileges
   └─ Validate function calls
   └─ Type compatibility checks
   ↓ OUTPUT: Parse Tree Ready for Optimization

Common Parser Errors:
- Syntax Error: SELECT * FORM users (typo)
- Unknown Column: SELECT unknown_col FROM users
- Unknown Table: SELECT * FROM nonexistent
- Privilege Denied: Error 1143 (User doesn't have SELECT privilege)

4. Query Optimizer

4.1 Optimization Process

The optimizer transforms the parse tree into an execution plan.


PARSE TREE
   ↓
PREPROCESSING
├─ Schema validation
├─ Remove constant folding (1+1 becomes 2)
├─ Remove unused clauses
├─ Simplify expressions
└─ Resolve table/column references
   ↓
LOGICAL OPTIMIZATION
├─ Predicate pushdown (push WHERE down into JOINs)
├─ Subquery transformation
│  ├─ Convert IN subquery to JOIN if possible
│  ├─ Convert EXISTS to join
│  └─ Flatten derived tables
├─ Remove redundant conditions
├─ Constant expression elimination
└─ Simplification of OR conditions
   ↓
COST ANALYSIS
├─ Table size estimation
├─ Column selectivity analysis
├─ Index availability check
├─ Join order evaluation (for n tables, ~n! combinations)
└─ Storage engine cost calculation
   ↓
PLAN SELECTION
├─ Choose best join order
├─ Select best index for each table
├─ Choose execution strategy:
│  ├─ Full table scan
│  ├─ Index range scan
│  ├─ Index unique lookup
│  ├─ Nested loop join
│  ├─ Hash join (MySQL 8.0.18+)
│  └─ Index merge
└─ Apply optimization hints
   ↓
EXECUTION PLAN
   ├─ Operations in order
   ├─ Access methods per table
   └─ Cost estimation

4.2 Viewing the Optimizer's Plan


-- Basic EXPLAIN
EXPLAIN SELECT u.id, u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.id = 1
GROUP BY u.id, u.name;

-- Extended EXPLAIN (MySQL 5.6+)
EXPLAIN FORMAT=JSON SELECT * FROM users WHERE id = 1;
EXPLAIN FORMAT=TREE SELECT * FROM users WHERE id = 1;  -- MySQL 8.0+

-- Analyze actual execution (MySQL 5.7+)
EXPLAIN ANALYZE SELECT * FROM users WHERE id = 1;
-- Shows actual rows examined vs estimated rows

-- Optimizer hints (MySQL 5.7.7+)
SELECT /*+ BKA(t1) */ * FROM t1 INNER JOIN t2 WHERE ...;
SELECT /*+ NO_ICP(t1) */ * FROM t1 WHERE ...;
SELECT /*+ MAX_EXECUTION_TIME(1000) */ * FROM users;

5. Query Execution

5.1 Execution Strategy

The execution engine follows the chosen plan to produce results.


EXECUTION PLAN STAGES:

1. INITIALIZATION
   ├─ Allocate execution context
   ├─ Lock required tables (for writes)
   ├─ Open table handles
   └─ Prepare memory for results

2. TABLE ACCESS (per table in query)
   ├─ Determine access method
   │  ├─ Full table scan (seq scan)
   │  ├─ Index range scan
   │  ├─ Index lookup
   │  └─ Index intersect/union
   ├─ Apply WHERE conditions
   ├─ Apply JOIN predicates
   └─ Fetch matching rows

3. ROW OPERATIONS
   ├─ Apply SELECT expressions
   ├─ Evaluate column functions
   ├─ Type conversions
   └─ Generate result columns

4. RESULT SET BUILDING
   ├─ GROUP BY aggregation (if present)
   ├─ DISTINCT filtering (if present)
   ├─ ORDER BY sorting (if present)
   ├─ LIMIT pagination (if present)
   └─ Build result set buffer

5. TRANSMISSION
   ├─ Serialize results to wire protocol
   ├─ Send to client
   └─ Free memory

6. CLEANUP
   ├─ Unlock tables
   ├─ Free resources
   └─ Log query (if logging enabled)

6. Storage Engine Interface

6.1 Storage Engine Comparison


InnoDB (Default, MySQL 5.5+)
├─ Transactions: Yes (ACID compliant)
├─ Locking: Row-level
├─ Crash Recovery: Yes (via redo logs)
├─ Memory Usage: Higher (buffer pool)
├─ I/O Pattern: Random I/O friendly
├─ Best For: Production, transactional apps
└─ Trade-off: Slightly slower for reads than MyISAM

MyISAM (Legacy, pre-MySQL 5.5)
├─ Transactions: No
├─ Locking: Table-level
├─ Crash Recovery: Limited
├─ Memory Usage: Lower
├─ I/O Pattern: Sequential scan friendly
├─ Best For: Read-heavy, log files, data warehouse
└─ Trade-off: Not suitable for concurrent writes

Memory/HEAP
├─ Transactions: No
├─ Locking: Table-level
├─ Durability: No (lost on shutdown)
├─ Speed: Extremely fast
├─ Best For: Temporary tables, caching
└─ Trade-off: Limited to available RAM

-- Check storage engine for tables
SHOW TABLE STATUS FROM mydb;
SELECT TABLE_NAME, ENGINE FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'mydb';

-- Convert table engine
ALTER TABLE users ENGINE = InnoDB;
ALTER TABLE logs ENGINE = MyISAM;

7. Performance Bottlenecks

7.1 Identifying Bottlenecks


COMMON BOTTLENECKS AND SOLUTIONS:

1. SLOW PARSING
   Solutions:
   ├─ Simplify query structure
   ├─ Use JOIN instead of subqueries
   ├─ Use prepared statements (parse once, execute many)
   └─ Avoid dynamic SQL with too many values

2. BAD EXECUTION PLAN
   Solutions:
   ├─ ANALYZE TABLE to update statistics
   ├─ Create appropriate indexes
   ├─ Use EXPLAIN ANALYZE to verify plan
   ├─ Rewrite query to be optimizer-friendly
   └─ Use optimizer hints if optimizer is wrong

3. LOCK CONTENTION
   Solutions:
   ├─ Shorten transaction duration
   ├─ Reduce lock wait timeout for fast failure
   ├─ Batch updates together
   ├─ Use lower isolation level if appropriate
   └─ Separate hot data to different tables

4. MEMORY PRESSURE
   Solutions:
   ├─ Increase buffer pool size (50-80% of RAM)
   ├─ Optimize query (add indexes)
   ├─ Limit concurrent connections
   └─ Use appropriate LIMIT clause

5. DISK I/O SATURATION
   Solutions:
   ├─ Use SSD instead of HDD
   ├─ Optimize indexes (reduce seeks)
   ├─ Increase buffer pool (reduce disk reads)
   └─ Enable innodb_flush_log_at_trx_commit wisely

6. CPU SATURATION
   Solutions:
   ├─ Move computation out of SQL
   ├─ Use expressions in index (MySQL 5.7+)
   ├─ Use COLLATE BINARY for faster comparisons
   ├─ Batch process if possible
   └─ Scale horizontally (replication, sharding)

8. Query Optimization Workflow

Step-by-step approach to optimize slow queries:


1. IDENTIFY SLOW QUERY
   └─ Check slow_query_log
   └─ Use SHOW PROCESSLIST
   └─ Monitor application logs

2. COLLECT INFORMATION
   └─ EXPLAIN query
   └─ EXPLAIN ANALYZE (MySQL 5.7+)
   └─ Check table sizes
   └─ Review indexes
   └─ Check server load

3. ANALYZE EXECUTION PLAN
   ├─ Full table scan? (type=ALL)
   ├─ Wrong index used?
   ├─ Bad join order?
   ├─ Sorting on disk? (Extra field)
   ├─ Rows examined >> rows returned?
   └─ Cost estimates realistic?

4. IMPLEMENT FIX
   ├─ Create index (most common)
   ├─ ANALYZE TABLE (update statistics)
   ├─ Rewrite query (join order, subquery elimination)
   ├─ Add optimizer hints
   ├─ Partition table (if very large)
   └─ Denormalize (as last resort)

5. VERIFY FIX
   ├─ Run EXPLAIN again
   ├─ Check query time improvement
   ├─ Monitor in production
   └─ Update monitoring thresholds

9. MySQL Architecture in 5.7 vs 8.0

Key architectural improvements in MySQL 8.0:


MySQL 5.7 → MySQL 8.0 Changes:

1. DATA DICTIONARY
   5.7: File-based (.frm, .ibd files)
   8.0: Transactional data dictionary in InnoDB
   ├─ Atomic DDL operations
   └─ Better crash consistency

2. QUERY OPTIMIZER
   5.7: Legacy join order optimization
   8.0: Hypergraph optimizer (more join combinations)
   ├─ Better for complex multi-table joins
   └─ Window function optimization

3. EXECUTION ENGINE
   5.7: Nested loop join only
   8.0: Hash join support (MySQL 8.0.18+)

4. INDEXING
   5.7: Normal indexes only
   8.0: Functional indexes (expressions in index)

5. STATISTICS
   5.7: Single-table statistics
   8.0: Histogram statistics
   ├─ Better handling of skewed data
   └─ ANALYZE TABLE ... HISTOGRAM

Features to leverage in 8.0:
- Functional indexes for complex predicates
- Window functions for ranking/analytics
- Common Table Expressions (CTEs/WITH clause)
- JSON enhancements
- Hypergraph optimizer hints

Conclusion

Understanding MySQL's architecture enables better optimization:

Rate this chapter
4.9  / 5  (77 ratings)

💬 Comments