Chapter 9
Source Code Reading Guide
MySQL Source Code Reading Guide
Understanding MySQL source code enables deep optimization, custom extensions, and expert troubleshooting. This guide provides a structured path through the 2+ million lines of code.
1. Source Code Directory Structure
mysql-server/
├── sql/ # Core Server Engine (1000+ files)
│ ├── sql_lex.cc # Lexer (tokenizer)
│ ├── sql_yacc.yy # Parser grammar
│ ├── sql_select.cc # SELECT optimization
│ ├── sql_executor.cc # Query execution
│ ├── handler.cc # Storage engine interface (critical!)
│ ├── table.cc # Table representation
│ ├── item.cc # Expression/column objects
│ ├── join.cc # JOIN logic
│ ├── lock.cc # Lock management
│ ├── transaction.cc # Transaction control
│ ├── filesort.cc # Sorting (ORDER BY, GROUP BY)
│ └── ... (aggregation, subqueries, triggers, etc.)
│
├── storage/ # Storage Engines
│ ├── innobase/ # InnoDB (~400K lines)
│ │ ├── btr/ # B-tree index implementation
│ │ ├── buf/ # Buffer pool management
│ │ ├── trx/ # Transaction engine
│ │ ├── lock/ # Row/table locking
│ │ ├── dict/ # Data dictionary
│ │ ├── fts/ # Full-text search
│ │ ├── log/ # Redo/undo logs
│ │ └── row/ # Row operations
│ ├── myisam/ # MyISAM engine
│ ├── memory/ # In-memory engine
│ └── csv/, ndb/, etc/ # Other engines
│
├── include/ # Public headers
├── client/ # Client tools (mysql, mysqldump)
├── mysys/ # System libraries (file I/O, threads)
├── strings/ # String utilities
└── ...
KEY FILES TO STUDY:
- handler.h/cc: Storage engine interface (defines what engines must implement)
- table.h/cc: Represents a table in memory
- item.h/cc: Represents expressions, columns, functions
- sql_select.cc: Query optimization (2000+ lines, core logic)
- sql_executor.cc: Execution engine (3000+ lines)
- join.h/cc: JOIN execution (nested loop, hash join)
2. Understanding Code Organization
2.1 Query Execution Pipeline
Input: SELECT * FROM users WHERE id = 1
PHASE 1: PARSING (sql/)
├─ sql_lex.cc: Tokenize input
├─ sql_yacc.yy: Apply grammar rules
├─ Build parse tree (SELECT_LEX nodes)
└─ Result: AST (Abstract Syntax Tree)
PHASE 2: OPTIMIZATION (sql/optimizer/)
├─ Analyze table statistics
├─ Generate possible execution plans
├─ Calculate cost of each plan
├─ Select best plan
└─ Result: Execution plan
PHASE 3: COMPILATION (sql/)
├─ sql_select.cc: Compile FROM/WHERE/JOIN
├─ item.cc: Compile expressions
└─ Result: Compiled query structure
PHASE 4: EXECUTION (sql/)
├─ sql_executor.cc: Execute plan step by step
├─ handler.cc: Call storage engine for data access
├─ Aggregate/Sort/Limit
└─ Result: Result set sent to client
Code follows these classes/functions:
Query → LEX → SELECT_LEX → Item → handler (storage engine)
2.2 Critical Classes to Understand
// sql/sql_lex.h
class LEX {
// Represents a single SQL statement
SELECT_LEX* select_lex; // Main select
TABLE_LIST* table_list; // Tables referenced
};
// sql/sql_select.h
class SELECT_LEX {
// Represents one SELECT (main or subquery)
Item* select_list[64]; // Selected columns
WHERE* where; // WHERE condition
ORDER* order; // ORDER BY
Item* group_list; // GROUP BY
};
// sql/table.h
class TABLE {
// Runtime table object
char* table_name; // Table name
Field* field[MAX_FIELDS];// Column definitions
KEY* key_info; // Index definitions
handler* file; // Storage engine instance
};
// sql/item.h
class Item {
// Base class for expressions
virtual const char* func_name() = 0;
virtual bool fix_fields(...) = 0;
virtual longlong val_int() = 0;
virtual String* val_str(...) = 0;
};
// storage/innobase/include/ha_innodb.h
class ha_innobase : public handler {
// InnoDB storage engine implementation
virtual int rnd_init(...) override; // Table scan init
virtual int rnd_next(...) override; // Next row
virtual int index_read(...) override; // Index lookup
virtual int write_row(...) override; // Insert row
};
3. Recommended Reading Path
Level 1: Fundamentals (Start Here)
1. Read sql/handler.h
- Understand storage engine interface
- See what methods every engine must implement
- Time: 2-3 hours
2. Read include/table.h
- Understand TABLE structure
- How fields, keys, records are organized
- Time: 2 hours
3. Study sql/table.cc
- Table creation, opening, closing
- Schema management
- Time: 3 hours
4. Read sql/item.h
- Expression/column representation
- Virtual methods for evaluation
- Time: 2 hours
Level 2: Parser and Optimizer
1. sql_lex.cc / sql_yacc.yy (Parser)
- How SQL text becomes AST
- Token definitions
- Grammar rules
- Time: 4-5 hours
2. sql_select.cc (Optimization)
- Table access planning
- Cost-based optimization
- JOIN order selection
- Time: 6-8 hours (complex!)
3. Study with: EXPLAIN output
- Run query with EXPLAIN FORMAT=JSON
- Trace code matching execution plan
- Time: 3 hours
Level 3: Execution Engine
1. sql_executor.cc
- Query execution loop
- Nested loop algorithm
- How JOIN works internally
- Time: 5 hours
2. Sorting: filesort.cc
- ORDER BY implementation
- In-memory vs disk sorting
- Time: 2 hours
3. Grouping and Aggregation
- GROUP BY algorithms
- Aggregate function evaluation
- Time: 3 hours
Level 4: Storage Engine Deep Dive
InnoDB (storage/innobase/)
Start with:
├─ ha_innobase.h
│ - Storage engine class definition
│ - Required methods overview
│
├─ btr/ (B-tree)
│ - Index structure and operations
│ - Leaf page layout
│
├─ buf/ (Buffer Pool)
│ - Page cache management
│ - Replacement policy (LRU)
│
├─ trx/ (Transactions)
│ - Transaction lifecycle
│ - MVCC implementation (ReadView)
│
└─ lock/ (Locking)
- Row locks, gap locks, next-key locks
- Deadlock detection
Recommended approach:
- Run InnoDB queries with innodb_print_all_deadlocks=ON
- Read SHOW ENGINE INNODB STATUS output
- Trace code matching the output
- Time: 10+ hours (very complex!)
4. Debugging MySQL Source
BUILD FROM SOURCE:
# Get source
git clone https://github.com/mysql/mysql-server.git
cd mysql-server
git checkout mysql-8.0.35 # Specific version
# Build with debug symbols
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Debug \
-DWITH_BOOST=../boost \
-DDOWNLOAD_BOOST=ON
make -j4
DEBUGGING WITH GDB:
# Run mysqld in debugger
gdb ./sql/mysqld
(gdb) set args --datadir=/data/mysql --user=mysql
# Add breakpoint in handler.cc
(gdb) break ha_innobase::index_read
(gdb) run
# Client in another terminal
mysql -u root
# In GDB client, execute query - breakpoint will hit
(gdb) where # See call stack
(gdb) print handler # Examine variables
(gdb) next # Step
(gdb) continue # Resume
5. Key Algorithms and Data Structures
CRITICAL TO UNDERSTAND:
1. B-tree Index
- Page structure: header + slots + records + free space
- Leaf node: contains actual rows
- Internal node: contains keys + child pointers
- Insertion algorithm: split on overflow
Location: storage/innobase/btr/
2. Buffer Pool LRU
- Recently used pages stay in memory
- Old pages evicted on demand
- Young pages protected from eviction
Location: storage/innobase/buf/
3. MVCC (Multi-Version Concurrency Control)
- ReadView: transaction sees snapshot at start time
- Undo log: stores previous versions
- Visible row check: compares row trx_id with ReadView
Location: storage/innobase/trx/
4. Lock Manager
- Lock waits form directed graph
- Deadlock cycle detection: DFS on wait graph
- If cycle found, pick victim and kill
Location: storage/innobase/lock/
5. Query Optimization
- Cost model: io_block_read_cost, row_evaluate_cost
- Possible plans enumerated, costs compared
- Best plan selected for execution
Location: sql/optimizer/
6. Tips for Reading Large Codebases
TOOLS:
1. Source Code Indexing
- Ctags: ctags -R . (generates TAGS file)
- Cscope: cscope -R (interactive source browser)
- clang-tools: clangd (IDE integration)
2. IDE Setup
- VS Code: C/C++ extension + clangd
- CLion: JetBrains IDE (commercial)
- vim/neovim: with coc.nvim or LSP
3. Grep / Ripgrep
# Find function definition
grep -rn "handler::index_read" sql/
# Find function calls
grep -rn "rnd_next" sql/ | grep -v "virtual"
STRATEGIES:
1. Start at entry point
- sql_parse_command (sql_parse.cc)
- Trace execution for specific query type
2. Follow the call stack
- Use EXPLAIN to understand plan
- Match code to plan steps
3. Add debug logging
- Use sql_print_information() to log
- Rebuild and test
4. Read tests
- mysql-test/t/*.test
- Show expected behavior
5. Use Git history
- git log --follow file.cc
- git blame to see who changed what
7. Important Files Reference
| Component | Key Files | Lines of Code | Difficulty |
|---|---|---|---|
| Parser | sql_lex.cc, sql_yacc.yy | ~5,000 | Medium |
| Optimizer | sql_select.cc, optimizer/ | ~15,000 | Hard |
| Executor | sql_executor.cc, item.cc | ~12,000 | Hard |
| Handler Interface | handler.h, handler.cc | ~3,000 | Easy |
| InnoDB Core | storage/innobase/ | ~400,000 | Very Hard |
8. Resources and Community
OFFICIAL:
- MySQL Developer Zone: dev.mysql.com
- MySQL Documentation: dev.mysql.com/doc/
- GitHub Issues: github.com/mysql/mysql-server/issues
BOOKS:
- "MySQL Internals" (Sasha Pachev) - older but still relevant
BLOGS:
- Percona Labs blog
- MySQL Server Team blog
- Justin Swanhart's blog (query optimization)
CONFERENCES:
- Percona Live (largest MySQL conference)
- Oracle OpenWorld (MySQL track)
- MariaDB Conference
PRACTICE:
- Contribute to MySQL:
github.com/mysql/mysql-server/blob/8.0/CONTRIBUTING.md
- Review bug reports
- Submit test cases for bugs
Conclusion
Reading MySQL source requires patience but offers deep understanding. Start with handler interface and table structures, gradually move to optimizer and executor, finally explore storage engines. Use EXPLAIN, GDB, and grep tools to navigate the codebase. Join community discussions to accelerate learning.