Chapter 9

Source Code Reading Guide

MySQL Source Code Reading Guide

Understanding MySQL source code enables deep optimization, custom extensions, and expert troubleshooting. This guide provides a structured path through the 2+ million lines of code.

1. Source Code Directory Structure


mysql-server/
├── sql/                    # Core Server Engine (1000+ files)
│   ├── sql_lex.cc         # Lexer (tokenizer)
│   ├── sql_yacc.yy        # Parser grammar
│   ├── sql_select.cc      # SELECT optimization
│   ├── sql_executor.cc    # Query execution
│   ├── handler.cc         # Storage engine interface (critical!)
│   ├── table.cc           # Table representation
│   ├── item.cc            # Expression/column objects
│   ├── join.cc            # JOIN logic
│   ├── lock.cc            # Lock management
│   ├── transaction.cc     # Transaction control
│   ├── filesort.cc        # Sorting (ORDER BY, GROUP BY)
│   └── ... (aggregation, subqueries, triggers, etc.)
│
├── storage/               # Storage Engines
│   ├── innobase/          # InnoDB (~400K lines)
│   │   ├── btr/           # B-tree index implementation
│   │   ├── buf/           # Buffer pool management
│   │   ├── trx/           # Transaction engine
│   │   ├── lock/          # Row/table locking
│   │   ├── dict/          # Data dictionary
│   │   ├── fts/           # Full-text search
│   │   ├── log/           # Redo/undo logs
│   │   └── row/           # Row operations
│   ├── myisam/            # MyISAM engine
│   ├── memory/            # In-memory engine
│   └── csv/, ndb/, etc/   # Other engines
│
├── include/               # Public headers
├── client/                # Client tools (mysql, mysqldump)
├── mysys/                 # System libraries (file I/O, threads)
├── strings/               # String utilities
└── ...

KEY FILES TO STUDY:
- handler.h/cc: Storage engine interface (defines what engines must implement)
- table.h/cc: Represents a table in memory
- item.h/cc: Represents expressions, columns, functions
- sql_select.cc: Query optimization (2000+ lines, core logic)
- sql_executor.cc: Execution engine (3000+ lines)
- join.h/cc: JOIN execution (nested loop, hash join)

2. Understanding Code Organization

2.1 Query Execution Pipeline


Input: SELECT * FROM users WHERE id = 1

PHASE 1: PARSING (sql/)
├─ sql_lex.cc: Tokenize input
├─ sql_yacc.yy: Apply grammar rules
├─ Build parse tree (SELECT_LEX nodes)
└─ Result: AST (Abstract Syntax Tree)

PHASE 2: OPTIMIZATION (sql/optimizer/)
├─ Analyze table statistics
├─ Generate possible execution plans
├─ Calculate cost of each plan
├─ Select best plan
└─ Result: Execution plan

PHASE 3: COMPILATION (sql/)
├─ sql_select.cc: Compile FROM/WHERE/JOIN
├─ item.cc: Compile expressions
└─ Result: Compiled query structure

PHASE 4: EXECUTION (sql/)
├─ sql_executor.cc: Execute plan step by step
├─ handler.cc: Call storage engine for data access
├─ Aggregate/Sort/Limit
└─ Result: Result set sent to client

Code follows these classes/functions:
Query → LEX → SELECT_LEX → Item → handler (storage engine)

2.2 Critical Classes to Understand


// sql/sql_lex.h
class LEX {
  // Represents a single SQL statement
  SELECT_LEX* select_lex;  // Main select
  TABLE_LIST* table_list;  // Tables referenced
};

// sql/sql_select.h
class SELECT_LEX {
  // Represents one SELECT (main or subquery)
  Item* select_list[64];   // Selected columns
  WHERE* where;            // WHERE condition
  ORDER* order;            // ORDER BY
  Item* group_list;        // GROUP BY
};

// sql/table.h
class TABLE {
  // Runtime table object
  char* table_name;        // Table name
  Field* field[MAX_FIELDS];// Column definitions
  KEY* key_info;           // Index definitions
  handler* file;           // Storage engine instance
};

// sql/item.h
class Item {
  // Base class for expressions
  virtual const char* func_name() = 0;
  virtual bool fix_fields(...) = 0;
  virtual longlong val_int() = 0;
  virtual String* val_str(...) = 0;
};

// storage/innobase/include/ha_innodb.h
class ha_innobase : public handler {
  // InnoDB storage engine implementation
  virtual int rnd_init(...) override;  // Table scan init
  virtual int rnd_next(...) override;  // Next row
  virtual int index_read(...) override; // Index lookup
  virtual int write_row(...) override;  // Insert row
};

Level 1: Fundamentals (Start Here)


1. Read sql/handler.h
   - Understand storage engine interface
   - See what methods every engine must implement
   - Time: 2-3 hours

2. Read include/table.h
   - Understand TABLE structure
   - How fields, keys, records are organized
   - Time: 2 hours

3. Study sql/table.cc
   - Table creation, opening, closing
   - Schema management
   - Time: 3 hours

4. Read sql/item.h
   - Expression/column representation
   - Virtual methods for evaluation
   - Time: 2 hours

Level 2: Parser and Optimizer


1. sql_lex.cc / sql_yacc.yy (Parser)
   - How SQL text becomes AST
   - Token definitions
   - Grammar rules
   - Time: 4-5 hours

2. sql_select.cc (Optimization)
   - Table access planning
   - Cost-based optimization
   - JOIN order selection
   - Time: 6-8 hours (complex!)

3. Study with: EXPLAIN output
   - Run query with EXPLAIN FORMAT=JSON
   - Trace code matching execution plan
   - Time: 3 hours

Level 3: Execution Engine


1. sql_executor.cc
   - Query execution loop
   - Nested loop algorithm
   - How JOIN works internally
   - Time: 5 hours

2. Sorting: filesort.cc
   - ORDER BY implementation
   - In-memory vs disk sorting
   - Time: 2 hours

3. Grouping and Aggregation
   - GROUP BY algorithms
   - Aggregate function evaluation
   - Time: 3 hours

Level 4: Storage Engine Deep Dive


InnoDB (storage/innobase/)

Start with:
├─ ha_innobase.h
│  - Storage engine class definition
│  - Required methods overview
│
├─ btr/ (B-tree)
│  - Index structure and operations
│  - Leaf page layout
│
├─ buf/ (Buffer Pool)
│  - Page cache management
│  - Replacement policy (LRU)
│
├─ trx/ (Transactions)
│  - Transaction lifecycle
│  - MVCC implementation (ReadView)
│
└─ lock/ (Locking)
   - Row locks, gap locks, next-key locks
   - Deadlock detection

Recommended approach:
- Run InnoDB queries with innodb_print_all_deadlocks=ON
- Read SHOW ENGINE INNODB STATUS output
- Trace code matching the output
- Time: 10+ hours (very complex!)

4. Debugging MySQL Source


BUILD FROM SOURCE:

# Get source
git clone https://github.com/mysql/mysql-server.git
cd mysql-server
git checkout mysql-8.0.35  # Specific version

# Build with debug symbols
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Debug \
         -DWITH_BOOST=../boost \
         -DDOWNLOAD_BOOST=ON
make -j4

DEBUGGING WITH GDB:

# Run mysqld in debugger
gdb ./sql/mysqld

(gdb) set args --datadir=/data/mysql --user=mysql

# Add breakpoint in handler.cc
(gdb) break ha_innobase::index_read
(gdb) run

# Client in another terminal
mysql -u root

# In GDB client, execute query - breakpoint will hit
(gdb) where            # See call stack
(gdb) print handler    # Examine variables
(gdb) next             # Step
(gdb) continue         # Resume

5. Key Algorithms and Data Structures


CRITICAL TO UNDERSTAND:

1. B-tree Index
   - Page structure: header + slots + records + free space
   - Leaf node: contains actual rows
   - Internal node: contains keys + child pointers
   - Insertion algorithm: split on overflow
   Location: storage/innobase/btr/

2. Buffer Pool LRU
   - Recently used pages stay in memory
   - Old pages evicted on demand
   - Young pages protected from eviction
   Location: storage/innobase/buf/

3. MVCC (Multi-Version Concurrency Control)
   - ReadView: transaction sees snapshot at start time
   - Undo log: stores previous versions
   - Visible row check: compares row trx_id with ReadView
   Location: storage/innobase/trx/

4. Lock Manager
   - Lock waits form directed graph
   - Deadlock cycle detection: DFS on wait graph
   - If cycle found, pick victim and kill
   Location: storage/innobase/lock/

5. Query Optimization
   - Cost model: io_block_read_cost, row_evaluate_cost
   - Possible plans enumerated, costs compared
   - Best plan selected for execution
   Location: sql/optimizer/

6. Tips for Reading Large Codebases


TOOLS:

1. Source Code Indexing
   - Ctags: ctags -R . (generates TAGS file)
   - Cscope: cscope -R (interactive source browser)
   - clang-tools: clangd (IDE integration)

2. IDE Setup
   - VS Code: C/C++ extension + clangd
   - CLion: JetBrains IDE (commercial)
   - vim/neovim: with coc.nvim or LSP

3. Grep / Ripgrep
   # Find function definition
   grep -rn "handler::index_read" sql/

   # Find function calls
   grep -rn "rnd_next" sql/ | grep -v "virtual"

STRATEGIES:

1. Start at entry point
   - sql_parse_command (sql_parse.cc)
   - Trace execution for specific query type

2. Follow the call stack
   - Use EXPLAIN to understand plan
   - Match code to plan steps

3. Add debug logging
   - Use sql_print_information() to log
   - Rebuild and test

4. Read tests
   - mysql-test/t/*.test
   - Show expected behavior

5. Use Git history
   - git log --follow file.cc
   - git blame to see who changed what

7. Important Files Reference

Component Key Files Lines of Code Difficulty
Parser sql_lex.cc, sql_yacc.yy ~5,000 Medium
Optimizer sql_select.cc, optimizer/ ~15,000 Hard
Executor sql_executor.cc, item.cc ~12,000 Hard
Handler Interface handler.h, handler.cc ~3,000 Easy
InnoDB Core storage/innobase/ ~400,000 Very Hard

8. Resources and Community


OFFICIAL:
- MySQL Developer Zone: dev.mysql.com
- MySQL Documentation: dev.mysql.com/doc/
- GitHub Issues: github.com/mysql/mysql-server/issues

BOOKS:
- "MySQL Internals" (Sasha Pachev) - older but still relevant

BLOGS:
- Percona Labs blog
- MySQL Server Team blog
- Justin Swanhart's blog (query optimization)

CONFERENCES:
- Percona Live (largest MySQL conference)
- Oracle OpenWorld (MySQL track)
- MariaDB Conference

PRACTICE:
- Contribute to MySQL:
  github.com/mysql/mysql-server/blob/8.0/CONTRIBUTING.md
- Review bug reports
- Submit test cases for bugs

Conclusion

Reading MySQL source requires patience but offers deep understanding. Start with handler interface and table structures, gradually move to optimizer and executor, finally explore storage engines. Use EXPLAIN, GDB, and grep tools to navigate the codebase. Join community discussions to accelerate learning.

Rate this chapter
4.7  / 5  (46 ratings)

💬 Comments