Chapter 18
Temporary & In-Memory Tables
MySQL Temporary Tables and Memory Tables Optimization
Temporary tables and memory-based storage enable efficient intermediate operations. Understanding their trade-offs is crucial for query optimization and complex data processing.
1. Types of Temporary Storage
1.1 Automatic Temporary Tables
MySQL creates temporary tables internally for operations:
SCENARIO 1: GROUP BY without ORDER BY index
SELECT name, COUNT(*) FROM users GROUP BY name;
├─ MySQL: Creates temporary table to aggregate
├─ Storage: Depends on tmp_table_engine
└─ Cleanup: Automatic (dropped after query)
SCENARIO 2: DISTINCT with non-indexed column
SELECT DISTINCT country FROM users;
├─ MySQL: Creates temporary table to track seen values
└─ Performance: O(n log n) with B-tree or O(n) with hash
MONITORING TEMPORARY TABLE CREATION:
SHOW STATUS LIKE 'Created_tmp_tables';
SHOW STATUS LIKE 'Created_tmp_disk_tables';
If Created_tmp_disk_tables is high:
├─ Many temporary tables created on disk
├─ Performance problem: disk I/O much slower than memory
└─ Solutions: Increase tmp_table_size, optimize queries
CONTROLLING TEMPORARY TABLE ENGINE:
-- What engine creates temporary tables?
SHOW VARIABLES LIKE 'tmp_table_engine';
-- Configure size limits
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';
-- If temporary table exceeds tmp_table_size:
-- MySQL converts from MEMORY to disk-based InnoDB
-- Performance drops significantly!
SET GLOBAL tmp_table_size = 512M; -- Increase for large aggregations
SET GLOBAL max_heap_table_size = 512M;
1.2 Explicit Temporary Tables
CREATE TEMPORARY TABLE temp_results (
id INT,
name VARCHAR(100),
total DECIMAL(10,2)
);
Characteristics:
├─ Visible only to creating session
├─ Dropped automatically when connection closes
├─ Cannot be shared between sessions
├─ Can have indexes, constraints, etc.
└─ Can use any storage engine
EXAMPLE: Complex ETL Process
-- Step 1: Extract
CREATE TEMPORARY TABLE staging_data AS
SELECT u.id, u.name, SUM(o.amount) as total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > DATE_SUB(NOW(), INTERVAL 1 YEAR)
GROUP BY u.id, u.name;
-- Step 2: Transform
ALTER TABLE staging_data ADD COLUMN segment VARCHAR(50);
UPDATE staging_data
SET segment = CASE
WHEN total_spent > 10000 THEN 'VIP'
WHEN total_spent > 1000 THEN 'Regular'
ELSE 'Bronze'
END;
-- Step 3: Load
INSERT INTO customer_segments
SELECT id, name, segment FROM staging_data;
-- Step 4: Cleanup (automatic when session closes)
BENEFITS:
├─ Intermediate results don't pollute main schema
├─ Multiple sessions can have same temp table name
├─ Can be faster than subqueries
├─ Easier to debug complex logic
└─ Can add indexes for performance
2. MEMORY Engine vs InnoDB
MEMORY (Hash-Based):
CREATE TABLE mem_table (
id INT PRIMARY KEY,
name VARCHAR(100),
balance INT
) ENGINE = MEMORY;
Characteristics:
├─ Store entire table in RAM
├─ Hash indexes (O(1) lookup) or B-tree
├─ No disk persistence (lost on restart)
├─ Good for: small lookup tables, caches
└─ Limitation: Fixed maximum size (max_heap_table_size)
Performance:
├─ Read: Very fast (in-memory access)
├─ Write: Fast but single-threaded
└─ Storage: RAM consumption
InnoDB with Buffer Pool:
Characteristics:
├─ Persists to disk (survives restart)
├─ ACID transactions
├─ Row-level locking
└─ Can exceed memory size (uses disk)
Performance:
├─ Read: Fast if in buffer pool, slow if disk
├─ Write: Medium (redo log, transaction overhead)
└─ Storage: More overhead than MEMORY
CHOOSING ENGINE:
Use MEMORY for:
├─ Small reference tables (<100MB)
├─ Session state/cache
├─ Lookup tables during query execution
└─ Temporary aggregations (guaranteed to fit)
Use InnoDB for:
├─ Large temporary results
├─ Complex queries with multiple steps
├─ When ACID guarantees needed
└─ Long-running operations
3. Tuning Temporary Table Performance
PROBLEM: Query creating huge temporary table on disk
SHOW STATUS LIKE 'Created_tmp_disk_tables';
-- Result: high numbers (performance problem!)
SOLUTION 1: Increase Memory Limit
SET GLOBAL tmp_table_size = 2G;
SET GLOBAL max_heap_table_size = 2G;
Now temporary tables have 2GB budget before spilling to disk
SOLUTION 2: Optimize Query to Avoid Temp Table
❌ SLOW (creates temporary table):
SELECT DISTINCT country FROM users WHERE age > 18;
-- Without index on country, needs temp table
✓ FAST (uses index):
CREATE INDEX idx_age_country ON users(age, country);
SELECT DISTINCT country FROM users WHERE age > 18;
-- Now uses index, no temp table needed
SOLUTION 3: Use Explicit Temp Table with Index
CREATE TEMPORARY TABLE temp_users AS
SELECT * FROM users WHERE age > 18;
CREATE INDEX idx_country ON temp_users(country);
SELECT DISTINCT country FROM temp_users;
-- Explicitly indexes result, then queries
SOLUTION 4: Batch Process
❌ All at once (large temp table):
SELECT name, COUNT(*) FROM users GROUP BY name;
✓ In batches (smaller temp tables):
FOR i = 0; i < 26; i++ DO
SELECT name, COUNT(*) FROM users
WHERE name LIKE CONCAT(CHAR(64+i), '%')
GROUP BY name;
END FOR
-- Process each letter separately, 26 smaller queries
4. Common Use Cases
4.1 Data Deduplication
PROBLEM: Remove duplicates from large table
❌ Slow approach (creates huge temp table):
DELETE FROM events
WHERE id NOT IN (
SELECT MIN(id) FROM events GROUP BY event_key
);
✓ Fast approach (explicit temp table):
CREATE TEMPORARY TABLE temp_dedup AS
SELECT MIN(id) as keep_id
FROM events
GROUP BY event_key;
CREATE INDEX idx_keep ON temp_dedup(keep_id);
DELETE FROM events
WHERE id NOT IN (SELECT keep_id FROM temp_dedup);
4.2 Complex ETL Pipeline
Step 1: Extract
CREATE TEMPORARY TABLE raw_data AS
SELECT * FROM source_table WHERE date >= '2024-01-01';
Step 2: Validate
ALTER TABLE raw_data ADD COLUMN is_valid BOOLEAN DEFAULT TRUE;
UPDATE raw_data SET is_valid = FALSE
WHERE amount < 0 OR amount > 1000000;
Step 3: Transform
CREATE TEMPORARY TABLE transformed_data AS
SELECT
id,
UPPER(name) as name,
ROUND(amount, 2) as amount,
CASE
WHEN amount > 10000 THEN 'high'
WHEN amount > 1000 THEN 'medium'
ELSE 'low'
END as category
FROM raw_data
WHERE is_valid = TRUE;
Step 4: Load
INSERT INTO staging_table
SELECT * FROM transformed_data;
4.3 A/B Testing Analysis
-- Analyze two versions
CREATE TEMPORARY TABLE variant_a AS
SELECT user_id, COUNT(*) as conversions, AVG(revenue) as avg_revenue
FROM events
WHERE variant = 'A'
GROUP BY user_id;
CREATE TEMPORARY TABLE variant_b AS
SELECT user_id, COUNT(*) as conversions, AVG(revenue) as avg_revenue
FROM events
WHERE variant = 'B'
GROUP BY user_id;
-- Statistical comparison
SELECT
'A' as variant,
COUNT(*) as users,
SUM(conversions) as total_conversions,
AVG(avg_revenue) as avg_revenue
FROM variant_a
UNION ALL
SELECT
'B' as variant,
COUNT(*) as users,
SUM(conversions) as total_conversions,
AVG(avg_revenue) as avg_revenue
FROM variant_b;
5. Memory Management
MONITORING RAM USAGE:
-- See current temporary table activity
SELECT * FROM INFORMATION_SCHEMA.INNODB_TEMP_TABLE_INFO;
-- Monitor creation rate
SHOW GLOBAL STATUS LIKE 'Created_tmp%';
CONFIGURATION:
-- Memory settings
max_heap_table_size (default 16MB)
tmp_table_size (default 16MB)
These limit MEMORY engine table size
If exceeded, converts to disk-based InnoDB
EXAMPLE: Increase for large aggregations
SET GLOBAL max_heap_table_size = 512M;
SET GLOBAL tmp_table_size = 512M;
Then retry query:
SELECT name, COUNT(*) as count
FROM large_table
GROUP BY name;
-- Now aggregation keeps more data in memory
-- Fewer conversions to disk
-- Better performance
6. Best Practices
- Add indexes to temp tables — especially for subsequent queries
- Increase tmp_table_size — if temp tables frequently spill to disk
- Use explicit temp tables — for complex multi-step operations
- Monitor Created_tmp_disk_tables — indicator of performance issues
- Batch large operations — reduce temporary table size
- Use MEMORY for small lookup tables — fast hash-based lookups
- Clean up temp tables explicitly — for long-running connections
- Analyze temp table creation — optimize queries creating large temps
- Test memory configurations — find optimal tmp_table_size for workload
- Document temp table logic — complex queries can be hard to understand
Conclusion
Temporary tables are essential for complex data processing but can become bottlenecks if not managed properly. Keep temporary tables small by optimizing queries, use appropriate storage engines (MEMORY vs InnoDB), and monitor creation rates. Explicit temporary tables with indexes often outperform implicit MySQL-generated temporary tables for complex operations.