Chapter 18

Temporary & In-Memory Tables

MySQL Temporary Tables and Memory Tables Optimization

Temporary tables and memory-based storage enable efficient intermediate operations. Understanding their trade-offs is crucial for query optimization and complex data processing.

1. Types of Temporary Storage

1.1 Automatic Temporary Tables


MySQL creates temporary tables internally for operations:

SCENARIO 1: GROUP BY without ORDER BY index
SELECT name, COUNT(*) FROM users GROUP BY name;
├─ MySQL: Creates temporary table to aggregate
├─ Storage: Depends on tmp_table_engine
└─ Cleanup: Automatic (dropped after query)

SCENARIO 2: DISTINCT with non-indexed column
SELECT DISTINCT country FROM users;
├─ MySQL: Creates temporary table to track seen values
└─ Performance: O(n log n) with B-tree or O(n) with hash

MONITORING TEMPORARY TABLE CREATION:

SHOW STATUS LIKE 'Created_tmp_tables';
SHOW STATUS LIKE 'Created_tmp_disk_tables';

If Created_tmp_disk_tables is high:
├─ Many temporary tables created on disk
├─ Performance problem: disk I/O much slower than memory
└─ Solutions: Increase tmp_table_size, optimize queries

CONTROLLING TEMPORARY TABLE ENGINE:

-- What engine creates temporary tables?
SHOW VARIABLES LIKE 'tmp_table_engine';

-- Configure size limits
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';

-- If temporary table exceeds tmp_table_size:
-- MySQL converts from MEMORY to disk-based InnoDB
-- Performance drops significantly!

SET GLOBAL tmp_table_size = 512M;  -- Increase for large aggregations
SET GLOBAL max_heap_table_size = 512M;

1.2 Explicit Temporary Tables


CREATE TEMPORARY TABLE temp_results (
  id INT,
  name VARCHAR(100),
  total DECIMAL(10,2)
);

Characteristics:
├─ Visible only to creating session
├─ Dropped automatically when connection closes
├─ Cannot be shared between sessions
├─ Can have indexes, constraints, etc.
└─ Can use any storage engine

EXAMPLE: Complex ETL Process

-- Step 1: Extract
CREATE TEMPORARY TABLE staging_data AS
SELECT u.id, u.name, SUM(o.amount) as total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > DATE_SUB(NOW(), INTERVAL 1 YEAR)
GROUP BY u.id, u.name;

-- Step 2: Transform
ALTER TABLE staging_data ADD COLUMN segment VARCHAR(50);
UPDATE staging_data
SET segment = CASE
  WHEN total_spent > 10000 THEN 'VIP'
  WHEN total_spent > 1000 THEN 'Regular'
  ELSE 'Bronze'
END;

-- Step 3: Load
INSERT INTO customer_segments
SELECT id, name, segment FROM staging_data;

-- Step 4: Cleanup (automatic when session closes)

BENEFITS:
├─ Intermediate results don't pollute main schema
├─ Multiple sessions can have same temp table name
├─ Can be faster than subqueries
├─ Easier to debug complex logic
└─ Can add indexes for performance

2. MEMORY Engine vs InnoDB


MEMORY (Hash-Based):

CREATE TABLE mem_table (
  id INT PRIMARY KEY,
  name VARCHAR(100),
  balance INT
) ENGINE = MEMORY;

Characteristics:
├─ Store entire table in RAM
├─ Hash indexes (O(1) lookup) or B-tree
├─ No disk persistence (lost on restart)
├─ Good for: small lookup tables, caches
└─ Limitation: Fixed maximum size (max_heap_table_size)

Performance:
├─ Read: Very fast (in-memory access)
├─ Write: Fast but single-threaded
└─ Storage: RAM consumption

InnoDB with Buffer Pool:

Characteristics:
├─ Persists to disk (survives restart)
├─ ACID transactions
├─ Row-level locking
└─ Can exceed memory size (uses disk)

Performance:
├─ Read: Fast if in buffer pool, slow if disk
├─ Write: Medium (redo log, transaction overhead)
└─ Storage: More overhead than MEMORY

CHOOSING ENGINE:

Use MEMORY for:
├─ Small reference tables (<100MB)
├─ Session state/cache
├─ Lookup tables during query execution
└─ Temporary aggregations (guaranteed to fit)

Use InnoDB for:
├─ Large temporary results
├─ Complex queries with multiple steps
├─ When ACID guarantees needed
└─ Long-running operations

3. Tuning Temporary Table Performance


PROBLEM: Query creating huge temporary table on disk

SHOW STATUS LIKE 'Created_tmp_disk_tables';
-- Result: high numbers (performance problem!)

SOLUTION 1: Increase Memory Limit

SET GLOBAL tmp_table_size = 2G;
SET GLOBAL max_heap_table_size = 2G;

Now temporary tables have 2GB budget before spilling to disk

SOLUTION 2: Optimize Query to Avoid Temp Table

❌ SLOW (creates temporary table):
SELECT DISTINCT country FROM users WHERE age > 18;
-- Without index on country, needs temp table

✓ FAST (uses index):
CREATE INDEX idx_age_country ON users(age, country);
SELECT DISTINCT country FROM users WHERE age > 18;
-- Now uses index, no temp table needed

SOLUTION 3: Use Explicit Temp Table with Index

CREATE TEMPORARY TABLE temp_users AS
SELECT * FROM users WHERE age > 18;

CREATE INDEX idx_country ON temp_users(country);

SELECT DISTINCT country FROM temp_users;
-- Explicitly indexes result, then queries

SOLUTION 4: Batch Process

❌ All at once (large temp table):
SELECT name, COUNT(*) FROM users GROUP BY name;

✓ In batches (smaller temp tables):
FOR i = 0; i < 26; i++ DO
  SELECT name, COUNT(*) FROM users
  WHERE name LIKE CONCAT(CHAR(64+i), '%')
  GROUP BY name;
END FOR
-- Process each letter separately, 26 smaller queries

4. Common Use Cases

4.1 Data Deduplication


PROBLEM: Remove duplicates from large table

❌ Slow approach (creates huge temp table):
DELETE FROM events
WHERE id NOT IN (
  SELECT MIN(id) FROM events GROUP BY event_key
);

✓ Fast approach (explicit temp table):
CREATE TEMPORARY TABLE temp_dedup AS
SELECT MIN(id) as keep_id
FROM events
GROUP BY event_key;

CREATE INDEX idx_keep ON temp_dedup(keep_id);

DELETE FROM events
WHERE id NOT IN (SELECT keep_id FROM temp_dedup);

4.2 Complex ETL Pipeline


Step 1: Extract
CREATE TEMPORARY TABLE raw_data AS
SELECT * FROM source_table WHERE date >= '2024-01-01';

Step 2: Validate
ALTER TABLE raw_data ADD COLUMN is_valid BOOLEAN DEFAULT TRUE;
UPDATE raw_data SET is_valid = FALSE
WHERE amount < 0 OR amount > 1000000;

Step 3: Transform
CREATE TEMPORARY TABLE transformed_data AS
SELECT
  id,
  UPPER(name) as name,
  ROUND(amount, 2) as amount,
  CASE
    WHEN amount > 10000 THEN 'high'
    WHEN amount > 1000 THEN 'medium'
    ELSE 'low'
  END as category
FROM raw_data
WHERE is_valid = TRUE;

Step 4: Load
INSERT INTO staging_table
SELECT * FROM transformed_data;

4.3 A/B Testing Analysis


-- Analyze two versions

CREATE TEMPORARY TABLE variant_a AS
SELECT user_id, COUNT(*) as conversions, AVG(revenue) as avg_revenue
FROM events
WHERE variant = 'A'
GROUP BY user_id;

CREATE TEMPORARY TABLE variant_b AS
SELECT user_id, COUNT(*) as conversions, AVG(revenue) as avg_revenue
FROM events
WHERE variant = 'B'
GROUP BY user_id;

-- Statistical comparison
SELECT
  'A' as variant,
  COUNT(*) as users,
  SUM(conversions) as total_conversions,
  AVG(avg_revenue) as avg_revenue
FROM variant_a

UNION ALL

SELECT
  'B' as variant,
  COUNT(*) as users,
  SUM(conversions) as total_conversions,
  AVG(avg_revenue) as avg_revenue
FROM variant_b;

5. Memory Management


MONITORING RAM USAGE:

-- See current temporary table activity
SELECT * FROM INFORMATION_SCHEMA.INNODB_TEMP_TABLE_INFO;

-- Monitor creation rate
SHOW GLOBAL STATUS LIKE 'Created_tmp%';

CONFIGURATION:

-- Memory settings
max_heap_table_size (default 16MB)
tmp_table_size (default 16MB)

These limit MEMORY engine table size
If exceeded, converts to disk-based InnoDB

EXAMPLE: Increase for large aggregations

SET GLOBAL max_heap_table_size = 512M;
SET GLOBAL tmp_table_size = 512M;

Then retry query:
SELECT name, COUNT(*) as count
FROM large_table
GROUP BY name;

-- Now aggregation keeps more data in memory
-- Fewer conversions to disk
-- Better performance

6. Best Practices

Conclusion

Temporary tables are essential for complex data processing but can become bottlenecks if not managed properly. Keep temporary tables small by optimizing queries, use appropriate storage engines (MEMORY vs InnoDB), and monitor creation rates. Explicit temporary tables with indexes often outperform implicit MySQL-generated temporary tables for complex operations.

Rate this chapter
4.7  / 5  (14 ratings)

💬 Comments