Chapter 4

Source to Execution: The Complete V8 Engine Pipeline

V8's optimizing compiler speculates on the types it has observed at runtime. When optimized code meets a value that breaks one of those speculations, V8 discards the specialized machine code and resumes the function in interpreted bytecode. This is called Deoptimization, and the function stays in bytecode until it proves "hot" enough to be optimized again. How long your code stays JIT-compiled depends largely on how predictable you keep its types.

🔹 Level 1 · What You Need to Know

The Complete Journey of JavaScript Code (5 Steps)

Source code string
     │
     ▼  Lexical analysis (Scanner)
Token stream (keywords / identifiers / literals / operators)
     │
     ▼  Syntax analysis (Parser)
AST (Abstract Syntax Tree)
     │
     ▼  Ignition (Interpreter)
BytecodeArray                  ← can be inspected at runtime!
     │
     ▼  Turbofan (Optimizing compiler, fires on hot functions)
Machine code                   ← executed directly by CPU, fastest
     │
     ▼  Deoptimization (type assumption fails)
Back to BytecodeArray

Understanding this pipeline explains 3 confusing phenomena:

Behavior	Reason
Syntax errors are reported before code runs	Parser runs before execution; parse failure = immediate error
`function foo(){}` is callable before its declaration	Function declarations are collected while parsing and bound when their scope is entered, before any statement in that scope runs
V8's JIT sometimes "degrades"	Type assumption fails; Turbofan scraps optimized code, falls back to Ignition
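
The first two rows can be observed from plain JavaScript. The sketch below assumes an environment that permits `new Function` (Node.js, or a browser page without a restrictive CSP); `log`, `parseError`, and `callEarly` are names invented for this illustration.

```javascript
// Row 1: the whole source is parsed before any of it runs.
const log = [];
let parseError = null;
try {
  // The first statement is valid, the second is not, so nothing executes.
  const fn = new Function('log', "log.push('ran'); var 1x = 2;");
  fn(log);
} catch (err) {
  parseError = err;
}
console.log(parseError instanceof SyntaxError); // true
console.log(log.length);                        // 0

// Row 2: a function declaration is bound before the first statement runs.
const early = callEarly();
function callEarly() {
  return 'called before my declaration';
}
console.log(early);
```

The valid `log.push('ran')` never runs because the syntax error later in the same source is reported first.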

3 Common Deoptimization Triggers to Avoid

// ❌ Trigger 1: Passing different types to the same function
function addNumbers(a, b) {
  return a + b;
}
addNumbers(1, 2);      // Feedback so far: a and b are small integers (Smi)
addNumbers(1.5, 2.5);  // Feedback widens to Number; code already optimized for Smi only would deopt here
addNumbers('a', 'b');  // Strings break any numeric assumption; optimized code deopts

// ✅ One type per function — each stays optimized independently
function addIntegers(a, b) { return a + b; }
function addStrings(a, b) { return a + b; }

// ❌ Trigger 2: Adding properties dynamically (changes Hidden Class)
function processUser(user) {
  console.log(user.name);
}
processUser({ name: 'Alice' });           // user HC: {name}
processUser({ name: 'Bob', age: 30 });   // different HC: access turns polymorphic; code optimized for one HC deopts

// ✅ Keep the same object shape
processUser({ name: 'Alice', age: null }); // same HC even if age is null
processUser({ name: 'Bob', age: 30 });    // same HC, stays optimized

// ❌ Trigger 3: delete on object properties (destroys Hidden Class)
const obj = { x: 1, y: 2 };
delete obj.x;  // HC broken; object degrades to dictionary mode (hash table)
// Use obj.x = undefined instead — preserves HC structure
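
One caveat with the `obj.x = undefined` advice is that it is not a drop-in replacement for `delete`, because the key still exists. A minimal comparison, with `deleted` and `cleared` as illustrative names:

```javascript
const deleted = { x: 1, y: 2 };
delete deleted.x;          // the key is gone

const cleared = { x: 1, y: 2 };
cleared.x = undefined;     // the key stays, only its value changes

console.log(Object.keys(deleted));    // [ 'y' ]
console.log(Object.keys(cleared));    // [ 'x', 'y' ]
console.log('x' in cleared);          // true
console.log(JSON.stringify(cleared)); // {"y":2}
```

Code that enumerates keys or tests with `in` will notice the difference, so check those call sites before swapping one for the other.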

How to Inspect V8 Bytecode

# Node.js 14+ supports --print-bytecode
node --print-bytecode script.js 2>&1 | head -100

# Filter to a specific function name
node --print-bytecode --print-bytecode-filter="functionName" script.js

# Trace JIT optimization decisions
node --trace-opt script.js
node --trace-deopt script.js  # see which functions were deoptimized

🔸 Level 2 · How It Really Works

V8's Full Compilation Pipeline

V8 compilation pipeline detail:

JavaScript source code (UTF-16 string)
         │
         ▼
┌────────────────────────────────────────┐
│  Scanner (Lexer)                        │
│  · Splits character stream into Tokens  │
│  · Recognizes: keywords, identifiers,   │
│    number/string literals, operators    │
│  · Skips whitespace and comments        │
│  · Output: Token stream                 │
└────────────────────────────────────────┘
         │
         ▼ Token stream
┌────────────────────────────────────────┐
│  Parser                                 │
│  · Eager Parse (immediate):             │
│    - Top-level code                     │
│    - Immediately invoked functions      │
│  · Lazy Parse (deferred default):       │
│    - Function bodies (default)          │
│    - Checks body syntax, builds no AST  │
│    - Full parse on first call           │
│  · Output: AST                          │
└────────────────────────────────────────┘
         │
         ▼ AST
┌────────────────────────────────────────┐
│  Ignition (Interpreter)                 │
│  · AST → BytecodeArray                  │
│  · Register-based bytecode              │
│  · Collects TypeFeedback per operation  │
│    — records runtime type of each arg   │
│  · Marks hot functions (call threshold) │
└────────────────────────────────────────┘
         │
         ▼ (hot function, calls exceed threshold)
┌────────────────────────────────────────┐
│  Turbofan (Optimizing Compiler)         │
│  · Reads TypeFeedback from Ignition     │
│  · Generates specialized code based on │
│    type assumptions                     │
│  · Sea-of-Nodes IR                     │
│  · Inlining, escape analysis, loop opt  │
│  · Output: native machine code          │
└────────────────────────────────────────┘
         │
         ▼ (type assumption fails)
┌────────────────────────────────────────┐
│  Deoptimization                         │
│  · Discard generated machine code       │
│  · Reconstruct Ignition interpreter     │
│    state from the deopt frame           │
│  · Resume execution from bytecode       │
│  · May be re-optimized (higher cost)    │
└────────────────────────────────────────┘

Scanner: Lexical Analysis

The Scanner turns a character stream into a Token stream. A Token is the smallest meaningful unit:

// Source code:
const answer = 42;

// Token stream (Scanner output):
// Token 1: CONST        (keyword)
// Token 2: IDENTIFIER   "answer"
// Token 3: ASSIGN       "="
// Token 4: NUMBER       42
// Token 5: SEMICOLON    ";"

The Scanner handles one classic ambiguity: / can be a division operator or the start of a regular expression:

const result = value / regex;  // / is division
const re = /pattern/g;         // / starts a regex

V8's Scanner resolves this using context — a / appearing after an expression value is division; a / at the start of a statement or expression is a regex delimiter.
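
The "look at what came before" rule can be sketched with a toy tokenizer. This is not V8's scanner: the token names and the `previousEndsExpression` helper are assumptions made for illustration, and only a tiny subset of JavaScript is handled.

```javascript
// Toy scanner: shows only the previous-token heuristic for '/'.
function tokenize(source) {
  const tokens = [];
  let pos = 0;

  // A '/' is division only if the previous token can end an expression.
  const previousEndsExpression = () => {
    const prev = tokens[tokens.length - 1];
    if (prev === undefined) return false;
    return prev.type === 'IDENTIFIER' || prev.type === 'NUMBER' ||
           prev.value === ')' || prev.value === ']';
  };

  while (pos < source.length) {
    const rest = source.slice(pos);
    let match;
    let type;
    if ((match = /^\s+/.exec(rest))) {
      pos += match[0].length; // skip whitespace
      continue;
    } else if ((match = /^(?:const|let|var)\b/.exec(rest))) {
      type = 'KEYWORD';
    } else if ((match = /^[A-Za-z_$][\w$]*/.exec(rest))) {
      type = 'IDENTIFIER';
    } else if ((match = /^\d+(?:\.\d+)?/.exec(rest))) {
      type = 'NUMBER';
    } else if (rest[0] === '/' && !previousEndsExpression() &&
               (match = /^\/(?:[^\/\\\n]|\\.)+\/[a-z]*/.exec(rest))) {
      type = 'REGEX';
    } else {
      match = [rest[0]]; // any other single character
      type = 'PUNCTUATOR';
    }
    tokens.push({ type, value: match[0] });
    pos += match[0].length;
  }
  return tokens;
}

const divisionTokens = tokenize('const result = value / regex;');
const regexTokens = tokenize('const re = /pattern/g;');
console.log(divisionTokens.map((t) => t.type).join(' '));
console.log(regexTokens.map((t) => t.value).join(' '));
```

In the first input the `/` follows the identifier `value`, so it is emitted as a punctuator. In the second it follows `=`, so the whole `/pattern/g` becomes one token.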

Parser: Eager vs Lazy Parsing

The Parser turns the Token stream into an AST. V8 uses two strategies:

// Eager Parse (full parse immediately):
// 1. Top-level code
const x = 1;         // parsed immediately

// 2. Immediately Invoked Function Expressions (IIFE)
//    (a parenthesized function expression is V8's hint that a call follows)
(function() { })();  // parsed immediately

// Lazy Parse (deferred):
// Function bodies are pre-parsed by default, exported functions included
export function foo() { }  // body is not fully parsed until foo() is first called
function heavy() {
  // Before heavy() is called, this body is only shallow-parsed
  // (syntax is validated but no full AST is produced)
  const data = processHeavyData();
  return data;
}
// First call to heavy() triggers full Parse → Ignition flow

Why lazy parsing matters for startup time:

Startup time comparison (simplified; the timings are illustrative, not measurements):
Eager (parse everything up front):
  Parse all code → Generate all bytecode → Start executing
  Total: e.g. ~300ms (assuming 1,000 function definitions)

Lazy (V8 default):
  Parse top-level → Generate top-level bytecode → Start executing
  Parse each function body on first call
  Total: e.g. ~30ms startup (each first call then pays for a second, full parse)

AST structure (inspect any code at astexplorer.net):

// Source:
function add(a, b) { return a + b; }

// Simplified AST:
{
  "type": "Program",
  "body": [{
    "type": "FunctionDeclaration",
    "id": { "type": "Identifier", "name": "add" },
    "params": [
      { "type": "Identifier", "name": "a" },
      { "type": "Identifier", "name": "b" }
    ],
    "body": {
      "type": "BlockStatement",
      "body": [{
        "type": "ReturnStatement",
        "argument": {
          "type": "BinaryExpression",
          "operator": "+",
          "left":  { "type": "Identifier", "name": "a" },
          "right": { "type": "Identifier", "name": "b" }
        }
      }]
    }
  }]
}
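
Because this AST is plain nested data, any tool that consumes it is essentially a recursive walk. A minimal sketch follows, in which `walk` is a hypothetical helper (not an API from V8 or astexplorer) and the tree is the simplified one shown above:

```javascript
const ast = {
  type: 'Program',
  body: [{
    type: 'FunctionDeclaration',
    id: { type: 'Identifier', name: 'add' },
    params: [
      { type: 'Identifier', name: 'a' },
      { type: 'Identifier', name: 'b' },
    ],
    body: {
      type: 'BlockStatement',
      body: [{
        type: 'ReturnStatement',
        argument: {
          type: 'BinaryExpression',
          operator: '+',
          left: { type: 'Identifier', name: 'a' },
          right: { type: 'Identifier', name: 'b' },
        },
      }],
    },
  }],
};

// Depth-first walk: call visit on every object that has a string "type".
function walk(node, visit) {
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, visit));
    return;
  }
  if (node === null || typeof node !== 'object') return;
  if (typeof node.type === 'string') visit(node);
  for (const key of Object.keys(node)) walk(node[key], visit);
}

const identifiers = [];
walk(ast, (node) => {
  if (node.type === 'Identifier') identifiers.push(node.name);
});
console.log(identifiers.join(' ')); // add a b a b
```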

Ignition: Register-Based Bytecode

Ignition compiles the AST to bytecode (BytecodeArray). V8's bytecode is register-based with an implicit accumulator register (unlike JVM bytecode, which is stack-based):

// Function:
function add(a, b) { return a + b; }

// V8 bytecode (via --print-bytecode, simplified):
// [generated bytecode for function: add]
//
// Ldar a1        Load parameter b (register a1) into the accumulator
// Add a0, [0]    accumulator = a0 + accumulator; [0] = feedback slot index
// Return         Return accumulator value

Ignition also collects TypeFeedback: each time a bytecode that owns a feedback slot executes (such as Add above), it records the operand types it saw in the function's Feedback Vector. Turbofan reads this vector when deciding what type assumptions to make for optimization.
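
To make the accumulator model and the feedback slot concrete, here is a toy interpreter for the three instructions used by `add`. It is a sketch under loud assumptions: `interpret`, `tagOf`, the tuple encoding of bytecodes, and the 31-bit Smi range are all invented for illustration and do not mirror V8's internals.

```javascript
// Hypothetical type tags; V8's real feedback lattice is richer.
function tagOf(value) {
  if (Number.isInteger(value) && Math.abs(value) < 2 ** 30) return 'Smi';
  if (typeof value === 'number') return 'Number';
  if (typeof value === 'string') return 'String';
  return 'Any';
}

// params[0] plays the role of register a0, params[1] of a1.
function interpret(bytecode, params, feedbackVector) {
  let accumulator;
  for (const [op, operand, slot] of bytecode) {
    switch (op) {
      case 'Ldar': // load a register into the accumulator
        accumulator = params[operand];
        break;
      case 'Add': { // accumulator = register + accumulator, recording types
        const left = params[operand];
        feedbackVector[slot].add(`${tagOf(left)}+${tagOf(accumulator)}`);
        accumulator = left + accumulator;
        break;
      }
      case 'Return':
        return accumulator;
      default:
        throw new Error(`unknown bytecode: ${op}`);
    }
  }
  return undefined;
}

// Right operand goes to the accumulator, left operand stays in its register.
const addBytecode = [['Ldar', 1], ['Add', 0, 0], ['Return']];
const feedbackVector = [new Set()];

const intResult = interpret(addBytecode, [1, 2], feedbackVector);
const stringResult = interpret(addBytecode, ['a', 'b'], feedbackVector);
console.log(intResult, stringResult, [...feedbackVector[0]]);
```

After both calls, slot 0 holds two entries (`Smi+Smi` and `String+String`). An optimizer reading it would know the site is no longer integer-only.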

Turbofan: Type-Assumption-Based Machine Code

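Turbofan is V8's optimizing compiler, built around a Sea-of-Nodes intermediate representation (IR). Once a function has run often enough to count as hot (on the order of a thousand calls in a simple loop, although the actual trigger is an internal budget that varies by V8 version), Turbofan takes over: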

Turbofan optimization flow:

1. Read Feedback Vector (types recorded by Ignition)
   Example: add(a, b) — a and b have always been Smi (Small Integer)

2. Generate type-specialized machine code:
   IF a is Smi AND b is Smi:
     Use integer add instruction (no generic + dispatch needed)
   ELSE:
     Deoptimize and fall back to interpreter

3. Output x64 machine code:
   ; Fast path (Smi checks on a and b omitted for brevity)
   mov eax, [a]
   add eax, [b]
   jo deopt_label    ; overflow → deoptimize
   ret

4. If type assumptions hold:
   Function runs at a speed close to statically compiled code

5. If type assumptions fail (float or string passed):
   → Deoptimization
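
The guard-then-fast-path structure in steps 2 to 5 can be imitated in plain JavaScript. This is only an analogy: `specializeForIntegers`, `isInt32`, and the `stats` counter are hypothetical names, and real deoptimization discards the optimized code rather than falling through on every call.

```javascript
const isInt32 = (value) => (value | 0) === value;

// Wrap a generic function with an integer-only fast path plus a bailout.
function specializeForIntegers(genericFn) {
  const stats = { deopts: 0 };
  function optimized(a, b) {
    if (isInt32(a) && isInt32(b)) {        // the type guard
      const sum = a + b;
      if (isInt32(sum)) return sum;        // fast path (compare: add + jo)
    }
    stats.deopts += 1;                     // assumption failed: bail out
    return genericFn(a, b);                // generic semantics stay correct
  }
  return { optimized, stats };
}

const genericAdd = (a, b) => a + b;
const { optimized: fastAdd, stats } = specializeForIntegers(genericAdd);

const smallSum = fastAdd(1, 2);             // stays on the fast path
const floatSum = fastAdd(1.5, 2);           // guard fails on 1.5
const overflowSum = fastAdd(2147483647, 1); // sum leaves the int32 range
console.log(smallSum, floatSum, overflowSum, stats.deopts);
```

The bailout never changes the result, only the route taken to compute it. This is why speculative optimization is safe.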

Deoptimization: The Cost of Type Assumptions

// Full deoptimization demonstration
function compute(x) {
  return x * x;
}

// Train V8: pass integers so Turbofan optimizes
for (let i = 0; i < 10000; i++) {
  compute(i);          // after ~1000 calls, Turbofan generates
                       // integer-assumption machine code
}

// Trigger deoptimization:
compute(3.14);         // Float! Assumption fails.
// V8 executes:
// 1. Detects type mismatch in Smi-optimized path
// 2. Discards machine code for compute
// 3. Reconstructs interpreter (Ignition) state
// 4. Resumes compute(3.14) in bytecode at the point where the check failed
// 5. Updates Feedback Vector to include float type
// 6. compute may be re-optimized (this time for floats)

// See deopt events with (assuming this code is saved as script.js):
// node --trace-deopt script.js
// Look for an entry that names compute and gives a reason; the exact format varies by V8 version

🔺 Level 3 · What the Spec Says

How the Spec Defines Source Code and Script Parsing

Clause 11 (Source Text, as numbered in recent editions) defines the lexical foundation:

"11.1 Source Text

The source text of an ECMAScript Script or Module is first converted to a sequence of input elements, then parsed. ...

SourceCharacter :: any Unicode code point

ECMAScript code is expressed using Unicode. However, an ECMAScript implementation need not express source text using Unicode; any text encoding that includes the full Unicode character set can be used as long as the internal representation uses Unicode code points."

Chapter 16 (Scripts and Modules):

"16.1 Scripts

Syntax
Script : ScriptBody?
ScriptBody : StatementList[~Yield, ~Await, ~Return]

16.1.1 Static Semantics: Early Errors
Script : ScriptBody
It is a Syntax Error if the LexicallyDeclaredNames of ScriptBody contains any duplicate entries. ..."

Early Errors are spec-defined compile-time errors — this is precisely why syntax errors are detected before any code executes:

Early Errors (Static Semantics rules that appear throughout the spec):
  - These errors MUST be reported before the Script or Module containing them is evaluated
  - In practice the Parser is responsible for detecting them
  - Almost all Early Errors are reported as SyntaxError

Example:
  function f() { 'use strict'; with({}) {} }
  // with in strict mode is an Early Error (spec-mandated)
  // Reported at parse time, not when the with statement runs
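
The following sketch checks that claim from user code. It assumes `new Function` is available, and `compileOnly` is a hypothetical helper. The `with` statement sits in a branch that can never run, yet strict mode still rejects the source.

```javascript
// Compile a function body without calling it; report what happened.
function compileOnly(source) {
  try {
    new Function(source);
    return 'compiled';
  } catch (err) {
    return err.name;
  }
}

const sloppyResult = compileOnly('if (false) { with ({}) {} }');
const strictResult = compileOnly("'use strict'; if (false) { with ({}) {} }");

console.log(sloppyResult); // compiled
console.log(strictResult); // SyntaxError
```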

Relationship Between V8 Bytecode and Spec "Evaluation" Semantics

The spec defines what (semantics); V8 defines how (implementation). For example, the spec's definition of the addition operator (Section 13.15):

"13.15.3 ApplyStringOrNumericBinaryOperator(lval, opText, rval)

If opText is +, then
  a. Let lprim be ? ToPrimitive(lval).
  b. Let rprim be ? ToPrimitive(rval).
  c. If Type(lprim) is String or Type(rprim) is String, then
     i. Let lstr be ? ToString(lprim).
     ii. Let rstr be ? ToString(rprim).
     iii. Return the String that is the result of concatenating lstr and rstr.
  d. Set lval to lprim.
  e. Set rval to rprim.

...(numeric addition)"

Turbofan's implementation: first check if both operands are Smi (Small Integer). If yes, use a direct integer add instruction — skipping all ToPrimitive and ToString calls. This is a legal optimization: for two integers, ToPrimitive returns them unchanged and ToString is never reached, so the spec semantics and the machine code result are identical.
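
To see why the shortcut is legal, the quoted steps can be transcribed into JavaScript and compared with the built-in operator. This is a simplified sketch: `specAdd` and `toPrimitive` are illustrative names, BigInt is ignored, and `String()` is used where the spec calls ToString (so Symbols behave differently here than in the real algorithm).

```javascript
const isPrimitive = (value) => Object(value) !== value;

// Simplified ToPrimitive with the default hint.
function toPrimitive(value) {
  if (isPrimitive(value)) return value;
  const exotic = value[Symbol.toPrimitive];
  if (exotic !== undefined && exotic !== null) {
    const result = exotic.call(value, 'default');
    if (isPrimitive(result)) return result;
    throw new TypeError('Cannot convert object to primitive value');
  }
  for (const name of ['valueOf', 'toString']) {
    const method = value[name];
    if (typeof method === 'function') {
      const result = method.call(value);
      if (isPrimitive(result)) return result;
    }
  }
  throw new TypeError('Cannot convert object to primitive value');
}

// Steps a to e for opText "+", followed by plain numeric addition.
function specAdd(lval, rval) {
  const lprim = toPrimitive(lval);
  const rprim = toPrimitive(rval);
  if (typeof lprim === 'string' || typeof rprim === 'string') {
    return String(lprim) + String(rprim);
  }
  return Number(lprim) + Number(rprim);
}

console.log(specAdd(1, 2));                                 // 3
console.log(specAdd(1, '2'));                               // 12
console.log(specAdd({ valueOf() { return 40; } }, 2));      // 42
console.log(specAdd([1], [2]));                             // 12
```

For two integers every conversion step is a no-op, so a single machine add produces the same observable result as the full algorithm.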

💎 Level 4 · Edge Cases and Traps

Trap 1: Hidden Class Degradation and Performance Loss

Hidden Classes (also called "shapes" or "maps") are internal type descriptors V8 assigns to objects. Objects with the same property structure share a Hidden Class:

// Same structure → shared Hidden Class → fast property access
function createPoint(x, y) {
  return { x, y }; // always creates the same properties in the same order
}

const p1 = createPoint(1, 2);   // Hidden Class: HC0 {x, y}
const p2 = createPoint(3, 4);   // reuses HC0 → fast path

// Different structures → different Hidden Classes → impacts optimization
const p3 = { y: 1, x: 2 };     // different property order! HC1 {y, x}
const p4 = { x: 1 };           // missing y → HC2 {x}

// What V8 sees when these are passed to the same function:
// createPoint() results → all share HC0 → Turbofan assumes one shape
// p3, p4 → HC1, HC2 → polymorphic or megamorphic access

Performance tiers:

// ✅ Monomorphic — fastest
function getX(pt) { return pt.x; }
getX({ x: 1, y: 2 });
getX({ x: 3, y: 4 });  // same HC — Turbofan generates single inline cache

// ⚠️ Polymorphic (2–4 Hidden Classes) — slower
getX({ x: 1, y: 2 });
getX({ x: 3 });        // different HC — Turbofan generates multi-branch code

// ❌ Megamorphic (5+ Hidden Classes) — slowest
// Per-site inline cache gives up; every access goes through a slower generic lookup
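
Hidden Classes cannot be read from ordinary JavaScript, but the tiering rule itself is easy to model. In this sketch the property names in creation order stand in for a Hidden Class; `shapeOf` and `createAccessSite` are hypothetical, and the thresholds are the ones quoted above.

```javascript
// Stand-in for a Hidden Class: property names in creation order.
const shapeOf = (obj) => Object.keys(obj).join(',');

// Models one property-access site and the shapes it has seen.
function createAccessSite() {
  const seenShapes = new Set();
  return {
    getX(obj) {
      seenShapes.add(shapeOf(obj));
      return obj.x;
    },
    state() {
      const count = seenShapes.size;
      if (count === 0) return 'uninitialized';
      if (count === 1) return 'monomorphic';
      return count <= 4 ? 'polymorphic' : 'megamorphic';
    },
  };
}

const site = createAccessSite();
site.getX({ x: 1, y: 2 });
site.getX({ x: 3, y: 4 });
const afterSameShape = site.state();      // one shape seen

site.getX({ y: 1, x: 2 });                // same names, different order
const afterReordered = site.state();

site.getX({ x: 1 });
site.getX({ x: 1, z: 0 });
site.getX({ x: 1, w: 0 });
const afterManyShapes = site.state();     // five shapes seen

console.log(afterSameShape, afterReordered, afterManyShapes);
```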

Fix: keep property creation order consistent

// ✅ Always define properties in the same order
class Point {
  constructor(x, y) {
    this.x = x;  // x always first
    this.y = y;  // y always second
  }
}

// ❌ Conditional property addition (two different Hidden Classes)
function User(name, isAdmin) {
  this.name = name;
  if (isAdmin) {
    this.adminLevel = 1;  // only admins have this → two HCs
  }
}

// ✅ Always initialize, use null for absent values
function User(name, isAdmin) {
  this.name = name;
  this.adminLevel = isAdmin ? 1 : null;  // consistent shape
}

Trap 2: Why `eval()` Is Slow (Not Just "Security")

// eval()'s 3 performance killers:

// Killer 1: Blocks static scope analysis
function slowFunc() {
  let x = 1;
  eval('x = 2');     // eval can modify local variables!
  // V8 cannot treat x as a constant because eval may change it
  return x;          // must re-read from memory every time; no inlining
}

// Killer 2: Blocks variable elimination
function noOpt() {
  let result = 0;
  for (let i = 0; i < 1000; i++) {
    result += i;
    if (Math.random() > 0.9999) {
      eval('result = 0'); // theoretically possible; V8 can't assume result
    }
  }
  return result;
}

// Killer 3: Code inside eval is compiled at runtime, on every call
function repeated() {
  for (let i = 0; i < 1000; i++) {
    eval('1 + 1'); // each iteration goes through eval's compile path; V8 can reuse
                   // an identical string it has seen, but the lookup is still paid
  }
}
// Correct: extract dynamically needed logic into named functions
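
One common way to follow that advice is a lookup table of ordinary functions keyed by name. This is a sketch rather than a prescription; `operations` and `calculate` are illustrative names.

```javascript
// Every operation is a normal function the engine can parse once and optimize.
const operations = new Map([
  ['add', (a, b) => a + b],
  ['multiply', (a, b) => a * b],
]);

function calculate(operationName, a, b) {
  const operation = operations.get(operationName);
  if (operation === undefined) {
    throw new RangeError(`unsupported operation: ${operationName}`);
  }
  return operation(a, b);
}

console.log(calculate('add', 2, 3));      // 5
console.log(calculate('multiply', 2, 3)); // 6
```

The dynamic part shrinks to a single Map lookup, and unknown names fail loudly instead of executing arbitrary text.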

Trap 3: Why `with` Is Banned in Strict Mode (Technical Reason)

// with introduces dynamic scope:
var x = 1;
var obj = { x: 10, y: 20 };

with (obj) {
  console.log(x);  // 10 (obj.x), not the outer 1
  console.log(y);  // 20 (obj.y)
  console.log(z);  // source of z is unknown until runtime; ReferenceError if no scope defines it
}

Why with defeats static name resolution:

Normal scope (static):
  Variable sources determined at compile time
  V8 can replace variables with stack frame offsets
  Runtime: direct memory access, O(1)

with statement (dynamic):
  Cannot determine property lookup path at compile time
  Any variable name xyz inside a with block might be:
    a. A property of the with object
    b. A variable in an outer scope
    c. A global variable
  Runtime: every variable access = search with object (hash table)
           + potentially continue up the scope chain

  Turbofan cannot make static assumptions about names resolved inside a with block
  → Older V8 compilers refused to optimize any function containing with;
    Turbofan can compile such functions, but lookups inside the block stay slow
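
The usual replacement for `with` is destructuring, which gives the same brevity while keeping every name statically resolvable. A minimal sketch, with `settings` and `formatAddress` as illustrative names:

```javascript
const settings = { host: 'localhost', port: 8080 };

function formatAddress(config) {
  // host and port are ordinary locals: their source is known at parse time.
  const { host, port } = config;
  return `${host}:${port}`;
}

console.log(formatAddress(settings)); // localhost:8080
```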

Trap 4: Reading Real V8 Bytecode with `--print-bytecode`

// sum.js
function sum(a, b) {
  return a + b;
}
console.log(sum(1, 2));

node --print-bytecode --print-bytecode-filter="sum" sum.js

Representative output (Node.js 18, x64, simplified; opcode bytes differ between V8 versions):

[generated bytecode for function: sum (0x...)]
Bytecode length: 6
Parameter count 3    (this, a, b)
Register count 0

         0 : 25 03             Ldar a1       // Load param b into accumulator
         2 : 35 02 00          Add a0, [0]   // accumulator = param a + accumulator; [0]=feedback slot
         5 : a8                Return        // Return accumulator value

Reading the bytecode:

Bytecode instruction meanings:
Ldar a1     → Load Accumulator from Register a1 (parameter b)
Add a0, [0] → accumulator = a0 + accumulator; [0] is the feedback slot index
              (type feedback is collected here for Turbofan)
Return      → return current accumulator value

This bytecode is just 6 bytes and 3 instructions. When sum is Turbofan-optimized for integer inputs, the generated x64 machine code looks approximately like:

; x64 machine code (simplified)
; sum(a, b) where a, b are known to be Smi (Small Integer)
mov eax, [a]        ; load a
mov ecx, [b]        ; load b
add eax, ecx        ; integer addition
jo  deopt_handler   ; overflow → deoptimize
ret                 ; return eax

Chapter Summary

JavaScript source code travels through 5 stages before execution: Scanner (lexical analysis) → Parser (builds AST) → Ignition (generates and interprets bytecode) → Turbofan (JIT-compiles hot functions to machine code) → Deoptimization (fallback when type assumptions fail). Syntax errors surface at the Parser stage, long before any execution.
Lazy Parse is V8's startup optimization: function bodies are pre-parsed (syntax-checked without building an AST) by default and fully parsed only on first call. For pages with thousands of function definitions, this can cut startup parse work substantially, at the cost of parsing each called function a second time on its first call.
Turbofan optimization is built on type assumptions: Ignition records runtime types in a Feedback Vector; Turbofan generates specialized machine code from those records. When a type assumption fails, Deoptimization occurs and execution of the function resumes in bytecode.
Hidden Classes are the foundation of fast property access in V8: objects with the same properties in the same creation order share a Hidden Class, allowing Turbofan to use direct memory-offset access (not hash table lookup). delete can push an object into dictionary mode, while dynamic property addition and inconsistent property creation order split Hidden Classes and push access sites toward polymorphic or megamorphic lookups.
eval() and with are performance black holes: eval() blocks static scope analysis and variable inlining; with makes every variable reference inside the block dynamically scoped, which keeps V8 from resolving those names at compile time. Strict mode's prohibition of with is grounded in this technical reality.
