Kafka Internals: From Message Delivery to Streaming Architecture

Master Apache Kafka 3.7+ from the ground up: binary protocol dissection, Reactor network model, log storage engine source code, ISR replication, KRaft controller, Kafka Streams, Kafka Connect, hierarchical timing wheel internals, and root cause analysis of 10 real production incidents. 32 chapters at internals depth, completely free.

Table of Contents
Ch01
Kafka Is Not Just a Message Queue
Message queue vs event streaming platform, deep comparison with RabbitMQ/Pulsar/RocketMQ/Redpanda, use-case decision matrix
Ch02
Architecture: The Complete Journey of a Message
Full pipeline from Producer to Consumer with latency breakdown at every stage
Ch03
KRaft: How Kafka Implements Raft
Raft protocol internals, Kafka's KRaft variant, __cluster_metadata Topic, and complete ZK→KRaft migration
Ch04
Kafka Protocol: Binary Wire Format Dissected
Frame structure, ApiKey version negotiation, byte-level dissection of core requests, Wireshark packet analysis
Ch05
Message Format Evolution: V0→V1→V2 RecordBatch
Three generations of message format compared, V2 RecordBatch with varints, CRC, Magic Byte, and troubleshooting implications
Ch06
Producer Internals: Dual-Thread Model and Memory Management
Main thread vs Sender thread, BufferPool memory reuse, RecordAccumulator batching, batch.size + linger.ms math, backpressure
Ch07
Idempotence and Transactions: The Cost and Boundaries of EOS
PID+Epoch+Sequence idempotence, TransactionCoordinator state machine, two-phase commit, Zombie Fencing, EOS performance cost
Ch08
Producer Tuning: Reaching 1 Million TPS
Five compression algorithms benchmarked, the truth about max.in.flight.requests.per.connection and reordering, complete benchmarking methodology, 50K to 1M TPS case study
Ch09
Consumer Group Protocol: Join, Sync and Heartbeat
Group Coordinator election, JoinGroup/SyncGroup/Heartbeat sequence, Generation ID, three assignment strategies in source code
Ch10
Everything About Offsets: Storage, Commit and Reset
__consumer_offsets internals, Offset Compaction, commitSync pitfalls, Position vs Committed Offset vs LEO, ghost consumption
Ch11
Cooperative Rebalance and Static Membership
Eager vs Cooperative rebalance, CooperativeStickyAssignor incremental migration, static membership, Rack Awareness
Ch12
Consumer Tuning and Lag Management
Three-parameter triangle, three ways to calculate Consumer Lag, 5 emergency lag reduction strategies, Fetch optimization
Ch13
Network Layer: Reactor Model and Request Pipeline
Acceptor→Processor→RequestChannel→RequestHandler model, Purgatory delayed operations, tuning network/io thread counts
Ch14
Log Storage Engine: Why Kafka Is So Fast
LogSegment byte-level structure, sparse index binary search, sequential writes + page cache + zero-copy + mmap, the 1000x performance gap, Log Recovery
Ch15
Replication: ISR, Watermarks and Data Consistency
Follower Fetch, LEO/HW/Leader Epoch watermarks, ISR shrink/expand, Unclean Election data loss simulation, KIP-101
Ch16
Log Compaction and Retention: Source-Level Analysis
delete vs compact, Compaction thread (OffsetMap→dirty segment cleaning→Swap), Tombstone messages, Compaction+transaction interaction
Ch17
Source Code Guide: Module Structure and Debug Setup
Source directory layout, IntelliJ debug setup, breakpoint tracing a message, core class relationship diagram
Ch18
The Complete Path of a Produce Request
KafkaApis → ReplicaManager → Log → LogSegment full source path, DelayedProduce in Purgatory
Ch19
Fetch Request and Consumer Pull Mechanism
handleFetchRequest source, Leader vs Follower read paths, FetchDataInfo, DelayedFetch triggers, Consumer Fetcher prefetch
Ch20
Controller Source Code: Election, Assignment and State Machine
QuorumController, Partition state machine, Replica state machine, ControllerEvent single-thread model, failover flow
Ch21
Group Coordinator Source: The Complete Rebalance Flow
GroupMetadataManager, GroupState machine, handleJoinGroup/handleSyncGroup source, DelayedJoin, Offset storage internals
Ch22
Timing Wheel and Delayed Operations: The Purgatory Design
Why not DelayQueue, hierarchical timing wheel O(1) insertion, TimingWheel implementation, unified Purgatory abstraction
Ch23
Kafka Streams Architecture: Topology, Threads and State
StreamThread→Task→Processor layers, StreamsPartitionAssignor, Standby Replicas, RocksDB tuning, Changelog Topic fault tolerance
Ch24
Windows, Joins and Exactly-Once Stream Processing
Four window types, three Join semantics, Grace Period for late arrivals, Streams EOS v2 performance improvements
Ch25
ksqlDB: Real-Time Stream Processing with SQL
ksqlDB architecture, CREATE STREAM/TABLE, Push vs Pull Query, Materialized View, ksqlDB vs Kafka Streams vs Flink SQL
Ch26
Kafka Connect Deep Dive: Framework and Custom Connectors
Worker/Task/Converter architecture, Distributed Offset management, Debezium CDC, custom Source Connector, SMT chain
Ch27
Schema Registry and Data Contracts
_schemas Topic, four compatibility levels, Schema evolution in practice, Data Contract governance, CI/CD integration
Ch28
Monitoring: The Complete Observability Stack
JMX Exporter config, five golden signals, Prometheus+Grafana dashboard templates, tiered alerting, end-to-end latency tracing
Ch29
Performance Tuning: OS, JVM and Broker End-to-End
Disk selection, Page Cache tuning, vm.dirty_ratio, G1GC params, ZGC viability, io/replica fetcher thread tuning formulas
Ch30
Security: Authentication, Authorization, Encryption and Audit
Five SASL mechanisms compared, SSL/TLS performance impact, ACL model, Audit Log, RBAC, production security checklist
Ch31
Cross-Cluster Replication and Disaster Recovery
MirrorMaker 2 architecture, three Geo-Replication modes, Offset Translation, RPO/RTO design, Cluster Linking vs MM2
Ch32
Production War Stories: Root Cause Analysis of 10 Real Incidents
10 real production incidents: ISR avalanche, Rebalance storm, message loss, OOM, disk exhaustion, partition count explosion, SSL cert expiry, and more
