Chapter 31

IP Lookup Service: Binary Protocol Parsing

Every time you visit a website, the server on the other side can infer your approximate geographic location, your ISP, and sometimes even your connection type (for example, mobile or residential broadband), all from your IP address. This capability underpins content localization, ad targeting, security enforcement, and regulatory compliance across a vast swath of the modern internet.

IP geolocation is a fascinating engineering problem: how do you find, with minimum latency, the geographic record corresponding to an IP address in a database containing millions of entries? The answer is not in a faster CPU. It is in data structure design and binary format choices.

This chapter uses an IP lookup service as the central example to explain binary file format parsing, memory-mapped files, binary search, LRU caching, and HTTP API design. These techniques combine to deliver microsecond-latency IP lookups — without a database engine.


Level 1 · Use Cases and Background

Why IP Geolocation Matters

Content localization: serve content in the correct language and comply with jurisdiction-specific regulations (e.g., GDPR requirements for EU visitors).

Traffic security: detect anomalous traffic originating from high-risk regions; identify VPN/proxy/Tor exit nodes — a foundational capability for Web Application Firewalls (WAF).

Ad targeting: deliver geographically targeted advertising, one of the core mechanisms of internet advertising.

Compliance enforcement: streaming rights are licensed per region (Netflix's library differs by country). IP geolocation is the technical foundation for region-based access control.

Network diagnostics: determining which Autonomous System (AS) and ISP own a given IP is essential for network operations.

MaxMind GeoIP vs IP2Location

The two most widely used IP database providers:

MaxMind GeoIP2:

IP2Location:

This chapter uses IP2Location BIN format as the primary example, because its format is fully publicly documented, making it an ideal vehicle for teaching binary parsing principles.

Text Format vs. Binary Format: Why There Is a 100× Performance Gap

Consider storing IP geolocation data in CSV:

1.0.0.0,1.0.0.255,AU,Australia,Queensland,Brisbane,...
1.0.1.0,1.0.3.255,CN,China,Fujian,Fuzhou,...

Looking up 1.0.2.1 requires:

  1. Scanning line by line (or loading into memory and parsing row by row)
  2. Converting the string "1.0.1.0" to integer 16777472 for range comparison
  3. Repeated string parsing and memory allocation on every query

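Binary format eliminates all of this: IP range bounds are stored as integers, so nothing is parsed at query time; records are fixed-length and sorted by IP, so record i sits at a computable offset and the table can be binary-searched in place.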

For a database with 4 million records, binary search requires only about 22 comparisons (log₂(4,000,000) ≈ 22), completing each lookup in microseconds.
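
Both numbers in this section can be checked with a few lines of Go. This is a minimal sketch; dottedToUint32 and maxProbes are illustrative names, not part of any library.

```go
package main

import (
	"fmt"
	"math/bits"
	"net/netip"
)

// dottedToUint32 converts a dotted-quad IPv4 string to its 32-bit integer
// form, most significant octet first. The name is illustrative.
func dottedToUint32(s string) (uint32, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return 0, err
	}
	if !addr.Is4() {
		return 0, fmt.Errorf("%s is not an IPv4 address", s)
	}
	b := addr.As4()
	return uint32(b[0])<<24 | uint32(b[1])<<16 | uint32(b[2])<<8 | uint32(b[3]), nil
}

// maxProbes returns the worst-case number of probes a binary search makes
// over n sorted records: floor(log2(n)) + 1, which is the bit length of n.
func maxProbes(n uint) int {
	return bits.Len(n)
}

func main() {
	n, _ := dottedToUint32("1.0.1.0")
	fmt.Println(n)                  // 16777472
	fmt.Println(maxProbes(4000000)) // 22
}
```

The CSV approach pays the conversion cost on every row it inspects; the binary format pays it once, when the database is built.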


Level 2 · The IP2Location BIN Format: A Deep Dive

File Layout

An IP2Location BIN file is laid out as a header followed by up to four sections:

┌─────────────────────────────────────────────────────┐
│                   File Header (64 bytes)             │
├─────────────────────────────────────────────────────┤
│       IPv4 Index Section (optional, for acceleration)│
├─────────────────────────────────────────────────────┤
│       IPv4 Data Records (fixed-length, sorted by IP) │
├─────────────────────────────────────────────────────┤
│       IPv6 Index Section (optional)                  │
├─────────────────────────────────────────────────────┤
│       IPv6 Data Records                              │
└─────────────────────────────────────────────────────┘

Parsing the File Header

// IP2Location BIN file header (first 64 bytes)
type Header struct {
    DBType        uint8  // database type (1=DB1 country, 2=DB2 country+ISP, 3=DB3 country+region+city, ...)
    DBColumn      uint8  // number of columns per record
    DBYear        uint8
    DBMonth       uint8
    DBDay         uint8
    IPv4Count     uint32 // number of IPv4 records
    IPv4Addr      uint32 // file offset of IPv4 data section
    IPv6Count     uint32
    IPv6Addr      uint32
    IPv4IndexAddr uint32 // file offset of IPv4 index (0 = no index)
    IPv6IndexAddr uint32
    ProductCode   uint8
    LicenseCode   uint8
    DatabaseSize  uint32
}

Reading the header using encoding/binary:

import (
    "encoding/binary"
    "os"
    "fmt"
)

func readHeader(f *os.File) (*Header, error) {
    var h Header
    // IP2Location uses little-endian byte order
    if err := binary.Read(f, binary.LittleEndian, &h); err != nil {
        return nil, fmt.Errorf("reading header: %w", err)
    }
    return &h, nil
}
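
One detail worth checking is how many bytes binary.Read actually consumes for this struct. The sketch below copies the Header fields from the text and round-trips a synthetic header through an in-memory buffer; the field values are made up, and decodeHeader and headerSize are illustrative helpers.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Header mirrors the struct from the text. Every field has a fixed size,
// which is what binary.Read requires.
type Header struct {
	DBType, DBColumn, DBYear, DBMonth, DBDay uint8
	IPv4Count, IPv4Addr                      uint32
	IPv6Count, IPv6Addr                      uint32
	IPv4IndexAddr, IPv6IndexAddr             uint32
	ProductCode, LicenseCode                 uint8
	DatabaseSize                             uint32
}

// headerSize reports how many bytes encoding/binary reads for a Header.
// Fields are packed back to back with no alignment padding.
func headerSize() int {
	return binary.Size(Header{})
}

// decodeHeader parses a header from raw bytes, as readHeader does from a file.
func decodeHeader(raw []byte) (Header, error) {
	var h Header
	err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, &h)
	return h, err
}

func main() {
	fmt.Println(headerSize()) // 35

	// Synthetic header (made-up values, not a real database).
	want := Header{DBType: 11, DBColumn: 10, DBYear: 24, IPv4Count: 3, IPv4Addr: 65}
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, want); err != nil {
		panic(err)
	}
	got, err := decodeHeader(buf.Bytes())
	fmt.Println(got == want, err) // true <nil>
}
```

Because the in-memory Go struct may contain padding while the encoded form does not, the struct should only be converted through encoding/binary, never by casting raw bytes.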

Endianness: Why It Matters

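Multi-byte integers can be stored in two ways: big-endian puts the most significant byte first (at the lowest address), and little-endian puts the least significant byte first.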

IP2Location uses little-endian for multi-byte integers. Reading with the wrong endianness turns a valid "USA" string offset into a wildly incorrect number, pointing to garbage data or causing a crash.

func demonstrateEndianness() {
    data := []byte{0x78, 0x56, 0x34, 0x12}

    // Correct: little-endian
    leValue := binary.LittleEndian.Uint32(data) // = 0x12345678 = 305419896

    // Wrong: big-endian
    beValue := binary.BigEndian.Uint32(data)    // = 0x78563412 = 2018915346

    fmt.Printf("little-endian: %d\n", leValue) // 305419896
    fmt.Printf("big-endian:    %d\n", beValue) // 2018915346 — wrong!
}

Data Record Format

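Each IPv4 record, as modeled in this chapter (the published BIN layout differs slightly: a row stores only the range start, and the next row's start serves as the exclusive upper bound):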

┌───────────────┬───────────────┬───────────────────────────────────────────┐
│ IP Start (4B) │ IP End (4B)   │   Data Fields (offsets into string pool)  │
└───────────────┴───────────────┴───────────────────────────────────────────┘

Strings (country names, city names) are not stored inline. They live in a string pool at the end of the file; the record stores a 4-byte offset into that pool. This means "China" is stored once, no matter how many records reference it.

type Record struct {
    IPFrom       uint32  // start of IP range (as uint32)
    IPTo         uint32  // end of IP range
    // The following are offsets into the string pool
    CountryShort uint32  // offset of country code ("CN")
    CountryLong  uint32  // offset of full country name ("China")
    Region       uint32
    City         uint32
    ISP          uint32
    Latitude     float32 // stored directly as IEEE 754 float
    Longitude    float32
}

// IP2Location string format: [1-byte length][content bytes]
func readString(data []byte, offset uint32) string {
    if int(offset) >= len(data) {
        return ""
    }
    length := int(data[offset])
    start := int(offset) + 1
    end := start + length
    if end > len(data) {
        return ""
    }
    return string(data[start:end])
}
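
To see the deduplication at work, the following sketch builds a small pool in the same [1-byte length][content bytes] encoding and reads it back with readString. buildPool is a hypothetical helper written for this illustration; real BIN files are produced by the vendor's tooling.

```go
package main

import "fmt"

// buildPool stores each distinct string once as [1-byte length][bytes] and
// returns the pool plus the offset assigned to each string.
func buildPool(values []string) ([]byte, map[string]uint32, error) {
	pool := []byte{}
	offsets := make(map[string]uint32)
	for _, v := range values {
		if _, seen := offsets[v]; seen {
			continue // already stored: records reuse the existing offset
		}
		if len(v) > 255 {
			return nil, nil, fmt.Errorf("string too long for a 1-byte length: %d bytes", len(v))
		}
		offsets[v] = uint32(len(pool))
		pool = append(pool, byte(len(v)))
		pool = append(pool, v...)
	}
	return pool, offsets, nil
}

// readString is the bounds-checked reader from the text.
func readString(data []byte, offset uint32) string {
	if int(offset) >= len(data) {
		return ""
	}
	length := int(data[offset])
	start := int(offset) + 1
	end := start + length
	if end > len(data) {
		return ""
	}
	return string(data[start:end])
}

func main() {
	pool, offsets, err := buildPool([]string{"CN", "China", "AU", "China", "CN"})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(pool))                          // 12
	fmt.Println(readString(pool, offsets["China"])) // China
}
```

Five input strings occupy 12 bytes because the two repeats cost nothing; each record only carries a 4-byte offset.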

Memory-Mapped Files: Zero-Copy Reads

For frequently read, read-only files, memory mapping (mmap) is often the most efficient access mechanism. It maps the file directly into the process's virtual address space. The OS page cache handles on-demand loading, so reads need no read() system call and no kernel-to-userspace copy; the first touch of each page still incurs a page fault.

import (
    "encoding/binary"
    "fmt"
    "os"
    "syscall"
)

type MmapFile struct {
    data []byte
    f    *os.File
}

func OpenMmap(path string) (*MmapFile, error) {
    f, err := os.Open(path)
    if err != nil {
        return nil, err
    }

    stat, err := f.Stat()
    if err != nil {
        f.Close()
        return nil, err
    }

    data, err := syscall.Mmap(
        int(f.Fd()),
        0,
        int(stat.Size()),
        syscall.PROT_READ,
        syscall.MAP_SHARED,
    )
    if err != nil {
        f.Close()
        return nil, fmt.Errorf("mmap: %w", err)
    }

    return &MmapFile{data: data, f: f}, nil
}

func (m *MmapFile) Close() error {
    unmapErr := syscall.Munmap(m.data)
    closeErr := m.f.Close()
    if unmapErr != nil {
        return unmapErr
    }
    return closeErr
}

// Direct offset-based access — no system call required
func (m *MmapFile) ReadUint32LE(offset int) uint32 {
    return binary.LittleEndian.Uint32(m.data[offset:])
}

Three core advantages of mmap over os.File.Read():

  1. Zero copy: the OS maps disk pages directly into the process address space; no kernel→userspace copy
  2. Page-cache sharing: when multiple processes open the same file, the kernel maintains a single shared page cache
  3. Efficient random access: pointer arithmetic instead of seek system calls
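
A quick way to see the mapping behave like an ordinary byte slice is to map a tiny temporary file and decode a value from it. This sketch assumes a Unix-like system, since syscall.Mmap is not available on every platform; mmapRoundTrip is an illustrative name.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"
	"syscall"
)

// mmapRoundTrip writes four bytes to a temporary file, maps the file
// read-only, and decodes the bytes directly from the mapping.
func mmapRoundTrip() (uint32, error) {
	f, err := os.CreateTemp("", "mmap-demo-*.bin")
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	if _, err := f.Write([]byte{0x78, 0x56, 0x34, 0x12}); err != nil {
		return 0, err
	}

	data, err := syscall.Mmap(int(f.Fd()), 0, 4, syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		return 0, fmt.Errorf("mmap: %w", err)
	}
	defer syscall.Munmap(data)

	// data is a plain []byte backed by the page cache; indexing it is a
	// memory access, not a system call.
	return binary.LittleEndian.Uint32(data), nil
}

func main() {
	v, err := mmapRoundTrip()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%#x\n", v) // 0x12345678
}
```

The slice must not be used after Munmap; touching it then crashes the process rather than returning an error.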

Level 3 · Implementing the IP Lookup

IP Address Parsing: IPv4 and IPv6

The net package provides IP address parsing and manipulation:

import "net"

// Convert IPv4 address to uint32 for comparison with database integers
func ipv4ToUint32(ip net.IP) uint32 {
    // net.IP can be 4-byte (IPv4) or 16-byte (IPv4-mapped IPv6)
    ip = ip.To4()
    if ip == nil {
        return 0
    }
    // IP bytes are big-endian
    return uint32(ip[0])<<24 | uint32(ip[1])<<16 | uint32(ip[2])<<8 | uint32(ip[3])
}

// Reconstruct net.IP from uint32
func uint32ToIPv4(n uint32) net.IP {
    return net.IPv4(byte(n>>24), byte(n>>16), byte(n>>8), byte(n))
}
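
The helpers above cover IPv4 only. An IPv6 address is 128 bits and does not fit in a uint32, so range comparison needs a wider representation. One possible sketch, using a pair of uint64 halves, is shown below; the ip128 type is an assumption made for illustration and says nothing about how any particular database stores IPv6 ranges.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net/netip"
)

// ip128 holds a 128-bit address as two 64-bit halves (illustrative type).
type ip128 struct{ hi, lo uint64 }

// parseIP128 parses an address and splits its 16-byte form into halves.
func parseIP128(s string) (ip128, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return ip128{}, err
	}
	b := addr.As16() // IPv4 input becomes its IPv4-mapped IPv6 form
	return ip128{
		hi: binary.BigEndian.Uint64(b[:8]),
		lo: binary.BigEndian.Uint64(b[8:]),
	}, nil
}

// less reports whether a sorts numerically before b.
func (a ip128) less(b ip128) bool {
	if a.hi != b.hi {
		return a.hi < b.hi
	}
	return a.lo < b.lo
}

func main() {
	a, _ := parseIP128("2001:db8::1")
	b, _ := parseIP128("2001:db8::ffff")
	fmt.Println(a.less(b), b.less(a))   // true false
	fmt.Printf("%#x %#x\n", a.hi, a.lo) // 0x20010db800000000 0x1
}
```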

Pre-computed private IP ranges (RFC 1918 and others):

type PrivateChecker struct {
    ranges []*net.IPNet
}

func NewPrivateChecker() *PrivateChecker {
    cidrs := []string{
        "10.0.0.0/8",
        "172.16.0.0/12",
        "192.168.0.0/16",
        "127.0.0.0/8",    // loopback
        "169.254.0.0/16", // link-local
        "::1/128",        // IPv6 loopback
        "fc00::/7",       // IPv6 private
        "100.64.0.0/10",  // RFC 6598 shared address space
    }
    pc := &PrivateChecker{}
    for _, cidr := range cidrs {
        _, network, err := net.ParseCIDR(cidr)
        if err == nil {
            pc.ranges = append(pc.ranges, network)
        }
    }
    return pc
}

func (pc *PrivateChecker) IsPrivate(ip net.IP) bool {
    for _, network := range pc.ranges {
        if network.Contains(ip) {
            return true
        }
    }
    return false
}
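
Much of this list is also available as standard-library predicates. The sketch below uses net/netip instead of a hand-maintained CIDR table; isNonPublic is an illustrative name. Two differences from the list above are worth noting: IsPrivate does not cover the RFC 6598 shared address space, so that prefix is checked separately, and IsLinkLocalUnicast also matches IPv6 fe80::/10.

```go
package main

import (
	"fmt"
	"net/netip"
)

// cgnat is the RFC 6598 shared address space, which IsPrivate does not cover.
var cgnat = netip.MustParsePrefix("100.64.0.0/10")

// isNonPublic reports whether an address should skip the geolocation lookup.
func isNonPublic(s string) (bool, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return false, err
	}
	addr = addr.Unmap() // treat ::ffff:10.0.0.1 like 10.0.0.1
	return addr.IsPrivate() || addr.IsLoopback() ||
		addr.IsLinkLocalUnicast() || cgnat.Contains(addr), nil
}

func main() {
	for _, s := range []string{"10.1.2.3", "8.8.8.8", "100.64.0.1", "::1"} {
		ok, _ := isNonPublic(s)
		fmt.Println(s, ok)
	}
}
```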

Binary Search: O(log n) IP Range Lookup

The core algorithm is binary search over the sorted array of IP range records:

type DB struct {
    data       []byte
    header     *Header
    recordSize int
    private    *PrivateChecker
}

type GeoInfo struct {
    IP          string  `json:"ip"`
    CountryCode string  `json:"country_code"`
    Country     string  `json:"country"`
    Region      string  `json:"region"`
    City        string  `json:"city"`
    ISP         string  `json:"isp"`
    Latitude    float32 `json:"latitude"`
    Longitude   float32 `json:"longitude"`
}

func (db *DB) Lookup(ipStr string) (*GeoInfo, error) {
    ip := net.ParseIP(ipStr)
    if ip == nil {
        return nil, fmt.Errorf("invalid IP address: %s", ipStr)
    }

    if db.private.IsPrivate(ip) {
        return &GeoInfo{IP: ipStr, Country: "PRIVATE", City: "PRIVATE"}, nil
    }

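    ip4 := ip.To4()
    if ip4 == nil {
        // ipv4ToUint32 would return 0 here and silently match 0.0.0.0
        return nil, fmt.Errorf("IPv6 lookup not supported: %s", ipStr)
    }

    recordOffset, err := db.binarySearch(ipv4ToUint32(ip4))
    if err != nil {
        return nil, err
    }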

    return db.parseRecord(recordOffset, ipStr), nil
}

func (db *DB) binarySearch(target uint32) (uint32, error) {
    count := int(db.header.IPv4Count)
    base := int(db.header.IPv4Addr) - 1 // file offset is 1-indexed

    low, high := 0, count-1

    for low <= high {
        mid := (low + high) / 2
        recordOffset := base + mid*db.recordSize

        ipFrom := binary.LittleEndian.Uint32(db.data[recordOffset:])
        ipTo   := binary.LittleEndian.Uint32(db.data[recordOffset+4:])

        switch {
        case target < ipFrom:
            high = mid - 1
        case target > ipTo:
            low = mid + 1
        default:
            return uint32(recordOffset), nil // found
        }
    }

    return 0, fmt.Errorf("no record for IP %d", target)
}

func (db *DB) parseRecord(offset uint32, ipStr string) *GeoInfo {
    data := db.data
    base := int(offset) + 8 // skip the 8-byte IP range

    countryCodeOffset := binary.LittleEndian.Uint32(data[base:])
    countryOffset     := binary.LittleEndian.Uint32(data[base+4:])
    regionOffset      := binary.LittleEndian.Uint32(data[base+8:])
    cityOffset        := binary.LittleEndian.Uint32(data[base+12:])
    ispOffset         := binary.LittleEndian.Uint32(data[base+16:])
    lat := math.Float32frombits(binary.LittleEndian.Uint32(data[base+20:]))
    lon := math.Float32frombits(binary.LittleEndian.Uint32(data[base+24:]))

    return &GeoInfo{
        IP:          ipStr,
        CountryCode: readString(data, countryCodeOffset),
        Country:     readString(data, countryOffset),
        Region:      readString(data, regionOffset),
        City:        readString(data, cityOffset),
        ISP:         readString(data, ispOffset),
        Latitude:    lat,
        Longitude:   lon,
    }
}
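
The same search can be exercised without a database file. The sketch below is a self-contained variant under stated assumptions: records are reduced to the 8-byte range (start and end) described earlier, the table is built in memory from made-up ranges, and the hand-written loop is swapped for sort.Search. The names table, newTable, and encode are hypothetical. Validating the length once in the constructor means later indexing cannot run past the slice.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"sort"
)

const recSize = 8 // simplified record: ipFrom (4 bytes) + ipTo (4 bytes)

// table is a sorted array of fixed-length records held in a byte slice.
type table struct {
	data  []byte
	count int
}

// newTable checks the size once so lookups never index out of range.
func newTable(data []byte) (*table, error) {
	if len(data)%recSize != 0 {
		return nil, errors.New("data length is not a multiple of the record size")
	}
	return &table{data: data, count: len(data) / recSize}, nil
}

func (t *table) from(i int) uint32 { return binary.LittleEndian.Uint32(t.data[i*recSize:]) }
func (t *table) to(i int) uint32   { return binary.LittleEndian.Uint32(t.data[i*recSize+4:]) }

// find returns the index of the record whose range contains target.
func (t *table) find(target uint32) (int, bool) {
	// Locate the first record whose end is >= target. It is a match only if
	// its start is also <= target; otherwise target falls in a gap.
	i := sort.Search(t.count, func(i int) bool { return t.to(i) >= target })
	if i < t.count && t.from(i) <= target {
		return i, true
	}
	return 0, false
}

// encode builds the byte table from (from, to) pairs. Test data only.
func encode(ranges [][2]uint32) []byte {
	buf := make([]byte, 0, len(ranges)*recSize)
	for _, r := range ranges {
		buf = binary.LittleEndian.AppendUint32(buf, r[0])
		buf = binary.LittleEndian.AppendUint32(buf, r[1])
	}
	return buf
}

func main() {
	tbl, err := newTable(encode([][2]uint32{
		{16777216, 16777471}, // 1.0.0.0 to 1.0.0.255
		{16777472, 16778239}, // 1.0.1.0 to 1.0.3.255
		{16779264, 16781311}, // made-up range that leaves a gap before it
	}))
	if err != nil {
		panic(err)
	}
	fmt.Println(tbl.find(16777729)) // 1 true   (1.0.2.1)
	fmt.Println(tbl.find(16778240)) // 0 false  (falls in the gap)
}
```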

LRU Cache: Accelerating Hot IPs

Although binary search is already fast (microseconds), for scenarios where the same IP is queried repeatedly (e.g., a security system repeatedly checking an attacker's IP), caching reduces CPU overhead further.

LRU (Least Recently Used) evicts the entry that has gone the longest without being accessed:

import "github.com/hashicorp/golang-lru/v2"

type CachedDB struct {
    db    *DB
    cache *lru.Cache[string, *GeoInfo]
}

func NewCachedDB(db *DB, cacheSize int) (*CachedDB, error) {
    cache, err := lru.New[string, *GeoInfo](cacheSize)
    if err != nil {
        return nil, err
    }
    return &CachedDB{db: db, cache: cache}, nil
}

func (c *CachedDB) Lookup(ipStr string) (*GeoInfo, error) {
    if info, ok := c.cache.Get(ipStr); ok {
        return info, nil // cache hit
    }

    info, err := c.db.Lookup(ipStr)
    if err != nil {
        return nil, err
    }

    c.cache.Add(ipStr, info)
    return info, nil
}

github.com/hashicorp/golang-lru/v2 uses generics, is thread-safe, and implements O(1) reads and writes using a doubly linked list + hash map.
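
The doubly linked list + hash map combination can be sketched with container/list from the standard library. This is a teaching-sized illustration, not the library's implementation: lruCache is a hypothetical type, it stores strings only, assumes a capacity of at least 1, and is not safe for concurrent use.

```go
package main

import (
	"container/list"
	"fmt"
)

type entry struct {
	key, value string
}

// lruCache is a minimal LRU: the map gives O(1) lookup, and the list keeps
// entries ordered from most to least recently used.
type lruCache struct {
	capacity int
	order    *list.List
	items    map[string]*list.Element
}

func newLRU(capacity int) *lruCache {
	return &lruCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Get returns the cached value and marks the entry as most recently used.
func (c *lruCache) Get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Add inserts or updates an entry, evicting the least recently used one
// when the cache grows past its capacity.
func (c *lruCache) Add(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key, value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	c := newLRU(2)
	c.Add("1.1.1.1", "AU")
	c.Add("8.8.8.8", "US")
	c.Get("1.1.1.1")       // touch: 8.8.8.8 is now the oldest entry
	c.Add("9.9.9.9", "CH") // evicts 8.8.8.8
	_, ok := c.Get("8.8.8.8")
	fmt.Println(ok) // false
}
```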

HTTP API Endpoint

package main

import (
    "encoding/json"
    "log"
    "net"
    "net/http"
    "strings"
    "time"
)

type Server struct {
    db  *CachedDB
    mux *http.ServeMux
}

func NewServer(db *CachedDB) *Server {
    s := &Server{db: db, mux: http.NewServeMux()}
    s.mux.HandleFunc("/lookup/", s.handleLookup)
    s.mux.HandleFunc("/batch", s.handleBatch)
    s.mux.HandleFunc("/health", s.handleHealth)
    return s
}

// GET /lookup/{ip}
func (s *Server) handleLookup(w http.ResponseWriter, r *http.Request) {
    ipStr := strings.TrimPrefix(r.URL.Path, "/lookup/")
    ipStr = strings.TrimSpace(ipStr)

    if ipStr == "" {
        ipStr = getClientIP(r) // look up the caller's own IP
    }

    if net.ParseIP(ipStr) == nil {
        http.Error(w, `{"error":"invalid IP address"}`, http.StatusBadRequest)
        return
    }

    info, err := s.db.Lookup(ipStr)
    if err != nil {
        http.Error(w, `{"error":"lookup failed"}`, http.StatusInternalServerError)
        log.Printf("lookup error for %s: %v", ipStr, err)
        return
    }

    w.Header().Set("Content-Type", "application/json")
    w.Header().Set("Cache-Control", "public, max-age=3600") // IP geo changes slowly
    json.NewEncoder(w).Encode(info)
}

// POST /batch (batch lookup)
func (s *Server) handleBatch(w http.ResponseWriter, r *http.Request) {
    if r.Method != http.MethodPost {
        http.Error(w, "Method Not Allowed", http.StatusMethodNotAllowed)
        return
    }

    var ips []string
    if err := json.NewDecoder(r.Body).Decode(&ips); err != nil {
        http.Error(w, `{"error":"invalid JSON"}`, http.StatusBadRequest)
        return
    }

    if len(ips) > 100 {
        http.Error(w, `{"error":"max 100 IPs per batch"}`, http.StatusBadRequest)
        return
    }

    results := make([]*GeoInfo, 0, len(ips))
    for _, ip := range ips {
        if net.ParseIP(ip) == nil {
            results = append(results, &GeoInfo{IP: ip, Country: "INVALID"})
            continue
        }
        info, err := s.db.Lookup(ip)
        if err != nil {
            results = append(results, &GeoInfo{IP: ip, Country: "UNKNOWN"})
            continue
        }
        results = append(results, info)
    }

    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(results)
}

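// getClientIP extracts the client IP from reverse-proxy headers, falling back
// to the TCP peer address. These headers are client-controlled: honor them only
// when every request arrives through a proxy that overwrites them, otherwise a
// caller can spoof its address (and dodge per-IP rate limits).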
func getClientIP(r *http.Request) string {
    for _, header := range []string{"X-Real-IP", "X-Forwarded-For", "CF-Connecting-IP"} {
        if ip := r.Header.Get(header); ip != "" {
            if idx := strings.IndexByte(ip, ','); idx != -1 {
                ip = ip[:idx] // X-Forwarded-For may contain multiple IPs
            }
            ip = strings.TrimSpace(ip)
            if net.ParseIP(ip) != nil {
                return ip
            }
        }
    }
    host, _, _ := net.SplitHostPort(r.RemoteAddr)
    return host
}

func (s *Server) handleHealth(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(map[string]string{
        "status": "ok",
        "time":   time.Now().UTC().Format(time.RFC3339),
    })
}

func main() {
    mmapFile, err := OpenMmap("IP2LOCATION-LITE-DB11.BIN")
    if err != nil {
        log.Fatalf("open database: %v", err)
    }
    defer mmapFile.Close()

    db, err := NewDB(mmapFile)
    if err != nil {
        log.Fatalf("init database: %v", err)
    }

    // Cache the 100,000 most recently seen IPs
    cachedDB, err := NewCachedDB(db, 100_000)
    if err != nil {
        log.Fatalf("init cache: %v", err)
    }

    server := NewServer(cachedDB)

    httpServer := &http.Server{
        Addr:         ":8080",
        Handler:      server.mux,
        ReadTimeout:  5 * time.Second,
        WriteTimeout: 10 * time.Second,
        IdleTimeout:  60 * time.Second,
    }

    log.Println("IP lookup service listening on :8080")
    log.Fatal(httpServer.ListenAndServe())
}
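
Forwarding headers are supplied by whoever sends the request, so a stricter variant of getClientIP only believes them when the TCP peer is a known proxy. The following is a minimal sketch under stated assumptions: a single proxy hop, a placeholder trusted range of 10.0.0.0/8, and hypothetical names (trustedProxies, clientIP, clientIPFor).

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/netip"
	"strings"
)

// trustedProxies lists the networks whose forwarding headers are believed.
// The value is a placeholder; substitute your own proxy ranges.
var trustedProxies = []netip.Prefix{netip.MustParsePrefix("10.0.0.0/8")}

func isTrusted(addr netip.Addr) bool {
	addr = addr.Unmap()
	for _, p := range trustedProxies {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}

// clientIP returns the TCP peer address unless the peer is a trusted proxy.
// In that case it uses the last X-Forwarded-For entry, which is the one the
// trusted proxy itself appended.
func clientIP(r *http.Request) string {
	host, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		host = r.RemoteAddr
	}
	peer, err := netip.ParseAddr(host)
	if err != nil || !isTrusted(peer) {
		return host
	}
	parts := strings.Split(r.Header.Get("X-Forwarded-For"), ",")
	last := strings.TrimSpace(parts[len(parts)-1])
	if _, err := netip.ParseAddr(last); err != nil {
		return host // header missing or malformed: fall back to the peer
	}
	return last
}

// clientIPFor builds a request for demonstration purposes.
func clientIPFor(remoteAddr, xff string) string {
	r := &http.Request{RemoteAddr: remoteAddr, Header: http.Header{}}
	if xff != "" {
		r.Header.Set("X-Forwarded-For", xff)
	}
	return clientIP(r)
}

func main() {
	fmt.Println(clientIPFor("203.0.113.9:4000", "1.2.3.4"))            // 203.0.113.9
	fmt.Println(clientIPFor("10.0.0.5:4000", "6.6.6.6, 198.51.100.7")) // 198.51.100.7
}
```

With several proxy layers, the usual approach is to walk the header from right to left and stop at the first address that is not a trusted proxy.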

Level 4 · Advanced Topics

CGo: Binding the C Library

MaxMind provides libmaxminddb, a C library for reading its MMDB database format. CGo is Go's mechanism for calling C code:

package maxmind

/*
#cgo LDFLAGS: -lmaxminddb
#include <maxminddb.h>
#include <stdlib.h>

// cgo cannot call variadic C functions such as MMDB_get_value, and it cannot
// read members of the anonymous union in MMDB_entry_data_s by name, so two
// small C helpers do that work.
static int get_country_iso(MMDB_entry_s *entry, MMDB_entry_data_s *out) {
    const char *const path[] = {"country", "iso_code", NULL};
    return MMDB_aget_value(entry, out, path);
}

static const char *entry_utf8(const MMDB_entry_data_s *d) {
    return d->utf8_string;
}
*/
import "C"
import (
    "fmt"
    "unsafe"
)

type MMDBReader struct {
    db C.MMDB_s
}

func Open(path string) (*MMDBReader, error) {
    cPath := C.CString(path)
    defer C.free(unsafe.Pointer(cPath))

    r := &MMDBReader{}
    status := C.MMDB_open(cPath, C.MMDB_MODE_MMAP, &r.db)
    if status != C.MMDB_SUCCESS {
        return nil, fmt.Errorf("MMDB_open: %s", C.GoString(C.MMDB_strerror(status)))
    }
    return r, nil
}

func (r *MMDBReader) Lookup(ipStr string) (map[string]interface{}, error) {
    cIP := C.CString(ipStr)
    defer C.free(unsafe.Pointer(cIP))

    var gaiError, mmdbError C.int
    result := C.MMDB_lookup_string(&r.db, cIP, &gaiError, &mmdbError)

    if gaiError != 0 {
        return nil, fmt.Errorf("getaddrinfo error: %d", gaiError)
    }
    if mmdbError != C.MMDB_SUCCESS {
        return nil, fmt.Errorf("MMDB error: %s", C.GoString(C.MMDB_strerror(mmdbError)))
    }
    if !result.found_entry {
        return nil, nil
    }

    // The lookup path lives in C, so no C strings are allocated per call.
    var entryData C.MMDB_entry_data_s
    status := C.get_country_iso(&result.entry, &entryData)
    if status == C.MMDB_SUCCESS && entryData.has_data {
        countryCode := C.GoStringN(C.entry_utf8(&entryData), C.int(entryData.data_size))
        return map[string]interface{}{"country_code": countryCode}, nil
    }
    return nil, nil
}

func (r *MMDBReader) Close() {
    C.MMDB_close(&r.db)
}

CGo's performance cost: each Go-to-C function call carries ~20–50 ns overhead (thread state management and stack adjustment). For batch queries this is tolerable. For ultra-high-frequency lookup (millions per second), a pure-Go implementation is often faster because it eliminates CGo call overhead entirely.

Deep net.IP and net.IPNet Operations

import (
    "fmt"
    "net"
)

func demonstrateIPNetOps() {
    _, network, _ := net.ParseCIDR("192.168.1.0/24")

    ip := net.ParseIP("192.168.1.100")
    fmt.Println(network.Contains(ip)) // true

    // Compute broadcast address
    mask := network.Mask
    broadcast := make(net.IP, len(network.IP))
    for i := range network.IP {
        broadcast[i] = network.IP[i] | ^mask[i]
    }
    fmt.Println("Network address:", network.IP)
    fmt.Println("Broadcast address:", broadcast)

    // Count IPs in the subnet
    ones, bits := mask.Size()
    count := 1 << uint(bits-ones)
    fmt.Printf("Subnet contains %d IP addresses\n", count)
}

// Detect IP version
func detectIPVersion(ipStr string) string {
    ip := net.ParseIP(ipStr)
    if ip == nil {
        return "invalid"
    }
    if ip.To4() != nil {
        return "IPv4"
    }
    return "IPv6"
}

IP Reputation Lookups: Real-Time Threat Intelligence

IP geolocation is commonly combined with an IP reputation system that detects malicious addresses:

type ReputationLevel int

const (
    ReputationClean     ReputationLevel = iota
    ReputationSuspect                   // suspicious activity
    ReputationMalicious                 // confirmed malicious
    ReputationBanned                    // hard block
)

type ThreatInfo struct {
    IP         string
    Level      ReputationLevel
    Categories []string // "spam", "botnet", "tor-exit", "vpn", "proxy"
    LastSeen   time.Time
    Confidence float32 // 0.0 to 1.0
}

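// Local blocklist using a Bloom filter for memory-efficient membership testing.
// A Bloom filter never misses an added IP but can report false positives, so
// treat a hit as "possibly blocked" and confirm it against an exact source.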
type IPBlocklist struct {
    filter *bloom.BloomFilter
    mu     sync.RWMutex
}

func (bl *IPBlocklist) IsBlocked(ip string) bool {
    bl.mu.RLock()
    defer bl.mu.RUnlock()
    return bl.filter.TestString(ip)
}

func (bl *IPBlocklist) LoadFromFile(path string) error {
    f, err := os.Open(path)
    if err != nil {
        return err
    }
    defer f.Close()

    bl.mu.Lock()
    defer bl.mu.Unlock()

    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        line := strings.TrimSpace(scanner.Text())
        if line == "" || strings.HasPrefix(line, "#") {
            continue
        }
        bl.filter.AddString(line)
    }
    return scanner.Err()
}
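
Because a Bloom filter can answer "present" for an address that was never added, a common arrangement is to use it as a cheap first pass and confirm hits against an exact source. The sketch below illustrates that idea with a teaching-sized filter built on hash/fnv. It is not the API of any Bloom filter library; tinyBloom and blocklist are hypothetical types, and the in-memory map stands in for whatever authoritative store a real deployment would use.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// tinyBloom is a small Bloom filter: m bits, k probes per key.
type tinyBloom struct {
	bits []uint64
	m, k uint64
}

func newTinyBloom(m, k uint64) *tinyBloom {
	return &tinyBloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// positions derives k bit positions from two hashes (double hashing).
func (b *tinyBloom) positions(key string) []uint64 {
	h1 := fnv.New64a()
	h1.Write([]byte(key))
	h2 := fnv.New64()
	h2.Write([]byte(key))
	base, step := h1.Sum64(), h2.Sum64()|1 // odd step so probes differ
	pos := make([]uint64, b.k)
	for i := uint64(0); i < b.k; i++ {
		pos[i] = (base + i*step) % b.m
	}
	return pos
}

func (b *tinyBloom) Add(key string) {
	for _, p := range b.positions(key) {
		b.bits[p/64] |= uint64(1) << (p % 64)
	}
}

// Test returns false only if the key was definitely never added.
func (b *tinyBloom) Test(key string) bool {
	for _, p := range b.positions(key) {
		if b.bits[p/64]&(uint64(1)<<(p%64)) == 0 {
			return false
		}
	}
	return true
}

// blocklist rejects most clean IPs with the filter alone and consults the
// exact set only when the filter reports a possible hit.
type blocklist struct {
	filter *tinyBloom
	exact  map[string]bool // stand-in for an authoritative store
}

func newBlocklist(ips []string) *blocklist {
	bl := &blocklist{filter: newTinyBloom(1024, 3), exact: make(map[string]bool)}
	for _, ip := range ips {
		bl.filter.Add(ip)
		bl.exact[ip] = true
	}
	return bl
}

func (bl *blocklist) IsBlocked(ip string) bool {
	return bl.filter.Test(ip) && bl.exact[ip]
}

func main() {
	bl := newBlocklist([]string{"203.0.113.7", "198.51.100.23"})
	fmt.Println(bl.IsBlocked("203.0.113.7")) // true
	fmt.Println(bl.IsBlocked("192.0.2.1"))   // false
}
```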

Embedding the Database: go:embed + Compression

For small-scale deployments, embed the entire IP database into the binary:

import (
    "bytes"
    "compress/gzip"
    "embed"
    "io"
)

//go:embed data/IP2LOCATION-LITE-DB1.BIN.gz
var embeddedDB embed.FS

func openEmbeddedDB() (*DB, error) {
    compressedData, err := embeddedDB.ReadFile("data/IP2LOCATION-LITE-DB1.BIN.gz")
    if err != nil {
        return nil, fmt.Errorf("reading embedded database: %w", err)
    }

    gr, err := gzip.NewReader(bytes.NewReader(compressedData))
    if err != nil {
        return nil, err
    }
    defer gr.Close()

    data, err := io.ReadAll(gr)
    if err != nil {
        return nil, err
    }

    return NewDBFromBytes(data)
}

IP2Location LITE DB1 (country-only data) is ~2 MB uncompressed, ~800 KB gzip-compressed. The resulting service ships as a single ~10 MB binary with zero external dependencies.
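
The decompression step can be tried without an embedded file by round-tripping bytes in memory. In this sketch gzipBytes stands in for the build step that produces the .gz file, and gunzipLimited adds a size cap so that a corrupted or unexpected payload cannot exhaust memory at startup; both names and the cap are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"errors"
	"fmt"
	"io"
)

// gzipBytes compresses data in memory.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil { // Close flushes the gzip trailer
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzipLimited decompresses the payload and fails if it is larger than maxSize.
func gunzipLimited(compressed []byte, maxSize int64) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer zr.Close()

	// Read one byte past the limit so an oversized payload is detectable.
	data, err := io.ReadAll(io.LimitReader(zr, maxSize+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > maxSize {
		return nil, errors.New("decompressed data exceeds size limit")
	}
	return data, nil
}

func main() {
	original := bytes.Repeat([]byte("IP2LOCATION"), 1000)
	compressed, err := gzipBytes(original)
	if err != nil {
		panic(err)
	}
	restored, err := gunzipLimited(compressed, 1<<20)
	fmt.Println(bytes.Equal(original, restored), err) // true <nil>
}
```

Unlike the mmap path, the decompressed database lives on the Go heap for the life of the process, which is acceptable at DB1's size and less attractive for larger databases.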

Rate-Limiting the Lookup API

The IP lookup API itself needs rate limiting to prevent abuse:

import "golang.org/x/time/rate"

type RateLimitedServer struct {
    *Server
    limiters sync.Map // map[clientIP]*rate.Limiter
    rps      float64
    burst    int
}

func (s *RateLimitedServer) getLimiter(clientIP string) *rate.Limiter {
    // LoadOrStore is atomic — no race condition
    actual, _ := s.limiters.LoadOrStore(
        clientIP,
        rate.NewLimiter(rate.Limit(s.rps), s.burst),
    )
    return actual.(*rate.Limiter)
}

func (s *RateLimitedServer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    clientIP := getClientIP(r)
    if !s.getLimiter(clientIP).Allow() {
        w.Header().Set("Retry-After", "1")
        http.Error(w, `{"error":"rate limit exceeded"}`, http.StatusTooManyRequests)
        return
    }
    s.Server.mux.ServeHTTP(w, r)
}

// Periodically evict limiters for idle clients (prevent memory leak)
func (s *RateLimitedServer) cleanupLimiters() {
    for range time.Tick(10 * time.Minute) {
        s.limiters.Range(func(key, value interface{}) bool {
            l := value.(*rate.Limiter)
            // A full bucket means the client has been idle for at least burst/rps seconds
            if l.Tokens() >= float64(s.burst) {
                s.limiters.Delete(key)
            }
            return true
        })
    }
}
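
The limiter's behavior, including why a full bucket signals an idle client, follows from the token-bucket model. The sketch below is a simplified illustration with an injected clock so it can be tested deterministically. It is not the implementation in golang.org/x/time/rate; bucket and simulate are hypothetical names, and the type is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"time"
)

// bucket is a minimal token bucket. rate is tokens added per second,
// burst is the maximum number of tokens it can hold.
type bucket struct {
	tokens, burst, rate float64
	last                time.Time
}

func newBucket(rate float64, burst int, now time.Time) *bucket {
	return &bucket{tokens: float64(burst), burst: float64(burst), rate: rate, last: now}
}

// allow refills according to elapsed time, then spends one token if available.
func (b *bucket) allow(now time.Time) bool {
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.burst {
		b.tokens = b.burst // a client idle long enough ends up with a full bucket
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// simulate replays requests arriving at the given millisecond offsets and
// returns the allow/deny decision for each.
func simulate(rate float64, burst int, offsetsMs []int) []bool {
	start := time.Unix(0, 0)
	b := newBucket(rate, burst, start)
	out := make([]bool, len(offsetsMs))
	for i, ms := range offsetsMs {
		out[i] = b.allow(start.Add(time.Duration(ms) * time.Millisecond))
	}
	return out
}

func main() {
	// 1 request/second with a burst of 2: two immediate requests pass, the
	// third is denied, and one more token is available a second later.
	fmt.Println(simulate(1, 2, []int{0, 0, 0, 1000, 1000})) // [true true false true false]
}
```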

Performance Characteristics

A complete Go IP lookup service on modest hardware (4-core CPU, 8 GB RAM) achieves:

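┌──────────────────────────┬─────────────┬────────────────┐
│ Scenario                 │ p99 latency │ Throughput     │
├──────────────────────────┼─────────────┼────────────────┤
│ Single lookup (no cache) │ ~5 µs       │ ~500,000 QPS   │
│ Single lookup (LRU hit)  │ ~0.5 µs     │ ~2,000,000 QPS │
│ HTTP API (localhost)     │ ~0.5 ms     │ ~50,000 QPS    │
│ HTTP API (rate-limited)  │ ~0.5 ms     │ ~10,000 QPS    │
└──────────────────────────┴─────────────┴────────────────┘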

The key optimizations and their contributions:

  1. mmap: converts file I/O to memory access, eliminating system-call overhead
  2. Binary search: O(log n) lookup — 4 million records require only 22 comparisons
  3. LRU cache: repeated queries for hot IPs become O(1) hash lookups
  4. Fixed-length records: offset arithmetic replaces traversal
  5. String pool: repeated strings (country names) stored once, saving space

The Mental Model: A Transferable Pattern

The architecture presented in this chapter (binary format + memory-mapped file + binary search + cache) transfers to any scenario requiring high-performance read-only lookups.

Go's encoding/binary, syscall.Mmap, and net packages are a complete toolkit for implementing this class of system. The discipline of thinking explicitly about byte layout, endianness, and access patterns — rather than hiding them behind an ORM or a database engine — is what separates systems programmers from application programmers. This chapter is a practical exercise in that discipline.
