IP Lookup Service: Binary Protocol Parsing
Every time you visit a website, the server on the other side can infer your geographic location, your ISP, and sometimes even your device type — all from your IP address. This capability underpins content localization, ad targeting, security enforcement, and regulatory compliance across a vast swath of the modern internet.
IP geolocation is a fascinating engineering problem: how do you find, with minimum latency, the geographic record corresponding to an IP address in a database containing millions of entries? The answer is not in a faster CPU. It is in data structure design and binary format choices.
This chapter uses an IP lookup service as the central example to explain binary file format parsing, memory-mapped files, binary search, LRU caching, and HTTP API design. These techniques combine to deliver microsecond-latency IP lookups — without a database engine.
Level 1 · Use Cases and Background
Why IP Geolocation Matters
Content localization: serve content in the correct language and comply with jurisdiction-specific regulations (e.g., GDPR requirements for EU visitors).
Traffic security: detect anomalous traffic originating from high-risk regions; identify VPN/proxy/Tor exit nodes — a foundational capability for Web Application Firewalls (WAF).
Ad targeting: deliver geographically targeted advertising, one of the core mechanisms of internet advertising.
Compliance enforcement: streaming rights are licensed per region (Netflix's library differs by country). IP geolocation is the technical foundation for region-based access control.
Network diagnostics: determining which Autonomous System (AS) and ISP own a given IP is essential for network operations.
MaxMind GeoIP vs IP2Location
The two most widely used IP database providers:
MaxMind GeoIP2:
- Format: MMDB (MaxMind DB), MaxMind's own binary format, organized as a search tree rather than a flat sorted array
- Characteristics: high data quality; free GeoLite2 edition (account required)
- Go library: github.com/oschwald/geoip2-golang
- Accuracy: ~80% at city level, ~99% at country level
IP2Location:
- Format: custom binary format with open documentation
- Characteristics: completely free LITE edition; simple format — ideal for understanding binary protocol parsing from first principles
- Accuracy: comparable to MaxMind
This chapter uses IP2Location BIN format as the primary example, because its format is fully publicly documented, making it an ideal vehicle for teaching binary parsing principles.
Text Format vs. Binary Format: Why There Is a 100× Performance Gap
Consider storing IP geolocation data in CSV:
1.0.0.0,1.0.0.255,AU,Australia,Queensland,Brisbane,...
1.0.1.0,1.0.3.255,CN,China,Fujian,Fuzhou,...
Looking up 1.0.2.1 requires:
- Scanning line by line (or loading into memory and parsing row by row)
- Converting the string "1.0.1.0" to the integer 16777472 for range comparison
- Repeated string parsing and memory allocation on every query
Binary format eliminates all of this:
- IP addresses are stored as 4-byte integers (IPv4) or 16-byte integers (IPv6)
- Records are fixed-length, so offset arithmetic replaces traversal
- No parsing — direct memory reads
- Binary search finds the target in O(log n) comparisons
For a database with 4 million records, binary search requires only about 22 comparisons (log₂(4,000,000) ≈ 22), completing each lookup in microseconds.
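The arithmetic above can be checked with a minimal sketch. The helper names `ipv4ToInt` and `maxComparisons` are illustrative and not part of any library; the sketch only shows how a dotted-quad string maps to the integer a binary database would store, and how many probes a binary search needs in the worst case.

```go
package main

import (
	"fmt"
	"math/bits"
	"net/netip"
)

// ipv4ToInt converts a dotted-quad string to its 32-bit integer value
// (most significant octet first).
func ipv4ToInt(s string) (uint32, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return 0, err
	}
	if !addr.Is4() {
		return 0, fmt.Errorf("%s is not an IPv4 address", s)
	}
	b := addr.As4()
	return uint32(b[0])<<24 | uint32(b[1])<<16 | uint32(b[2])<<8 | uint32(b[3]), nil
}

// maxComparisons returns the worst-case number of probes binary search
// needs over n sorted records: floor(log2(n)) + 1, which is the bit length of n.
func maxComparisons(n uint) int {
	return bits.Len(n)
}
```

Calling `ipv4ToInt("1.0.1.0")` yields 16777472 (1·2²⁴ + 1·2⁸), and `maxComparisons(4_000_000)` yields 22, matching the estimate in the text.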
Level 2 · The IP2Location BIN Format: A Deep Dive
File Layout
An IP2Location BIN file consists of a fixed-size header followed by an index section and a data section for each address family:
┌─────────────────────────────────────────────────────┐
│ File Header (64 bytes) │
├─────────────────────────────────────────────────────┤
│ IPv4 Index Section (optional, for acceleration)│
├─────────────────────────────────────────────────────┤
│ IPv4 Data Records (fixed-length, sorted by IP) │
├─────────────────────────────────────────────────────┤
│ IPv6 Index Section (optional) │
├─────────────────────────────────────────────────────┤
│ IPv6 Data Records │
└─────────────────────────────────────────────────────┘
Parsing the File Header
// IP2Location BIN file header. binary.Read decodes these fields packed
// (no padding) from the start of the 64-byte header region.
type Header struct {
DBType uint8 // database type (1=DB1 country, 2=DB2 country+city, ...)
DBColumn uint8 // number of columns per record
DBYear uint8
DBMonth uint8
DBDay uint8
IPv4Count uint32 // number of IPv4 records
IPv4Addr uint32 // file offset of IPv4 data section
IPv6Count uint32
IPv6Addr uint32
IPv4IndexAddr uint32 // file offset of IPv4 index (0 = no index)
IPv6IndexAddr uint32
ProductCode uint8
LicenseCode uint8
DatabaseSize uint32
}
Reading the header using encoding/binary:
import (
"encoding/binary"
"os"
"fmt"
)
func readHeader(f *os.File) (*Header, error) {
var h Header
// IP2Location uses little-endian byte order
if err := binary.Read(f, binary.LittleEndian, &h); err != nil {
return nil, fmt.Errorf("reading header: %w", err)
}
return &h, nil
}
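A hedged sketch of the same decoding step against an in-memory buffer, which is convenient for unit tests. The `header` type mirrors the chapter's `Header` struct; its field order is taken from the text above and should be checked against the official format documentation before use. `decodeHeader` and the zero-column sanity check are illustrative additions.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// header mirrors the chapter's Header struct (assumed layout).
// encoding/binary decodes it packed, so it occupies 35 bytes.
type header struct {
	DBType, DBColumn, DBYear, DBMonth, DBDay uint8
	IPv4Count, IPv4Addr                       uint32
	IPv6Count, IPv6Addr                       uint32
	IPv4IndexAddr, IPv6IndexAddr              uint32
	ProductCode, LicenseCode                  uint8
	DatabaseSize                              uint32
}

// decodeHeader decodes a little-endian header from the start of buf.
func decodeHeader(buf []byte) (header, error) {
	var h header
	if need := binary.Size(h); len(buf) < need {
		return h, fmt.Errorf("header needs %d bytes, got %d", need, len(buf))
	}
	if err := binary.Read(bytes.NewReader(buf), binary.LittleEndian, &h); err != nil {
		return h, fmt.Errorf("decoding header: %w", err)
	}
	// Illustrative sanity check: a record with zero columns cannot be valid.
	if h.DBColumn == 0 {
		return h, fmt.Errorf("invalid header: zero columns")
	}
	return h, nil
}
```

Because `binary.Read` ignores Go's in-memory struct padding, the struct definition doubles as the wire layout; `binary.Size` reports the packed size.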
Endianness: Why It Matters
Multi-byte integers can be stored in two ways:
- Big-Endian: the most significant byte is at the lowest address. 0x12345678 is stored as [0x12, 0x34, 0x56, 0x78]. Network protocols (TCP/IP) use big-endian, hence the term "network byte order."
- Little-Endian: the least significant byte is at the lowest address. The same value is stored as [0x78, 0x56, 0x34, 0x12]. x86 CPUs are little-endian, and ARM CPUs run little-endian in practically all mainstream configurations, so this is the native byte order on most machines today.
IP2Location uses little-endian for multi-byte integers. Reading with the wrong endianness turns a valid "USA" string offset into a wildly incorrect number, pointing to garbage data or causing a crash.
func demonstrateEndianness() {
data := []byte{0x78, 0x56, 0x34, 0x12}
// Correct: little-endian
leValue := binary.LittleEndian.Uint32(data) // = 0x12345678 = 305419896
// Wrong: big-endian
beValue := binary.BigEndian.Uint32(data) // = 0x78563412 = 2018915346
fmt.Printf("little-endian: %d\n", leValue) // 305419896
fmt.Printf("big-endian: %d\n", beValue) // 2018915346 — wrong!
}
Data Record Format
Each IPv4 record:
┌───────────────┬───────────────┬───────────────────────────────────────────┐
│ IP Start (4B) │ IP End (4B) │ Data Fields (offsets into string pool) │
└───────────────┴───────────────┴───────────────────────────────────────────┘
Strings (country names, city names) are not stored inline. They live in a string pool at the end of the file; the record stores a 4-byte offset into that pool. This means "China" is stored once, no matter how many records reference it.
type Record struct {
IPFrom uint32 // start of IP range (as uint32)
IPTo uint32 // end of IP range
// The following are offsets into the string pool
CountryShort uint32 // offset of country code ("CN")
CountryLong uint32 // offset of full country name ("China")
Region uint32
City uint32
ISP uint32
Latitude float32 // stored directly as IEEE 754 float
Longitude float32
}
// IP2Location string format: [1-byte length][content bytes]
func readString(data []byte, offset uint32) string {
if int(offset) >= len(data) {
return ""
}
length := int(data[offset])
start := int(offset) + 1
end := start + length
if end > len(data) {
return ""
}
return string(data[start:end])
}
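The writing side of the string pool can be sketched as follows. This is an illustrative construction, not IP2Location's own tooling: the `stringPool` type and its methods are hypothetical, and the only detail taken from the text is the `[1-byte length][content bytes]` encoding.

```go
package main

// stringPool appends length-prefixed strings to a byte slice and remembers
// the offset of each distinct string, so duplicates are stored only once.
type stringPool struct {
	data    []byte
	offsets map[string]uint32
}

func newStringPool() *stringPool {
	return &stringPool{offsets: make(map[string]uint32)}
}

// intern returns the pool offset of s, appending it on first use.
// It reports false if s does not fit the 1-byte length prefix.
func (p *stringPool) intern(s string) (uint32, bool) {
	if len(s) > 255 {
		return 0, false
	}
	if off, ok := p.offsets[s]; ok {
		return off, true
	}
	off := uint32(len(p.data))
	p.data = append(p.data, byte(len(s)))
	p.data = append(p.data, s...)
	p.offsets[s] = off
	return off, true
}

// lookup reads the string stored at off, with bounds checks.
func (p *stringPool) lookup(off uint32) (string, bool) {
	if int(off) >= len(p.data) {
		return "", false
	}
	n := int(p.data[off])
	start := int(off) + 1
	if start+n > len(p.data) {
		return "", false
	}
	return string(p.data[start : start+n]), true
}
```

Interning "China" twice returns the same offset both times, which is what lets millions of records share a single copy of each name.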
Memory-Mapped Files: Zero-Copy Reads
For frequently-read, read-only files, memory mapping (mmap) is the most efficient access mechanism. It maps the file directly into the process's virtual address space. The OS page cache handles on-demand loading, and read() system calls — along with their kernel-to-userspace data copies — are eliminated entirely.
import (
"encoding/binary"
"fmt"
"os"
"syscall"
)
type MmapFile struct {
data []byte
f *os.File
}
func OpenMmap(path string) (*MmapFile, error) {
f, err := os.Open(path)
if err != nil {
return nil, err
}
stat, err := f.Stat()
if err != nil {
f.Close()
return nil, err
}
data, err := syscall.Mmap(
int(f.Fd()),
0,
int(stat.Size()),
syscall.PROT_READ,
syscall.MAP_SHARED,
)
if err != nil {
f.Close()
return nil, fmt.Errorf("mmap: %w", err)
}
return &MmapFile{data: data, f: f}, nil
}
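func (m *MmapFile) Close() error {
// Unmap first, then close the descriptor; report the first error encountered.
unmapErr := syscall.Munmap(m.data)
m.data = nil
closeErr := m.f.Close()
if unmapErr != nil {
return fmt.Errorf("munmap: %w", unmapErr)
}
return closeErr
}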
// Direct offset-based access — no system call required
func (m *MmapFile) ReadUint32LE(offset int) uint32 {
return binary.LittleEndian.Uint32(m.data[offset:])
}
Three core advantages of mmap over os.File.Read():
- Zero copy: the OS maps disk pages directly into the process address space; no kernel→userspace copy
- Page-cache sharing: when multiple processes open the same file, the kernel maintains a single shared page cache
- Efficient random access: pointer arithmetic instead of seek system calls
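Offset-based access into a mapped region panics if the offset comes from a corrupt or truncated file, because slicing past the end of a byte slice is a runtime error in Go. A minimal sketch of a bounds-checked accessor follows; `readUint32LE` is an illustrative name, not a library function.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// readUint32LE reads a little-endian uint32 at offset, returning an error
// instead of panicking when fewer than 4 bytes remain at that offset.
func readUint32LE(data []byte, offset int) (uint32, error) {
	if offset < 0 || offset > len(data)-4 {
		return 0, fmt.Errorf("offset %d out of range for %d-byte region", offset, len(data))
	}
	return binary.LittleEndian.Uint32(data[offset:]), nil
}
```

The check costs one comparison per read, which is usually a reasonable price for turning a process crash into a reportable error.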
Level 3 · Implementing the IP Lookup
IP Address Parsing: IPv4 and IPv6
The net package provides IP address parsing and manipulation:
import "net"
// Convert IPv4 address to uint32 for comparison with database integers
func ipv4ToUint32(ip net.IP) uint32 {
// net.IP can be 4-byte (IPv4) or 16-byte (IPv4-mapped IPv6)
ip = ip.To4()
if ip == nil {
return 0
}
// IP bytes are big-endian
return uint32(ip[0])<<24 | uint32(ip[1])<<16 | uint32(ip[2])<<8 | uint32(ip[3])
}
// Reconstruct net.IP from uint32
func uint32ToIPv4(n uint32) net.IP {
return net.IPv4(byte(n>>24), byte(n>>16), byte(n>>8), byte(n))
}
Pre-computed private IP ranges (RFC 1918 and others):
type PrivateChecker struct {
ranges []*net.IPNet
}
func NewPrivateChecker() *PrivateChecker {
cidrs := []string{
"10.0.0.0/8",
"172.16.0.0/12",
"192.168.0.0/16",
"127.0.0.0/8", // loopback
"169.254.0.0/16", // link-local
"::1/128", // IPv6 loopback
"fc00::/7", // IPv6 private
"100.64.0.0/10", // RFC 6598 shared address space
}
pc := &PrivateChecker{}
for _, cidr := range cidrs {
_, network, err := net.ParseCIDR(cidr)
if err == nil {
pc.ranges = append(pc.ranges, network)
}
}
return pc
}
func (pc *PrivateChecker) IsPrivate(ip net.IP) bool {
for _, network := range pc.ranges {
if network.Contains(ip) {
return true
}
}
return false
}
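As an alternative sketch, the same check can be written with the standard library's `net/netip` package, whose `Addr` and `Prefix` types are comparable values that do not allocate. The names `nonPublicPrefixes` and `isNonPublic` are illustrative; the prefix list is the one used above.

```go
package main

import "net/netip"

// nonPublicPrefixes lists the same ranges as the PrivateChecker above.
var nonPublicPrefixes = []netip.Prefix{
	netip.MustParsePrefix("10.0.0.0/8"),
	netip.MustParsePrefix("172.16.0.0/12"),
	netip.MustParsePrefix("192.168.0.0/16"),
	netip.MustParsePrefix("127.0.0.0/8"),    // loopback
	netip.MustParsePrefix("169.254.0.0/16"), // link-local
	netip.MustParsePrefix("::1/128"),        // IPv6 loopback
	netip.MustParsePrefix("fc00::/7"),       // IPv6 unique local
	netip.MustParsePrefix("100.64.0.0/10"),  // RFC 6598 shared address space
}

// isNonPublic reports whether s falls in any of the ranges above.
func isNonPublic(s string) (bool, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return false, err
	}
	// Prefix.Contains does not match IPv4-mapped IPv6 addresses against
	// IPv4 prefixes, so normalize ::ffff:a.b.c.d to a.b.c.d first.
	addr = addr.Unmap()
	for _, p := range nonPublicPrefixes {
		if p.Contains(addr) {
			return true, nil
		}
	}
	return false, nil
}
```

The `Unmap` call matters: without it, an address such as `::ffff:10.1.2.3` would be treated as public.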
Binary Search: O(log n) IP Range Lookup
The core algorithm is binary search over the sorted array of IP range records:
import (
"encoding/binary"
"fmt"
"math"
"net"
)
type DB struct {
data []byte
header *Header
recordSize int
private *PrivateChecker
}
type GeoInfo struct {
IP string `json:"ip"`
CountryCode string `json:"country_code"`
Country string `json:"country"`
Region string `json:"region"`
City string `json:"city"`
ISP string `json:"isp"`
Latitude float32 `json:"latitude"`
Longitude float32 `json:"longitude"`
}
func (db *DB) Lookup(ipStr string) (*GeoInfo, error) {
ip := net.ParseIP(ipStr)
if ip == nil {
return nil, fmt.Errorf("invalid IP address: %s", ipStr)
}
if db.private.IsPrivate(ip) {
return &GeoInfo{IP: ipStr, Country: "PRIVATE", City: "PRIVATE"}, nil
}
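ip4 := ip.To4()
if ip4 == nil {
// This lookup only searches the IPv4 section. Without this check an IPv6
// address would silently be converted to 0 and looked up as 0.0.0.0.
return nil, fmt.Errorf("IPv6 lookup not supported: %s", ipStr)
}
ipInt := ipv4ToUint32(ip4)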
recordOffset, err := db.binarySearch(ipInt)
if err != nil {
return nil, err
}
return db.parseRecord(recordOffset, ipStr), nil
}
func (db *DB) binarySearch(target uint32) (uint32, error) {
count := int(db.header.IPv4Count)
base := int(db.header.IPv4Addr) - 1 // file offset is 1-indexed
low, high := 0, count-1
for low <= high {
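mid := low + (high-low)/2
recordOffset := base + mid*db.recordSize
// Guard against a corrupt header pointing outside the mapped file.
if recordOffset < 0 || recordOffset+8 > len(db.data) {
return 0, fmt.Errorf("record %d lies outside the file", mid)
}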
ipFrom := binary.LittleEndian.Uint32(db.data[recordOffset:])
ipTo := binary.LittleEndian.Uint32(db.data[recordOffset+4:])
switch {
case target < ipFrom:
high = mid - 1
case target > ipTo:
low = mid + 1
default:
return uint32(recordOffset), nil // found
}
}
return 0, fmt.Errorf("no record for IP %d", target)
}
func (db *DB) parseRecord(offset uint32, ipStr string) *GeoInfo {
data := db.data
base := int(offset) + 8 // skip the 8-byte IP range
countryCodeOffset := binary.LittleEndian.Uint32(data[base:])
countryOffset := binary.LittleEndian.Uint32(data[base+4:])
regionOffset := binary.LittleEndian.Uint32(data[base+8:])
cityOffset := binary.LittleEndian.Uint32(data[base+12:])
ispOffset := binary.LittleEndian.Uint32(data[base+16:])
lat := math.Float32frombits(binary.LittleEndian.Uint32(data[base+20:]))
lon := math.Float32frombits(binary.LittleEndian.Uint32(data[base+24:]))
return &GeoInfo{
IP: ipStr,
CountryCode: readString(data, countryCodeOffset),
Country: readString(data, countryOffset),
Region: readString(data, regionOffset),
City: readString(data, cityOffset),
ISP: readString(data, ispOffset),
Latitude: lat,
Longitude: lon,
}
}
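The search can be exercised without a real database file. The sketch below builds a few synthetic records in memory and searches them with `sort.Search`; the 12-byte record layout (`from`, `to`, one payload word) and the names `appendRecord` and `findRange` are assumptions made for illustration, not the IP2Location layout.

```go
package main

import (
	"encoding/binary"
	"errors"
	"sort"
)

// recSize is the size of one hypothetical record: from(4) + to(4) + payload(4).
const recSize = 12

var errNotFound = errors.New("no matching range")

// appendRecord appends one little-endian record to buf.
func appendRecord(buf []byte, from, to, payload uint32) []byte {
	buf = binary.LittleEndian.AppendUint32(buf, from)
	buf = binary.LittleEndian.AppendUint32(buf, to)
	return binary.LittleEndian.AppendUint32(buf, payload)
}

// findRange returns the payload of the record whose [from, to] range
// contains target. Records must be sorted by from and must not overlap.
func findRange(buf []byte, target uint32) (uint32, error) {
	n := len(buf) / recSize
	// Find the first record whose range end is >= target.
	i := sort.Search(n, func(i int) bool {
		return binary.LittleEndian.Uint32(buf[i*recSize+4:]) >= target
	})
	if i == n {
		return 0, errNotFound // target lies beyond the last range
	}
	rec := buf[i*recSize : (i+1)*recSize]
	if binary.LittleEndian.Uint32(rec) > target {
		return 0, errNotFound // target falls in a gap between ranges
	}
	return binary.LittleEndian.Uint32(rec[8:]), nil
}
```

With records for 1.0.0.0–1.0.0.255 and 1.0.1.0–1.0.3.255, looking up 16777729 (1.0.2.1) lands in the second record. Addresses that fall between ranges are reported as not found rather than being matched to a neighbor.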
LRU Cache: Accelerating Hot IPs
Although binary search is already fast (microseconds), for scenarios where the same IP is queried repeatedly (e.g., a security system repeatedly checking an attacker's IP), caching reduces CPU overhead further.
LRU (Least Recently Used) evicts the entry that has gone the longest without being accessed:
import "github.com/hashicorp/golang-lru/v2"
type CachedDB struct {
db *DB
cache *lru.Cache[string, *GeoInfo]
}
func NewCachedDB(db *DB, cacheSize int) (*CachedDB, error) {
cache, err := lru.New[string, *GeoInfo](cacheSize)
if err != nil {
return nil, err
}
return &CachedDB{db: db, cache: cache}, nil
}
func (c *CachedDB) Lookup(ipStr string) (*GeoInfo, error) {
if info, ok := c.cache.Get(ipStr); ok {
return info, nil // cache hit
}
info, err := c.db.Lookup(ipStr)
if err != nil {
return nil, err
}
c.cache.Add(ipStr, info)
return info, nil
}
github.com/hashicorp/golang-lru/v2 uses generics, is thread-safe, and implements O(1) reads and writes using a doubly linked list + hash map.
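To show what "doubly linked list + hash map" means in practice, here is a minimal, non-thread-safe sketch built on the standard library's `container/list`. It is a teaching aid rather than a substitute for the library above; `lruCache` and its methods are illustrative names, and the capacity is assumed to be at least 1.

```go
package main

import "container/list"

type lruEntry struct {
	key, value string
}

// lruCache keeps entries in recency order: the front of the list is the
// most recently used entry, the back is the next eviction candidate.
type lruCache struct {
	capacity int
	order    *list.List
	items    map[string]*list.Element
}

func newLRU(capacity int) *lruCache {
	return &lruCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// get returns the cached value and marks it as most recently used.
func (c *lruCache) get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*lruEntry).value, true
}

// add inserts or updates a value, evicting the least recently used entry
// when the cache grows past its capacity.
func (c *lruCache) add(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*lruEntry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&lruEntry{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*lruEntry).key)
	}
}
```

The map gives O(1) lookup of a list element, and the list gives O(1) reordering and eviction, which is why both operations stay constant-time regardless of cache size.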
HTTP API Endpoint
package main
import (
"encoding/json"
"log"
"net"
"net/http"
"strings"
"time"
)
type Server struct {
db *CachedDB
mux *http.ServeMux
}
func NewServer(db *CachedDB) *Server {
s := &Server{db: db, mux: http.NewServeMux()}
s.mux.HandleFunc("/lookup/", s.handleLookup)
s.mux.HandleFunc("/batch", s.handleBatch)
s.mux.HandleFunc("/health", s.handleHealth)
return s
}
// GET /lookup/{ip}
func (s *Server) handleLookup(w http.ResponseWriter, r *http.Request) {
ipStr := strings.TrimPrefix(r.URL.Path, "/lookup/")
ipStr = strings.TrimSpace(ipStr)
if ipStr == "" {
ipStr = getClientIP(r) // look up the caller's own IP
}
if net.ParseIP(ipStr) == nil {
http.Error(w, `{"error":"invalid IP address"}`, http.StatusBadRequest)
return
}
info, err := s.db.Lookup(ipStr)
if err != nil {
http.Error(w, `{"error":"lookup failed"}`, http.StatusInternalServerError)
log.Printf("lookup error for %s: %v", ipStr, err)
return
}
w.Header().Set("Content-Type", "application/json")
w.Header().Set("Cache-Control", "public, max-age=3600") // IP geo changes slowly
json.NewEncoder(w).Encode(info)
}
// POST /batch (batch lookup)
func (s *Server) handleBatch(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodPost {
http.Error(w, "Method Not Allowed", http.StatusMethodNotAllowed)
return
}
var ips []string
if err := json.NewDecoder(r.Body).Decode(&ips); err != nil {
http.Error(w, `{"error":"invalid JSON"}`, http.StatusBadRequest)
return
}
if len(ips) > 100 {
http.Error(w, `{"error":"max 100 IPs per batch"}`, http.StatusBadRequest)
return
}
results := make([]*GeoInfo, 0, len(ips))
for _, ip := range ips {
if net.ParseIP(ip) == nil {
results = append(results, &GeoInfo{IP: ip, Country: "INVALID"})
continue
}
info, err := s.db.Lookup(ip)
if err != nil {
results = append(results, &GeoInfo{IP: ip, Country: "UNKNOWN"})
continue
}
results = append(results, info)
}
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(results)
}
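// getClientIP extracts the client IP, honoring reverse-proxy headers.
// These headers are client-controlled unless a trusted proxy overwrites them,
// so only rely on them when the service is deployed behind such a proxy.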
func getClientIP(r *http.Request) string {
for _, header := range []string{"X-Real-IP", "X-Forwarded-For", "CF-Connecting-IP"} {
if ip := r.Header.Get(header); ip != "" {
if idx := strings.IndexByte(ip, ','); idx != -1 {
ip = ip[:idx] // X-Forwarded-For may contain multiple IPs
}
ip = strings.TrimSpace(ip)
if net.ParseIP(ip) != nil {
return ip
}
}
}
host, _, _ := net.SplitHostPort(r.RemoteAddr)
return host
}
func (s *Server) handleHealth(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(map[string]string{
"status": "ok",
"time": time.Now().UTC().Format(time.RFC3339),
})
}
func main() {
mmapFile, err := OpenMmap("IP2LOCATION-LITE-DB11.BIN")
if err != nil {
log.Fatalf("open database: %v", err)
}
defer mmapFile.Close()
db, err := NewDB(mmapFile)
if err != nil {
log.Fatalf("init database: %v", err)
}
// Cache the 100,000 most recently seen IPs
cachedDB, err := NewCachedDB(db, 100_000)
if err != nil {
log.Fatalf("init cache: %v", err)
}
server := NewServer(cachedDB)
httpServer := &http.Server{
Addr: ":8080",
Handler: server.mux,
ReadTimeout: 5 * time.Second,
WriteTimeout: 10 * time.Second,
IdleTimeout: 60 * time.Second,
}
log.Println("IP lookup service listening on :8080")
log.Fatal(httpServer.ListenAndServe())
}
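One detail worth checking in tests: `http.Error` sets `Content-Type: text/plain`, so a JSON error body sent through it is labeled as plain text. The sketch below shows one possible way to send JSON errors with the matching content type and to exercise a handler in memory with `net/http/httptest`. The `lookupFunc` type is a hypothetical stand-in for `CachedDB.Lookup`, and mapping a failed lookup to 404 is a design choice of this sketch, not of the chapter's handlers.

```go
package main

import (
	"encoding/json"
	"net"
	"net/http"
	"net/http/httptest"
	"strings"
)

// lookupFunc is a hypothetical stand-in for CachedDB.Lookup.
type lookupFunc func(ip string) (map[string]string, error)

// writeJSON sends v as JSON with the given status code.
func writeJSON(w http.ResponseWriter, status int, v any) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(v)
}

// lookupHandler serves GET /lookup/{ip} using the supplied lookup function.
func lookupHandler(lookup lookupFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ip := strings.TrimPrefix(r.URL.Path, "/lookup/")
		if net.ParseIP(ip) == nil {
			writeJSON(w, http.StatusBadRequest, map[string]string{"error": "invalid IP address"})
			return
		}
		info, err := lookup(ip)
		if err != nil {
			writeJSON(w, http.StatusNotFound, map[string]string{"error": "no record"})
			return
		}
		writeJSON(w, http.StatusOK, info)
	}
}

// exercise sends an in-memory GET request to h and returns status and body.
func exercise(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, strings.TrimSpace(rec.Body.String())
}
```

Passing the lookup as a function keeps the handler testable without a database file: a test can supply a closure that returns canned results.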
Level 4 · Advanced Topics
CGo: Binding the C Library
MaxMind provides libmaxminddb, a C library for reading MMDB files. CGo is Go's mechanism for calling C code:
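package maxmind
/*
#cgo LDFLAGS: -lmaxminddb
#include <maxminddb.h>
#include <stdlib.h>

// cgo cannot call variadic C functions such as MMDB_get_value, and it cannot
// read members of the anonymous union inside MMDB_entry_data_s by name.
// This small wrapper does both on the C side.
static int country_iso_code(MMDB_entry_s *entry, const char **out, uint32_t *len) {
const char *const path[] = {"country", "iso_code", NULL};
MMDB_entry_data_s data;
*out = NULL;
*len = 0;
int status = MMDB_aget_value(entry, &data, path);
if (status != MMDB_SUCCESS) {
return status;
}
if (data.has_data && data.type == MMDB_DATA_TYPE_UTF8_STRING) {
*out = data.utf8_string;
*len = data.data_size;
}
return MMDB_SUCCESS;
}
*/
import "C"
import (
"fmt"
"unsafe"
)
type MMDBReader struct {
db C.MMDB_s
}
func Open(path string) (*MMDBReader, error) {
cPath := C.CString(path)
defer C.free(unsafe.Pointer(cPath))
r := &MMDBReader{}
status := C.MMDB_open(cPath, C.MMDB_MODE_MMAP, &r.db)
if status != C.MMDB_SUCCESS {
return nil, fmt.Errorf("MMDB_open: %s", C.GoString(C.MMDB_strerror(status)))
}
return r, nil
}
// Lookup returns the country ISO code for ipStr, or nil if there is no entry.
func (r *MMDBReader) Lookup(ipStr string) (map[string]interface{}, error) {
cIP := C.CString(ipStr)
defer C.free(unsafe.Pointer(cIP))
var gaiError, mmdbError C.int
result := C.MMDB_lookup_string(&r.db, cIP, &gaiError, &mmdbError)
if gaiError != 0 {
return nil, fmt.Errorf("getaddrinfo error: %d", gaiError)
}
if mmdbError != C.MMDB_SUCCESS {
return nil, fmt.Errorf("MMDB error: %s", C.GoString(C.MMDB_strerror(mmdbError)))
}
if !result.found_entry {
return nil, nil
}
// The returned pointer refers to the mapped database, so nothing is freed here.
var cStr *C.char
var cLen C.uint32_t
status := C.country_iso_code(&result.entry, &cStr, &cLen)
if status != C.MMDB_SUCCESS {
return nil, fmt.Errorf("MMDB error: %s", C.GoString(C.MMDB_strerror(status)))
}
if cStr == nil {
return nil, nil
}
countryCode := C.GoStringN(cStr, C.int(cLen))
return map[string]interface{}{"country_code": countryCode}, nil
}
func (r *MMDBReader) Close() {
C.MMDB_close(&r.db)
}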
CGo's performance cost: each Go-to-C function call carries a fixed overhead, typically tens of nanoseconds depending on Go version and hardware (thread state management and stack switching). For batch queries this is tolerable. For ultra-high-frequency lookup (millions per second), a pure-Go implementation is often faster because it eliminates CGo call overhead entirely.
Deep net.IP and net.IPNet Operations
import (
"fmt"
"net"
)
func demonstrateIPNetOps() {
_, network, _ := net.ParseCIDR("192.168.1.0/24")
ip := net.ParseIP("192.168.1.100")
fmt.Println(network.Contains(ip)) // true
// Compute broadcast address
mask := network.Mask
broadcast := make(net.IP, len(network.IP))
for i := range network.IP {
broadcast[i] = network.IP[i] | ^mask[i]
}
fmt.Println("Network address:", network.IP)
fmt.Println("Broadcast address:", broadcast)
// Count IPs in the subnet
ones, bits := mask.Size()
count := 1 << uint(bits-ones)
fmt.Printf("Subnet contains %d IP addresses\n", count)
}
// Detect IP version
func detectIPVersion(ipStr string) string {
ip := net.ParseIP(ipStr)
if ip == nil {
return "invalid"
}
if ip.To4() != nil {
return "IPv4"
}
return "IPv6"
}
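IPv6 range checks follow the same idea as the IPv4 integer comparison, except that the value is 128 bits wide. A minimal sketch, assuming addresses are compared in their 16-byte big-endian form, where lexicographic byte order equals numeric order; `to16` and `inRange6` are illustrative names. A file that stores these values little-endian would need its bytes reversed before such a comparison.

```go
package main

import (
	"bytes"
	"net/netip"
)

// to16 parses an address and returns its 16-byte big-endian form.
// IPv4 input comes back in IPv4-mapped form (::ffff:a.b.c.d).
func to16(s string) ([16]byte, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return [16]byte{}, err
	}
	return addr.As16(), nil
}

// inRange6 reports whether ip lies in the inclusive range [from, to].
func inRange6(ip, from, to string) (bool, error) {
	a, err := to16(ip)
	if err != nil {
		return false, err
	}
	lo, err := to16(from)
	if err != nil {
		return false, err
	}
	hi, err := to16(to)
	if err != nil {
		return false, err
	}
	// For big-endian byte strings of equal length, bytes.Compare
	// orders them exactly as the 128-bit integers they encode.
	return bytes.Compare(lo[:], a[:]) <= 0 && bytes.Compare(a[:], hi[:]) <= 0, nil
}
```

In a real lookup path the range bounds would come from the database rather than from strings; parsing all three here just keeps the sketch self-contained.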
IP Reputation Lookups: Real-Time Threat Intelligence
IP geolocation is commonly combined with an IP reputation system that detects malicious addresses:
import (
"bufio"
"os"
"strings"
"sync"
"time"

// Assumed library: the chapter does not name one. This package provides
// BloomFilter with the AddString and TestString methods used below.
"github.com/bits-and-blooms/bloom/v3"
)
type ReputationLevel int
const (
ReputationClean ReputationLevel = iota
ReputationSuspect // suspicious activity
ReputationMalicious // confirmed malicious
ReputationBanned // hard block
)
type ThreatInfo struct {
IP string
Level ReputationLevel
Categories []string // "spam", "botnet", "tor-exit", "vpn", "proxy"
LastSeen time.Time
Confidence float32 // 0.0 to 1.0
}
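// Local blocklist using a Bloom filter for memory-efficient membership testing.
// A Bloom filter never misses an added IP but can report false positives, so
// treat a hit as "possibly blocked" and confirm against the authoritative list.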
type IPBlocklist struct {
filter *bloom.BloomFilter
mu sync.RWMutex
}
func (bl *IPBlocklist) IsBlocked(ip string) bool {
bl.mu.RLock()
defer bl.mu.RUnlock()
return bl.filter.TestString(ip)
}
func (bl *IPBlocklist) LoadFromFile(path string) error {
f, err := os.Open(path)
if err != nil {
return err
}
defer f.Close()
bl.mu.Lock()
defer bl.mu.Unlock()
scanner := bufio.NewScanner(f)
for scanner.Scan() {
line := strings.TrimSpace(scanner.Text())
if line == "" || strings.HasPrefix(line, "#") {
continue
}
bl.filter.AddString(line)
}
return scanner.Err()
}
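To make the Bloom filter's behavior concrete, here is a small pure-Go sketch using FNV-1a and double hashing. It is illustrative only: `bloomFilter` and its methods are hypothetical names, the hashing scheme is one common choice among several, and both `mBits` and `k` are assumed to be greater than zero.

```go
package main

import "hash/fnv"

// bloomFilter is a fixed-size bit set probed by k derived hash functions.
type bloomFilter struct {
	bits []uint64
	m    uint64 // number of bits
	k    uint64 // number of probes per item
}

func newBloomFilter(mBits, k uint64) *bloomFilter {
	return &bloomFilter{bits: make([]uint64, (mBits+63)/64), m: mBits, k: k}
}

// hashes derives two 64-bit hashes from s; probe i uses h1 + i*h2.
func (b *bloomFilter) hashes(s string) (uint64, uint64) {
	h := fnv.New64a()
	h.Write([]byte(s))
	h1 := h.Sum64()
	h.Write([]byte{0xff}) // extend the input to obtain a second hash
	h2 := h.Sum64() | 1   // force odd so successive probes differ
	return h1, h2
}

// add sets the k bits selected for s.
func (b *bloomFilter) add(s string) {
	h1, h2 := b.hashes(s)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		b.bits[pos/64] |= 1 << (pos % 64)
	}
}

// mayContain returns false if s was definitely never added, and true if
// it was added or if its bits happen to collide with other entries.
func (b *bloomFilter) mayContain(s string) bool {
	h1, h2 := b.hashes(s)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		if b.bits[pos/64]&(1<<(pos%64)) == 0 {
			return false
		}
	}
	return true
}
```

The asymmetry is the important part: a "no" is definitive, while a "yes" is probabilistic. For a blocklist that means no blocked address slips through, but an occasional clean address may need a second check.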
Embedding the Database: go:embed + Compression
For small-scale deployments, embed the entire IP database into the binary:
import (
"bytes"
"compress/gzip"
"embed"
"io"
)
//go:embed data/IP2LOCATION-LITE-DB1.BIN.gz
var embeddedDB embed.FS
func openEmbeddedDB() (*DB, error) {
compressedData, err := embeddedDB.ReadFile("data/IP2LOCATION-LITE-DB1.BIN.gz")
if err != nil {
return nil, fmt.Errorf("reading embedded database: %w", err)
}
gr, err := gzip.NewReader(bytes.NewReader(compressedData))
if err != nil {
return nil, err
}
defer gr.Close()
data, err := io.ReadAll(gr)
if err != nil {
return nil, err
}
return NewDBFromBytes(data)
}
IP2Location LITE DB1 (country-only data) is ~2 MB uncompressed, ~800 KB gzip-compressed. The resulting service ships as a single ~10 MB binary with zero external dependencies.
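The compression step can be tried in memory before wiring it into a build. This is a minimal sketch: `gzipBytes` and `gunzipBytes` are illustrative helpers, and the `maxSize` cap is an added safeguard (not part of the chapter's code) so that a corrupt or unexpected payload cannot expand without bound.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipBytes compresses raw at the highest compression level.
func gzipBytes(raw []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw, err := gzip.NewWriterLevel(&buf, gzip.BestCompression)
	if err != nil {
		return nil, err
	}
	if _, err := zw.Write(raw); err != nil {
		return nil, err
	}
	// Close flushes the remaining data and writes the gzip trailer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzipBytes decompresses data, refusing to produce more than maxSize bytes.
func gunzipBytes(compressed []byte, maxSize int64) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	// Read one byte past the limit so an oversized payload is detectable.
	data, err := io.ReadAll(io.LimitReader(zr, maxSize+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > maxSize {
		return nil, fmt.Errorf("decompressed data exceeds %d bytes", maxSize)
	}
	return data, nil
}
```

Note the trade-off of embedding a compressed database: the decompressed copy lives on the Go heap, so the page-cache sharing that mmap provides no longer applies.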
Rate-Limiting the Lookup API
The IP lookup API itself needs rate limiting to prevent abuse:
import (
"net/http"
"sync"
"time"

"golang.org/x/time/rate"
)
type RateLimitedServer struct {
*Server
limiters sync.Map // map[clientIP]*rate.Limiter
rps float64
burst int
}
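func (s *RateLimitedServer) getLimiter(clientIP string) *rate.Limiter {
// Fast path: avoid allocating a new limiter for clients we already track.
if existing, ok := s.limiters.Load(clientIP); ok {
return existing.(*rate.Limiter)
}
// LoadOrStore is atomic: if two goroutines race here, both get the same limiter.
actual, _ := s.limiters.LoadOrStore(
clientIP,
rate.NewLimiter(rate.Limit(s.rps), s.burst),
)
return actual.(*rate.Limiter)
}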
func (s *RateLimitedServer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
clientIP := getClientIP(r)
if !s.getLimiter(clientIP).Allow() {
w.Header().Set("Retry-After", "1")
http.Error(w, `{"error":"rate limit exceeded"}`, http.StatusTooManyRequests)
return
}
s.Server.mux.ServeHTTP(w, r)
}
// Periodically evict limiters for idle clients (prevent memory leak)
func (s *RateLimitedServer) cleanupLimiters() {
for range time.Tick(10 * time.Minute) {
s.limiters.Range(func(key, value interface{}) bool {
l := value.(*rate.Limiter)
// A full bucket means the client has been idle for at least burst/rps seconds
if l.Tokens() >= float64(s.burst) {
s.limiters.Delete(key)
}
return true
})
}
}
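The token-bucket model behind `rate.Limiter` can be sketched in a few lines. This is a simplified illustration, not the library's implementation: `tokenBucket` is a hypothetical type, it is not safe for concurrent use, and the clock is passed in as seconds so the behavior is deterministic and easy to test.

```go
package main

// tokenBucket refills at rate tokens per second up to burst tokens.
// Each allowed request consumes one token.
type tokenBucket struct {
	rate   float64 // tokens added per second
	burst  float64 // bucket capacity
	tokens float64 // tokens currently available
	last   float64 // time of the last refill, in seconds
}

// newTokenBucket returns a full bucket as of time now.
func newTokenBucket(rate float64, burst int, now float64) *tokenBucket {
	return &tokenBucket{rate: rate, burst: float64(burst), tokens: float64(burst), last: now}
}

// allow refills the bucket for the time elapsed since the last call,
// then reports whether a token was available and consumes it if so.
func (b *tokenBucket) allow(now float64) bool {
	if elapsed := now - b.last; elapsed > 0 {
		b.tokens += elapsed * b.rate
		if b.tokens > b.burst {
			b.tokens = b.burst
		}
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}
```

With a rate of 2 per second and a burst of 3, a client can send three requests at once, is then refused, and regains one request after half a second. A bucket that has refilled completely is indistinguishable from a new one, which is why the cleanup loop above can safely delete limiters whose bucket is full.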
Performance Characteristics
A complete Go IP lookup service on modest hardware (4-core CPU, 8 GB RAM) achieves:
| Scenario | p99 latency | Throughput |
|---|---|---|
| Single lookup (no cache) | ~5 µs | ~500,000 QPS |
| Single lookup (LRU hit) | ~0.5 µs | ~2,000,000 QPS |
| HTTP API (localhost) | ~0.5 ms | ~50,000 QPS |
| HTTP API (rate-limited) | ~0.5 ms | ~10,000 QPS |
The key optimizations and their contributions:
- mmap: converts file I/O to memory access, eliminating system-call overhead
- Binary search: O(log n) lookup — 4 million records require only 22 comparisons
- LRU cache: repeated queries for hot IPs reduce to an O(1) hash lookup
- Fixed-length records: offset arithmetic replaces traversal
- String pool: repeated strings (country names) stored once, saving space
The Mental Model: A Transferable Pattern
The architecture presented in this chapter — binary format + memory-mapped file + binary search + cache — transfers to any scenario requiring high-performance read-only lookups:
- GeoIP databases
- ASN databases
- Certificate Revocation Lists (CRL)
- Security intelligence feeds
- DNS zone files stored in binary format
Go's encoding/binary, syscall.Mmap, and net packages are a complete toolkit for implementing this class of system. The discipline of thinking explicitly about byte layout, endianness, and access patterns — rather than hiding them behind an ORM or a database engine — is what separates systems programmers from application programmers. This chapter is a practical exercise in that discipline.