Chapter 35

Number Theory and Mathematical Foundations

Chapter 35: Number Theory and Mathematical Foundations

You wrote an encryption algorithm that needs to compute 2^1000000007 mod 998244353. If you multiply in a loop a billion times, the program won't finish before the heat death of the universe. But with fast exponentiation, 50 multiplications suffice. You need to check if a number is prime — for numbers on the order of 10^18, trial division requires 10^9 divisions, while Miller-Rabin needs only a few dozen modular exponentiations.

Number theory is not a mathematician's toy. It is the cornerstone of cryptography (RSA relies on the difficulty of factoring large primes), the theoretical guarantee of hash functions (uniform distribution of modular arithmetic), and a core tool in competitive programming (combinatorial counting requires modular inverses). This chapter starts from the most basic GCD and gradually builds a complete number theory toolkit.

Level 1 · What You Need to Know

1.1 Greatest Common Divisor (GCD) and Euclid's Algorithm

Problem Definition: Given two positive integers a and b, find their greatest common divisor gcd(a, b).

Core Theorem: gcd(a, b) = gcd(b, a mod b), and when b = 0, gcd(a, 0) = a.

def gcd(a: int, b: int) -> int:
    """Euclidean algorithm for GCD
    
    Time complexity: O(log(min(a, b)))
    Space complexity: O(1) iterative version
    """
    while b:
        a, b = b, a % b
    return a

# Recursive version (more intuitive but risk of stack overflow)
def gcd_recursive(a: int, b: int) -> int:
    if b == 0:
        return a
    return gcd_recursive(b, a % b)

Complexity Analysis: The number of iterations in Euclid's algorithm is at most O(log(min(a, b))). This is because after every two iterations, the larger number is at least halved. Specifically, if a > b, then a mod b < a/2 (proof: if b ≤ a/2, then a mod b < b ≤ a/2; if b > a/2, then a mod b = a - b < a/2). Therefore at most 2·log₂(min(a, b)) iterations are needed.

Worst case: When a and b are adjacent Fibonacci numbers, Euclid's algorithm reaches its worst case. For example, gcd(F_{n+1}, F_n) requires n-1 iterations because F_{n+1} mod F_n = F_{n-1}. This also proves the complexity is O(log n), since F_n ≈ φ^n / √5, so n ≈ log_φ(F_n).

1.2 Least Common Multiple (LCM)

Definition: lcm(a, b) is the smallest positive integer divisible by both a and b.

Relationship to GCD: lcm(a, b) = a · b / gcd(a, b)

Why does this formula hold? Factor a and b into primes: a = p₁^α₁ · p₂^α₂ · ... , b = p₁^β₁ · p₂^β₂ · .... Then gcd(a, b) = ∏ pᵢ^min(αᵢ, βᵢ), lcm(a, b) = ∏ pᵢ^max(αᵢ, βᵢ). Since min(x, y) + max(x, y) = x + y, we get gcd(a, b) · lcm(a, b) = a · b.

def lcm(a: int, b: int) -> int:
    """Least Common Multiple
    
    Note: divide first, multiply second to prevent overflow
    (in fixed-precision languages)
    """
    return a // gcd(a, b) * b

# LCM of multiple numbers
def lcm_multiple(numbers: list) -> int:
    from functools import reduce
    return reduce(lcm, numbers)

Common mistake: Computing a * b / gcd(a, b) directly may cause integer overflow in C/C++/Java. The correct approach is to divide first: a / gcd(a, b) * b. Python's arbitrary-precision integers don't overflow, but in competitive programming with C++ this is critical.

1.3 Primality Testing

Trial Division: A natural number n > 1 is prime if it cannot be divided by any integer from 2 to √n.

Why only check up to √n? If n = a · b with a ≤ b, then a ≤ √n. So if n has a factor between 1 and n (exclusive), there must be one no greater than √n.

def is_prime(n: int) -> bool:
    """Trial division primality test
    
    Time complexity: O(√n)
    """
    if n < 2:
        return False
    if n < 4:
        return True
    if n % 2 == 0 or n % 3 == 0:
        return False
    # All primes > 3 are of the form 6k±1
    i = 5
    while i * i <= n:
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True

Why the 6k±1 optimization works: All integers can be written as 6k, 6k+1, 6k+2, 6k+3, 6k+4, or 6k+5. Among these, 6k, 6k+2, 6k+4 are divisible by 2, and 6k+3 is divisible by 3. So primes greater than 3 can only be of the form 6k+1 or 6k+5 (i.e., 6k-1). This skips 2/3 of candidate divisors.

1.4 Sieve of Eratosthenes

Problem: Find all primes from 1 to n.

Idea: Starting from 2, mark all multiples of each prime as composite. Unmarked numbers are prime.

def sieve_of_eratosthenes(n: int) -> list:
    """Sieve of Eratosthenes
    
    Time complexity: O(n log log n)
    Space complexity: O(n)
    """
    if n < 2:
        return []
    
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    
    # Only need to sieve up to √n
    i = 2
    while i * i <= n:
        if is_prime[i]:
            # Start marking from i² (i*2, i*3, ..., i*(i-1) already marked)
            for j in range(i * i, n + 1, i):
                is_prime[j] = False
        i += 1
    
    return [i for i in range(2, n + 1) if is_prime[i]]

Why start marking from i²? When processing prime p, the numbers p·2, p·3, ..., p·(p-1) have already been marked when processing 2, 3, ..., p-1. For example, when p=5: 5·2=10 (marked by 2), 5·3=15 (marked by 3), 5·4=20 (marked by 2). The first new number to mark is 5·5=25.

Complexity analysis: Total operations are ∑{p≤n, p prime} n/p = n · ∑{p≤n} 1/p. By Mertens' theorem (Mertens, 1874), the sum of reciprocals of primes ∑_{p≤n} 1/p = ln(ln n) + M (M is the Meissel-Mertens constant ≈ 0.2615). Thus total complexity is O(n log log n).

1.5 Linear Sieve (Euler's Sieve)

The problem with the Sieve of Eratosthenes is that a composite number may be marked multiple times. For example, 30 gets marked by 2, 3, and 5. The linear sieve ensures each composite is marked exactly once by its smallest prime factor, achieving strict O(n) time.

def linear_sieve(n: int) -> list:
    """Linear Sieve (Euler's Sieve)
    
    Time complexity: O(n)
    Space complexity: O(n)
    
    Core idea: each composite m is marked only once by its
    smallest prime factor lpf(m)
    """
    is_prime = [True] * (n + 1)
    primes = []
    
    for i in range(2, n + 1):
        if is_prime[i]:
            primes.append(i)
        
        for p in primes:
            if i * p > n:
                break
            is_prime[i * p] = False
            # Key: stop when p divides i
            # Because for subsequent p' > p, the smallest prime
            # factor of i*p' is p (since p|i), and should be
            # marked when processing (i*p'/p) * p
            if i % p == 0:
                break
    
    return primes

Why if i % p == 0: break guarantees linearity: Suppose i = p · k. Then for any prime q > p in the primes list, i · q = p · k · q. The smallest prime factor of this number is p (since p|i, so p|i·q), not q. If we continue marking i · q, this number would be marked twice — once now (incorrectly attributing q as smallest prime factor) and once in the future when iterating to k · q (correctly marked by p). So we must stop when p|i.

Linear Sieve vs Sieve of Eratosthenes in practice:

Property	Eratosthenes	Linear Sieve
Time complexity	O(n log log n)	O(n)
Constant factor	Small (cache-friendly)	Large (poor branch prediction)
Actual speed (n=10⁸)	About the same	About the same
Additional info	None	Gives smallest prime factor
Code complexity	Simple	Slightly complex

In practice for n ≤ 10⁸, both are similarly fast. The linear sieve's advantage is providing each number's smallest prime factor, useful for fast factorization.

1.6 Common Errors and Pitfalls

Error 1: GCD argument order

Euclid's algorithm doesn't require a > b. If a < b, the first step gives a % b = a (since a < b), resulting in gcd(b, a) — the arguments are automatically swapped.

Error 2: LCM overflow

# Dangerous in C++ (not an issue in Python):
# long long lcm = a * b / gcd(a, b);  // a*b may overflow!
# Correct:
# long long lcm = a / gcd(a, b) * b;  // divide first

Error 3: Array bounds in sieve

# Wrong: range should go to n+1
is_prime = [True] * n  # should be n+1
# Accessing is_prime[n] causes out-of-bounds

Error 4: Treating 1 as prime

1 is neither prime nor composite. This convention exists because the Fundamental Theorem of Arithmetic requires unique factorization — if 1 were prime, then 6 = 2 × 3 = 1 × 2 × 3 = 1 × 1 × 2 × 3... would not be unique.

Level 2 · How It Works Under the Hood

2.1 Modular Exponentiation (Fast Power)

Problem: Compute a^n mod m, where n can be very large (e.g., 10^18).

Idea: Use binary expansion to reduce n multiplications to O(log n).

a^13 = a^(1101₂) = a^8 · a^4 · a^1

Each squaring doubles the exponent while checking if n's current lowest bit is 1.

def power_mod(base: int, exp: int, mod: int) -> int:
    """Modular exponentiation (fast power)
    
    Computes base^exp % mod
    Time complexity: O(log exp)
    Space complexity: O(1)
    """
    result = 1
    base %= mod
    
    while exp > 0:
        if exp & 1:
            result = result * base % mod
        base = base * base % mod
        exp >>= 1
    
    return result

Execution trace: Computing 2^13 mod 1000

Step	exp (binary)	base	result	Operation
Init	1101	2	1	-
1	110	4	2	exp&1=1, result*=base; base²=base
2	11	16	2	exp&1=0; base²=base
3	1	256	32	exp&1=1, result*=base; base²=base
4	0	65536	8192	exp&1=1, result*=base; done

Verify: 2^13 = 8192, 8192 mod 1000 = 192. Correct.

Why take mod at every step? Without modular reduction, intermediate results grow exponentially — 2^(10^18) has more digits than atoms in the universe. Taking mod at each step keeps intermediate results below mod² (product of two numbers less than mod), making the computation feasible in finite precision.

2.2 Extended Euclidean Algorithm

Problem: Given a, b, find integers x, y such that ax + by = gcd(a, b).

Bézout's Identity: For any integers a, b, there exist integers x, y such that ax + by = gcd(a, b). This is the theoretical basis for the extended Euclidean algorithm.

def extended_gcd(a: int, b: int) -> tuple:
    """Extended Euclidean Algorithm
    
    Returns (g, x, y) such that ax + by = g = gcd(a, b)
    Time complexity: O(log(min(a, b)))
    """
    if b == 0:
        return a, 1, 0  # a*1 + 0*0 = a
    
    g, x1, y1 = extended_gcd(b, a % b)
    # From b*x1 + (a%b)*y1 = g
    # i.e., b*x1 + (a - ⌊a/b⌋*b)*y1 = g
    # i.e., a*y1 + b*(x1 - ⌊a/b⌋*y1) = g
    x = y1
    y = x1 - (a // b) * y1
    return g, x, y

Derivation: Suppose we already know gcd(b, a%b) = g with b·x₁ + (a%b)·y₁ = g.

Since a%b = a - ⌊a/b⌋·b, substituting:

b·x₁ + (a - ⌊a/b⌋·b)·y₁ = g
b·x₁ + a·y₁ - ⌊a/b⌋·b·y₁ = g
a·y₁ + b·(x₁ - ⌊a/b⌋·y₁) = g

Comparing with a·x + b·y = g: x = y₁, y = x₁ - ⌊a/b⌋·y₁.

Iterative version (avoids stack overflow):

def extended_gcd_iterative(a: int, b: int) -> tuple:
    """Extended Euclidean Algorithm - iterative version"""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_s, s = s, old_s - quotient * s
        old_t, t = t, old_t - quotient * t
    
    return old_r, old_s, old_t

2.3 Modular Inverse

Definition: The modular inverse of a modulo m is x satisfying a·x ≡ 1 (mod m), denoted a⁻¹ mod m.

Existence condition: a⁻¹ mod m exists if and only if gcd(a, m) = 1 (i.e., a and m are coprime).

Method 1: Extended Euclidean

From ax + my = gcd(a, m) = 1, we get ax ≡ 1 (mod m), so x is the modular inverse.

def mod_inverse(a: int, m: int) -> int:
    """Modular inverse via extended Euclidean algorithm
    
    Precondition: gcd(a, m) = 1
    Returns x such that a*x ≡ 1 (mod m)
    """
    g, x, _ = extended_gcd(a, m)
    if g != 1:
        raise ValueError(f"{a} and {m} are not coprime; inverse doesn't exist")
    return x % m  # ensure positive result

Method 2: Fermat's Little Theorem (when m is prime)

Fermat's Little Theorem: If p is prime and gcd(a, p) = 1, then a^(p-1) ≡ 1 (mod p).

Therefore a · a^(p-2) ≡ 1 (mod p), i.e., a⁻¹ ≡ a^(p-2) (mod p).

def mod_inverse_fermat(a: int, p: int) -> int:
    """Modular inverse via Fermat's Little Theorem (only when p is prime)
    
    a^(-1) ≡ a^(p-2) (mod p)
    """
    return power_mod(a, p - 2, p)

Comparison:

Method	Condition	Time	Advantage
Extended Euclidean	gcd(a, m) = 1	O(log m)	General; m need not be prime
Fermat's theorem	m is prime	O(log m)	Concise code, smaller constant

2.4 Combinatorics and Lucas' Theorem

Computing C(n, k):

Method 1 (small scale): Pascal's Triangle

def comb_pascal(n: int, k: int) -> int:
    """Pascal's Triangle method
    
    Time: O(nk), Space: O(k)
    """
    if k > n - k:
        k = n - k
    
    dp = [0] * (k + 1)
    dp[0] = 1
    
    for i in range(1, n + 1):
        for j in range(min(i, k), 0, -1):
            dp[j] = dp[j] + dp[j - 1]
    
    return dp[k]

Method 2 (modular, large scale): Precompute factorials and their inverses

class CombMod:
    """O(1) binomial coefficient queries mod p after O(n) preprocessing"""
    
    def __init__(self, n: int, mod: int):
        self.mod = mod
        self.fact = [1] * (n + 1)
        self.inv_fact = [1] * (n + 1)
        
        for i in range(1, n + 1):
            self.fact[i] = self.fact[i - 1] * i % mod
        
        self.inv_fact[n] = power_mod(self.fact[n], mod - 2, mod)
        for i in range(n - 1, -1, -1):
            self.inv_fact[i] = self.inv_fact[i + 1] * (i + 1) % mod
    
    def comb(self, n: int, k: int) -> int:
        """Compute C(n, k) mod p"""
        if k < 0 or k > n:
            return 0
        return self.fact[n] * self.inv_fact[k] % self.mod * self.inv_fact[n - k] % self.mod

Lucas' Theorem: When p is prime and n, k may be very large (but p is small):

C(n, k) mod p = C(n mod p, k mod p) · C(n // p, k // p) mod p

def lucas(n: int, k: int, p: int) -> int:
    """Lucas' Theorem for C(n, k) mod p
    
    Requires: p is prime
    Use when: n, k are large but p is small (e.g., p ≤ 10⁵)
    Time: O(p + log_p(n) · log p)
    """
    fact = [1] * p
    for i in range(1, p):
        fact[i] = fact[i - 1] * i % p
    
    inv_fact = [1] * p
    inv_fact[p - 1] = power_mod(fact[p - 1], p - 2, p)
    for i in range(p - 2, -1, -1):
        inv_fact[i] = inv_fact[i + 1] * (i + 1) % p
    
    def small_comb(n, k):
        if k < 0 or k > n:
            return 0
        return fact[n] * inv_fact[k] % p * inv_fact[n - k] % p
    
    result = 1
    while n > 0 or k > 0:
        ni, ki = n % p, k % p
        result = result * small_comb(ni, ki) % p
        n //= p
        k //= p
    
    return result

Intuition behind Lucas' Theorem: Think of n and k as base-p numbers. C(n, k) mod p equals the product of the binomial coefficients of each digit pair. This is analogous to "digit-by-digit" processing in base 10, but applied to binomial coefficients modulo a prime.

2.5 Variants and Applications of Fast Exponentiation

Matrix exponentiation: Accelerates linear recurrences.

def matrix_mult(A: list, B: list, mod: int) -> list:
    """Matrix multiplication mod"""
    n = len(A)
    m = len(B[0])
    k = len(B)
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for l in range(k):
                C[i][j] = (C[i][j] + A[i][l] * B[l][j]) % mod
    return C

def matrix_power(M: list, n: int, mod: int) -> list:
    """Matrix fast exponentiation
    
    Computes M^n mod p
    Used to accelerate Fibonacci and other linear recurrences
    """
    size = len(M)
    result = [[1 if i == j else 0 for j in range(size)] for i in range(size)]
    
    while n > 0:
        if n & 1:
            result = matrix_mult(result, M, mod)
        M = matrix_mult(M, M, mod)
        n >>= 1
    
    return result

def fibonacci_fast(n: int, mod: int = 10**9 + 7) -> int:
    """O(log n) computation of the nth Fibonacci number
    
    [F(n+1), F(n)] = [[1,1],[1,0]]^n · [1, 0]
    """
    if n <= 1:
        return n
    M = [[1, 1], [1, 0]]
    result = matrix_power(M, n, mod)
    return result[0][1]

Applications:

Fibonacci number at position 10^18 (mod p)
Linear recurrence f(n) = a₁f(n-1) + a₂f(n-2) + ... + aₖf(n-k)
Number of paths of exactly k steps in a graph (k-th power of adjacency matrix)

2.6 Prime Distribution and the Prime Number Theorem

Prime Number Theorem (Hadamard and de la Vallée-Poussin, independently proved in 1896):

π(n) ~ n / ln(n)

where π(n) is the number of primes not exceeding n. More precisely:

lim_{n→∞} π(n) / (n / ln n) = 1

Practical implications: This tells us there are approximately n/ln(n) primes in [1, n]. For example:

n = 10⁶: approximately 72,382 primes (exact: 78,498)
n = 10⁹: approximately 48,254,942 primes (exact: 50,847,534)

The practical meaning: near n, on average one in every ln(n) consecutive integers is prime. This means:

If you need a 1024-bit prime (for RSA), you only need to test about 1024·ln(2) ≈ 710 random odd numbers on average.

Level 3 · What the Specifications Say

3.1 History of Euclid's Algorithm (circa 300 BCE)

Euclid's algorithm appears in Book VII of Euclid's Elements (Propositions 1 and 2), circa 300 BCE. This makes it one of the oldest non-trivial algorithms in human history — predating the "mutual subtraction" method in China's Nine Chapters on the Mathematical Art (circa 200 CE) by about 500 years.

Original formulation: Euclid did not use the modulo operation but rather repeated subtraction. Given two line segments AB and CD (AB > CD), repeatedly subtract CD's length from AB until the remainder is less than CD, then swap roles and continue. This is mathematically equivalent to modulo (a mod b = a - ⌊a/b⌋·b), but the modular version converges faster.

Chinese "mutual subtraction" method: The Nine Chapters records a method essentially identical to Euclid's subtraction version: "If both are even, halve them; otherwise, subtract the smaller from the larger repeatedly until they are equal." It first tries dividing both by 2, then uses repeated subtraction.

Lamé's Theorem (1844): Gabriel Lamé proved that the number of steps in Euclid's algorithm is at most 5 times the number of digits in the input. Specifically, if a > b ≥ 1, the number of divisions is at most 5·log₁₀(b) + 1. This is one of the earliest complexity proofs in the history of algorithm analysis.

Core of Lamé's proof: He showed that the smallest inputs requiring n steps are consecutive Fibonacci numbers (F_{n+2}, F_{n+1}). Since F_n ≈ φ^n/√5 (φ = (1+√5)/2 ≈ 1.618), the number of steps n ≈ log_φ(b) = log₁₀(b) / log₁₀(φ) ≈ 2.078·log₁₀(b), giving the upper bound 5·log₁₀(b).

3.2 Fermat's Little Theorem

Theorem (Pierre de Fermat, stated in a 1640 letter; first proved by Euler in 1736):

If p is prime and a is not divisible by p, then a^(p-1) ≡ 1 (mod p).

Equivalent form: For any integer a, a^p ≡ a (mod p).

Proof (group-theoretic): Consider the nonzero residues {1, 2, ..., p-1} modulo p, which form a group of order p-1 under multiplication (denoted (Z/pZ)*). By Lagrange's theorem, the order of every element divides the group order, i.e., ord(a) | (p-1). Therefore a^(p-1) = (a^ord(a))^((p-1)/ord(a)) = 1^((p-1)/ord(a)) = 1.

Combinatorial proof (without group theory): Consider a, 2a, 3a, ..., (p-1)a modulo p. They are pairwise distinct (otherwise ia ≡ ja (mod p) means (i-j)a ≡ 0 (mod p), but p∤a and 0 < |i-j| < p, contradiction). So they form a permutation of 1, 2, ..., p-1. Therefore:

a · 2a · 3a · ... · (p-1)a ≡ 1 · 2 · 3 · ... · (p-1) (mod p)

a^(p-1) · (p-1)! ≡ (p-1)! (mod p)

Since (p-1)! is coprime to p, we can cancel to get a^(p-1) ≡ 1 (mod p).

Applications:

Computing modular inverse: a⁻¹ ≡ a^(p-2) (mod p)
Primality testing (Fermat test): If a^(n-1) ≢ 1 (mod n), then n is definitely composite
Caveat: Carmichael numbers exist (e.g., 561 = 3·11·17) — they are composite but satisfy a^(n-1) ≡ 1 (mod n) for all a coprime to n

3.3 Chinese Remainder Theorem (CRT)

Theorem (Sunzi Suanjing, circa 3rd-5th century CE):

Let m₁, m₂, ..., mₖ be pairwise coprime. Then the system of congruences

x ≡ a₁ (mod m₁) x ≡ a₂ (mod m₂) ... x ≡ aₖ (mod mₖ)

has a unique solution modulo M = m₁·m₂·...·mₖ.

Historical context: The original problem in Sunzi Suanjing: "There are things whose number is unknown. When divided by 3, remainder is 2; by 5, remainder is 3; by 7, remainder is 2. What is the number?" (Answer: 23)

Constructive proof:

Let M = m₁·m₂·...·mₖ and Mᵢ = M/mᵢ.

Since gcd(Mᵢ, mᵢ) = 1, there exists tᵢ (the inverse of Mᵢ mod mᵢ) with Mᵢ · tᵢ ≡ 1 (mod mᵢ).

The solution is: x = Σᵢ aᵢ · Mᵢ · tᵢ (mod M)

Verification: Taking x mod mⱼ, all terms except i = j vanish (their Mᵢ contains factor mⱼ), leaving aⱼ · 1 = aⱼ.

def chinese_remainder_theorem(remainders: list, moduli: list) -> int:
    """Chinese Remainder Theorem
    
    Input: remainders = [a1, a2, ..., ak]
           moduli = [m1, m2, ..., mk] (pairwise coprime)
    Output: x such that x ≡ ai (mod mi) for all i
    
    Time: O(k · log(max(mi)))
    """
    M = 1
    for m in moduli:
        M *= m
    
    x = 0
    for ai, mi in zip(remainders, moduli):
        Mi = M // mi
        _, ti, _ = extended_gcd(Mi, mi)
        x += ai * Mi * ti
    
    return x % M

Practical example:

# Sunzi Suanjing problem
# x ≡ 2 (mod 3), x ≡ 3 (mod 5), x ≡ 2 (mod 7)
result = chinese_remainder_theorem([2, 3, 2], [3, 5, 7])
print(result)  # 23
# Verify: 23 % 3 = 2, 23 % 5 = 3, 23 % 7 = 2 ✓

CRT in real systems:

CRT-RSA: Instead of computing c^d mod n (n is 2048-bit), compute c^d mod p and c^d mod q (each 1024-bit) separately, then combine with CRT. This is ~4x faster.
Big integer multiplication: Choose small primes p₁, p₂, ..., multiply mod each pᵢ, then reconstruct with CRT.
Multi-modulus technique in contests: When answers may be huge, compute mod 3 large primes separately, then reconstruct with CRT.

3.4 Euler's Theorem and Euler's Totient Function

Euler's totient φ(n): The count of positive integers less than n that are coprime to n.

def euler_phi(n: int) -> int:
    """Euler's totient function
    
    φ(n) = n · ∏(1 - 1/p) for all distinct prime factors p of n
    Time: O(√n)
    """
    result = n
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p
        p += 1
    if n > 1:
        result -= result // n
    return result

Euler's Theorem (Euler, 1763): If gcd(a, n) = 1, then a^φ(n) ≡ 1 (mod n).

This generalizes Fermat's Little Theorem: when n = p is prime, φ(p) = p-1, reducing to Fermat's theorem.

Properties of Euler's totient:

φ(1) = 1
φ(p) = p - 1 (p prime)
φ(p^k) = p^k - p^(k-1) = p^(k-1)·(p-1)
φ(mn) = φ(m)·φ(n) when gcd(m,n) = 1 (multiplicative function)
Σ_{d|n} φ(d) = n (sum of totient over all divisors equals n)

3.5 Miller-Rabin Primality Test

For large numbers (e.g., 10^18), trial division is too slow. Miller-Rabin is a probabilistic primality test based on the following observation:

Setup: If p is an odd prime, write p-1 = 2^s · d (d odd). For any 1 < a < p:

Either a^d ≡ 1 (mod p)
Or there exists 0 ≤ r < s such that a^(2^r · d) ≡ -1 (mod p)

(This follows from Fermat's theorem a^(p-1) ≡ 1 and the fact that the only square roots of 1 mod p are ±1)

If n fails this test for some a, then n is definitely composite.

def miller_rabin(n: int, witnesses: list = None) -> bool:
    """Miller-Rabin primality test
    
    For n < 3,317,044,064,679,887,385,961,981,
    witnesses = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
    gives a deterministic result (no probability of error).
    """
    if n < 2:
        return False
    if n == 2 or n == 3:
        return True
    if n % 2 == 0:
        return False
    
    # Write n-1 = 2^s * d
    s, d = 0, n - 1
    while d % 2 == 0:
        s += 1
        d //= 2
    
    if witnesses is None:
        witnesses = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
    
    for a in witnesses:
        if a >= n:
            continue
        
        x = power_mod(a, d, n)
        
        if x == 1 or x == n - 1:
            continue
        
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    
    return True

Deterministic guarantee: Gary Miller (1976) proved that under the Generalized Riemann Hypothesis (GRH), testing a = 2, 3, ..., 2(ln n)² suffices for deterministic primality. In practice, for 64-bit integers (n < 2^64), a fixed set of 12 witnesses is sufficient.

Level 4 · Edge Cases and Interview Problems

4.1 Interview Problem: Count Primes (LeetCode #204)

Problem: Given integer n, return the count of all primes less than n.

Analysis: Direct application of the Sieve of Eratosthenes. Note: "less than n", not "at most n".

def count_primes(n: int) -> int:
    """LeetCode 204: Count Primes
    
    Time: O(n log log n)
    Space: O(n)
    """
    if n <= 2:
        return 0
    
    is_prime = [True] * n
    is_prime[0] = is_prime[1] = False
    
    i = 2
    while i * i < n:
        if is_prime[i]:
            for j in range(i * i, n, i):
                is_prime[j] = False
        i += 1
    
    return sum(is_prime)

Follow-up questions in interviews:

"Can we use only O(√n) space?" — Segmented Sieve
"What if n = 10^10?" — Meissel-Lehmer algorithm, O(n^(2/3)) time
"How to test a single large number?" — Miller-Rabin

Optimization: Use a bitset instead of bool array to save 8x memory. In Python, use bytearray:

def count_primes_optimized(n: int) -> int:
    """Optimized: only process odd numbers"""
    if n <= 2:
        return 0
    if n <= 3:
        return 1
    
    # Only store odds: sieve[i] represents whether 2i+3 is prime
    size = (n - 3) // 2 + 1
    sieve = bytearray(b'\x01') * size
    
    i = 0
    while (2 * i + 3) ** 2 < n:
        if sieve[i]:
            p = 2 * i + 3
            start = (p * p - 3) // 2
            sieve[start::p] = bytearray(len(sieve[start::p]))
        i += 1
    
    return sum(sieve) + 1  # +1 for prime 2

4.2 Interview Problem: Happy Number (LeetCode #202)

Problem: Determine if a number n is a "happy number." A happy number is defined by replacing the number with the sum of squares of its digits, repeating until it reaches 1 (happy) or loops endlessly (not happy).

Number-theoretic analysis: The key insight is the process must enter a cycle (for a d-digit number, the next value is at most 81d, so the range is finite). Moreover, unhappy numbers always enter the cycle containing 4: 4 → 16 → 37 → 58 → 89 → 145 → 42 → 20 → 4.

def is_happy(n: int) -> bool:
    """LeetCode 202: Happy Number
    
    Method 1: Floyd's cycle detection
    Time: O(log n) (practically O(1) since values quickly shrink below 1000)
    Space: O(1)
    """
    def digit_square_sum(num):
        total = 0
        while num:
            num, digit = divmod(num, 10)
            total += digit * digit
        return total
    
    slow = n
    fast = digit_square_sum(n)
    
    while fast != 1 and slow != fast:
        slow = digit_square_sum(slow)
        fast = digit_square_sum(digit_square_sum(fast))
    
    return fast == 1

def is_happy_v2(n: int) -> bool:
    """Method 2: Check if we hit 4"""
    def digit_square_sum(num):
        total = 0
        while num:
            num, digit = divmod(num, 10)
            total += digit * digit
        return total
    
    while n != 1 and n != 4:
        n = digit_square_sum(n)
    
    return n == 1

Why convergence is fast: For n ≤ 999 (3 digits), the next value is at most 9²+9²+9² = 243. For n ≤ 9999999 (7 digits), the next value is at most 7×81 = 567. So regardless of initial value, within a few steps it drops below 1000, then cycles within a finite set.

4.3 Interview Problem: Pow(x, n) (LeetCode #50)

Problem: Implement pow(x, n) — compute x raised to the power n.

Key pitfalls:

n can be negative → x^(-n) = 1 / x^n
n can be -2^31 (taking absolute value overflows int32)
x = 0 and n < 0 → undefined (but problem guarantees this won't happen)

def my_pow(x: float, n: int) -> float:
    """LeetCode 50: Pow(x, n)
    
    Fast exponentiation with negative exponent handling
    Time: O(log |n|)
    Space: O(1)
    """
    if n == 0:
        return 1.0
    
    if n < 0:
        x = 1.0 / x
        n = -n
    
    result = 1.0
    current = x
    
    while n > 0:
        if n & 1:
            result *= current
        current *= current
        n >>= 1
    
    return result

Interview discussion points:

Why not use n = abs(n)? Because abs(-2^31) overflows in 32-bit integers (int32 range is [-2^31, 2^31 - 1])
Python doesn't have this issue (big integers), but C++/Java need to cast to long first
Recursive version is more readable but has stack overhead

4.4 Interview Problem: Trailing Zeroes (LeetCode #172)

Problem: Given integer n, return the number of trailing zeroes in n!.

Number-theoretic analysis: Trailing zeroes come from factors of 10 = 2 × 5. In n!, the count of factor 2 far exceeds factor 5 (every other even number contributes at least one 2, but only every 5th number contributes a 5). So trailing zeroes = total count of factor 5 in n!.

Legendre's Formula: The exponent of prime p in n! is Σ_{i=1}^{∞} ⌊n/p^i⌋

def trailing_zeroes(n: int) -> int:
    """LeetCode 172: Factorial Trailing Zeroes
    
    Count trailing zeros = count factor 5 in n!
    Time: O(log₅ n)
    Space: O(1)
    """
    count = 0
    power_of_5 = 5
    while power_of_5 <= n:
        count += n // power_of_5
        power_of_5 *= 5
    return count

# More concise version
def trailing_zeroes_v2(n: int) -> int:
    count = 0
    while n >= 5:
        n //= 5
        count += n
    return count

Intuitive explanation:

⌊n/5⌋ counts multiples of 5 (each contributes one 5)
⌊n/25⌋ counts multiples of 25 (each contributes an extra 5)
⌊n/125⌋ counts multiples of 125 (yet another extra 5)
...

Example n = 100: ⌊100/5⌋ + ⌊100/25⌋ + ⌊100/125⌋ = 20 + 4 + 0 = 24. So 100! has 24 trailing zeroes.

Advanced variant: "Trailing zeroes of n! in base k" — factor k into primes, use Legendre's formula for each, take the minimum.

4.5 Common Number Theory Pitfalls in Contests

Pitfall 1: Negative numbers in modular arithmetic

# Python: -7 % 3 = 2 (always non-negative) — correct mathematical definition
# C/C++: -7 % 3 = -1 (preserves sign) — needs manual fix
# Safe mod in C++: ((a % m) + m) % m

Pitfall 2: Modular arithmetic doesn't distribute over division

# Wrong: (a / b) % m ≠ (a % m) / (b % m)
# Correct: (a / b) % m = (a % m) * inverse(b, m) % m

# Example: (10 / 2) % 7 = 5
# Wrong: (10 % 7) / (2 % 7) = 3 / 2 = 1.5 ??
# Correct: (10 % 7) * mod_inverse(2, 7) % 7 = 3 * 4 % 7 = 5
# (since 2 * 4 = 8 ≡ 1 (mod 7), so 2⁻¹ ≡ 4 (mod 7))

Pitfall 3: Fast power with base=0 or mod=1

def power_mod_safe(base, exp, mod):
    if mod == 1:
        return 0  # anything mod 1 = 0
    if base == 0:
        return 0 if exp > 0 else 1  # 0^0 usually defined as 1
    # ... normal fast power

Pitfall 4: C(n, k) when k > n

# Must return 0 when k > n; don't let factorial compute negative values
def safe_comb(n, k, mod):
    if k < 0 or k > n:
        return 0
    # ... normal computation

4.6 Complete Number Theory Toolkit for Interviews

class NumberTheoryToolbox:
    """Complete number theory toolkit for contests/interviews"""
    
    def __init__(self, mod: int = 10**9 + 7):
        self.mod = mod
    
    def power(self, base: int, exp: int) -> int:
        result = 1
        base %= self.mod
        while exp > 0:
            if exp & 1:
                result = result * base % self.mod
            base = base * base % self.mod
            exp >>= 1
        return result
    
    def inverse(self, a: int) -> int:
        return self.power(a, self.mod - 2)
    
    def factorial_and_inverse(self, n: int) -> tuple:
        fact = [1] * (n + 1)
        for i in range(1, n + 1):
            fact[i] = fact[i - 1] * i % self.mod
        
        inv_fact = [1] * (n + 1)
        inv_fact[n] = self.power(fact[n], self.mod - 2)
        for i in range(n - 1, -1, -1):
            inv_fact[i] = inv_fact[i + 1] * (i + 1) % self.mod
        
        return fact, inv_fact
    
    def comb(self, n: int, k: int, fact: list, inv_fact: list) -> int:
        if k < 0 or k > n:
            return 0
        return fact[n] * inv_fact[k] % self.mod * inv_fact[n - k] % self.mod
    
    @staticmethod
    def gcd(a: int, b: int) -> int:
        while b:
            a, b = b, a % b
        return a
    
    @staticmethod
    def lcm(a: int, b: int) -> int:
        return a // NumberTheoryToolbox.gcd(a, b) * b
    
    @staticmethod
    def extended_gcd(a: int, b: int) -> tuple:
        if b == 0:
            return a, 1, 0
        g, x, y = NumberTheoryToolbox.extended_gcd(b, a % b)
        return g, y, x - (a // b) * y
    
    @staticmethod
    def crt(remainders: list, moduli: list) -> int:
        M = 1
        for m in moduli:
            M *= m
        result = 0
        for a, m in zip(remainders, moduli):
            Mi = M // m
            _, t, _ = NumberTheoryToolbox.extended_gcd(Mi, m)
            result += a * Mi * t
        return result % M

4.7 Interview Response Strategy

When encountering number theory problems in interviews:

Confirm bounds first: The range of n determines algorithm choice
- n ≤ 10⁶: Sieve / brute force
- n ≤ 10⁷: Linear sieve
- n ≤ 10^18: Fast power / Miller-Rabin / Pollard Rho
Three principles of modular arithmetic:
- Addition, subtraction, multiplication can take mod anytime
- Division must be converted to multiplication by inverse
- Take mod at every step to prevent overflow
Common pattern recognition:
- "Result modulo 10^9+7" → Need modular inverse, fast power
- "Compute binomial coefficient" → Preprocess factorials + Lucas
- "Greatest common divisor" → Euclid's algorithm
- "Congruence equations" → Extended Euclidean / CRT
Complexity quick reference:
- gcd: O(log n)
- Fast power: O(log n)
- Sieve of Eratosthenes [1,n]: O(n log log n)
- Linear sieve [1,n]: O(n)
- Trial division: O(√n)
- Miller-Rabin: O(k · log²n), k = number of witnesses

Rate this chapter

4.8 / 5 (3 ratings)