Number Theory and Mathematical Foundations
Chapter 35: Number Theory and Mathematical Foundations
You wrote an encryption algorithm that needs to compute 2^1000000007 mod 998244353. If you multiply in a loop a billion times, the program won't finish before the heat death of the universe. But with fast exponentiation, 50 multiplications suffice. You need to check if a number is prime — for numbers on the order of 10^18, trial division requires 10^9 divisions, while Miller-Rabin needs only a few dozen modular exponentiations.
Number theory is not a mathematician's toy. It is the cornerstone of cryptography (RSA relies on the difficulty of factoring large primes), the theoretical guarantee of hash functions (uniform distribution of modular arithmetic), and a core tool in competitive programming (combinatorial counting requires modular inverses). This chapter starts from the most basic GCD and gradually builds a complete number theory toolkit.
Level 1 · What You Need to Know
1.1 Greatest Common Divisor (GCD) and Euclid's Algorithm
Problem Definition: Given two positive integers a and b, find their greatest common divisor gcd(a, b).
Core Theorem: gcd(a, b) = gcd(b, a mod b), and when b = 0, gcd(a, 0) = a.
Why does this theorem hold? Let d = gcd(a, b), so d|a and d|b. Since a mod b = a - ⌊a/b⌋·b, and d|a and d|b, we have d|(a mod b). In the reverse direction, d|b and d|(a mod b) implies d|(a mod b + ⌊a/b⌋·b) = d|a. Therefore the set of common divisors of (a, b) and (b, a mod b) are identical, so their GCDs are equal.
def gcd(a: int, b: int) -> int:
"""Euclidean algorithm for GCD
Time complexity: O(log(min(a, b)))
Space complexity: O(1) iterative version
"""
while b:
a, b = b, a % b
return a
# Recursive version (more intuitive but risk of stack overflow)
def gcd_recursive(a: int, b: int) -> int:
if b == 0:
return a
return gcd_recursive(b, a % b)
Complexity Analysis: The number of iterations in Euclid's algorithm is at most O(log(min(a, b))). This is because after every two iterations, the larger number is at least halved. Specifically, if a > b, then a mod b < a/2 (proof: if b ≤ a/2, then a mod b < b ≤ a/2; if b > a/2, then a mod b = a - b < a/2). Therefore at most 2·log₂(min(a, b)) iterations are needed.
Worst case: When a and b are adjacent Fibonacci numbers, Euclid's algorithm reaches its worst case. For example, gcd(F_{n+1}, F_n) requires n-1 iterations because F_{n+1} mod F_n = F_{n-1}. This also proves the complexity is O(log n), since F_n ≈ φ^n / √5, so n ≈ log_φ(F_n).
1.2 Least Common Multiple (LCM)
Definition: lcm(a, b) is the smallest positive integer divisible by both a and b.
Relationship to GCD: lcm(a, b) = a · b / gcd(a, b)
Why does this formula hold? Factor a and b into primes: a = p₁^α₁ · p₂^α₂ · ... , b = p₁^β₁ · p₂^β₂ · .... Then gcd(a, b) = ∏ pᵢ^min(αᵢ, βᵢ), lcm(a, b) = ∏ pᵢ^max(αᵢ, βᵢ). Since min(x, y) + max(x, y) = x + y, we get gcd(a, b) · lcm(a, b) = a · b.
def lcm(a: int, b: int) -> int:
"""Least Common Multiple
Note: divide first, multiply second to prevent overflow
(in fixed-precision languages)
"""
return a // gcd(a, b) * b
# LCM of multiple numbers
def lcm_multiple(numbers: list) -> int:
from functools import reduce
return reduce(lcm, numbers)
Common mistake: Computing a * b / gcd(a, b) directly may cause integer overflow in C/C++/Java. The correct approach is to divide first: a / gcd(a, b) * b. Python's arbitrary-precision integers don't overflow, but in competitive programming with C++ this is critical.
1.3 Primality Testing
Trial Division: A natural number n > 1 is prime if it cannot be divided by any integer from 2 to √n.
Why only check up to √n? If n = a · b with a ≤ b, then a ≤ √n. So if n has a factor between 1 and n (exclusive), there must be one no greater than √n.
def is_prime(n: int) -> bool:
"""Trial division primality test
Time complexity: O(√n)
"""
if n < 2:
return False
if n < 4:
return True
if n % 2 == 0 or n % 3 == 0:
return False
# All primes > 3 are of the form 6k±1
i = 5
while i * i <= n:
if n % i == 0 or n % (i + 2) == 0:
return False
i += 6
return True
Why the 6k±1 optimization works: All integers can be written as 6k, 6k+1, 6k+2, 6k+3, 6k+4, or 6k+5. Among these, 6k, 6k+2, 6k+4 are divisible by 2, and 6k+3 is divisible by 3. So primes greater than 3 can only be of the form 6k+1 or 6k+5 (i.e., 6k-1). This skips 2/3 of candidate divisors.
1.4 Sieve of Eratosthenes
Problem: Find all primes from 1 to n.
Idea: Starting from 2, mark all multiples of each prime as composite. Unmarked numbers are prime.
def sieve_of_eratosthenes(n: int) -> list:
"""Sieve of Eratosthenes
Time complexity: O(n log log n)
Space complexity: O(n)
"""
if n < 2:
return []
is_prime = [True] * (n + 1)
is_prime[0] = is_prime[1] = False
# Only need to sieve up to √n
i = 2
while i * i <= n:
if is_prime[i]:
# Start marking from i² (i*2, i*3, ..., i*(i-1) already marked)
for j in range(i * i, n + 1, i):
is_prime[j] = False
i += 1
return [i for i in range(2, n + 1) if is_prime[i]]
Why start marking from i²? When processing prime p, the numbers p·2, p·3, ..., p·(p-1) have already been marked when processing 2, 3, ..., p-1. For example, when p=5: 5·2=10 (marked by 2), 5·3=15 (marked by 3), 5·4=20 (marked by 2). The first new number to mark is 5·5=25.
Complexity analysis: Total operations are ∑{p≤n, p prime} n/p = n · ∑{p≤n} 1/p. By Mertens' theorem (Mertens, 1874), the sum of reciprocals of primes ∑_{p≤n} 1/p = ln(ln n) + M (M is the Meissel-Mertens constant ≈ 0.2615). Thus total complexity is O(n log log n).
1.5 Linear Sieve (Euler's Sieve)
The problem with the Sieve of Eratosthenes is that a composite number may be marked multiple times. For example, 30 gets marked by 2, 3, and 5. The linear sieve ensures each composite is marked exactly once by its smallest prime factor, achieving strict O(n) time.
def linear_sieve(n: int) -> list:
"""Linear Sieve (Euler's Sieve)
Time complexity: O(n)
Space complexity: O(n)
Core idea: each composite m is marked only once by its
smallest prime factor lpf(m)
"""
is_prime = [True] * (n + 1)
primes = []
for i in range(2, n + 1):
if is_prime[i]:
primes.append(i)
for p in primes:
if i * p > n:
break
is_prime[i * p] = False
# Key: stop when p divides i
# Because for subsequent p' > p, the smallest prime
# factor of i*p' is p (since p|i), and should be
# marked when processing (i*p'/p) * p
if i % p == 0:
break
return primes
Why if i % p == 0: break guarantees linearity: Suppose i = p · k. Then for any prime q > p in the primes list, i · q = p · k · q. The smallest prime factor of this number is p (since p|i, so p|i·q), not q. If we continue marking i · q, this number would be marked twice — once now (incorrectly attributing q as smallest prime factor) and once in the future when iterating to k · q (correctly marked by p). So we must stop when p|i.
Linear Sieve vs Sieve of Eratosthenes in practice:
| Property | Eratosthenes | Linear Sieve |
|---|---|---|
| Time complexity | O(n log log n) | O(n) |
| Constant factor | Small (cache-friendly) | Large (poor branch prediction) |
| Actual speed (n=10⁸) | About the same | About the same |
| Additional info | None | Gives smallest prime factor |
| Code complexity | Simple | Slightly complex |
In practice for n ≤ 10⁸, both are similarly fast. The linear sieve's advantage is providing each number's smallest prime factor, useful for fast factorization.
1.6 Common Errors and Pitfalls
Error 1: GCD argument order
Euclid's algorithm doesn't require a > b. If a < b, the first step gives a % b = a (since a < b), resulting in gcd(b, a) — the arguments are automatically swapped.
Error 2: LCM overflow
# Dangerous in C++ (not an issue in Python):
# long long lcm = a * b / gcd(a, b); // a*b may overflow!
# Correct:
# long long lcm = a / gcd(a, b) * b; // divide first
Error 3: Array bounds in sieve
# Wrong: range should go to n+1
is_prime = [True] * n # should be n+1
# Accessing is_prime[n] causes out-of-bounds
Error 4: Treating 1 as prime
1 is neither prime nor composite. This convention exists because the Fundamental Theorem of Arithmetic requires unique factorization — if 1 were prime, then 6 = 2 × 3 = 1 × 2 × 3 = 1 × 1 × 2 × 3... would not be unique.
Level 2 · How It Works Under the Hood
2.1 Modular Exponentiation (Fast Power)
Problem: Compute a^n mod m, where n can be very large (e.g., 10^18).
Idea: Use binary expansion to reduce n multiplications to O(log n).
a^13 = a^(1101₂) = a^8 · a^4 · a^1
Each squaring doubles the exponent while checking if n's current lowest bit is 1.
def power_mod(base: int, exp: int, mod: int) -> int:
"""Modular exponentiation (fast power)
Computes base^exp % mod
Time complexity: O(log exp)
Space complexity: O(1)
"""
result = 1
base %= mod
while exp > 0:
if exp & 1:
result = result * base % mod
base = base * base % mod
exp >>= 1
return result
Execution trace: Computing 2^13 mod 1000
| Step | exp (binary) | base | result | Operation |
|---|---|---|---|---|
| Init | 1101 | 2 | 1 | - |
| 1 | 110 | 4 | 2 | exp&1=1, result*=base; base²=base |
| 2 | 11 | 16 | 2 | exp&1=0; base²=base |
| 3 | 1 | 256 | 32 | exp&1=1, result*=base; base²=base |
| 4 | 0 | 65536 | 8192 | exp&1=1, result*=base; done |
Verify: 2^13 = 8192, 8192 mod 1000 = 192. Correct.
Why take mod at every step? Without modular reduction, intermediate results grow exponentially — 2^(10^18) has more digits than atoms in the universe. Taking mod at each step keeps intermediate results below mod² (product of two numbers less than mod), making the computation feasible in finite precision.
2.2 Extended Euclidean Algorithm
Problem: Given a, b, find integers x, y such that ax + by = gcd(a, b).
Bézout's Identity: For any integers a, b, there exist integers x, y such that ax + by = gcd(a, b). This is the theoretical basis for the extended Euclidean algorithm.
def extended_gcd(a: int, b: int) -> tuple:
"""Extended Euclidean Algorithm
Returns (g, x, y) such that ax + by = g = gcd(a, b)
Time complexity: O(log(min(a, b)))
"""
if b == 0:
return a, 1, 0 # a*1 + 0*0 = a
g, x1, y1 = extended_gcd(b, a % b)
# From b*x1 + (a%b)*y1 = g
# i.e., b*x1 + (a - ⌊a/b⌋*b)*y1 = g
# i.e., a*y1 + b*(x1 - ⌊a/b⌋*y1) = g
x = y1
y = x1 - (a // b) * y1
return g, x, y
Derivation: Suppose we already know gcd(b, a%b) = g with b·x₁ + (a%b)·y₁ = g.
Since a%b = a - ⌊a/b⌋·b, substituting:
- b·x₁ + (a - ⌊a/b⌋·b)·y₁ = g
- b·x₁ + a·y₁ - ⌊a/b⌋·b·y₁ = g
- a·y₁ + b·(x₁ - ⌊a/b⌋·y₁) = g
Comparing with a·x + b·y = g: x = y₁, y = x₁ - ⌊a/b⌋·y₁.
Iterative version (avoids stack overflow):
def extended_gcd_iterative(a: int, b: int) -> tuple:
"""Extended Euclidean Algorithm - iterative version"""
old_r, r = a, b
old_s, s = 1, 0
old_t, t = 0, 1
while r != 0:
quotient = old_r // r
old_r, r = r, old_r - quotient * r
old_s, s = s, old_s - quotient * s
old_t, t = t, old_t - quotient * t
return old_r, old_s, old_t
2.3 Modular Inverse
Definition: The modular inverse of a modulo m is x satisfying a·x ≡ 1 (mod m), denoted a⁻¹ mod m.
Existence condition: a⁻¹ mod m exists if and only if gcd(a, m) = 1 (i.e., a and m are coprime).
Method 1: Extended Euclidean
From ax + my = gcd(a, m) = 1, we get ax ≡ 1 (mod m), so x is the modular inverse.
def mod_inverse(a: int, m: int) -> int:
"""Modular inverse via extended Euclidean algorithm
Precondition: gcd(a, m) = 1
Returns x such that a*x ≡ 1 (mod m)
"""
g, x, _ = extended_gcd(a, m)
if g != 1:
raise ValueError(f"{a} and {m} are not coprime; inverse doesn't exist")
return x % m # ensure positive result
Method 2: Fermat's Little Theorem (when m is prime)
Fermat's Little Theorem: If p is prime and gcd(a, p) = 1, then a^(p-1) ≡ 1 (mod p).
Therefore a · a^(p-2) ≡ 1 (mod p), i.e., a⁻¹ ≡ a^(p-2) (mod p).
def mod_inverse_fermat(a: int, p: int) -> int:
"""Modular inverse via Fermat's Little Theorem (only when p is prime)
a^(-1) ≡ a^(p-2) (mod p)
"""
return power_mod(a, p - 2, p)
Comparison:
| Method | Condition | Time | Advantage |
|---|---|---|---|
| Extended Euclidean | gcd(a, m) = 1 | O(log m) | General; m need not be prime |
| Fermat's theorem | m is prime | O(log m) | Concise code, smaller constant |
2.4 Combinatorics and Lucas' Theorem
Computing C(n, k):
Method 1 (small scale): Pascal's Triangle
def comb_pascal(n: int, k: int) -> int:
"""Pascal's Triangle method
Time: O(nk), Space: O(k)
"""
if k > n - k:
k = n - k
dp = [0] * (k + 1)
dp[0] = 1
for i in range(1, n + 1):
for j in range(min(i, k), 0, -1):
dp[j] = dp[j] + dp[j - 1]
return dp[k]
Method 2 (modular, large scale): Precompute factorials and their inverses
class CombMod:
"""O(1) binomial coefficient queries mod p after O(n) preprocessing"""
def __init__(self, n: int, mod: int):
self.mod = mod
self.fact = [1] * (n + 1)
self.inv_fact = [1] * (n + 1)
for i in range(1, n + 1):
self.fact[i] = self.fact[i - 1] * i % mod
self.inv_fact[n] = power_mod(self.fact[n], mod - 2, mod)
for i in range(n - 1, -1, -1):
self.inv_fact[i] = self.inv_fact[i + 1] * (i + 1) % mod
def comb(self, n: int, k: int) -> int:
"""Compute C(n, k) mod p"""
if k < 0 or k > n:
return 0
return self.fact[n] * self.inv_fact[k] % self.mod * self.inv_fact[n - k] % self.mod
Lucas' Theorem: When p is prime and n, k may be very large (but p is small):
C(n, k) mod p = C(n mod p, k mod p) · C(n // p, k // p) mod p
def lucas(n: int, k: int, p: int) -> int:
"""Lucas' Theorem for C(n, k) mod p
Requires: p is prime
Use when: n, k are large but p is small (e.g., p ≤ 10⁵)
Time: O(p + log_p(n) · log p)
"""
fact = [1] * p
for i in range(1, p):
fact[i] = fact[i - 1] * i % p
inv_fact = [1] * p
inv_fact[p - 1] = power_mod(fact[p - 1], p - 2, p)
for i in range(p - 2, -1, -1):
inv_fact[i] = inv_fact[i + 1] * (i + 1) % p
def small_comb(n, k):
if k < 0 or k > n:
return 0
return fact[n] * inv_fact[k] % p * inv_fact[n - k] % p
result = 1
while n > 0 or k > 0:
ni, ki = n % p, k % p
result = result * small_comb(ni, ki) % p
n //= p
k //= p
return result
Intuition behind Lucas' Theorem: Think of n and k as base-p numbers. C(n, k) mod p equals the product of the binomial coefficients of each digit pair. This is analogous to "digit-by-digit" processing in base 10, but applied to binomial coefficients modulo a prime.
2.5 Variants and Applications of Fast Exponentiation
Matrix exponentiation: Accelerates linear recurrences.
def matrix_mult(A: list, B: list, mod: int) -> list:
"""Matrix multiplication mod"""
n = len(A)
m = len(B[0])
k = len(B)
C = [[0] * m for _ in range(n)]
for i in range(n):
for j in range(m):
for l in range(k):
C[i][j] = (C[i][j] + A[i][l] * B[l][j]) % mod
return C
def matrix_power(M: list, n: int, mod: int) -> list:
"""Matrix fast exponentiation
Computes M^n mod p
Used to accelerate Fibonacci and other linear recurrences
"""
size = len(M)
result = [[1 if i == j else 0 for j in range(size)] for i in range(size)]
while n > 0:
if n & 1:
result = matrix_mult(result, M, mod)
M = matrix_mult(M, M, mod)
n >>= 1
return result
def fibonacci_fast(n: int, mod: int = 10**9 + 7) -> int:
"""O(log n) computation of the nth Fibonacci number
[F(n+1), F(n)] = [[1,1],[1,0]]^n · [1, 0]
"""
if n <= 1:
return n
M = [[1, 1], [1, 0]]
result = matrix_power(M, n, mod)
return result[0][1]
Applications:
- Fibonacci number at position 10^18 (mod p)
- Linear recurrence f(n) = a₁f(n-1) + a₂f(n-2) + ... + aₖf(n-k)
- Number of paths of exactly k steps in a graph (k-th power of adjacency matrix)
2.6 Prime Distribution and the Prime Number Theorem
Prime Number Theorem (Hadamard and de la Vallée-Poussin, independently proved in 1896):
π(n) ~ n / ln(n)
where π(n) is the number of primes not exceeding n. More precisely:
lim_{n→∞} π(n) / (n / ln n) = 1
Practical implications: This tells us there are approximately n/ln(n) primes in [1, n]. For example:
- n = 10⁶: approximately 72,382 primes (exact: 78,498)
- n = 10⁹: approximately 48,254,942 primes (exact: 50,847,534)
The practical meaning: near n, on average one in every ln(n) consecutive integers is prime. This means:
- If you need a 1024-bit prime (for RSA), you only need to test about 1024·ln(2) ≈ 710 random odd numbers on average.
Level 3 · What the Specifications Say
3.1 History of Euclid's Algorithm (circa 300 BCE)
Euclid's algorithm appears in Book VII of Euclid's Elements (Propositions 1 and 2), circa 300 BCE. This makes it one of the oldest non-trivial algorithms in human history — predating the "mutual subtraction" method in China's Nine Chapters on the Mathematical Art (circa 200 CE) by about 500 years.
Original formulation: Euclid did not use the modulo operation but rather repeated subtraction. Given two line segments AB and CD (AB > CD), repeatedly subtract CD's length from AB until the remainder is less than CD, then swap roles and continue. This is mathematically equivalent to modulo (a mod b = a - ⌊a/b⌋·b), but the modular version converges faster.
Chinese "mutual subtraction" method: The Nine Chapters records a method essentially identical to Euclid's subtraction version: "If both are even, halve them; otherwise, subtract the smaller from the larger repeatedly until they are equal." It first tries dividing both by 2, then uses repeated subtraction.
Lamé's Theorem (1844): Gabriel Lamé proved that the number of steps in Euclid's algorithm is at most 5 times the number of digits in the input. Specifically, if a > b ≥ 1, the number of divisions is at most 5·log₁₀(b) + 1. This is one of the earliest complexity proofs in the history of algorithm analysis.
Core of Lamé's proof: He showed that the smallest inputs requiring n steps are consecutive Fibonacci numbers (F_{n+2}, F_{n+1}). Since F_n ≈ φ^n/√5 (φ = (1+√5)/2 ≈ 1.618), the number of steps n ≈ log_φ(b) = log₁₀(b) / log₁₀(φ) ≈ 2.078·log₁₀(b), giving the upper bound 5·log₁₀(b).
3.2 Fermat's Little Theorem
Theorem (Pierre de Fermat, stated in a 1640 letter; first proved by Euler in 1736):
If p is prime and a is not divisible by p, then a^(p-1) ≡ 1 (mod p).
Equivalent form: For any integer a, a^p ≡ a (mod p).
Proof (group-theoretic): Consider the nonzero residues {1, 2, ..., p-1} modulo p, which form a group of order p-1 under multiplication (denoted (Z/pZ)*). By Lagrange's theorem, the order of every element divides the group order, i.e., ord(a) | (p-1). Therefore a^(p-1) = (a^ord(a))^((p-1)/ord(a)) = 1^((p-1)/ord(a)) = 1.
Combinatorial proof (without group theory): Consider a, 2a, 3a, ..., (p-1)a modulo p. They are pairwise distinct (otherwise ia ≡ ja (mod p) means (i-j)a ≡ 0 (mod p), but p∤a and 0 < |i-j| < p, contradiction). So they form a permutation of 1, 2, ..., p-1. Therefore:
a · 2a · 3a · ... · (p-1)a ≡ 1 · 2 · 3 · ... · (p-1) (mod p)
a^(p-1) · (p-1)! ≡ (p-1)! (mod p)
Since (p-1)! is coprime to p, we can cancel to get a^(p-1) ≡ 1 (mod p).
Applications:
- Computing modular inverse: a⁻¹ ≡ a^(p-2) (mod p)
- Primality testing (Fermat test): If a^(n-1) ≢ 1 (mod n), then n is definitely composite
- Caveat: Carmichael numbers exist (e.g., 561 = 3·11·17) — they are composite but satisfy a^(n-1) ≡ 1 (mod n) for all a coprime to n
3.3 Chinese Remainder Theorem (CRT)
Theorem (Sunzi Suanjing, circa 3rd-5th century CE):
Let m₁, m₂, ..., mₖ be pairwise coprime. Then the system of congruences
x ≡ a₁ (mod m₁) x ≡ a₂ (mod m₂) ... x ≡ aₖ (mod mₖ)
has a unique solution modulo M = m₁·m₂·...·mₖ.
Historical context: The original problem in Sunzi Suanjing: "There are things whose number is unknown. When divided by 3, remainder is 2; by 5, remainder is 3; by 7, remainder is 2. What is the number?" (Answer: 23)
Constructive proof:
Let M = m₁·m₂·...·mₖ and Mᵢ = M/mᵢ.
Since gcd(Mᵢ, mᵢ) = 1, there exists tᵢ (the inverse of Mᵢ mod mᵢ) with Mᵢ · tᵢ ≡ 1 (mod mᵢ).
The solution is: x = Σᵢ aᵢ · Mᵢ · tᵢ (mod M)
Verification: Taking x mod mⱼ, all terms except i = j vanish (their Mᵢ contains factor mⱼ), leaving aⱼ · 1 = aⱼ.
def chinese_remainder_theorem(remainders: list, moduli: list) -> int:
"""Chinese Remainder Theorem
Input: remainders = [a1, a2, ..., ak]
moduli = [m1, m2, ..., mk] (pairwise coprime)
Output: x such that x ≡ ai (mod mi) for all i
Time: O(k · log(max(mi)))
"""
M = 1
for m in moduli:
M *= m
x = 0
for ai, mi in zip(remainders, moduli):
Mi = M // mi
_, ti, _ = extended_gcd(Mi, mi)
x += ai * Mi * ti
return x % M
Practical example:
# Sunzi Suanjing problem
# x ≡ 2 (mod 3), x ≡ 3 (mod 5), x ≡ 2 (mod 7)
result = chinese_remainder_theorem([2, 3, 2], [3, 5, 7])
print(result) # 23
# Verify: 23 % 3 = 2, 23 % 5 = 3, 23 % 7 = 2 ✓
CRT in real systems:
-
CRT-RSA: Instead of computing c^d mod n (n is 2048-bit), compute c^d mod p and c^d mod q (each 1024-bit) separately, then combine with CRT. This is ~4x faster.
-
Big integer multiplication: Choose small primes p₁, p₂, ..., multiply mod each pᵢ, then reconstruct with CRT.
-
Multi-modulus technique in contests: When answers may be huge, compute mod 3 large primes separately, then reconstruct with CRT.
3.4 Euler's Theorem and Euler's Totient Function
Euler's totient φ(n): The count of positive integers less than n that are coprime to n.
def euler_phi(n: int) -> int:
"""Euler's totient function
φ(n) = n · ∏(1 - 1/p) for all distinct prime factors p of n
Time: O(√n)
"""
result = n
p = 2
while p * p <= n:
if n % p == 0:
while n % p == 0:
n //= p
result -= result // p
p += 1
if n > 1:
result -= result // n
return result
Euler's Theorem (Euler, 1763): If gcd(a, n) = 1, then a^φ(n) ≡ 1 (mod n).
This generalizes Fermat's Little Theorem: when n = p is prime, φ(p) = p-1, reducing to Fermat's theorem.
Properties of Euler's totient:
- φ(1) = 1
- φ(p) = p - 1 (p prime)
- φ(p^k) = p^k - p^(k-1) = p^(k-1)·(p-1)
- φ(mn) = φ(m)·φ(n) when gcd(m,n) = 1 (multiplicative function)
- Σ_{d|n} φ(d) = n (sum of totient over all divisors equals n)
3.5 Miller-Rabin Primality Test
For large numbers (e.g., 10^18), trial division is too slow. Miller-Rabin is a probabilistic primality test based on the following observation:
Setup: If p is an odd prime, write p-1 = 2^s · d (d odd). For any 1 < a < p:
- Either a^d ≡ 1 (mod p)
- Or there exists 0 ≤ r < s such that a^(2^r · d) ≡ -1 (mod p)
(This follows from Fermat's theorem a^(p-1) ≡ 1 and the fact that the only square roots of 1 mod p are ±1)
If n fails this test for some a, then n is definitely composite.
def miller_rabin(n: int, witnesses: list = None) -> bool:
"""Miller-Rabin primality test
For n < 3,317,044,064,679,887,385,961,981,
witnesses = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
gives a deterministic result (no probability of error).
"""
if n < 2:
return False
if n == 2 or n == 3:
return True
if n % 2 == 0:
return False
# Write n-1 = 2^s * d
s, d = 0, n - 1
while d % 2 == 0:
s += 1
d //= 2
if witnesses is None:
witnesses = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
for a in witnesses:
if a >= n:
continue
x = power_mod(a, d, n)
if x == 1 or x == n - 1:
continue
for _ in range(s - 1):
x = x * x % n
if x == n - 1:
break
else:
return False
return True
Deterministic guarantee: Gary Miller (1976) proved that under the Generalized Riemann Hypothesis (GRH), testing a = 2, 3, ..., 2(ln n)² suffices for deterministic primality. In practice, for 64-bit integers (n < 2^64), a fixed set of 12 witnesses is sufficient.
Level 4 · Edge Cases and Interview Problems
4.1 Interview Problem: Count Primes (LeetCode #204)
Problem: Given integer n, return the count of all primes less than n.
Analysis: Direct application of the Sieve of Eratosthenes. Note: "less than n", not "at most n".
def count_primes(n: int) -> int:
"""LeetCode 204: Count Primes
Time: O(n log log n)
Space: O(n)
"""
if n <= 2:
return 0
is_prime = [True] * n
is_prime[0] = is_prime[1] = False
i = 2
while i * i < n:
if is_prime[i]:
for j in range(i * i, n, i):
is_prime[j] = False
i += 1
return sum(is_prime)
Follow-up questions in interviews:
- "Can we use only O(√n) space?" — Segmented Sieve
- "What if n = 10^10?" — Meissel-Lehmer algorithm, O(n^(2/3)) time
- "How to test a single large number?" — Miller-Rabin
Optimization: Use a bitset instead of bool array to save 8x memory. In Python, use bytearray:
def count_primes_optimized(n: int) -> int:
"""Optimized: only process odd numbers"""
if n <= 2:
return 0
if n <= 3:
return 1
# Only store odds: sieve[i] represents whether 2i+3 is prime
size = (n - 3) // 2 + 1
sieve = bytearray(b'\x01') * size
i = 0
while (2 * i + 3) ** 2 < n:
if sieve[i]:
p = 2 * i + 3
start = (p * p - 3) // 2
sieve[start::p] = bytearray(len(sieve[start::p]))
i += 1
return sum(sieve) + 1 # +1 for prime 2
4.2 Interview Problem: Happy Number (LeetCode #202)
Problem: Determine if a number n is a "happy number." A happy number is defined by replacing the number with the sum of squares of its digits, repeating until it reaches 1 (happy) or loops endlessly (not happy).
Number-theoretic analysis: The key insight is the process must enter a cycle (for a d-digit number, the next value is at most 81d, so the range is finite). Moreover, unhappy numbers always enter the cycle containing 4: 4 → 16 → 37 → 58 → 89 → 145 → 42 → 20 → 4.
def is_happy(n: int) -> bool:
"""LeetCode 202: Happy Number
Method 1: Floyd's cycle detection
Time: O(log n) (practically O(1) since values quickly shrink below 1000)
Space: O(1)
"""
def digit_square_sum(num):
total = 0
while num:
num, digit = divmod(num, 10)
total += digit * digit
return total
slow = n
fast = digit_square_sum(n)
while fast != 1 and slow != fast:
slow = digit_square_sum(slow)
fast = digit_square_sum(digit_square_sum(fast))
return fast == 1
def is_happy_v2(n: int) -> bool:
"""Method 2: Check if we hit 4"""
def digit_square_sum(num):
total = 0
while num:
num, digit = divmod(num, 10)
total += digit * digit
return total
while n != 1 and n != 4:
n = digit_square_sum(n)
return n == 1
Why convergence is fast: For n ≤ 999 (3 digits), the next value is at most 9²+9²+9² = 243. For n ≤ 9999999 (7 digits), the next value is at most 7×81 = 567. So regardless of initial value, within a few steps it drops below 1000, then cycles within a finite set.
4.3 Interview Problem: Pow(x, n) (LeetCode #50)
Problem: Implement pow(x, n) — compute x raised to the power n.
Key pitfalls:
- n can be negative → x^(-n) = 1 / x^n
- n can be -2^31 (taking absolute value overflows int32)
- x = 0 and n < 0 → undefined (but problem guarantees this won't happen)
def my_pow(x: float, n: int) -> float:
"""LeetCode 50: Pow(x, n)
Fast exponentiation with negative exponent handling
Time: O(log |n|)
Space: O(1)
"""
if n == 0:
return 1.0
if n < 0:
x = 1.0 / x
n = -n
result = 1.0
current = x
while n > 0:
if n & 1:
result *= current
current *= current
n >>= 1
return result
Interview discussion points:
- Why not use
n = abs(n)? Because abs(-2^31) overflows in 32-bit integers (int32 range is [-2^31, 2^31 - 1]) - Python doesn't have this issue (big integers), but C++/Java need to cast to long first
- Recursive version is more readable but has stack overhead
4.4 Interview Problem: Trailing Zeroes (LeetCode #172)
Problem: Given integer n, return the number of trailing zeroes in n!.
Number-theoretic analysis: Trailing zeroes come from factors of 10 = 2 × 5. In n!, the count of factor 2 far exceeds factor 5 (every other even number contributes at least one 2, but only every 5th number contributes a 5). So trailing zeroes = total count of factor 5 in n!.
Legendre's Formula: The exponent of prime p in n! is Σ_{i=1}^{∞} ⌊n/p^i⌋
def trailing_zeroes(n: int) -> int:
"""LeetCode 172: Factorial Trailing Zeroes
Count trailing zeros = count factor 5 in n!
Time: O(log₅ n)
Space: O(1)
"""
count = 0
power_of_5 = 5
while power_of_5 <= n:
count += n // power_of_5
power_of_5 *= 5
return count
# More concise version
def trailing_zeroes_v2(n: int) -> int:
count = 0
while n >= 5:
n //= 5
count += n
return count
Intuitive explanation:
- ⌊n/5⌋ counts multiples of 5 (each contributes one 5)
- ⌊n/25⌋ counts multiples of 25 (each contributes an extra 5)
- ⌊n/125⌋ counts multiples of 125 (yet another extra 5)
- ...
Example n = 100: ⌊100/5⌋ + ⌊100/25⌋ + ⌊100/125⌋ = 20 + 4 + 0 = 24. So 100! has 24 trailing zeroes.
Advanced variant: "Trailing zeroes of n! in base k" — factor k into primes, use Legendre's formula for each, take the minimum.
4.5 Common Number Theory Pitfalls in Contests
Pitfall 1: Negative numbers in modular arithmetic
# Python: -7 % 3 = 2 (always non-negative) — correct mathematical definition
# C/C++: -7 % 3 = -1 (preserves sign) — needs manual fix
# Safe mod in C++: ((a % m) + m) % m
Pitfall 2: Modular arithmetic doesn't distribute over division
# Wrong: (a / b) % m ≠ (a % m) / (b % m)
# Correct: (a / b) % m = (a % m) * inverse(b, m) % m
# Example: (10 / 2) % 7 = 5
# Wrong: (10 % 7) / (2 % 7) = 3 / 2 = 1.5 ??
# Correct: (10 % 7) * mod_inverse(2, 7) % 7 = 3 * 4 % 7 = 5
# (since 2 * 4 = 8 ≡ 1 (mod 7), so 2⁻¹ ≡ 4 (mod 7))
Pitfall 3: Fast power with base=0 or mod=1
def power_mod_safe(base, exp, mod):
if mod == 1:
return 0 # anything mod 1 = 0
if base == 0:
return 0 if exp > 0 else 1 # 0^0 usually defined as 1
# ... normal fast power
Pitfall 4: C(n, k) when k > n
# Must return 0 when k > n; don't let factorial compute negative values
def safe_comb(n, k, mod):
if k < 0 or k > n:
return 0
# ... normal computation
4.6 Complete Number Theory Toolkit for Interviews
class NumberTheoryToolbox:
"""Complete number theory toolkit for contests/interviews"""
def __init__(self, mod: int = 10**9 + 7):
self.mod = mod
def power(self, base: int, exp: int) -> int:
result = 1
base %= self.mod
while exp > 0:
if exp & 1:
result = result * base % self.mod
base = base * base % self.mod
exp >>= 1
return result
def inverse(self, a: int) -> int:
return self.power(a, self.mod - 2)
def factorial_and_inverse(self, n: int) -> tuple:
fact = [1] * (n + 1)
for i in range(1, n + 1):
fact[i] = fact[i - 1] * i % self.mod
inv_fact = [1] * (n + 1)
inv_fact[n] = self.power(fact[n], self.mod - 2)
for i in range(n - 1, -1, -1):
inv_fact[i] = inv_fact[i + 1] * (i + 1) % self.mod
return fact, inv_fact
def comb(self, n: int, k: int, fact: list, inv_fact: list) -> int:
if k < 0 or k > n:
return 0
return fact[n] * inv_fact[k] % self.mod * inv_fact[n - k] % self.mod
@staticmethod
def gcd(a: int, b: int) -> int:
while b:
a, b = b, a % b
return a
@staticmethod
def lcm(a: int, b: int) -> int:
return a // NumberTheoryToolbox.gcd(a, b) * b
@staticmethod
def extended_gcd(a: int, b: int) -> tuple:
if b == 0:
return a, 1, 0
g, x, y = NumberTheoryToolbox.extended_gcd(b, a % b)
return g, y, x - (a // b) * y
@staticmethod
def crt(remainders: list, moduli: list) -> int:
M = 1
for m in moduli:
M *= m
result = 0
for a, m in zip(remainders, moduli):
Mi = M // m
_, t, _ = NumberTheoryToolbox.extended_gcd(Mi, m)
result += a * Mi * t
return result % M
4.7 Interview Response Strategy
When encountering number theory problems in interviews:
-
Confirm bounds first: The range of n determines algorithm choice
- n ≤ 10⁶: Sieve / brute force
- n ≤ 10⁷: Linear sieve
- n ≤ 10^18: Fast power / Miller-Rabin / Pollard Rho
-
Three principles of modular arithmetic:
- Addition, subtraction, multiplication can take mod anytime
- Division must be converted to multiplication by inverse
- Take mod at every step to prevent overflow
-
Common pattern recognition:
- "Result modulo 10^9+7" → Need modular inverse, fast power
- "Compute binomial coefficient" → Preprocess factorials + Lucas
- "Greatest common divisor" → Euclid's algorithm
- "Congruence equations" → Extended Euclidean / CRT
-
Complexity quick reference:
- gcd: O(log n)
- Fast power: O(log n)
- Sieve of Eratosthenes [1,n]: O(n log log n)
- Linear sieve [1,n]: O(n)
- Trial division: O(√n)
- Miller-Rabin: O(k · log²n), k = number of witnesses