Chapter 13

Tries and Aho-Corasick Automaton

Chapter 13: Tries and Aho-Corasick Automaton

In the domain of text processing, one class of problems recurs perpetually: given a set of keywords, how do we efficiently locate them within a body of text? Search engine autocomplete, IDE code suggestions, network firewall content filtering, DNA sequence pattern matching—all these scenarios rely on the same family of data structures: the Trie (prefix tree).

When we further combine the Trie with the failure-link concept from the KMP algorithm, we obtain the Aho-Corasick automaton (AC automaton)—an algorithm capable of simultaneously matching multiple pattern strings in O(n + m + z) time, where n is the text length, m is the total length of all patterns, and z is the number of matches. This algorithm was proposed by Alfred Aho and Margaret Corasick at Bell Labs in 1975 and remains the industrial standard for multi-pattern matching to this day.

This chapter begins with the most fundamental Trie operations, progressively deepening into the complete implementation of the AC automaton, and finally grounding the theory with real interview problems and engineering case studies.

Level 1 · What You Need to Know

13.1 The Basic Concept of Trie

Trie (pronounced "try", derived from retrieval) is a multi-way tree structure where each path from root to a marked node represents a string. Unlike hash tables, Tries natively support prefix queries—this is their core advantage.

Imagine a Trie storing ["apple", "app", "apt", "bat", "bad"]:

        root
       /    \
      a      b
     /        \
    p          a
   / \        / \
  p   t      t   d
  |
  l
  |
  e

Each node represents a character (strictly speaking, a character on an edge), and the path from the root to any marked node forms a complete word.

Why not just use a hash table? Hash table lookup for a single key is O(L) (L being the string length), but it cannot answer questions like "what are all the words starting with 'ap'?" A hash table would need to iterate over all keys and check each prefix, yielding O(N·L) complexity; a Trie only needs to traverse to the 'a' → 'p' node and then explore its subtree, with complexity O(P + K) where P is the prefix length and K is the number of matching results.

13.2 Basic Trie Implementation

class TrieNode:
    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.is_end = False  # marks whether this is the end of a complete word


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        """Insert a word. Time complexity: O(L), L = len(word)"""
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True

    def search(self, word: str) -> bool:
        """Exact lookup. Time complexity: O(L)"""
        node = self._find_node(word)
        return node is not None and node.is_end

    def starts_with(self, prefix: str) -> bool:
        """Prefix query. Time complexity: O(P), P = len(prefix)"""
        return self._find_node(prefix) is not None

    def _find_node(self, prefix: str):
        """Find the node corresponding to a prefix"""
        node = self.root
        for char in prefix:
            if char not in node.children:
                return None
            node = node.children[char]
        return node

Usage example:

trie = Trie()
trie.insert("apple")
trie.insert("app")
trie.insert("application")

print(trie.search("app"))        # True
print(trie.search("ap"))         # False (not a complete word)
print(trie.starts_with("ap"))    # True
print(trie.starts_with("b"))     # False

Complexity analysis:

Operation	Time Complexity	Space Complexity
Insert	O(L)	O(L) (worst case, no shared prefix)
Search	O(L)	O(1)
Prefix query	O(P)	O(1)
Build entire tree	O(N·L_avg)	O(N·L_avg) (worst case)

Where L is word length, P is prefix length, N is number of words, and L_avg is average word length.

13.3 Autocomplete Implementation

Autocomplete is the most classic application of Tries. When a user types a prefix, we need to return all words starting with that prefix:

class AutoCompleteTrie(Trie):
    def autocomplete(self, prefix: str, limit: int = 10) -> list:
        """Return all words starting with prefix, up to limit results"""
        node = self._find_node(prefix)
        if node is None:
            return []

        results = []
        self._dfs_collect(node, prefix, results, limit)
        return results

    def _dfs_collect(self, node, current_word, results, limit):
        """DFS to collect all complete words"""
        if len(results) >= limit:
            return
        if node.is_end:
            results.append(current_word)
        for char in sorted(node.children.keys()):  # lexicographic order
            self._dfs_collect(
                node.children[char],
                current_word + char,
                results,
                limit
            )

Weighted autocomplete (ranked by popularity):

class WeightedTrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False
        self.weight = 0  # search frequency/popularity


class WeightedAutoComplete:
    def __init__(self):
        self.root = WeightedTrieNode()

    def insert(self, word: str, weight: int) -> None:
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = WeightedTrieNode()
            node = node.children[char]
        node.is_end = True
        node.weight = weight

    def top_k_suggestions(self, prefix: str, k: int = 5) -> list:
        """Return top-k suggestions by weight"""
        import heapq
        node = self._find_node(prefix)
        if node is None:
            return []

        # Use a min-heap to maintain top-k
        heap = []  # (weight, word)
        self._collect_all(node, prefix, heap, k)
        # Return in descending weight order
        return [word for _, word in sorted(heap, reverse=True)]

    def _collect_all(self, node, current, heap, k):
        import heapq
        if node.is_end:
            if len(heap) < k:
                heapq.heappush(heap, (node.weight, current))
            elif node.weight > heap[0][0]:
                heapq.heapreplace(heap, (node.weight, current))
        for char, child in node.children.items():
            self._collect_all(child, current + char, heap, k)

    def _find_node(self, prefix):
        node = self.root
        for char in prefix:
            if char not in node.children:
                return None
            node = node.children[char]
        return node

13.4 Compressed Trie (Patricia Tree / Radix Tree)

A problem with standard Tries is that when a path has only single-child nodes, substantial space is wasted. For example, storing "romane", "romanus", "romulus"—the path from 'r' to 'rom' has nodes each with only one child.

Patricia Tree (Practical Algorithm to Retrieve Information Coded in Alphanumeric, proposed by Donald R. Morrison in 1968) compresses these single-chain paths into a single edge:

Standard Trie:       Compressed Trie (Patricia Tree):
    r                     rom
    |                    /   \
    o                  an     ulus
    |                 / \
    m               e    us
   / \
  a   u
  |   |
  n   l
 / \  |
e   u u
    |  s
    s

class PatriciaNode:
    def __init__(self, label=""):
        self.label = label       # edge label (can be multiple characters)
        self.children = {}       # first character -> PatriciaNode
        self.is_end = False


class PatriciaTrie:
    def __init__(self):
        self.root = PatriciaNode()

    def insert(self, word: str) -> None:
        node = self.root
        i = 0
        while i < len(word):
            char = word[i]
            if char not in node.children:
                # Insert remaining portion as a new edge
                new_node = PatriciaNode(word[i:])
                new_node.is_end = True
                node.children[char] = new_node
                return

            child = node.children[char]
            label = child.label
            # Find longest common prefix between word[i:] and label
            j = 0
            while j < len(label) and i + j < len(word) and label[j] == word[i + j]:
                j += 1

            if j == len(label):
                # Label fully matched, continue downward
                i += j
                node = child
            else:
                # Need to split the node
                split_node = PatriciaNode(label[:j])
                node.children[char] = split_node

                # Original child becomes child of split_node
                child.label = label[j:]
                split_node.children[label[j]] = child

                # If word has remaining characters, create new branch
                if i + j < len(word):
                    new_node = PatriciaNode(word[i + j:])
                    new_node.is_end = True
                    split_node.children[word[i + j]] = new_node
                else:
                    split_node.is_end = True
                return

        node.is_end = True

    def search(self, word: str) -> bool:
        node = self.root
        i = 0
        while i < len(word):
            char = word[i]
            if char not in node.children:
                return False
            child = node.children[char]
            label = child.label
            if not word[i:i+len(label)] == label:
                return False
            i += len(label)
            node = child
        return node.is_end

Advantages of Patricia Tree:

Space: reduced from O(N·L) to O(N), since only non-shared suffixes need storage
Search speed: actually faster due to fewer pointer dereferences
Ideal for long strings: URL routing, file path matching, etc.

13.5 Common Mistakes and Pitfalls

Mistake 1: Confusing search and starts_with

# Wrong: forgetting to check the is_end flag
def search(self, word):
    node = self._find_node(word)
    return node is not None  # BUG! "app" would return True in "apple"

# Correct:
def search(self, word):
    node = self._find_node(word)
    return node is not None and node.is_end

Mistake 2: Failing to reclaim empty nodes during deletion

# Correct deletion implementation
def delete(self, word: str) -> bool:
    """Delete a word, return whether deletion was successful"""
    def _delete(node, word, depth):
        if depth == len(word):
            if not node.is_end:
                return False  # Word does not exist
            node.is_end = False
            return len(node.children) == 0  # Can this node be removed?

        char = word[depth]
        if char not in node.children:
            return False

        should_delete = _delete(node.children[char], word, depth + 1)
        if should_delete:
            del node.children[char]
            return not node.is_end and len(node.children) == 0
        return False

    _delete(self.root, word, 0)

Mistake 3: Autocomplete without a limit causing OOM

When a Trie stores millions of words, a very short prefix (like "a") may match hundreds of thousands of results. A limit parameter is essential.

Level 2 · How It Works Under the Hood

13.6 Trie Space Optimization Strategies

13.6.1 Comparing Child Node Storage Approaches

The children field of a Trie node can be implemented in several ways, and the choice directly impacts time and space performance:

Approach 1: Fixed-size array

class TrieNodeArray:
    def __init__(self):
        # 26 letters, direct index access
        self.children = [None] * 26
        self.is_end = False

    def get_child(self, char):
        return self.children[ord(char) - ord('a')]

    def set_child(self, char, node):
        self.children[ord(char) - ord('a')] = node

Time: O(1) child access
Space: each node has a fixed 26-pointer slot (even if most are empty)
Best for: small character sets (e.g., lowercase letters only), maximum speed

Approach 2: Hash table (dictionary)

class TrieNodeHash:
    def __init__(self):
        self.children = {}  # dynamically allocated
        self.is_end = False

Time: amortized O(1), but with hashing overhead
Space: only stores actually existing children
Best for: large character sets (Unicode), sparse Tries

Approach 3: Sorted array (for static Tries)

class TrieNodeSorted:
    def __init__(self):
        self.keys = []    # sorted character list
        self.values = []  # corresponding child nodes
        self.is_end = False

    def get_child(self, char):
        # Binary search
        import bisect
        idx = bisect.bisect_left(self.keys, char)
        if idx < len(self.keys) and self.keys[idx] == char:
            return self.values[idx]
        return None

Time: O(log k) lookup, where k is the number of children
Space: compact, no waste
Best for: static dictionaries that are rarely modified after construction

Quantitative comparison (storing 100,000 English words, average length 8):

Implementation	Memory Usage	Insert Time	Search Time
Fixed array (26)	~100 MB	Fastest	Fastest
Hash table	~40 MB	Medium	Medium
Sorted array	~25 MB	Slowest (element shifting)	Slightly slower (binary search)
Compressed Trie	~15 MB	Medium	Fast

13.6.2 Double-Array Trie

In scenarios requiring extreme performance such as Chinese word segmentation and Japanese input methods, the Double-Array Trie (Aoe Jun-ichi, 1989) is the industrial-grade choice. It uses two integer arrays base[] and check[] to simulate the entire Trie:

class DoubleArrayTrie:
    """
    Core idea of Double-Array Trie:
    - base[s] + c = t: transition from state s via character c to state t
    - check[t] = s: verify that state t's parent is indeed s
    """
    def __init__(self, size=1000000):
        self.base = [0] * size
        self.check = [-1] * size  # -1 means unused
        self.base[0] = 1  # root node

    def transition(self, state, char_code):
        """Transition from state via char_code"""
        t = self.base[state] + char_code
        if t < len(self.check) and self.check[t] == state:
            return t
        return -1  # transition failed

    def search(self, word):
        state = 0
        for ch in word:
            code = ord(ch) - ord('a') + 1  # 1-based
            state = self.transition(state, code)
            if state == -1:
                return False
        # Check for terminal state (using special marker)
        end_state = self.transition(state, 0)  # 0 as terminator
        return end_state != -1

Advantages of Double-Array Trie:

Extremely compact memory: only two integer arrays
Cache-friendly: contiguous memory access
Search speed: only array index arithmetic, no pointer chasing

Drawbacks include complex construction and difficult dynamic insertion—best suited for static dictionaries (such as word segmentation dictionaries).

13.7 Trie in IP Routing Tables

One of the core tasks of a network router is Longest Prefix Match (LPM): given a destination IP address, find the longest matching prefix in the routing table. This is a natural application for Tries.

class IPRoutingTrie:
    """
    Trie implementation for IP routing tables.
    Each bit serves as a character (0 or 1), building a binary Trie.
    """
    class Node:
        def __init__(self):
            self.children = [None, None]  # 0 and 1
            self.next_hop = None  # routing next hop

    def __init__(self):
        self.root = self.Node()

    def insert_route(self, prefix: str, prefix_len: int, next_hop: str):
        """
        Insert a routing entry.
        prefix: binary representation of IP address
        prefix_len: prefix length (the /24 in CIDR notation)
        next_hop: next hop address
        """
        node = self.root
        for i in range(prefix_len):
            bit = int(prefix[i])
            if node.children[bit] is None:
                node.children[bit] = self.Node()
            node = node.children[bit]
        node.next_hop = next_hop

    def longest_prefix_match(self, ip_binary: str) -> str:
        """
        Longest prefix match.
        Returns the next hop of the most specific matching route.
        """
        node = self.root
        last_match = None
        for bit_char in ip_binary:
            bit = int(bit_char)
            if node.children[bit] is None:
                break
            node = node.children[bit]
            if node.next_hop is not None:
                last_match = node.next_hop
        return last_match

    @staticmethod
    def ip_to_binary(ip: str) -> str:
        """Convert IP address to 32-bit binary string"""
        parts = ip.split('.')
        binary = ''
        for part in parts:
            binary += format(int(part), '08b')
        return binary


# Usage example
router = IPRoutingTrie()

# Insert routing table entries
# 192.168.0.0/16 -> Gateway A
router.insert_route(
    IPRoutingTrie.ip_to_binary("192.168.0.0"), 16, "Gateway A"
)
# 192.168.1.0/24 -> Gateway B
router.insert_route(
    IPRoutingTrie.ip_to_binary("192.168.1.0"), 24, "Gateway B"
)
# 0.0.0.0/0 -> Default Gateway
router.insert_route(
    IPRoutingTrie.ip_to_binary("0.0.0.0"), 0, "Default"
)

# Query
ip = IPRoutingTrie.ip_to_binary("192.168.1.100")
print(router.longest_prefix_match(ip))  # "Gateway B" (matches /24, longer)

ip2 = IPRoutingTrie.ip_to_binary("192.168.2.50")
print(router.longest_prefix_match(ip2))  # "Gateway A" (only matches /16)

Why do routers use Tries instead of hash tables? Because route lookup is fundamentally a prefix matching problem. A packet's destination IP may match multiple routing rules (e.g., /8, /16, /24), and the router needs to find the longest one. Hash tables can only perform exact matches and cannot efficiently handle prefix relationships.

Optimizations in modern routers:

Multi-bit Trie: examine 4 or 8 bits at a time, reducing tree depth (but increasing node size)
Level-compressed Trie: compress consecutive single-branch paths
Hardware TCAM: specialized hardware for parallel matching of all rules, O(1) but expensive

13.8 AC Automaton: The Ultimate Weapon for Multi-Pattern Matching

13.8.1 Problem Definition

Single-pattern matching: find one pattern P in text T (solved by KMP). Multi-pattern matching: simultaneously find multiple patterns P1, P2, ..., Pk in text T.

Naive approach: run KMP separately for each pattern, total complexity O(n·k + m). When k is large (e.g., a sensitive word dictionary with tens of thousands of entries), this is unacceptable.

The AC automaton's core insight is: build all patterns into a Trie, then add failure links on the Trie, so that scanning the text never requires backtracking—this is identical in spirit to KMP, except generalized from one dimension (single pattern) to a tree structure (multiple patterns).

13.8.2 The Three Components of an AC Automaton

Goto function: the Trie transitions—from current state via a character to the next state
Failure function: when Goto fails, jump to the state corresponding to the longest proper suffix (analogous to KMP's next/failure array)
Output function: records all matching patterns at each state

13.8.3 Complete Implementation

from collections import deque


class AhoCorasick:
    """AC Automaton - Multi-pattern string matching"""

    class State:
        def __init__(self):
            self.goto = {}        # transition function: char -> state_id
            self.failure = 0      # failure link
            self.output = []      # output function: patterns matched at this state

    def __init__(self):
        self.states = [self.State()]  # initially only root (id=0)

    def _new_state(self):
        self.states.append(self.State())
        return len(self.states) - 1

    def build(self, patterns: list) -> None:
        """
        Three-step construction of AC automaton:
        1. Build Trie
        2. BFS to establish failure pointers
        3. Merge outputs
        """
        # Step 1: Build Goto (Trie)
        for pattern in patterns:
            state = 0
            for char in pattern:
                if char not in self.states[state].goto:
                    self.states[state].goto[char] = self._new_state()
                state = self.states[state].goto[char]
            self.states[state].output.append(pattern)

        # Step 2 & 3: BFS to build Failure pointers + merge Output
        queue = deque()

        # Depth-1 nodes: failure points to root
        for char, s in self.states[0].goto.items():
            self.states[s].failure = 0
            queue.append(s)

        while queue:
            u = queue.popleft()
            for char, v in self.states[u].goto.items():
                queue.append(v)

                # Compute failure pointer for v
                f = self.states[u].failure
                while f != 0 and char not in self.states[f].goto:
                    f = self.states[f].failure

                self.states[v].failure = (
                    self.states[f].goto[char] if char in self.states[f].goto else 0
                )

                # Avoid failure pointing to self
                if self.states[v].failure == v:
                    self.states[v].failure = 0

                # Merge output (all patterns on the suffix link chain)
                self.states[v].output = (
                    self.states[v].output +
                    self.states[self.states[v].failure].output
                )

    def search(self, text: str) -> list:
        """
        Scan text, return all matches [(start_pos, pattern), ...]
        Time complexity: O(n + z), n = len(text), z = total matches
        """
        results = []
        state = 0

        for i, char in enumerate(text):
            while state != 0 and char not in self.states[state].goto:
                state = self.states[state].failure
            state = self.states[state].goto.get(char, 0)

            for pattern in self.states[state].output:
                results.append((i - len(pattern) + 1, pattern))

        return results


# Usage example
ac = AhoCorasick()
patterns = ["he", "she", "his", "hers"]
ac.build(patterns)

text = "ahishers"
matches = ac.search(text)
for pos, pattern in matches:
    print(f"Position {pos}: '{pattern}'")
# Output:
# Position 1: 'his'
# Position 3: 'she'
# Position 3: 'he' (she contains he)
# Position 4: 'hers'

13.8.4 Visualizing AC Automaton Execution

Using the pattern set {"he", "she", "his", "hers"} as an example, the automaton's state graph (simplified):

State transitions (Goto):
  0 --h--> 1 --e--> 2 [output: "he"]
                     \--r--> 8 --s--> 9 [output: "hers"]
           \--i--> 6 --s--> 7 [output: "his"]
  0 --s--> 3 --h--> 4 --e--> 5 [output: "she", "he"]

Failure pointers:
  State 5 (she) --failure--> State 2 (he)
  Because the longest proper suffix of "she" that is also a Trie prefix is "he"

When scanning text "ahishers":

'a': root has no 'a' transition, stay at root (state 0)
'h': transition to state 1
'i': transition to state 6
's': transition to state 7, output "his"
'h': state 7 has no 'h' transition, follow failure chain back to state 0, then transition to state 1... then to state 4 (since "sh" path exists)
'e': transition to state 5, output "she" and "he"
'r': from state 5's failure state 2, transition to state 8
's': transition to state 9, output "hers"

13.9 Sensitive Word Filtering in Practice

class SensitiveWordFilter:
    """
    Sensitive word filter based on AC automaton.
    Supports:
    - Dynamic add/remove of sensitive words
    - Full-width/half-width character normalization
    - Case-insensitive matching
    - Replacing matches with ***
    """
    def __init__(self):
        self.ac = None
        self.patterns = set()
        self._dirty = True  # flag indicating rebuild needed

    def add_word(self, word: str) -> None:
        """Add a sensitive word"""
        normalized = self._normalize(word)
        if normalized:
            self.patterns.add(normalized)
            self._dirty = True

    def remove_word(self, word: str) -> None:
        """Remove a sensitive word"""
        normalized = self._normalize(word)
        self.patterns.discard(normalized)
        self._dirty = True

    def _normalize(self, text: str) -> str:
        """Normalize text: full-width to half-width, to lowercase"""
        result = []
        for ch in text:
            code = ord(ch)
            # Full-width to half-width
            if 0xFF01 <= code <= 0xFF5E:
                ch = chr(code - 0xFEE0)
            elif code == 0x3000:
                ch = ' '
            result.append(ch.lower())
        return ''.join(result)

    def _ensure_built(self):
        """Ensure automaton is built"""
        if self._dirty:
            self.ac = AhoCorasick()
            self.ac.build(list(self.patterns))
            self._dirty = False

    def contains(self, text: str) -> bool:
        """Check whether text contains any sensitive word"""
        self._ensure_built()
        normalized = self._normalize(text)
        return len(self.ac.search(normalized)) > 0

    def filter(self, text: str, replacement: str = "***") -> str:
        """Replace sensitive words with specified characters"""
        self._ensure_built()
        normalized = self._normalize(text)
        matches = self.ac.search(normalized)

        if not matches:
            return text

        # Mark positions that need replacement
        mask = [False] * len(text)
        for start, pattern in matches:
            for i in range(start, start + len(pattern)):
                mask[i] = True

        # Build result
        result = []
        i = 0
        while i < len(text):
            if mask[i]:
                result.append(replacement)
                while i < len(text) and mask[i]:
                    i += 1
            else:
                result.append(text[i])
                i += 1
        return ''.join(result)

    def find_all(self, text: str) -> list:
        """Find all sensitive words and their positions"""
        self._ensure_built()
        normalized = self._normalize(text)
        return self.ac.search(normalized)


# Usage example
word_filter = SensitiveWordFilter()
word_filter.add_word("gambling")
word_filter.add_word("drugs")
word_filter.add_word("violence")

text = "This website contains gambling and drugs content"
print(word_filter.contains(text))  # True
print(word_filter.filter(text))    # "This website contains *** and *** content"
print(word_filter.find_all(text))  # [(26, 'gambling'), (38, 'drugs')]

Engineering optimization recommendations:

Cache-friendly state storage: store all states in a contiguous array rather than a linked list of objects
Incremental building: for frequently changing word dictionaries, maintain multiple small AC automatons for batch querying
Skip irrelevant characters: sensitive words may have special characters inserted between them (e.g., "gamblin*g"), preprocess to remove interfering characters
Multi-level filtering: first use a Bloom filter to quickly exclude texts without sensitive words, then use the AC automaton for precise matching

Level 3 · What the Specification Defines

13.10 Edward Fredkin and the Origin of Trie (1960)

The origin of the name "Trie" is itself an interesting naming dispute in computer science history.

In 1959, Rene de la Briandais presented a paper at the Western Joint Computer Conference titled "File Searching Using Variable Length Keys," describing a tree structure that organizes a dictionary through character-by-character comparison of strings. However, he did not give this structure a dedicated name.

In 1960, Edward Fredkin formally proposed the term "Trie" in his paper "Trie Memory" (Communications of the ACM, Vol. 3, No. 9). Fredkin explained that the word was taken from the middle of "retrieval," implying its core purpose of information retrieval.

The naming controversy: Fredkin himself pronounced "Trie" as "tree" (homophonous with the tree data structure), arguing it emphasizes that it is fundamentally a tree structure. However, most of academia and industry later adopted the pronunciation "try" to avoid confusion with the generic "tree." Donald Knuth discussed this in detail in "The Art of Computer Programming" Vol. 3 (Sorting and Searching, 1973) and favored the "try" pronunciation.

Fredkin's original motivation: while working at Bolt, Beranek and Newman (BBN, which later participated in building ARPANET), he needed to efficiently store and retrieve large numbers of strings. Hash tables, while fast for individual lookups, could not support prefix range queries—a fundamental requirement in information retrieval systems. The Trie structure directly solved this problem.

Fredkin's other contribution: he simultaneously proposed the idea that Tries could be generalized to any radix. When each character has r possible values, each node has at most r children. For binary strings, r=2, which became the binary Trie widely used in IP routing.

13.11 The Birth of the Aho-Corasick Automaton (1975)

Alfred V. Aho and Margaret J. Corasick published their landmark paper "Efficient String Matching: An Aid to Bibliographic Search" in Communications of the ACM (Vol. 18, No. 6) in 1975.

Research context: Bell Labs was developing text processing tools at the time. The Unix fgrep command needed to simultaneously search for multiple fixed strings in a file. If each pattern was searched independently, performance became unacceptable when the number of patterns reached hundreds. Aho and Corasick needed an algorithm that could "scan the text once, matching all patterns simultaneously."

Core innovations:

Build all patterns into a Trie (which they called the "goto function")
Add a "failure function" on the Trie—when the current character cannot continue matching in the Trie, jump to the state corresponding to the longest correct suffix. This is the same concept as KMP, but generalized to the multi-pattern case
Add an "output function"—through failure chain propagation, ensure that passing through a state reports all patterns ending at that position

Complexity proof (key theorem from the paper):

Let n be the text length, m be the total length of all patterns, and z be the number of matches:

Preprocessing (building the automaton): O(m * sigma), where sigma is the alphabet size. With hash table transitions: O(m)
Search: O(n + z)
Space: O(m * sigma) (array implementation) or O(m) (hash table implementation)

Why is search O(n + z)? The key observation is: each character in the text is processed at most once (the total number of failure chain retreats throughout the entire search is O(n), identical to KMP's analysis).

13.12 The Relationship Between AC Automaton and KMP

The KMP algorithm (Knuth-Morris-Pratt, 1977) and the AC automaton share a deep internal connection. Formally:

KMP is a special case of the AC automaton for a single pattern.

Concept	KMP	AC Automaton
Structural basis	Single pattern (linear)	Multi-pattern Trie (tree-shaped)
Failure handling	next[j]: longest proper prefix-suffix of first j chars	failure[s]: longest proper suffix of state s's string (in the Trie)
Preprocessing	O(m)	O(m * sigma) or O(m)
Search	O(n)	O(n + z)
Publication	Knuth, Morris, Pratt 1977 (actually discovered in 1970)	Aho, Corasick 1975

An interesting historical detail: the AC paper (1975) was published earlier than the KMP paper (1977). In reality, the KMP algorithm was discovered in 1970, but the paper underwent lengthy revisions before formal publication. Aho explicitly drew on KMP's failure-jump concept when designing the AC automaton (he was Knuth's colleague and knew of KMP's unpublished work).

The generalization from KMP to AC:

# KMP's next array: for a single pattern
def build_kmp_next(pattern):
    """
    next[i] = length of longest proper prefix-suffix of pattern[0:i]
    Equivalent to: if mismatch at position i, jump to next[i] to continue
    """
    n = len(pattern)
    next_arr = [0] * n
    j = 0
    for i in range(1, n):
        while j > 0 and pattern[i] != pattern[j]:
            j = next_arr[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        next_arr[i] = j
    return next_arr

# AC's failure pointer: generalized to multi-pattern (tree structure)
# Comparison: KMP's j = next[j-1] corresponds to AC's failure = states[failure].failure
# KMP is backtracking on a chain; AC is backtracking on a tree (but the logic is identical)

Essential unity: Both KMP and AC automaton are special cases of Deterministic Finite Automata (DFA). KMP builds a linear DFA (each state corresponds to a prefix of the pattern), while the AC automaton builds a tree-shaped DFA (each state corresponds to a prefix in the Trie). Both use the failure function to implement DFA state compression—avoiding explicit storage of all possible transitions.

13.13 Introduction to Suffix Trees (Weiner, 1973)

The Suffix Tree is another extremely important member of the Trie family. Peter Weiner presented his paper "Linear Pattern Matching Algorithms" at FOCS (IEEE Symposium on Foundations of Computer Science) in 1973, proposing the first linear-time suffix tree construction algorithm. Knuth called it "the Algorithm of the Year 1973."

What is a suffix tree? For the string S = "banana$" ($ is a terminator), all its suffixes are:

banana$
anana$
nana$
ana$
na$
a$
$

Inserting all these suffixes into a compressed Trie (Patricia Tree) yields the suffix tree.

The power of suffix trees:

Problem	Complexity
Determine if P is a substring of S	O(
Count occurrences of P in S	O(
Find longest repeated substring of S	O(
Find longest common substring of two strings	O(
Find k-th smallest suffix (suffix array)	O(

Evolution of construction algorithms:

Weiner (1973): first O(n) algorithm, builds right to left, high space overhead
McCreight (1976): builds left to right, space optimized
Ukkonen (1995): online algorithm, incremental character-by-character construction, simplest implementation

Relationship to Trie: a suffix tree is essentially a compressed Trie built from all suffixes of a single string. It extends Trie applications from "a collection of short strings" to "substring problems of a single long string."

Why suffix trees have been largely replaced by suffix arrays in practice: The Suffix Array proposed by Manber and Myers (1993) requires only an integer array to store the sorted order of suffixes. Combined with an LCP array, it can solve most problems that suffix trees handle, with far less space overhead (suffix trees require multiple pointers per node). In fields like genomics that process extremely long strings, suffix arrays are the de facto standard.

13.14 Theoretical Complexity Analysis of Tries

Average-case analysis (assuming strings are generated uniformly at random over alphabet Sigma):

Philippe Flajolet and Robert Sedgewick performed precise average-case analysis of Tries in "Analytic Combinatorics" (2009):

A Trie built from n random strings has approximately n / ln(sigma) nodes on average, where sigma = |Sigma|
Average height is approximately (2 ln n) / ln sigma
The average number of comparisons to search for a random string is approximately log_sigma(n)

Worst case: when all strings share a very long common prefix (e.g., "aaa...a1", "aaa...a2"), the Trie degenerates into a nearly linear chain, with height approaching the length of the longest string L_max. This is precisely the raison d'etre of compressed Tries.

Level 4 · Edge Cases and Pitfalls

13.15 LeetCode #208: Implement Trie (Prefix Tree)

Problem: Implement a Trie class supporting insert, search, and startsWith operations.

This is the most fundamental Trie interview problem, directly testing data structure implementation skills:

class Trie:
    def __init__(self):
        self.children = {}
        self.is_end = False

    def insert(self, word: str) -> None:
        node = self
        for ch in word:
            if ch not in node.children:
                node.children[ch] = Trie()
            node = node.children[ch]
        node.is_end = True

    def search(self, word: str) -> bool:
        node = self._search_prefix(word)
        return node is not None and node.is_end

    def startsWith(self, prefix: str) -> bool:
        return self._search_prefix(prefix) is not None

    def _search_prefix(self, prefix: str):
        node = self
        for ch in prefix:
            if ch not in node.children:
                return None
            node = node.children[ch]
        return node

Interview key points:

Explain why you use dict vs fixed array (generality vs performance tradeoff)
Explain the necessity of is_end (distinguishing prefixes from complete words)
Discuss thread safety: multi-threaded environments need locks or ConcurrentHashMap
Clear time/space complexity analysis

13.16 LeetCode #211: Design Add and Search Words Data Structure

Problem: Design a WordDictionary that supports addWord and search, where '.' in search can match any single character.

class WordDictionary:
    def __init__(self):
        self.children = {}
        self.is_end = False

    def addWord(self, word: str) -> None:
        node = self
        for ch in word:
            if ch not in node.children:
                node.children[ch] = WordDictionary()
            node = node.children[ch]
        node.is_end = True

    def search(self, word: str) -> bool:
        return self._search(word, 0)

    def _search(self, word: str, index: int) -> bool:
        if index == len(word):
            return self.is_end

        ch = word[index]
        if ch == '.':
            # Wildcard: try all children
            for child in self.children.values():
                if child._search(word, index + 1):
                    return True
            return False
        else:
            if ch not in self.children:
                return False
            return self.children[ch]._search(word, index + 1)

Complexity analysis:

addWord: O(L)
search (no wildcards): O(L)
search (all wildcards "..."): O(26^L), exponential worst case

Interview follow-ups:

"How to optimize if there are many wildcards?"—Group words by length, filter mismatched lengths first
"What if we need to support '*' matching any number of characters?"—Requires backtracking or dynamic programming
"Can we use a regex engine instead?"—For a fixed dictionary, Trie with preprocessing is faster for queries

13.17 LeetCode #212: Word Search II

Problem: Given an m x n board of characters and a list of words, find all words that exist both in the board and the list. Words on the board must be constructed from adjacent cells (up/down/left/right), and each cell may not be reused.

This is the classic combination of Trie + DFS backtracking:

class Solution:
    def findWords(self, board: list, words: list) -> list:
        # Step 1: Build all words into a Trie
        root = {}
        for word in words:
            node = root
            for ch in word:
                node = node.setdefault(ch, {})
            node['#'] = word  # '#' marks word end, stores complete word

        m, n = len(board), len(board[0])
        result = []

        def dfs(i, j, parent):
            ch = board[i][j]
            node = parent.get(ch)
            if node is None:
                return

            # Check if we've matched a complete word
            if '#' in node:
                result.append(node['#'])
                del node['#']  # Avoid duplicate additions

            # Mark as visited
            board[i][j] = '@'

            # DFS in four directions
            for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
                ni, nj = i + di, j + dj
                if 0 <= ni < m and 0 <= nj < n and board[ni][nj] != '@':
                    dfs(ni, nj, node)

            # Restore
            board[i][j] = ch

            # Optimization: prune node if it has no more children
            if not node:
                del parent[ch]

        # Start search from every position
        for i in range(m):
            for j in range(n):
                dfs(i, j, root)

        return result

Key optimizations:

Use Trie instead of per-word search: naive approach does one DFS per word, complexity O(kmn*4^L). With Trie, all words sharing a prefix only need one search
Pruning: remove a word from Trie after finding it, reducing subsequent search space
Node cleanup: when all descendants of a Trie node have been matched, delete the node so future searches won't enter this path

Complexity:

Time: O(mn4*3^(L-1)), where L is the longest word length. First step has 4 directions, subsequent steps only 3 (cannot go back)
Space: O(k*L) (Trie) + O(L) (recursion stack)

13.18 Trie vs Hash Table: How to Choose

This is a common design discussion question in interviews. The answer depends on specific requirements:

Dimension	Trie	Hash Table
Exact lookup	O(L)	O(L) (hash computation is also O(L))
Prefix query	O(P + K), native support	Not supported (must iterate all keys)
Sorted traversal	Naturally ordered (DFS = lexicographic)	Unordered
Longest prefix match	O(L), native support	Requires multiple lookups
Memory usage	Usually larger (one node per character)	Usually smaller
Cache performance	Poor (pointer chasing, random access)	Good (contiguous memory)
Dynamic insert/delete	O(L)	Amortized O(L)
Hash collisions	Don't exist	Exist, worst case O(n)
Hotspot issues	Don't exist	Hot key rehashing

Choose Trie when:

Prefix matching needed (autocomplete, IP routing)
Lexicographic traversal needed
Longest common prefix needed
Strings share many common prefixes (URLs, file paths)

Choose Hash Table when:

Only exact lookup needed
Memory is limited
Strings share no common prefixes (UUIDs)
High cache hit rate needed

13.19 Classic Interview Variants

Variant 1: MapSum — Key-Value Mapping

class MapSum:
    """
    LeetCode #677: Implement insert(key, val) and sum(prefix).
    sum returns the sum of vals for all keys starting with prefix.
    """
    def __init__(self):
        self.root = {}
        self.map = {}  # track inserted key -> val

    def insert(self, key: str, val: int) -> None:
        delta = val - self.map.get(key, 0)
        self.map[key] = val
        node = self.root
        for ch in key:
            node = node.setdefault(ch, {'_sum': 0})
            node['_sum'] += delta

    def sum(self, prefix: str) -> int:
        node = self.root
        for ch in prefix:
            if ch not in node:
                return 0
            node = node[ch]
        return node.get('_sum', 0)

Variant 2: Palindrome Pairs

class Solution:
    """
    LeetCode #336: Given a list of unique words, find all pairs (i, j)
    such that words[i] + words[j] is a palindrome.
    
    Approach: Insert each word reversed into a Trie, check if remaining
    portion is a palindrome during search.
    """
    def palindromePairs(self, words: list) -> list:
        # Build reversed Trie
        root = {}
        for idx, word in enumerate(words):
            node = root
            reversed_word = word[::-1]
            for i, ch in enumerate(reversed_word):
                # If reversed_word[i:] is a palindrome, record this index
                if self._is_palindrome(reversed_word, i, len(reversed_word) - 1):
                    node.setdefault('_palindrome_indices', []).append(idx)
                node = node.setdefault(ch, {})
            node['_idx'] = idx
            node.setdefault('_palindrome_indices', []).append(idx)

        result = []
        for idx, word in enumerate(words):
            node = root
            for i, ch in enumerate(word):
                # Case 1: word is longer than some reversed word in Trie
                if '_idx' in node and node['_idx'] != idx:
                    if self._is_palindrome(word, i, len(word) - 1):
                        result.append([idx, node['_idx']])
                if ch not in node:
                    break
                node = node[ch]
            else:
                # Case 2: word fully matches or is shorter
                if '_idx' in node and node['_idx'] != idx:
                    result.append([idx, node['_idx']])
                # Case 3: longer reversed words in Trie with palindrome remainder
                for j in node.get('_palindrome_indices', []):
                    if j != idx and j != node.get('_idx', -1):
                        result.append([idx, j])

        return result

    def _is_palindrome(self, s, left, right):
        while left < right:
            if s[left] != s[right]:
                return False
            left += 1
            right -= 1
        return True

13.20 Engineering Considerations in Practice

1. Memory Management

When storing large numbers of strings, Trie's memory overhead can be 3-10x that of hash tables. In Python for example, each TrieNode object costs approximately 100-200 bytes (object header + dict overhead), while a string key in a hash table only needs the string's own memory.

Optimization strategies:

Use __slots__ to reduce object overhead
Use arrays instead of dicts for child storage (when charset is known and small)
Consider C extensions (e.g., marisa-trie, datrie libraries)

class TrieNodeSlots:
    """Using __slots__ to reduce memory overhead"""
    __slots__ = ['children', 'is_end']

    def __init__(self):
        self.children = {}
        self.is_end = False

2. Concurrency Safety

In production, Tries are often accessed by multiple threads (e.g., autocomplete services for search engines). Options:

Read-write locks: shared lock for reads, exclusive lock for writes
Copy-on-Write: create new version on write, atomically swap root pointer
Immutable Trie: no modification after construction, no locking needed

3. Persistence

Large Tries need disk persistence. Common approaches:

Serialize to compact binary format (e.g., marisa-trie's LOUDS encoding)
Use memory-mapped files (mmap) to avoid loading the entire tree into memory
FST (Finite State Transducer): used by Lucene/Elasticsearch, more compact than Trie

4. Performance Benchmarking

import time
import random
import string


def benchmark_trie_vs_hashset(n_words=100000, word_len=10):
    """Trie vs HashSet performance comparison"""
    # Generate random words
    words = [
        ''.join(random.choices(string.ascii_lowercase, k=word_len))
        for _ in range(n_words)
    ]

    # Trie construction
    trie = Trie()
    start = time.time()
    for w in words:
        trie.insert(w)
    trie_build = time.time() - start

    # HashSet construction
    start = time.time()
    hash_set = set(words)
    hash_build = time.time() - start

    # Trie search
    start = time.time()
    for w in words[:10000]:
        trie.search(w)
    trie_search = time.time() - start

    # HashSet search
    start = time.time()
    for w in words[:10000]:
        w in hash_set
    hash_search = time.time() - start

    # Trie prefix query (HashSet cannot do this efficiently)
    start = time.time()
    for w in words[:10000]:
        trie.starts_with(w[:3])
    trie_prefix = time.time() - start

    print(f"Building {n_words} words:")
    print(f"  Trie:    {trie_build:.3f}s")
    print(f"  HashSet: {hash_build:.3f}s")
    print(f"Searching 10000 times:")
    print(f"  Trie:    {trie_search:.4f}s")
    print(f"  HashSet: {hash_search:.4f}s")
    print(f"Prefix query 10000 times:")
    print(f"  Trie:    {trie_prefix:.4f}s")
    print(f"  HashSet: N/A (no efficient prefix query support)")

13.21 Summary and Decision Guide

Need prefix matching? ─── Yes ──→ Trie / Patricia Tree
       │
       No
       │
Need multi-pattern matching? ── Yes ──→ AC Automaton
       │
       No
       │
Need substring matching? ── Yes ──→ Suffix Tree / Suffix Array
       │
       No
       │
Only need exact lookup? ── Yes ──→ Hash Table

Key formulas from this chapter:

Trie lookup: O(L), L = string length
Trie space (worst case): O(N * L * sigma), sigma = alphabet size
AC automaton preprocessing: O(m), m = total length of all patterns
AC automaton search: O(n + z), n = text length, z = number of matches
Suffix tree construction: O(n) (Ukkonen's algorithm)
Patricia Tree node count: O(N), N = number of strings

Remember: choosing a data structure is not about "which is better" but about "which better fits your specific requirements." Tries are irreplaceable for prefix-related problems, but if you only need to check whether a string exists in a set, the simplicity and efficiency of hash tables is the wiser choice.

Rate this chapter

4.7 / 5 (26 ratings)