Hash Collisions Explained: Why They're Dangerous
โ Back to Blog
Hash Collisions Explained: Why They're Dangerous
ยท 6 min read
Definition of a Hash Collision
A hash collision occurs when two different inputs, when processed by a hash function, produce exactly the same output (hash value). Since hash functions map infinitely many possible inputs to a finite output space, collisions must mathematically exist (pigeonhole principle). The design goal of secure hash functions is to make collisions extremely difficult to find and exploit in practice.
/* A collision means: */
hash(input_A) == hash(input_B)
/* But: */
input_A != input_B
/* Example (conceptual - not actual MD5): */
MD5(data_A) = "d41d8cd98f00b204e9800998ecf8427e"
MD5(data_B) = "d41d8cd98f00b204e9800998ecf8427e"
/* data_A and data_B are different files! */
Three Types of Collision Attacks
- Collision Attack: Find any two different inputs that produce the same hash value. The attacker freely chooses both inputs, only requiring they hash the same. Practical collision attacks have been demonstrated for both MD5 and SHA1.
- Preimage Attack: Given a hash value, find any input that produces that hash. Harder than collision attacks, still infeasible against modern hash functions like SHA256.
- Second Preimage Attack: Given a specific input, find another different input that produces the same hash value. Difficulty between the two above.
MD5 Collision: Real-World Cases
In 2004, researchers including Xiaoyun Wang published a practical MD5 collision algorithm. In 2008, Marc Stevens and others successfully forged valid SSL/TLS certificates using MD5 collisions. The attack: researchers used MD5's weakness allowing collisions to generate two certificate signing requests with identical MD5 hashes โ one legitimate, one containing malicious CA authority. When a CA (using MD5 for verification) signed the legitimate request, that signature was also valid for the malicious version, giving attackers the ability to forge certificates for any website.
SHA1 Collision: The SHAttered Attack
In 2017, Google and CWI Institute announced the SHAttered attack โ the first practical collision attack against SHA1. They generated two PDF files with identical SHA1 hash values but different content. The two PDFs had identical opening bytes but different embedded image content โ visually distinguishable but with completely identical SHA1 hashes.
This attack used approximately 10^18 SHA1 computations, equivalent to 6,500 CPU-years or 110 GPU-years. In comparison, a general birthday attack would theoretically require 2^80 computations; SHAttered reduced the actual computation to approximately 2^63 through cryptanalytic techniques.
Dangerous Scenarios Enabled by Hash Collisions
- Digital signature forgery: If a signing system uses a collision-vulnerable hash, an attacker can get a signature for a benign document, then claim that signature applies to another malicious document with the same hash
- SSL certificate forgery: As demonstrated in the 2008 MD5 certificate forgery attack โ impersonating any HTTPS website
- Software package tampering: Attackers can forge a malicious program's hash to match a legitimate program's, fooling MD5-only integrity checks
How to Defend Against Hash Collision Attacks
- Migrate to SHA256 or stronger hash functions (SHA-3, BLAKE2) for all security operations
- Don't rely on hash verification alone โ combine with digital signatures (GPG/PGP) for authenticity verification
- Reject MD5 and SHA1 in certificates and TLS configurations (modern browsers already reject these by default)
- Use SHA256 or SHA384 in content security (like CSP, SRI โ Subresource Integrity)
Hash Collisions vs. Non-Cryptographic Hash Collisions
In non-cryptographic hash tables, collisions are normal and acceptable โ when two keys hash to the same slot, use chaining or open addressing to handle it. Hash table collisions are a performance issue, not a security issue. However, in web frameworks, if attackers can control large numbers of collisions, they may degrade hash table performance to O(n), causing DoS attacks (HashDoS). Modern languages' built-in dictionary/map types typically use random hash seeds to prevent such attacks.
Try the online tool now โ no installation, completely free.
Open Tool โ
Try the free tool now
Use Free Tool โ