Cryptographic Hash Functions: Complete Guide
What Is a Cryptographic Hash Function
A cryptographic hash function is a deterministic function that maps arbitrary-length inputs to fixed-length outputs and is designed so that inverting it or finding collisions is computationally infeasible. The "cryptographic" qualifier matters: it distinguishes secure hashes (like SHA-256) from ordinary hash functions (like CRC32 or MurmurHash), which optimize only for speed and uniform distribution and offer no security guarantees. Cryptographic hash functions are a foundational primitive of modern cryptography: SSL/TLS, digital signatures, blockchains, password storage, and virtually all other security systems depend on them.
Five Core Security Properties
The security of a cryptographic hash function is defined by five properties. These are not optional features but mandatory requirements for a secure hash function:
- Determinism: The same input always produces the same output. This is the most basic requirement: without it, hash functions cannot be used for verification
- Preimage Resistance: Given a hash value h, it is computationally infeasible to find an input m such that Hash(m) = h. This makes hashing irreversible
- Second Preimage Resistance: Given an input m1, it is computationally infeasible to find m2 (m2 ≠ m1) such that Hash(m1) = Hash(m2). Prevents attackers from creating a different document with the same hash
- Collision Resistance: It is computationally infeasible to find any two different inputs m1 and m2 such that Hash(m1) = Hash(m2). Stronger than second preimage resistance, because the attacker is free to choose both inputs
- Avalanche Effect: A tiny change in input (even flipping a single bit) causes dramatic changes in output (approximately 50% of bits flip). Prevents inferring original data by analyzing input/output differences
/* Avalanche effect demonstration */
SHA256("hello")
= 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
SHA256("hellp") // only the last character changed: 'o' -> 'p'
= 0b3a1ef5be7c4a56034cd3aea3d7daf7bfa0b6ff9ac6da4c3a6e28cce1ea6c17
// ~50% of bits flipped: completely different output
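The avalanche effect can be checked directly rather than taken on trust. The sketch below, in Python with only the standard library, counts how many of the 256 output bits differ between the two inputs above. The helper name `bit_difference` is an illustrative choice, not part of any library.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between SHA-256(a) and SHA-256(b)."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    # XOR leaves a 1 in every position where the two digests disagree
    return bin(da ^ db).count("1")

diff = bit_difference(b"hello", b"hellp")
print(f"{diff} of 256 bits differ ({diff / 256:.0%})")
```

For a well-behaved hash the count should land close to 128, with some run-to-run variation across different input pairs.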
History and Evolution of Major Hash Algorithms
The history of cryptographic hash functions is a continuous cycle of attacks and evolution:
- MD5 (1992, Ron Rivest): 128-bit output, once widely used. In 2004, Wang Xiaoyun and colleagues found practical collisions; in 2008, researchers used MD5 collisions to forge CA certificates. Now completely insecure
- SHA-1 (1995, NSA/NIST): 160-bit output, once the internet standard. In 2017, the SHAttered attack by CWI Amsterdam and Google produced a practical collision. Modern browsers and CAs have deprecated it
- SHA-2 family (2001, NSA/NIST): Includes SHA-224/256/384/512, still considered secure today. SHA-256 is currently the most widely used cryptographic hash algorithm
- SHA-3 (2015, NIST): Based on the Keccak algorithm, it uses a new Sponge Construction with a completely different internal structure from SHA-2, and was designed as a backup in case SHA-2 is ever broken
- BLAKE3 (2020): A modern high-speed hash function, roughly 5 to 10 times faster than SHA-256 with a comparable security level. It supports parallelization, which makes it well suited to high-performance scenarios
SHA-2 Internal Workings (Merkle-Damgård Construction)
SHA-256 uses the Merkle-Damgård construction: the padded message is split into 512-bit (64-byte) blocks; each block is mixed with the previous block's output (the chaining value) through a compression function; and the output after the final block is the hash value. The compression function runs 64 rounds of bitwise mixing operations (rotations, shifts, XOR, AND, and modular addition), so that each input bit influences all output bits. The initial chaining values (IV) are fixed constants: the first 32 bits of the fractional parts of the square roots of the first 8 prime numbers. Such "nothing up my sleeve" numbers leave the designers little room to hide a backdoor in the constants.
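The origin of the initial values can be reproduced with integer arithmetic alone. A minimal sketch, assuming Python 3.9+ (for `math.isqrt` and the `list[int]` annotations); `first_primes` and `sha256_iv` are illustrative names, not library functions.

```python
import math

def first_primes(count: int) -> list[int]:
    """Return the first `count` primes by trial division (fine for small counts)."""
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes

def sha256_iv() -> list[int]:
    """First 32 fractional bits of sqrt(p) for the first 8 primes."""
    # floor(sqrt(p) * 2**32) equals isqrt(p * 2**64); the low 32 bits
    # of that integer are the leading bits of the fractional part.
    return [math.isqrt(p << 64) & 0xFFFFFFFF for p in first_primes(8)]

for i, word in enumerate(sha256_iv()):
    print(f"H{i} = 0x{word:08x}")
```

Using exact integer square roots avoids the rounding error that floating-point `sqrt` could introduce in the last bits.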
/* SHA-256 processing overview */
Input: "Hello, World!"
Step 1: Padding
  Append a single 1 bit, then 0 bits until length ≡ 448 (mod 512),
  then the original message length as a 64-bit big-endian integer
  -> Padded length is a multiple of 512 bits
Step 2: Parse into 512-bit blocks
  Block[0] = first 64 bytes of padded message
Step 3: Initialize hash state (H0..H7) with IV constants
  H0 = 0x6a09e667 // frac(sqrt(2))
  H1 = 0xbb67ae85 // frac(sqrt(3))
  ... (8 constants total)
Step 4: 64-round compression per block
  Copy H0..H7 into working variables a..h
  for i in 0..63:
    // Mix using Ch, Maj, Sigma functions + round constant K[i]
    // Each round shifts h<-g<-f<-e<-d<-c<-b<-a with mixing
Step 5: Add the working variables back into the running hash state
Step 6: Output H0||H1||...||H7 as the 256-bit final hash
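The steps above can be sketched end to end in pure Python. This is an educational sketch, far slower than `hashlib` and not meant for production use; the function names (`pad`, `compress`, `sha256`) are illustrative. The eight initial values are the standard constants, and the 64 round constants are derived here as the first 32 fractional bits of the cube roots of the first 64 primes, which is how FIPS 180-4 defines them.

```python
import hashlib

MASK = 0xFFFFFFFF

# Step 3: initial hash state H0..H7
IV = [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
      0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19]

def first_primes(count):
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes

def icbrt(n):
    """Integer cube root: the largest r with r**3 <= n (binary search)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# Round constants K[0..63]: first 32 fractional bits of cbrt(prime)
K = [icbrt(p << 96) & MASK for p in first_primes(64)]

def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK

def pad(message):
    """Step 1: append 0x80, zero bytes up to 56 mod 64, then the 64-bit bit length."""
    bit_len = len(message) * 8
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    return padded + bit_len.to_bytes(8, "big")

def compress(state, block):
    """Steps 4 and 5: mix one 64-byte block into the 8-word state."""
    # Message schedule: 16 words from the block, expanded to 64
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[i] + w[i]) & MASK
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        # Shift the working variables, injecting t1 and t2
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    # Step 5: add the working variables back into the chaining value
    return [(s + v) & MASK for s, v in zip(state, (a, b, c, d, e, f, g, h))]

def sha256(message):
    """Steps 1 to 6: pad, process each block, and return the hex digest."""
    state = list(IV)
    padded = pad(message)
    for offset in range(0, len(padded), 64):
        state = compress(state, padded[offset:offset + 64])
    return b"".join(word.to_bytes(4, "big") for word in state).hex()

msg = b"Hello, World!"
print(sha256(msg))
print("matches hashlib:", sha256(msg) == hashlib.sha256(msg).hexdigest())
```

Comparing against `hashlib` on a handful of inputs, including lengths around the 55 and 56 byte padding boundary, is a quick way to check a sketch like this.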
SHA-3's Sponge Construction: A Completely Different Design
SHA-3 (Keccak) uses the "Sponge Construction", a completely different design from SHA-2. One reason NIST chose it as the SHA-3 standard is that its structure is unrelated to SHA-2's, so an attack that breaks one is unlikely to carry over to the other. The sponge construction has two phases: the Absorbing phase (input data is XORed into the state in chunks, with the permutation applied after each chunk) and the Squeezing phase (output bits are read out of the state). SHA-3's internal state is a 1600-bit 5×5×64 array mixed by the Keccak-f permutation. This is fundamentally different from SHA-2's Merkle-Damgård chaining and naturally supports variable-length output (SHAKE128 and SHAKE256 build on this).
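Variable-length output is easy to see from the standard library. A minimal sketch using Python's `hashlib`, which exposes both SHA-3 and SHAKE; the input string and the output lengths are arbitrary choices. Because squeezing keeps reading from the same state, a shorter SHAKE output is a prefix of a longer one for the same input.

```python
import hashlib

data = b"Hello, World!"

# Fixed-length SHA-3: SHA3-256 always returns 32 bytes
fixed = hashlib.sha3_256(data).digest()
print(len(fixed), "bytes:", fixed.hex())

# Extendable-output SHAKE128: the caller picks the output length in bytes
short = hashlib.shake_128(data).hexdigest(16)
longer = hashlib.shake_128(data).hexdigest(64)
print(short)
print(longer)
print("short output is a prefix of the longer one:", longer.startswith(short))
```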
Use Cases and Algorithm Selection
Different scenarios have different hash function requirements. Here are 2025 selection recommendations:
- Password storage: Do not use any general-purpose hash function directly, because fast hashes make brute-force guessing cheap. Use Argon2id (preferred), bcrypt, or PBKDF2 (in priority order)
- Digital signatures and certificates: SHA-256 or SHA-384 (RSA-2048 with SHA-256, RSA-4096 or ECDSA with SHA-384 or stronger)
- File integrity verification: SHA-256 is the best general choice; if only protecting against accidental corruption (non-adversarial), MD5 remains acceptable
- HMAC message authentication: HMAC-SHA256 or HMAC-SHA512
- Blockchain and Merkle trees: SHA-256 (used by Bitcoin) or Keccak-256 (used by Ethereum)
- High-performance non-security scenarios (cache keys, data deduplication): BLAKE3 or xxHash, both far faster than SHA-256 with good distribution
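As a rough illustration of the password-storage and HMAC entries above, the sketch below uses only Python's standard library. It uses PBKDF2 because that is the listed option available in `hashlib` without third-party packages; Argon2id and bcrypt need external libraries. The iteration count, salt length, and function names are assumptions chosen for illustration and should be tuned for a real deployment.

```python
import hashlib
import hmac
import secrets

PBKDF2_ITERATIONS = 600_000  # assumed work factor; tune to your hardware and policy

def hash_password(password: str, iterations: int = PBKDF2_ITERATIONS) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage; the salt is random per password."""
    salt = secrets.token_bytes(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, derived

def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = PBKDF2_ITERATIONS) -> bool:
    """Recompute the key and compare in constant time."""
    derived = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(derived, expected)

def sign(key: bytes, message: bytes) -> str:
    """HMAC-SHA256 tag for a message under a shared secret key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Check an HMAC tag in constant time."""
    return hmac.compare_digest(sign(key, message), tag)

salt, stored = hash_password("correct horse battery staple")
print("password ok:", verify_password("correct horse battery staple", salt, stored))

key = secrets.token_bytes(32)
tag = sign(key, b"transfer 100 to alice")
print("tag ok:", verify(key, b"transfer 100 to alice", tag))
```

The constant-time comparison via `hmac.compare_digest` matters in both cases: a plain `==` can leak how many leading bytes matched through timing.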
// 2025 algorithm selection cheatsheet
const selection = {
  passwords: "Argon2id > bcrypt > PBKDF2 (NEVER SHA/MD5)",
  digitalSignatures: "SHA-256 (minimum), SHA-384 preferred",
  fileIntegrity: "SHA-256",
  hmac: "HMAC-SHA256 or HMAC-SHA512",
  blockchain: "SHA-256 (Bitcoin), Keccak-256 (Ethereum)",
  highPerformance: "BLAKE3 or xxHash",
  deprecated: ["MD5", "SHA-1"] // avoid for new security-sensitive code
};
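For the Merkle tree entry, the sketch below computes a root over a list of byte strings with SHA-256. The pairing rule (duplicate the last node when a level has an odd count) and the use of a single SHA-256 pass per node are simplifying assumptions for illustration, not a reproduction of any particular blockchain's exact format.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash the leaves, then hash pairs level by level until one node remains."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            # Assumed rule: duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

transactions = [b"tx-a", b"tx-b", b"tx-c"]
print(merkle_root(transactions).hex())
```

Changing any single leaf changes the root, which is what lets a short root hash commit to an arbitrarily large set of items.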
Security Boundaries: Output Length and Collision Resistance
By the birthday paradox, the expected work to find a collision by brute force is approximately 2^(n/2), where n is the output length in bits. For MD5 (128 bits) that is ~2^64 operations, for SHA-1 (160 bits) ~2^80, and for SHA-256 (256 bits) ~2^128. These are generic bounds: the practical attacks on MD5 and SHA-1 exploit structural weaknesses and need far fewer operations. 2^128 operations is far beyond the combined computing power available today, which is why SHA-256 is considered secure for the foreseeable future. That assessment extends to quantum computers: Grover's algorithm reduces preimage search from ~2^256 to ~2^128, and the best known quantum collision search (Brassard-Høyer-Tapp) costs roughly 2^(n/3), about 2^85 for SHA-256, while requiring very large quantum memory, so both remain out of reach.
/* Security strength comparison */
Algorithm | Output  | Collision Resistance | Status (2025)
----------|---------|----------------------|--------------
MD5       | 128-bit | ~2^64 (broken)       | DEPRECATED
SHA-1     | 160-bit | ~2^80 (broken)       | DEPRECATED
SHA-256   | 256-bit | ~2^128               | SECURE
SHA-384   | 384-bit | ~2^192               | SECURE
SHA-512   | 512-bit | ~2^256               | SECURE
SHA3-256  | 256-bit | ~2^128               | SECURE
BLAKE3    | 256-bit | ~2^128               | SECURE
/* Note: MD5 and SHA-1 have been PRACTICALLY broken
   (real collisions demonstrated, not just theoretical) */
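The 2^(n/2) birthday bound can be observed on a deliberately weakened hash. The sketch below truncates SHA-256 to 24 bits, an assumption made purely so the search finishes quickly, and looks for two counter-derived inputs with the same truncated digest. By the birthday bound a collision is expected after on the order of 2^12 (a few thousand) attempts rather than 2^24.

```python
import hashlib

def truncated_hash(data: bytes, bits: int) -> int:
    """Keep only the top `bits` bits of SHA-256(data)."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)

def find_collision(bits: int = 24):
    """Hash the decimal strings 0, 1, 2, ... until two truncated digests match."""
    seen = {}
    counter = 0
    while True:
        message = str(counter).encode()
        h = truncated_hash(message, bits)
        if h in seen:
            return seen[h], message, counter + 1
        seen[h] = message
        counter += 1

m1, m2, attempts = find_collision(24)
print(f"{m1!r} and {m2!r} collide on 24 bits after {attempts} attempts")
```

Each extra output bit raises the expected work by a factor of about the square root of 2, which is why the same search against the full 256-bit output is hopeless.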
Practical Code: Multi-Language Hash Implementations
# Python - using hashlib (built-in)
import hashlib
data = b"Hello, World!"
print(hashlib.md5(data).hexdigest()) # 65a8e27d8879283831b664bd8b7f0ad4
print(hashlib.sha1(data).hexdigest()) # 0a0a9f2a6772942557ab5355d76af442f8f65e01
print(hashlib.sha256(data).hexdigest()) # dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
print(hashlib.sha512(data).hexdigest()) # 374d794a95cdcfd8b35993185fef9ba368f160d8daf432d08ba9f1ed1e5abe6...
# Large file hashing (memory efficient)
def hash_file(filepath, algorithm='sha256'):
    h = hashlib.new(algorithm)
    with open(filepath, 'rb') as f:
        # Read in 64 KiB chunks so the whole file never sits in memory
        while chunk := f.read(65536):
            h.update(chunk)
    return h.hexdigest()
// JavaScript (Node.js) - crypto module (built-in)
const crypto = require('crypto');
const data = 'Hello, World!';
['md5', 'sha1', 'sha256', 'sha512'].forEach(alg => {
  const hash = crypto.createHash(alg).update(data).digest('hex');
  console.log(`${alg}: ${hash}`);
});
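// Go - crypto/* packages (built-in)
package main

import (
	"crypto/md5"
	"crypto/sha256"
	"fmt"
)

func main() {
	data := []byte("Hello, World!")
	// Sum and Sum256 return fixed-size byte arrays; %x prints them as hex
	fmt.Printf("MD5: %x\n", md5.Sum(data))
	fmt.Printf("SHA256: %x\n", sha256.Sum256(data))
}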