What Is MD5 Hash? How It Works
Introduction to MD5
MD5 (Message Digest Algorithm 5) is a cryptographic hash function designed by Ronald Rivest in 1991, part of the MD algorithm family. MD5 converts input data of any length into a fixed-length (128-bit, or 32 hexadecimal characters) hash value, also called a "digest."
MD5("hello") = 5d41402abc4b2a76b9719d911017c592
MD5("Hello") = 8b1a9953c4611296a827abf8c47804d7
MD5("") = d41d8cd98f00b204e9800998ecf8427e
MD5("hello world") = 5eb63bbbe01eeed093cb22bb8f5acdc3
Core Characteristics of MD5
MD5 possesses the fundamental characteristics of all cryptographic hash functions:
- Deterministic: The same input always produces the same hash value
- Fixed-length output: Whether input is 1 byte or 1 GB, output is always 128 bits (32 hex characters)
- Avalanche effect: A tiny change in input (even one byte) causes a completely different output
- One-way: Computationally infeasible to derive the original input from the hash alone (although short or common inputs can still be recovered with rainbow tables and other precomputed lookups)
- Fast computation: MD5 is very fast to compute, which is actually a weakness in modern security contexts
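The first three characteristics are easy to observe directly. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_hex` and `differing_bits` are illustrative and not part of any library.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as 32 lowercase hex characters."""
    return hashlib.md5(data).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


lower = md5_hex(b"hello")
upper = md5_hex(b"Hello")

# Deterministic: hashing the same input again gives the same digest.
assert lower == md5_hex(b"hello")

# Fixed-length output: an empty input and a 1 MB input both give 32 hex chars.
assert len(md5_hex(b"")) == len(md5_hex(b"x" * 1_000_000)) == 32

# Avalanche effect: changing only the case of one letter flips many output bits.
print(differing_bits(lower, upper), "of 128 bits differ")
```

For a well-behaved hash function, roughly half of the 128 output bits are expected to differ between two unrelated digests.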
How MD5 Works
MD5 processes input data through these steps: first pad the input with a single 1 bit followed by 0 bits until its length is 64 bits short of a multiple of 512 (that is, congruent to 448 modulo 512); then append the original input length in bits as a 64-bit little-endian value; split the padded data into 512-bit blocks; finally process each block through a compression function with 4 rounds of 16 steps each (using nonlinear functions, modular addition, circular shifts, etc.), updating a 128-bit state that becomes the output after the last block.
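The steps above can be sketched in pure Python. This is an illustrative, unoptimized sketch based on the publicly specified algorithm (RFC 1321), intended only to make the padding and the 4 x 16 step structure concrete; the names `pad`, `rotl`, and `md5_sketch` are chosen here for illustration. Real code should use a library implementation such as `hashlib`.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF  # keep all arithmetic modulo 2**32

# Left-rotation amount for each of the 64 steps (4 rounds of 16 steps).
SHIFTS = ([7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4
          + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4)
# Additive constant for each step: floor(2**32 * |sin(i + 1)|).
CONSTANTS = [int(abs(math.sin(i + 1)) * 2**32) & MASK for i in range(64)]


def pad(message: bytes) -> bytes:
    """Append 0x80, zero bytes up to 56 mod 64, then the 64-bit bit length."""
    bit_length = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + struct.pack("<Q", bit_length)


def rotl(value: int, amount: int) -> int:
    """Rotate a 32-bit value left by the given number of bits."""
    value &= MASK
    return ((value << amount) | (value >> (32 - amount))) & MASK


def md5_sketch(message: bytes) -> str:
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    padded = pad(message)
    for offset in range(0, len(padded), 64):  # one 512-bit block at a time
        words = struct.unpack("<16I", padded[offset:offset + 64])
        a, b, c, d = a0, b0, c0, d0
        for i in range(64):
            # Each round uses its own nonlinear function and word order.
            if i < 16:
                f, g = (b & c) | (~b & d), i
            elif i < 32:
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:
                f, g = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + CONSTANTS[i] + words[g]) & MASK
            a, d, c = d, c, b
            b = (b + rotl(f, SHIFTS[i])) & MASK
        # Add this block's result into the running 128-bit state.
        a0, b0 = (a0 + a) & MASK, (b0 + b) & MASK
        c0, d0 = (c0 + c) & MASK, (d0 + d) & MASK
    return struct.pack("<4I", a0, b0, c0, d0).hex()


# Sanity check against the standard library implementation.
assert md5_sketch(b"hello") == hashlib.md5(b"hello").hexdigest()
```

Note that the padded length is always a multiple of 64 bytes (512 bits), so an input of 56 bytes or more in its final block spills into an extra block.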
MD5 Security Issues
MD5 has serious security weaknesses and should no longer be used in any security-sensitive context:
- Collision Attack: In 2004, Wang and Yu demonstrated that MD5 collisions (two different inputs producing the same hash) can be found on ordinary computers; in 2008 researchers used such collisions to create a rogue certificate authority certificate, which allowed them to forge SSL certificates
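- Lookup attacks on weak inputs: No practical preimage attack on MD5 itself is known, but hashes of short or common passwords can be quickly reversed via rainbow tables and other precomputed lookups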
- Too fast: MD5's high speed makes brute-force cracking very easy: modern GPUs can calculate tens of billions of MD5 hashes per second
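To illustrate why speed is a problem for short passwords, here is a minimal brute-force sketch in Python. The function name `brute_force_md5`, the lowercase-only alphabet, and the four-character limit are assumptions made for the example; real cracking tools are far faster and use much larger search spaces.

```python
import hashlib
import itertools
import string


def brute_force_md5(target_hex, max_length=4, alphabet=string.ascii_lowercase):
    """Try every string over alphabet up to max_length; return a match or None."""
    for length in range(1, max_length + 1):
        for candidate in itertools.product(alphabet, repeat=length):
            guess = "".join(candidate)
            if hashlib.md5(guess.encode("utf-8")).hexdigest() == target_hex:
                return guess
    return None


# Hash a short lowercase "password", then recover it from the hash alone.
target = hashlib.md5(b"cab").hexdigest()
print(brute_force_md5(target))  # prints "cab"
```

Even this single-threaded loop covers the full space of about 475,000 candidates quickly, which is why password storage should use a deliberately slow, salted algorithm instead.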
Legitimate Use Cases for MD5
Although MD5 is no longer suitable for security contexts, it remains effective for non-security purposes:
- File integrity verification: Verify that files were not accidentally corrupted during transfer (note: cannot prevent intentional tampering)
- File deduplication: Quickly compare large numbers of files for identical content (for example, database or image library deduplication)
- Database indexing/sharding: Evenly distributing data across storage partitions (hash sharding)
- Cache key generation: Quickly generating unique cache identifiers based on input content
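The non-security uses above can be sketched with a few small helpers. This is an illustrative sketch, not a production design: `file_md5`, `group_duplicates`, and `shard_for` are hypothetical names, and the 1 MiB chunk size is an arbitrary choice.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(paths):
    """Group paths by content digest; return only groups with 2+ files."""
    by_digest = defaultdict(list)
    for path in paths:
        by_digest[file_md5(path)].append(Path(path))
    return [group for group in by_digest.values() if len(group) > 1]


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using the first 8 bytes of its digest."""
    prefix = hashlib.md5(key.encode("utf-8")).digest()[:8]
    return int.from_bytes(prefix, "big") % shard_count
```

For integrity checks, the digest returned by `file_md5` would be compared with a checksum published alongside the file. Because collisions can be constructed deliberately, this only guards against accidental corruption.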
MD5 Output Format
MD5 produces a 128-bit (16-byte) hash value, typically expressed as 32 hexadecimal characters (lowercase). Base64-encoded MD5 output (22 characters plus "==" padding, 24 in total) is also sometimes seen, for example in HTTP Content-MD5 headers:
/* Standard hex format (32 chars) */
5d41402abc4b2a76b9719d911017c592
/* Base64 format (24 chars) */
XUFAKrxLKna5cZ2REBfFkg==
/* Upper case hex (also valid) */
5D41402ABC4B2A76B9719D911017C592
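All three formats are encodings of the same 16 raw bytes. A minimal Python sketch of the conversions, using only the standard `hashlib` and `base64` modules (the variable names are illustrative):

```python
import base64
import hashlib

raw = hashlib.md5(b"hello").digest()         # the 16 raw digest bytes

hex_lower = raw.hex()                        # 32 lowercase hex characters
hex_upper = hex_lower.upper()                # same value, upper case
b64 = base64.b64encode(raw).decode("ascii")  # 24 Base64 characters

print(hex_lower)  # 5d41402abc4b2a76b9719d911017c592
print(b64)        # XUFAKrxLKna5cZ2REBfFkg==

# Decoding the Base64 form recovers the same bytes as the hex form.
assert base64.b64decode(b64) == bytes.fromhex(hex_lower)
```

When comparing digests from different sources, normalizing both to the same encoding and case first avoids false mismatches.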
Calculating MD5 in Various Programming Languages
// JavaScript (Node.js)
const crypto = require('crypto');
const hash = crypto.createHash('md5').update('hello').digest('hex');
// '5d41402abc4b2a76b9719d911017c592'
# Python
import hashlib
digest = hashlib.md5(b'hello').hexdigest()  # avoid shadowing the built-in hash()
# '5d41402abc4b2a76b9719d911017c592'
// PHP
$hash = md5('hello');
// '5d41402abc4b2a76b9719d911017c592'
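// Go (imports go at the top of the file; the assignment goes inside a function)
import (
	"crypto/md5"
	"fmt"
)
hash := fmt.Sprintf("%x", md5.Sum([]byte("hello"))) // "5d41402abc4b2a76b9719d911017c592"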