Database Sharding in Practice
MySQL Sharding Complete Guide
Sharding is MySQL's ultimate horizontal scalability solution and one of the most complex architectural decisions. This chapter covers when to shard, shard key design, middleware selection, cross-shard queries, distributed transactions, and zero-downtime data migration.
Architecture principle: Sharding is a last resort, not a first choice. Before sharding, exhaust these options: index optimization, read/write splitting, caching, and vertical database splitting (microservices). Premature sharding introduces catastrophic complexity.
1. When to Shard?
1.1 Single-node MySQL limits
A high-end server (32-core, 256GB RAM, NVMe SSD) can handle roughly:
| Metric | Rough Limit | Notes |
|---|---|---|
| QPS (simple queries) | ~50,000 | Indexed point lookups |
| Write TPS | ~5,000–10,000 | fsync'd transaction writes |
| Rows per table | 100M–500M | Watch query performance above 20M |
| Data per instance | 1–5 TB | Degrades once data exceeds Buffer Pool |
1.2 Shard decision tree
Table > 20M rows? ├── No → optimize indexes first └── Yes → Are queries slow? ├── No → keep watching (50M may still be fine) └── Yes → Have you tried: ├── Better indexes / query rewriting → do first ├── Larger Buffer Pool → do first ├── Read/write splitting (replica) → do first ├── Vertical split (microservices DB) → do first └── All done and still not enough → consider sharding
2. Split Types
2.1 Vertical splitting (do this first)
Move different business tables to different databases — typically mirrors microservice boundaries. Reduces coupling, no complex shard routing needed.
2.2 Horizontal sharding
Distribute rows of the same table across multiple databases/tables using a shard key. Each shard holds a partition of the data.
3. Shard Key Design
3.1 Selection principles
- High query routing rate: 80%+ of queries must carry this field, or you'll scatter-gather every query
- Uniform data distribution: avoid hotspots (one shard much larger than others)
- Natural business clustering: keep one user's data on one shard to enable single-shard joins
- Immutable: changing a shard key value means migrating data across shards — extremely costly
3.2 Sharding strategy comparison
| Strategy | Pros | Cons |
|---|---|---|
| Hash modulo | Even distribution, simple | Re-sharding requires rebalancing all data |
| Consistent hashing | Only migrate ~1/N data when adding shards | Complex, potential imbalance |
| Range | Easy to add new shards | Hotspot on latest range |
| Time range | Natural archival | Write hotspot on current period |
| Lookup table | Flexible routing | Need to maintain mapping table |
4. Global Unique IDs
AUTO_INCREMENT is only unique within a single database. Options:
- Snowflake ID: 64-bit integer — timestamp (41b) + machine ID (10b) + sequence (12b). Trend-monotonic (InnoDB-friendly), 4M+ IDs/sec, but requires clock sync.
- UUID v7: Time-ordered UUID, 128-bit, avoids clock issues, larger than Snowflake
- Segment allocation (Leaf): Database-backed ID segments, simple and reliable
5. Middleware Selection
| Scenario | Recommendation |
|---|---|
| Java stack, mid-scale | ShardingSphere JDBC mode |
| Multi-language, proxy needed | ShardingSphere Proxy |
| Kubernetes, large-scale | Vitess |
| Cloud SaaS | PlanetScale (managed Vitess) |
| Go, simple routing | Custom per-shard connection pool routing |
6. Cross-Shard Query Challenges
- Cross-shard JOIN: Impossible at DB layer if shard keys differ; must merge in application layer
- ORDER BY + LIMIT across shards: Gather top N from each shard, merge-sort, take final N
- Aggregations: Aggregate per shard, combine results in app
- Recommended pattern: Use Elasticsearch (synced via CDC) for complex multi-dimensional queries; MySQL for storage and PK lookups
7. Distributed Transactions
| Approach | Consistency | Performance | Recommended For |
|---|---|---|---|
| XA 2-Phase Commit | Strong | Low | Finance, strict consistency |
| Seata AT | Eventual | Medium | Java stack, low intrusion |
| TCC | Eventual | High | High concurrency with compensation |
| Outbox Pattern | Eventual | High | Most practical |
8. Zero-Downtime Migration
Dual-write migration (recommended):
- Build new sharded cluster
- Enable dual-write (write to both old and new, read from old)
- Backfill historical data in batches (rate-limited to avoid impacting production)
- Validate consistency between old and new
- Gradually shift read traffic to new cluster, monitor
- Atomically switch write traffic; keep old DB read-only as rollback option
- Decommission old DB after stability confirmed