Data Replication & Migration
Quick Reference: Data Partitioning | Sharding | Step 3: Consistency
Quick Reference
| Replication Type | Consistency | Use Case | Complexity |
|---|---|---|---|
| Master-Slave | Eventually consistent | Read scaling, backups | Low |
| Master-Master | Eventually consistent | Multi-region, high availability | High |
| Synchronous | Strong consistency | Critical data | Medium |
| Asynchronous | Eventually consistent | Performance priority | Low |
Clear Definition
Data Replication is the process of copying and maintaining database objects in multiple databases to improve availability, fault tolerance, and read performance. Data Migration is the process of moving data from one system to another, often during system upgrades, database changes, or architecture migrations.
š” Key Insight: Replication improves availability and read performance but introduces consistency challenges. Migration requires careful planning to avoid downtime and data loss.
Core Concepts
Replication Strategies
1. Master-Slave (Primary-Replica) Replication
How it Works:
- One master (primary) handles writes
- Multiple slaves (replicas) receive updates
- Reads can be distributed across replicas
- Failover promotes a replica to master
Characteristics:
- Consistency: Eventually consistent (replication lag)
- Write Performance: Single master (no write conflicts)
- Read Performance: Can scale reads horizontally
- Failover: Requires manual or automatic promotion
Example: MySQL replication, PostgreSQL streaming replication
2. Master-Master (Multi-Master) Replication
How it Works:
- Multiple masters accept writes
- Changes propagate to all masters
- Conflict resolution required
- All nodes can serve reads and writes
Characteristics:
- Consistency: Eventually consistent (conflicts possible)
- Write Performance: Can write to any master
- Availability: Higher availability (no single point of failure)
- Complexity: Conflict resolution, split-brain scenarios
Example: MySQL Group Replication, Cassandra
3. Synchronous Replication
How it Works:
- Write waits for confirmation from replicas
- Strong consistency guarantee
- Higher latency (waits for replicas)
- Lower throughput
Use Cases: Financial systems, critical data
4. Asynchronous Replication
How it Works:
- Write returns immediately
- Replication happens in background
- Lower latency, higher throughput
- Risk of data loss if master fails
Use Cases: Most web applications, high-performance systems
Use Cases
Replication Use Cases
-
High Availability
- Failover to replica if master fails
- Reduced downtime
- Disaster recovery
-
Read Scaling
- Distribute read queries across replicas
- Reduce load on master
- Improve query performance
-
Geographic Distribution
- Place replicas closer to users
- Reduce latency
- Comply with data residency requirements
-
Backup and Recovery
- Use replicas for backups
- Point-in-time recovery
- Testing environments
Migration Use Cases
-
Database Upgrades
- Migrating to newer database version
- Schema changes
- Performance improvements
-
Architecture Changes
- Moving from SQL to NoSQL (or vice versa)
- Changing database vendors
- Implementing sharding
-
Cloud Migration
- Moving from on-premise to cloud
- Changing cloud providers
- Hybrid cloud setups
Advantages & Disadvantages
Replication Advantages
ā
High Availability: Failover capabilities
ā
Read Scaling: Distribute read load
ā
Geographic Distribution: Lower latency for users
ā
Backup: Replicas can serve as backups
ā
Disaster Recovery: Data redundancy
Replication Disadvantages
ā Consistency Challenges: Replication lag, eventual consistency
ā Complexity: Setup and maintenance overhead
ā Storage Cost: Multiple copies of data
ā Write Performance: Master-slave limits write scaling
ā Conflict Resolution: Master-master requires conflict handling
Migration Advantages
ā
Modernization: Move to better technology
ā
Performance: Improve system performance
ā
Cost Reduction: Optimize infrastructure costs
ā
Scalability: Enable better scaling strategies
Migration Disadvantages
ā Downtime Risk: Potential service interruption
ā Data Loss Risk: Possibility of data corruption
ā Complexity: Requires careful planning
ā Cost: Migration tools and resources
Best Practices
Replication Best Practices
-
Monitor Replication Lag
-- MySQL SHOW SLAVE STATUS; -- Check Seconds_Behind_Master -- PostgreSQL SELECT * FROM pg_stat_replication; -
Automated Failover
- Use tools like MySQL MHA, PostgreSQL Patroni
- Test failover procedures regularly
- Monitor replication health
-
Read Distribution
- Route reads to replicas
- Use connection pooling
- Handle replication lag in application
-
Conflict Resolution
- Define conflict resolution strategies
- Use timestamps or vector clocks
- Implement application-level conflict handling
Migration Best Practices
-
Planning Phase
- Analyze source and target schemas
- Estimate data volume and migration time
- Plan for downtime window
- Create rollback plan
-
Testing
- Test migration on staging environment
- Verify data integrity
- Performance testing
- Failover testing
-
Migration Strategies
Big Bang Migration:
- Migrate all data at once
- Requires downtime
- Simpler but risky
Gradual Migration:
- Migrate in phases
- Dual-write during transition
- More complex but safer
Zero-Downtime Migration:
- Use replication during migration
- Switch traffic gradually
- Most complex but no downtime
-
Data Validation
- Compare record counts
- Sample data verification
- Checksum validation
- Application testing
Common Pitfalls
ā ļø Common Mistake: Not monitoring replication lag.
Solution: Set up alerts for replication lag. Handle stale reads in application.
ā ļø Common Mistake: Reading from replica immediately after write.
Solution: Read from master for read-after-write consistency, or accept eventual consistency.
ā ļø Common Mistake: Not testing failover procedures.
Solution: Regularly test failover in staging. Document procedures.
ā ļø Common Mistake: Migrating without backup.
Solution: Always backup before migration. Test restore procedures.
ā ļø Common Mistake: Not planning for rollback.
Solution: Have rollback plan ready. Test rollback procedures.
Interview Tips
šÆ Interview Focus: Interviewers ask about replication strategies and consistency:
- Replication Types: Know master-slave vs master-master
- Consistency: Understand replication lag and eventual consistency
- Failover: Explain failover procedures
- Migration: Discuss migration strategies and challenges
- Trade-offs: Consistency vs availability, performance vs durability
Common Questions
- "How would you set up replication for a high-traffic application?"
- "How do you handle replication lag?"
- "Explain master-master replication and conflict resolution."
- "How would you migrate a database with zero downtime?"
- "What's the difference between synchronous and asynchronous replication?"
Related Topics
- Data Partitioning: Partitioning strategies
- Sharding: Data distribution
- Step 3: CAP Theorem: Consistency vs availability
- Step 3: Consistency Levels: Consistency models
Visual Aids
Master-Slave Replication
āāāāāāāāāāā
ā Master ā
ā (Writes)ā
āāāāāā¬āāāāā
ā
āāāāāā“āāāāā
ā ā
āāāāāā¼āāāā āāāā¼āāāāā
ā Slave 1ā āSlave 2ā
ā(Reads) ā ā(Reads)ā
āāāāāāāāāā āāāāāāāāā
Master-Master Replication
āāāāāāāāāāā āāāāāāāāāāā
ā Master 1āāāāāāāŗā Master 2ā
ā(Writes) ā ā(Writes) ā
āāāāāāāāāāā āāāāāāāāāāā
ā ā
āāāāāāāā¬āāāāāāāāāā
ā
āāāāāāā¼āāāāāā
ā Replicas ā
ā (Reads) ā
āāāāāāāāāāāāā
Migration Strategy
Phase 1: Setup
Source DB āāāŗ Replication āāāŗ Target DB
Phase 2: Dual Write
App āāāŗ Source DB
App āāāŗ Target DB
Phase 3: Validation
Compare data between Source and Target
Phase 4: Cutover
App āāāŗ Target DB (only)
Quick Reference Summary
Replication: Copy data across multiple databases for availability, read scaling, and disaster recovery. Master-slave for read scaling, master-master for high availability.
Migration: Move data between systems. Plan carefully, test thoroughly, have rollback plan. Use gradual migration for zero downtime.
Key Consideration: Balance consistency (synchronous) vs performance (asynchronous). Monitor replication lag and plan for failover.
Previous Topic: In-Memory Databases ā
Next Topic: Data Partitioning ā
Back to: Step 2 Overview | Main Index