Data Replication & Migration

Quick Reference: Data Partitioning | Sharding | Step 3: Consistency


Quick Reference

Replication TypeConsistencyUse CaseComplexity
Master-SlaveEventually consistentRead scaling, backupsLow
Master-MasterEventually consistentMulti-region, high availabilityHigh
SynchronousStrong consistencyCritical dataMedium
AsynchronousEventually consistentPerformance priorityLow

Clear Definition

Data Replication is the process of copying and maintaining database objects in multiple databases to improve availability, fault tolerance, and read performance. Data Migration is the process of moving data from one system to another, often during system upgrades, database changes, or architecture migrations.

šŸ’” Key Insight: Replication improves availability and read performance but introduces consistency challenges. Migration requires careful planning to avoid downtime and data loss.


Core Concepts

Replication Strategies

1. Master-Slave (Primary-Replica) Replication

How it Works:

  • One master (primary) handles writes
  • Multiple slaves (replicas) receive updates
  • Reads can be distributed across replicas
  • Failover promotes a replica to master

Characteristics:

  • Consistency: Eventually consistent (replication lag)
  • Write Performance: Single master (no write conflicts)
  • Read Performance: Can scale reads horizontally
  • Failover: Requires manual or automatic promotion

Example: MySQL replication, PostgreSQL streaming replication

2. Master-Master (Multi-Master) Replication

How it Works:

  • Multiple masters accept writes
  • Changes propagate to all masters
  • Conflict resolution required
  • All nodes can serve reads and writes

Characteristics:

  • Consistency: Eventually consistent (conflicts possible)
  • Write Performance: Can write to any master
  • Availability: Higher availability (no single point of failure)
  • Complexity: Conflict resolution, split-brain scenarios

Example: MySQL Group Replication, Cassandra

3. Synchronous Replication

How it Works:

  • Write waits for confirmation from replicas
  • Strong consistency guarantee
  • Higher latency (waits for replicas)
  • Lower throughput

Use Cases: Financial systems, critical data

4. Asynchronous Replication

How it Works:

  • Write returns immediately
  • Replication happens in background
  • Lower latency, higher throughput
  • Risk of data loss if master fails

Use Cases: Most web applications, high-performance systems


Use Cases

Replication Use Cases

  1. High Availability

    • Failover to replica if master fails
    • Reduced downtime
    • Disaster recovery
  2. Read Scaling

    • Distribute read queries across replicas
    • Reduce load on master
    • Improve query performance
  3. Geographic Distribution

    • Place replicas closer to users
    • Reduce latency
    • Comply with data residency requirements
  4. Backup and Recovery

    • Use replicas for backups
    • Point-in-time recovery
    • Testing environments

Migration Use Cases

  1. Database Upgrades

    • Migrating to newer database version
    • Schema changes
    • Performance improvements
  2. Architecture Changes

    • Moving from SQL to NoSQL (or vice versa)
    • Changing database vendors
    • Implementing sharding
  3. Cloud Migration

    • Moving from on-premise to cloud
    • Changing cloud providers
    • Hybrid cloud setups

Advantages & Disadvantages

Replication Advantages

āœ… High Availability: Failover capabilities
āœ… Read Scaling: Distribute read load
āœ… Geographic Distribution: Lower latency for users
āœ… Backup: Replicas can serve as backups
āœ… Disaster Recovery: Data redundancy

Replication Disadvantages

āŒ Consistency Challenges: Replication lag, eventual consistency
āŒ Complexity: Setup and maintenance overhead
āŒ Storage Cost: Multiple copies of data
āŒ Write Performance: Master-slave limits write scaling
āŒ Conflict Resolution: Master-master requires conflict handling

Migration Advantages

āœ… Modernization: Move to better technology
āœ… Performance: Improve system performance
āœ… Cost Reduction: Optimize infrastructure costs
āœ… Scalability: Enable better scaling strategies

Migration Disadvantages

āŒ Downtime Risk: Potential service interruption
āŒ Data Loss Risk: Possibility of data corruption
āŒ Complexity: Requires careful planning
āŒ Cost: Migration tools and resources


Best Practices

Replication Best Practices

  1. Monitor Replication Lag

    -- MySQL
    SHOW SLAVE STATUS;
    -- Check Seconds_Behind_Master
    
    -- PostgreSQL
    SELECT * FROM pg_stat_replication;
    
  2. Automated Failover

    • Use tools like MySQL MHA, PostgreSQL Patroni
    • Test failover procedures regularly
    • Monitor replication health
  3. Read Distribution

    • Route reads to replicas
    • Use connection pooling
    • Handle replication lag in application
  4. Conflict Resolution

    • Define conflict resolution strategies
    • Use timestamps or vector clocks
    • Implement application-level conflict handling

Migration Best Practices

  1. Planning Phase

    • Analyze source and target schemas
    • Estimate data volume and migration time
    • Plan for downtime window
    • Create rollback plan
  2. Testing

    • Test migration on staging environment
    • Verify data integrity
    • Performance testing
    • Failover testing
  3. Migration Strategies

    Big Bang Migration:

    • Migrate all data at once
    • Requires downtime
    • Simpler but risky

    Gradual Migration:

    • Migrate in phases
    • Dual-write during transition
    • More complex but safer

    Zero-Downtime Migration:

    • Use replication during migration
    • Switch traffic gradually
    • Most complex but no downtime
  4. Data Validation

    • Compare record counts
    • Sample data verification
    • Checksum validation
    • Application testing

Common Pitfalls

āš ļø Common Mistake: Not monitoring replication lag.

Solution: Set up alerts for replication lag. Handle stale reads in application.

āš ļø Common Mistake: Reading from replica immediately after write.

Solution: Read from master for read-after-write consistency, or accept eventual consistency.

āš ļø Common Mistake: Not testing failover procedures.

Solution: Regularly test failover in staging. Document procedures.

āš ļø Common Mistake: Migrating without backup.

Solution: Always backup before migration. Test restore procedures.

āš ļø Common Mistake: Not planning for rollback.

Solution: Have rollback plan ready. Test rollback procedures.


Interview Tips

šŸŽÆ Interview Focus: Interviewers ask about replication strategies and consistency:

  1. Replication Types: Know master-slave vs master-master
  2. Consistency: Understand replication lag and eventual consistency
  3. Failover: Explain failover procedures
  4. Migration: Discuss migration strategies and challenges
  5. Trade-offs: Consistency vs availability, performance vs durability

Common Questions

  • "How would you set up replication for a high-traffic application?"
  • "How do you handle replication lag?"
  • "Explain master-master replication and conflict resolution."
  • "How would you migrate a database with zero downtime?"
  • "What's the difference between synchronous and asynchronous replication?"


Visual Aids

Master-Slave Replication

     ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
     │  Master │
     │ (Writes)│
     ā””ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”˜
          │
     ā”Œā”€ā”€ā”€ā”€ā”“ā”€ā”€ā”€ā”€ā”
     │         │
ā”Œā”€ā”€ā”€ā”€ā–¼ā”€ā”€ā”€ā” ā”Œā”€ā”€ā–¼ā”€ā”€ā”€ā”€ā”
│ Slave 1│ │Slave 2│
│(Reads) │ │(Reads)│
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜ ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜

Master-Master Replication

ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”      ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
│ Master 1│◄────►│ Master 2│
│(Writes) │      │(Writes) │
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜      ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜
     │                │
     ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜
            │
      ā”Œā”€ā”€ā”€ā”€ā”€ā–¼ā”€ā”€ā”€ā”€ā”€ā”
      │  Replicas │
      │  (Reads)  │
      ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜

Migration Strategy

Phase 1: Setup
Source DB ──► Replication ──► Target DB

Phase 2: Dual Write
App ──► Source DB
App ──► Target DB

Phase 3: Validation
Compare data between Source and Target

Phase 4: Cutover
App ──► Target DB (only)

Quick Reference Summary

Replication: Copy data across multiple databases for availability, read scaling, and disaster recovery. Master-slave for read scaling, master-master for high availability.

Migration: Move data between systems. Plan carefully, test thoroughly, have rollback plan. Use gradual migration for zero downtime.

Key Consideration: Balance consistency (synchronous) vs performance (asynchronous). Monitor replication lag and plan for failover.


Previous Topic: In-Memory Databases ←

Next Topic: Data Partitioning →

Back to: Step 2 Overview | Main Index