Memory vs Latency



Quick Reference

Memory: Cache more data for faster access

Latency: Using less memory lowers cost but raises latency

Trade-off: More memory = faster access but higher cost


Clear Definition

Memory vs Latency trade-off: Allocating more memory (through caching, pre-loading, or in-memory storage) reduces latency by enabling faster data access, but increases infrastructure costs. Using less memory reduces costs but increases latency due to slower data retrieval from disk or network.

πŸ’‘ Key Insight: Cache strategically - identify hot data (frequently accessed) and cache it. Balance memory costs with latency requirements based on your SLA and budget constraints.


Core Concepts

The Relationship

More Memory (Caching)
    ↓
Faster Data Access
    ↓
Lower Latency
    ↓
Better User Experience
    ↓
Higher Infrastructure Cost

Memory Usage Patterns

In-Memory Caching:

  • Store frequently accessed data in RAM
  • Fast access (nanoseconds vs milliseconds)
  • Limited by available memory
  • Examples: Redis, Memcached, application cache

Disk Storage:

  • Store data on disk
  • Slower access (milliseconds)
  • Much larger capacity
  • Examples: Databases, file systems

Network Storage:

  • Fetch data over network
  • Slowest access (hundreds of milliseconds)
  • Virtually unlimited capacity
  • Examples: External APIs, remote databases

Caching Strategies

1. Cache-Aside (Lazy Loading)

How it Works:

  1. Check cache first
  2. If miss, fetch from database
  3. Store in cache for future requests

Memory Usage: Moderate (only cache accessed data)

Latency: Low for cached data, high for cache misses

Example:

def get_user(user_id):
    # Check cache first
    user = cache.get(f"user:{user_id}")
    if user is not None:  # explicit check so falsy values still count as hits
        return user  # Fast path: cache hit

    # Cache miss - fetch from DB and populate the cache
    user = db.get_user(user_id)
    cache.set(f"user:{user_id}", user, ttl=3600)
    return user

2. Write-Through Cache

How it Works:

  1. Write to cache and database simultaneously
  2. Cache always has latest data
  3. Higher memory usage (all writes cached)

Memory Usage: High (cache all written data)

Latency: Very low (always in cache)
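The write-through flow above can be sketched with plain dicts standing in for a real cache and database (the names `cache`, `database`, and the helper functions are illustrative, not a specific library's API):

```python
cache = {}
database = {}

def write_through(key, value):
    # Write to the database first, then the cache, so the cache
    # never holds data that the database does not.
    database[key] = value
    cache[key] = value

def read(key):
    # After a write-through, reads are always served from the cache.
    return cache.get(key, database.get(key))
```

The key property is that both stores are updated on the write path, so a read never observes the cache ahead of (or behind) the database.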

3. Write-Behind (Write-Back)

How it Works:

  1. Write to cache immediately
  2. Asynchronously write to database
  3. Batch database writes

Memory Usage: High (buffer writes in memory)

Latency: Very low (immediate cache write)
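A minimal write-behind sketch, again with dicts as stand-ins: the write is acknowledged after updating only the cache, and a separate `flush()` step (in production, a background worker or timer) drains the buffered writes to the database in a batch. All names here are hypothetical:

```python
import queue

cache = {}
database = {}
write_buffer = queue.Queue()

def write_behind(key, value):
    # Acknowledge the write after updating only the in-memory cache.
    cache[key] = value
    write_buffer.put((key, value))

def flush():
    # Periodically drain the buffer into the database in one batch.
    while not write_buffer.empty():
        key, value = write_buffer.get()
        database[key] = value
```

The trade-off is visible in the code: between `write_behind()` and `flush()`, the database is stale, and buffered writes are lost if the process crashes.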

4. Pre-loading (Eager Loading)

How it Works:

  1. Pre-load data into cache
  2. Predict what will be accessed
  3. Proactive caching

Memory Usage: Very high (cache data before access)

Latency: Very low (data already in cache)

Example:

# Pre-load popular products at startup
popular_products = db.get_popular_products()
for product in popular_products:
    cache.set(f"product:{product.id}", product)

Eviction Policies

LRU (Least Recently Used)

How it Works:

  • Evict least recently accessed items
  • Keep recently accessed items

Memory Efficiency: Good (keeps hot data)

Use Case: General purpose caching
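A compact LRU cache can be built on `collections.OrderedDict`, which tracks insertion order and supports moving a key to the end on access (this sketch is illustrative; production systems usually rely on their cache's built-in LRU policy):

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-size cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the oldest entry
```

Usage: with capacity 2, inserting a third item evicts whichever of the first two was touched least recently.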

LFU (Least Frequently Used)

How it Works:

  • Evict least frequently accessed items
  • Keep frequently accessed items

Memory Efficiency: Good (keeps popular data)

Use Case: When access patterns are stable

TTL (Time To Live)

How it Works:

  • Evict items after expiration time
  • Simple time-based eviction

Memory Efficiency: Moderate (may keep stale data)

Use Case: When data has natural expiration
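TTL eviction can be sketched by storing an expiry timestamp next to each value and evicting lazily on read (a simplified illustration; real caches like Redis expire keys server-side):

```python
import time

class TTLCache:
    """Cache whose entries expire after a fixed number of seconds."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.items = {}  # key -> (value, expiry timestamp)

    def set(self, key, value):
        self.items[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self.items[key]  # lazily evict the stale entry on read
            return None
        return value
```

Note the "may keep stale data" caveat from above: an expired entry occupies memory until the next read touches it.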

FIFO (First In First Out)

How it Works:

  • Evict oldest items first
  • Simple queue-based eviction

Memory Efficiency: Poor (may evict hot data)

Use Case: Simple scenarios, not recommended


Decision Framework

Step 1: Identify Hot Data

Questions to Ask:

  • What data is accessed frequently?
  • What data is accessed by many users?
  • What data has high read-to-write ratio?
  • What data is expensive to compute?

Hot Data Candidates:

  • User profiles (frequently accessed)
  • Product catalogs (many users)
  • Configuration data (high read ratio)
  • Computed results (expensive to compute)

Step 2: Calculate Memory Requirements

Estimate Memory:

Memory Needed = (Number of Items) Γ— (Size per Item) Γ— (Replication Factor)

Example:

  • 1M user profiles
  • 1KB per profile
  • 2x replication
  • Total: 1M Γ— 1KB Γ— 2 = 2GB

Step 3: Evaluate Cost vs Benefit

Cost Factors:

  • Memory cost (RAM is expensive)
  • Infrastructure cost
  • Maintenance overhead

Benefit Factors:

  • Latency reduction
  • Database load reduction
  • User experience improvement

Step 4: Choose Caching Strategy

High Memory Budget:

  • Cache more data
  • Use write-through
  • Pre-load frequently accessed data

Low Memory Budget:

  • Cache only hot data
  • Use cache-aside
  • Aggressive eviction policies

Real-World Examples

Example 1: E-commerce Product Catalog

Challenge: Millions of products, frequent access

Solution:

  • Cache top 10K products in memory
  • Cache product details for active searches
  • Use LRU eviction
  • Memory: ~10GB for 10K products
  • Latency: 1ms (cache) vs 50ms (database)

Trade-off: 10GB memory cost vs 50x latency improvement

Example 2: Social Media Feed

Challenge: Personalized feeds, expensive computation

Solution:

  • Cache computed feeds for active users
  • Pre-compute feeds for top users
  • Use TTL (5 minutes)
  • Memory: ~100MB per 1K active users
  • Latency: 5ms (cache) vs 200ms (compute)

Trade-off: Memory for pre-computed feeds vs real-time computation

Example 3: API Rate Limiting

Challenge: Check rate limits for every request

Solution:

  • Cache rate limit counters in memory
  • In-memory data structures (Redis)
  • Very fast lookups
  • Memory: Minimal (counters are small)
  • Latency: <1ms (cache) vs 10ms (database)

Trade-off: Minimal memory vs significant latency improvement
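The rate-limiting counters can be sketched as a fixed-window limiter over an in-memory dict (in production, this role is typically played by an atomic Redis `INCR` plus a key expiry; the function and variable names here are illustrative):

```python
import time

counters = {}  # (client_id, window number) -> request count

def allow_request(client_id, limit=100, window_seconds=60):
    """Fixed-window rate limiter backed by in-memory counters."""
    window = int(time.time() // window_seconds)  # current window number
    key = (client_id, window)
    counters[key] = counters.get(key, 0) + 1
    return counters[key] <= limit
```

Each counter is a handful of bytes, which is why the memory cost stays minimal even across many clients.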


Optimization Techniques

1. Selective Caching

Strategy: Only cache data that benefits most

Criteria:

  • High access frequency
  • Expensive to compute
  • Stable (doesn't change often)
  • Small size (fits in memory)

2. Compression

Strategy: Compress cached data to reduce memory

Benefits:

  • Store more data in same memory
  • Reduce memory costs
  • Slight CPU overhead

Example:

import gzip

# Compress before caching (gzip operates on bytes, so encode strings first)
compressed = gzip.compress(data.encode("utf-8"))
cache.set(key, compressed)

# Decompress on read
compressed = cache.get(key)
data = gzip.decompress(compressed).decode("utf-8")

3. Tiered Caching

Strategy: Multiple cache layers with different memory/speed

┌──────────────────────────────────────┐
│  L1: In-Process Cache (Fastest)      │  Small, very fast
│  L2: Redis Cache (Fast)              │  Medium, fast
│  L3: Database (Slowest)              │  Large, slow
└──────────────────────────────────────┘
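A tiered lookup checks each layer in order and promotes values toward the faster tiers on a hit. This sketch uses dicts for all three tiers (in reality L2 would be a shared cache such as Redis and L3 a database; the names are illustrative):

```python
l1 = {}  # in-process cache: smallest, fastest
l2 = {}  # stands in for a shared cache such as Redis
database = {"user:1": {"name": "Ada"}}  # stands in for the database

def tiered_get(key):
    if key in l1:
        return l1[key]            # L1 hit: fastest path
    if key in l2:
        l1[key] = l2[key]         # promote to L1 for next time
        return l1[key]
    value = database.get(key)     # L3: slowest path
    if value is not None:
        l2[key] = value           # populate both cache tiers
        l1[key] = value
    return value
```

After the first lookup of a key, subsequent reads are served entirely from L1.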

4. Cache Warming

Strategy: Pre-load cache during low-traffic periods

Benefits:

  • Reduce cache misses during peak
  • Better latency during high load
  • Predictable memory usage

5. Memory-Efficient Data Structures

Strategy: Use compact data structures

Examples:

  • Bitmaps for boolean flags
  • Compressed strings
  • Sparse data structures
  • Columnar storage for analytics

Monitoring and Metrics

Key Metrics

Memory Metrics:

  • Cache size (bytes, items)
  • Memory utilization
  • Eviction rate
  • Hit rate

Latency Metrics:

  • Cache hit latency
  • Cache miss latency
  • Average latency
  • P95, P99 latencies

Efficiency Metrics:

  • Cache hit ratio (target: >80%)
  • Memory efficiency (hits per MB)
  • Cost per request
  • Latency improvement

Alerting

  • Alert if memory usage exceeds threshold
  • Alert if hit rate drops
  • Alert if latency increases
  • Monitor eviction patterns

Best Practices

1. Cache Strategically

  • Identify hot data
  • Don't cache everything
  • Focus on high-impact data

2. Set Appropriate TTLs

  • Short TTL for frequently changing data
  • Long TTL for stable data
  • Balance freshness vs memory

3. Monitor Hit Rates

  • Target: >80% hit rate
  • Low hit rate = wasted memory
  • High hit rate = good investment

4. Use Appropriate Eviction

  • LRU for general purpose
  • LFU for stable patterns
  • TTL for time-sensitive data

5. Consider Memory Costs

  • RAM is expensive
  • Balance cost vs benefit
  • Optimize data structures
  • Use compression when possible

6. Plan for Cache Misses

  • Don't assume cache will always hit
  • Design fallback to database
  • Handle cache failures gracefully

Quick Reference Summary

Memory: More memory (caching) = faster access = lower latency, but higher infrastructure cost.

Latency: Less memory = slower access = higher latency, but lower cost.

Key Strategies:

  • Cache hot data (frequently accessed)
  • Use appropriate eviction policies (LRU, LFU, TTL)
  • Monitor hit rates (target >80%)
  • Balance memory cost with latency requirements

Best Practice: Cache strategically - identify high-impact data, use tiered caching, monitor efficiency, and balance cost vs benefit.

Remember: Not all data should be cached. Focus on data that provides the best latency improvement per unit of memory.

