Memory vs Latency
Quick Reference
Memory: Cache more data for faster access
Latency: Using less memory lowers cost but increases latency
Trade-off: More memory = faster access but higher cost
Clear Definition
Memory vs Latency trade-off: Allocating more memory (through caching, pre-loading, or in-memory storage) reduces latency by enabling faster data access, but increases infrastructure costs. Using less memory reduces costs but increases latency due to slower data retrieval from disk or network.
💡 Key Insight: Cache strategically - identify hot data (frequently accessed) and cache it. Balance memory costs with latency requirements based on your SLA and budget constraints.
Core Concepts
The Relationship
More Memory (Caching)
    ↓
Faster Data Access
    ↓
Lower Latency
    ↓
Better User Experience
    ↓
Higher Infrastructure Cost
Memory Usage Patterns
In-Memory Caching:
- Store frequently accessed data in RAM
- Fast access (nanoseconds to microseconds, vs milliseconds for disk)
- Limited by available memory
- Examples: Redis, Memcached, application cache
Disk Storage:
- Store data on disk
- Slower access (milliseconds)
- Much larger capacity
- Examples: Databases, file systems
Network Storage:
- Fetch data over network
- Slowest access (hundreds of milliseconds)
- Virtually unlimited capacity
- Examples: External APIs, remote databases
Caching Strategies
1. Cache-Aside (Lazy Loading)
How it Works:
- Check cache first
- If miss, fetch from database
- Store in cache for future requests
Memory Usage: Moderate (only cache accessed data)
Latency: Low for cached data, high for cache misses
Example:
def get_user(user_id):
    # Check cache first
    user = cache.get(f"user:{user_id}")
    if user:
        return user  # Fast path: cache hit
    # Cache miss - fetch from DB and populate the cache
    user = db.get_user(user_id)
    cache.set(f"user:{user_id}", user, ttl=3600)
    return user
2. Write-Through Cache
How it Works:
- Write to cache and database simultaneously
- Cache always has latest data
- Higher memory usage (all writes cached)
Memory Usage: High (cache all written data)
Latency: Very low (always in cache)
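A minimal write-through sketch, using plain dicts as stand-ins for a real cache (e.g. Redis) and a database; the function names are illustrative:

```python
# Write-through: every write goes to the database AND the cache,
# so the cache always holds the latest value.
cache = {}  # stand-in for an in-memory cache
db = {}     # stand-in for a database

def save_user(user_id, user):
    db[user_id] = user     # durable write
    cache[user_id] = user  # cache updated in the same operation

def get_user(user_id):
    # Reads are almost always cache hits under write-through
    return cache.get(user_id) or db.get(user_id)

save_user(1, {"name": "Ada"})
```

Because both stores are updated together, a read immediately after a write never sees stale data, at the cost of caching every written item whether or not it is ever read.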
3. Write-Behind (Write-Back)
How it Works:
- Write to cache immediately
- Asynchronously write to database
- Batch database writes
Memory Usage: High (buffer writes in memory)
Latency: Very low (immediate cache write)
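The steps above can be sketched with a queue buffering the deferred database writes; dicts stand in for the cache and database, and all names are illustrative:

```python
import queue

cache = {}
db = {}
write_queue = queue.Queue()  # buffers writes not yet in the database

def save(key, value):
    cache[key] = value             # immediate, low-latency cache write
    write_queue.put((key, value))  # defer the database write

def flush_writes():
    # Drain the buffer and apply pending writes to the database in one
    # batch. A real system would run this on a timer or background thread.
    while not write_queue.empty():
        key, value = write_queue.get()
        db[key] = value
```

The trade-off is visible in the sketch: between `save` and `flush_writes`, the data exists only in memory, so a crash can lose buffered writes.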
4. Pre-loading (Eager Loading)
How it Works:
- Pre-load data into cache
- Predict what will be accessed
- Proactive caching
Memory Usage: Very high (cache data before access)
Latency: Very low (data already in cache)
Example:
# Pre-load popular products at startup
popular_products = db.get_popular_products()
for product in popular_products:
    cache.set(f"product:{product.id}", product)
Eviction Policies
LRU (Least Recently Used)
How it Works:
- Evict least recently accessed items
- Keep recently accessed items
Memory Efficiency: Good (keeps hot data)
Use Case: General purpose caching
LFU (Least Frequently Used)
How it Works:
- Evict least frequently accessed items
- Keep frequently accessed items
Memory Efficiency: Good (keeps popular data)
Use Case: When access patterns are stable
TTL (Time To Live)
How it Works:
- Evict items after expiration time
- Simple time-based eviction
Memory Efficiency: Moderate (may keep stale data)
Use Case: When data has natural expiration
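A minimal TTL cache sketch, storing an expiry timestamp with each value and evicting lazily on read; the class name is illustrative:

```python
import time

class TTLCache:
    """Minimal TTL sketch: entries expire `ttl` seconds after being set."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.items = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self.items[key] = (value, time.time() + self.ttl)

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            del self.items[key]  # lazy eviction of the expired entry
            return None
        return value
```

Note the "may keep stale data" caveat above: an expired entry occupies memory until the next read touches it, which is why real caches also sweep expired keys in the background.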
FIFO (First In First Out)
How it Works:
- Evict oldest items first
- Simple queue-based eviction
Memory Efficiency: Poor (may evict hot data)
Use Case: Simple scenarios, not recommended
Decision Framework
Step 1: Identify Hot Data
Questions to Ask:
- What data is accessed frequently?
- What data is accessed by many users?
- What data has high read-to-write ratio?
- What data is expensive to compute?
Hot Data Candidates:
- User profiles (frequently accessed)
- Product catalogs (many users)
- Configuration data (high read ratio)
- Computed results (expensive to compute)
Step 2: Calculate Memory Requirements
Estimate Memory:
Memory Needed = (Number of Items) × (Size per Item) × (Replication Factor)
Example:
- 1M user profiles
- 1KB per profile
- 2x replication
- Total: 1M × 1KB × 2 = 2GB
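The estimate is quick arithmetic (here 1 KB is taken as 1,000 bytes, matching the rounded 2 GB figure above):

```python
# Back-of-envelope memory estimate using the formula above
items = 1_000_000          # 1M user profiles
size_per_item = 1_000      # ~1 KB per profile
replication = 2            # 2x replication

total_bytes = items * size_per_item * replication
print(f"{total_bytes / 1e9:.0f} GB")  # prints "2 GB"
```

In practice, add headroom for per-key overhead (Redis stores key names and metadata alongside values), so the real footprint is somewhat larger than the raw estimate.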
Step 3: Evaluate Cost vs Benefit
Cost Factors:
- Memory cost (RAM is expensive)
- Infrastructure cost
- Maintenance overhead
Benefit Factors:
- Latency reduction
- Database load reduction
- User experience improvement
Step 4: Choose Caching Strategy
High Memory Budget:
- Cache more data
- Use write-through
- Pre-load frequently accessed data
Low Memory Budget:
- Cache only hot data
- Use cache-aside
- Aggressive eviction policies
Real-World Examples
Example 1: E-commerce Product Catalog
Challenge: Millions of products, frequent access
Solution:
- Cache top 10K products in memory
- Cache product details for active searches
- Use LRU eviction
- Memory: ~10GB for 10K products
- Latency: 1ms (cache) vs 50ms (database)
Trade-off: 10GB memory cost vs 50x latency improvement
Example 2: Social Media Feed
Challenge: Personalized feeds, expensive computation
Solution:
- Cache computed feeds for active users
- Pre-compute feeds for top users
- Use TTL (5 minutes)
- Memory: ~100MB per 1K active users
- Latency: 5ms (cache) vs 200ms (compute)
Trade-off: Memory for pre-computed feeds vs real-time computation
Example 3: API Rate Limiting
Challenge: Check rate limits for every request
Solution:
- Cache rate limit counters in memory
- In-memory data structures (Redis)
- Very fast lookups
- Memory: Minimal (counters are small)
- Latency: <1ms (cache) vs 10ms (database)
Trade-off: Minimal memory vs significant latency improvement
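A fixed-window rate limiter along these lines can be sketched with a dict of counters standing in for Redis; the function name and parameters are illustrative:

```python
import time

counters = {}  # stand-in for an in-memory store such as Redis

def allow_request(client_id, limit=100, window=60):
    """Fixed-window limiter: at most `limit` requests per `window` seconds."""
    # All requests in the same time window share one counter key
    window_key = (client_id, int(time.time() // window))
    count = counters.get(window_key, 0)
    if count >= limit:
        return False  # over the limit for this window
    counters[window_key] = count + 1
    return True
```

Each check is a single in-memory lookup and increment, which is why rate limiting is such a clear win for the memory-vs-latency trade: the counters are tiny, but hitting a database per request would dominate response time.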
Optimization Techniques
1. Selective Caching
Strategy: Only cache data that benefits most
Criteria:
- High access frequency
- Expensive to compute
- Stable (doesn't change often)
- Small size (fits in memory)
2. Compression
Strategy: Compress cached data to reduce memory
Benefits:
- Store more data in same memory
- Reduce memory costs
- Slight CPU overhead
Example:
# Compress before caching (gzip works on bytes, so encode text first)
compressed = gzip.compress(data.encode("utf-8"))
cache.set(key, compressed)
# Decompress on read
compressed = cache.get(key)
data = gzip.decompress(compressed).decode("utf-8")
3. Tiered Caching
Strategy: Multiple cache layers with different memory/speed
┌────────────────────────────────────┐
│ L1: In-Process Cache (Fastest)     │  Small, very fast
│ L2: Redis Cache (Fast)             │  Medium, fast
│ L3: Database (Slowest)             │  Large, slow
└────────────────────────────────────┘
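A tiered lookup falls through the layers and promotes the value on the way back up, so the next read hits a faster tier. A minimal sketch with dicts standing in for each layer (names illustrative):

```python
l1 = {}                  # in-process cache (fastest, smallest)
l2 = {}                  # stand-in for Redis (fast, medium)
database = {"k": "v"}    # stand-in for the database (slow, large)

def get(key):
    # Check each tier in order of speed
    if key in l1:
        return l1[key]
    if key in l2:
        l1[key] = l2[key]  # promote to L1 for the next read
        return l1[key]
    value = database.get(key)
    if value is not None:
        # Populate both cache tiers on the way back up
        l2[key] = value
        l1[key] = value
    return value
```

The first `get("k")` pays the database cost; subsequent reads are served from L1.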
4. Cache Warming
Strategy: Pre-load cache during low-traffic periods
Benefits:
- Reduce cache misses during peak
- Better latency during high load
- Predictable memory usage
5. Memory-Efficient Data Structures
Strategy: Use compact data structures
Examples:
- Bitmaps for boolean flags
- Compressed strings
- Sparse data structures
- Columnar storage for analytics
Monitoring and Metrics
Key Metrics
Memory Metrics:
- Cache size (bytes, items)
- Memory utilization
- Eviction rate
- Hit rate
Latency Metrics:
- Cache hit latency
- Cache miss latency
- Average latency
- P95, P99 latencies
Efficiency Metrics:
- Cache hit ratio (target: >80%)
- Memory efficiency (hits per MB)
- Cost per request
- Latency improvement
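The hit ratio above falls out of two counters; a minimal tracker sketch (the class name is illustrative):

```python
class CacheMetrics:
    """Counts hits and misses and reports the hit ratio."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, hit):
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Wiring `record` into the cache's get path and alerting when `hit_ratio` drops below the 80% target is usually enough to catch a cache that has stopped earning its memory.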
Alerting
- Alert if memory usage exceeds threshold
- Alert if hit rate drops
- Alert if latency increases
- Monitor eviction patterns
Best Practices
1. Cache Strategically
- Identify hot data
- Don't cache everything
- Focus on high-impact data
2. Set Appropriate TTLs
- Short TTL for frequently changing data
- Long TTL for stable data
- Balance freshness vs memory
3. Monitor Hit Rates
- Target: >80% hit rate
- Low hit rate = wasted memory
- High hit rate = good investment
4. Use Appropriate Eviction
- LRU for general purpose
- LFU for stable patterns
- TTL for time-sensitive data
5. Consider Memory Costs
- RAM is expensive
- Balance cost vs benefit
- Optimize data structures
- Use compression when possible
6. Plan for Cache Misses
- Don't assume cache will always hit
- Design fallback to database
- Handle cache failures gracefully
Quick Reference Summary
Memory: More memory (caching) = faster access = lower latency, but higher infrastructure cost.
Latency: Less memory = slower access = higher latency, but lower cost.
Key Strategies:
- Cache hot data (frequently accessed)
- Use appropriate eviction policies (LRU, LFU, TTL)
- Monitor hit rates (target >80%)
- Balance memory cost with latency requirements
Best Practice: Cache strategically - identify high-impact data, use tiered caching, monitor efficiency, and balance cost vs benefit.
Remember: Not all data should be cached. Focus on data that provides the best latency improvement per unit of memory.