Memory vs Latency
Quick Reference
Memory: Cache more data for faster access
Latency: Using less memory lowers cost but increases latency
Trade-off: More memory = faster access but higher cost
Clear Definition
Memory vs Latency trade-off: Allocating more memory (through caching, pre-loading, or in-memory storage) reduces latency by enabling faster data access, but increases infrastructure costs. Using less memory reduces costs but increases latency due to slower data retrieval from disk or network.
💡 Key Insight: Cache strategically - identify hot data (frequently accessed) and cache it. Balance memory costs with latency requirements based on your SLA and budget constraints.
Core Concepts
The Relationship
More Memory (Caching)
    ↓
Faster Data Access
    ↓
Lower Latency
    ↓
Better User Experience
    ↓
Higher Infrastructure Cost
Memory Usage Patterns
In-Memory Caching:
- Store frequently accessed data in RAM
- Fast access (nanoseconds to microseconds, vs milliseconds for disk)
- Limited by available memory
- Examples: Redis, Memcached, application cache
Disk Storage:
- Store data on disk
- Slower access (milliseconds)
- Much larger capacity
- Examples: Databases, file systems
Network Storage:
- Fetch data over network
- Slowest access (hundreds of milliseconds)
- Virtually unlimited capacity
- Examples: External APIs, remote databases
Caching Strategies
1. Cache-Aside (Lazy Loading)
How it Works:
- Check cache first
- If miss, fetch from database
- Store in cache for future requests
Memory Usage: Moderate (only cache accessed data)
Latency: Low for cached data, high for cache misses
Example:
def get_user(user_id):
    # Check cache first
    user = cache.get(f"user:{user_id}")
    if user:
        return user  # Fast path: cache hit
    # Cache miss - fetch from DB and populate the cache
    user = db.get_user(user_id)
    cache.set(f"user:{user_id}", user, ttl=3600)
    return user
2. Write-Through Cache
How it Works:
- Write to cache and database simultaneously
- Cache always has latest data
- Higher memory usage (all writes cached)
Memory Usage: High (cache all written data)
Latency: Very low (always in cache)
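A minimal write-through sketch, using plain dicts as stand-ins for a real cache (e.g. Redis) and a database; the function names are illustrative:

```python
# Write-through: every write goes to the database AND the cache,
# so the cache always holds the latest value.
cache = {}  # stand-in for an in-memory cache
db = {}     # stand-in for a database

def save_user(user_id, user):
    db[user_id] = user     # durable write
    cache[user_id] = user  # cache updated in the same operation

def get_user(user_id):
    # Reads are almost always cache hits under write-through
    return cache.get(user_id) or db.get(user_id)

save_user(1, {"name": "Ada"})
```

Because both stores are updated together, a read immediately after a write never sees stale data, at the cost of caching every written item whether or not it is ever read.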
3. Write-Behind (Write-Back)
How it Works:
- Write to cache immediately
- Asynchronously write to database
- Batch database writes
Memory Usage: High (buffer writes in memory)
Latency: Very low (immediate cache write)
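The steps above can be sketched with a queue buffering the deferred database writes; dicts stand in for the cache and database, and all names are illustrative:

```python
import queue

cache = {}
db = {}
write_queue = queue.Queue()  # buffers writes not yet in the database

def save(key, value):
    cache[key] = value             # immediate, low-latency cache write
    write_queue.put((key, value))  # defer the database write

def flush_writes():
    # Drain the buffer and apply pending writes to the database in one
    # batch. A real system would run this on a timer or background thread.
    while not write_queue.empty():
        key, value = write_queue.get()
        db[key] = value
```

The trade-off is visible in the sketch: between `save` and `flush_writes`, the data exists only in memory, so a crash can lose buffered writes.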
4. Pre-loading (Eager Loading)
How it Works:
- Pre-load data into cache
- Predict what will be accessed
- Proactive caching
Memory Usage: Very high (cache data before access)
Latency: Very low (data already in cache)
Example:
# Pre-load popular products at startup
popular_products = db.get_popular_products()
for product in popular_products:
    cache.set(f"product:{product.id}", product)
Eviction Policies
LRU (Least Recently Used)
How it Works:
- Evict least recently accessed items
- Keep recently accessed items
Memory Efficiency: Good (keeps hot data)
Use Case: General purpose caching
LFU (Least Frequently Used)
How it Works:
- Evict least frequently accessed items
- Keep frequently accessed items
Memory Efficiency: Good (keeps popular data)
Use Case: When access patterns are stable
TTL (Time To Live)
How it Works:
- Evict items after expiration time
- Simple time-based eviction
Memory Efficiency: Moderate (may keep stale data)
Use Case: When data has natural expiration
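A minimal TTL cache sketch, storing an expiry timestamp with each value and evicting lazily on read; the class name is illustrative:

```python
import time

class TTLCache:
    """Minimal TTL sketch: entries expire `ttl` seconds after being set."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.items = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self.items[key] = (value, time.time() + self.ttl)

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            del self.items[key]  # lazy eviction of the expired entry
            return None
        return value
```

Note the "may keep stale data" caveat above: an expired entry occupies memory until the next read touches it, which is why real caches also sweep expired keys in the background.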
FIFO (First In First Out)
How it Works:
- Evict oldest items first
- Simple queue-based eviction
Memory Efficiency: Poor (may evict hot data)
Use Case: Simple scenarios, not recommended
Decision Framework
Step 1: Identify Hot Data
Questions to Ask:
- What data is accessed frequently?
- What data is accessed by many users?
- What data has high read-to-write ratio?
- What data is expensive to compute?
Hot Data Candidates:
- User profiles (frequently accessed)
- Product catalogs (many users)
- Configuration data (high read ratio)
- Computed results (expensive to compute)
Step 2: Calculate Memory Requirements
Estimate Memory:
Memory Needed = (Number of Items) × (Size per Item) × (Replication Factor)
Example:
- 1M user profiles
- 1KB per profile
- 2x replication
- Total: 1M × 1KB × 2 = 2GB
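The estimate is quick arithmetic (here 1 KB is taken as 1,000 bytes, matching the rounded 2 GB figure above):

```python
# Back-of-envelope memory estimate using the formula above
items = 1_000_000          # 1M user profiles
size_per_item = 1_000      # ~1 KB per profile
replication = 2            # 2x replication

total_bytes = items * size_per_item * replication
print(f"{total_bytes / 1e9:.0f} GB")  # prints "2 GB"
```

In practice, add headroom for per-key overhead (Redis stores key names and metadata alongside values), so the real footprint is somewhat larger than the raw estimate.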
Step 3: Evaluate Cost vs Benefit
Cost Factors:
- Memory cost (RAM is expensive)
- Infrastructure cost
- Maintenance overhead
Benefit Factors:
- Latency reduction
- Database load reduction
- User experience improvement
Step 4: Choose Caching Strategy
High Memory Budget:
- Cache more data
- Use write-through
- Pre-load frequently accessed data
Low Memory Budget:
- Cache only hot data
- Use cache-aside
- Aggressive eviction policies
Real-World Examples
Example 1: E-commerce Product Catalog
Challenge: Millions of products, frequent access
Solution:
- Cache top 10K products in memory
- Cache product details for active searches
- Use LRU eviction
- Memory: ~10GB for 10K products
- Latency: 1ms (cache) vs 50ms (database)
Trade-off: 10GB memory cost vs 50x latency improvement
Example 2: Social Media Feed
Challenge: Personalized feeds, expensive computation
Solution:
- Cache computed feeds for active users
- Pre-compute feeds for top users
- Use TTL (5 minutes)
- Memory: ~100MB per 1K active users
- Latency: 5ms (cache) vs 200ms (compute)
Trade-off: Memory for pre-computed feeds vs real-time computation
Example 3: API Rate Limiting
Challenge: Check rate limits for every request
Solution:
- Cache rate limit counters in memory
- In-memory data structures (Redis)
- Very fast lookups
- Memory: Minimal (counters are small)
- Latency: <1ms (cache) vs 10ms (database)
Trade-off: Minimal memory vs significant latency improvement
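A fixed-window rate limiter along these lines can be sketched with a dict of counters standing in for Redis; the function name and parameters are illustrative:

```python
import time

counters = {}  # stand-in for an in-memory store such as Redis

def allow_request(client_id, limit=100, window=60):
    """Fixed-window limiter: at most `limit` requests per `window` seconds."""
    # All requests in the same time window share one counter key
    window_key = (client_id, int(time.time() // window))
    count = counters.get(window_key, 0)
    if count >= limit:
        return False  # over the limit for this window
    counters[window_key] = count + 1
    return True
```

Each check is a single in-memory lookup and increment, which is why rate limiting is such a clear win for the memory-vs-latency trade: the counters are tiny, but hitting a database per request would dominate response time.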
Optimization Techniques
1. Selective Caching
Strategy: Only cache data that benefits most
Criteria:
- High access frequency
- Expensive to compute
- Stable (doesn't change often)
- Small size (fits in memory)
2. Compression
Strategy: Compress cached data to reduce memory
Benefits:
- Store more data in same memory
- Reduce memory costs
- Slight CPU overhead
Example:
# Compress before caching (gzip works on bytes, so encode text first)
compressed = gzip.compress(data.encode("utf-8"))
cache.set(key, compressed)
# Decompress on read
compressed = cache.get(key)
data = gzip.decompress(compressed).decode("utf-8")
3. Tiered Caching
Strategy: Multiple cache layers with different memory/speed
┌────────────────────────────────────┐
│ L1: In-Process Cache (Fastest)     │  Small, very fast
│ L2: Redis Cache (Fast)             │  Medium, fast
│ L3: Database (Slowest)             │  Large, slow
└────────────────────────────────────┘
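A tiered lookup falls through the layers and promotes the value on the way back up, so the next read hits a faster tier. A minimal sketch with dicts standing in for each layer (names illustrative):

```python
l1 = {}                  # in-process cache (fastest, smallest)
l2 = {}                  # stand-in for Redis (fast, medium)
database = {"k": "v"}    # stand-in for the database (slow, large)

def get(key):
    # Check each tier in order of speed
    if key in l1:
        return l1[key]
    if key in l2:
        l1[key] = l2[key]  # promote to L1 for the next read
        return l1[key]
    value = database.get(key)
    if value is not None:
        # Populate both cache tiers on the way back up
        l2[key] = value
        l1[key] = value
    return value
```

The first `get("k")` pays the database cost; subsequent reads are served from L1.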
4. Cache Warming
Strategy: Pre-load cache during low-traffic periods
Benefits:
- Reduce cache misses during peak
- Better latency during high load
- Predictable memory usage
5. Memory-Efficient Data Structures
Strategy: Use compact data structures
Examples:
- Bitmaps for boolean flags
- Compressed strings
- Sparse data structures
- Columnar storage for analytics
Monitoring and Metrics
Key Metrics
Memory Metrics:
- Cache size (bytes, items)
- Memory utilization
- Eviction rate
- Hit rate
Latency Metrics:
- Cache hit latency
- Cache miss latency
- Average latency
- P95, P99 latencies
Efficiency Metrics:
- Cache hit ratio (target: >80%)
- Memory efficiency (hits per MB)
- Cost per request
- Latency improvement
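The hit ratio above falls out of two counters; a minimal tracker sketch (the class name is illustrative):

```python
class CacheMetrics:
    """Counts hits and misses and reports the hit ratio."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, hit):
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Wiring `record` into the cache's get path and alerting when `hit_ratio` drops below the 80% target is usually enough to catch a cache that has stopped earning its memory.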
Alerting
- Alert if memory usage exceeds threshold
- Alert if hit rate drops
- Alert if latency increases
- Monitor eviction patterns
Best Practices
1. Cache Strategically
- Identify hot data
- Don't cache everything
- Focus on high-impact data
2. Set Appropriate TTLs
- Short TTL for frequently changing data
- Long TTL for stable data
- Balance freshness vs memory
3. Monitor Hit Rates
- Target: >80% hit rate
- Low hit rate = wasted memory
- High hit rate = good investment
4. Use Appropriate Eviction
- LRU for general purpose
- LFU for stable patterns
- TTL for time-sensitive data
5. Consider Memory Costs
- RAM is expensive
- Balance cost vs benefit
- Optimize data structures
- Use compression when possible
6. Plan for Cache Misses
- Don't assume cache will always hit
- Design fallback to database
- Handle cache failures gracefully
Quick Reference Summary
Memory: More memory (caching) = faster access = lower latency, but higher infrastructure cost.
Latency: Less memory = slower access = higher latency, but lower cost.
Key Strategies:
- Cache hot data (frequently accessed)
- Use appropriate eviction policies (LRU, LFU, TTL)
- Monitor hit rates (target >80%)
- Balance memory cost with latency requirements
Best Practice: Cache strategically - identify high-impact data, use tiered caching, monitor efficiency, and balance cost vs benefit.
Remember: Not all data should be cached. Focus on data that provides the best latency improvement per unit of memory.