Throughput vs Latency
Quick Reference
Throughput: Process more requests per second
Latency: Lower response time per request
Trade-off: Batching increases throughput but also increases latency
Clear Definition
Throughput vs Latency is a fundamental performance trade-off in system design. Batching and optimization techniques increase throughput (requests per second) but may increase latency (time per request). Real-time processing reduces latency but may reduce overall throughput.
💡 Key Insight: Batch for high throughput; process immediately for low latency. Most systems use a hybrid approach: batch non-critical operations and process critical ones in real time.
Core Concepts
Throughput (Requests Per Second)
Definition: The number of requests a system can process in a given time period.
Metrics:
- Requests per second (RPS)
- Transactions per second (TPS)
- Operations per second (OPS)
Factors Affecting Throughput:
- Batching operations
- Connection pooling
- Parallel processing
- Resource utilization
- Network bandwidth
Latency (Response Time)
Definition: The time taken to process a single request from start to finish.
Metrics:
- P50 (median latency)
- P95 (95th percentile)
- P99 (99th percentile)
- Average latency
Factors Affecting Latency:
- Network round-trip time
- Database query time
- Processing time
- Serialization overhead
- Queue wait time
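The percentile metrics above (P50, P95, P99) can be computed directly from a list of per-request timings. A minimal sketch using the nearest-rank method (the function name and sample values are illustrative):

```python
def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    # Nearest-rank method: ceil(p/100 * n) gives a 1-based rank.
    rank = max(1, -(-len(ordered) * p // 100))  # ceiling via negated floor division
    return ordered[rank - 1]

latencies_ms = [12, 15, 11, 200, 14, 13, 16, 18, 95, 17]
p50 = percentile(latencies_ms, 50)  # median: typical request
p95 = percentile(latencies_ms, 95)  # tail: slowest 5% of requests
p99 = percentile(latencies_ms, 99)  # extreme tail
```

Note how a single slow outlier (200 ms) dominates P95/P99 while leaving the median untouched, which is why tail percentiles matter more than averages for latency SLAs.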
The Trade-off Explained
Why They Conflict
High Throughput Approach:
```
┌────────────────────────────────────┐
│ Request 1 ──┐                      │
│ Request 2 ──┤                      │
│ Request 3 ──┼──> Batch Process ──> │  High throughput
│ Request 4 ──┤   (wait for batch)   │  Higher latency
│ Request 5 ──┘                      │
└────────────────────────────────────┘
```
Low Latency Approach:
```
┌────────────────────────────────────┐
│ Request 1 ──> Process ──> Response │  Low latency
│ Request 2 ──> Process ──> Response │  Lower throughput
│ Request 3 ──> Process ──> Response │
└────────────────────────────────────┘
```
Batching Strategy
How Batching Works:
- Collect multiple requests
- Wait for batch size or timeout
- Process batch together
- Return responses
Benefits:
- ✅ Higher throughput (amortizes overhead)
- ✅ Better resource utilization
- ✅ Reduced per-request overhead
- ✅ Efficient database operations
Drawbacks:
- ❌ Higher latency (requests wait for the batch)
- ❌ Memory overhead (buffering)
- ❌ Complexity (batch management)
Example: Database Writes
```python
# Low throughput, low latency: each insert is its own network round-trip
for item in items:
    db.insert(item)

# High throughput, higher latency: one round-trip per 100 rows
batch = []
for item in items:
    batch.append(item)
    if len(batch) >= 100:
        db.bulk_insert(batch)  # one round-trip, but each row waits for the batch
        batch = []
if batch:
    db.bulk_insert(batch)  # flush the final partial batch
```
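A size-only trigger bounds the batch but not the wait: a trickle of traffic could leave rows buffered indefinitely. Production batchers therefore flush on size *or* a timeout, which caps worst-case latency. A minimal sketch (the `MicroBatcher` name and threshold values are illustrative; a real implementation would also run a background timer rather than checking the deadline only on `add`):

```python
import time

class MicroBatcher:
    """Flush when the batch is full OR the oldest item has waited too long."""

    def __init__(self, flush, max_size=100, max_wait_s=0.05):
        self.flush = flush          # caller-supplied callable (hypothetical)
        self.max_size = max_size
        self.max_wait_s = max_wait_s
        self.batch = []
        self.oldest = None          # arrival time of the oldest buffered item

    def add(self, item):
        if not self.batch:
            self.oldest = time.monotonic()
        self.batch.append(item)
        self._maybe_flush()

    def _maybe_flush(self):
        full = len(self.batch) >= self.max_size
        stale = time.monotonic() - self.oldest >= self.max_wait_s
        if full or stale:
            self.flush(self.batch)
            self.batch = []

collected = []
b = MicroBatcher(flush=collected.append, max_size=3, max_wait_s=10)
for i in range(7):
    b.add(i)
# Two full batches of 3 flushed; item 6 remains buffered awaiting size/timeout.
```

Tuning `max_size` and `max_wait_s` is exactly the throughput/latency dial: a larger size raises throughput, a shorter timeout caps latency.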
Real-time Processing Strategy
How Real-time Works:
- Process each request immediately
- No batching or waiting
- Optimize for individual request speed
Benefits:
- ✅ Lower latency (no waiting)
- ✅ Better user experience
- ✅ Immediate feedback
- ✅ Simpler logic
Drawbacks:
- ❌ Lower throughput (more per-request overhead)
- ❌ Higher resource usage
- ❌ More database connections
- ❌ Less efficient resource utilization
Decision Framework
Step 1: Identify Requirements
High Throughput Needed:
- Analytics processing
- Log aggregation
- Bulk data processing
- Background jobs
- ETL pipelines
Low Latency Needed:
- User-facing APIs
- Real-time interactions
- Gaming applications
- Trading systems
- Chat applications
Step 2: Analyze Request Patterns
Questions to Ask:
- What's the acceptable latency?
- What's the required throughput?
- Are requests independent or can they be batched?
- Is user experience or efficiency more important?
Step 3: Choose Strategy
Pure Batching:
- Use when: High volume, latency not critical
- Example: Log processing, analytics
Pure Real-time:
- Use when: Low latency critical, volume manageable
- Example: User API, real-time chat
Hybrid Approach (Recommended):
- Critical requests: Real-time
- Non-critical requests: Batch
- Example: E-commerce (orders real-time, analytics batched)
Real-World Examples
Example 1: Payment Processing
Throughput Priority:
- Batch process transactions at end of day
- Higher throughput, acceptable latency
- Use for: Settlement, reconciliation
Latency Priority:
- Process payments immediately
- Lower latency, acceptable throughput
- Use for: Real-time payments, card transactions
Example 2: Analytics Platform
Throughput Priority:
- Batch process events every 5 minutes
- Process millions of events efficiently
- Use for: Business intelligence, reporting
Latency Priority:
- Process events in real-time
- Immediate dashboards and alerts
- Use for: Real-time monitoring, fraud detection
Example 3: Social Media Feed
Throughput Priority:
- Batch generate feeds periodically
- Efficient resource usage
- Use for: Background feed generation
Latency Priority:
- Generate feeds on-demand
- Immediate user experience
- Use for: Real-time feed updates
Optimization Techniques
For High Throughput
1. Batching
   - Batch database writes
   - Batch API calls
   - Batch file operations
2. Connection Pooling
   - Reuse connections
   - Reduce connection overhead
   - Efficient resource usage
3. Asynchronous Processing
   - Non-blocking I/O
   - Event-driven architecture
   - Parallel processing
4. Caching
   - Cache frequently accessed data
   - Reduce database load
   - Increase effective throughput
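The parallel-processing item above can be sketched with a thread pool: overlapping I/O-bound requests raises throughput without changing the work done per request. `handle_request` is a hypothetical stand-in for an I/O-bound handler:

```python
from concurrent.futures import ThreadPoolExecutor

def handle_request(req_id):
    # Stand-in for network or database work; in a real handler this
    # would block on I/O, which is what the pool overlaps.
    return req_id * 2

# Eight workers process requests concurrently; map preserves input order.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(handle_request, range(10)))
```

For CPU-bound work, threads do not help in CPython; a process pool or async I/O would be the analogous choice.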
For Low Latency
1. Pre-computation
   - Pre-calculate results
   - Cache common queries
   - Warm up caches
2. Optimize the Critical Path
   - Minimize database queries
   - Reduce network hops
   - Optimize serialization
3. CDN and Edge Computing
   - Serve from edge locations
   - Reduce network latency
   - Geographic distribution
4. Database Optimization
   - Index optimization
   - Query optimization
   - Read replicas
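The pre-computation and caching items above can be sketched with a memoized lookup: repeated queries are served from memory, keeping the slow path off the critical request. `expensive_query` and the `CALLS` counter are illustrative stand-ins for a database round-trip:

```python
from functools import lru_cache

CALLS = 0  # counts how often the slow path actually runs

@lru_cache(maxsize=1024)
def expensive_query(key):
    global CALLS
    CALLS += 1            # stand-in for a slow database round-trip
    return key.upper()    # stand-in for the query result

expensive_query("user:42")   # miss: hits the "database"
expensive_query("user:42")   # hit: served from cache, no round-trip
```

The first call pays full latency; every repeat is a memory lookup. The trade-off is staleness: cached results must be invalidated or given a TTL when the underlying data changes.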
Hybrid Approaches
Pattern 1: Tiered Processing
```
┌────────────────────────────────────┐
│ Request Classification             │
├────────────────────────────────────┤
│ Critical (Real-time):              │
│ - User actions                     │
│ - Payments                         │
│ - Authentication                   │
├────────────────────────────────────┤
│ Non-Critical (Batch):              │
│ - Analytics                        │
│ - Logs                             │
│ - Recommendations                  │
└────────────────────────────────────┘
```
Pattern 2: Adaptive Batching
- Start with small batches
- Increase batch size if throughput needed
- Decrease batch size if latency critical
- Dynamic adjustment based on load
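The adaptive rule above can be expressed as a small controller: grow the batch while observed latency is within SLA, shrink it quickly on a breach. The function name, thresholds, and step sizes are illustrative, not recommendations:

```python
def adjust_batch_size(current, observed_p95_ms, sla_ms=100,
                      min_size=1, max_size=1000):
    if observed_p95_ms > sla_ms:
        return max(min_size, current // 2)  # latency critical: shrink fast
    return min(max_size, current + 10)      # headroom: grow slowly

size = 100
size = adjust_batch_size(size, observed_p95_ms=150)  # SLA breach: halve
size = adjust_batch_size(size, observed_p95_ms=40)   # headroom: grow
```

The asymmetry (halve on breach, small additive growth otherwise) mirrors AIMD-style congestion control: latency violations are corrected quickly, throughput is recovered gradually.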
Pattern 3: Write-Through + Write-Behind
- Write critical data immediately (low latency)
- Batch non-critical writes (high throughput)
- Combine both strategies
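The combined strategy above can be sketched in a few lines: critical writes go straight to the store (write-through), non-critical writes are queued and flushed later (write-behind). The `store` dict, queue, and function names are illustrative; a real system would persist the queue and run the flush on a background worker:

```python
from queue import Queue

store = {}                  # stand-in for the backing data store
write_behind_queue = Queue()

def write(key, value, critical):
    if critical:
        store[key] = value                    # immediate: low latency
    else:
        write_behind_queue.put((key, value))  # deferred: high throughput

def flush_write_behind():
    # In practice this runs periodically on a background worker.
    while not write_behind_queue.empty():
        key, value = write_behind_queue.get()
        store[key] = value

write("order:1", "paid", critical=True)       # visible immediately
write("metric:clicks", 42, critical=False)    # buffered until flush
flush_write_behind()
```

The cost of the write-behind side is durability: queued writes can be lost on a crash, which is exactly why only non-critical data belongs there.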
Monitoring and Metrics
Key Metrics to Track
Throughput Metrics:
- Requests per second
- Transactions per second
- Batch processing rate
- Queue depth
Latency Metrics:
- P50, P95, P99 latencies
- End-to-end latency
- Processing time
- Queue wait time
Trade-off Metrics:
- Throughput vs latency graph
- Batch size vs latency
- Resource utilization
- Error rates
Alerting
- Alert if latency exceeds SLA
- Alert if throughput drops
- Alert on batch processing delays
- Monitor queue growth
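The alert rules above can be expressed as a simple evaluator over the three metric families. Threshold values are illustrative, not recommendations:

```python
def evaluate_alerts(p99_ms, rps, queue_depth,
                    sla_p99_ms=200, min_rps=500, max_queue=10_000):
    alerts = []
    if p99_ms > sla_p99_ms:
        alerts.append("latency exceeds SLA")
    if rps < min_rps:
        alerts.append("throughput dropped")
    if queue_depth > max_queue:
        alerts.append("queue growing")
    return alerts

evaluate_alerts(p99_ms=250, rps=400, queue_depth=12_000)
```

Firing on all three together is the signature of an overloaded batcher: latency, throughput, and queue depth degrade as one.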
Best Practices
1. Understand Your SLA
- Define acceptable latency
- Define required throughput
- Design to meet both
2. Use Hybrid Approaches
- Real-time for critical paths
- Batching for non-critical
- Best of both worlds
3. Monitor Both Metrics
- Don't optimize one at expense of other
- Track trade-off curves
- Adjust based on metrics
4. Optimize Incrementally
- Start with simple approach
- Measure baseline
- Optimize based on data
- Avoid premature optimization
5. Consider User Experience
- User-facing: Prioritize latency
- Background: Prioritize throughput
- Balance based on impact
Quick Reference Summary
Throughput: Process more requests per second. Achieved through batching, connection pooling, parallel processing.
Latency: Lower response time per request. Achieved through real-time processing, optimization, caching.
Key Trade-off: Batching increases throughput but also increases latency; real-time processing reduces latency but may reduce throughput.
Best Practice: Use hybrid approach - real-time for critical operations, batching for non-critical operations.
Remember: Most systems need both. The key is finding the right balance for your use case.
Previous Topic: Memory vs Latency ←
Next Topic: Accuracy vs Latency →
Back to: Step 11 Overview | Main Index