Throughput vs Latency

Quick Reference

Throughput: Process more requests per second

Latency: Lower response time per request

Trade-off: Batching increases throughput but increases latency


Clear Definition

Throughput vs Latency is a fundamental performance trade-off in system design. Batching and optimization techniques increase throughput (requests per second) but may increase latency (time per request). Real-time processing reduces latency but may reduce overall throughput.

πŸ’‘ Key Insight: Batch for high throughput, process immediately for low latency. Most systems use a hybrid approach - batch non-critical operations, process critical ones in real-time.


Core Concepts

Throughput (Requests Per Second)

Definition: The number of requests a system can process in a given time period.

Metrics:

  • Requests per second (RPS)
  • Transactions per second (TPS)
  • Operations per second (OPS)

Factors Affecting Throughput:

  • Batching operations
  • Connection pooling
  • Parallel processing
  • Resource utilization
  • Network bandwidth

Latency (Response Time)

Definition: The time taken to process a single request from start to finish.

Metrics:

  • P50 (median latency)
  • P95 (95th percentile)
  • P99 (99th percentile)
  • Average latency

Factors Affecting Latency:

  • Network round-trip time
  • Database query time
  • Processing time
  • Serialization overhead
  • Queue wait time

The Trade-off Explained

Why They Conflict

High Throughput Approach:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Request 1 ──┐                       β”‚
β”‚  Request 2 ───                       β”‚
β”‚  Request 3 ──┼──> Batch Process ──>  β”‚  High throughput,
β”‚  Request 4 ───     (wait for batch)  β”‚  higher latency
β”‚  Request 5 β”€β”€β”˜                       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Low Latency Approach:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Request 1 ──> Process ──> Response  β”‚  Low latency,
β”‚  Request 2 ──> Process ──> Response  β”‚  lower throughput
β”‚  Request 3 ──> Process ──> Response  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Batching Strategy

How Batching Works:

  1. Collect multiple requests
  2. Wait for batch size or timeout
  3. Process batch together
  4. Return responses
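The four steps above can be sketched as a size-or-timeout batcher. This is a minimal illustration, not a production implementation; `source` and `process_batch` are hypothetical stand-ins for your request stream and handler:

```python
import time

def run_batcher(source, process_batch, max_size=100, max_wait=0.5):
    """Collect items from `source`, flushing when the batch is full
    or the oldest buffered item has waited `max_wait` seconds."""
    batch, first_at = [], None
    for item in source:
        batch.append(item)                 # step 1: collect requests
        if first_at is None:
            first_at = time.monotonic()
        # step 2: wait for batch size or timeout
        if len(batch) >= max_size or time.monotonic() - first_at >= max_wait:
            process_batch(batch)           # step 3: process batch together
            batch, first_at = [], None
    if batch:
        process_batch(batch)               # flush the final partial batch
```

Note that this sketch only checks the timeout when a new item arrives; a real implementation would also flush from a timer or background thread so a lone item never waits indefinitely.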

Benefits:

  • βœ… Higher throughput (amortize overhead)
  • βœ… Better resource utilization
  • βœ… Reduced per-request overhead
  • βœ… Efficient database operations

Drawbacks:

  • ❌ Higher latency (wait for batch)
  • ❌ Memory overhead (buffering)
  • ❌ Complexity (batch management)

Example: Database Writes

# Low throughput, low latency: one network round-trip per insert
for item in items:
    db.insert(item)

# High throughput, higher latency: one round-trip per 100 items,
# but each item may wait until the batch fills
batch = []
for item in items:
    batch.append(item)
    if len(batch) >= 100:
        db.bulk_insert(batch)
        batch = []
if batch:
    db.bulk_insert(batch)  # flush the final partial batch

Real-time Processing Strategy

How Real-time Works:

  1. Process each request immediately
  2. No batching or waiting
  3. Optimize for individual request speed

Benefits:

  • βœ… Lower latency (no waiting)
  • βœ… Better user experience
  • βœ… Immediate feedback
  • βœ… Simpler logic

Drawbacks:

  • ❌ Lower throughput (more overhead)
  • ❌ Higher resource usage
  • ❌ More database connections
  • ❌ Less efficient resource utilization

Decision Framework

Step 1: Identify Requirements

High Throughput Needed:

  • Analytics processing
  • Log aggregation
  • Bulk data processing
  • Background jobs
  • ETL pipelines

Low Latency Needed:

  • User-facing APIs
  • Real-time interactions
  • Gaming applications
  • Trading systems
  • Chat applications

Step 2: Analyze Request Patterns

Questions to Ask:

  • What's the acceptable latency?
  • What's the required throughput?
  • Are requests independent or can they be batched?
  • Is user experience or efficiency more important?

Step 3: Choose Strategy

Pure Batching:

  • Use when: High volume, latency not critical
  • Example: Log processing, analytics

Pure Real-time:

  • Use when: Low latency critical, volume manageable
  • Example: User API, real-time chat

Hybrid Approach (Recommended):

  • Critical requests: Real-time
  • Non-critical requests: Batch
  • Example: E-commerce (orders real-time, analytics batched)

Real-World Examples

Example 1: Payment Processing

Throughput Priority:

  • Batch process transactions at end of day
  • Higher throughput, acceptable latency
  • Use for: Settlement, reconciliation

Latency Priority:

  • Process payments immediately
  • Lower latency, acceptable throughput
  • Use for: Real-time payments, card transactions

Example 2: Analytics Platform

Throughput Priority:

  • Batch process events every 5 minutes
  • Process millions of events efficiently
  • Use for: Business intelligence, reporting

Latency Priority:

  • Process events in real-time
  • Immediate dashboards and alerts
  • Use for: Real-time monitoring, fraud detection

Example 3: Social Media Feed

Throughput Priority:

  • Batch generate feeds periodically
  • Efficient resource usage
  • Use for: Background feed generation

Latency Priority:

  • Generate feeds on-demand
  • Immediate user experience
  • Use for: Real-time feed updates

Optimization Techniques

For High Throughput

  1. Batching

    • Batch database writes
    • Batch API calls
    • Batch file operations
  2. Connection Pooling

    • Reuse connections
    • Reduce connection overhead
    • Efficient resource usage
  3. Asynchronous Processing

    • Non-blocking I/O
    • Event-driven architecture
    • Parallel processing
  4. Caching

    • Cache frequently accessed data
    • Reduce database load
    • Increase effective throughput

For Low Latency

  1. Pre-computation

    • Pre-calculate results
    • Cache common queries
    • Warm-up caches
  2. Optimize Critical Path

    • Minimize database queries
    • Reduce network hops
    • Optimize serialization
  3. CDN and Edge Computing

    • Serve from edge locations
    • Reduce network latency
    • Geographic distribution
  4. Database Optimization

    • Index optimization
    • Query optimization
    • Read replicas

Hybrid Approaches

Pattern 1: Tiered Processing

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Request Classification             β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Critical (Real-time):              β”‚
β”‚  - User actions                     β”‚
β”‚  - Payments                         β”‚
β”‚  - Authentication                   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Non-Critical (Batch):              β”‚
β”‚  - Analytics                        β”‚
β”‚  - Logs                             β”‚
β”‚  - Recommendations                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
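The classification above reduces to a small routing function. A sketch under illustrative assumptions: the `CRITICAL` set, the request shape, and both handlers are hypothetical:

```python
CRITICAL = {"payment", "auth", "user_action"}

def handle(request, realtime_handler, batch_queue):
    """Route critical requests to immediate processing and
    queue everything else for the next batch run."""
    if request["type"] in CRITICAL:
        return realtime_handler(request)   # low-latency path
    batch_queue.append(request)            # picked up by a periodic batch job
```

In practice the batch queue would be a durable broker (e.g. a message queue) rather than an in-memory list, so queued work survives a crash.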

Pattern 2: Adaptive Batching

  • Start with small batches
  • Increase batch size if throughput needed
  • Decrease batch size if latency critical
  • Dynamic adjustment based on load

Pattern 3: Write-Through + Write-Behind

  • Write critical data immediately (low latency)
  • Batch non-critical writes (high throughput)
  • Combine both strategies

Monitoring and Metrics

Key Metrics to Track

Throughput Metrics:

  • Requests per second
  • Transactions per second
  • Batch processing rate
  • Queue depth

Latency Metrics:

  • P50, P95, P99 latencies
  • End-to-end latency
  • Processing time
  • Queue wait time

Trade-off Metrics:

  • Throughput vs latency graph
  • Batch size vs latency
  • Resource utilization
  • Error rates

Alerting

  • Alert if latency exceeds SLA
  • Alert if throughput drops
  • Alert on batch processing delays
  • Monitor queue growth

Best Practices

1. Understand Your SLA

  • Define acceptable latency
  • Define required throughput
  • Design to meet both

2. Use Hybrid Approaches

  • Real-time for critical paths
  • Batching for non-critical
  • Best of both worlds

3. Monitor Both Metrics

  • Don't optimize one at expense of other
  • Track trade-off curves
  • Adjust based on metrics

4. Optimize Incrementally

  • Start with simple approach
  • Measure baseline
  • Optimize based on data
  • Avoid premature optimization

5. Consider User Experience

  • User-facing: Prioritize latency
  • Background: Prioritize throughput
  • Balance based on impact

Quick Reference Summary

Throughput: Process more requests per second. Achieved through batching, connection pooling, parallel processing.

Latency: Lower response time per request. Achieved through real-time processing, optimization, caching.

Key Trade-off: Batching increases throughput but increases latency. Real-time processing reduces latency but may reduce throughput.

Best Practice: Use hybrid approach - real-time for critical operations, batching for non-critical operations.

Remember: Most systems need both. The key is finding the right balance for your use case.

