Throughput vs Latency

Quick Reference

Throughput: Process more requests per second

Latency: Lower response time per request

Trade-off: Batching increases throughput but increases latency


Clear Definition

Throughput vs Latency is a fundamental performance trade-off in system design. Batching and optimization techniques increase throughput (requests per second) but may increase latency (time per request). Real-time processing reduces latency but may reduce overall throughput.

πŸ’‘ Key Insight: Batch for high throughput, process immediately for low latency. Most systems use a hybrid approach - batch non-critical operations, process critical ones in real-time.


Core Concepts

Throughput (Requests Per Second)

Definition: The number of requests a system can process in a given time period.

Metrics:

  • Requests per second (RPS)
  • Transactions per second (TPS)
  • Operations per second (OPS)

Factors Affecting Throughput:

  • Batching operations
  • Connection pooling
  • Parallel processing
  • Resource utilization
  • Network bandwidth

Latency (Response Time)

Definition: The time taken to process a single request from start to finish.

Metrics:

  • P50 (median latency)
  • P95 (95th percentile)
  • P99 (99th percentile)
  • Average latency

Factors Affecting Latency:

  • Network round-trip time
  • Database query time
  • Processing time
  • Serialization overhead
  • Queue wait time

The Trade-off Explained

Why They Conflict

High Throughput Approach:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Request 1 ──┐                       β”‚
β”‚  Request 2 ───                       β”‚
β”‚  Request 3 ──┼──> Batch Process ──>  β”‚  High throughput,
β”‚  Request 4 ───     (wait for batch)  β”‚  higher latency
β”‚  Request 5 β”€β”€β”˜                       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Low Latency Approach:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Request 1 ──> Process ──> Response  β”‚  Low latency,
β”‚  Request 2 ──> Process ──> Response  β”‚  lower throughput
β”‚  Request 3 ──> Process ──> Response  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Batching Strategy

How Batching Works:

  1. Collect multiple requests
  2. Wait for batch size or timeout
  3. Process batch together
  4. Return responses
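The four steps above can be sketched as a size-or-timeout batcher. This is a minimal illustration, not a production implementation; `source` and `process_batch` are hypothetical stand-ins for your request stream and handler:

```python
import time

def run_batcher(source, process_batch, max_size=100, max_wait=0.5):
    """Collect items from `source`, flushing when the batch is full
    or the oldest buffered item has waited `max_wait` seconds."""
    batch, first_at = [], None
    for item in source:
        batch.append(item)                 # step 1: collect requests
        if first_at is None:
            first_at = time.monotonic()
        # step 2: wait for batch size or timeout
        if len(batch) >= max_size or time.monotonic() - first_at >= max_wait:
            process_batch(batch)           # step 3: process batch together
            batch, first_at = [], None
    if batch:
        process_batch(batch)               # flush the final partial batch
```

Note that this sketch only checks the timeout when a new item arrives; a real implementation would also flush from a timer or background thread so a lone item never waits indefinitely.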

Benefits:

  • βœ… Higher throughput (amortize overhead)
  • βœ… Better resource utilization
  • βœ… Reduced per-request overhead
  • βœ… Efficient database operations

Drawbacks:

  • ❌ Higher latency (wait for batch)
  • ❌ Memory overhead (buffering)
  • ❌ Complexity (batch management)

Example: Database Writes

# Low throughput, low latency: one network round-trip per insert
for item in items:
    db.insert(item)

# High throughput, higher latency: one round-trip per 100 items,
# but each item may wait until the batch fills
batch = []
for item in items:
    batch.append(item)
    if len(batch) >= 100:
        db.bulk_insert(batch)
        batch = []
if batch:
    db.bulk_insert(batch)  # flush the final partial batch

Real-time Processing Strategy

How Real-time Works:

  1. Process each request immediately
  2. No batching or waiting
  3. Optimize for individual request speed

Benefits:

  • βœ… Lower latency (no waiting)
  • βœ… Better user experience
  • βœ… Immediate feedback
  • βœ… Simpler logic

Drawbacks:

  • ❌ Lower throughput (more overhead)
  • ❌ Higher resource usage
  • ❌ More database connections
  • ❌ Less efficient resource utilization

Decision Framework

Step 1: Identify Requirements

High Throughput Needed:

  • Analytics processing
  • Log aggregation
  • Bulk data processing
  • Background jobs
  • ETL pipelines

Low Latency Needed:

  • User-facing APIs
  • Real-time interactions
  • Gaming applications
  • Trading systems
  • Chat applications

Step 2: Analyze Request Patterns

Questions to Ask:

  • What's the acceptable latency?
  • What's the required throughput?
  • Are requests independent or can they be batched?
  • Is user experience or efficiency more important?

Step 3: Choose Strategy

Pure Batching:

  • Use when: High volume, latency not critical
  • Example: Log processing, analytics

Pure Real-time:

  • Use when: Low latency critical, volume manageable
  • Example: User API, real-time chat

Hybrid Approach (Recommended):

  • Critical requests: Real-time
  • Non-critical requests: Batch
  • Example: E-commerce (orders real-time, analytics batched)

Real-World Examples

Example 1: Payment Processing

Throughput Priority:

  • Batch process transactions at end of day
  • Higher throughput, acceptable latency
  • Use for: Settlement, reconciliation

Latency Priority:

  • Process payments immediately
  • Lower latency, acceptable throughput
  • Use for: Real-time payments, card transactions

Example 2: Analytics Platform

Throughput Priority:

  • Batch process events every 5 minutes
  • Process millions of events efficiently
  • Use for: Business intelligence, reporting

Latency Priority:

  • Process events in real-time
  • Immediate dashboards and alerts
  • Use for: Real-time monitoring, fraud detection

Example 3: Social Media Feed

Throughput Priority:

  • Batch generate feeds periodically
  • Efficient resource usage
  • Use for: Background feed generation

Latency Priority:

  • Generate feeds on-demand
  • Immediate user experience
  • Use for: Real-time feed updates

Optimization Techniques

For High Throughput

  1. Batching

    • Batch database writes
    • Batch API calls
    • Batch file operations
  2. Connection Pooling

    • Reuse connections
    • Reduce connection overhead
    • Efficient resource usage
  3. Asynchronous Processing

    • Non-blocking I/O
    • Event-driven architecture
    • Parallel processing
  4. Caching

    • Cache frequently accessed data
    • Reduce database load
    • Increase effective throughput

For Low Latency

  1. Pre-computation

    • Pre-calculate results
    • Cache common queries
    • Warm-up caches
  2. Optimize Critical Path

    • Minimize database queries
    • Reduce network hops
    • Optimize serialization
  3. CDN and Edge Computing

    • Serve from edge locations
    • Reduce network latency
    • Geographic distribution
  4. Database Optimization

    • Index optimization
    • Query optimization
    • Read replicas

Hybrid Approaches

Pattern 1: Tiered Processing

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Request Classification             β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Critical (Real-time):              β”‚
β”‚  - User actions                     β”‚
β”‚  - Payments                         β”‚
β”‚  - Authentication                   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Non-Critical (Batch):              β”‚
β”‚  - Analytics                        β”‚
β”‚  - Logs                             β”‚
β”‚  - Recommendations                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
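The classification above reduces to a small routing function. A sketch under illustrative assumptions: the `CRITICAL` set, the request shape, and both handlers are hypothetical:

```python
CRITICAL = {"payment", "auth", "user_action"}

def handle(request, realtime_handler, batch_queue):
    """Route critical requests to immediate processing and
    queue everything else for the next batch run."""
    if request["type"] in CRITICAL:
        return realtime_handler(request)   # low-latency path
    batch_queue.append(request)            # picked up by a periodic batch job
```

In practice the batch queue would be a durable broker (e.g. a message queue) rather than an in-memory list, so queued work survives a crash.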

Pattern 2: Adaptive Batching

  • Start with small batches
  • Increase batch size if throughput needed
  • Decrease batch size if latency critical
  • Dynamic adjustment based on load

Pattern 3: Write-Through + Write-Behind

  • Write critical data immediately (low latency)
  • Batch non-critical writes (high throughput)
  • Combine both strategies

Monitoring and Metrics

Key Metrics to Track

Throughput Metrics:

  • Requests per second
  • Transactions per second
  • Batch processing rate
  • Queue depth

Latency Metrics:

  • P50, P95, P99 latencies
  • End-to-end latency
  • Processing time
  • Queue wait time

Trade-off Metrics:

  • Throughput vs latency graph
  • Batch size vs latency
  • Resource utilization
  • Error rates

Alerting

  • Alert if latency exceeds SLA
  • Alert if throughput drops
  • Alert on batch processing delays
  • Monitor queue growth

Best Practices

1. Understand Your SLA

  • Define acceptable latency
  • Define required throughput
  • Design to meet both

2. Use Hybrid Approaches

  • Real-time for critical paths
  • Batching for non-critical
  • Best of both worlds

3. Monitor Both Metrics

  • Don't optimize one at expense of other
  • Track trade-off curves
  • Adjust based on metrics

4. Optimize Incrementally

  • Start with simple approach
  • Measure baseline
  • Optimize based on data
  • Avoid premature optimization

5. Consider User Experience

  • User-facing: Prioritize latency
  • Background: Prioritize throughput
  • Balance based on impact

Quick Reference Summary

Throughput: Process more requests per second. Achieved through batching, connection pooling, parallel processing.

Latency: Lower response time per request. Achieved through real-time processing, optimization, caching.

Key Trade-off: Batching increases throughput but increases latency. Real-time processing reduces latency but may reduce throughput.

Best Practice: Use hybrid approach - real-time for critical operations, batching for non-critical operations.

Remember: Most systems need both. The key is finding the right balance for your use case.

