Netflix System Design



Quick Reference

Scale: 200M+ subscribers, 15% of internet traffic, 100+ countries

Key Components: Video storage, CDN (Open Connect), encoding, recommendations, microservices

Challenges: Video delivery at scale, CDN optimization, personalized recommendations, global distribution


Clear Definition

Netflix is a global video streaming platform serving 200M+ subscribers worldwide. It requires efficient video storage, global content delivery through a CDN, multiple video encoding formats, and a sophisticated recommendation system that personalizes content for each user.

💡 Key Insight: Netflix uses its own CDN, Open Connect, with servers placed at ISPs; multiple video encoding formats for adaptive streaming; and machine learning algorithms for personalized recommendations.


System Requirements

Functional Requirements

  1. Video Streaming

    • Stream videos in multiple qualities (SD, HD, 4K)
    • Support adaptive bitrate streaming
    • Resume playback from any point
    • Support multiple devices (TV, mobile, tablet, web)
  2. Content Management

    • Upload and process videos
    • Store video metadata
    • Manage content library
    • Handle regional content restrictions
  3. User Management

    • User registration and authentication
    • Multiple user profiles per account
    • Watch history and preferences
    • Parental controls
  4. Recommendations

    • Personalized content recommendations
    • Continue watching
    • Trending content
    • Similar content suggestions

Non-Functional Requirements

  1. Scale

    • 200M+ subscribers globally
    • 15% of internet traffic during peak hours
    • Support concurrent streaming for millions
  2. Performance

    • Low latency video start (< 2 seconds)
    • Smooth playback (minimal rebuffering)
    • Fast search and recommendations
  3. Availability

    • 99.99% uptime
    • Global availability
    • Handle regional outages
  4. Reliability

    • No data loss
    • Consistent playback experience
    • Handle device failures

High-Level Architecture

┌────────────────────────────────────────────────────────────┐
│                    Client Applications                     │
│             (Web, Mobile, TV, Gaming Consoles)             │
└─────────────────────────────┬──────────────────────────────┘
                              │
                              │ HTTPS
                              ▼
┌────────────────────────────────────────────────────────────┐
│                API Gateway / Load Balancer                 │
└─────────────────────────────┬──────────────────────────────┘
                              │
             ┌────────────────┼────────────────┐
             │                │                │
             ▼                ▼                ▼
     ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
     │     User     │ │   Content    │ │   Playback   │
     │   Service    │ │   Service    │ │   Service    │
     └───────┬──────┘ └───────┬──────┘ └───────┬──────┘
             │                │                │
             ▼                ▼                ▼
┌────────────────────────────────────────────────────────────┐
│                   Recommendation Service                   │
│                   (ML-based, Real-time)                    │
└─────────────────────────────┬──────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│                      Open Connect CDN                      │
│  (Servers at ISPs, Edge Locations, Regional Data Centers)  │
└─────────────────────────────┬──────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│                  Video Storage (S3-like)                   │
│             (Encoded Videos, Multiple Formats)             │
└────────────────────────────────────────────────────────────┘

Core Components

1. Client Applications

Responsibilities:

  • Video playback
  • User interface
  • Adaptive bitrate selection
  • Offline downloads

Technologies:

  • Web: HTML5, JavaScript
  • Mobile: Native apps (iOS, Android)
  • TV: Smart TV apps, set-top boxes
  • Gaming: Console apps

Key Features:

  • Adaptive Streaming: Automatically adjusts quality based on bandwidth
  • Offline Downloads: Download content for offline viewing
  • Multiple Profiles: Support multiple user profiles per account
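
Adaptive bitrate selection can be sketched as a throughput-based rule: estimate available bandwidth from recent segment downloads, then pick the highest rendition that fits under a safety margin. This is a minimal sketch, assuming a hypothetical four-rung ladder (LADDER), a 0.8 safety factor, and harmonic-mean throughput estimation; production players typically also weigh buffer occupancy, and nothing here describes Netflix's actual player logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_kbps: int


# Hypothetical bitrate ladder, sorted by ascending bitrate.
# Real ladders are tuned per title and per device.
LADDER = [
    Rendition("480p", 1_000),
    Rendition("720p", 3_000),
    Rendition("1080p", 5_000),
    Rendition("4K", 15_000),
]


def select_rendition(throughput_samples_kbps, ladder=LADDER, safety=0.8):
    """Pick the highest rendition whose bitrate fits within `safety` times
    the estimated throughput; fall back to the lowest rung if none fits."""
    samples = [s for s in throughput_samples_kbps if s > 0]
    if not samples:
        return ladder[0]  # no measurements yet: start at the lowest rung
    # The harmonic mean is pulled down by slow samples, so the estimate
    # reacts quickly to bandwidth dips and is slow to over-promise.
    estimate = len(samples) / sum(1.0 / s for s in samples)
    budget = estimate * safety
    best = ladder[0]
    for rendition in ladder:
        if rendition.bitrate_kbps <= budget:
            best = rendition
    return best


print(select_rendition([8_000, 7_000, 9_000]).name)  # 1080p
print(select_rendition([2_000, 500]).name)           # 480p
```

The harmonic mean is used instead of the arithmetic mean because one slow download should lower the estimate sharply: a player that over-estimates bandwidth rebuffers, while one that under-estimates only gives up some quality.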

2. API Gateway / Load Balancer

Responsibilities:

  • Route requests to appropriate services
  • Authentication and authorization
  • Rate limiting
  • Request/response transformation

Technologies:

  • Custom gateway (Netflix built and open-sourced Zuul) or a managed service such as AWS API Gateway
  • Load balancers (AWS ELB, NGINX)

Key Features:

  • Geographic Routing: Route to nearest data center
  • Authentication: Verify user credentials
  • Rate Limiting: Prevent abuse
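
The text does not say which rate-limiting algorithm the gateway uses. A token bucket is one common choice because it permits short bursts while capping the sustained rate. This is a minimal in-process sketch; `rate`, `capacity`, and keying buckets by client ID are illustrative assumptions.

```python
import time


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate`
    tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so tests can control time
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerClientLimiter:
    """One bucket per client key (e.g. a hypothetical account or device ID)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, client_id):
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client_id] = bucket
        return bucket.allow()
```

Because each gateway instance keeps its own buckets here, a fleet of N instances would admit up to N times the configured rate; enforcing a global limit would need shared state or sticky routing.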

3. Microservices Architecture

Netflix uses a microservices architecture with hundreds of services:

User Service:

  • User registration and authentication
  • Profile management
  • Subscription management
  • Payment processing

Content Service:

  • Content metadata
  • Content library management
  • Search functionality
  • Content categorization

Playback Service:

  • Video URL generation
  • Playback session management
  • Resume functionality
  • Watch history tracking
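
For video URL generation, one common pattern is for the playback service to return short-lived signed URLs that CDN servers can verify without calling back. This sketch signs the path and an expiry timestamp with HMAC-SHA256; the shared secret, the `exp`/`sig` parameter names, and the host are hypothetical, and the source does not describe Netflix's actual token scheme.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical secret shared by the playback service and the CDN servers.
SECRET = b"example-shared-secret"


def _signature(path, expires_at, secret):
    message = f"{path}:{expires_at}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(host, path, expires_at, secret=SECRET):
    """Playback service side: build a URL that is valid until `expires_at`."""
    query = urlencode({"exp": expires_at, "sig": _signature(path, expires_at, secret)})
    return f"https://{host}{path}?{query}"


def verify_url(url, now, secret=SECRET):
    """CDN side: accept only unexpired URLs whose signature matches the path."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["exp"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    # compare_digest avoids leaking how many characters matched via timing.
    return hmac.compare_digest(sig, _signature(parsed.path, expires_at, secret))


url = sign_url("oca.example.net", "/title/123/1080p/seg1.mp4", expires_at=1_700_000_600)
print(verify_url(url, now=1_700_000_000))  # True
```

Short expiries limit how long a leaked URL stays useful, at the cost of the client occasionally requesting fresh URLs during long sessions.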

Recommendation Service:

  • Personalized recommendations
  • Content ranking
  • Similar content suggestions
  • Trending content

Analytics Service:

  • User behavior tracking
  • Content performance metrics
  • A/B testing
  • Business intelligence

4. Open Connect CDN

What is Open Connect?

  • Netflix's own content delivery network
  • Servers placed at ISPs and internet exchanges
  • Reduces internet backbone traffic
  • Improves streaming quality

Architecture:

┌────────────────────────────────────────┐
│  Netflix Origin Servers                │
│  (Primary video storage)               │
└──────────────┬─────────────────────────┘
               │
               │ Replication
               ▼
┌────────────────────────────────────────┐
│  Open Connect Appliances (OCAs)        │
│  - At ISPs                             │
│  - At Internet Exchanges               │
│  - Regional Data Centers               │
└──────────────┬─────────────────────────┘
               │
               │ Local Delivery
               ▼
┌────────────────────────────────────────┐
│  End Users                             │
│  (Streaming from nearest OCA)          │
└────────────────────────────────────────┘

Benefits:

  • Low Latency: Content is served from inside the ISP's network instead of crossing the internet backbone
  • High Bandwidth: Direct connection to ISP network
  • Cost Effective: Reduces bandwidth costs
  • Better Quality: Consistent streaming experience

How it Works:

  1. Netflix pre-populates OCAs with content predicted to be popular, typically during off-peak hours
  2. Users stream from the nearest OCA
  3. If the content is not on that OCA, it is fetched from the origin
  4. Fetched content is cached for future requests
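
These steps describe a cache that is warmed in advance and filled from the origin on a miss. A minimal sketch, assuming LRU eviction and a caller-supplied `origin_fetch` function (both hypothetical; the document does not say how OCAs evict content):

```python
from collections import OrderedDict


class EdgeCache:
    """Stand-in for an OCA: pre-filled with popular titles, falling back
    to an origin fetch on a miss, evicting least recently used items."""

    def __init__(self, capacity, origin_fetch):
        self.capacity = capacity
        self.origin_fetch = origin_fetch
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def prefill(self, items):
        # Step 1: push predicted-popular content before anyone asks for it.
        for key, value in items:
            self._put(key, value)

    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.store[key]
        self.misses += 1
        value = self.origin_fetch(key)   # step 3: miss goes to the origin
        self._put(key, value)            # step 4: keep it for next time
        return value

    def _put(self, key, value):
        self.store[key] = value
        self.store.move_to_end(key)
        while len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used


origin_calls = []


def fetch_from_origin(key):
    origin_calls.append(key)  # stand-in for a network fetch
    return f"bytes-of-{key}"


oca = EdgeCache(capacity=2, origin_fetch=fetch_from_origin)
oca.prefill([("popular-title", "bytes-of-popular-title")])
oca.get("popular-title")  # hit: served without touching the origin
oca.get("niche-title")    # miss: fetched from origin, then cached
print(oca.hits, oca.misses, origin_calls)  # 1 1 ['niche-title']
```

Pre-filling matters because a cold cache pays an origin round trip for every first request; the hit and miss counters are the raw inputs to a CDN hit-rate metric.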

5. Video Encoding and Storage

Encoding Pipeline:

      Raw Video
          │
          ▼
┌───────────────────┐
│  Video Ingestion  │
│  (Upload)         │
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│  Video Encoding   │
│  - Multiple       │
│    resolutions    │
│  - Multiple       │
│    bitrates       │
│  - Multiple       │
│    codecs         │
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│  Video Storage    │
│  (S3-like)        │
└───────────────────┘

Encoding Formats:

  • Resolutions: 480p, 720p, 1080p, 4K
  • Bitrates: Multiple bitrates per resolution
  • Codecs: H.264, VP9, AV1 (for newer content)
  • Adaptive Streaming: HLS (HTTP Live Streaming) or DASH
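
The storage cost of encoding many renditions is simple arithmetic: size = bitrate × duration, summed over the ladder and multiplied by the number of stored copies. The bitrates below are illustrative assumptions, not Netflix's actual ladder, and audio and subtitle tracks are ignored.

```python
# Hypothetical ladder: (resolution, codec) -> average bitrate in Mbps.
# Values are illustrative only, not Netflix's encoding ladder.
LADDER_MBPS = {
    ("480p", "H.264"): 1.0,
    ("720p", "H.264"): 3.0,
    ("1080p", "H.264"): 5.0,
    ("1080p", "AV1"): 3.0,
    ("4K", "AV1"): 12.0,
}


def rendition_size_gb(bitrate_mbps, duration_s):
    """Mbps * seconds = megabits; /8 -> megabytes; /1000 -> gigabytes."""
    return bitrate_mbps * duration_s / 8 / 1000


def title_storage_gb(duration_s, ladder=LADDER_MBPS, copies=1):
    """Total storage for one title across all renditions and stored copies."""
    per_copy = sum(rendition_size_gb(b, duration_s) for b in ladder.values())
    return per_copy * copies


two_hours = 2 * 60 * 60
print(round(title_storage_gb(two_hours), 2))            # 21.6
print(round(title_storage_gb(two_hours, copies=3), 2))  # 64.8
```

Under these assumptions a two-hour title needs about 21.6 GB per copy, and every added codec or rung adds to that total, which is the "more storage required" trade-off in concrete terms.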

Storage:

  • Object Storage: S3-like storage (AWS S3, Google Cloud Storage)
  • Replication: Multiple copies for redundancy
  • Geographic Distribution: Content stored in multiple regions

6. Recommendation System

How it Works:

  1. Data Collection: Track user behavior (watches, ratings, searches)
  2. Feature Engineering: Extract features from content and users
  3. Model Training: Train ML models on historical data
  4. Real-time Inference: Generate recommendations in real-time
  5. A/B Testing: Continuously test and improve

Algorithms:

  • Collaborative Filtering: Find similar users/content
  • Content-Based: Recommend based on content features
  • Deep Learning: Neural networks for complex patterns
  • Hybrid: Combine multiple approaches
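
Collaborative filtering can be illustrated with item-to-item similarity: two titles are similar if the same profiles watched both. This toy sketch uses cosine similarity over co-viewing sets; `watches` and `recommend` are hypothetical names, and at Netflix's scale this pairwise loop would give way to approaches such as matrix factorization or learned embeddings.

```python
import math
from collections import defaultdict


def item_similarities(watches):
    """watches maps profile -> set of titles watched.
    Returns {(title_a, title_b): cosine similarity of their viewer sets}."""
    viewers = defaultdict(set)
    for profile, titles in watches.items():
        for title in titles:
            viewers[title].add(profile)
    sims = {}
    titles = sorted(viewers)
    for i, a in enumerate(titles):
        for b in titles[i + 1:]:
            overlap = len(viewers[a] & viewers[b])
            if overlap:
                score = overlap / math.sqrt(len(viewers[a]) * len(viewers[b]))
                sims[(a, b)] = sims[(b, a)] = score
    return sims


def recommend(profile, watches, k=3):
    """Score unseen titles by summed similarity to titles the profile watched."""
    sims = item_similarities(watches)
    seen = watches[profile]
    scores = defaultdict(float)
    for (a, b), score in sims.items():
        if a in seen and b not in seen:
            scores[b] += score
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


watches = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"B", "C"},
    "u4": {"D"},
}
print(recommend("u1", watches))  # ['C']
```

Profile u4 gets no recommendations because nobody else watched D; this cold-start gap is one reason hybrid systems blend in content-based features.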

Key Features:

  • Personalization: Recommendations per user profile
  • Real-time Updates: Update based on recent activity
  • Diversity: Show variety in recommendations
  • Exploration: Balance popular vs niche content
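
Balancing popular and niche content is an exploration/exploitation trade-off. One simple, purely illustrative policy is epsilon-greedy: most slots in a row take the next best-ranked title, and a small fraction take a random title from an exploration pool. The function name, `epsilon` value, and pool are assumptions.

```python
import random


def build_row(ranked, explore_pool, slots, epsilon=0.2, rng=None):
    """Fill a row of up to `slots` titles. Each slot explores (random title
    from `explore_pool`) with probability `epsilon`, otherwise it exploits
    (next-best title from `ranked`). No title appears twice."""
    rng = rng or random.Random()
    row, used = [], set()
    best = iter(ranked)
    while len(row) < slots:
        options = [t for t in explore_pool if t not in used]
        if options and rng.random() < epsilon:
            pick = rng.choice(options)  # explore
        else:
            pick = next((t for t in best if t not in used), None)  # exploit
            if pick is None:  # ranked list exhausted
                if not options:
                    break  # nothing left to show
                pick = rng.choice(options)
        row.append(pick)
        used.add(pick)
    return row


print(build_row(["a", "b", "c", "d"], ["x", "y"], slots=3, epsilon=0.0))  # ['a', 'b', 'c']
```

Exploration slots also produce feedback on titles the model is unsure about, which later model training and A/B tests depend on; `epsilon` sets how much short-term relevance is traded for that signal.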

Data Flow

Video Playback Flow

1. User opens Netflix app
   │
2. App requests content list from API
   │
3. API returns personalized recommendations
   │
4. User selects video
   │
5. App requests playback URL
   │
6. Playback service generates URL (points to CDN)
   │
7. App starts streaming from Open Connect CDN
   │
8. CDN serves video chunks
   │
9. App adapts quality based on bandwidth
   │
10. Watch history updated in real-time
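
Step 10, together with the "resume playback" and "continue watching" requirements, rests on a small per-profile progress store. An in-memory sketch, assuming players send periodic position heartbeats and that anything past 95% watched counts as finished (both hypothetical); a real service would persist this in a distributed database.

```python
import time
from dataclasses import dataclass


@dataclass
class Progress:
    position_s: float
    duration_s: float
    updated_at: float


class WatchHistory:
    """In-memory stand-in for a per-profile playback progress store."""

    COMPLETE_FRACTION = 0.95  # hypothetical "finished watching" threshold

    def __init__(self):
        self._progress = {}  # (profile_id, title_id) -> Progress

    def heartbeat(self, profile_id, title_id, position_s, duration_s, now=None):
        """Record the latest position reported by the player."""
        updated_at = time.time() if now is None else now
        self._progress[(profile_id, title_id)] = Progress(position_s, duration_s, updated_at)

    def _unfinished(self, p):
        return p.position_s < p.duration_s * self.COMPLETE_FRACTION

    def resume_position(self, profile_id, title_id):
        """Where playback should start: 0 if never started or already finished."""
        p = self._progress.get((profile_id, title_id))
        if p is None or not self._unfinished(p):
            return 0.0
        return p.position_s

    def continue_watching(self, profile_id):
        """Unfinished titles for a profile, most recently watched first."""
        rows = [
            (p.updated_at, title_id)
            for (pid, title_id), p in self._progress.items()
            if pid == profile_id and self._unfinished(p)
        ]
        return [title_id for _, title_id in sorted(rows, reverse=True)]
```

Keying by profile rather than account is what keeps one household member's viewing from reordering another's "Continue watching" row.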

Recommendation Flow

1. User watches content
   │
2. Behavior tracked (watch time, completion, rating)
   │
3. Data sent to analytics service
   │
4. Recommendation service processes data
   │
5. ML models generate new recommendations
   │
6. Recommendations cached for user
   │
7. Next time user opens app, sees updated recommendations

Scaling Strategies

1. Horizontal Scaling

Microservices:

  • Each service scales independently
  • Auto-scaling based on load
  • Stateless services for easy scaling

CDN:

  • Add more Open Connect appliances
  • Distribute across more ISPs
  • Scale based on regional demand

2. Caching

Content Caching:

  • Cache popular content on CDN
  • Pre-populate OCAs with trending content
  • Cache metadata and recommendations

Application Caching:

  • Cache user sessions
  • Cache recommendations
  • Cache content metadata
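
Caching sessions, recommendations, and metadata commonly follows the cache-aside pattern: read from the cache, and on a miss or expiry compute the value and store it with a time-to-live. A minimal sketch; the class name, TTL, and `compute` callback are hypothetical.

```python
import time


class TTLCache:
    """Cache-aside store whose entries expire `ttl_s` seconds after writing."""

    def __init__(self, ttl_s, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock  # injectable so tests can control time
        self._data = {}     # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._data.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit
        value = compute(key)  # miss or expired: recompute and store
        self._data[key] = (now + self.ttl_s, value)
        return value

    def invalidate(self, key):
        """Drop an entry early, e.g. when new recommendations are published."""
        self._data.pop(key, None)
```

Calling `invalidate` when the recommendation pipeline publishes new results for a profile means users need not wait out the TTL to see fresh rows; the TTL remains as a safety net.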

3. Database Sharding

User Data:

  • Shard by user ID
  • Distribute across multiple databases
  • Handle cross-shard queries
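
Sharding by user ID needs a deterministic mapping from ID to shard that every application server agrees on. A minimal sketch using a stable SHA-256 hash modulo the shard count, plus the scatter step of a cross-shard query; the function names are hypothetical.

```python
import hashlib


def shard_for(user_id: str, num_shards: int) -> int:
    """Map a user ID to a shard with a stable hash.

    Python's built-in hash() is salted per process for str, so it would
    route the same user differently on different servers."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def group_by_shard(user_ids, num_shards):
    """Scatter step of a cross-shard query: bucket IDs by owning shard so
    each shard receives one batched request."""
    groups = {}
    for uid in user_ids:
        groups.setdefault(shard_for(uid, num_shards), []).append(uid)
    return groups
```

With plain modulo, changing `num_shards` remaps most users. Consistent hashing, or a fixed large number of logical shards mapped onto physical databases, limits how much data moves when capacity is added.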

Content Data:

  • Shard by content ID or region
  • Use read replicas for read scalability

4. Geographic Distribution

Data Centers:

  • Multiple regions (US, EU, Asia, etc.)
  • Route users to nearest region
  • Replicate critical data across regions

CDN:

  • OCAs in every major region
  • Route to nearest OCA
  • Handle regional outages
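
Routing to the nearest OCA while surviving outages can be sketched as filtering out unhealthy servers and servers that lack the title, then ranking the rest by measured round-trip time. A simplified sketch; the `Server` fields and the ranking rule are assumptions, not a description of Netflix's steering service.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    rtt_ms: float                              # measured round-trip time from the client
    healthy: bool = True                       # e.g. from health checks
    titles: set = field(default_factory=set)   # content currently stored


def rank_servers(servers, title_id, max_results=3):
    """Healthy servers holding `title_id`, nearest (lowest RTT) first."""
    eligible = [s for s in servers if s.healthy and title_id in s.titles]
    eligible.sort(key=lambda s: s.rtt_ms)
    return [s.name for s in eligible[:max_results]]


servers = [
    Server("isp-oca-1", 5, True, {"t1"}),
    Server("ix-oca-2", 15, True, {"t1", "t2"}),
    Server("isp-oca-3", 90, True, {"t1"}),
    Server("isp-oca-4", 3, False, {"t1"}),  # nearest, but failing health checks
]
print(rank_servers(servers, "t1"))  # ['isp-oca-1', 'ix-oca-2', 'isp-oca-3']
```

Returning a short ranked list rather than a single server lets the client fall back to the next candidate immediately if its first choice fails mid-stream.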

Key Design Decisions

1. Open Connect CDN

Decision: Build own CDN instead of using third-party

Rationale:

  • Better control over quality
  • Cost effective at scale
  • Direct ISP relationships
  • Better user experience

Trade-offs:

  • ✅ Better quality
  • ✅ Lower costs at scale
  • ✅ Better control
  • ❌ Higher initial investment
  • ❌ More operational complexity

2. Microservices Architecture

Decision: Use microservices instead of monolith

Rationale:

  • Independent scaling
  • Technology diversity
  • Team autonomy
  • Fault isolation

Trade-offs:

  • ✅ Better scalability
  • ✅ Technology flexibility
  • ✅ Team autonomy
  • ❌ Higher complexity
  • ❌ Network overhead
  • ❌ Distributed system challenges

3. Adaptive Streaming

Decision: Use adaptive bitrate streaming

Rationale:

  • Handle varying bandwidth
  • Better user experience
  • Reduce buffering
  • Optimize bandwidth usage

Trade-offs:

  • ✅ Better user experience
  • ✅ Handles network variations
  • ✅ Optimizes bandwidth
  • ❌ More encoding formats needed
  • ❌ More storage required

4. Multiple Encoding Formats

Decision: Encode videos in multiple formats/resolutions

Rationale:

  • Support different devices
  • Handle varying bandwidth
  • Future-proof (new codecs)
  • Optimize storage and delivery

Trade-offs:

  • ✅ Better device support
  • ✅ Bandwidth optimization
  • ✅ Future compatibility
  • ❌ Higher encoding costs
  • ❌ More storage required

Challenges and Solutions

Challenge 1: Global Scale

Problem: Serve 200M+ users across 100+ countries

Solution:

  • Open Connect CDN with global distribution
  • Regional data centers
  • Geographic routing
  • Content replication

Challenge 2: Video Delivery

Problem: Deliver high-quality video with low latency

Solution:

  • Open Connect CDN (servers at ISPs)
  • Adaptive bitrate streaming
  • Multiple encoding formats
  • Pre-population of popular content

Challenge 3: Personalization

Problem: Provide personalized recommendations to 200M+ users

Solution:

  • ML-based recommendation system
  • Real-time behavior tracking
  • Distributed model serving
  • A/B testing for continuous improvement

Challenge 4: Cost Optimization

Problem: High infrastructure costs at scale

Solution:

  • Open Connect CDN (reduces bandwidth costs)
  • Efficient encoding (reduce storage)
  • Caching (reduce compute)
  • Auto-scaling (pay for what you use)

Monitoring and Observability

Key Metrics

Performance Metrics:

  • Video start time
  • Buffering rate
  • Playback quality
  • Error rate

Business Metrics:

  • Subscriber count
  • Watch time
  • Content completion rate
  • Churn rate

Infrastructure Metrics:

  • CDN hit rate
  • Server utilization
  • Network bandwidth
  • Storage usage
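
These metrics are listed without definitions. Common definitions are sketched below; the exact formulas (time-based rebuffer ratio, request-based hit rate, nearest-rank percentile) are assumptions, not Netflix's internal definitions.

```python
import math


def rebuffer_ratio(stall_s, play_s):
    """Share of session time spent stalled waiting for data."""
    total = stall_s + play_s
    return stall_s / total if total else 0.0


def cdn_hit_rate(hits, misses):
    """Share of requests served from cache without going upstream."""
    total = hits + misses
    return hits / total if total else 0.0


def percentile(samples, pct):
    """Nearest-rank percentile, e.g. pct=95 for p95 video start time."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("no samples")
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


print(percentile(range(1, 101), 95))  # 95
print(rebuffer_ratio(3, 297))         # 0.01
```

Reporting video start time as p95 rather than the mean keeps the slowest starts visible; an average can look healthy while a minority of sessions wait far longer.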

Alerting

  • Alert on high error rates
  • Alert on CDN issues
  • Alert on recommendation service failures
  • Alert on high latency

Best Practices

1. CDN Optimization

  • Pre-populate popular content
  • Cache at edge locations
  • Monitor CDN performance
  • Optimize cache hit rates

2. Encoding Strategy

  • Use efficient codecs (AV1, VP9)
  • Encode in multiple formats
  • Optimize for quality vs size
  • Regular re-encoding for new codecs

3. Recommendation System

  • Continuously improve models
  • A/B test new algorithms
  • Monitor recommendation quality
  • Balance exploration vs exploitation

4. Monitoring

  • Monitor end-to-end user experience
  • Track key business metrics
  • Alert on critical issues
  • Use distributed tracing

Quick Reference Summary

Netflix: Global video streaming platform serving 200M+ subscribers.

Key Components:

  • Open Connect CDN (servers at ISPs)
  • Microservices architecture
  • Video encoding pipeline
  • ML-based recommendation system

Key Design Decisions:

  • Own CDN (Open Connect) for better control
  • Microservices for scalability
  • Adaptive streaming for varying bandwidth
  • Multiple encoding formats for device support

Scaling Strategies:

  • Horizontal scaling of microservices
  • Global CDN distribution
  • Database sharding
  • Geographic distribution

Remember: Netflix's success comes from combining excellent content delivery (Open Connect CDN) with personalized recommendations (ML) and a scalable microservices architecture.

