Netflix System Design
Quick Reference
Scale: 200M+ subscribers, 15% of internet traffic, 100+ countries
Key Components: Video storage, CDN (Open Connect), encoding, recommendations, microservices
Challenges: Video delivery at scale, CDN optimization, personalized recommendations, global distribution
Clear Definition
Netflix is a global video streaming platform serving 200M+ subscribers worldwide. It requires efficient video storage, global content delivery through a CDN, multiple video encoding formats, and sophisticated recommendation systems to provide personalized content to users.
💡 Key Insight: Netflix uses its own Open Connect CDN (content delivery network) with servers placed at ISPs, multiple video encoding formats for adaptive streaming, and machine learning algorithms for personalized recommendations.
System Requirements
Functional Requirements
Video Streaming:
- Stream videos in multiple qualities (SD, HD, 4K)
- Support adaptive bitrate streaming
- Resume playback from any point
- Support multiple devices (TV, mobile, tablet, web)
Content Management:
- Upload and process videos
- Store video metadata
- Manage content library
- Handle regional content restrictions
User Management:
- User registration and authentication
- Multiple user profiles per account
- Watch history and preferences
- Parental controls
Recommendations:
- Personalized content recommendations
- Continue watching
- Trending content
- Similar content suggestions
Non-Functional Requirements
Scale:
- 200M+ subscribers globally
- 15% of internet traffic during peak hours
- Support concurrent streaming for millions
Performance:
- Low latency video start (< 2 seconds)
- Smooth playback (minimal rebuffering)
- Fast search and recommendations
Availability:
- 99.99% uptime
- Global availability
- Handle regional outages
Reliability:
- No data loss
- Consistent playback experience
- Handle device failures
High-Level Architecture
┌─────────────────────────────────────────────────────────────┐
│                     Client Applications                     │
│             (Web, Mobile, TV, Gaming Consoles)              │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               │ HTTPS
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                 API Gateway / Load Balancer                 │
└──────────────────────────────┬──────────────────────────────┘
                               │
            ┌──────────────────┼──────────────────┐
            │                  │                  │
            ▼                  ▼                  ▼
     ┌──────────────┐   ┌──────────────┐   ┌──────────────┐
     │     User     │   │   Content    │   │   Playback   │
     │   Service    │   │   Service    │   │   Service    │
     └──────┬───────┘   └──────┬───────┘   └──────┬───────┘
            │                  │                  │
            │                  │                  │
            ▼                  ▼                  ▼
┌─────────────────────────────────────────────────────────────┐
│                   Recommendation Service                    │
│                    (ML-based, Real-time)                    │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                      Open Connect CDN                       │
│  (Servers at ISPs, Edge Locations, Regional Data Centers)   │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                   Video Storage (S3-like)                   │
│             (Encoded Videos, Multiple Formats)              │
└─────────────────────────────────────────────────────────────┘
Core Components
1. Client Applications
Responsibilities:
- Video playback
- User interface
- Adaptive bitrate selection
- Offline downloads
Technologies:
- Web: HTML5, JavaScript
- Mobile: Native apps (iOS, Android)
- TV: Smart TV apps, set-top boxes
- Gaming: Console apps
Key Features:
- Adaptive Streaming: Automatically adjusts quality based on bandwidth
- Offline Downloads: Download content for offline viewing
- Multiple Profiles: Support multiple user profiles per account
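The adaptive streaming feature above can be illustrated with a minimal client-side sketch. The ladder values, the `Rendition` type, and the `safety_factor` parameter are assumptions made for illustration; production players typically also weigh buffer level and throughput history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    """One encoded version of a title (hypothetical ladder entry)."""
    name: str
    bitrate_kbps: int


# Illustrative ladder only; real bitrates vary per title and codec.
LADDER = [
    Rendition("480p", 1_500),
    Rendition("720p", 3_000),
    Rendition("1080p", 5_000),
    Rendition("4K", 16_000),
]


def select_rendition(measured_kbps: float, safety_factor: float = 0.8) -> Rendition:
    """Pick the highest rendition that fits within a safety margin of the
    measured throughput; fall back to the lowest rendition otherwise."""
    budget = measured_kbps * safety_factor
    candidates = [r for r in LADDER if r.bitrate_kbps <= budget]
    if not candidates:
        return min(LADDER, key=lambda r: r.bitrate_kbps)
    return max(candidates, key=lambda r: r.bitrate_kbps)
```

With a measured throughput of 4,000 kbps and the default margin, the budget is 3,200 kbps, so this sketch would choose the 720p rendition.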
2. API Gateway / Load Balancer
Responsibilities:
- Route requests to appropriate services
- Authentication and authorization
- Rate limiting
- Request/response transformation
Technologies:
- AWS API Gateway or custom solution
- Load balancers (AWS ELB, NGINX)
Key Features:
- Geographic Routing: Route to nearest data center
- Authentication: Verify user credentials
- Rate Limiting: Prevent abuse
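One common way to implement the rate limiting listed above is a token bucket. The sketch below is a single-process illustration; the class name, parameters, and injectable `clock` are hypothetical, and a real gateway would usually keep bucket state in a shared store.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` requests, refilled at `refill_per_sec`."""

    def __init__(self, capacity: int, refill_per_sec: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock  # injectable so the behavior can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        """Return True and consume a token if the request may proceed."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A gateway could keep one bucket per user or per API key and reject requests (for example with HTTP 429) when `allow()` returns False.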
3. Microservices Architecture
Netflix uses a microservices architecture with hundreds of services:
User Service:
- User registration and authentication
- Profile management
- Subscription management
- Payment processing
Content Service:
- Content metadata
- Content library management
- Search functionality
- Content categorization
Playback Service:
- Video URL generation
- Playback session management
- Resume functionality
- Watch history tracking
Recommendation Service:
- Personalized recommendations
- Content ranking
- Similar content suggestions
- Trending content
Analytics Service:
- User behavior tracking
- Content performance metrics
- A/B testing
- Business intelligence
4. Open Connect CDN
What is Open Connect?
- Netflix's own content delivery network
- Servers placed at ISPs and internet exchanges
- Reduces internet backbone traffic
- Improves streaming quality
Architecture:
┌─────────────────────────────────────────┐
│         Netflix Origin Servers          │
│         (Primary video storage)         │
└────────────────────┬────────────────────┘
                     │
                     │ Replication
                     ▼
┌─────────────────────────────────────────┐
│     Open Connect Appliances (OCAs)      │
│  - At ISPs                              │
│  - At Internet Exchanges                │
│  - Regional Data Centers                │
└────────────────────┬────────────────────┘
                     │
                     │ Local Delivery
                     ▼
┌─────────────────────────────────────────┐
│                End Users                │
│      (Streaming from nearest OCA)       │
└─────────────────────────────────────────┘
Benefits:
- Low Latency: Content served from ISP, not internet backbone
- High Bandwidth: Direct connection to ISP network
- Cost Effective: Reduces bandwidth costs
- Better Quality: Consistent streaming experience
How it Works:
- Netflix pre-populates popular content on OCAs
- Users stream from nearest OCA
- If the content is not on the OCA, it is fetched from origin
- Popular content cached for future requests
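The four steps above can be modeled as a bounded cache in front of an origin. This is a toy sketch of the flow as described here, not Netflix's actual fill logic; `EdgeCache`, the LRU eviction policy, and the `fetch_from_origin` callback are assumptions for illustration.

```python
from collections import OrderedDict
from typing import Callable, Iterable


class EdgeCache:
    """Toy model of an edge appliance: bounded LRU cache in front of an origin."""

    def __init__(self, capacity: int, fetch_from_origin: Callable[[str], bytes]):
        self.capacity = capacity
        self.fetch_from_origin = fetch_from_origin
        self.store: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def prepopulate(self, titles: Iterable[str]) -> None:
        """Push expected-popular titles to the edge ahead of demand."""
        for title in titles:
            self._put(title, self.fetch_from_origin(title))

    def get(self, title: str) -> bytes:
        """Serve from the edge when possible; otherwise fetch and cache."""
        if title in self.store:
            self.hits += 1
            self.store.move_to_end(title)  # mark as most recently used
            return self.store[title]
        self.misses += 1
        data = self.fetch_from_origin(title)
        self._put(title, data)
        return data

    def _put(self, title: str, data: bytes) -> None:
        self.store[title] = data
        self.store.move_to_end(title)
        while len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used
```

The `hits` and `misses` counters give the cache hit rate, which is the main signal for whether pre-population is choosing the right titles.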
5. Video Encoding and Storage
Encoding Pipeline:
     Raw Video
         │
         ▼
┌─────────────────┐
│ Video Ingestion │
│    (Upload)     │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Video Encoding  │
│ - Multiple      │
│   resolutions   │
│ - Multiple      │
│   bitrates      │
│ - Multiple      │
│   codecs        │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Video Storage  │
│    (S3-like)    │
└─────────────────┘
Encoding Formats:
- Resolutions: 480p, 720p, 1080p, 4K
- Bitrates: Multiple bitrates per resolution
- Codecs: H.264, VP9, AV1 (for newer content)
- Adaptive Streaming: HLS (HTTP Live Streaming) or DASH
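To show how multiple resolutions and bitrates are exposed to a player, here is a minimal sketch that writes an HLS master playlist. The `#EXTM3U` and `#EXT-X-STREAM-INF` tags with `BANDWIDTH` and `RESOLUTION` attributes are standard HLS; the rendition values, dictionary keys, and URIs are made up for illustration.

```python
def build_master_playlist(renditions):
    """Return HLS master playlist text listing one variant per rendition.

    Each rendition is a dict with hypothetical keys:
    bandwidth_bps, width, height, uri.
    """
    lines = ["#EXTM3U"]
    for r in renditions:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r['bandwidth_bps']},"
            f"RESOLUTION={r['width']}x{r['height']}"
        )
        # The URI line must follow its STREAM-INF tag directly.
        lines.append(r["uri"])
    return "\n".join(lines) + "\n"


example = build_master_playlist([
    {"bandwidth_bps": 1_500_000, "width": 854, "height": 480, "uri": "480p/index.m3u8"},
    {"bandwidth_bps": 3_000_000, "width": 1280, "height": 720, "uri": "720p/index.m3u8"},
    {"bandwidth_bps": 5_000_000, "width": 1920, "height": 1080, "uri": "1080p/index.m3u8"},
])
```

The player downloads this playlist once, then switches between the listed variants as its measured bandwidth changes.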
Storage:
- Object Storage: S3-like storage (AWS S3, Google Cloud Storage)
- Replication: Multiple copies for redundancy
- Geographic Distribution: Content stored in multiple regions
6. Recommendation System
How it Works:
- Data Collection: Track user behavior (watches, ratings, searches)
- Feature Engineering: Extract features from content and users
- Model Training: Train ML models on historical data
- Real-time Inference: Generate recommendations in real-time
- A/B Testing: Continuously test and improve
Algorithms:
- Collaborative Filtering: Find similar users/content
- Content-Based: Recommend based on content features
- Deep Learning: Neural networks for complex patterns
- Hybrid: Combine multiple approaches
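As a small illustration of collaborative filtering, the sketch below finds similar titles from user ratings using cosine similarity (an item-based variant). The data layout and function names are assumptions; production systems work on far larger, sparse matrices and typically use learned embeddings.

```python
from math import sqrt


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse rating vectors keyed by user ID."""
    shared = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_titles(ratings: dict, title: str, top_n: int = 3):
    """Rank other titles by how similarly users rated them.

    `ratings` maps title -> {user_id: rating}.
    """
    target = ratings[title]
    scores = [
        (other, cosine(target, vector))
        for other, vector in ratings.items()
        if other != title
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]
```

Titles rated by the same users in the same way score close to 1, while titles with no overlapping viewers score 0.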
Key Features:
- Personalization: Recommendations per user profile
- Real-time Updates: Update based on recent activity
- Diversity: Show variety in recommendations
- Exploration: Balance popular vs niche content
Data Flow
Video Playback Flow
1. User opens Netflix app
   ↓
2. App requests content list from API
   ↓
3. API returns personalized recommendations
   ↓
4. User selects video
   ↓
5. App requests playback URL
   ↓
6. Playback service generates URL (points to CDN)
   ↓
7. App starts streaming from Open Connect CDN
   ↓
8. CDN serves video chunks
   ↓
9. App adapts quality based on bandwidth
   ↓
10. Watch history updated in real-time
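Step 6 of the playback flow can be sketched as issuing a time-limited, signed URL that the CDN can verify without calling back to the playback service. Signing is an assumption added here for illustration and is not stated in these notes; the host name, path layout, and function names are hypothetical.

```python
import hashlib
import hmac
from urllib.parse import urlencode


def _signature(path: str, expires_at: int, secret: bytes) -> str:
    """HMAC-SHA256 over the path and expiry, hex-encoded."""
    message = f"{path}:{expires_at}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_playback_url(base_url: str, path: str, expires_at: int, secret: bytes) -> str:
    """Build a CDN URL carrying an expiry timestamp and a signature."""
    query = urlencode({"expires": expires_at, "sig": _signature(path, expires_at, secret)})
    return f"{base_url}{path}?{query}"


def verify_playback_url(path: str, expires_at: int, sig: str, secret: bytes, now: int) -> bool:
    """Edge-side check: reject expired or tampered requests."""
    if now > expires_at:
        return False
    expected = _signature(path, expires_at, secret)
    return hmac.compare_digest(expected, sig)
```

Because the secret is shared between the playback service and the edge, the edge can validate each request locally.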
Recommendation Flow
1. User watches content
   ↓
2. Behavior tracked (watch time, completion, rating)
   ↓
3. Data sent to analytics service
   ↓
4. Recommendation service processes data
   ↓
5. ML models generate new recommendations
   ↓
6. Recommendations cached for user
   ↓
7. Next time user opens app, sees updated recommendations
Scaling Strategies
1. Horizontal Scaling
Microservices:
- Each service scales independently
- Auto-scaling based on load
- Stateless services for easy scaling
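Auto-scaling based on load often follows a proportional rule: scale the replica count by the ratio of observed to target utilization. The sketch below assumes utilization is expressed in percent; the function name and the bounds are illustrative defaults, not values from any specific platform.

```python
from math import ceil


def desired_replicas(
    current_replicas: int,
    current_util_pct: float,
    target_util_pct: float,
    min_replicas: int = 1,
    max_replicas: int = 100,
) -> int:
    """Proportional scaling: replicas * (observed / target), clamped to bounds."""
    if target_util_pct <= 0:
        raise ValueError("target_util_pct must be positive")
    raw = ceil(current_replicas * current_util_pct / target_util_pct)
    return max(min_replicas, min(max_replicas, raw))
```

For example, 10 replicas running at 90% utilization against a 60% target would scale out to 15. This only works cleanly because the services are stateless, so any new replica can take traffic immediately.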
CDN:
- Add more Open Connect appliances
- Distribute across more ISPs
- Scale based on regional demand
2. Caching
Content Caching:
- Cache popular content on CDN
- Pre-populate OCAs with trending content
- Cache metadata and recommendations
Application Caching:
- Cache user sessions
- Cache recommendations
- Cache content metadata
3. Database Sharding
User Data:
- Shard by user ID
- Distribute across multiple databases
- Handle cross-shard queries
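Sharding by user ID can be sketched with a stable hash followed by a modulo. The function name and shard count are assumptions; note that plain modulo sharding remaps most keys when the shard count changes, which is why consistent hashing is a common alternative.

```python
import hashlib


def shard_for_user(user_id: str, num_shards: int) -> int:
    """Map a user ID to a shard index in [0, num_shards).

    Uses SHA-256 rather than Python's built-in hash(), which is
    randomized per process and therefore not stable across servers.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

All reads and writes for one user then go to a single shard, while queries that span users must fan out across shards.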
Content Data:
- Shard by content ID or region
- Replicate for read scalability
- Use read replicas
4. Geographic Distribution
Data Centers:
- Multiple regions (US, EU, Asia, etc.)
- Route users to nearest region
- Replicate critical data across regions
CDN:
- OCAs in every major region
- Route to nearest OCA
- Handle regional outages
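Routing to the nearest OCA while tolerating outages can be sketched as "closest healthy server wins". Great-circle distance is used here only as a stand-in; real steering is usually based on network topology and load, and the dictionary keys are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def pick_oca(user_lat: float, user_lon: float, ocas: list):
    """Return the nearest healthy OCA, or None if none are healthy.

    Each OCA is a dict with hypothetical keys: name, lat, lon, healthy.
    """
    healthy = [o for o in ocas if o["healthy"]]
    if not healthy:
        return None
    return min(healthy, key=lambda o: haversine_km(user_lat, user_lon, o["lat"], o["lon"]))
```

If the closest appliance is marked unhealthy, the same call falls through to the next closest one, which is the essence of handling a regional outage.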
Key Design Decisions
1. Open Connect CDN
Decision: Build own CDN instead of using third-party
Rationale:
- Better control over quality
- Cost effective at scale
- Direct ISP relationships
- Better user experience
Trade-offs:
- ✅ Better quality
- ✅ Lower costs at scale
- ✅ Better control
- ❌ Higher initial investment
- ❌ More operational complexity
2. Microservices Architecture
Decision: Use microservices instead of monolith
Rationale:
- Independent scaling
- Technology diversity
- Team autonomy
- Fault isolation
Trade-offs:
- ✅ Better scalability
- ✅ Technology flexibility
- ✅ Team autonomy
- ❌ Higher complexity
- ❌ Network overhead
- ❌ Distributed system challenges
3. Adaptive Streaming
Decision: Use adaptive bitrate streaming
Rationale:
- Handle varying bandwidth
- Better user experience
- Reduce buffering
- Optimize bandwidth usage
Trade-offs:
- ✅ Better user experience
- ✅ Handles network variations
- ✅ Optimizes bandwidth
- ❌ More encoding formats needed
- ❌ More storage required
4. Multiple Encoding Formats
Decision: Encode videos in multiple formats/resolutions
Rationale:
- Support different devices
- Handle varying bandwidth
- Future-proof (new codecs)
- Optimize storage and delivery
Trade-offs:
- ✅ Better device support
- ✅ Bandwidth optimization
- ✅ Future compatibility
- ❌ Higher encoding costs
- ❌ More storage required
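The storage cost of multiple encodings can be estimated with simple arithmetic: size is bitrate times duration. The bitrates, codec count, and replica count below are illustrative assumptions, not Netflix figures, and the estimate treats every codec as using the same ladder.

```python
def rendition_size_gb(bitrate_kbps: float, duration_s: float) -> float:
    """Size of one encoded rendition in decimal gigabytes."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return bytes_per_second * duration_s / 1e9


def title_storage_gb(ladder_kbps, duration_s: float, codecs: int = 1, replicas: int = 1) -> float:
    """Total storage for one title across its ladder, codecs, and replicas."""
    per_codec = sum(rendition_size_gb(b, duration_s) for b in ladder_kbps)
    return per_codec * codecs * replicas


# A 2-hour title at a hypothetical 5,000 kbps is about 4.5 GB.
one_rendition = rendition_size_gb(5_000, 2 * 3600)
```

With a four-step ladder of 1,500, 3,000, 5,000, and 16,000 kbps, three codecs, and two replicas, the same 2-hour title comes to roughly 138 GB, which shows why encoding efficiency matters at catalog scale.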
Challenges and Solutions
Challenge 1: Global Scale
Problem: Serve 200M+ users across 100+ countries
Solution:
- Open Connect CDN with global distribution
- Regional data centers
- Geographic routing
- Content replication
Challenge 2: Video Delivery
Problem: Deliver high-quality video with low latency
Solution:
- Open Connect CDN (servers at ISPs)
- Adaptive bitrate streaming
- Multiple encoding formats
- Pre-population of popular content
Challenge 3: Personalization
Problem: Provide personalized recommendations to 200M+ users
Solution:
- ML-based recommendation system
- Real-time behavior tracking
- Distributed model serving
- A/B testing for continuous improvement
Challenge 4: Cost Optimization
Problem: High infrastructure costs at scale
Solution:
- Open Connect CDN (reduces bandwidth costs)
- Efficient encoding (reduce storage)
- Caching (reduce compute)
- Auto-scaling (pay for what you use)
Monitoring and Observability
Key Metrics
Performance Metrics:
- Video start time
- Buffering rate
- Playback quality
- Error rate
Business Metrics:
- Subscriber count
- Watch time
- Content completion rate
- Churn rate
Infrastructure Metrics:
- CDN hit rate
- Server utilization
- Network bandwidth
- Storage usage
Alerting
- Alert on high error rates
- Alert on CDN issues
- Alert on recommendation service failures
- Alert on high latency
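The alerts above reduce to comparing metrics against thresholds. In the sketch below, the 2-second video start limit comes from the performance requirement stated earlier; the other thresholds, the metric names, and the functions are assumptions for illustration.

```python
# "max" alerts when a value rises above the limit,
# "min" alerts when it falls below. Values other than the
# 2-second start time are hypothetical.
THRESHOLDS = {
    "error_rate": ("max", 0.01),
    "video_start_time_s": ("max", 2.0),
    "cdn_hit_rate": ("min", 0.95),
}


def cdn_hit_rate(hits: int, misses: int) -> float:
    """Fraction of requests served from the edge cache."""
    total = hits + misses
    return hits / total if total else 1.0


def evaluate_alerts(metrics: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return a human-readable message for every breached threshold."""
    alerts = []
    for name, (kind, limit) in thresholds.items():
        if name not in metrics:
            continue  # metric not reported in this window
        value = metrics[name]
        if kind == "max" and value > limit:
            alerts.append(f"{name} above {limit}: {value}")
        elif kind == "min" and value < limit:
            alerts.append(f"{name} below {limit}: {value}")
    return alerts
```

In practice each threshold would also carry a duration (for example, breached for five minutes) to avoid alerting on short spikes.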
Best Practices
1. CDN Optimization
- Pre-populate popular content
- Cache at edge locations
- Monitor CDN performance
- Optimize cache hit rates
2. Encoding Strategy
- Use efficient codecs (AV1, VP9)
- Encode in multiple formats
- Optimize for quality vs size
- Regular re-encoding for new codecs
3. Recommendation System
- Continuously improve models
- A/B test new algorithms
- Monitor recommendation quality
- Balance exploration vs exploitation
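One simple way to balance exploration and exploitation is an epsilon-greedy policy: usually show the top-ranked title, occasionally show a random one to learn about it. This is a minimal sketch with hypothetical names; it is not a description of Netflix's actual approach.

```python
import random


def choose_title(ranked: list, catalog: list, epsilon: float = 0.1, rng=None):
    """Return the top-ranked title, or with probability `epsilon`
    a uniformly random title from the catalog."""
    rng = rng if rng is not None else random.Random()
    if rng.random() < epsilon:
        return rng.choice(catalog)  # explore
    return ranked[0]  # exploit
```

A higher `epsilon` gathers feedback on niche content faster at the cost of showing more titles the model does not currently favor.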
4. Monitoring
- Monitor end-to-end user experience
- Track key business metrics
- Alert on critical issues
- Use distributed tracing
Quick Reference Summary
Netflix: Global video streaming platform serving 200M+ subscribers.
Key Components:
- Open Connect CDN (servers at ISPs)
- Microservices architecture
- Video encoding pipeline
- ML-based recommendation system
Key Design Decisions:
- Own CDN (Open Connect) for better control
- Microservices for scalability
- Adaptive streaming for varying bandwidth
- Multiple encoding formats for device support
Scaling Strategies:
- Horizontal scaling of microservices
- Global CDN distribution
- Database sharding
- Geographic distribution
Remember: Netflix's success comes from combining excellent content delivery (Open Connect CDN) with personalized recommendations (ML) and a scalable microservices architecture.