Amazon System Design
Quick Reference
Scale: 300M+ customers, 12M+ products, billions of orders, $500B+ annual revenue
Key Components: Product catalog, inventory management, recommendations, search, payments, order processing, shipping
Challenges: Inventory management at scale, real-time recommendations, search across millions of products, order processing, payment security
Clear Definition
Amazon is one of the world's largest e-commerce platforms, serving 300M+ customers with 12M+ products. At that scale it needs efficient product catalog management, real-time inventory tracking, personalized recommendations, fast and relevant search, secure payment processing, and reliable order fulfillment.
💡 Key Insight: Amazon uses polyglot persistence (SQL for transactions, NoSQL for catalog), sophisticated ML-based recommendations, distributed search indexes, and microservices architecture to handle massive scale and complexity.
System Requirements
Functional Requirements
Product Catalog:
- Browse products
- Product details and images
- Product reviews and ratings
- Product categories
Search:
- Search across millions of products
- Filter and sort results
- Autocomplete suggestions
- Search ranking
Shopping Cart:
- Add/remove items
- Save for later
- Cart persistence
- Price updates
Order Processing:
- Place orders
- Payment processing
- Order confirmation
- Order tracking
Inventory Management:
- Real-time inventory tracking
- Stock availability
- Inventory updates
- Prevent overselling
Recommendations:
- Personalized product recommendations
- "Customers who bought this also bought"
- Recently viewed items
- Trending products
Non-Functional Requirements
Scale:
- 300M+ customers globally
- 12M+ products
- Billions of orders
- Handle traffic spikes (Black Friday)
Performance:
- Fast page load (< 2 seconds)
- Fast search results (< 100 ms)
- Real-time inventory
- Low latency recommendations
Availability:
- 99.99% uptime
- Handle traffic spikes
- Geographic distribution
- Disaster recovery
Consistency:
- Inventory consistency (prevent overselling)
- Order consistency
- Payment consistency
- Price consistency
High-Level Architecture
┌─────────────────────────────────────────────────────────────┐
│                     Client Applications                     │
│                      (Web, Mobile App)                      │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               │ HTTPS
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                 Load Balancer / API Gateway                 │
└──────────────────────────────┬──────────────────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         │                     │                     │
         ▼                     ▼                     ▼
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│     Catalog     │   │     Search      │   │      Cart       │
│     Service     │   │     Service     │   │     Service     │
└────────┬────────┘   └────────┬────────┘   └────────┬────────┘
         │                     │                     │
         ▼                     ▼                     ▼
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│    Inventory    │   │ Recommendation  │   │      Order      │
│     Service     │   │     Service     │   │     Service     │
└────────┬────────┘   └────────┬────────┘   └────────┬────────┘
         │                     │                     │
         ▼                     ▼                     ▼
┌─────────────────────────────────────────────────────────────┐
│                       Database Layer                        │
│      (SQL: Orders, Inventory | NoSQL: Catalog, Search)      │
└─────────────────────────────────────────────────────────────┘
Core Components
1. Product Catalog Service
Responsibilities:
- Product information
- Product images
- Product metadata
- Product categories
Data Model:
- Product ID: Unique identifier
- Title, Description: Product details
- Price: Current price
- Images: Product images
- Categories: Product categories
- Attributes: Size, color, brand, etc.
Storage:
- NoSQL Database: DynamoDB or similar (flexible schema)
- Object Storage: Product images (S3)
- CDN: Serve images globally
Why NoSQL?
- Flexible schema (different products have different attributes)
- High read throughput
- Horizontal scaling
- Fast lookups
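The flexible-schema argument can be made concrete with a minimal sketch of one catalog item as a document. The field names below (`image_urls`, `attributes`, and so on) and the sample products are illustrative assumptions, not Amazon's actual schema; the point is only that per-product attributes live in a free-form map rather than fixed columns.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Product:
    """One catalog item, shaped like a schemaless document (hypothetical fields)."""
    product_id: str
    title: str
    description: str
    price: Decimal
    image_urls: list[str] = field(default_factory=list)
    categories: list[str] = field(default_factory=list)
    # Free-form attributes: a shirt has size/color, a laptop has ram_gb, etc.
    attributes: dict[str, str] = field(default_factory=dict)

    def to_document(self) -> dict:
        """Serialize to a plain dict suitable for a document store."""
        return {
            "product_id": self.product_id,
            "title": self.title,
            "description": self.description,
            "price": str(self.price),  # keep money exact: string, not float
            "image_urls": list(self.image_urls),
            "categories": list(self.categories),
            "attributes": dict(self.attributes),
        }


# Two products with entirely different attribute sets share one model.
shirt = Product("P1", "T-Shirt", "Cotton tee", Decimal("19.99"),
                attributes={"size": "M", "color": "blue"})
laptop = Product("P2", "Laptop", "14-inch laptop", Decimal("899.00"),
                 attributes={"ram_gb": "16", "brand": "ExampleCo"})
```

In a relational schema the same two products would need either sparse columns or a separate attribute table, which is the trade-off the list above points at.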
2. Search Service
Challenges:
- Search across 12M+ products
- Fast search results (< 100ms)
- Relevant ranking
- Handle typos and synonyms
Search Architecture:
     User Query
          │
          ▼
┌───────────────────┐
│ Query Parser      │
│ - Tokenize        │
│ - Normalize       │
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│ Search Index      │
│ (Elasticsearch,   │
│ Solr, Custom)     │
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│ Ranking           │
│ - Relevance       │
│ - Popularity      │
│ - Personalization │
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│ Results           │
└───────────────────┘
Search Features:
- Full-Text Search: Search in titles, descriptions
- Filters: Price, brand, rating, etc.
- Sorting: Relevance, price, rating, newest
- Autocomplete: Fast search suggestions
- Fuzzy Matching: Handle typos
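Autocomplete and fuzzy matching can be sketched with the Python standard library alone. This is a toy illustration over a hypothetical six-term vocabulary (`TERMS`), using binary search for prefix lookup and `difflib.get_close_matches` for typo correction; a production search engine would use its own suggesters and edit-distance automata instead.

```python
import bisect
import difflib

# Hypothetical vocabulary of known search terms, kept sorted for prefix lookup.
TERMS = sorted(["headphones", "headset", "heater", "keyboard", "laptop", "lamp"])


def autocomplete(prefix: str, limit: int = 5) -> list[str]:
    """Return up to `limit` terms starting with `prefix` via binary search."""
    start = bisect.bisect_left(TERMS, prefix)
    matches = []
    for term in TERMS[start:]:
        if not term.startswith(prefix) or len(matches) == limit:
            break
        matches.append(term)
    return matches


def correct_typo(word: str) -> str:
    """Return the closest known term, or the word itself if nothing is close."""
    close = difflib.get_close_matches(word, TERMS, n=1, cutoff=0.7)
    return close[0] if close else word
```

For example, `autocomplete("hea")` yields the three terms beginning with "hea", and `correct_typo("keybord")` resolves to "keyboard". The 0.7 cutoff is an arbitrary choice for this sketch.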
Indexing:
- Inverted Index: Fast text search
- Faceted Search: Fast filtering
- Real-time Updates: Index new products quickly
- Distributed Index: Shard across multiple servers
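To show why an inverted index makes text search fast, here is a minimal in-memory sketch. It maps each term to the set of product IDs containing it, so a query becomes a set intersection rather than a scan of every product. The tokenizer and sample titles are simplifying assumptions; real engines add stemming, scoring, and sharding on top.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace; real analyzers do far more."""
    return text.lower().split()


class InvertedIndex:
    def __init__(self) -> None:
        # term -> set of product IDs whose title contains that term
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, product_id: str, title: str) -> None:
        for term in tokenize(title):
            self.postings[term].add(product_id)

    def search(self, query: str) -> set[str]:
        """Return IDs matching every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add("P1", "Wireless Noise Cancelling Headphones")
index.add("P2", "Wired Headphones")
index.add("P3", "Wireless Mouse")
```

Indexing a new product only touches the postings for its own terms, which is what makes near-real-time updates feasible.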
3. Inventory Service
Critical Requirements:
- Consistency: Prevent overselling (critical)
- Real-time: Update inventory immediately
- Accuracy: Exact stock counts
- Performance: Fast inventory checks
Why SQL?
- ACID transactions (critical for inventory)
- Strong consistency
- Prevent race conditions
- Accurate stock counts
Inventory Architecture:
        Order Request
              │
              ▼
     ┌─────────────────┐
     │ Check Inventory │
     │ (SQL Database)  │
     └────────┬────────┘
              │
         ┌────┴────┐
         │         │
         ▼         ▼
     Available  Out of Stock
         │         │
         │         └──> Return Error
         │
         ▼
┌─────────────────┐
│  Reserve Item   │
│ (Atomic Update) │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Confirm Order  │
└─────────────────┘
Inventory Management:
- Reserve on Add to Cart: Reserve when added (optional)
- Reserve on Checkout: Reserve during checkout
- Release on Cancel: Release if order cancelled
- Update on Shipment: Update when shipped
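The reserve and release steps can be sketched as single conditional SQL statements. The example below uses an in-memory SQLite database as a stand-in for the production SQL store, with a hypothetical `inventory` table; the key idea is that the `available >= ?` guard and the decrement happen in one statement, so two buyers cannot both take the last unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (product_id TEXT PRIMARY KEY, available INTEGER NOT NULL)"
)
conn.execute("INSERT INTO inventory VALUES ('P1', 2)")
conn.commit()


def reserve(product_id: str, qty: int) -> bool:
    """Atomically decrement stock; succeeds only if enough units remain."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE inventory SET available = available - ? "
            "WHERE product_id = ? AND available >= ?",
            (qty, product_id, qty),
        )
    return cur.rowcount == 1


def release(product_id: str, qty: int) -> None:
    """Return units to stock, e.g. after a cancelled order."""
    with conn:
        conn.execute(
            "UPDATE inventory SET available = available + ? WHERE product_id = ?",
            (qty, product_id),
        )


def available(product_id: str) -> int:
    row = conn.execute(
        "SELECT available FROM inventory WHERE product_id = ?", (product_id,)
    ).fetchone()
    return row[0]
```

A failed `reserve` call corresponds to the "Out of Stock" branch in the diagram above.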
Optimizations:
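- Caching: Cache inventory counts for display reads (with a short TTL); run the final availability check against the database
- Sharding: Shard by product ID
- Read Replicas: Scale reads that can tolerate replication lag
- Optimistic Locking: Detect concurrent updates and retry instead of silently overwriting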
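Optimistic locking is commonly implemented with a version column: a writer updates the row only if the version it read is still current. The sketch below assumes a hypothetical `stock` table in SQLite; the table and column names are illustrative only.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE stock (product_id TEXT PRIMARY KEY, "
    "available INTEGER NOT NULL, version INTEGER NOT NULL)"
)
db.execute("INSERT INTO stock VALUES ('P1', 5, 0)")
db.commit()


def read_stock(product_id: str) -> tuple[int, int]:
    """Return (available, version) as seen by this reader."""
    return db.execute(
        "SELECT available, version FROM stock WHERE product_id = ?", (product_id,)
    ).fetchone()


def write_stock(product_id: str, new_available: int, expected_version: int) -> bool:
    """Write only if nobody else has bumped the version since we read it."""
    with db:
        cur = db.execute(
            "UPDATE stock SET available = ?, version = version + 1 "
            "WHERE product_id = ? AND version = ?",
            (new_available, product_id, expected_version),
        )
    return cur.rowcount == 1
```

If two writers read the same version, the first write succeeds and the second returns `False`; the losing writer is expected to re-read and retry.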
4. Recommendation Service
How it Works:
- Data Collection: Track user behavior (views, purchases, ratings)
- Feature Engineering: Extract features from products and users
- Model Training: Train ML models (collaborative filtering, deep learning)
- Real-time Inference: Generate recommendations in real-time
- A/B Testing: Continuously test and improve
Recommendation Types:
- Personalized: Based on user history
- Item-to-Item: "Customers who bought this also bought"
- User-to-User: "Users like you also bought"
- Trending: Popular products
- Recently Viewed: Items user viewed
Recommendation Algorithms:
- Collaborative Filtering: Find similar users/products
- Content-Based: Recommend based on product features
- Deep Learning: Neural networks for complex patterns
- Hybrid: Combine multiple approaches
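As a rough illustration of item-to-item collaborative filtering, the sketch below counts how often products appear together in the same order and ranks co-purchased items by that count. The order data is made up, and raw co-occurrence is the simplest possible similarity measure; it is not a description of Amazon's production algorithm.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical purchase history: one set of product IDs per order.
ORDERS = [
    {"phone", "case", "charger"},
    {"phone", "case"},
    {"phone", "charger"},
    {"laptop", "mouse"},
]


def build_co_purchase(orders):
    """Count how often each ordered pair of items appears in the same order."""
    co = defaultdict(Counter)
    for order in orders:
        for a, b in permutations(order, 2):
            co[a][b] += 1
    return co


def also_bought(co, item, k=2):
    """Top-k items most often bought together with `item` (ties broken by name)."""
    ranked = sorted(co[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


CO = build_co_purchase(ORDERS)
```

Because the table is keyed by item rather than by user, it can be computed offline and looked up cheaply at request time.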
Performance:
- Pre-compute: Pre-compute recommendations (batch)
- Real-time Updates: Update based on recent activity
- Caching: Cache recommendations per user
- Fallback: Default recommendations if personalization fails
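The interplay of pre-computed results, recent activity, and a fallback can be sketched in a few lines. `PRECOMPUTED` and `TRENDING` below are hypothetical outputs of offline jobs; the function filters out items the user has just viewed and falls back to trending products when no personalized list exists.

```python
# Hypothetical outputs of an offline batch job and a trending job.
PRECOMPUTED = {"alice": ["P1", "P2", "P3"]}
TRENDING = ["P10", "P11", "P12"]


def recommend(user_id, recently_viewed=(), limit=3):
    """Batch recommendations minus just-viewed items, else trending."""
    base = PRECOMPUTED.get(user_id)
    if not base:
        return TRENDING[:limit]  # fallback: no personalization available
    seen = set(recently_viewed)
    fresh = [p for p in base if p not in seen]
    # Top up from trending if filtering left too few items.
    for p in TRENDING:
        if len(fresh) >= limit:
            break
        if p not in seen and p not in fresh:
            fresh.append(p)
    return fresh[:limit]
```

The request path never calls a model here; it only reads pre-computed data, which is what keeps latency low.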
5. Shopping Cart Service
Responsibilities:
- Add/remove items
- Update quantities
- Save for later
- Cart persistence
Cart Storage:
- User Cart: Stored per user
- Session Cart: For anonymous users
- Persistence: Save cart across sessions
Cart Operations:
- Add Item: Add product to cart
- Update Quantity: Update item quantity
- Remove Item: Remove from cart
- Clear Cart: Empty cart
- Price Updates: Update prices if changed
Optimizations:
- Caching: Cache cart in Redis
- Async Price Updates: Update prices asynchronously
- Cart Expiry: Expire abandoned carts
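The cart operations and expiry rule can be sketched as a small in-memory class. This is a stand-in for a Redis-backed cart, not a Redis client example; the 30-day TTL and the method names are assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field

CART_TTL_SECONDS = 30 * 24 * 3600  # assumed 30-day expiry for abandoned carts


@dataclass
class Cart:
    items: dict[str, int] = field(default_factory=dict)  # product_id -> quantity
    saved_for_later: set[str] = field(default_factory=set)
    last_touched: float = field(default_factory=time.time)

    def add(self, product_id: str, qty: int = 1) -> None:
        self.items[product_id] = self.items.get(product_id, 0) + qty
        self.last_touched = time.time()

    def update_quantity(self, product_id: str, qty: int) -> None:
        if qty <= 0:
            self.items.pop(product_id, None)  # zero or less means remove
        else:
            self.items[product_id] = qty
        self.last_touched = time.time()

    def save_for_later(self, product_id: str) -> None:
        """Move an item out of the active cart without losing it."""
        if self.items.pop(product_id, None) is not None:
            self.saved_for_later.add(product_id)

    def is_expired(self, now: float) -> bool:
        return now - self.last_touched > CART_TTL_SECONDS
```

With a key-value cache, the same expiry behavior is usually obtained by setting a TTL on the cart key and refreshing it on every write.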
6. Order Service
Order Flow:
1. User places order
   ↓
2. Validate cart and inventory
   ↓
3. Calculate total (prices, shipping, tax)
   ↓
4. Process payment
   ↓
5. Create order record
   ↓
6. Reserve inventory
   ↓
7. Send confirmation email
   ↓
8. Fulfill order (warehouse)
Order Storage:
- SQL Database: ACID transactions (critical)
- Order History: Store all orders
- Order Status: Track order status
Order Processing:
- Synchronous: Critical operations (payment, inventory)
- Asynchronous: Non-critical (email, notifications)
- Idempotency: Deduplicate retried requests so a repeated submission does not create a second order
7. Payment Service
Requirements:
- Security: PCI DSS compliance
- Reliability: 99.99% uptime
- Fraud Detection: Detect fraudulent transactions
- Multiple Payment Methods: Credit cards, PayPal, etc.
Payment Flow:
1. User enters payment info
   ↓
2. Payment service validates
   ↓
3. Fraud detection check
   ↓
4. Process payment (payment gateway)
   ↓
5. Payment confirmation
   ↓
6. Update order status
Security:
- Encryption: Encrypt payment data
- Tokenization: Don't store card numbers
- PCI Compliance: Follow PCI DSS standards
- Fraud Detection: ML-based fraud detection
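Tokenization can be illustrated with a toy vault that swaps a card number for a random, meaningless token. Everything here (`TokenVault`, the `tok_` prefix, the dict storage) is a hypothetical sketch; a real deployment keeps the vault in a separate PCI-scoped service with the card data encrypted at rest, or delegates it to the payment gateway entirely.

```python
import secrets


class TokenVault:
    """Toy vault mapping opaque tokens to card numbers.

    Assumption: in practice this lives in an isolated, PCI-scoped service
    and the card number is encrypted at rest; here it is a plain dict.
    """

    def __init__(self) -> None:
        self._cards: dict[str, str] = {}

    def tokenize(self, card_number: str) -> str:
        token = "tok_" + secrets.token_urlsafe(16)  # random, not derived from the card
        self._cards[token] = card_number
        return token

    def last4(self, token: str) -> str:
        """Safe-to-display suffix, e.g. for order confirmation pages."""
        return self._cards[token][-4:]


vault = TokenVault()
token = vault.tokenize("4111111111111111")  # widely used test card number
order_record = {"order_id": "O1", "payment_token": token}  # no card number stored
```

The order service only ever sees `payment_token`, so a leak of order data does not expose card numbers.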
Data Flow
Product Browse Flow
1. User browses products
   ↓
2. Catalog service returns products
   ↓
3. Product images served from CDN
   ↓
4. Recommendations shown
   ↓
5. User clicks product
   ↓
6. Product details page loaded
Search Flow
1. User enters search query
   ↓
2. Autocomplete suggestions shown
   ↓
3. User submits search
   ↓
4. Search service queries index
   ↓
5. Results ranked and returned
   ↓
6. Filters applied (if any)
   ↓
7. Results displayed
Order Placement Flow
1. User adds items to cart
   ↓
2. User proceeds to checkout
   ↓
3. Cart validated
   ↓
4. Inventory checked
   ↓
5. Payment processed
   ↓
6. Order created
   ↓
7. Inventory reserved
   ↓
8. Confirmation sent
Scaling Strategies
1. Polyglot Persistence
SQL Databases:
- Orders, payments, inventory (ACID required)
- Strong consistency
- Vertical and horizontal scaling
NoSQL Databases:
- Product catalog (flexible schema)
- High read throughput
- Horizontal scaling
Caching:
- Redis for hot data
- CDN for static content
- Application-level caching
2. Microservices Architecture
Service Independence:
- Each service scales independently
- Technology diversity
- Team autonomy
- Fault isolation
Service Communication:
- REST APIs
- Message queues (async)
- Event-driven architecture
3. Database Sharding
Product Catalog:
- Shard by product ID or category
- Distribute across databases
- Handle cross-shard queries
Orders:
- Shard by user ID or order ID
- Geographic sharding
- Archive old orders
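Shard routing by key can be sketched with a stable hash. The shard count and database naming below are assumptions; the sketch uses SHA-256 rather than Python's built-in `hash()`, which is salted per process and would route the same key differently after a restart.

```python
import hashlib

NUM_SHARDS = 8  # assumed shard count for illustration


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def orders_shard(user_id: str) -> str:
    """All of one user's orders land on one shard, so order history is a single-shard query."""
    return f"orders_db_{shard_for(user_id)}"
```

Plain modulo hashing remaps most keys when `NUM_SHARDS` changes; consistent hashing or a lookup directory is the usual remedy when resharding is expected.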
4. Caching Strategy
Product Catalog:
- Cache popular products
- Cache product details
- Cache categories
Search:
- Cache popular searches
- Cache search results
- Cache autocomplete
Recommendations:
- Cache per-user recommendations
- Pre-compute recommendations
- Update cache periodically
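The caching items above generally follow the cache-aside pattern: read from the cache, and on a miss load from the source of truth and store the result with an expiry. A minimal sketch, with an in-memory dict standing in for Redis and a caller-supplied `loader` standing in for the database call (both assumptions):

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Minimal cache-aside helper with per-entry expiry."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._data: dict[str, tuple[float, object]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: str, loader: Callable[[str], object],
                    now: Optional[float] = None) -> object:
        now = time.monotonic() if now is None else now
        entry = self._data.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = loader(key)  # fall through to the source of truth
        self._data[key] = (now + self.ttl, value)
        return value
```

The `hits` and `misses` counters give the cache hit rate, which is one of the infrastructure metrics worth tracking for any of these caches.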
5. CDN and Edge Computing
Static Content:
- Product images on CDN
- Serve from edge locations
- Reduce origin load
Dynamic Content:
- Edge computing for recommendations
- Geographic routing
- Reduce latency
Key Design Decisions
1. Polyglot Persistence
Decision: Use SQL for transactions, NoSQL for catalog
Rationale:
- Right tool for the right job
- SQL: ACID for critical data
- NoSQL: Flexibility and scale for catalog
Trade-offs:
- ✓ Optimized for each use case
- ✓ Better performance
- ✗ More complexity
- ✗ Multiple systems to manage
2. Microservices Architecture
Decision: Use microservices instead of monolith
Rationale:
- Independent scaling
- Technology diversity
- Team autonomy
- Fault isolation
Trade-offs:
- ✓ Better scalability
- ✓ Technology flexibility
- ✗ Higher complexity
- ✗ Network overhead
3. Distributed Search
Decision: Use distributed search index (Elasticsearch, Solr)
Rationale:
- Fast search across millions of products
- Relevant ranking
- Handle high query volume
- Real-time indexing
Trade-offs:
- ✓ Fast search
- ✓ Relevant results
- ✗ More infrastructure
- ✗ Index management complexity
4. ML-Based Recommendations
Decision: Use ML for personalized recommendations
Rationale:
- Increase sales
- Better user experience
- Competitive advantage
- Data-driven decisions
Trade-offs:
- ✓ Better recommendations
- ✓ Increased sales
- ✗ ML infrastructure needed
- ✗ Continuous improvement required
Challenges and Solutions
Challenge 1: Inventory Consistency
Problem: Prevent overselling (selling more than available)
Solution:
- SQL database with ACID transactions
- Reserve inventory on checkout
- Optimistic locking
- Real-time inventory updates
Challenge 2: Search at Scale
Problem: Search across 12M+ products quickly
Solution:
- Distributed search index
- Inverted index for fast text search
- Caching popular searches
- Sharding search index
Challenge 3: Recommendations at Scale
Problem: Provide personalized recommendations to 300M+ users
Solution:
- ML-based recommendation system
- Pre-compute recommendations (batch)
- Real-time updates
- Efficient caching
Challenge 4: Traffic Spikes
Problem: Handle traffic spikes (Black Friday, Prime Day)
Solution:
- Auto-scaling
- CDN for static content
- Caching aggressively
- Load balancing
- Queue-based processing
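Queue-based processing absorbs a spike by decoupling how fast orders arrive from how fast they are handled. The sketch below uses Python's standard `queue` and `threading` modules as a single-process stand-in for a message broker; the queue size and the `submit` helper are assumptions for illustration.

```python
import queue
import threading

order_queue: "queue.Queue" = queue.Queue(maxsize=100)  # bounded: applies backpressure
processed: list[str] = []


def worker() -> None:
    """Drain the queue at the worker's own pace until a None sentinel arrives."""
    while True:
        order_id = order_queue.get()
        if order_id is None:
            order_queue.task_done()
            break
        processed.append(order_id)  # stand-in for real order handling
        order_queue.task_done()


def submit(order_id: str) -> bool:
    """Accept the order if there is room; otherwise tell the caller to retry."""
    try:
        order_queue.put_nowait(order_id)
        return True
    except queue.Full:
        return False


thread = threading.Thread(target=worker, daemon=True)
thread.start()
for i in range(10):
    submit(f"O{i}")
order_queue.put(None)  # sentinel: stop the worker
thread.join()
```

When the queue is full, `submit` returns `False` instead of blocking, so the front end can shed load or ask the client to retry rather than overwhelm downstream services.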
Challenge 5: Payment Security
Problem: Secure payment processing
Solution:
- PCI DSS compliance
- Encryption and tokenization
- Fraud detection (ML)
- Secure payment gateways
Monitoring and Observability
Key Metrics
Performance Metrics:
- Page load time
- Search latency
- Order processing time
- Payment processing time
Business Metrics:
- Daily active users
- Orders per day
- Revenue
- Conversion rate
Infrastructure Metrics:
- Server utilization
- Database performance
- Cache hit rate
- CDN performance
Alerting
- Alert on high error rates
- Alert on inventory issues
- Alert on payment failures
- Alert on high latency
Best Practices
1. Inventory Management
- Use SQL for consistency
- Reserve on checkout
- Real-time updates
- Monitor inventory levels
2. Search Optimization
- Use distributed search index
- Cache popular searches
- Optimize ranking algorithms
- Handle typos and synonyms
3. Recommendations
- Continuously improve models
- A/B test new algorithms
- Monitor recommendation quality
- Balance exploration vs exploitation
4. Caching Strategy
- Cache hot data
- Use CDN for static content
- Cache recommendations
- Monitor cache hit rates
Quick Reference Summary
Amazon: One of the world's largest e-commerce platforms, with 300M+ customers and 12M+ products.
Key Components:
- Product catalog (NoSQL)
- Inventory management (SQL)
- Search service (distributed index)
- Recommendation service (ML-based)
- Order processing
- Payment service
Key Design Decisions:
- Polyglot persistence (SQL + NoSQL)
- Microservices architecture
- Distributed search
- ML-based recommendations
Scaling Strategies:
- Horizontal scaling of microservices
- Database sharding
- Aggressive caching
- CDN for static content
Remember: Amazon's design combines polyglot persistence (the right tool for each job), ML-based recommendations, and a scalable microservices architecture to handle massive scale and complexity.