Amazon System Design



Quick Reference

Scale: 300M+ customers, 12M+ products, billions of orders, $500B+ annual revenue

Key Components: Product catalog, inventory management, recommendations, search, payments, order processing, shipping

Challenges: Inventory management at scale, real-time recommendations, search across millions of products, order processing, payment security


Clear Definition

Amazon is one of the world's largest e-commerce platforms, serving 300M+ customers with 12M+ products. Operating at that scale requires efficient product catalog management, real-time inventory tracking, personalized recommendations, powerful search, secure payment processing, and reliable order fulfillment.

💡 Key Insight: Amazon uses polyglot persistence (SQL for transactions, NoSQL for catalog), sophisticated ML-based recommendations, distributed search indexes, and microservices architecture to handle massive scale and complexity.


System Requirements

Functional Requirements

  1. Product Catalog

    • Browse products
    • Product details and images
    • Product reviews and ratings
    • Product categories
  2. Search

    • Search across millions of products
    • Filter and sort results
    • Autocomplete suggestions
    • Search ranking
  3. Shopping Cart

    • Add/remove items
    • Save for later
    • Cart persistence
    • Price updates
  4. Order Processing

    • Place orders
    • Payment processing
    • Order confirmation
    • Order tracking
  5. Inventory Management

    • Real-time inventory tracking
    • Stock availability
    • Inventory updates
    • Prevent overselling
  6. Recommendations

    • Personalized product recommendations
    • "Customers who bought this also bought"
    • Recently viewed items
    • Trending products

Non-Functional Requirements

  1. Scale

    • 300M+ customers globally
    • 12M+ products
    • Billions of orders
    • Handle traffic spikes (Black Friday)
  2. Performance

    • Fast page load (< 2 seconds)
    • Fast search results (< 100 ms)
    • Real-time inventory
    • Low latency recommendations
  3. Availability

    • 99.99% uptime
    • Handle traffic spikes
    • Geographic distribution
    • Disaster recovery
  4. Consistency

    • Inventory consistency (prevent overselling)
    • Order consistency
    • Payment consistency
    • Price consistency
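The 99.99% uptime target above implies a concrete downtime budget. A minimal sketch of the arithmetic follows; the function name and the 30-day window are illustrative choices, not part of the original requirements.

```python
def allowed_downtime_minutes(availability: float, days: float = 365.0) -> float:
    """Minutes of downtime permitted over `days` at the given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1.0 - availability)


per_year = allowed_downtime_minutes(0.9999)
per_30_days = allowed_downtime_minutes(0.9999, days=30)
print(f"{per_year:.2f} min/year, {per_30_days:.2f} min/30 days")
# prints "52.56 min/year, 4.32 min/30 days"
```

At four nines, the whole platform has under an hour of downtime to spend per year, which is why fault isolation and disaster recovery appear alongside the uptime figure.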

High-Level Architecture

┌──────────────────────────────────────────────────────────────┐
│                    Client Applications                       │
│  (Web, Mobile App)                                           │
└───────────────────────┬──────────────────────────────────────┘
                        │
                        │ HTTPS
                        ▼
┌──────────────────────────────────────────────────────────────┐
│                    Load Balancer / API Gateway               │
└───────────────────────┬──────────────────────────────────────┘
                        │
        ┌───────────────┼───────────────┐
        │               │               │
        ▼               ▼               ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│   Catalog    │ │   Search     │ │   Cart       │
│   Service    │ │   Service    │ │   Service    │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
       │                │                │
       │                │                │
       ▼                ▼                ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│  Inventory   │ │Recommendation│ │  Order       │
│  Service     │ │  Service     │ │  Service     │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
       │                │                │
       │                │                │
       ▼                ▼                ▼
┌──────────────────────────────────────────────────────────────┐
│                    Database Layer                            │
│  (SQL: Orders, Inventory | NoSQL: Catalog, Search)           │
└──────────────────────────────────────────────────────────────┘

Core Components

1. Product Catalog Service

Responsibilities:

  • Product information
  • Product images
  • Product metadata
  • Product categories

Data Model:

  • Product ID: Unique identifier
  • Title, Description: Product details
  • Price: Current price
  • Images: Product images
  • Categories: Product categories
  • Attributes: Size, color, brand, etc.
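A minimal sketch of the data model above as a Python dataclass. The field names, IDs, and sample products are assumptions for illustration only. The point is the free-form attributes map: a shirt and a laptop share the core fields but carry different attribute keys.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Product:
    """Hypothetical catalog record mirroring the fields listed above."""
    product_id: str
    title: str
    description: str
    price: Decimal
    image_urls: list[str] = field(default_factory=list)
    categories: list[str] = field(default_factory=list)
    # Free-form attributes: each product type carries its own keys.
    attributes: dict[str, str] = field(default_factory=dict)


shirt = Product(
    product_id="P-1001",
    title="Cotton T-Shirt",
    description="Plain crew-neck tee",
    price=Decimal("12.99"),
    categories=["Clothing"],
    attributes={"size": "M", "color": "navy", "brand": "ExampleWear"},
)
laptop = Product(
    product_id="P-2001",
    title="14-inch Laptop",
    description="Lightweight notebook",
    price=Decimal("899.00"),
    categories=["Electronics", "Computers"],
    attributes={"ram": "16GB", "storage": "512GB SSD", "brand": "ExampleTech"},
)
```

Prices use Decimal rather than float so that currency values are not subject to binary rounding.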

Storage:

  • NoSQL Database: DynamoDB or similar (flexible schema)
  • Object Storage: Product images (S3)
  • CDN: Serve images globally

Why NoSQL?

  • Flexible schema (different products have different attributes)
  • High read throughput
  • Horizontal scaling
  • Fast lookups

2. Search Service

Challenges:

  • Search across 12M+ products
  • Fast search results (< 100ms)
  • Relevant ranking
  • Handle typos and synonyms

Search Architecture:

User Query
    │
    ▼
┌─────────────────────┐
│  Query Parser       │
│  - Tokenize         │
│  - Normalize        │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│  Search Index       │
│  (Elasticsearch,    │
│   Solr, Custom)     │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│  Ranking            │
│  - Relevance        │
│  - Popularity       │
│  - Personalization  │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│  Results            │
└─────────────────────┘

Search Features:

  • Full-Text Search: Search in titles, descriptions
  • Filters: Price, brand, rating, etc.
  • Sorting: Relevance, price, rating, newest
  • Autocomplete: Fast search suggestions
  • Fuzzy Matching: Handle typos
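Autocomplete and fuzzy matching from the list above can be sketched with the Python standard library alone. This is a toy under stated assumptions: the Suggester class and its term list are hypothetical, prefix lookup uses binary search over a sorted list, and typo correction uses difflib's similarity ratio. A production search engine would typically rely on its own suggesters and edit-distance queries instead.

```python
import bisect
import difflib


class Suggester:
    """Toy autocomplete and typo correction over a fixed vocabulary."""

    def __init__(self, terms):
        self.terms = sorted({t.lower() for t in terms})

    def autocomplete(self, prefix, limit=5):
        """Return up to `limit` terms starting with `prefix`, in sorted order."""
        prefix = prefix.lower()
        start = bisect.bisect_left(self.terms, prefix)
        out = []
        for term in self.terms[start:]:
            if len(out) >= limit or not term.startswith(prefix):
                break
            out.append(term)
        return out

    def correct(self, word, cutoff=0.8):
        """Return the closest known term, or the word itself if none is close."""
        matches = difflib.get_close_matches(word.lower(), self.terms, n=1, cutoff=cutoff)
        return matches[0] if matches else word


suggester = Suggester(["keyboard", "keychain", "kettle", "mouse", "monitor"])
completions = suggester.autocomplete("key")
fixed = suggester.correct("keybord")
```

Here completions is ["keyboard", "keychain"] and fixed is "keyboard"; a word with no close match is returned unchanged.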

Indexing:

  • Inverted Index: Fast text search
  • Faceted Search: Fast filtering
  • Real-time Updates: Index new products quickly
  • Distributed Index: Shard across multiple servers
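The inverted index named above maps each token to the set of products containing it, so a query only touches the postings of its own terms. A minimal in-memory sketch follows; the class, tokenizer, and sample products are illustrative assumptions, and real engines add ranking, stemming, and sharding on top.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # token -> set of product IDs

    def add(self, product_id, text):
        for token in tokenize(text):
            self.postings[token].add(product_id)

    def search(self, query):
        """Return IDs of products containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add("p1", "Wireless Mouse")
index.add("p2", "Wireless Keyboard")
index.add("p3", "USB-C Cable")

both = index.search("WIRELESS")        # {"p1", "p2"}
mouse = index.search("wireless mouse")  # {"p1"}
```

Adding a product only appends to a few postings sets, which is what makes near-real-time index updates feasible.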

3. Inventory Service

Critical Requirements:

  • Consistency: Prevent overselling (critical)
  • Real-time: Update inventory immediately
  • Accuracy: Exact stock counts
  • Performance: Fast inventory checks

Why SQL?

  • ACID transactions (critical for inventory)
  • Strong consistency
  • Prevent race conditions
  • Accurate stock counts

Inventory Architecture:

Order Request
    │
    ▼
┌─────────────────┐
│  Check Inventory│
│  (SQL Database) │
└────────┬────────┘
         │
    ┌────┴────┐
    │         │
    ▼         ▼
Available  Out of Stock
    │         │
    │         └──> Return Error
    │
    ▼
┌─────────────────┐
│  Reserve Item   │
│  (Atomic Update)│
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Confirm Order  │
└─────────────────┘

Inventory Management:

  • Reserve on Add to Cart: Reserve when added (optional)
  • Reserve on Checkout: Reserve during checkout
  • Release on Cancel: Release if order cancelled
  • Update on Shipment: Update when shipped

Optimizations:

  • Caching: Cache inventory counts for display reads (short TTL); the authoritative check still happens in the database at reservation time
  • Sharding: Shard by product ID
  • Read Replicas: For read scalability
  • Optimistic Locking: Prevent race conditions
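Optimistic locking, as listed above, is commonly implemented with a version column: a writer reads the row, then updates it only if the version is unchanged, and retries otherwise. A minimal sketch with SQLite follows; the schema, function name, and retry count are assumptions.

```python
import sqlite3


def adjust_stock(conn, product_id, delta, max_retries=3):
    """Apply `delta` to stock using a version check instead of a row lock."""
    for _ in range(max_retries):
        row = conn.execute(
            "SELECT available, version FROM inventory WHERE product_id = ?",
            (product_id,),
        ).fetchone()
        if row is None:
            return False
        available, version = row
        if available + delta < 0:
            return False  # would oversell
        with conn:
            cur = conn.execute(
                "UPDATE inventory SET available = ?, version = version + 1 "
                "WHERE product_id = ? AND version = ?",
                (available + delta, product_id, version),
            )
        if cur.rowcount == 1:
            return True  # nobody changed the row since we read it
        # Otherwise another writer won the race: re-read and retry.
    return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (product_id TEXT PRIMARY KEY, "
    "available INTEGER NOT NULL, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO inventory VALUES ('sku-1', 5, 0)")
conn.commit()

ok = adjust_stock(conn, "sku-1", -2)        # 5 -> 3, version 0 -> 1
oversell = adjust_stock(conn, "sku-1", -4)  # rejected, stock unchanged

# A writer still holding the old version (0) updates nothing.
with conn:
    stale = conn.execute(
        "UPDATE inventory SET available = 99, version = version + 1 "
        "WHERE product_id = 'sku-1' AND version = 0"
    )
```

The stale write affects zero rows, which is the signal to re-read and retry rather than overwrite someone else's change.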

4. Recommendation Service

How it Works:

  1. Data Collection: Track user behavior (views, purchases, ratings)
  2. Feature Engineering: Extract features from products and users
  3. Model Training: Train ML models (collaborative filtering, deep learning)
  4. Real-time Inference: Generate recommendations in real-time
  5. A/B Testing: Continuously test and improve

Recommendation Types:

  • Personalized: Based on user history
  • Item-to-Item: "Customers who bought this also bought"
  • User-to-User: "Users like you also bought"
  • Trending: Popular products
  • Recently Viewed: Items user viewed

Recommendation Algorithms:

  • Collaborative Filtering: Find similar users/products
  • Content-Based: Recommend based on product features
  • Deep Learning: Neural networks for complex patterns
  • Hybrid: Combine multiple approaches
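The item-to-item case ("Customers who bought this also bought") can be illustrated with simple co-occurrence counting over past orders. This is a deliberately small sketch, not the production algorithm: the function names and order data are made up, and real systems normalize counts by item popularity and train on far larger histories.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(orders):
    """Count how often each pair of items appears in the same order."""
    co = defaultdict(Counter)
    for items in orders:
        for a, b in combinations(sorted(set(items)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def also_bought(co, item, k=3):
    """Return up to k items most often bought together with `item`."""
    return [other for other, _ in co.get(item, Counter()).most_common(k)]


orders = [
    ["phone", "case"],
    ["phone", "case"],
    ["phone", "case", "charger"],
    ["laptop", "mouse"],
]
co = build_cooccurrence(orders)
suggestions = also_bought(co, "phone", k=2)  # ["case", "charger"]
```

Because the counts depend only on order history, a table like this can be pre-computed in batch and cached, with a default list used for unseen items.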

Performance:

  • Pre-compute: Pre-compute recommendations (batch)
  • Real-time Updates: Update based on recent activity
  • Caching: Cache recommendations per user
  • Fallback: Default recommendations if personalization fails

5. Shopping Cart Service

Responsibilities:

  • Add/remove items
  • Update quantities
  • Save for later
  • Cart persistence

Cart Storage:

  • User Cart: Stored per user
  • Session Cart: For anonymous users
  • Persistence: Save cart across sessions

Cart Operations:

  • Add Item: Add product to cart
  • Update Quantity: Update item quantity
  • Remove Item: Remove from cart
  • Clear Cart: Empty cart
  • Price Updates: Update prices if changed

Optimizations:

  • Caching: Cache cart in Redis
  • Async Price Updates: Update prices asynchronously
  • Cart Expiry: Expire abandoned carts

6. Order Service

Order Flow:

1. User places order
   │
2. Validate cart and inventory
   │
3. Calculate total (prices, shipping, tax)
   │
4. Reserve inventory
   │
5. Process payment (release the reservation if payment fails)
   │
6. Create order record
   │
7. Send confirmation email
   │
8. Fulfill order (warehouse)

Order Storage:

  • SQL Database: ACID transactions (critical)
  • Order History: Store all orders
  • Order Status: Track order status

Order Processing:

  • Synchronous: Critical operations (payment, inventory)
  • Asynchronous: Non-critical (email, notifications)
  • Idempotency: Deduplicate retried order requests (e.g., with an idempotency key) so a retry never creates a second order
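One common way to get idempotent order creation is a client-supplied idempotency key: the first request with a given key creates the order, and any retry with the same key returns the stored result. The sketch below keeps the key map in memory; the class and field names are hypothetical, and a real service would enforce the key with a unique constraint in the order database.

```python
import uuid


class OrderService:
    """Toy order service that deduplicates requests by idempotency key."""

    def __init__(self):
        self._orders_by_key = {}  # idempotency key -> order record

    def place_order(self, idempotency_key, user_id, items):
        existing = self._orders_by_key.get(idempotency_key)
        if existing is not None:
            return existing  # retry: return the original order unchanged
        order = {
            "order_id": str(uuid.uuid4()),
            "user_id": user_id,
            "items": list(items),
            "status": "CREATED",
        }
        self._orders_by_key[idempotency_key] = order
        return order


service = OrderService()
original = service.place_order("key-123", "user-1", ["sku-1"])
retry = service.place_order("key-123", "user-1", ["sku-1"])  # e.g. a client timeout retry
other = service.place_order("key-456", "user-1", ["sku-1"])
```

The retry returns the same order_id as the original call, while a request with a new key creates a separate order.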

7. Payment Service

Requirements:

  • Security: PCI DSS compliance
  • Reliability: 99.99% uptime
  • Fraud Detection: Detect fraudulent transactions
  • Multiple Payment Methods: Credit cards, PayPal, etc.

Payment Flow:

1. User enters payment info
   │
2. Payment service validates
   │
3. Fraud detection check
   │
4. Process payment (payment gateway)
   │
5. Payment confirmation
   │
6. Update order status

Security:

  • Encryption: Encrypt payment data
  • Tokenization: Store opaque tokens instead of raw card numbers
  • PCI Compliance: Follow PCI DSS standards
  • Fraud Detection: ML-based fraud detection
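The idea behind tokenization can be shown with a toy vault: the order system keeps only a random token, and the mapping back to the card number lives in a separate, tightly controlled component. Everything here is an assumption for illustration (the TokenVault class, the "tok_" prefix, the in-memory dictionary); it is not a secure or PCI DSS compliant implementation, and in practice this role is usually delegated to a payment gateway.

```python
import secrets


class TokenVault:
    """Toy vault: maps random tokens to card numbers (illustration only)."""

    def __init__(self):
        self._cards = {}  # token -> card number; a real vault encrypts and isolates this

    def tokenize(self, card_number: str) -> str:
        token = "tok_" + secrets.token_urlsafe(16)  # random, not derived from the card
        self._cards[token] = card_number
        return token

    def last4(self, token: str) -> str:
        """Expose only the last four digits, e.g. for receipts."""
        return self._cards[token][-4:]


vault = TokenVault()
card = "4242424242424242"  # widely used test card number, not a real account
token = vault.tokenize(card)
order_record = {"order_id": "o-1", "payment_token": token}  # no card number stored
```

Because the token is random, a leak of the order database reveals nothing about the underlying card.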

Data Flow

Product Browse Flow

1. User browses products
   │
2. Catalog service returns products
   │
3. Product images served from CDN
   │
4. Recommendations shown
   │
5. User clicks product
   │
6. Product details page loaded

Search Flow

1. User enters search query
   │
2. Autocomplete suggestions shown
   │
3. User submits search
   │
4. Search service queries index
   │
5. Results ranked and returned
   │
6. Filters applied (if any)
   │
7. Results displayed

Order Placement Flow

1. User adds items to cart
   │
2. User proceeds to checkout
   │
3. Cart validated
   │
4. Inventory checked
   │
5. Inventory reserved
   │
6. Payment processed
   │
7. Order created
   │
8. Confirmation sent

Scaling Strategies

1. Polyglot Persistence

SQL Databases:

  • Orders, payments, inventory (ACID required)
  • Strong consistency
  • Vertical and horizontal scaling

NoSQL Databases:

  • Product catalog (flexible schema)
  • High read throughput
  • Horizontal scaling

Caching:

  • Redis for hot data
  • CDN for static content
  • Application-level caching

2. Microservices Architecture

Service Independence:

  • Each service scales independently
  • Technology diversity
  • Team autonomy
  • Fault isolation

Service Communication:

  • REST APIs
  • Message queues (async)
  • Event-driven architecture

3. Database Sharding

Product Catalog:

  • Shard by product ID or category
  • Distribute across databases
  • Handle cross-shard queries

Orders:

  • Shard by user ID or order ID
  • Geographic sharding
  • Archive old orders

4. Caching Strategy

Product Catalog:

  • Cache popular products
  • Cache product details
  • Cache categories

Search:

  • Cache popular searches
  • Cache search results
  • Cache autocomplete

Recommendations:

  • Cache per-user recommendations
  • Pre-compute recommendations
  • Update cache periodically
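The caching bullets above generally follow a cache-aside pattern with a time-to-live: read from the cache, fall back to the source on a miss or expiry, then store the result. A minimal in-process sketch follows; the TTLCache class, the loader, and the 60-second TTL are assumptions, and in this design the same role would be played by Redis. The clock is injectable so that expiry can be demonstrated without sleeping.

```python
import time


class TTLCache:
    """Cache-aside helper with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit
        value = loader(key)  # miss or expired: go to the source
        self._store[key] = (now + self._ttl, value)
        return value


db_reads = []  # records every call that reaches the "database"


def load_product(product_id):
    db_reads.append(product_id)
    return {"id": product_id, "title": f"Product {product_id}"}


fake_now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])

cache.get_or_load("p1", load_product)  # miss: loads from the source
cache.get_or_load("p1", load_product)  # hit: served from cache
fake_now[0] = 61.0
cache.get_or_load("p1", load_product)  # expired: loads again
```

Three reads result in two loads from the source; a shorter TTL trades more source traffic for fresher data.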

5. CDN and Edge Computing

Static Content:

  • Product images on CDN
  • Serve from edge locations
  • Reduce origin load

Dynamic Content:

  • Edge computing for recommendations
  • Geographic routing
  • Reduce latency

Key Design Decisions

1. Polyglot Persistence

Decision: Use SQL for transactions, NoSQL for catalog

Rationale:

  • Right tool for right job
  • SQL: ACID for critical data
  • NoSQL: Flexibility and scale for catalog

Trade-offs:

  • ✅ Optimized for each use case
  • ✅ Better performance
  • ❌ More complexity
  • ❌ Multiple systems to manage

2. Microservices Architecture

Decision: Use microservices instead of monolith

Rationale:

  • Independent scaling
  • Technology diversity
  • Team autonomy
  • Fault isolation

Trade-offs:

  • ✅ Better scalability
  • ✅ Technology flexibility
  • ❌ Higher complexity
  • ❌ Network overhead

3. Distributed Search

Decision: Use a distributed search index (Elasticsearch, Solr)

Rationale:

  • Fast search across millions of products
  • Relevant ranking
  • Handle high query volume
  • Real-time indexing

Trade-offs:

  • ✅ Fast search
  • ✅ Relevant results
  • ❌ More infrastructure
  • ❌ Index management complexity

4. ML-Based Recommendations

Decision: Use ML for personalized recommendations

Rationale:

  • Increase sales
  • Better user experience
  • Competitive advantage
  • Data-driven decisions

Trade-offs:

  • ✅ Better recommendations
  • ✅ Increased sales
  • ❌ ML infrastructure needed
  • ❌ Continuous improvement required

Challenges and Solutions

Challenge 1: Inventory Consistency

Problem: Prevent overselling (selling more than available)

Solution:

  • SQL database with ACID transactions
  • Reserve inventory on checkout
  • Optimistic locking
  • Real-time inventory updates

Challenge 2: Search at Scale

Problem: Search across 12M+ products quickly

Solution:

  • Distributed search index
  • Inverted index for fast text search
  • Caching popular searches
  • Sharding search index

Challenge 3: Recommendations at Scale

Problem: Provide personalized recommendations to 300M+ users

Solution:

  • ML-based recommendation system
  • Pre-compute recommendations (batch)
  • Real-time updates
  • Efficient caching

Challenge 4: Traffic Spikes

Problem: Handle traffic spikes (Black Friday, Prime Day)

Solution:

  • Auto-scaling
  • CDN for static content
  • Caching aggressively
  • Load balancing
  • Queue-based processing

Challenge 5: Payment Security

Problem: Secure payment processing

Solution:

  • PCI DSS compliance
  • Encryption and tokenization
  • Fraud detection (ML)
  • Secure payment gateways

Monitoring and Observability

Key Metrics

Performance Metrics:

  • Page load time
  • Search latency
  • Order processing time
  • Payment processing time

Business Metrics:

  • Daily active users
  • Orders per day
  • Revenue
  • Conversion rate

Infrastructure Metrics:

  • Server utilization
  • Database performance
  • Cache hit rate
  • CDN performance

Alerting

  • Alert on high error rates
  • Alert on inventory issues
  • Alert on payment failures
  • Alert on high latency

Best Practices

1. Inventory Management

  • Use SQL for consistency
  • Reserve on checkout
  • Real-time updates
  • Monitor inventory levels

2. Search Optimization

  • Use distributed search index
  • Cache popular searches
  • Optimize ranking algorithms
  • Handle typos and synonyms

3. Recommendations

  • Continuously improve models
  • A/B test new algorithms
  • Monitor recommendation quality
  • Balance exploration vs exploitation

4. Caching Strategy

  • Cache hot data
  • Use CDN for static content
  • Cache recommendations
  • Monitor cache hit rates

Quick Reference Summary

Amazon: One of the world's largest e-commerce platforms, with 300M+ customers and 12M+ products.

Key Components:

  • Product catalog (NoSQL)
  • Inventory management (SQL)
  • Search service (distributed index)
  • Recommendation service (ML-based)
  • Order processing
  • Payment service

Key Design Decisions:

  • Polyglot persistence (SQL + NoSQL)
  • Microservices architecture
  • Distributed search
  • ML-based recommendations

Scaling Strategies:

  • Horizontal scaling of microservices
  • Database sharding
  • Aggressive caching
  • CDN for static content

Remember: Amazon's success comes from combining polyglot persistence (right tool for right job), sophisticated recommendations (ML), and scalable microservices architecture to handle massive scale and complexity.

