Threads
📋 Quick Reference
| Aspect | Details |
|---|---|
| Definition | Lightweight process that shares memory space with other threads |
| vs Process | Threads share memory; processes have separate memory |
| Creation Cost | Lower than processes (faster to create, less memory) |
| Communication | Shared memory (faster than inter-process communication) |
| Concurrency Model | Preemptive multitasking (OS schedules threads) |
| Use Cases | I/O-bound tasks, parallel processing, responsive UIs |
| Challenges | Race conditions, deadlocks, synchronization |
TL;DR: Threads are lightweight execution units within a process. They share memory, enabling efficient parallel processing but requiring careful synchronization to avoid race conditions.
Clear Definition
A thread is the smallest unit of execution that can be managed independently by an operating system scheduler. Threads within the same process share the process's memory space, code, and data, but each thread has its own stack and registers.
Think of a process as a house (with its own address space), and threads as people living in that house (sharing the same address space but having their own personal space/stack).
💡 Key Insight: Threads enable concurrent execution within a single process. They're lighter than processes because they share memory, making context switching faster. However, this shared memory requires synchronization mechanisms (locks, mutexes) to prevent race conditions.
Core Concepts
Thread Basics
Process vs Thread:
- Process: Independent program with its own memory space
  - Isolated memory (can't directly access another process's memory)
  - Heavier (more memory, slower to create)
  - Inter-process communication (IPC) required for sharing data
- Thread: Execution unit within a process
  - Shares memory with other threads in the same process
  - Lighter (less memory, faster to create)
  - Direct memory access (no IPC needed, but requires synchronization)
Thread States:
- New: Thread created but not started
- Runnable: Thread ready to run, waiting for CPU
- Running: Thread executing on CPU
- Blocked: Thread waiting for I/O or lock
- Terminated: Thread finished execution
Thread Lifecycle:
New → Runnable → Running → Blocked → Runnable → Running → Terminated
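A minimal Java sketch of this lifecycle; Java's `Thread.State` enum maps closely onto the states above, with `TIMED_WAITING` playing the role of Blocked here (the sleep durations are illustrative):

```java
// Observing thread states through the lifecycle: NEW -> RUNNABLE ->
// TIMED_WAITING (blocked on sleep) -> TERMINATED.
public class ThreadLifecycleDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(100); // thread blocks here (TIMED_WAITING)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        System.out.println(worker.getState()); // NEW: created, not started
        worker.start();
        System.out.println(worker.getState()); // RUNNABLE (usually)
        Thread.sleep(50);
        System.out.println(worker.getState()); // TIMED_WAITING: blocked in sleep
        worker.join();                          // wait for it to finish
        System.out.println(worker.getState()); // TERMINATED
    }
}
```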
Concurrency Models
1. Preemptive Multithreading (Most Common)
- OS scheduler decides when threads run
- Threads can be interrupted at any time
- Used in: Java, C#, Python (with GIL limitations), C++
2. Cooperative Multithreading
- Threads voluntarily yield control
- One misbehaving thread can block others
- Used in: some language runtimes (Go goroutines are a hybrid: scheduled by the Go runtime, which can also preempt long-running goroutines)
3. Green Threads / User-Level Threads
- Managed by runtime/library, not OS
- Many-to-one mapping (multiple green threads → one OS thread)
- Example: Early Java versions
Thread Synchronization
Why Needed:
- Multiple threads accessing shared data simultaneously
- Race conditions: Unpredictable behavior due to timing
- Example: Two threads incrementing the same counter (see the sketch below)
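A minimal sketch of that counter example in Java: `counter++` is a read-modify-write, so the two threads' updates interleave and increments are lost (class name and iteration count are illustrative):

```java
// Race condition: two threads increment a shared counter without
// synchronization, so some increments are silently lost.
public class RaceDemo {
    static int counter = 0; // shared, unsynchronized

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter++; // read-modify-write: NOT atomic
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Expected 200000, but typically prints less: updates were lost
        // whenever both threads read the same old value.
        System.out.println(counter);
    }
}
```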
Synchronization Mechanisms:
- Mutex (Mutual Exclusion)
  - Lock that only one thread can hold
  - Other threads wait until the lock is released
  - Example: `pthread_mutex_t` in C, `synchronized` in Java (see the sketch after this list)
- Semaphore
  - Counter that controls access to a resource
  - Allows up to N threads to access simultaneously
  - Example: Database connection pool (max 10 connections)
- Condition Variables
  - Threads wait for a condition to become true
  - Used together with mutexes
  - Example: Producer-consumer pattern
- Atomic Operations
  - Operations that complete entirely or not at all
  - Hardware-level guarantees
  - Example: Atomic increment, compare-and-swap
- Read-Write Locks
  - Multiple readers OR one writer
  - Optimizes for read-heavy workloads
  - Example: Shared configuration that's read often, written rarely
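As referenced in the mutex item above, here is a hedged Java sketch of two of these mechanisms applied to the counter race: a mutex via `synchronized`, and an atomic increment via the JDK's `AtomicInteger` (the class and method names other than the JDK APIs are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Two standard fixes for the lost-update race on a shared counter.
public class SafeCounters {
    private int counter = 0;
    private final Object lock = new Object();                 // mutex
    private final AtomicInteger atomic = new AtomicInteger(); // atomic counter

    // Mutex: only one thread at a time can run the critical section.
    public void incrementLocked() {
        synchronized (lock) {
            counter++;
        }
    }

    // Atomic operation: hardware-level compare-and-swap, no explicit lock.
    public void incrementAtomic() {
        atomic.incrementAndGet();
    }
}
```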
Use Cases
When to Use Threads
- I/O-Bound Applications: Waiting for network or disk I/O
  - Example: Web server handling multiple requests
  - One thread waits for a database response while other threads serve other requests
  - Better CPU utilization
- Parallel Processing: CPU-intensive tasks on multiple cores
  - Example: Image processing, video encoding
  - Divide work across threads to utilize all CPU cores
  - Example: Process 4 images simultaneously on a 4-core CPU
- Responsive User Interfaces: Keep the UI responsive during long operations
  - Example: Desktop applications
  - A background thread does heavy computation while the UI thread stays responsive
  - Example: File download progress bar
- Concurrent Servers: Handle multiple clients simultaneously
  - Example: Chat server, game server
  - One thread per client connection
  - Example: 1000 concurrent users = 1000 threads (see scalability limits below)
- Producer-Consumer Patterns: Separate production from consumption (see the sketch after this list)
  - Example: Log processing system
  - Producer threads generate logs, consumer threads process them
  - A queue sits between them
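A minimal producer-consumer sketch in Java using the JDK's `ArrayBlockingQueue`, which handles the waiting and signaling (condition variables) internally; the log-line strings and counts are placeholders:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Producer-consumer: one thread produces log lines, another consumes them.
// The bounded queue blocks the producer when full and the consumer when empty.
public class LogPipeline {
    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 10; i++) {
                    queue.put("log line " + i); // blocks if the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 10; i++) {
                    String line = queue.take(); // blocks if the queue is empty
                    System.out.println("processed: " + line);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
    }
}
```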
When NOT to Use Threads
- CPU-Bound Work on Single-Core Systems: No benefit, overhead only
  - Example: Embedded systems with a single core
  - Context-switching overhead without parallelism
- Simple Sequential Programs: Unnecessary complexity
  - Example: Command-line utilities
  - If no parallelism is needed, don't add threads
- Languages with a GIL (Global Interpreter Lock): Limited parallelism
  - Example: Python (CPython)
  - The GIL prevents true parallelism (only one thread executes Python code at a time)
  - Use multiprocessing instead for CPU-bound tasks
- Highly Contended Resources: Lock contention becomes the bottleneck
  - Example: All threads updating the same database row
  - Consider alternative architectures (queues, async)
Advantages & Disadvantages
Thread Advantages
✅ Efficient Resource Sharing: Threads share memory directly
- No IPC overhead
- Faster than inter-process communication
- Example: Shared in-memory cache
✅ Responsive Applications: Keep the UI responsive
- Background threads handle heavy work
- The user never sees the UI freeze
✅ Better Resource Utilization: Use multiple CPU cores
- Parallel execution on multicore systems
- Example: An 8-core CPU can run 8 threads simultaneously
✅ Lower Overhead: Lighter than processes
- Faster creation (microseconds vs milliseconds)
- Less memory per thread (~1-2 MB vs ~10-100 MB for a process)
- Faster context switching
✅ Simpler Communication: Shared memory
- No need for pipes, sockets, or shared memory segments
- Direct variable access (with synchronization)
Thread Disadvantages
❌ Synchronization Complexity: Race conditions, deadlocks
- Difficult to debug
- Example: Deadlock when two threads each wait for the other's lock
❌ Shared State Issues: Bugs are hard to reproduce
- Timing-dependent bugs
- Example: A race condition that appears once in 1000 runs
❌ No Fault Isolation: One thread's crash can affect the others
- A process crash affects all of its threads
- Example: A segmentation fault in one thread crashes the entire process
❌ Scalability Limits: Too many threads cause overhead
- Context-switching overhead
- Rule of thumb: ~100-1000 threads per process (OS dependent)
- Example: 10,000 threads = mostly context switching, little useful work
❌ Platform Dependence: Different threading models
- POSIX threads (pthreads) vs Windows threads
- Portability concerns
Best Practices
- Minimize Shared State
  - Prefer immutable data structures
  - Use thread-local storage when possible
  - Example: Each thread keeps its own counter instead of sharing one
- Use Thread Pools (see the sketch after this list)
  - Don't create threads on demand
  - Reuse threads (creation is expensive)
  - Example: Java ExecutorService, Python ThreadPoolExecutor
  - Typical pool size: ~2x CPU cores for CPU-bound work, 10-100 for I/O-bound
- Choose Lock Granularity Carefully
  - Fine-grained locks (lock only what's necessary)
  - Coarse-grained locks reduce parallelism
  - Example: Lock a specific data structure, not the entire function
- Avoid Deadlocks
  - Always acquire locks in the same order
  - Use timeouts on lock acquisition
  - Example: Lock A then B (never B then A)
- Prefer Higher-Level Abstractions
  - Use concurrent data structures (java.util.concurrent)
  - Use async/await patterns (modern languages)
  - Example: `ConcurrentHashMap` instead of `HashMap` with locks
- Monitor and Profile
  - Track thread count and CPU usage per thread
  - Identify lock contention
  - Use profiling tools (JProfiler, VisualVM)
- Consider Alternatives
  - Async/await for I/O-bound work (Node.js, Python asyncio)
  - Processes for CPU-bound work (Python multiprocessing)
  - Message passing (Erlang, Go channels)
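As referenced in the thread-pool item above, a minimal sketch using Java's `ExecutorService` (a standard JDK API); the pool sizing and the print-only task are illustrative assumptions:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Thread pool: a fixed set of reusable threads drains a work queue,
// instead of creating one thread per task.
public class PoolDemo {
    public static void main(String[] args) throws InterruptedException {
        // CPU-bound sizing: roughly one worker per core.
        int poolSize = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);

        for (int i = 0; i < 100; i++) {
            final int taskId = i;
            pool.submit(() -> System.out.println(
                "task " + taskId + " on " + Thread.currentThread().getName()));
        }

        pool.shutdown();                              // stop accepting new tasks
        pool.awaitTermination(10, TimeUnit.SECONDS);  // wait for queued work
    }
}
```

For I/O-bound workloads the pool would typically be sized well above the core count, per the guideline in the list above.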
Common Pitfalls
⚠️ Common Mistake: Creating too many threads
- Problem: Context-switching overhead, memory exhaustion
- Solution: Use thread pools and limit the thread count (typically ~2x CPU cores for CPU-bound work)
⚠️ Common Mistake: Race conditions on shared variables
- Problem: Unpredictable behavior, hard to debug
- Solution: Use synchronization (locks, atomic operations), minimize shared state
⚠️ Common Mistake: Deadlocks (see the lock-ordering sketch after this list)
- Problem: Threads wait forever
- Solution: Consistent lock ordering, timeouts on locks, avoid nested locks
⚠️ Common Mistake: False sharing
- Problem: Threads on different CPUs write to the same cache line
- Solution: Pad data structures, use thread-local storage
⚠️ Common Mistake: Ignoring thread safety in libraries
- Problem: Using non-thread-safe code in a multithreaded environment
- Solution: Check the documentation, use thread-safe alternatives
⚠️ Common Mistake: Blocking operations on the UI thread
- Problem: The UI freezes
- Solution: Move blocking operations to background threads
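A hedged Java sketch of the consistent-lock-ordering rule mentioned above; the lock names and the transfer scenario are hypothetical:

```java
// Deadlock avoidance by consistent lock ordering: every code path acquires
// lockA before lockB, so no cycle of "each thread holds one lock and waits
// for the other" can form.
public class LockOrdering {
    private static final Object lockA = new Object();
    private static final Object lockB = new Object();

    // Correct: all threads take lockA, then lockB.
    static void transfer() {
        synchronized (lockA) {
            synchronized (lockB) {
                // ... update shared state guarded by both locks ...
            }
        }
    }

    // Deadlock-prone (for contrast): this inverts the order. If one thread
    // runs transfer() while another runs this, each can end up holding one
    // lock and waiting forever for the other.
    static void transferInverted() {
        synchronized (lockB) {
            synchronized (lockA) {
                // ...
            }
        }
    }
}
```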
Interview Tips
🎯 Interview Focus: Understanding concurrency, synchronization, and trade-offs
Common Questions:
- "What's the difference between a process and a thread?"
  - Answer: A process has isolated memory; threads share memory. Threads are lighter and faster to create, but require synchronization.
- "How would you handle 10,000 concurrent requests?"
  - Answer: A thread pool (not 10,000 threads!), async I/O, or an event-driven architecture. Thread pool size: ~100-200 threads for I/O-bound work.
- "What is a race condition? How do you prevent it?"
  - Answer: Unpredictable behavior when threads access shared data without synchronization. Prevent it with locks, atomic operations, and immutable data.
- "Explain deadlock. How do you avoid it?"
  - Answer: Two threads each waiting for the other's lock. Avoid it with consistent lock ordering, timeouts, and by avoiding nested locks.
- "When would you use threads vs processes vs async?"
  - Answer: Threads for I/O-bound work with shared state. Processes for CPU-bound work or isolation. Async for high-concurrency I/O without threads.
Red Flags to Avoid:
- Creating one thread per request (use thread pools)
- Ignoring synchronization needs
- Not understanding GIL limitations (Python)
- Overlooking deadlock possibilities
Related Topics
- Load Balancing (Step 6): Distributes load across processes/servers (threads are within process)
- Microservices (Step 8): Process-level distribution (threads are within process)
- Message Queues (Step 7): Alternative to threads for concurrency
- Caching (Step 4): Thread-safe caching is important
Visual Aids
Process vs Thread Memory Model
Process A:              Process B:
┌─────────────┐         ┌─────────────┐
│   Memory    │         │   Memory    │
│   Space     │         │   Space     │
│             │         │             │
│ ┌─────────┐ │         │ ┌─────────┐ │
│ │ Thread1 │ │         │ │ Thread1 │ │
│ └─────────┘ │         │ └─────────┘ │
│ ┌─────────┐ │         │ ┌─────────┐ │
│ │ Thread2 │ │         │ │ Thread2 │ │
│ └─────────┘ │         │ └─────────┘ │
└─────────────┘         └─────────────┘
  (Isolated)              (Isolated)

Within Process A:
┌─────────────────────────────┐
│        Shared Memory        │
│     (Code, Data, Heap)      │
│                             │
│  ┌─────────┐  ┌─────────┐   │
│  │ Thread1 │  │ Thread2 │   │
│  │  Stack  │  │  Stack  │   │
│  └─────────┘  └─────────┘   │
└─────────────────────────────┘
Thread Pool Pattern
   Request Queue
        │
        ▼
┌─────────────────┐
│   Thread Pool   │
│  (Fixed Size)   │
│                 │
│  ┌───┐  ┌───┐   │
│  │ T │  │ T │   │
│  │ 1 │  │ 2 │   │
│  └───┘  └───┘   │
│  ┌───┐  ┌───┐   │
│  │ T │  │ T │   │
│  │ 3 │  │ 4 │   │
│  └───┘  └───┘   │
└─────────────────┘