Threads
📋 Quick Reference
| Aspect | Details |
|---|---|
| Definition | Lightweight process that shares memory space with other threads |
| vs Process | Threads share memory; processes have separate memory |
| Creation Cost | Lower than processes (faster to create, less memory) |
| Communication | Shared memory (faster than inter-process communication) |
| Concurrency Model | Preemptive multitasking (OS schedules threads) |
| Use Cases | I/O-bound tasks, parallel processing, responsive UIs |
| Challenges | Race conditions, deadlocks, synchronization |
TL;DR: Threads are lightweight execution units within a process. They share memory, enabling efficient parallel processing but requiring careful synchronization to avoid race conditions.
Clear Definition
A thread is the smallest unit of execution that can be managed independently by an operating system scheduler. Threads within the same process share the process's memory space, code, and data, but each thread has its own stack and registers.
Think of a process as a house (with its own address space), and threads as people living in that house (sharing the same address space but having their own personal space/stack).
💡 Key Insight: Threads enable concurrent execution within a single process. They're lighter than processes because they share memory, making context switching faster. However, this shared memory requires synchronization mechanisms (locks, mutexes) to prevent race conditions.
Core Concepts
Thread Basics
Process vs Thread:
- Process: Independent program with its own memory space
  - Isolated memory (can't directly access another process's memory)
  - Heavier (more memory, slower to create)
  - Inter-process communication (IPC) required for sharing data
- Thread: Execution unit within a process
  - Shares memory with other threads in the same process
  - Lighter (less memory, faster to create)
  - Direct memory access (no IPC needed, but requires synchronization)
Thread States:
- New: Thread created but not started
- Runnable: Thread ready to run, waiting for CPU
- Running: Thread executing on CPU
- Blocked: Thread waiting for I/O or lock
- Terminated: Thread finished execution
Thread Lifecycle:
New → Runnable → Running → Blocked → Runnable → Running → Terminated
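A minimal Java sketch of this lifecycle; Java's `Thread.State` enum maps closely onto the states above, with `TIMED_WAITING` playing the role of Blocked here (the sleep durations are illustrative):

```java
// Observing thread states through the lifecycle: NEW -> RUNNABLE ->
// TIMED_WAITING (blocked on sleep) -> TERMINATED.
public class ThreadLifecycleDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(100); // thread blocks here (TIMED_WAITING)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        System.out.println(worker.getState()); // NEW: created, not started
        worker.start();
        System.out.println(worker.getState()); // RUNNABLE (usually)
        Thread.sleep(50);
        System.out.println(worker.getState()); // TIMED_WAITING: blocked in sleep
        worker.join();                          // wait for it to finish
        System.out.println(worker.getState()); // TERMINATED
    }
}
```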
Concurrency Models
1. Preemptive Multithreading (Most Common)
- OS scheduler decides when threads run
- Threads can be interrupted at any time
- Used in: Java, C#, Python (with GIL limitations), C++
2. Cooperative Multithreading
- Threads voluntarily yield control
- One misbehaving thread can block others
- Used in: some language runtimes (Go goroutines are a hybrid: scheduled by the Go runtime, which can also preempt long-running goroutines)
3. Green Threads / User-Level Threads
- Managed by runtime/library, not OS
- Many-to-one mapping (multiple green threads → one OS thread)
- Example: Early Java versions
Thread Synchronization
Why Needed:
- Multiple threads accessing shared data simultaneously
- Race conditions: Unpredictable behavior due to timing
- Example: Two threads incrementing the same counter (see the sketch below)
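A minimal sketch of that counter example in Java: `counter++` is a read-modify-write, so the two threads' updates interleave and increments are lost (class name and iteration count are illustrative):

```java
// Race condition: two threads increment a shared counter without
// synchronization, so some increments are silently lost.
public class RaceDemo {
    static int counter = 0; // shared, unsynchronized

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter++; // read-modify-write: NOT atomic
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Expected 200000, but typically prints less: updates were lost
        // whenever both threads read the same old value.
        System.out.println(counter);
    }
}
```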
Synchronization Mechanisms:
- Mutex (Mutual Exclusion)
  - Lock that only one thread can hold
  - Other threads wait until the lock is released
  - Example: `pthread_mutex_t` in C, `synchronized` in Java (see the sketch after this list)
- Semaphore
  - Counter that controls access to a resource
  - Allows up to N threads to access simultaneously
  - Example: Database connection pool (max 10 connections)
- Condition Variables
  - Threads wait for a condition to become true
  - Used together with mutexes
  - Example: Producer-consumer pattern
- Atomic Operations
  - Operations that complete entirely or not at all
  - Hardware-level guarantees
  - Example: Atomic increment, compare-and-swap
- Read-Write Locks
  - Multiple readers OR one writer
  - Optimizes for read-heavy workloads
  - Example: Shared configuration that's read often, written rarely
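As referenced in the mutex item above, here is a hedged Java sketch of two of these mechanisms applied to the counter race: a mutex via `synchronized`, and an atomic increment via the JDK's `AtomicInteger` (the class and method names other than the JDK APIs are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Two standard fixes for the lost-update race on a shared counter.
public class SafeCounters {
    private int counter = 0;
    private final Object lock = new Object();                 // mutex
    private final AtomicInteger atomic = new AtomicInteger(); // atomic counter

    // Mutex: only one thread at a time can run the critical section.
    public void incrementLocked() {
        synchronized (lock) {
            counter++;
        }
    }

    // Atomic operation: hardware-level compare-and-swap, no explicit lock.
    public void incrementAtomic() {
        atomic.incrementAndGet();
    }
}
```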
Use Cases
When to Use Threads
- I/O-Bound Applications: Waiting for network or disk I/O
  - Example: Web server handling multiple requests
  - One thread waits for a database response while other threads serve other requests
  - Better CPU utilization
- Parallel Processing: CPU-intensive tasks on multiple cores
  - Example: Image processing, video encoding
  - Divide work across threads to utilize all CPU cores
  - Example: Process 4 images simultaneously on a 4-core CPU
- Responsive User Interfaces: Keep the UI responsive during long operations
  - Example: Desktop applications
  - A background thread does heavy computation while the UI thread stays responsive
  - Example: File download progress bar
- Concurrent Servers: Handle multiple clients simultaneously
  - Example: Chat server, game server
  - One thread per client connection
  - Example: 1000 concurrent users = 1000 threads (see scalability limits below)
- Producer-Consumer Patterns: Separate production from consumption (see the sketch after this list)
  - Example: Log processing system
  - Producer threads generate logs, consumer threads process them
  - A queue sits between them
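A minimal producer-consumer sketch in Java using the JDK's `ArrayBlockingQueue`, which handles the waiting and signaling (condition variables) internally; the log-line strings and counts are placeholders:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Producer-consumer: one thread produces log lines, another consumes them.
// The bounded queue blocks the producer when full and the consumer when empty.
public class LogPipeline {
    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 10; i++) {
                    queue.put("log line " + i); // blocks if the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 10; i++) {
                    String line = queue.take(); // blocks if the queue is empty
                    System.out.println("processed: " + line);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
    }
}
```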
When NOT to Use Threads
- CPU-Bound Work on Single-Core Systems: No benefit, overhead only
  - Example: Embedded systems with a single core
  - Context-switching overhead without parallelism
- Simple Sequential Programs: Unnecessary complexity
  - Example: Command-line utilities
  - If no parallelism is needed, don't add threads
- Languages with a GIL (Global Interpreter Lock): Limited parallelism
  - Example: Python (CPython)
  - The GIL prevents true parallelism (only one thread executes Python code at a time)
  - Use multiprocessing instead for CPU-bound tasks
- Highly Contended Resources: Lock contention becomes the bottleneck
  - Example: All threads updating the same database row
  - Consider alternative architectures (queues, async)
Advantages & Disadvantages
Thread Advantages
✅ Efficient Resource Sharing: Threads share memory directly
- No IPC overhead
- Faster than inter-process communication
- Example: Shared in-memory cache
✅ Responsive Applications: Keep the UI responsive
- Background threads handle heavy work
- The user never sees the UI freeze
✅ Better Resource Utilization: Use multiple CPU cores
- Parallel execution on multicore systems
- Example: An 8-core CPU can run 8 threads simultaneously
✅ Lower Overhead: Lighter than processes
- Faster creation (microseconds vs milliseconds)
- Less memory per thread (~1-2 MB vs ~10-100 MB for a process)
- Faster context switching
✅ Simpler Communication: Shared memory
- No need for pipes, sockets, or shared memory segments
- Direct variable access (with synchronization)
Thread Disadvantages
❌ Synchronization Complexity: Race conditions, deadlocks
- Difficult to debug
- Example: Deadlock when two threads each wait for the other's lock
❌ Shared State Issues: Bugs are hard to reproduce
- Timing-dependent bugs
- Example: A race condition that appears once in 1000 runs
❌ No Fault Isolation: One thread's crash can affect the others
- A process crash affects all of its threads
- Example: A segmentation fault in one thread crashes the entire process
❌ Scalability Limits: Too many threads cause overhead
- Context-switching overhead
- Rule of thumb: ~100-1000 threads per process (OS dependent)
- Example: 10,000 threads = mostly context switching, little useful work
❌ Platform Dependence: Different threading models
- POSIX threads (pthreads) vs Windows threads
- Portability concerns
Best Practices
- Minimize Shared State
  - Prefer immutable data structures
  - Use thread-local storage when possible
  - Example: Each thread keeps its own counter instead of sharing one
- Use Thread Pools (see the sketch after this list)
  - Don't create threads on demand
  - Reuse threads (creation is expensive)
  - Example: Java ExecutorService, Python ThreadPoolExecutor
  - Typical pool size: ~2x CPU cores for CPU-bound work, 10-100 for I/O-bound
- Choose Lock Granularity Carefully
  - Fine-grained locks (lock only what's necessary)
  - Coarse-grained locks reduce parallelism
  - Example: Lock a specific data structure, not the entire function
- Avoid Deadlocks
  - Always acquire locks in the same order
  - Use timeouts on lock acquisition
  - Example: Lock A then B (never B then A)
- Prefer Higher-Level Abstractions
  - Use concurrent data structures (java.util.concurrent)
  - Use async/await patterns (modern languages)
  - Example: `ConcurrentHashMap` instead of `HashMap` with locks
- Monitor and Profile
  - Track thread count and CPU usage per thread
  - Identify lock contention
  - Use profiling tools (JProfiler, VisualVM)
- Consider Alternatives
  - Async/await for I/O-bound work (Node.js, Python asyncio)
  - Processes for CPU-bound work (Python multiprocessing)
  - Message passing (Erlang, Go channels)
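As referenced in the thread-pool item above, a minimal sketch using Java's `ExecutorService` (a standard JDK API); the pool sizing and the print-only task are illustrative assumptions:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Thread pool: a fixed set of reusable threads drains a work queue,
// instead of creating one thread per task.
public class PoolDemo {
    public static void main(String[] args) throws InterruptedException {
        // CPU-bound sizing: roughly one worker per core.
        int poolSize = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);

        for (int i = 0; i < 100; i++) {
            final int taskId = i;
            pool.submit(() -> System.out.println(
                "task " + taskId + " on " + Thread.currentThread().getName()));
        }

        pool.shutdown();                              // stop accepting new tasks
        pool.awaitTermination(10, TimeUnit.SECONDS);  // wait for queued work
    }
}
```

For I/O-bound workloads the pool would typically be sized well above the core count, per the guideline in the list above.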
Common Pitfalls
⚠️ Common Mistake: Creating too many threads
- Problem: Context-switching overhead, memory exhaustion
- Solution: Use thread pools and limit the thread count (typically ~2x CPU cores for CPU-bound work)
⚠️ Common Mistake: Race conditions on shared variables
- Problem: Unpredictable behavior, hard to debug
- Solution: Use synchronization (locks, atomic operations), minimize shared state
⚠️ Common Mistake: Deadlocks (see the lock-ordering sketch after this list)
- Problem: Threads wait forever
- Solution: Consistent lock ordering, timeouts on locks, avoid nested locks
⚠️ Common Mistake: False sharing
- Problem: Threads on different CPUs write to the same cache line
- Solution: Pad data structures, use thread-local storage
⚠️ Common Mistake: Ignoring thread safety in libraries
- Problem: Using non-thread-safe code in a multithreaded environment
- Solution: Check the documentation, use thread-safe alternatives
⚠️ Common Mistake: Blocking operations on the UI thread
- Problem: The UI freezes
- Solution: Move blocking operations to background threads
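A hedged Java sketch of the consistent-lock-ordering rule mentioned above; the lock names and the transfer scenario are hypothetical:

```java
// Deadlock avoidance by consistent lock ordering: every code path acquires
// lockA before lockB, so no cycle of "each thread holds one lock and waits
// for the other" can form.
public class LockOrdering {
    private static final Object lockA = new Object();
    private static final Object lockB = new Object();

    // Correct: all threads take lockA, then lockB.
    static void transfer() {
        synchronized (lockA) {
            synchronized (lockB) {
                // ... update shared state guarded by both locks ...
            }
        }
    }

    // Deadlock-prone (for contrast): this inverts the order. If one thread
    // runs transfer() while another runs this, each can end up holding one
    // lock and waiting forever for the other.
    static void transferInverted() {
        synchronized (lockB) {
            synchronized (lockA) {
                // ...
            }
        }
    }
}
```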
Interview Tips
🎯 Interview Focus: Understanding concurrency, synchronization, and trade-offs
Common Questions:
- "What's the difference between a process and a thread?"
  - Answer: A process has isolated memory; threads share memory. Threads are lighter and faster to create, but require synchronization.
- "How would you handle 10,000 concurrent requests?"
  - Answer: A thread pool (not 10,000 threads!), async I/O, or an event-driven architecture. Thread pool size: ~100-200 threads for I/O-bound work.
- "What is a race condition? How do you prevent it?"
  - Answer: Unpredictable behavior when threads access shared data without synchronization. Prevent it with locks, atomic operations, and immutable data.
- "Explain deadlock. How do you avoid it?"
  - Answer: Two threads each waiting for the other's lock. Avoid it with consistent lock ordering, timeouts, and by avoiding nested locks.
- "When would you use threads vs processes vs async?"
  - Answer: Threads for I/O-bound work with shared state. Processes for CPU-bound work or isolation. Async for high-concurrency I/O without threads.
Red Flags to Avoid:
- Creating one thread per request (use thread pools)
- Ignoring synchronization needs
- Not understanding GIL limitations (Python)
- Overlooking deadlock possibilities
Related Topics
- Load Balancing (Step 6): Distributes load across processes/servers (threads are within process)
- Microservices (Step 8): Process-level distribution (threads are within process)
- Message Queues (Step 7): Alternative to threads for concurrency
- Caching (Step 4): Thread-safe caching is important
Visual Aids
Process vs Thread Memory Model
Process A:              Process B:
┌─────────────┐         ┌─────────────┐
│   Memory    │         │   Memory    │
│   Space     │         │   Space     │
│             │         │             │
│ ┌─────────┐ │         │ ┌─────────┐ │
│ │ Thread1 │ │         │ │ Thread1 │ │
│ └─────────┘ │         │ └─────────┘ │
│ ┌─────────┐ │         │ ┌─────────┐ │
│ │ Thread2 │ │         │ │ Thread2 │ │
│ └─────────┘ │         │ └─────────┘ │
└─────────────┘         └─────────────┘
  (Isolated)              (Isolated)

Within Process A:
┌─────────────────────────────┐
│        Shared Memory        │
│     (Code, Data, Heap)      │
│                             │
│  ┌─────────┐  ┌─────────┐   │
│  │ Thread1 │  │ Thread2 │   │
│  │  Stack  │  │  Stack  │   │
│  └─────────┘  └─────────┘   │
└─────────────────────────────┘
Thread Pool Pattern
   Request Queue
        │
        ▼
┌─────────────────┐
│   Thread Pool   │
│  (Fixed Size)   │
│                 │
│  ┌───┐  ┌───┐   │
│  │ T │  │ T │   │
│  │ 1 │  │ 2 │   │
│  └───┘  └───┘   │
│  ┌───┐  ┌───┐   │
│  │ T │  │ T │   │
│  │ 3 │  │ 4 │   │
│  └───┘  └───┘   │
└─────────────────┘