- Distinguish between concurrency and parallelism and identify which problems benefit from each
- Explain what the GIL is, why it exists, and how it affects threading in Python
- Use threading for I/O-bound tasks and multiprocessing for CPU-bound tasks
- Apply concurrent.futures to parallelise work with ThreadPoolExecutor and ProcessPoolExecutor
- Write async/await code using asyncio and explain when asynchronous programming is the right choice
Your program is waiting. It is waiting for a file to download, for a database query to return, for a web API to respond, for a user to click a button. While it waits, the CPU sits idle. If you have eight things to wait for, a simple sequential program waits for each one in turn, wasting most of its time doing nothing. Concurrency is the art of managing multiple tasks that overlap in time. Parallelism is the art of executing multiple tasks simultaneously on multiple CPU cores. Python supports both, but the distinction between them — and the constraints Python places on each — matters enormously. Get it right and your program finishes in seconds instead of minutes. Get it wrong and you add complexity for no benefit, or worse, introduce bugs that only appear under load.
Concurrency vs Parallelism
The classic analogy is a coffee shop. Concurrency is one barista juggling multiple orders — she starts the espresso machine, steams milk for another drink while the shot pulls, takes a new order while the milk heats. She is doing one physical thing at a time but making progress on several tasks by switching between them during idle moments. Parallelism is two baristas, each with their own espresso machine, making drinks simultaneously.
Concurrency is about structure — organising your program so that tasks can make progress when others are waiting. Parallelism is about execution — using multiple processors to do work at the same time. You can have concurrency without parallelism (one barista, many orders) and parallelism without concurrency (two baristas, each handling exactly one order from start to finish, never interleaving).
In practical terms: if your program is slow because it waits for I/O (network, disk, database), you need concurrency. If it is slow because it crunches numbers (compressing files, training a model, processing images), you need parallelism. Many real programs need both.
The GIL
The Global Interpreter Lock, or GIL, is Python's most infamous constraint. CPython — the standard Python interpreter — has a mutex that allows only one thread to execute Python bytecode at a time. Even if your machine has sixteen cores, a multi-threaded Python program will only ever run one thread's Python code at any given instant.
Why does it exist? CPython's memory management uses reference counting — every object has a counter tracking how many names point to it, and when the counter reaches zero, the object is freed. Without the GIL, every increment and decrement of every reference count would need its own lock, making single-threaded code significantly slower to protect against a race condition that most programs would never trigger.
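Reference counting is easy to observe directly. `sys.getrefcount` reports an object's current count — a small sketch (note that the call itself temporarily adds one reference, so the numbers are one higher than you might expect):

```python
import sys

a = []
# getrefcount reports one extra reference: the temporary one
# created by passing `a` into the call itself
before = sys.getrefcount(a)

b = a  # a second name now points at the same list
after = sys.getrefcount(a)

print(before, after)  # `after` is exactly one higher than `before`
```

Every assignment, function call, and container insertion bumps counts like this, which is why fine-grained locking around each one would be so costly.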
What does this mean in practice? It means that threading in Python does not give you parallelism for CPU-bound work. If you spawn four threads that each compute prime numbers, they will not run four times faster — they will run at roughly the same speed as one thread, because only one can hold the GIL at a time.
But the GIL is released during I/O operations. When a thread calls socket.recv() or file.read() or time.sleep(), it drops the GIL, and another thread can run. This is why threading is still extremely useful in Python: for I/O-bound work — making HTTP requests, reading files, querying databases — threads provide genuine concurrency.
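The CPU-bound case is easy to demonstrate with a deliberately pointless countdown loop — a minimal benchmark sketch (exact timings vary by machine, but on a standard GIL build the threaded run is typically no faster than the sequential one):

```python
import threading
import time

def count_down(n):
    # Pure Python bytecode loop: holds the GIL while it runs
    while n > 0:
        n -= 1

N = 2_000_000

start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

start = time.perf_counter()
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
t1.start()
t2.start()
t1.join()
t2.join()
threaded = time.perf_counter() - start

# On a GIL build, two threads take turns rather than running in parallel
print(f"sequential: {sequential:.2f}s  threaded: {threaded:.2f}s")
```

Swap `count_down` for a `time.sleep` call and the threaded version wins decisively — the difference between holding the GIL and releasing it.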
Threading
The threading module creates threads — lightweight units of execution within a single process that share memory:
```python
import threading
import time

def download(url):
    print(f"Starting {url}")
    time.sleep(2)  # Simulate network I/O
    print(f"Finished {url}")

urls = ["page1.html", "page2.html", "page3.html", "page4.html"]

# Sequential: ~8 seconds
for url in urls:
    download(url)

# Threaded: ~2 seconds (all downloads overlap)
threads = []
for url in urls:
    t = threading.Thread(target=download, args=(url,))
    threads.append(t)
    t.start()

for t in threads:
    t.join()  # Wait for all threads to finish
```
The sequential version takes about eight seconds — two seconds per download, one at a time. The threaded version takes about two seconds — all four downloads happen concurrently, overlapping their wait times.
The join() method blocks until the thread finishes. Without it, the main thread might exit before the worker threads complete. Always join your threads unless you explicitly want them to run as daemon threads (background threads that are killed when the main thread exits).
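A quick sketch of the daemon behaviour (`heartbeat` here is a hypothetical background task):

```python
import threading
import time

def heartbeat():
    # A background task that never returns
    while True:
        time.sleep(0.1)

# daemon=True means this thread is killed when the main thread exits,
# so the program does not hang waiting for the infinite loop to finish
t = threading.Thread(target=heartbeat, daemon=True)
t.start()
print("main thread can exit; the daemon dies with it")
```

Without `daemon=True`, this program would never terminate.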
Threads share memory, which is both their greatest advantage and their greatest danger. Two threads modifying the same data structure without coordination will produce race conditions — intermittent, hard-to-reproduce bugs where the result depends on the exact timing of thread execution. Use threading.Lock to protect shared state:
```python
lock = threading.Lock()
counter = 0

def increment():
    global counter
    for _ in range(100_000):
        with lock:
            counter += 1
```
The multiprocessing Module
When your work is CPU-bound — mathematical computation, image processing, data transformation — threads cannot help because of the GIL. The solution is processes. Each process has its own Python interpreter, its own GIL, and its own memory space. Multiple processes can truly run in parallel on multiple CPU cores.
```python
import multiprocessing
import math

def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True

numbers = [15485863, 15485867, 32452843, 32452867]

# Sequential
results = [is_prime(n) for n in numbers]

# Parallel using processes. The __main__ guard is required on platforms
# that start workers by spawning a fresh interpreter (Windows, macOS)
if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(is_prime, numbers)
    print(results)  # Output: [True, True, True, True]
```
multiprocessing.Pool creates a pool of worker processes and distributes work across them. pool.map() works like the built-in map() but runs in parallel. On a four-core machine, this can be close to four times faster for CPU-bound work.
The trade-off is overhead. Spawning a process is more expensive than spawning a thread, and communication between processes requires serialisation (converting objects to bytes with pickle) rather than simple memory sharing. For tasks that take milliseconds, the overhead of creating processes may exceed the time saved. Multiprocessing shines when individual tasks take seconds or longer.
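That serialisation requirement has a practical consequence: everything sent to a worker must survive a pickle round-trip. Plain data and module-level functions do; a lambda does not, because it has no importable name — which is why `pool.map(lambda n: n * n, ...)` raises. A small sketch:

```python
import pickle

# Arguments and return values cross the process boundary as pickled bytes.
# Plain data (ints, strings, lists, dicts) serialises cheaply:
print(len(pickle.dumps([15485863, True])))  # a payload of a few dozen bytes

# A lambda cannot be pickled, so it cannot be used as a worker function
try:
    pickle.dumps(lambda n: n * n)
    lambda_picklable = True
except Exception:
    lambda_picklable = False
print(lambda_picklable)  # False
```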
concurrent.futures: The Unified Interface
The concurrent.futures module provides a clean, unified API that works identically for threads and processes. Instead of choosing between threading.Thread and multiprocessing.Pool, you choose between ThreadPoolExecutor and ProcessPoolExecutor — and the rest of your code stays the same:
```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import time

def fetch_page(url):
    time.sleep(1)  # Simulate I/O
    return f"Content of {url}"

urls = ["page1", "page2", "page3", "page4"]

# For I/O-bound work: use threads
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(fetch_page, urls))
print(results)  # Completes in ~1 second

# For CPU-bound work: use processes
with ProcessPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(is_prime, numbers))
print(results)  # Uses all 4 cores
```
The executor.submit() method gives you more control, returning a Future object that you can check, cancel, or wait on:
```python
from concurrent.futures import ThreadPoolExecutor, as_completed

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = {executor.submit(fetch_page, url): url for url in urls}
    for future in as_completed(futures):
        url = futures[future]
        result = future.result()
        print(f"{url}: {result}")
```
as_completed() yields futures as they finish, so you can process results in the order they complete rather than the order you submitted them. This is ideal when some tasks finish faster than others and you want to start using results immediately.
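A Future's state can also be inspected directly — a minimal sketch using a hypothetical `slow_double` worker:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow_double(x):
    time.sleep(0.2)
    return x * 2

with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(slow_double, 21)
    print(future.done())      # almost certainly False right after submit
    result = future.result()  # blocks until the worker finishes
    print(result)             # 42
    print(future.done())      # now True
```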
async/await and asyncio
Threading and multiprocessing are both forms of preemptive concurrency — the operating system decides when to switch between threads or processes. Python offers a third model: cooperative concurrency with async and await, powered by the asyncio module.
In this model, a coroutine — defined with async def — explicitly yields control with the await keyword. An event loop manages all running coroutines and switches between them when one awaits:
```python
import asyncio

async def fetch_data(name, delay):
    print(f"Starting {name}")
    await asyncio.sleep(delay)  # Non-blocking sleep
    print(f"Finished {name}")
    return f"{name} data"

async def main():
    # Run three coroutines concurrently
    results = await asyncio.gather(
        fetch_data("A", 2),
        fetch_data("B", 1),
        fetch_data("C", 3),
    )
    print(f"Results: {results}")

asyncio.run(main())
# Output: Starting A, Starting B, Starting C
#         Finished B (after 1s)
#         Finished A (after 2s)
#         Finished C (after 3s)
#         Results: ['A data', 'B data', 'C data']
```
The key advantage of asyncio over threading is that there are no race conditions from preemptive switching — your code runs uninterrupted between await points, so you can reason about state more easily. The key disadvantage is that every I/O operation must be await-compatible. You cannot mix blocking calls (like requests.get() or time.sleep()) with asyncio without wrapping them in an executor.
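For one-off blocking calls, `asyncio.to_thread` (Python 3.9+) is the simplest bridge: it runs a blocking function in a worker thread while the event loop keeps running. A sketch with a hypothetical `blocking_read` as a stand-in for any blocking call:

```python
import asyncio
import time

def blocking_read():
    # Stand-in for a blocking call such as requests.get() or file I/O
    time.sleep(0.5)
    return "payload"

async def main():
    # Each call runs in its own worker thread; the three 0.5s waits
    # overlap instead of running back to back
    return await asyncio.gather(
        asyncio.to_thread(blocking_read),
        asyncio.to_thread(blocking_read),
        asyncio.to_thread(blocking_read),
    )

results = asyncio.run(main())
print(results)  # ['payload', 'payload', 'payload']
```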
For making HTTP requests asynchronously, the aiohttp library is the standard choice:
```python
import aiohttp
import asyncio

async def fetch_url(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    urls = [
        "https://httpbin.org/delay/1",
        "https://httpbin.org/delay/1",
        "https://httpbin.org/delay/1",
    ]
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_url(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
    print(f"Fetched {len(results)} pages")

asyncio.run(main())
# Three 1-second requests complete in ~1 second total
```
When to Use What
Choosing the right concurrency model depends on the nature of your workload:
- I/O-bound, few tasks: threading or concurrent.futures.ThreadPoolExecutor. Simple, well-understood, works with existing synchronous libraries.
- I/O-bound, many tasks (hundreds or thousands of connections): asyncio. Threads have overhead — each consumes memory for its stack. Asyncio handles thousands of concurrent connections with a single thread. Web servers and chat applications live here.
- CPU-bound: multiprocessing or concurrent.futures.ProcessPoolExecutor. The only way to achieve true parallelism in CPython, because each process has its own GIL.
- Mixed: combine them. A common pattern is an asyncio event loop that offloads CPU-heavy work to a ProcessPoolExecutor:
```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n):
    return sum(i * i for i in range(n))

async def main():
    # get_running_loop() is the recommended call inside a coroutine
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        result = await loop.run_in_executor(pool, cpu_heavy, 10_000_000)
    print(f"Result: {result}")

asyncio.run(main())
```
The Future: Free-Threading
The GIL has been a source of frustration for decades, and Python is finally addressing it. PEP 703 introduced an experimental free-threaded build of CPython, first available in Python 3.13 as an opt-in option. In free-threaded Python, there is no GIL — threads can truly execute Python bytecode in parallel on multiple cores.
This does not mean you should rewrite everything with threads. Free-threading introduces real parallelism to Python's threading model, which means race conditions become a genuine risk where the GIL previously masked them. Code that was thread-safe only because the GIL prevented true parallelism may break under free-threading.
The transition will be gradual. Free-threading is experimental in Python 3.13 and 3.14, and the broader ecosystem — C extensions, popular libraries, package managers — needs time to adapt. But the direction is clear: Python is moving towards a world where threads can be used for parallelism just as naturally as processes, and the GIL will eventually become a footnote in Python's history.
Concurrency is hard — not because the tools are complicated, but because thinking about tasks that overlap in time is fundamentally more difficult than thinking about one thing at a time. The good news is that Python's tools are among the most approachable in any language. Start with concurrent.futures for the simplest cases, reach for asyncio when you need thousands of concurrent connections, and remember that the answer to "should I use threads or processes?" is almost always "it depends on whether you are waiting for I/O or crunching numbers." Get that distinction right and the rest follows naturally.