Chapter Thirteen

The Standard Library

Learning Objectives
  1. Explain what 'batteries included' means and locate documentation for any standard library module
  2. Use os, sys, and pathlib to inspect the environment, manipulate paths, and navigate the filesystem
  3. Apply datetime to parse, format, and perform arithmetic on dates and times
  4. Select appropriate tools from the collections, itertools, and functools modules to simplify common programming tasks
  5. Use subprocess to run external commands and shutil to perform high-level file operations

Python ships with over three hundred modules. Not third-party packages you need to install — modules that are already there, sitting on your disk the moment you install the interpreter. Need to parse JSON? import json. Need to send an email? import smtplib. Need to compress a file? import gzip. This philosophy has a name: batteries included. The standard library is so comprehensive that many real-world programs never need a single external dependency. Knowing what is already available is one of the most valuable skills a Python programmer can develop, because the fastest code to write is code someone else already wrote and tested.

Exploring the Environment with os and sys

The os module is your interface to the operating system. It can read environment variables, query the current working directory, create directories, and more:

import os

print(os.getcwd())                   # Output: /home/user/project
print(os.environ.get("HOME"))        # Output: /home/user (None on Windows, where HOME is usually unset)
os.makedirs("data/raw", exist_ok=True)  # creates nested dirs safely
print(os.listdir("."))               # list files in current directory
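
Environment variables deserve a closer look, because a missing one is a common source of crashes. The sketch below shows the difference between .get() with a fallback and plain indexing. The variable names APP_MODE and NO_SUCH_VARIABLE_12345 are made up for illustration.

```python
import os

# APP_MODE is a hypothetical variable name used only for this example.
# .get() returns the fallback when the variable is not set.
mode = os.environ.get("APP_MODE", "development")
print(mode)

# os.environ behaves like a dict. Changes are visible to this process
# and to any child processes it starts, not to the parent shell.
os.environ["APP_MODE"] = "testing"
current = os.environ["APP_MODE"]
print(current)   # testing

# Indexing a variable that is not set raises KeyError, unlike .get()
try:
    os.environ["NO_SUCH_VARIABLE_12345"]
    missing = False
except KeyError:
    missing = True
print(missing)   # True, assuming no such variable exists on your machine
```

Prefer .get() with a sensible default for optional settings, and plain indexing when the program cannot run without the value.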

The sys module gives you access to the Python interpreter itself — command-line arguments, the module search path, and the platform:

import sys

print(sys.version)          # Output: 3.12.3 (main, Apr  9 2024, ...)
print(sys.platform)         # Output: linux (or darwin, or win32)
print(sys.argv)             # command-line arguments as a list
print(sys.path[:3])         # first three entries on the module search path

sys.argv is particularly useful for quick scripts that need to accept arguments. sys.argv[0] is the script name; sys.argv[1:] are the arguments the user passed. For anything more complex than a couple of flags, use the argparse module instead — it handles help text, type conversion, and validation automatically.
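
As a rough illustration of what argparse adds over raw sys.argv, here is a minimal sketch of a parser for a hypothetical script. The argument names path and --limit are invented for the example.

```python
import argparse

# Hypothetical script: the argument names "path" and "--limit" are illustrative.
parser = argparse.ArgumentParser(description="Print the first lines of a file.")
parser.add_argument("path", help="file to read")
parser.add_argument("--limit", type=int, default=10, help="maximum number of lines")

# Passing a list lets you try the parser without a real command line.
# Call parser.parse_args() with no arguments to read sys.argv[1:] instead.
args = parser.parse_args(["notes.txt", "--limit", "3"])
print(args.path)    # notes.txt
print(args.limit)   # 3 (already converted to int)

# Omitted options fall back to their defaults
defaults = parser.parse_args(["notes.txt"])
print(defaults.limit)   # 10
```

Running such a script with -h prints a usage message built from the help strings, which is the part that is tedious to write by hand.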

Pathlib for Modern Path Handling

String manipulation for file paths is error-prone and ugly. The pathlib module, introduced in Python 3.4, replaces all of that with Path objects that behave intuitively:

from pathlib import Path

p = Path("data") / "raw" / "measurements.csv"
print(p)              # Output: data/raw/measurements.csv
print(p.suffix)       # Output: .csv
print(p.stem)         # Output: measurements
print(p.parent)       # Output: data/raw
print(p.exists())     # Output: True or False

The / operator joins path components — no more fiddling with os.path.join(). Paths know their parts, their suffixes, and whether they exist. You can iterate over directories, read files, and glob for patterns:

for py_file in Path(".").rglob("*.py"):
    print(py_file)

content = Path("config.yaml").read_text()
Path("output.txt").write_text("done")

Use pathlib for all new code. The older os.path functions still work and appear in legacy code, but pathlib is cleaner, safer, and more Pythonic.
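
To see a few more Path methods working together, the following sketch builds a small directory tree inside a temporary directory, so it leaves nothing behind. The file and directory names are illustrative.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # mkdir(parents=True, exist_ok=True) mirrors os.makedirs(..., exist_ok=True)
    raw = root / "data" / "raw"
    raw.mkdir(parents=True, exist_ok=True)

    csv_path = raw / "measurements.csv"
    csv_path.write_text("id,value\n1,3.5\n", encoding="utf-8")

    # with_suffix returns a new Path; it does not rename anything on disk
    json_path = csv_path.with_suffix(".json")

    # iterdir lists the immediate children of a directory
    names = sorted(p.name for p in raw.iterdir())

    # relative_to strips a leading directory from a path
    relative = csv_path.relative_to(root)

print(names)                 # ['measurements.csv']
print(json_path.name)        # measurements.json
print(relative.as_posix())   # data/raw/measurements.csv
```

Path objects are immutable, which is why methods such as with_suffix hand back a new object rather than changing the one you have.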

Dates and Times with datetime

Time is deceptively complicated — leap years, time zones, daylight saving, varying month lengths. The datetime module handles the fundamentals:

from datetime import datetime, date, timedelta

now = datetime.now()
print(now)                          # Output: 2024-07-15 14:30:00.123456

birthday = date(1990, 6, 15)
age_days = (date.today() - birthday).days
print(f"You are {age_days} days old")

deadline = datetime(2024, 12, 31, 23, 59)
remaining = deadline - datetime.now()
print(f"{remaining.days} days until deadline")

timedelta represents a duration — a difference between two points in time. You can add or subtract them:

from datetime import date, timedelta

one_week = timedelta(weeks=1)
tomorrow = date.today() + timedelta(days=1)

Formatting and parsing use strftime and strptime:

print(now.strftime("%d %B %Y"))      # Output: 15 July 2024
parsed = datetime.strptime("2024-01-15", "%Y-%m-%d")

The mnemonic: strftime formats (f for format), strptime parses (p for parse). For anything involving time zones, reach for the zoneinfo module (Python 3.9+) or the third-party pytz library.
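
Parsing, arithmetic, and formatting usually appear together. A minimal sketch of the round trip, using made-up timestamps, might look like this:

```python
from datetime import datetime, timedelta, timezone

# Parse a timestamp string; the format must match the text exactly
start = datetime.strptime("2024-01-15 09:30", "%Y-%m-%d %H:%M")

# Add a duration, then format the result
end = start + timedelta(days=30, hours=2)
print(end.strftime("%Y-%m-%d %H:%M"))   # 2024-02-14 11:30
print(end.isoformat())                  # 2024-02-14T11:30:00

# Subtracting two datetimes gives a timedelta
elapsed = end - start
print(elapsed.total_seconds())          # 2599200.0

# Attaching UTC makes the datetime "aware" of its offset
aware = start.replace(tzinfo=timezone.utc)
print(aware.isoformat())                # 2024-01-15T09:30:00+00:00
```

The datetimes produced by strptime here are naive: they carry no time zone until you attach one, and Python will refuse to subtract a naive datetime from an aware one.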

Numbers: math, statistics, and random

The math module provides mathematical functions that go well beyond basic arithmetic:

import math

print(math.sqrt(144))       # Output: 12.0
print(math.log(100, 10))    # Output: 2.0
print(math.ceil(4.2))       # Output: 5
print(math.factorial(6))    # Output: 720
print(math.pi)              # Output: 3.141592653589793

The statistics module offers straightforward descriptive statistics:

import statistics

data = [4, 8, 6, 5, 3, 7, 9, 2]
print(statistics.mean(data))     # Output: 5.5
print(statistics.median(data))   # Output: 5.5
print(statistics.stdev(data))    # Output: 2.449...

The random module generates pseudo-random numbers. It is not suitable for cryptography (use secrets for that), but it is perfect for simulations, games, and sampling:

import random

print(random.randint(1, 6))            # roll a die
print(random.choice(["red", "blue"]))  # pick one
print(random.sample(range(100), 5))    # 5 unique numbers
random.shuffle(data)                   # shuffle the data list from the statistics example in place (returns None)
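
Simulations and tests often need random numbers that can be reproduced. One way to get them, sketched here, is a dedicated random.Random generator with a fixed seed. The seed value 42 is arbitrary. The last lines show secrets for the cryptographic case mentioned above.

```python
import random
import secrets

# Two generators with the same seed produce the same sequence
rng_a = random.Random(42)
rng_b = random.Random(42)

rolls_a = [rng_a.randint(1, 6) for _ in range(5)]
rolls_b = [rng_b.randint(1, 6) for _ in range(5)]
print(rolls_a == rolls_b)         # True

# shuffle works in place and returns None, so shuffle a copy
# when you still need the original order
deck = list(range(1, 11))
shuffled = deck.copy()
rng_a.shuffle(shuffled)
print(sorted(shuffled) == deck)   # True: same items, different order

# For tokens and passwords, use secrets instead of random
token = secrets.token_hex(16)     # 16 random bytes as 32 hex characters
print(len(token))                 # 32
```

Using a separate Random instance keeps the seeding local, so other code that relies on the module-level functions is unaffected.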

Power Tools: itertools, functools, and collections

Three modules deserve special mention for the elegance they bring to everyday code.

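itertools provides building blocks for efficient iteration. product generates Cartesian products, and groupby groups consecutive elements that share the same key. The example below uses three others: chain, which joins several iterables end to end; islice, which slices any iterator; and count, which counts upward without end: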

from itertools import chain, islice, count

# Combine multiple lists into one stream
combined = list(chain([1, 2], [3, 4], [5, 6]))
print(combined)   # Output: [1, 2, 3, 4, 5, 6]

# Take the first 5 items from an infinite counter
first_five = list(islice(count(10), 5))
print(first_five)  # Output: [10, 11, 12, 13, 14]
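
product and groupby are easiest to understand from a small example. The sketch below uses invented sample data. Note that groupby only merges neighbouring items, which is why the list is sorted first.

```python
from itertools import groupby, product

# Every combination of one size with one colour
sizes = ["S", "M"]
colours = ["red", "blue"]
combos = list(product(sizes, colours))
print(combos)
# [('S', 'red'), ('S', 'blue'), ('M', 'red'), ('M', 'blue')]

# Group words by first letter; sorting puts equal keys next to each other
words = ["banana", "apple", "cherry", "avocado", "blueberry"]
by_letter = {
    letter: list(group)
    for letter, group in groupby(sorted(words), key=lambda w: w[0])
}
print(by_letter)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# Without a key function, groupby collapses runs of identical elements
runs = [(char, len(list(group))) for char, group in groupby("aaabbc")]
print(runs)
# [('a', 3), ('b', 2), ('c', 1)]
```

If the input is not sorted by the grouping key, the same key can appear in several separate groups.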

functools offers higher-order function utilities. lru_cache adds memoisation to any function whose arguments are hashable, and partial fixes some of a function's arguments in advance:

from functools import lru_cache, partial

@lru_cache(maxsize=128)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(50))  # Output: 12586269025 (instant, thanks to caching)

int_from_binary = partial(int, base=2)
print(int_from_binary("1010"))  # Output: 10
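
To check that the cache is doing its job, a decorated function exposes cache_info(). This sketch repeats the fibonacci definition so that it runs on its own. The helper name power_of_two is made up for the example.

```python
from functools import lru_cache, partial

@lru_cache(maxsize=128)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

fibonacci(30)
info = fibonacci.cache_info()
print(info.misses)   # 31: one real computation for each n from 0 to 30
print(info.hits)     # 28: every other call was answered from the cache

# cache_clear() empties the cache, for example between tests
fibonacci.cache_clear()
print(fibonacci.cache_info().currsize)   # 0

# partial can also fix a positional argument: here the base of pow
power_of_two = partial(pow, 2)
print(power_of_two(10))   # 1024
```

Without the decorator, the same call to fibonacci(30) would make over a million recursive calls.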

collections provides specialised container types that solve problems the built-in dict and list do not:

from collections import Counter, defaultdict, deque

# Count occurrences
words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
print(Counter(words))
# Output: Counter({'apple': 3, 'banana': 2, 'cherry': 1})

# Dictionary with default values
word_lengths = defaultdict(list)
for w in words:
    word_lengths[len(w)].append(w)
print(dict(word_lengths))
# Output: {5: ['apple', 'apple', 'apple'], 6: ['banana', 'cherry', 'banana']}

# Double-ended queue — fast appends and pops from both ends
recent = deque(maxlen=3)
for item in ["a", "b", "c", "d", "e"]:
    recent.append(item)
print(list(recent))  # Output: ['c', 'd', 'e']

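Counter counts hashable items. defaultdict supplies a default value for a missing key instead of raising KeyError when you index it. deque is a double-ended queue; give it a maxlen and it becomes a fixed-size sliding window. Learn these three and you will reach for them constantly.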
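
A few further behaviours of these three containers are worth knowing. The following sketch reuses the word list from above and is only one way to exercise them.

```python
from collections import Counter, defaultdict, deque

words = ["apple", "banana", "apple", "cherry", "banana", "apple"]

# most_common returns (item, count) pairs, highest count first
counts = Counter(words)
print(counts.most_common(2))   # [('apple', 3), ('banana', 2)]
print(counts["durian"])        # 0: a missing item counts as zero

# defaultdict(int) starts every new key at 0
totals = defaultdict(int)
for w in words:
    totals[w[0]] += 1
print(dict(totals))            # {'a': 3, 'b': 2, 'c': 1}

# Only indexing triggers the default; "in" and .get() do not create keys
print("z" in totals)           # False
print(totals.get("z"))         # None

# Without maxlen, a deque is an unbounded queue that is fast at both ends
queue = deque(["b", "c"])
queue.appendleft("a")
queue.append("d")
first = queue.popleft()
last = queue.pop()
print(first, last, list(queue))   # a d ['b', 'c']
```

A list can do all of this too, but inserting or removing at the front of a list gets slower as the list grows, while a deque stays fast.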

Running External Commands with subprocess

The subprocess module lets Python run other programs. The recommended entry point is subprocess.run():

import subprocess

result = subprocess.run(["ls", "-la"], capture_output=True, text=True)
print(result.stdout)
print(result.returncode)   # 0 means success

Pass the command as a list of strings rather than as a single string with shell=True. With shell=True the string is interpreted by the shell, so any untrusted input in it opens the door to shell injection. If you need to check that the command succeeded, use check=True:

subprocess.run(["python3", "my_script.py"], check=True)
# Raises CalledProcessError if the return code is non-zero
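
Here is a sketch of how check=True and its exception fit together. It runs the current interpreter through sys.executable so that the example does not depend on ls or python3 being installed. The inline commands and the timeout of 30 seconds are illustrative choices.

```python
import subprocess
import sys

# sys.executable is the path of the interpreter running this script
ok = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,
    text=True,
    check=True,
    timeout=30,   # raise TimeoutExpired instead of hanging forever
)
print(ok.stdout.strip())   # hello from a child process

# A non-zero exit code becomes an exception when check=True
failed_code = None
try:
    subprocess.run(
        [sys.executable, "-c", "import sys; sys.exit(3)"],
        check=True,
        timeout=30,
    )
except subprocess.CalledProcessError as err:
    failed_code = err.returncode
    print(f"command failed with exit code {failed_code}")
```

Catching CalledProcessError keeps the failure handling next to the call, instead of relying on every caller to remember to inspect returncode.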

In legacy code, you may encounter os.system() or os.popen(). Both still work, but subprocess supersedes them and gives you far more control over arguments, output, and errors. Use subprocess for all new code.

Copying and Moving Files with shutil

The shutil module handles high-level file operations — the things os can do in theory but makes painful in practice:

import shutil

shutil.copy("report.txt", "backup/report.txt")   # the backup/ directory must already exist
shutil.copytree("project/", "project_backup/")   # fails if project_backup/ already exists
shutil.move("old_name.py", "new_name.py")
shutil.rmtree("temp_directory/")   # delete directory and all contents

shutil.copytree copies an entire directory tree. shutil.rmtree deletes one — use it with caution, because there is no undo. shutil.disk_usage("/") returns total, used, and free space on a filesystem.
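
Because these functions change the filesystem, it is safest to experiment inside a temporary directory. The sketch below does that. All file and directory names are invented, and dirs_exist_ok requires Python 3.8 or later.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    project = root / "project"
    project.mkdir()
    (project / "report.txt").write_text("quarterly numbers", encoding="utf-8")

    backup = root / "project_backup"
    shutil.copytree(project, backup)   # destination must not exist yet
    # Python 3.8+: dirs_exist_ok=True copies into an existing directory
    shutil.copytree(project, backup, dirs_exist_ok=True)
    copied = (backup / "report.txt").read_text(encoding="utf-8")

    # move also renames; str() keeps this working on older Python versions
    shutil.move(str(backup / "report.txt"), str(backup / "summary.txt"))
    moved_names = sorted(p.name for p in backup.iterdir())

    shutil.rmtree(backup)              # permanent: there is no undo
    backup_exists = backup.exists()

    # disk_usage returns a named tuple of byte counts
    usage = shutil.disk_usage(root)

print(copied)          # quarterly numbers
print(moved_names)     # ['summary.txt']
print(backup_exists)   # False
print(f"{usage.free / usage.total:.0%} of the disk is free")
```

The temporary directory and everything in it is removed when the with block ends, so the example cleans up after itself.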

Finding Your Way Around

No one memorises three hundred modules. The skill is knowing how to explore. The dir() function lists everything in a module. The help() function prints documentation. Tab completion in the interactive interpreter is a quicker way to see the names a module defines:

import json
print(dir(json))      # list all names in the json module
help(json.dumps)      # read the documentation for json.dumps
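
The output of dir() includes many private names, and help() prints a whole page. When you only want a quick overview, one possible approach is to filter dir() and ask the inspect module for the signature and the first line of the docstring, as sketched here:

```python
import inspect
import json

# Keep only the public names (those without a leading underscore)
public_names = [name for name in dir(json) if not name.startswith("_")]
print(public_names)

# The signature shows the parameters without the full help page
signature = inspect.signature(json.dumps)
print(signature)

# getdoc returns the cleaned-up docstring, or None if there is none
doc = inspect.getdoc(json.dumps) or ""
summary = doc.splitlines()[0] if doc else ""
print(summary)
```

The same three steps work for any module or function, which makes them a handy first move when you meet an unfamiliar part of the library.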

The official Python documentation at docs.python.org is excellent — arguably the best standard library documentation of any programming language. Each module has a page with explanations, examples, and cross-references. When you think "I wish Python had a function that...", search the standard library first. More often than not, it does.

The standard library is Python's secret weapon. Frameworks come and go, third-party packages rise and fall in popularity, but the standard library is always there — stable, well-tested, and available on every machine that has Python installed. The more of it you know, the less code you have to write, install, and maintain. That is not laziness. That is engineering.