Chapter Eight

Files and Input/Output

Learning Objectives
  1. Open, read, and write files using the with statement and context managers
  2. Distinguish between text and binary file modes and select the correct mode for a given task
  3. Construct and manipulate file paths using pathlib and explain why it is preferred over os.path
  4. Read and write CSV and JSON files using the standard library modules
  5. Describe the three standard streams and accept input from command-line arguments

Programs that only manipulate data in memory are toys. Real programs read configuration from disk, write logs, parse data files, generate reports, and exchange structured data with other systems. The moment your program needs to persist anything — or consume anything someone else produced — you are doing input/output. Python makes file I/O remarkably straightforward, but there are enough subtleties around encoding, modes, and resource management that it pays to learn the right patterns from the start.

Opening and Closing Files

The built-in open() function returns a file object. At its simplest:

f = open("notes.txt", "r")
content = f.read()
print(content)
f.close()

This works, but it has a problem. If an error occurs between open() and close(), the file is never closed — and unclosed files can leak resources, corrupt data, or lock other programs out. The solution is the with statement:

with open("notes.txt", "r") as f:
    content = f.read()
    print(content)
# File is automatically closed here, even if an error occurred

The with statement relies on a context manager: the file object returned by open() manages its own cleanup, and with guarantees that f.close() is called when the block exits, regardless of whether the exit is normal or caused by an exception. Always use with for file operations. There is no good reason not to.
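To see what with buys you, here is roughly the try/finally code it replaces. This sketch first creates notes.txt so it is self-contained:

```python
# Create the file so the example can run on its own
f = open("notes.txt", "w")
f.write("line one\n")
f.close()

# What `with open("notes.txt") as f:` does for you, approximately:
f = open("notes.txt", "r")
try:
    content = f.read()
finally:
    f.close()    # runs whether the block exits normally or with an exception

print(f.closed)    # True
```

The with form is shorter, and unlike hand-written try/finally, you cannot forget it.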

Reading Files

Python provides several ways to read a file's contents:

with open("notes.txt") as f:
    # Read the entire file as a single string
    content = f.read()

with open("notes.txt") as f:
    # Read one line at a time
    first_line = f.readline()
    second_line = f.readline()

with open("notes.txt") as f:
    # Read all lines into a list
    lines = f.readlines()
    print(lines)    # Output: ['line one\n', 'line two\n', 'line three\n']

The most Pythonic approach is to iterate over the file object directly. This reads one line at a time and is memory-efficient even for enormous files:

with open("notes.txt") as f:
    for line in f:
        print(line.strip())    # strip() removes the trailing newline

This is the pattern you should reach for by default. It is clean, it is efficient, and it communicates intent clearly.

Writing Files

To write, open the file in write mode ("w") or append mode ("a"):

with open("output.txt", "w") as f:
    f.write("First line\n")
    f.write("Second line\n")

Write mode creates the file if it does not exist and truncates it (erases all content) if it does. This is the most common source of data loss with file I/O — opening a file in "w" mode when you meant to append.

Append mode adds to the end of an existing file:

with open("log.txt", "a") as f:
    f.write("New log entry\n")

The writelines() method writes a list of strings. It does not add newlines — you must include them yourself:

lines = ["one\n", "two\n", "three\n"]
with open("output.txt", "w") as f:
    f.writelines(lines)

For convenience, print() can write to a file via its file parameter:

with open("output.txt", "w") as f:
    print("Hello, file!", file=f)
    print("Second line.", file=f)

File Modes

The second argument to open() specifies the mode. The common modes are:

  • "r" — read (default). File must exist.
  • "w" — write. Creates or truncates.
  • "a" — append. Creates if needed, writes to end.
  • "x" — exclusive creation. Fails if the file already exists.
  • "rb", "wb" — read/write in binary mode. Data is handled as bytes, not str.

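Exclusive mode is the antidote to the accidental-truncation problem mentioned above: when overwriting would be a bug, "x" refuses rather than erases. A short sketch (the filename is illustrative):

```python
import os

filename = "report.txt"    # hypothetical filename for illustration
if os.path.exists(filename):
    os.remove(filename)    # start clean so the first open succeeds

with open(filename, "x") as f:      # "x" succeeds: the file does not exist yet
    f.write("fresh report\n")

try:
    with open(filename, "x") as f:  # "x" fails: the file now exists
        f.write("this never runs\n")
except FileExistsError:
    print(f"refusing to overwrite {filename}")
```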
Text mode (the default) handles encoding automatically: it reads bytes from disk and decodes them into strings. The default encoding is platform-dependent (UTF-8 on most systems, but historically not on Windows), so pass encoding="utf-8" explicitly whenever portability matters. Binary mode gives you raw bytes with no decoding:

# Reading an image file (binary)
with open("photo.jpg", "rb") as f:
    data = f.read()
    print(type(data))    # Output: <class 'bytes'>
    print(data[:10])     # Output: b'\xff\xd8\xff\xe0\x00\x10JFIF'

Use text mode for text files (.txt, .csv, .json, .py, .html). Use binary mode for everything else (images, audio, compressed archives, proprietary formats).
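Because the default text encoding can vary between platforms, naming it explicitly makes a file read back identically everywhere:

```python
# Naming the encoding makes the file behave the same on every platform
with open("unicode.txt", "w", encoding="utf-8") as f:
    f.write("café naïve\n")

with open("unicode.txt", encoding="utf-8") as f:
    text = f.read()

print(text.strip())    # café naïve
```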

Paths with pathlib

File paths have traditionally been handled as strings in Python, manipulated with os.path:

import os.path
full = os.path.join("data", "2024", "results.csv")
print(os.path.basename(full))    # Output: results.csv

This works, but the pathlib module, introduced in Python 3.4, offers a far more elegant object-oriented approach:

from pathlib import Path

p = Path("data") / "2024" / "results.csv"
print(p.name)        # Output: results.csv
print(p.stem)        # Output: results
print(p.suffix)      # Output: .csv
print(p.parent)      # Output: data/2024
print(p.exists())    # Output: True or False

The / operator joins path segments — readable, cross-platform, and impossible to confuse with string concatenation. Path objects also provide methods for common operations:

p = Path("output")
p.mkdir(exist_ok=True)                # create directory
(p / "test.txt").write_text("hello")  # write a file
content = (p / "test.txt").read_text() # read it back

# List all .csv files in a directory
for csv_file in Path("data").glob("*.csv"):
    print(csv_file)

Prefer pathlib over os.path in new code. It is more readable, more powerful, and more Pythonic. The only reason to use os.path is compatibility with older codebases that expect string paths.
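A convenient detail that eases the transition: open() and most standard-library functions accept Path objects directly, so there is no need to convert back to strings. A small sketch (the names are illustrative):

```python
from pathlib import Path

out_dir = Path("output")
out_dir.mkdir(exist_ok=True)        # no error if the directory already exists
target = out_dir / "greeting.txt"

with open(target, "w") as f:        # a Path works anywhere a filename string does
    f.write("hello from pathlib\n")

print(target.read_text().strip())   # hello from pathlib
```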

Working with CSV Files

CSV (comma-separated values) is the lingua franca of tabular data. Python's csv module handles the parsing quirks — quoted fields, embedded commas, different delimiters — so you do not have to:

import csv

# Writing CSV
with open("people.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Age", "City"])
    writer.writerow(["Alice", 30, "London"])
    writer.writerow(["Bob", 25, "Edinburgh"])

# Reading CSV
with open("people.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    for row in reader:
        print(f"{row[0]} is {row[1]} years old")

The newline="" argument stops Python's own newline translation from doubling the line endings the csv module writes on Windows, which would otherwise appear as blank rows. For more readable code, csv.DictReader maps each row to a dictionary using the header as keys:

with open("people.csv", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(f"{row['Name']} lives in {row['City']}")
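The writing counterpart, csv.DictWriter, takes rows as dictionaries and needs the column order up front via fieldnames. A sketch (field names and filename are illustrative):

```python
import csv

rows = [
    {"Name": "Alice", "Age": 30, "City": "London"},
    {"Name": "Bob", "Age": 25, "City": "Edinburgh"},
]

with open("people2.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["Name", "Age", "City"])
    writer.writeheader()     # writes the Name,Age,City header row
    writer.writerows(rows)   # writes each dict as one CSV row

with open("people2.csv", newline="") as f:
    back = list(csv.DictReader(f))

print(back[0]["City"])    # London
```

Note that everything comes back as strings: back[0]["Age"] is "30", not 30. CSV carries no type information.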

Working with JSON

JSON (JavaScript Object Notation) is the standard format for structured data exchange — APIs, configuration files, and data storage all use it. Python's json module converts between JSON text and Python objects:

import json

# Python dict to JSON string
data = {"name": "Alice", "age": 30, "languages": ["Python", "SQL"]}
json_str = json.dumps(data, indent=2)
print(json_str)

# JSON string to Python dict
parsed = json.loads(json_str)
print(parsed["name"])    # Output: Alice

The mapping is intuitive: JSON objects become Python dicts, JSON arrays become lists, JSON strings become str, numbers become int or float, true/false become True/False, and null becomes None.
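The mapping can be seen directly in a small round trip:

```python
import json

parsed = json.loads('{"active": true, "score": null, "tags": ["a", "b"]}')
print(type(parsed).__name__)    # dict
print(parsed["active"])         # True
print(parsed["score"])          # None
print(parsed["tags"])           # ['a', 'b']
```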

To read and write JSON files directly:

# Write
with open("data.json", "w") as f:
    json.dump(data, f, indent=2)

# Read
with open("data.json") as f:
    loaded = json.load(f)

Note the naming: dumps/loads (with an "s") work with strings; dump/load (without) work with files. The mnemonic is not elegant, but it sticks once you see it.
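One common stumbling block: json can only serialize the basic types in that mapping. Anything else, such as a datetime or a set, raises TypeError unless you supply a fallback through the default parameter. A sketch (the record contents are illustrative):

```python
import json
from datetime import date

record = {"user": "alice", "joined": date(2024, 3, 1)}

try:
    json.dumps(record)
except TypeError:
    print("date is not JSON serializable")

# default= is called for any object json does not know how to encode;
# str() turns the date into "2024-03-01"
json_str = json.dumps(record, default=str)
print(json_str)    # {"user": "alice", "joined": "2024-03-01"}
```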

Standard Streams

Every running program has three standard streams, inherited from Unix tradition:

  • sys.stdin — standard input. Where the program reads interactive input.
  • sys.stdout — standard output. Where print() sends its output.
  • sys.stderr — standard error. Where error messages and diagnostics go.

import sys

sys.stdout.write("This is normal output\n")
sys.stderr.write("This is an error message\n")

The distinction between stdout and stderr matters when your program's output is redirected. If someone runs python script.py > output.txt, only stdout goes to the file — stderr still appears on the terminal, which is exactly where you want error messages.
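print() can target stderr through the same file parameter shown earlier for files, which reads more naturally than calling sys.stderr.write() directly:

```python
import sys

def warn(message):
    # Diagnostics go to stderr, so they still reach the terminal when
    # stdout is redirected with `python script.py > output.txt`
    print(f"warning: {message}", file=sys.stderr)

print("results go to stdout")
warn("disk almost full")
```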

You can read from stdin interactively with input(), or process piped data:

import sys

# Reading piped input (e.g., echo "hello" | python script.py)
for line in sys.stdin:
    print(f"Got: {line.strip()}")

Command-Line Arguments

Programs often need to accept parameters when they are launched. The simplest approach is sys.argv, a list of strings passed on the command line:

import sys

# python greet.py Alice 30
print(sys.argv)    # Output: ['greet.py', 'Alice', '30']
name = sys.argv[1]
age = int(sys.argv[2])
print(f"Hello, {name}! You are {age} years old.")

sys.argv[0] is always the script name. The actual arguments start at index 1. For simple scripts with one or two arguments, sys.argv is perfectly adequate.
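Since sys.argv is a plain list, a missing argument surfaces as an IndexError. A defensive sketch, with the argument handling pulled into a function so it can be tested without a real command line (the script name and usage text are illustrative):

```python
def parse_args(argv):
    # argv[0] is the script name; real arguments start at index 1
    if len(argv) < 3:
        raise SystemExit(f"usage: {argv[0]} NAME AGE")
    return argv[1], int(argv[2])

# In a real script you would pass sys.argv; here we simulate it:
name, age = parse_args(["greet.py", "Alice", "30"])
print(f"Hello, {name}! You are {age} years old.")
```

Raising SystemExit with a message prints the message to stderr and exits with a non-zero status, the conventional way for a script to report a usage error.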

For anything more complex — optional flags, help messages, type validation — use the argparse module:

import argparse

parser = argparse.ArgumentParser(description="Greet someone.")
parser.add_argument("name", help="Name of the person")
parser.add_argument("--age", type=int, default=0, help="Age of the person")

args = parser.parse_args()
print(f"Hello, {args.name}!")
if args.age:
    print(f"You are {args.age} years old.")

Running python greet.py --help automatically generates a usage message. argparse handles error messages, type conversion, and default values — all the tedious work that sys.argv leaves to you.

Files are where programs meet the outside world. A program that cannot read input or write output is a program that exists only for its own amusement. The patterns in this chapter — with for safety, pathlib for paths, csv and json for structured data — are patterns you will use in nearly every Python project you ever write. Learn them well, and the boundary between your program and the rest of the system becomes a clean, reliable interface rather than a source of bugs.