Glossary

Regular Expression

Also known as: regex, regexp

A regular expression (regex) is a compact pattern language for matching sequences of characters in text. Python's re module provides functions for regex operations: re.search() (find first match), re.match() (match at start), re.findall() (find all matches), re.sub() (search and replace), and re.split() (split by pattern). Regular expressions are compiled into an internal format for efficient reuse: pattern = re.compile(r'\d+').

Regex syntax includes literal characters, character classes ([a-z], \d, \w, \s), quantifiers (*, +, ?, {n,m}), anchors (^, $, \b), alternation (|), and grouping with parentheses. Python extends basic regex with named groups ((?P<name>...)) , non-capturing groups ((?:...)), lookahead ((?=...)), lookbehind ((?<=...)), and flags like re.IGNORECASE and re.MULTILINE.

Regular expressions are powerful but can be difficult to read and maintain. The recommendation is to use r"..." (raw strings) for regex patterns to avoid backslash escaping issues, to use re.VERBOSE mode for complex patterns (which allows whitespace and comments), and to prefer simpler string methods (str.startswith(), str.endswith(), in) when they suffice. For very complex text parsing, dedicated parsers are usually more appropriate than regex.

Related terms: String

Discussed in:

Also defined in: Textbook of Linux

This site is currently in Beta. Please email Chris Paton (cpaton@gmail.com) with any suggestions, questions or comments.