Python regular expression cheat sheet

Peggy 144 Published: 11/02/2024

This cheat sheet summarizes Python regular expression (regex) syntax, flags, and common tasks.

Pattern Syntax

Python regex patterns are defined using the re module, which provides several functions for searching and manipulating text:

re.compile(pattern): Compiles a pattern into a reusable regex object.

re.search(pattern, string): Scans the string and returns the first occurrence of the pattern.

re.match(pattern, string): Matches the pattern only at the beginning of the string.
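The difference between search and match is easiest to see side by side. The snippet below is a minimal sketch; the sample string and variable names are made up for illustration.

```python
import re

# Compile once so the pattern object can be reused.
word = re.compile(r"cat")
text = "concatenate the cat"

# search() scans the whole string and stops at the first occurrence.
first = word.search(text)
print(first.span())      # (3, 6)

# match() only succeeds when the pattern matches at position 0.
at_start = word.match(text)
print(at_start)          # None

# The module-level functions accept the pattern string directly.
prefix = re.match(r"con", text)
print(prefix.group())    # con
```

Both functions return a match object on success and None on failure, so the result should be checked before calling methods on it.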

Basic Patterns

Literal: Ordinary characters match themselves. Metacharacters such as ., [, and ( must be escaped with a backslash to be matched literally.

Example: r'\.' matches a literal dot (.)

Wildcard: Matches any single character (except newline).

Example: r'a.c' matches 'a', then any single character, then 'c' (for example, 'abc' or 'a-c')

Set Characters: Matches any character in the specified set.

Example: r'[abc]' matches either 'a', 'b', or 'c'

Range: Matches any character within a specified range.

Example: r'[0-9]' matches any digit (0-9)

Escape Characters: Escapes special characters, making them literal.

Example: r'\(' matches an opening parenthesis
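The basic patterns can be tried out with re.findall and re.search. This is an illustrative sketch only; the sample strings are invented.

```python
import re

# An escaped dot matches only a literal dot.
dots = re.findall(r"\.", "a.b.c")
print(dots)           # ['.', '.']

# An unescaped dot matches any single character except a newline.
wild = re.findall(r"a.c", "abc a-c a\nc")
print(wild)           # ['abc', 'a-c']

# A set matches any one of the listed characters.
in_set = re.findall(r"[abc]", "cabbage")
print(in_set)         # ['c', 'a', 'b', 'b', 'a']

# A range matches any one character between its endpoints.
digits = re.findall(r"[0-9]", "room 42b")
print(digits)         # ['4', '2']

# An escaped parenthesis is treated as an ordinary character.
paren = re.search(r"\(", "f(x)")
print(paren.start())  # 1
```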

Special Patterns

Dot (.): Matches any single character (except newline).

Example: r'abc.def' matches 'abc', then any single character, then 'def' (for example, 'abc-def')

Star (*): Matches zero or more of the preceding pattern.

Example: r'a.*' matches 'a' followed by zero or more characters (other than newline)

Plus (+): Matches one or more of the preceding pattern.

Example: r'a+' matches 'a' repeated one or more times

Question Mark (?): Matches zero or one of the preceding pattern.

Example: r'a?' matches either an 'a' or no 'a'

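Curly Braces ({}): Match a counted number of the preceding pattern. {n}: Matches exactly n occurrences. {n,m}: Matches between n and m occurrences, inclusive. {n,}: Matches at least n occurrences.

Example: r'a{2,3}' matches 'a' repeated 2 or 3 times

Caret (^): Matches the start of a string (and the start of each line when re.MULTILINE is set).

Example: r'^abc' matches 'abc' only at the start of the string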
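To compare the quantifiers directly, the sketch below checks which of a few sample strings each pattern accepts in full. The accepted helper and the samples list are hypothetical names introduced for this example; re.fullmatch is used so that a partial match does not count.

```python
import re

samples = ["", "a", "aa", "aaa", "aaaa"]

def accepted(pattern):
    """Return the samples that the pattern matches from start to end."""
    return [s for s in samples if re.fullmatch(pattern, s)]

star = accepted(r"a*")         # zero or more
plus = accepted(r"a+")         # one or more
optional = accepted(r"a?")     # zero or one
bounded = accepted(r"a{2,3}")  # two or three
at_least = accepted(r"a{3,}")  # three or more

print(star)      # ['', 'a', 'aa', 'aaa', 'aaaa']
print(plus)      # ['a', 'aa', 'aaa', 'aaaa']
print(optional)  # ['', 'a']
print(bounded)   # ['aa', 'aaa']
print(at_least)  # ['aaa', 'aaaa']
```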

Alternation (|): Matches either pattern A or B.

Example: r'(abc|def)' matches either 'abc' or 'def'

Groups (()): Defines a capturing group.

Example: r'(abc)+' matches 'abc' repeated one or more times
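A short sketch of how alternation and groups behave on a match object. The sample strings are invented, and the non-capturing form (?:...) is shown as an aside that the text above does not cover.

```python
import re

# Alternation: the group captures whichever branch matched.
alt = re.search(r"(abc|def)", "xxdefxx")
print(alt.group(1))         # def

# A repeated group keeps only its last repetition in group(1).
repeated = re.fullmatch(r"(abc)+", "abcabcabc")
print(repeated.group(0))    # abcabcabc
print(repeated.group(1))    # abc

# (?:...) groups without capturing, so groups() is empty.
no_capture = re.fullmatch(r"(?:abc)+", "abcabc")
print(no_capture.groups())  # ()
```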

Lookarounds: Matches the pattern only if preceded/followed by another specific pattern. (?=pattern) : Positive lookahead. (?!pattern) : Negative lookahead. (?<=pattern) : Positive lookbehind. (?<!pattern) : Negative lookbehind.

Example: r'Hello (?=\w+)' matches 'Hello ' only when it is followed by one or more word characters; the lookahead text itself is not part of the match
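Lookarounds check the surrounding text without consuming it. The following sketch exercises all four forms; the sample strings are made up, and the \b anchors are added so that the negative lookahead cannot succeed on part of a number.

```python
import re

# Positive lookahead: the following word is required but not consumed.
hello = re.search(r"Hello (?=\w+)", "Hello world")
print(hello.group())   # 'Hello ' (with the trailing space)
no_word = re.search(r"Hello (?=\w+)", "Hello !")
print(no_word)         # None

amounts = "10 USD, 20 EUR"
usd = re.findall(r"\b\d+\b(?= USD)", amounts)      # followed by ' USD'
not_usd = re.findall(r"\b\d+\b(?! USD)", amounts)  # not followed by ' USD'
print(usd, not_usd)    # ['10'] ['20']

prices = "$42 and 17"
with_dollar = re.findall(r"(?<=\$)\d+", prices)        # preceded by '$'
without_dollar = re.findall(r"(?<![$\d])\d+", prices)  # not preceded by '$' or a digit
print(with_dollar, without_dollar)  # ['42'] ['17']
```

Python requires lookbehind patterns to have a fixed width, so something like (?<=a+) raises an error.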

Flags

re.IGNORECASE: Makes the pattern case-insensitive.

Example: re.compile(r"hello", re.IGNORECASE).search("HELLO")

re.MULTILINE: Makes '^' and '$' match at the start and end of each line, not just of the whole string.

Example: re.compile(r"^hello$", re.MULTILINE).search("hello world\nhello") matches the 'hello' on the second line

re.DOTALL: Allows '.' to match newline characters.

Example: re.compile(r".*hello.*", re.DOTALL).search("Hello, World!\nhello friend!")
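The effect of each flag shows up most clearly when the same pattern is run with and without it. This is a sketch with invented sample strings.

```python
import re

# IGNORECASE: letters match regardless of case.
ci = re.search(r"hello", "HELLO", re.IGNORECASE)
print(ci.group())           # HELLO

# MULTILINE: ^ and $ also match at line boundaries.
lines = "hello world\nhello"
plain = re.search(r"^hello$", lines)
multi = re.search(r"^hello$", lines, re.MULTILINE)
print(plain)                # None
print(multi.span())         # (12, 17)

# DOTALL: '.' also matches the newline, so .* can span lines.
two_lines = "Hello, World!\nhello friend!"
one_line = re.search(r".*hello.*", two_lines)
spanning = re.search(r".*hello.*", two_lines, re.DOTALL)
print(one_line.group())     # hello friend!
print(spanning.group() == two_lines)  # True

# Flags can be combined with the | operator.
both = re.search(r"^HELLO$", lines, re.IGNORECASE | re.MULTILINE)
print(both.span())          # (12, 17)
```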

Common Tasks

Validation: Verifies a string matches a pattern.

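Example: re.match(r"[a-zA-Z]+", "Hello") returns a match object, or None if the string does not start with a letter. Because re.match only checks the beginning of the string, use re.fullmatch to require that the entire string matches.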

Replacement: Replaces parts of a string matching a pattern.

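Example: re.sub(r"old", "new", "old is gold") replaces every 'old' with 'new', including the one inside 'gold', giving 'new is gnew'. Use r"\bold\b" to replace whole words only.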

Splitting: Splits a string into substrings based on a delimiter.

Example: re.split(r"\W+", "Hello, World!") splits the string on runs of non-word characters, returning ['Hello', 'World', '']

Finding: Finds all occurrences of a pattern in a string.

Example: re.findall(r"[a-zA-Z]+", "Hello, World! 123 abc") returns all matches

Substitution with a count: re.subn works like re.sub but also returns the number of replacements made.

Example: re.subn(r"(\d+)", lambda x: str(int(x.group(1)) + 1), "abc123def456") increments each number and returns ('abc124def457', 2)
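The common tasks above can be run together as a quick check. This sketch uses the same sample strings as the examples; the variable names are illustrative only.

```python
import re

# Validation: match() checks a prefix, fullmatch() checks the whole string.
prefix_ok = re.match(r"[a-zA-Z]+", "Hello1") is not None
whole_ok = re.fullmatch(r"[a-zA-Z]+", "Hello1") is not None
print(prefix_ok, whole_ok)   # True False

# Replacement: without word boundaries, 'old' inside 'gold' is replaced too.
loose = re.sub(r"old", "new", "old is gold")
strict = re.sub(r"\bold\b", "new", "old is gold")
print(loose)                 # new is gnew
print(strict)                # new is gold

# Splitting: a trailing delimiter leaves an empty string at the end.
parts = re.split(r"\W+", "Hello, World!")
print(parts)                 # ['Hello', 'World', '']

# Finding: every non-overlapping match, in order.
words = re.findall(r"[a-zA-Z]+", "Hello, World! 123 abc")
print(words)                 # ['Hello', 'World', 'abc']

# subn(): a function computes each replacement; the count is returned too.
bumped, count = re.subn(r"(\d+)", lambda m: str(int(m.group(1)) + 1), "abc123def456")
print(bumped, count)         # abc124def457 2
```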

Conclusion

This comprehensive Python regex cheat sheet covers the basics, special patterns, and flags. Familiarize yourself with these concepts to master the art of text manipulation in Python!

What is the re module in Python?

This section explains what the re module is and what it offers.

The re module is a built-in module in Python that provides support for regular expressions. Regular expressions are patterns used for string matching, searching, and replacing. They're incredibly powerful tools for processing text data.

In the re module, you can:

Search strings: Use the search() function to find the first occurrence of a pattern in a string.

Match strings: Use the match() function to check whether a pattern matches at the beginning of a string.

Find all matches: Use the findall() function to return all non-overlapping matches of a pattern as a list of strings (or a list of tuples when the pattern contains more than one capturing group).

Substitute strings: Use the sub() function to replace occurrences of a pattern with another string.

Compile patterns: Use the compile() function to compile a regular expression into a pattern object that can be reused.
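What findall() returns depends on the capturing groups in the pattern, which is a common source of confusion. A small sketch with invented sample strings:

```python
import re

# No groups: a list of the matched strings.
numbers = re.findall(r"\d+", "a1b22")
print(numbers)   # ['1', '22']

# One group: a list of that group's contents only.
keys = re.findall(r"(\w+)=\d+", "x=1, y=22")
print(keys)      # ['x', 'y']

# Two or more groups: a list of tuples, one tuple per match.
pairs = re.findall(r"(\w+)=(\d+)", "x=1, y=22")
print(pairs)     # [('x', '1'), ('y', '22')]
```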

Regular expressions (regex) are composed of special characters, called "metacharacters," and literal characters. Metacharacters have specific meanings when used in regex:

The metacharacters are ., ^, $, *, +, ?, {, }, [, ], \, |, (, and ). All other characters are treated as literals.

Some common regex patterns include:

Matching sets: Use square brackets [ and ] to match a set of characters. For example, [a-zA-Z0-9_] matches any single letter (uppercase or lowercase), digit, or underscore.

Repeating: Use *, +, ?, or {m,n} to specify repetition. For instance, .* matches any sequence of characters other than newlines (including the empty sequence).

Grouping: Use parentheses ( ) to group parts of the pattern and treat them as a single unit.

Character classes: Use shorthand such as \d (digit), \w (word character), or \s (whitespace) to match specific types of characters.
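The shorthand classes can be checked against an explicit set. This is a sketch with made-up sample strings; re.ASCII is passed in the last comparison because, for str patterns, \w otherwise also matches non-ASCII letters and digits.

```python
import re

digits = re.findall(r"\d", "a1 b2")
print(digits)   # ['1', '2']

words = re.findall(r"\w+", "snake_case x9!")
print(words)    # ['snake_case', 'x9']

# \s+ treats any run of spaces, tabs, or newlines as one separator.
fields = re.split(r"\s+", "a  b\tc\nd")
print(fields)   # ['a', 'b', 'c', 'd']

# With re.ASCII, \w is equivalent to the explicit set [a-zA-Z0-9_].
sample = "id_42 = ok"
explicit = re.findall(r"[a-zA-Z0-9_]+", sample)
ascii_words = re.findall(r"\w+", sample, re.ASCII)
print(explicit == ascii_words)  # True
```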

The re module does not offer separate regex flavors or modes. It implements a single syntax, similar to Perl's, and its behavior is adjusted with flags such as re.IGNORECASE, re.MULTILINE, re.DOTALL, and re.VERBOSE.

Patterns can be either str or bytes, but the pattern and the string being searched must be of the same type.

In summary, the re module in Python provides a robust way to work with regular expressions, which are essential for text processing and pattern matching. You can use it for various tasks like searching, validating, extracting data from strings, and even performing data transformations.

Now that you know what the re module is, go ahead and explore its many features!