Python regex group

Karen 35 Published: 06/14/2024


What are Groups in Regex?

In regular expressions, a group is a way to capture part of the match and refer back to it later. This is useful when you need to extract specific parts of the input string or perform complex pattern matching operations.

Groups are denoted by parentheses () around the portion of the regex pattern that you want to consider as a single unit. The parentheses serve as a scope for the group, indicating where the group starts and ends.
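As a small illustration of both ideas above (treating a parenthesized portion as a single unit, and referring back to a captured group), the following sketch may help. The sample strings and variable names are made up for this example and are not from the original article.

```python
import re

# A quantifier placed after a group applies to the whole parenthesized unit.
repeated = re.fullmatch(r"(ab)+", "ababab")
print(repeated.group(0))  # ababab (the entire match)
print(repeated.group(1))  # ab (a repeated group keeps its last repetition)

# \1 is a backreference: it matches the same text that group 1 captured.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
print(doubled.group(0))  # is is
print(doubled.group(1))  # is
```

Note that a group repeated by a quantifier only retains the text of its final repetition, which is why group(1) is "ab" rather than "ababab".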

How Do Groups Work in Python?

In Python, groups can be used with the re module (short for regular expressions) to capture parts of the input string that match the regex pattern.

Here's an example:

import re

# \w+ captures the word part, \d{4} captures the four-digit part
pattern = r"(\w+)-(\d{4})"
input_string = "apple-2022"

match = re.match(pattern, input_string)
if match:
    print(match.group(1))  # prints "apple"
    print(match.group(2))  # prints "2022"

In this example:

The regex pattern (\w+)-(\d{4}) matches one or more word characters (\w+), followed by a hyphen (-), followed by exactly four digits (\d{4}). Because re.match anchors the pattern only at the start of the string, any text after the four digits is ignored rather than rejected.

The parentheses around \w+ and \d{4} define two separate groups. Groups are numbered from 1, left to right, in the order of their opening parentheses.

match.group(1) and match.group(2) return the text each group captured, here "apple" and "2022", respectively.
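Besides group(1) and group(2), the match object offers a few other standard ways to read the same captures. The sketch below reuses the example pattern; the trailing-digit input "apple-20225" is a hypothetical string added here to show how re.match and re.fullmatch differ.

```python
import re

pattern = r"(\w+)-(\d{4})"
match = re.match(pattern, "apple-2022")

if match:
    print(match.group(0))     # apple-2022 (group 0 is the whole match)
    print(match.groups())     # ('apple', '2022')
    print(match.group(1, 2))  # ('apple', '2022')
    print(match.span(2))      # (6, 10), start and end index of group 2

# re.match only anchors at the start, so extra trailing text is tolerated.
loose = re.match(pattern, "apple-20225")
print(loose.group(2))         # 2022

# re.fullmatch requires the whole string to match the pattern.
strict = re.fullmatch(pattern, "apple-20225")
print(strict)                 # None
```

If the input must end right after the four digits, re.fullmatch (or a trailing $ in the pattern) is one way to enforce that.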

Named Groups

Python's regex engine supports named groups using the syntax (?P<name>pattern). This allows you to give a meaningful name to a group, making your code more readable and maintainable.

Here's an example:

import re

# Each group gets a name via (?P<name>...)
pattern = r"(?P<fruit>\w+)-(?P<year>\d{4})"
input_string = "banana-2023"

match = re.match(pattern, input_string)
if match:
    print(match.group("fruit"))  # prints "banana"
    print(match.group("year"))   # prints "2023"

In this example:

The regex pattern (?P<fruit>\w+)-(?P<year>\d{4}) matches one or more word characters (\w+), followed by a hyphen (-), followed by exactly four digits (\d{4}).

The named groups (?P<fruit>\w+) and (?P<year>\d{4}) capture the matched strings "banana" and "2023", respectively.

match.group("fruit") and match.group("year") return the text captured by each named group. Named groups are still numbered, so match.group(1) and match.group(2) return the same values.
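Named groups can also be read all at once or by index. The following is a minimal sketch using the same pattern, compiled with re.compile so the pattern object's groupindex mapping can be inspected; the variable names are chosen for illustration only.

```python
import re

pattern = re.compile(r"(?P<fruit>\w+)-(?P<year>\d{4})")
match = pattern.match("banana-2023")

if match:
    # groupdict() returns every named group as a dictionary.
    print(match.groupdict())  # {'fruit': 'banana', 'year': '2023'}

    # Subscript access is shorthand for match.group(...) (Python 3.6+).
    print(match["fruit"])     # banana

    # Named groups keep their positional number as well.
    print(match.group(1))     # banana

# groupindex maps each group name to its group number.
print(pattern.groupindex["year"])  # 2
```

groupdict() is convenient when the captures feed directly into keyword arguments or a record, since the group names become the dictionary keys.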

Summary

Groups let you capture the parts of an input string that match specific portions of a pattern. In Python, the re module exposes them on the match object, either by number (e.g., match.group(1)) or by name (e.g., match.group("fruit")). Numbered groups are quick to write for short patterns, while named groups keep longer patterns readable and spare you from renumbering when the pattern changes.
