REGULAR EXPRESSIONS IN PYTHON
To use regular expressions in Python, you can use the re module, which provides several functions for working with regular expressions, such as search, findall, sub, split, and compile.
Here is an example of using the re.search function to match a pattern in a string:
import re
text = "The quick brown fox jumps over the lazy dog."
pattern = "fox"
match = re.search(pattern, text)
if match:
    print("Match found.")
else:
    print("Match not found.")
In this example, the re.search function returns a match object if the pattern fox is found anywhere in the string text, and None otherwise. The match object contains information about the match, such as its starting and ending positions.
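The positions mentioned above can be read directly from the match object. The following is a minimal sketch that reuses the same text and pattern; it relies only on the standard group, start, end, and span methods of match objects.

```python
import re

text = "The quick brown fox jumps over the lazy dog."
match = re.search("fox", text)

if match:
    # group() returns the substring that matched
    print(match.group())               # fox
    # start() and end() give the boundaries; end is exclusive
    print(match.start(), match.end())  # 16 19
    # span() returns both positions as a tuple
    print(match.span())                # (16, 19)
```

Because end is exclusive, text[match.start():match.end()] gives back the matched substring.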
Another useful function in the re module is re.findall, which returns a list of all non-overlapping matches in the string:
import re
text = "The quick brown fox jumps over the lazy dog."
pattern = "o"
matches = re.findall(pattern, text)
print(matches)
This will output: ['o', 'o', 'o', 'o']
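To illustrate what "non-overlapping" means, here is a small sketch with made-up input strings. It also shows findall with a pattern that is more than a literal character; the \w+ pattern (one or more word characters) is chosen here purely for illustration.

```python
import re

# Matches are found left to right and never share characters,
# so "aaaa" contains two matches of "aa", not three.
pairs = re.findall("aa", "aaaa")
print(pairs)  # ['aa', 'aa']

# \w+ matches runs of word characters, which extracts each word
text = "The quick brown fox jumps over the lazy dog."
words = re.findall(r"\w+", text)
print(words)
# ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
```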
The re.sub function can be used to replace all occurrences of a pattern in a string with a replacement string:
import re
text = "The quick brown fox jumps over the lazy dog."
pattern = "fox"
replacement = "cat"
new_text = re.sub(pattern, replacement, text)
print(new_text)
This will output: The quick brown cat jumps over the lazy dog.
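The split and compile functions listed at the start work along similar lines. The sketch below uses a made-up input string: re.compile builds a reusable pattern object, and re.split breaks a string wherever the pattern matches.

```python
import re

data = "fox, dog; cat"

# Compile the pattern once, then call methods on the pattern object
word_pattern = re.compile(r"\w+")
found = word_pattern.findall(data)
print(found)  # ['fox', 'dog', 'cat']

# Split on a comma or semicolon followed by optional whitespace
parts = re.split(r"[,;]\s*", data)
print(parts)  # ['fox', 'dog', 'cat']
```

Compiling is mostly a convenience when the same pattern is applied many times, since the pattern object can be passed around and reused.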
In addition to the functions mentioned above, the re module also provides several flags that modify the behavior of regular expressions, such as re.IGNORECASE, which makes matching case-insensitive.
Here is an example that uses the re.IGNORECASE flag:
import re
text = "The quick brown fox jumps over the lazy dog"
# Without re.IGNORECASE
pattern = "The"
match = re.search(pattern, text)
if match:
    print("Matched with pattern '{}'".format(pattern))
else:
    print("No match with pattern '{}'".format(pattern))
# With re.IGNORECASE
pattern = "the"
match = re.search(pattern, text, re.IGNORECASE)
if match:
    print("Matched with pattern '{}'".format(pattern))
else:
    print("No match with pattern '{}'".format(pattern))
This will output:
Matched with pattern 'The'
Matched with pattern 'the'
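As we can see, both searches succeed. Without re.IGNORECASE, the pattern "The" matches the capitalized word at the start of the string. With re.IGNORECASE, the lowercase pattern "the" matches that same capitalized "The", because case is ignored. Note that this text also contains a lowercase "the" (in "the lazy dog"), so a case-sensitive search for "the" would still succeed here, only at a later position; the flag changes which occurrences count as matches.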
This option can be especially useful when working with text that is not consistently formatted, such as user-generated content or data from different sources.
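As a rough illustration of inconsistently formatted text, the sketch below runs the same findall call with and without the flag. The sample string is invented for this example.

```python
import re

text = "Python, PYTHON, python and pYtHoN"

# Case-sensitive: only the exact lowercase spelling matches
case_sensitive = re.findall("python", text)
print(case_sensitive)    # ['python']

# Case-insensitive: every spelling matches, returned as written in the text
case_insensitive = re.findall("python", text, re.IGNORECASE)
print(case_insensitive)  # ['Python', 'PYTHON', 'python', 'pYtHoN']
```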