





Regular expressions are text-matching tools that let you search for patterns in strings and manipulate or chop up strings based on those patterns. These notes cover the basics of regular expressions and their use in Python, and provide examples and resources for further learning.
Typology: Study notes






- Powerful text-matching tools
- Let you search strings for patterns, and manipulate or chop up strings based on patterns
- Patterns can be based on “normal” characters (e.g., the alphabet)
- Can also include “special” symbols that give more expressive power:
  - Match only numbers
  - Match only letters
  - Require that a string have zero or more (or one or more, or ...) occurrences of a given pattern before it counts as a match
  - Require that a string have a certain pattern at its beginning, or at its end, in order for it to match
- Use the “re” module:
  - import re
- Most important methods:
  - search(pattern, string)
    - Tests whether the pattern matches anywhere in the target string; returns a match object (MatchObject) corresponding to the first match found, or None if there is no match
  - split(pattern, string)
    - Breaks apart the string by finding occurrences of the pattern (in other words, treating the pattern as the delimiter). The matched delimiter text is not returned in the resulting strings, unless the pattern contains capturing groups
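As a minimal sketch of the two methods listed above, the following uses a made-up sample string and a made-up delimiter pattern (both are illustrative assumptions, not part of the original notes). It shows that search() reports only the first occurrence and that split() drops the delimiters.

```python
import re

# Sample text chosen for illustration only.
text = "apples, oranges;bananas"

# search() returns a match object for the first occurrence, or None.
match = re.search("an", text)
found = match.group() if match else None
position = match.start() if match else -1

# split() treats the pattern as the delimiter: here a comma or a
# semicolon, optionally followed by one space. Delimiters are dropped.
parts = re.split("[,;] ?", text)

# A pattern that does not occur anywhere yields None, not an error.
missing = re.search("kiwi", text)

print(found, position)
print(parts)
print(missing)
```

Because search() can return None, it is safest to check the result before calling methods such as start() or group() on it.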
    import re
    text = "Hello, Allan"
    match = re.search("ll", text)

- match.start() returns 2, the index where the matched pattern starts
- match.end() returns 4, the index just past the end of the matched pattern
- To search for the next occurrence, one easy way is to use the returned indices to create a substring of the original string that excludes the matched part:
  - substr = text[4:]
  - substr now refers to a string containing all the characters from index 4 onward ("o, Allan"), which can be searched again to find the next occurrence of the pattern
- The example uses the name text rather than str, because str is the name of Python's built-in string type and reusing it hides the built-in
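The slicing approach described above can be turned into a loop. This is only a sketch under the assumption that the pattern never matches an empty string (an empty match would make the loop run forever); the variable names are illustrative. The standard library's re.finditer() is the more usual way to get every match and is shown for comparison.

```python
import re

text = "Hello, Allan"
pattern = "ll"

positions = []      # start index of each match, relative to the full text
offset = 0          # where the current substring begins in the full text
remaining = text

while True:
    match = re.search(pattern, remaining)
    if match is None:
        break
    positions.append(offset + match.start())
    # Drop everything up to and including the match, then search again.
    offset += match.end()
    remaining = remaining[match.end():]

# Same result without manual slicing.
positions_via_finditer = [m.start() for m in re.finditer(pattern, text)]

print(positions)
print(positions_via_finditer)
```

Tracking offset is what converts indices in the shrinking substring back into indices in the original string.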
- Backslashes are frequently used in regular expression patterns
  - ... but the backslash character itself has special meaning in Python, so normally you’d have to put another backslash in front of it
  - Results in really unreadable patterns!
- Alternative: use Python “raw” strings:
  - Preface the string with a lowercase r
  - Lets you get away without the extra backslash
  - Example: r'\w\w'
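To illustrate the point about raw strings, this short sketch compares the doubled-backslash spelling with the raw-string spelling of the same pattern. The sample input "--ab--" is an assumption made up for the example.

```python
import re

# Two spellings of the same four-character pattern: \w\w
escaped = "\\w\\w"   # ordinary string: each backslash must be doubled
raw = r"\w\w"        # raw string: backslashes are kept as written

same_pattern = escaped == raw

# \w matches a letter, digit, or underscore, so \w\w matches the first
# two adjacent such characters.
match = re.search(raw, "--ab--")
first_pair = match.group() if match else None

print(same_pattern, len(raw), first_pair)
```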
Symbol       Meaning
.            (a single period) Matches any character except a newline
^ or \A      Limits the match to occur at the beginning of the string
$ or \Z      Limits the match to occur at the end of the string
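The symbols above can be tried out directly. The sample strings below are made up for illustration, and the sketch assumes the default flags (with re.MULTILINE, ^ and $ also match at line boundaries, and with re.DOTALL the period also matches a newline).

```python
import re

text = "cat concat"

# ^ anchors the match to the beginning of the string.
at_start = re.search("^cat", text)

# $ anchors the match to the end, so only the final "cat" qualifies.
at_end = re.search("cat$", text)

# \A and \Z behave like ^ and $ here; "concat" is not at the beginning.
not_at_start = re.search(r"\Aconcat", text)
at_end_z = re.search(r"cat\Z", text)

# The period matches any single character except a newline.
dot_ok = re.search("c.t", "cut")
dot_newline = re.search("c.t", "c\nt")

print(at_start.start(), at_end.start(), not_at_start)
print(dot_ok.group(), dot_newline)
```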