Regular Expressions: Powerful Text-Matching Tools with Special Symbols in Python, Study notes of Computer Science

Regular expressions are text-matching tools that let you search for patterns in strings and manipulate or chop up strings based on those patterns. These notes cover the basics of regular expressions and their use in Python, and provide examples and resources for further learning.

Uploaded on 08/05/2009

koofers-user-n0e

Using Regular Expressions

What are “Regular Expressions?”

  • Powerful text-matching tools
  • Let you search strings for patterns, and manipulate or chop up strings based on patterns
  • Patterns can be based on “normal” characters (e.g., the alphabet)
  • Can also include “special” symbols that give more expressive power:
      • Match only numbers
      • Match only letters
      • Require that a string have zero or more (or one or more, or ...) occurrences of a given pattern before it counts as a match
      • Require that a string have a certain pattern at the beginning, or the end, of it in order for it to match
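As a rough illustration of the capabilities listed above, the sketch below uses Python's re module. The sample strings and variable names are made up for illustration and are not part of the original notes.

```python
import re

# "Normal" characters simply match themselves.
has_cat = re.search("cat", "concatenate") is not None

# Special symbols: digits only, or letters only.
# fullmatch() requires the entire string to fit the pattern.
only_digits = re.fullmatch("[0-9]+", "2009") is not None
only_letters = re.fullmatch("[a-zA-Z]+", "abc123") is not None

# Require a pattern at the beginning (^) or at the end ($) of the string.
starts = re.search("^Hello", "Hello, Allan") is not None
ends = re.search("Allan$", "Hello, Allan") is not None

print(has_cat, only_digits, only_letters, starts, ends)
# True True False True True
```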

Regular Expressions in Python

  • Use the “re” module:
      • import re
  • Most important methods:
      • search(pattern, string)
          • Tests whether the pattern matches anywhere in the target string; returns a match object (MatchObject) for the first occurrence found, or None if there is no match
      • split(pattern, string)
          • Breaks apart the string at each occurrence of the pattern (in other words, treats the pattern as the delimiter) and returns a list of the pieces. The matched delimiters are not included in the result unless the pattern contains capturing groups
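A minimal sketch of the two methods described above. The sample strings are illustrative assumptions.

```python
import re

# search() returns a match object for the first occurrence,
# or None when the pattern does not occur at all.
found = re.search("an", "banana")
missing = re.search("xyz", "banana")
print(found.start(), found.end())  # 1 3
print(missing)                     # None

# split() treats the pattern as the delimiter; here the delimiter is
# a comma followed by zero or more spaces, and it is dropped from the result.
parts = re.split(", *", "red, green,blue")
print(parts)  # ['red', 'green', 'blue']
```

Because search() can return None, it is usually worth checking the result before calling methods such as start() on it.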

Examples

    import re
    text = "Hello, Allan"    # named text rather than str, to avoid hiding Python's built-in str type
    match = re.search("ll", text)

  • match.start() returns 2, the index where the matched text begins
  • match.end() returns 4, the index just past the end of the matched text
  • To search for the next occurrence, one easy way is to use the returned indices to create a substring of the original string that excludes the matched part:
      • substr = text[4:]
      • substr now refers to a string containing all the characters from index 4 onward (“o, Allan”), which can be searched again to find the next occurrence of the pattern
      • Indices returned by that second search are relative to substr, not to the original string
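The search-then-slice idea above can be turned into a loop that collects every occurrence. This is a sketch, not the notes' own code: the names positions, offset, and rest are hypothetical, and re.finditer is shown as an alternative the notes do not mention.

```python
import re

text = "Hello, Allan"
pattern = "ll"

# Approach from the notes: search, then slice off everything up to
# the end of the match and search the remainder again.
positions = []
offset = 0      # index in `text` where the current substring begins
rest = text
while True:
    match = re.search(pattern, rest)
    if match is None:
        break
    positions.append(offset + match.start())  # convert to an index in `text`
    offset += match.end()
    rest = rest[match.end():]

print(positions)  # [2, 8]

# Alternative: re.finditer yields every non-overlapping match directly.
finditer_positions = [m.start() for m in re.finditer(pattern, text)]
print(finditer_positions)  # [2, 8]
```

The slicing loop assumes the pattern never matches an empty string; with a pattern such as "l*" it would not advance and would loop forever, which is one reason finditer is often the safer choice.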

Special Characters

  • Backslashes are frequently used in regular expression patterns
  • ... but the backslash character itself has special meaning in Python string literals, so normally you’d have to put another backslash in front of it
      • Results in really unreadable patterns!
  • Alternative: use Python “raw” strings:
      • Prefix the string with a lowercase r
      • Lets you get away without the extra backslash
      • Example: r'\w\w'
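To make the difference concrete, the short sketch below writes the same pattern both ways. The variable names are illustrative only.

```python
import re

# Ordinary string literal: each backslash meant for the regex is doubled.
escaped = "\\w\\w"
# Raw string literal: backslashes are passed through unchanged.
raw = r"\w\w"

same = escaped == raw
print(same, len(raw))  # True 4 (both are the four characters \w\w)

# Either form matches the first two word characters of the target.
first_two = re.search(raw, "Hello, Allan").group()
print(first_two)  # He
```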

Special Characters

. (a single period)   Matches any character except a newline
^ or \A               Limits the match to occur at the beginning of the string
$ or \Z               Limits the match to occur at the end of the string
* (asterisk)          Matches zero or more of the preceding character. Example: s* means zero or more of the letter “s”
+ (plus)              Matches one or more of the preceding character
[ ]                   Defines a character set. For example, to match against any of the vowels, use [aeiou]. To match against any number of numerals, use [0123456789]*
\s                    Matches any whitespace character (space, tab, newline)
\n                    Matches newline
\w                    Matches any alphabetic or numeric character, plus the underscore. Equivalent to [a-zA-Z0-9_] for ASCII text
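The following sketch exercises each of the symbols above once. The sample strings and variable names are assumptions chosen for illustration. Note that \w also matches the underscore.

```python
import re

text = "Hello, Allan"

dot = re.search("A.l", text).group()            # . stands for any one character
at_start = re.search("^Hello", text) is not None
at_end = re.search("Allan$", text) is not None
star = re.search("Hes*", text).group()          # zero occurrences of "s" still match
plus = re.search("l+", text).group()            # one or more "l"
vowels = re.findall("[aeiou]", text)            # lowercase vowels only
digits = re.fullmatch("[0123456789]*", "2009") is not None
words = re.split(r"\s+", "a b\tc\nd")           # split on any run of whitespace
word_chars = re.findall(r"\w+", "snake_case, 42!")

print(dot, star, plus)     # All He ll
print(at_start, at_end)    # True True
print(vowels)              # ['e', 'o', 'a']
print(digits)              # True
print(words)               # ['a', 'b', 'c', 'd']
print(word_chars)          # ['snake_case', '42']
```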