Updated October 27, 2014.
A regular expression is an expression that describes a set of strings. Regular expressions operate only on strings, so an integer must first be converted to a string, usually with str(). Regular expressions, or regexes, combine ordinary characters with escaped letters and special symbols to match a wide range of strings according to specific syntax rules. Python's special characters and sequences for regular expressions (used through the re module) are summarized below with some examples:
- "." Any character except a newline (with the re.DOTALL flag, a newline as well). This includes:
  - 'a' through 'z' and 'A' through 'Z'
  - any digit or symbol
  - a tab ('\t')
- "^" The start of the string. This is not the first character of the string but the invisible position that precedes it. So, in the string 'cartwheel', '^' matches the position immediately before the 'c'. In MULTILINE mode, '^' also matches immediately after each newline.
- "$" The end of the string, or the position just before a newline at the end of the string. Like '^', this is not the last character but the invisible position that follows it: in 'cartwheel', '$' matches the position immediately after the final 'l'. In MULTILINE mode, '$' also matches immediately before each newline.
- "*" 0 or more repetitions of the preceding character or group. The examples below assume the whole string must match, as with re.fullmatch(); re.search() would also find these patterns inside longer strings.
  - 'cart.*' matches 'cartwheel', 'cartridge', 'cart567', and any other string that begins with the four characters 'cart'.
  - '.*wheel' matches 'cartwheel', 'backwheel', 'frontwheel', '4-wheel', and any other string that ends with the five characters 'wheel'.
  - 'c.*l' matches 'cartwheel', 'control', 'cancel', 'c5a-f67l', and any other string that begins with 'c' and ends with 'l'.
- "+" 1 or more repetitions of the preceding character or group. It is often combined with a character set in square brackets.
  - 'c[art]+' matches 'c' followed by one or more characters, each of which is 'a', 'r', or 't'.
- "?" 0 or 1 repetitions of the preceding character or group.
  - 'ca.?t' matches 'cat', 'cart', 'cast', and any other string in which the first two characters are 'ca', the last is 't', and the string is 3 or 4 characters long. By contrast, 'ca?t' matches only 'ct' and 'cat', because the '?' applies only to the 'a'.
- "*?", "+?", "??" Non-greedy (lazy) forms of '*', '+', and '?': they match as few repetitions of the preceding item as possible, whereas the plain forms are greedy and match as many as possible.
- "{m}" Exactly m repetitions of the preceding character or group; for example, 'a{3}' matches 'aaa'.
- "{m,n}" From m to n repetitions of the preceding character or group, matching as many as possible.
- "{m,n}?" From m to n repetitions of the preceding character or group, matching as few as possible.
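- "\" Escapes special characters (so '\.' matches a literal period) or signals a special sequence, such as an octal escape when the next character is 0. Writing patterns as raw strings (r'...') keeps Python from interpreting the backslashes before the regex engine sees them.
  - newline character: '\n'
  - tab character: '\t'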
- "[]" A set of characters, any one of which can occupy a single position in the regex.
  - 'd[aou]' matches 'da', 'do', and 'du'.
  - '200[0-9]' matches the strings '2000' through '2009'. Note that it matches them as text, not as integers.
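- "|" Matches either the expression on the left of the pipe or the expression on the right.
  - '(d|c)og' matches 'dog' or 'cog'. Inside square brackets '|' is an ordinary character, so '[d|c]og' would also match '|og'; the equivalent set is '[dc]og'.
  - '(d|c)(a|o)(g|t)' matches any of the following: dog, dag, dot, dat, cog, cag, cot, and cat.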
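- "(...)" Groups part of the regex and captures the text it matches.
  - '(ca[rtp])' matches car, cat, and cap. The matched text is saved and can be retrieved afterwards, for example with the match object's group() method, or matched again later in the same pattern with a backreference such as '\1'.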
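- "(?iLmsux)" One or more of these letters set flags for the whole regex: i (ignore case), L (locale-dependent matching; bytes patterns only in Python 3), m (multi-line), s (dot matches all), u (Unicode matching), and x (verbose). Python 3 also accepts a (ASCII-only matching). The group should appear at the start of the pattern.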
- "(?:...)" A non-capturing group: it groups the regex like '(...)' but does not save the matched text.
- "(?P<name>...)" A capturing group named 'name', whose match can be referred to later by that name.
- "(?P=name)" A backreference: matches the same text that was matched earlier by the group named 'name'.
- "(?#...)" A comment. The parentheses and their contents are ignored.
- "(?=...)" A lookahead assertion: matches if ... matches at the current position, without consuming any characters. The text matched by ... is not part of the overall match.
- "(?!...)" A negative lookahead assertion: matches only if ... does not match at the current position, so the part of the regex before it succeeds only when it is not followed by ... .
- "(?<=...)" A positive lookbehind assertion: the expression to the right of the parentheses matches only when it is preceded by ... . The contents must match strings of a fixed length.
- "(?<!...)" A negative lookbehind assertion: the expression to the right of the parentheses matches only when it is not preceded by ... . The same fixed-length restriction applies.
- "\A" Matches only at the start of the string. This is similar to '^', but unlike '^' it never matches after a newline, even in MULTILINE mode.
- "\b" Matches the empty string at the beginning or end of a word. In an ordinary (non-raw) Python string, '\b' is a backspace character, so write this as r'\b'.
  - '\bwheel' matches 'wheel' on its own but not the 'wheel' inside 'cartwheel'.
- "\B" Matches the empty string only where it is not the beginning or end of a word.
- "\d" Matches any decimal digit: 0 through 9 and, for str patterns in Python 3, decimal digits from other scripts as well (Unicode category Nd). With the re.ASCII flag, it matches only 0 through 9.
- "\D" Matches any character that is not a decimal digit.
- "\s" Matches any whitespace character, such as a space, tab, or newline.
- "\S" Matches any non-whitespace character; it is the inverse of '\s', above.
- "\w" Matches any word character: a through z, A through Z, 0 through 9, and '_'. For str patterns in Python 3, it also matches letters and digits from other alphabets unless the re.ASCII flag is set.
- "\W" Matches any character that is not a word character, for example '&', '$', '@', or a space.
- "\Z" Matches only at the end of the string. This is similar to '$', but unlike '$' it does not match before a trailing newline.
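
The repetition and anchor entries in the list above can be checked with a short sketch. It assumes Python 3.4 or later (for re.fullmatch()) and uses only the standard re module; the sample words and variable names are illustrative, not taken from any particular program.

```python
import re

# Regular expressions operate on strings, so an integer is converted with str() first.
year = 2007
year_ok = re.fullmatch(r"200[0-9]", str(year)) is not None

# re.fullmatch() succeeds only if the pattern matches the entire string.
words = ["cartwheel", "cartridge", "cart567", "backwheel", "control", "cancel"]
starts_cart = [w for w in words if re.fullmatch(r"cart.*", w)]
ends_wheel = [w for w in words if re.fullmatch(r".*wheel", w)]
c_to_l = [w for w in words if re.fullmatch(r"c.*l", w)]

# "+" needs at least one repetition; "?" allows zero or one.
plus_hits = [s for s in ["c", "ca", "cart", "ctt"] if re.fullmatch(r"c[art]+", s)]
opt_hits = [s for s in ["ct", "cat", "cart", "cast", "carts"] if re.fullmatch(r"ca.?t", s)]

# Greedy forms take as much as possible, non-greedy forms as little as possible.
greedy = re.match(r"<.*>", "<a><b>").group()
lazy = re.match(r"<.*?>", "<a><b>").group()
most = re.match(r"a{2,4}", "aaaaa").group()
fewest = re.match(r"a{2,4}?", "aaaaa").group()

# "^" and "$" match positions, not characters.
anchored = (
    re.search(r"^cart", "cartwheel") is not None,
    re.search(r"wheel$", "cartwheel") is not None,
    re.search(r"^wheel", "cartwheel") is not None,
)

# "." skips newlines unless the DOTALL flag is given.
dot_plain = re.fullmatch(r"a.b", "a\nb") is not None
dot_all = re.fullmatch(r"a.b", "a\nb", re.DOTALL) is not None

print(year_ok, starts_cart, ends_wheel, c_to_l)
print(plus_hits, opt_hits)
print(greedy, lazy, most, fewest)
print(anchored, dot_plain, dot_all)
```

Using re.fullmatch() here mirrors the "whole string" reading of the examples; re.search() would instead report a match anywhere inside the string.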

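
The character-set, alternation, and grouping entries in the list above can be tried with the following sketch. It assumes Python 3 and the standard re module; the sample sentences are made up for illustration.

```python
import re

# A character set matches exactly one character from the set.
d_words = [s for s in ["da", "de", "do", "du"] if re.fullmatch(r"d[aou]", s)]

# Inside square brackets "|" is a literal character, not alternation.
bracket_pipe = re.fullmatch(r"[d|c]og", "|og") is not None  # the set includes '|'
group_pipe = re.fullmatch(r"(d|c)og", "|og") is not None    # only 'dog' or 'cog'

# Capturing groups change what findall() returns; (?:...) groups without capturing.
captured = re.findall(r"(d|c)og", "dog cog log")
whole = re.findall(r"(?:d|c)og", "dog cog log")

# The text matched by a group can be retrieved after the match.
m = re.search(r"(ca[rtp])", "the cat sat")
first_group = m.group(1)

# A named group plus a named backreference finds a repeated word.
dup = re.search(r"(?P<word>\w+) (?P=word)", "it is is repeated")
repeated = dup.group("word")

# A numbered backreference works the same way; here it removes the duplicate.
cleaned = re.sub(r"(\w+) \1", r"\1", "the the cat")

# (?#...) is a comment and is ignored by the matcher.
commented = re.fullmatch(r"ca(?#vehicle)r", "car") is not None

print(d_words, bracket_pipe, group_pipe)
print(captured, whole, first_group)
print(repeated, cleaned, commented)
```

The findall() comparison shows the practical reason for "(?:...)": when a pattern contains capturing groups, findall() returns the group contents rather than the whole match.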

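
The lookahead and lookbehind entries in the list above can be illustrated with a small sketch, assuming Python 3 and the standard re module (the third-party regex module behaves differently for lookbehind). The price string is invented for the example.

```python
import re

words = ["cartwheel", "cartridge", "cart"]

# (?=...): 'cart' only when followed by 'wheel'.
ahead = [w for w in words if re.match(r"cart(?=wheel)", w)]
# (?!...): 'cart' only when NOT followed by 'wheel'.
not_ahead = [w for w in words if re.match(r"cart(?!wheel)", w)]

# The lookahead consumes nothing, so 'wheel' is not part of the match.
matched_text = re.match(r"cart(?=wheel)", "cartwheel").group()

text = "pay $30 now, $5 later, 12 items"
# (?<=...): digits preceded by '$'.
prices = re.findall(r"(?<=\$)\d+", text)
# (?<!...): digits preceded by neither '$' nor another digit.
counts = re.findall(r"(?<![$\d])\d+", text)

# The standard re module rejects variable-width lookbehind.
try:
    re.compile(r"(?<=a+)b")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(ahead, not_ahead, matched_text)
print(prices, counts, variable_width_ok)
```

The negative lookbehind includes '\d' in its set so that the '0' in '$30' is not reported as a separate number.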

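
The special sequences at the end of the list above, together with the inline flags, can be exercised with this sketch. It assumes Python 3, where str patterns are Unicode-aware by default; the sample strings are illustrative only.

```python
import re

# \b marks a word boundary; the raw string keeps "\b" from becoming a backspace.
standalone = re.search(r"\bwheel", "a wheel") is not None
inside = re.search(r"\bwheel", "cartwheel") is not None
# \B matches only inside a word.
inner = re.search(r"\Bwheel", "cartwheel") is not None

# \d is Unicode-aware for str patterns; re.ASCII restricts it to 0-9.
arabic_indic = "\u0661\u0662"  # the digits 1 and 2 in Arabic-Indic script
unicode_digits = re.fullmatch(r"\d+", arabic_indic) is not None
ascii_digits = re.fullmatch(r"\d+", arabic_indic, re.ASCII) is not None

# \w covers letters, digits, and '_'; \W and \s pick out the rest.
tokens = re.findall(r"\w+", "cart_1 & wheel-2")
non_word = re.findall(r"\W", "a&b$c@d")
pieces = re.split(r"\s+", "tab\there  and\nnewline")

# With MULTILINE, ^ matches after every newline but \A matches only once.
text = "first\nsecond"
caret_hits = re.findall(r"^\w+", text, re.MULTILINE)
a_hits = re.findall(r"\A\w+", text, re.MULTILINE)

# $ also matches just before a trailing newline; \Z does not.
dollar = re.search(r"end$", "the end\n") is not None
big_z = re.search(r"end\Z", "the end\n") is not None

# An inline flag at the start of a pattern: (?i) ignores case.
inline = re.fullmatch(r"(?i)cart", "CART") is not None

print(standalone, inside, inner, unicode_digits, ascii_digits)
print(tokens, non_word, pieces)
print(caret_hits, a_hits, dollar, big_z, inline)
```

Passing re.ASCII (or writing "(?a)" at the start of the pattern) is one way to get the narrower 0 to 9 and a to z behavior of '\d' and '\w' when Unicode matching is not wanted.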