If I want to replace a pattern in the following statement structure:
cat&345;
bat &#hut;
I want to replace elements starting from & and ending before (not including ;). What is the best way to do so?
If I want to replace a pattern in the following statement structure:
cat&345;
bat &#hut;
I want to replace elements starting from & and ending before (not including ;). What is the best way to do so?
Including or not including the & in the replacement?
>>> re.sub(r'&.*?(?=;)','REPL','cat&345;') # including
'catREPL;'
>>> re.sub(r'(?<=&).*?(?=;)','REPL','bat &#hut;') # not including
'bat &REPL;'
r'raw string' to prevent having to escape backslashes which often occur in regular expressions..*? is a "non-greedy" match of anything, which makes the match stop at the first semicolon.(?=;) the match must be followed by a semicolon, but it is not included in the match.(?<=&) the match must be preceded by an ampersand, but it is not included in the match.Maybe go a different direction all together and use HTMLParser.unescape(). The unescape() method is undocumented, but it doesn't appear to be "internal" because it doesn't have a leading underscore.