0

I want to replace a pattern in strings structured like the following:

cat&345;
bat &#hut;

I want to replace everything starting from the & and ending just before the ; (the ; itself is not included). What is the best way to do so?

3
  • Are both of those one string? Commented Jun 28, 2013 at 2:07
  • No, 2 separate strings. Commented Jun 28, 2013 at 2:08
  • Good question, but please also show what you have tried :) Commented Jun 28, 2013 at 2:09

4 Answers

1

Including or not including the & in the replacement?

>>> import re
>>> re.sub(r'&.*?(?=;)','REPL','cat&345;')           # including
'catREPL;'
>>> re.sub(r'(?<=&).*?(?=;)','REPL','bat &#hut;')    # not including
'bat &REPL;'

Explanation:

  • Although not required here, use an r'raw string' to avoid having to escape the backslashes that often occur in regular expressions.
  • .*? is a "non-greedy" match of anything, which makes the match stop at the first semicolon.
  • (?=;) the match must be followed by a semicolon, but it is not included in the match.
  • (?<=&) the match must be preceded by an ampersand, but it is not included in the match.
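The two variants above can be folded into one small helper, sketched here. The function name replace_entity_body and its keep_ampersand parameter are made up for illustration and are not part of the answer.

```python
import re

def replace_entity_body(text, repl, keep_ampersand=False):
    """Replace the text between '&' and the next ';' (hypothetical helper)."""
    # (?<=&) leaves the ampersand out of the match; a bare & includes it.
    start = r'(?<=&)' if keep_ampersand else r'&'
    # .*? is non-greedy; (?=;) requires a semicolon without consuming it.
    return re.sub(start + r'.*?(?=;)', repl, text)

for s in ('cat&345;', 'bat &#hut;'):
    print(replace_entity_body(s, 'REPL'))
    print(replace_entity_body(s, 'REPL', keep_ampersand=True))
```

With keep_ampersand=True the & survives in the result; with the default it is replaced along with the text after it.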

1

Here is a regex that works:
import re
replacementstr, searchText = 'REPL', 'cat&345;'  # example values
result = re.sub(r"(?<=&).*(?=;)", replacementstr, searchText)  # result == 'cat&REPL;'

Basically this will put the replacement in between the & and the ;
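One thing to watch with .* is that it is greedy. If a string happens to contain more than one & ... ; pair, it runs to the last semicolon. The sketch below uses a made-up input with two pairs on one line (the question says its strings are separate, so this may not matter there) and compares it with the non-greedy .*? form.

```python
import re

text = 'a&1; b&2;'  # made-up input: two '&...;' pairs in one string

# Greedy: .* extends as far as possible, up to the last ';'
greedy = re.sub(r'(?<=&).*(?=;)', 'X', text)

# Non-greedy: .*? stops at the first ';' after each '&'
lazy = re.sub(r'(?<=&).*?(?=;)', 'X', text)

print(greedy)
print(lazy)
```

The greedy version collapses everything between the first & and the last ; into a single replacement, while the non-greedy one handles each pair separately.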

0

Maybe go in a different direction altogether and use HTMLParser.unescape(). The unescape() method is undocumented, but it doesn't appear to be "internal" because it doesn't have a leading underscore.
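On Python 3.4 and later there is a documented equivalent, html.unescape(). A minimal sketch follows, assuming the real goal is to decode HTML entities, which the question does not actually say. The sample strings other than the question's own are invented.

```python
import html

# Well-formed references are decoded.
decoded_named = html.unescape('fish &amp; chips')
decoded_numeric = html.unescape('cat&#345;')

# The question's strings are not valid references, so they pass through.
untouched = [html.unescape(s) for s in ('cat&345;', 'bat &#hut;')]

print(decoded_named)
print(decoded_numeric)
print(untouched)
```

Note that neither &345; nor &#hut; is a valid character reference, so this approach leaves the question's exact inputs unchanged. It only helps if the real data contains proper entities.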

0

You can use negated character classes to do this:

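import re

st='''\
cat&345;
bat &#hut;'''

for line in st.splitlines():
    print(line)
    # [^;]* matches everything up to (not including) the next ';'.
    # The replacement keeps the text before the & and the ; itself.
    print(re.sub(r'([^&]*)&[^;]*;',r'\1;',line))  # 'cat;' then 'bat ;'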
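The same negated-character-class idea also works without the capture group, and with a replacement string instead of plain removal. This is only a sketch: the helper name replace_after_ampersand is hypothetical, and it assumes a ; always follows the &.

```python
import re

def replace_after_ampersand(line, repl=''):
    """Replace '&' and everything after it up to the next ';' (hypothetical helper)."""
    # &[^;]* matches the ampersand plus every following non-';' character,
    # so the semicolon itself is never consumed.
    return re.sub(r'&[^;]*', repl, line)

removed = replace_after_ampersand('cat&345;')
replaced = replace_after_ampersand('bat &#hut;', 'REPL')

print(removed)
print(replaced)
```

If a line has an & with no ; after it, this pattern matches through to the end of the line. Appending (?=;) to the pattern would require the semicolon to be present.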
