I am trying to read a file and replace every "<a ...>...</a>" element with '\footnotemark'.

with open('myfile', 'r') as myfile:
   data = myfile.read()
   data = re.sub('<a.+?</a>', '\footnotemark', data)

Somehow Python always turns '\footnotemark' into '\x0cootnotemark' ('\f' becomes '\x0c'). So far I have tried:

  • Escaping: '\\footnotemark'
  • Raw string: r'\footnotemark' or r'"\footnotemark"'

Neither of these worked.

Example input:

foo<a href="anything">asdasd</a> bar

Example output:

foo\footnotemark bar
  • r'\\footnotemark' Commented Feb 10, 2016 at 11:31
  • This gives me \\footnotemark Commented Feb 10, 2016 at 11:32
  • Post an example along with the expected output. BTW, why are you trying to replace an HTML tag using a regex? Commented Feb 10, 2016 at 11:33
  • The above comment is correct. The Python interpreter shows a doubled \\ because it displays the repr of the string. Commented Feb 10, 2016 at 11:42
  • @LeoCHan: No, that doesn't work properly. Avinash Raj's original comment is correct: r'\\footnotemark' is the required string; alternatively: '\\\\footnotemark'. That's because 2 levels of escaping are required, one level for Python itself, one level for the regex syntax. FWIW, \f is a formfeed, i.e., a page-break control character. Commented Feb 10, 2016 at 12:39

1 Answer

Assuming Python 2, since you haven't mentioned a version:

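#!/usr/bin/python

import re

# myfile is saved with utf-8 encoding
with open('myfile', 'r') as myfile:
    text = myfile.read()

print text
# r'\\footnotemark' reaches re.sub as two backslashes, which the
# replacement template turns into a single literal backslash.
data = re.sub('<a.+?</a>', r'\\footnotemark', text)
print data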

Output:

foo<a href="anything">asdasd</a> bar
foo\footnotemark bar
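
To make the two levels of escaping easier to see, here is a small self-contained sketch that needs no input file. The variable names and the inline sample string are only for illustration, and the lambda variant is an alternative that was not part of the original suggestion. It is written for Python 3 and should behave the same way on Python 2.

```python
import re

text = 'foo<a href="anything">asdasd</a> bar'
pattern = r'<a.+?</a>'

# In a normal string literal, Python itself turns '\f' into a form feed (0x0c).
broken = re.sub(pattern, '\footnotemark', text)

# A raw string keeps the backslash, but the regex replacement template
# then interprets \f as a form feed as well, so the result is the same.
also_broken = re.sub(pattern, r'\footnotemark', text)

# Raw string with a doubled backslash: the template reduces \\ to one backslash.
fixed_raw = re.sub(pattern, r'\\footnotemark', text)

# The same replacement written as a normal string literal (four backslashes).
fixed_plain = re.sub(pattern, '\\\\footnotemark', text)

# A function replacement is inserted verbatim, with no template processing,
# so a single backslash is enough here.
fixed_func = re.sub(pattern, lambda match: r'\footnotemark', text)

print(repr(broken))     # 'foo\x0cootnotemark bar'
print(fixed_raw)        # foo\footnotemark bar
print(repr(fixed_raw))  # 'foo\\footnotemark bar'
```

The last two lines show why the result can look wrong in an interactive session: repr displays the single backslash as \\, while print shows the actual text.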