0

I am trying to find links that contain http, // or \ and wrap each one in an <a href> tag. However, when I read the data extracted from the XML line by line, the output is split into individual characters. Please see the input and output below. Can anyone suggest where I am going wrong?

Input: http://pastebin.com/p9H8GQt4
Current output: http://pastebin.com/7428jK63

    sanity_results = sanity_results.replace('\n','<br>\n')
    return sanity_results

def main():
    resultslis = []
    xmlfile = open('results.xml', 'r')
    contents = xmlfile.read()
    testresults = getsanityresults(contents)
    #print testresults
    for line in testresults:
        #print line
        line = line.strip()
        #print line
        line = re.sub(r'(http://[^\s]+|//[^\s]+|\\\\[^\s]+)', r'<a href="\1">\1</a>', line)
        print line
        resultslis.append(line)
    print resultslis

if __name__ == '__main__':
    main()
1
  • Why don't you use an XML parser? Commented Nov 20, 2012 at 16:55

3 Answers

4

You want to use an XML parser like

  • ElementTree
  • lxml
  • minidom

etc. for parsing any kind of XML file. Parsing XML yourself, especially line by line, is error-prone, and relying on regular expressions for it is broken by design. Don't do that.

Be smart and use an XML parser instead.
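As a rough sketch of what that could look like with the standard library's ElementTree: the XML layout below and the tag name sanity are assumptions made up for illustration, since the real results.xml is not shown in the question. The regular expression is the one from the question, applied only to the extracted text and not to the XML markup itself.

```python
import re
import xml.etree.ElementTree as ET

# Pattern copied from the question: http:// URLs, //-prefixed and \\-prefixed paths.
LINK_RE = re.compile(r'(http://[^\s]+|//[^\s]+|\\\\[^\s]+)')

# Hypothetical document; the real results.xml layout is not shown in the question.
xml_text = """<results>
  <sanity>log at http://example.com/log
second line without a link</sanity>
</results>"""

root = ET.fromstring(xml_text)
linked_lines = []
for elem in root.iter('sanity'):  # 'sanity' is an assumed tag name
    # elem.text is one string, so split it into lines before linkifying
    for line in (elem.text or '').splitlines():
        linked_lines.append(LINK_RE.sub(r'<a href="\1">\1</a>', line.strip()))

print(linked_lines)
```

For a file on disk, ET.parse('results.xml').getroot() would take the place of ET.fromstring.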


1 Comment

The question was about reading line by line. If you don't know the answer, please don't waste people's time by not answering.
2

You are iterating over a string, not over the file, so each iteration yields a single character.

If you want to iterate over the lines in a string, use str.splitlines:

>>> text ='''first
... second
... '''
>>> for line in text.splitlines():
...     print(line)
... 
first
second
>>> for char in text:
...     print(char)
... 
f
i
r
s
t


s
e
c
o
n
d

Anyway, I'd advise you to use an XML parser. The stdlib already provides one, and there are plenty of additional libraries around.
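Applied to the loop from the question, the splitlines fix might look roughly like this. The helper name linkify_lines and the sample text are made up for illustration; the pattern is the one from the question, and the sketch is written for Python 3.

```python
import re

# Pattern copied from the question: http:// URLs, //-prefixed and \\-prefixed paths.
LINK_RE = re.compile(r'(http://[^\s]+|//[^\s]+|\\\\[^\s]+)')

def linkify_lines(text):
    """Wrap each link in an anchor tag, working on one line of text at a time."""
    linked = []
    for line in text.splitlines():  # lines, not characters
        linked.append(LINK_RE.sub(r'<a href="\1">\1</a>', line.strip()))
    return linked

# Made-up sample standing in for the string returned by getsanityresults()
sample = "build log: http://example.com/log\nno link on this line\n"
for line in linkify_lines(sample):
    print(line)
```

In the original main(), this corresponds to looping over testresults.splitlines() where the code currently loops over testresults.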


0

The problem is the line:

contents = xmlfile.read()

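which returns a single string; therefore the iteration operates on characters. Replacing read() with readlines() gives a list of lines instead. Note that the loop in the question iterates over the return value of getsanityresults, so that function would also have to accept and return lines for this change to have the intended effect.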
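A small sketch of the difference, using io.StringIO as a stand-in for the opened file so it runs without results.xml (the sample content is made up):

```python
import io

# io.StringIO stands in for open('results.xml', 'r') in this sketch.
fake_file = io.StringIO("line one\nline two\n")

whole = fake_file.read()       # one str; iterating it yields characters
fake_file.seek(0)
lines = fake_file.readlines()  # list of str, each still ending in '\n'
fake_file.seek(0)
# A file object can also be iterated directly, one line at a time.
streamed = [line.rstrip('\n') for line in fake_file]

print(list(whole)[:4])
print(lines)
print(streamed)
```

Iterating the file object directly avoids holding a second copy of the content in a list, which matters mostly for large files.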
