Reading line by line the data from an XML file

Question

I am trying to find links that contain http, // or \ and wrap each one in an a href tag once it is found. However, when I read the data from the XML file line by line, the output is split into single letters. Please see the input and output below. Can anyone suggest where I am going wrong?

Input: http://pastebin.com/p9H8GQt4
Current output: http://pastebin.com/7428jK63

sanity_results = sanity_results.replace('\n','<br>\n')
return sanity_results

def main():
    resultslis = []
    xmlfile = open('results.xml', 'r')
    contents = xmlfile.read()
    testresults = getsanityresults(contents)
    #print testresults
    for line in testresults:
        #print line
        line = line.strip()
        #print line
        line = re.sub(r'(http://[^\s]+|//[^\s]+|\\\\[^\s]+)', r'<a href="\1">\1</a>', line)
        print line
        resultslis.append(line)
    print resultslis

if __name__ == '__main__':
    main()

Why don't you use an XML parser?

– mata, Commented Nov 20, 2012 at 16:55

user2665694 · Accepted Answer · 2012-11-20 17:00:59Z

4

Use an XML parser such as

ElementTree
lxml
minidom

to parse any kind of XML file. Parsing XML yourself, especially line by line, is error-prone, and doing it with regular expressions is broken by design. Don't do that.

Be smart and use an XML parser instead.
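As a rough illustration of the parser route, here is a minimal Python 3 sketch using the standard library's xml.etree.ElementTree. The layout of results.xml is not shown in the question, so the sample document, the `sanity` tag name and the helper `text_lines` are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the real layout of results.xml is not shown in the question.
SAMPLE_XML = """<results>
  <sanity>
    build log: http://example.com/build/42
    status: PASS
  </sanity>
</results>"""

def text_lines(xml_text, tag="sanity"):
    """Return the stripped, non-empty text lines of every <tag> element."""
    root = ET.fromstring(xml_text)
    lines = []
    for node in root.iter(tag):
        # node.text is a str (or None), so split it into lines explicitly.
        for line in (node.text or "").splitlines():
            line = line.strip()
            if line:
                lines.append(line)
    return lines

print(text_lines(SAMPLE_XML))
```

The parser takes care of tags, entities and encoding; only the extracted text is then split into lines, which is where any link handling could be applied.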


1 Comment

QT-1 Over a year ago

The question was about reading line by line. If you don't know the answer, please don't waste people's time by not answering.

Bakuriu · Accepted Answer · 2012-11-20 17:02:15Z

2

You are iterating over a string, not over the file.

If you want to iterate over the lines in a string use str.splitlines:

>>> text ='''first
... second
... '''
>>> for line in text.splitlines():
...     print(line)
... 
first
second
>>> for char in text:
...     print(char)
... 
f
i
r
s
t


s
e
c
o
n
d

Anyway, I'd advise you to use an XML parser. The stdlib already provides one, and there are plenty of additional libraries around.
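Applied to the loop from the question, the str.splitlines fix might look roughly like this. It is a Python 3 sketch that reuses the question's regular expression unchanged; the function name `linkify` and the sample text are made up for illustration.

```python
import re

# Same pattern as in the question: http:// links, // links and \\ (UNC-style) paths.
LINK_RE = re.compile(r'(http://[^\s]+|//[^\s]+|\\\\[^\s]+)')

def linkify(text):
    """Wrap every link in an anchor tag, processing the text line by line."""
    result = []
    for line in text.splitlines():  # iterate over lines, not characters
        line = line.strip()
        result.append(LINK_RE.sub(r'<a href="\1">\1</a>', line))
    return result

# Made-up sample; in a normal string literal '\\\\host\\dir' is the text \\host\dir.
sample = "log: http://example.com/a\nshare: \\\\host\\dir\n"
for out in linkify(sample):
    print(out)
```

Because the pattern matches any run of non-whitespace characters, markup glued directly to a link (for instance a trailing `<br>`) would end up inside the anchor, so the text is best linkified before such tags are appended.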


guidot · Accepted Answer · 2012-11-20 17:04:28Z

0

The problem is the line:

contents = xmlfile.read()

which returns a single string; iterating over a string yields characters, not lines. Replace read() with readlines() and you get the list of lines you intended.
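The difference can be seen in a small Python 3 sketch. Here io.StringIO stands in for the opened results.xml, and its contents are invented for the example.

```python
import io

# io.StringIO stands in for open('results.xml'); the contents are made up.
xmlfile = io.StringIO("<results>\n<line>http://example.com</line>\n</results>\n")

# read() returns one str; iterating over it yields single characters.
first_chars = list(xmlfile.read())[:3]

# readlines() returns a list of lines, each with its trailing newline kept.
xmlfile.seek(0)
lines = xmlfile.readlines()

# A file object can also be iterated directly, one line at a time.
xmlfile.seek(0)
stripped = [line.rstrip("\n") for line in xmlfile]

print(first_chars)
print(lines)
print(stripped)
```

Note that the loop in the question runs over the return value of getsanityresults, which is the result of a str.replace call and therefore also a string, so that value would need to be split into lines as well.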
