<?xml version="1.0" encoding="UTF-8" standalone="yes"?><document DateTime="2017-06-23T04:27:08.592Z"><PeakInfo No="1" mz="505.2315648572003965" Intensity="4531.0000000000000000" Rel_Intensity="3.2737729673489735" Resolution="1879.5638812957554364" SNR="14.0278637770897561" Area="1348.1007591467391649" Rel_Area="2.3371194184605959" Index="238.9999999999976694"/><PeakInfo No="2" mz="522.1330917856538463" Intensity="3382.0000000000000000" Rel_Intensity="2.4435886505350317" Resolution="3502.9921209527169594" SNR="10.4705882352940982" Area="881.4468100654634100" Rel_Area="1.5281101521284057" Index="925.0000000000000000"/></document>

The above is part of an XML file that I need to parse. I watched some YouTube videos on parsing XML files, but what they cover doesn't seem to apply to my files. As far as I understand, the PeakInfo entries are the elements. However, I can't access the mz and Intensity values for each PeakInfo.

import xml.etree.ElementTree as ET
import os

file_name = 'E7.xml'
full_file = os.path.abspath(os.path.join('xmllist', file_name))

pl = ET.parse(full_file)

peakinfos = pl.findall('PeakInfo')

for p in peakinfos:
    mz = p.find('mz')
    print(mz)

The code above is based on some YouTube videos. I tried to read the mz values from the PeakInfo elements, but to no avail. How can I get the values I want?

Edit: print(pl) results in: xml.etree.ElementTree.ElementTree object

1 Answer

s = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
       <document DateTime="2017-06-23T04:27:08.592Z">
           <PeakInfo No="1" mz="505.2315648572003965"
                     Intensity="4531.0000000000000000"
                     Rel_Intensity="3.2737729673489735"
                     Resolution="1879.5638812957554364"
                     SNR="14.0278637770897561"
                     Area="1348.1007591467391649"
                     Rel_Area="2.3371194184605959"
                     Index="238.9999999999976694"/>
           <PeakInfo No="2" mz="522.1330917856538463"
                     Intensity="3382.0000000000000000"
                     Rel_Intensity="2.4435886505350317"
                     Resolution="3502.9921209527169594"
                     SNR="10.4705882352940982"
                     Area="881.4468100654634100"
                     Rel_Area="1.5281101521284057"
                     Index="925.0000000000000000"/>
       </document>'''

import xml.etree.ElementTree as ET

root = ET.fromstring(s)
peakinfos = root.findall('PeakInfo')

find and findall look for child elements, but mz and Intensity are attributes of each PeakInfo element, not child elements.
Use attrib or get to read their values.

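for p in peakinfos:
    # get() returns None if the attribute is missing
    print('mz is ...', p.get('mz'))
    # attrib is a dict; indexing raises KeyError if the attribute is missing
    print('mz is ...', p.attrib['mz'])
    # items() yields every (name, value) attribute pair
    for k, v in p.items():
        print('{}: {}'.format(k, v))
    print('--------------------------')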
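
If you want numbers rather than strings, something like the following may help. It is only a sketch: `read_peaks` is a hypothetical helper name (not part of ElementTree), and `SAMPLE` is a shortened copy of the data from the question that keeps just the No, mz and Intensity attributes. Attribute values always come back as strings, so they are converted explicitly.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample data from the question (assumption: only
# the No, mz and Intensity attributes are needed).
SAMPLE = (
    '<document DateTime="2017-06-23T04:27:08.592Z">'
    '<PeakInfo No="1" mz="505.2315648572003965" Intensity="4531.0000000000000000"/>'
    '<PeakInfo No="2" mz="522.1330917856538463" Intensity="3382.0000000000000000"/>'
    '</document>'
)


def read_peaks(root):
    """Hypothetical helper: return (No, mz, Intensity) tuples for each PeakInfo."""
    peaks = []
    for p in root.findall('PeakInfo'):
        # Attribute values are strings, so convert them before use.
        peaks.append((int(p.get('No')),
                      float(p.get('mz')),
                      float(p.get('Intensity'))))
    return peaks


peaks = read_peaks(ET.fromstring(SAMPLE))
for no, mz, intensity in peaks:
    print(no, mz, intensity)
```

For a file on disk, ET.parse returns an ElementTree object (which is what print(pl) showed in the question), and its getroot() method gives the root element, so read_peaks(ET.parse(full_file).getroot()) should work the same way.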

3 Comments

Thank you. This seems to pull the values I want.
@BongKyoSeo, I can't help but wonder why you couldn't find the answer in the documentation. Did you consult the documentation?
This was my first time working with XML files, and the examples online had different structures than mine. I was mostly trying to figure out how to do it using find/findall. I didn't know the values were attributes.
