accessing xml files within a folder with unusual xml structure python

Question

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><document DateTime="2017-06-23T04:27:08.592Z"><PeakInfo No="1" mz="505.2315648572003965" Intensity="4531.0000000000000000" Rel_Intensity="3.2737729673489735" Resolution="1879.5638812957554364" SNR="14.0278637770897561" Area="1348.1007591467391649" Rel_Area="2.3371194184605959" Index="238.9999999999976694"/><PeakInfo No="2" mz="522.1330917856538463" Intensity="3382.0000000000000000" Rel_Intensity="2.4435886505350317" Resolution="3502.9921209527169594" SNR="10.4705882352940982" Area="881.4468100654634100" Rel_Area="1.5281101521284057" Index="925.0000000000000000"/></document>

The above is part of an XML file that I need to parse. I watched some YouTube videos on how to parse XML files, but what they cover doesn't seem to apply to my files. As far as I understand, the PeakInfo entries are the elements. However, I can't access the mz and Intensity values for each PeakInfo.

import xml.etree.ElementTree as ET
import os

file_name = 'E7.xml'
full_file = os.path.abspath(os.path.join('xmllist', file_name))

pl = ET.parse(full_file)

peakinfos = pl.findall('PeakInfo')

for p in peakinfos:
    mz = p.find('mz')
    print(mz)

The code above is what I wrote based on those YouTube videos. I tried to access the mz values from the PeakInfo elements, but to no avail. Is there anything I can do to achieve what I want?

Edit: print(pl) results in: xml.etree.ElementTree.ElementTree object

wwii · Accepted Answer · 2017-06-24 20:18:56Z

s = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
       <document DateTime="2017-06-23T04:27:08.592Z">
           <PeakInfo No="1" mz="505.2315648572003965"
                     Intensity="4531.0000000000000000"
                     Rel_Intensity="3.2737729673489735"
                     Resolution="1879.5638812957554364"
                     SNR="14.0278637770897561"
                     Area="1348.1007591467391649"
                     Rel_Area="2.3371194184605959"
                     Index="238.9999999999976694"/>
           <PeakInfo No="2" mz="522.1330917856538463"
                     Intensity="3382.0000000000000000"
                     Rel_Intensity="2.4435886505350317"
                     Resolution="3502.9921209527169594"
                     SNR="10.4705882352940982"
                     Area="881.4468100654634100"
                     Rel_Area="1.5281101521284057"
                     Index="925.0000000000000000"/>
       </document>'''

import xml.etree.ElementTree as ET

root = ET.fromstring(s)
peakinfos = root.findall('PeakInfo')

find and findall look for elements, but you are trying to access element attributes: mz and Intensity are attributes of each PeakInfo element, not child elements.
Use attrib or get to access the values.
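To make the distinction concrete, here is a minimal sketch using a shortened, single-peak version of the document. The abbreviated attribute values and the variable names are assumptions for illustration only, not taken from the real file.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the real file (values abbreviated for illustration).
xml_text = '<document><PeakInfo No="1" mz="505.23" Intensity="4531.0"/></document>'

root = ET.fromstring(xml_text)
peak = root.find('PeakInfo')   # PeakInfo is a child element, so find() locates it

child = peak.find('mz')        # None: PeakInfo has no child element named mz
value = peak.get('mz')         # '505.23': mz is an attribute, returned as a string

print(child, value)
```

Note that get returns the attribute value as a string, and returns None (rather than raising) when the attribute is absent, whereas attrib['mz'] raises KeyError for a missing attribute.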

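for p in peakinfos:
    # get() and attrib[...] both return the attribute value as a string
    print('mz is ...', p.get('mz'))
    print('mz is ...', p.attrib['mz'])
    # items() yields every (attribute name, value) pair of the element
    for k, v in p.items():
        print('{}: {}'.format(k, v))
    print('--------------------------')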
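Since the question reads the XML from a file inside a folder rather than from a string, the same idea can be sketched for that case. This is only an illustration under assumptions: the helper names read_peaks and read_folder are hypothetical, the folder name 'xmllist' is taken from the question, and every file is assumed to share the PeakInfo layout shown above.

```python
import glob
import os
import xml.etree.ElementTree as ET


def read_peaks(path):
    """Return a list of (mz, intensity) float pairs from one XML file."""
    root = ET.parse(path).getroot()
    # Attribute values are strings, so convert them to float explicitly.
    return [(float(p.get('mz')), float(p.get('Intensity')))
            for p in root.findall('PeakInfo')]


def read_folder(folder):
    """Map each .xml file name in the folder to its list of peaks."""
    peaks_by_file = {}
    for path in sorted(glob.glob(os.path.join(folder, '*.xml'))):
        peaks_by_file[os.path.basename(path)] = read_peaks(path)
    return peaks_by_file


if __name__ == '__main__':
    # 'xmllist' is the folder name used in the question.
    for name, peaks in read_folder('xmllist').items():
        for mz, intensity in peaks:
            print(name, mz, intensity)
```

If a file lacks one of the attributes, get returns None and float raises a TypeError, so files with a different layout would need extra handling.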

@BongKyoSeo, I can't help but wonder why you couldn't find the answer in the documentation. Did you consult the documentation?
This was my first time working with XML files, and the examples online had different structures than mine. I was mostly trying to figure out how to do it using find/findall; I didn't know the values were attributes.
