
I have an XML file that lists many chemical groups and their properties. Here is a slice of the file:

 <groups>
  <group name='CH3'>
   <mw>15.03502</mw>
   <heatCapacity>
    <a>19.5</a>
   </heatCapacity>
  </group>
  <group name='CH2'>
   <mw>14.02708</mw>
   <heatCapacity>
    <a>-0.909</a>
   </heatCapacity>
  </group>
  <group name='COOH'>
   <mw>45.02</mw>
   <heatCapacity>
    <a>-24.1</a>
   </heatCapacity>
  </group>
  <group name='OH'>
   <mw>17.0073</mw>
   <heatCapacity>
    <a>25.7</a>
   </heatCapacity>
  </group>
 </groups>

In my Python code, which parses this file with ElementTree, I have a list blocks=['CH3','CH2'] and I want to use it to find the two matching groups. I tried the following:

import elementtree.ElementTree as ET
document = ET.parse( 'groups.xml' )
blocks=['CH3','CH2']
for item in blocks:
   group1 = document.find(item)
   print group1

And all I get is 'None'. Can you please help me?

Many thanks

  • Perhaps it is worth learning XPath... Commented Jul 29, 2014 at 16:06
  • In lxml you can just do doc.xpath("//group[starts-with(@name,'CH')]"), but I don't think ElementTree has enough XPath support to handle that expression. Commented Jul 29, 2014 at 16:28
  • Is that your actual code? I'm used to seeing import xml.etree.ElementTree as ET as the import statement. Commented Jul 29, 2014 at 16:43

2 Answers


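document.find(item) returns None because find() matches tag names, and 'CH3' is the value of the name attribute, not a tag. You can read an element's attributes via its .get() method. Here is one way to use it: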

import xml.etree.ElementTree as ET
document = ET.parse( 'groups.xml' )
blocks=['CH3','CH2']
for group in document.getroot():
    if group.get('name') in blocks:
        print group
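
To make the difference between matching a tag and matching an attribute concrete, here is a minimal sketch. It assumes a trimmed inline copy of the data (the SAMPLE string is illustrative, not the real groups.xml) and uses only find() and get():

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative copy of the question's data (an assumption,
# not the real groups.xml)
SAMPLE = """
<groups>
  <group name='CH3'><mw>15.03502</mw></group>
  <group name='CH2'><mw>14.02708</mw></group>
</groups>
"""

root = ET.fromstring(SAMPLE)

# find() looks for a child *tag* called 'CH3'; no such tag exists
by_tag = root.find('CH3')

# Comparing the 'name' attribute of each <group> child does match
by_attr = [g for g in root if g.get('name') == 'CH3']

print(by_tag)                       # None
print(by_attr[0].find('mw').text)   # 15.03502
```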

If you need access to the data through arbitrary selection criteria, you can create your own dictionary:

import xml.etree.ElementTree as ET

# Parse
document = ET.parse( 'groups.xml' )

# Add a dictionary so that <group>s
# are easy to find by name
groups = {}
for group in document.getroot():
    groups[group.get('name')] = group

# Look up our compounds in the dictionary
blocks=['CH3', 'CH2']
for item in blocks:
    group = groups[item]
    mw = group.find('mw').text
    print item, mw
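
Because the lookup loop runs over blocks rather than over the file, the results come out in list order regardless of how the groups are ordered in the XML. A minimal sketch of that, assuming an inline SAMPLE string in place of groups.xml and a hypothetical missing name ('NH2') to show how an absent group could be handled:

```python
import xml.etree.ElementTree as ET

# Illustrative data: note that CH2 comes before CH3 in the file
SAMPLE = """<groups>
  <group name='CH2'><mw>14.02708</mw><heatCapacity><a>-0.909</a></heatCapacity></group>
  <group name='CH3'><mw>15.03502</mw><heatCapacity><a>19.5</a></heatCapacity></group>
</groups>"""

root = ET.fromstring(SAMPLE)

# Index every <group> by its name attribute
groups = dict((g.get('name'), g) for g in root.findall('group'))

# 'NH2' is a hypothetical name that is not in the sample data
blocks = ['CH3', 'CH2', 'NH2']

results = []
for name in blocks:
    group = groups.get(name)  # None when the name is unknown
    if group is None:
        results.append((name, None, None))
        continue
    mw = float(group.find('mw').text)
    a = float(group.find('heatCapacity/a').text)
    results.append((name, mw, a))

for row in results:
    print(row)
```

Using groups.get(name) instead of groups[name] avoids a KeyError when the list contains a name that the file does not define.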

2 Comments

Hi Rob, thanks for your reply. It is essential for me to iterate over my list because I want to get the groups in the correct order.
Use a dictionary to store the group data in a convenient fashion. See my recent edit.

Try this:

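import xml.etree.ElementTree as ET

document = ET.parse('groups.xml')
blocks = ['CH3', 'CH2']
for block in blocks:
    group = document.find('./group[@name="{}"]'.format(block))
    # Compare against None: an Element with no children is falsy
    if group is not None:
        ET.dump(group)
    else:
        print "Group {} not found.".format(block)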

3 Comments

Hi Paulo, thanks for your reply. I am using Python 2.4, which does not support this. How can I achieve this in 2.4?
Replace './group[@name="{}"]'.format(block) with './group[@name="%s"]' % block
Sorry, I don't have 2.4 around in order to reproduce the problem. Update the question with the error message you got and I will be glad to help.
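
If the installed ElementTree does not accept the [@name="..."] predicate, one fallback is to avoid path predicates entirely and compare attributes by hand. The following is a minimal sketch, not tested on Python 2.4: find_group is a hypothetical helper name, SAMPLE is an illustrative stand-in for groups.xml, and the assumption is that findall() with a plain tag name and get() behave the same in the old standalone elementtree package (where the import would be import elementtree.ElementTree as ET, as in the question).

```python
import xml.etree.ElementTree as ET

def find_group(root, name):
    """Return the first <group> child whose name attribute equals name, else None."""
    # Only a plain tag name is passed to findall(); no XPath predicate is needed
    for group in root.findall('group'):
        if group.get('name') == name:
            return group
    return None

# Illustrative stand-in for groups.xml
SAMPLE = """<groups>
  <group name='CH3'><mw>15.03502</mw></group>
  <group name='CH2'><mw>14.02708</mw></group>
</groups>"""

root = ET.fromstring(SAMPLE)

# 'OH' is absent from the sample, so it exercises the not-found branch
for block in ['CH3', 'CH2', 'OH']:
    group = find_group(root, block)
    if group is not None:
        print("%s %s" % (block, group.find('mw').text))
    else:
        print("Group %s not found." % block)
```

The %-style formatting is used here because str.format() is not available before Python 2.6.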
