How to get an element with xml.etree in Python?

Question

I was following the documentation on how to use xml.etree to parse data from an XML file, but some important information seems to be missing.

I am using the same example:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

For each country, I am trying to get the year associated with that country. I tried the following code:

import sys
import xml.etree.ElementTree as ET

tree = ET.parse(sys.argv[1])
root = tree.getroot()
for child in root:
    print(child.tag, child.attrib, child.get('year')) # or child['year'], or child.find('year').text

but none of these seem to work. How do I extract the value for year for each of the three countries?

Expected output:

country {'name': 'Liechtenstein'} 2008
country {'name': 'Singapore'} 2011
country {'name': 'Panama'} 2011

Addendum:

I found a way to get the 'year':

import sys
import xml.etree.ElementTree as ET

tree = ET.parse(sys.argv[1])
root = tree.getroot()
for child in root:
    for elem in list(child):
        if elem.tag == 'year':
            print(child.tag, child.attrib, elem.text)

Is there no simpler way?

Alexandra Dudkina · Accepted Answer · 2020-09-21 11:45:11Z

1

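Which Python version are you using? For Python 3.8 it would be:

import xml.etree.ElementTree as ET

def get_value(el):
    # Return the element's text, or None if find() matched nothing
    return el.text if el is not None else None

# xml is assumed to hold the XML document from the question as a string
root = ET.fromstring(xml)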

for country in root.findall('country'):
    year = get_value(country.find('year'))
    rank = get_value(country.find('rank'))
    neighbors = country.findall('neighbor')
    neighbor_names = [neighbor.get('name') for neighbor in neighbors]
    print(year, rank, neighbor_names)
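
If only the text of a child element is needed, Element.findtext() does the find() call and the None check in one step. A minimal sketch, assuming a trimmed copy of the question's data stored in a string called xml_text (a name chosen here for illustration):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's data; xml_text is an illustrative name.
xml_text = """<data>
    <country name="Liechtenstein"><rank>1</rank><year>2008</year></country>
    <country name="Singapore"><rank>4</rank><year>2011</year></country>
    <country name="Panama"><rank>68</rank><year>2011</year></country>
</data>"""

root = ET.fromstring(xml_text)

lines = []
for country in root.findall('country'):
    # findtext() returns the text of the first matching child,
    # or None (the default) when no child matches.
    year = country.findtext('year')
    lines.append(f"{country.tag} {country.attrib} {year}")

print("\n".join(lines))
```

This should print the three lines from the question's expected output. Passing default='' to findtext() is an option when a missing element should give an empty string instead of None.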


Asaf Amnony · Accepted Answer · 2020-09-21 11:43:55Z

1

You're on the right track :) Try child.findall()

Some notes regarding your attempts:

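child.get(attribute_name) returns the value of the XML attribute named attribute_name on the element child, or None if there is no such attribute; year is a child element, not an attribute
child[] expects an index (i.e. an integer), not a tag name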
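
The difference between the three attempts can be seen on a single element. This is only a sketch on a cut-down country element, and the variable names are illustrative:

```python
import xml.etree.ElementTree as ET

country = ET.fromstring(
    '<country name="Liechtenstein"><rank>1</rank><year>2008</year></country>'
)

# get() reads XML attributes of the element itself.
name = country.get('name')        # the value of the name attribute
no_attr = country.get('year')     # None: year is a child element, not an attribute

# Indexing takes an integer position among the children.
second_child = country[1]         # the <year> element

# A string index is rejected with a TypeError.
index_error = None
try:
    country['year']
except TypeError as exc:
    index_error = exc

# find() returns the first child with the given tag (or None).
year = country.find('year').text
```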


2 Comments

Asaf Amnony Over a year ago

What output did you get when using child.find()? If you got only the first element, then the method works as documented: "find(match, namespaces=None): Finds the first subelement matching match. match may be a tag name or a path."

Alex Over a year ago

Sorry, seems to work. I was pretty sure that gave an error message before ...

Nitul · Accepted Answer · 2020-09-21 11:51:49Z

1

Have a look at the Element.iter() method.

The following code snippet will give you the desired output:

import sys
import xml.etree.ElementTree as ET

tree = ET.parse(sys.argv[1])
root = tree.getroot()

for child in root.iter('country'):
    for grandchild in child.iter('year'):
        # include child.tag so the line matches the expected output
        print(child.tag, child.attrib, grandchild.text)
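
One thing to keep in mind: iter() walks the whole subtree, while findall() with a plain tag name only matches direct children. For the question's data both give the same result, but they differ once tags are nested. A small sketch with a made-up document (the nested capital element is hypothetical and not part of the original data):

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a second <year> sits deeper, inside <capital>.
root = ET.fromstring(
    '<data>'
    '<country name="A"><year>2008</year>'
    '<capital><year>1719</year></capital></country>'
    '</data>'
)
country = root.find('country')

# iter() descends through all levels of the subtree, in document order.
all_years = [y.text for y in country.iter('year')]

# findall() with a plain tag name only looks at direct children.
direct_years = [y.text for y in country.findall('year')]

print(all_years, direct_years)
```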


balderman · Accepted Answer · 2020-09-21 11:57:04Z

import xml.etree.ElementTree as ET


xml = '''<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>'''

root = ET.fromstring(xml)
data = {c.attrib['name']: c.find('year').text for c in root.findall('.//country')}
print(data)

output

{'Liechtenstein': '2008', 'Singapore': '2011', 'Panama': '2011'}
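
If only one country is of interest, the limited XPath support in ElementTree can select it directly with an attribute predicate. A sketch under the assumption that the data is trimmed to two countries and stored in xml_text (an illustrative name):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the data; xml_text is an illustrative name.
xml_text = """<data>
    <country name="Liechtenstein"><year>2008</year></country>
    <country name="Singapore"><year>2011</year></country>
</data>"""
root = ET.fromstring(xml_text)

# [@name='...'] keeps only elements whose name attribute has that value.
year_elem = root.find(".//country[@name='Singapore']/year")

# find() returns None when nothing matches, so guard before reading .text.
singapore_year = year_elem.text if year_elem is not None else None
print(singapore_year)
```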
