Parsing a (possibly non-standard) XML with Python

Question

I have just started learning Python and XML, and I am having trouble parsing what may be a non-standard XML file (please correct me if I am wrong).

I want to read the value of an element after identifying that element by the value of one of its attributes.

In more detail: I have two Name elements, and I want the value of the one whose language attribute is 'en-US'.

In my XML file, <Name language="en-US"> always appears immediately after <Name language="es-ES">. I am unable to get the value of the former (e.g. B); I can only get the value of the latter (e.g. A).

XML file:

<Eways>
    <Products>
        <Operator>
            <Name language="es-ES">A</Name>
            <Name language="en-US">B</Name>
        </Operator>
    </Products>
</Eways>

Python script:

import xml.etree.ElementTree as ET

tree = ET.parse('test.xml')
root = tree.getroot()

for prod in root.findall('Products'):

    for op in prod.findall('Operator'):
        print op.find('Name').text ### <- Testing, here I would expect to print both A and B, but only A is printed.

        for names in op.iter(tag='Name'):   ### Here I iterate over Element 'Name' trying to get the values anyways.
            l_name = names.get('language')

            if l_name == 'en-US':                     ### My objective is to print out the value of Element 'Name' where Attribute language == en-US.
                print 'OK en-US', names.find('Name')  ### I can not get the values (neither A nor B)
            else:
                print 'KO en-US', names.find('Name')

Martijn Pieters · Accepted Answer · 2014-08-07 10:26:53Z

The element.find() method only ever finds the first matching element. If you expected to find both elements, you'd have to use element.findall().
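
A minimal sketch of that difference, assuming Python 3 (so print is a function) and with the question's XML inlined instead of read from test.xml; the variable names are only illustrative:

```python
import xml.etree.ElementTree as ET

# The XML from the question, inlined so the example is self-contained.
XML = """\
<Eways>
    <Products>
        <Operator>
            <Name language="es-ES">A</Name>
            <Name language="en-US">B</Name>
        </Operator>
    </Products>
</Eways>
"""

root = ET.fromstring(XML)
op = root.find('Products/Operator')

# find() stops at the first matching child element.
first_name = op.find('Name').text

# findall() returns a list with every matching child element.
all_names = [name.text for name in op.findall('Name')]

print(first_name)
print(all_names)
```

Here first_name holds only the text of the es-ES element, because it comes first in the document, while all_names holds the text of both.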

You don't need to do so many loops here; just use an XPath expression:

for nametag in tree.findall('./Products/Operator/Name[@language]'):
    print nametag.attrib['language'], nametag.text

The XPath query is quite specific here; only Name elements with a language attribute inside an Operator inside a Products element are found.

The .text attribute here gives you the contents.

Demo:

>>> from xml.etree import ElementTree as ET
>>> tree = ET.fromstring('''\
... <Eways>
...     <Products>
...         <Operator>
...             <Name language="es-ES">A</Name>
...             <Name language="en-US">B</Name>
...         </Operator>
...     </Products>
... </Eways>
... ''')
>>> for nametag in tree.findall('./Products/Operator/Name[@language]'):
...     print nametag.attrib['language'], nametag.text
... 
es-ES A
en-US B

If you only ever want the <Name language="en-US"> tags, adjust the XPath query:

for nametag in tree.findall("./Products/Operator/Name[@language='en-US']"):
    print nametag.attrib['language'], nametag.text

where the [@language='en-US'] part limits the search to just those tags with a specific attribute value.
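
If only a single value is needed rather than a loop, the same predicate should also work with findtext(), which returns the text of the first match or None when nothing matches. The following is a sketch under that assumption, written for Python 3; name_for_language is a hypothetical helper, not part of ElementTree:

```python
import xml.etree.ElementTree as ET

# The XML from the question, inlined so the example is self-contained.
XML = """\
<Eways>
    <Products>
        <Operator>
            <Name language="es-ES">A</Name>
            <Name language="en-US">B</Name>
        </Operator>
    </Products>
</Eways>
"""


def name_for_language(root, language):
    """Return the text of the first Name with this language, or None.

    Hypothetical helper for illustration. The language code is pasted
    into the path as-is, so it must not contain a single quote.
    """
    path = "./Products/Operator/Name[@language='{}']".format(language)
    return root.findtext(path)


root = ET.fromstring(XML)
english = name_for_language(root, 'en-US')
missing = name_for_language(root, 'fr-FR')  # no such element in the XML

print(english)
print(missing)
```

Checking the result against None before using it avoids an AttributeError when a document lacks the requested language.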

jonrsharpe · Accepted Answer · 2014-08-07 10:06:33Z

The Name elements don't themselves contain further elements, so find gives None. Instead, you just want the text of the element:

>>> for p in tree.findall("Products"):
...     for op in p.findall("Operator"):
...         for n in op.findall("Name"):
...             print n.get('language'), n.text
... 
es-ES A
en-US B
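
Applied to the loop in the question, this means replacing names.find('Name') with names.text. A rough sketch, assuming Python 3 and the question's XML inlined; the results and nested_lookups lists are only there to make the outcome easy to inspect:

```python
import xml.etree.ElementTree as ET

# The XML from the question, inlined so the example is self-contained.
XML = """\
<Eways>
    <Products>
        <Operator>
            <Name language="es-ES">A</Name>
            <Name language="en-US">B</Name>
        </Operator>
    </Products>
</Eways>
"""

root = ET.fromstring(XML)

results = []
nested_lookups = []
for prod in root.findall('Products'):
    for op in prod.findall('Operator'):
        for name in op.iter('Name'):
            # `name` is already a Name element. It has no Name children,
            # so searching inside it finds nothing.
            nested_lookups.append(name.find('Name'))

            # The value lives in the element's own .text attribute.
            if name.get('language') == 'en-US':
                results.append(name.text)

print(nested_lookups)
print(results)
```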

1 Comment

Manuel Over a year ago

Thanks to all. Now I got it ;-)
