
I have just started to dive into Python and XML, and I am having trouble parsing a (possibly) non-standard XML file (please correct me if I am wrong).

I want to read the value of an element after identifying that element by the value of one of its attributes.

In more detail: I have two Name elements, and I want the value of the one whose language attribute is 'en-US'.

In my XML file, <Name language="en-US"> always appears immediately after <Name language="es-ES">. I am unable to get the value of the en-US element (e.g. B); I can only get the value of the es-ES element (e.g. A).

XML file:

<Eways>
    <Products>
        <Operator>
            <Name language="es-ES">A</Name>
            <Name language="en-US">B</Name>
        </Operator>
    </Products>
</Eways>

Python script:

import xml.etree.ElementTree as ET

tree = ET.parse('test.xml')
root = tree.getroot()

for prod in root.findall('Products'):

    for op in prod.findall('Operator'):
        # Testing: I would expect this to print both A and B, but only A is printed.
        print(op.find('Name').text)

        # Iterate over the 'Name' elements, trying to get the values anyway.
        for names in op.iter(tag='Name'):
            l_name = names.get('language')

            # Objective: print the value of the 'Name' element whose language attribute is 'en-US'.
            if l_name == 'en-US':
                print('OK en-US', names.find('Name'))  # I cannot get the values (neither A nor B)
            else:
                print('KO en-US', names.find('Name'))

2 Answers

The element.find() method only ever finds the first matching element. If you expected to find both elements, you'd have to use element.findall().
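The difference is easy to check against the XML from the question. A minimal sketch, in which the XML is inlined as a string and the variable names are only illustrative:

```python
import xml.etree.ElementTree as ET

XML = """\
<Eways>
    <Products>
        <Operator>
            <Name language="es-ES">A</Name>
            <Name language="en-US">B</Name>
        </Operator>
    </Products>
</Eways>
"""

root = ET.fromstring(XML)
op = root.find("Products/Operator")

# find() stops at the first matching child element
first_name = op.find("Name").text

# findall() returns every matching child, in document order
all_names = [name.text for name in op.findall("Name")]

print(first_name)  # A
print(all_names)   # ['A', 'B']
```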

You don't need to do so many loops here; just use an XPath expression:

for nametag in tree.findall('./Products/Operator/Name[@language]'):
    print(nametag.attrib['language'], nametag.text)

The XPath query is quite specific here; only Name elements with a language attribute inside an Operator inside a Products element are found.

The .text attribute of each matched element holds its text content.

Demo:

>>> from xml.etree import ElementTree as ET
>>> tree = ET.fromstring('''\
... <Eways>
...     <Products>
...         <Operator>
...             <Name language="es-ES">A</Name>
...             <Name language="en-US">B</Name>
...         </Operator>
...     </Products>
... </Eways>
... ''')
>>> for nametag in tree.findall('./Products/Operator/Name[@language]'):
...     print(nametag.attrib['language'], nametag.text)
... 
es-ES A
en-US B

If you only ever want the <Name language="en-US"> tags, adjust the XPath query:

for nametag in tree.findall("./Products/Operator/Name[@language='en-US']"):
    print(nametag.attrib['language'], nametag.text)

where the [@language='en-US'] part limits the search to just those tags with a specific attribute value.
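If only a single value is needed, the same predicate can be passed to findtext(), which returns the text of the first match directly. A small sketch, assuming the XML from the question is inlined as a string; the 'fr-FR' lookup and the "(none)" default are made up here just to show the fallback:

```python
import xml.etree.ElementTree as ET

XML = """\
<Eways>
    <Products>
        <Operator>
            <Name language="es-ES">A</Name>
            <Name language="en-US">B</Name>
        </Operator>
    </Products>
</Eways>
"""

root = ET.fromstring(XML)

# Text of the first Name element whose language attribute is 'en-US'
english_name = root.findtext("./Products/Operator/Name[@language='en-US']")

# No element matches this (hypothetical) language, so the default is returned
missing_name = root.findtext(
    "./Products/Operator/Name[@language='fr-FR']", default="(none)"
)

print(english_name)  # B
print(missing_name)  # (none)
```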


The Name elements don't themselves contain further elements, so names.find('Name') returns None. Instead, you just want the text of the element:

>>> for p in tree.findall("Products"):
...     for op in p.findall("Operator"):
...         for n in op.findall("Name"):
...             print(n.get('language'), n.text)
...
es-ES A
en-US B
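Applied to the loop from the question, the fix is to read .text on the Name element itself once its language attribute matches. A sketch under the assumption that the XML is inlined as a string; the helper name names_for_language is hypothetical, not part of ElementTree:

```python
import xml.etree.ElementTree as ET

XML = """\
<Eways>
    <Products>
        <Operator>
            <Name language="es-ES">A</Name>
            <Name language="en-US">B</Name>
        </Operator>
    </Products>
</Eways>
"""


def names_for_language(root, language):
    """Collect the text of every Name element with the given language attribute."""
    found = []
    for name in root.iter("Name"):
        # Compare the attribute, then read the element's own text
        if name.get("language") == language:
            found.append(name.text)
    return found


root = ET.fromstring(XML)
english = names_for_language(root, "en-US")
print(english)  # ['B']
```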

1 Comment

Thanks to all. Now I got it ;-)
