
In the XML below, I want to set the "alcohol" attribute to "yes" on every element where age > 21. The problem is that these elements are nested several levels deep inside other nodes. Could someone help me understand how to handle this?

Here's the XML:

<root xmlns="XYZ" usingPalette="">

<grandParent hostName="XYZ">
<parent>
        <child name="JohnsDad">
            <grandChildren name="John" sex="male" age="22" alcohol="no"/>
        </child>
        <child name="PaulasDad">
            <grandChildren name="Paula" sex="female" age="15" alcoho="no"/>
        </child>
</parent>
</grandParent>     
</root>   

I tried the findall and find methods as described in this document (http://pymotw.com/2/xml/etree/ElementTree/parse.html), but they didn't find anything. For example, the following code returns no results:

for node in tree.findall('.//grandParent'):
    print(node)

Answer

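import xml.etree.ElementTree as ET

tree = ET.parse('data')
# iter() walks every element in the tree (getiterator() was removed in Python 3.9)
for node in tree.iter():
    # elements without an "age" attribute default to 0 and are left alone
    if int(node.attrib.get('age', 0)) > 21:
        node.attrib['alcohol'] = 'yes'
root = tree.getroot()
# serialize "XYZ" as the default xmlns instead of an "ns0:" prefix
ET.register_namespace("", "XYZ")
# encoding='unicode' returns a str rather than bytes on Python 3
print(ET.tostring(root, encoding='unicode'))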

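yields the following (Python versions before 3.8 sort attributes alphabetically, as shown here; 3.8 and later keep the original attribute order):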

<root xmlns="XYZ" usingPalette="">

<grandParent hostName="XYZ">
<parent>
        <child name="JohnsDad">
            <grandChildren age="22" alcohol="yes" name="John" sex="male" />
        </child>
        <child name="PaulasDad">
            <grandChildren age="15" alcoho="no" name="Paula" sex="female" />
        </child>
</parent>
</grandParent>     
</root>

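By the way, because the root element declares the default namespace "XYZ", every element's real tag is "{XYZ}grandParent", "{XYZ}child", and so on. That is why your findall call found nothing. You must include the namespace in your XPath: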

for node in tree.findall('.//{XYZ}grandParent'):
    print(node)

That returns the grandParent element, but since you want to inspect every descendant, I think iterating over the whole tree with iter() (called getiterator() in older versions) is easier here.
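If you would rather visit only the grandChildren elements instead of every node, the namespace can also be supplied through a prefix mapping passed to findall. A minimal sketch, assuming the question's XML is held in an inline string rather than the 'data' file; the prefix "x" and the names XML and NS are arbitrary choices for illustration:

```python
import xml.etree.ElementTree as ET

# Inline copy of the question's XML so the sketch runs without a 'data' file.
XML = """<root xmlns="XYZ" usingPalette="">
<grandParent hostName="XYZ">
<parent>
        <child name="JohnsDad">
            <grandChildren name="John" sex="male" age="22" alcohol="no"/>
        </child>
        <child name="PaulasDad">
            <grandChildren name="Paula" sex="female" age="15" alcoho="no"/>
        </child>
</parent>
</grandParent>
</root>"""

# Map an arbitrary prefix ("x") to the document's default namespace URI.
NS = {"x": "XYZ"}

root = ET.fromstring(XML)
print(root.tag)  # the real tag carries the namespace: {XYZ}root

# Visit only <grandChildren> elements, wherever they are nested.
for person in root.findall(".//x:grandChildren", NS):
    if int(person.get("age", "0")) > 21:
        person.set("alcohol", "yes")

ET.register_namespace("", "XYZ")  # write xmlns="XYZ" instead of "ns0:" prefixes
print(ET.tostring(root, encoding="unicode"))
```

The prefix only exists in your Python code; the document itself is unchanged and still uses a default namespace.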


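To preserve comments while using xml.etree.ElementTree, you could use the custom parser Fredrik Lundh shows at the page linked in the docstring below. Note that it targets ElementTree 1.2 (Python 2) and relies on private attributes of the old pure-Python parser (_parser, _target), so it is unlikely to work on Python 3. It also wraps the whole tree in an extra "document" element: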

import xml.etree.ElementTree as ET


class PIParser(ET.XMLTreeBuilder):
    """
    http://effbot.org/zone/element-pi.htm
    """
    def __init__(self):
        ET.XMLTreeBuilder.__init__(self)
        # assumes ElementTree 1.2.X
        self._parser.CommentHandler = self.handle_comment
        self._parser.ProcessingInstructionHandler = self.handle_pi
        self._target.start("document", {})

    def close(self):
        self._target.end("document")
        return ET.XMLTreeBuilder.close(self)

    def handle_comment(self, data):
        self._target.start(ET.Comment, {})
        self._target.data(data)
        self._target.end(ET.Comment)

    def handle_pi(self, target, data):
        self._target.start(ET.PI, {})
        self._target.data(target + " " + data)
        self._target.end(ET.PI)


tree = ET.parse('data', PIParser())
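On Python 3.8 or later, a simpler route that stays in the standard library is to give XMLParser a TreeBuilder created with insert_comments=True. The sketch below assumes a small made-up sample file (not the question's exact data) written to a temporary directory as a stand-in for 'data'; the helper name update_keep_comments is hypothetical. Comments inside the root element survive the round trip (comments outside the root element are still dropped), and no extra "document" wrapper is added:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Made-up sample with comments inside the root element.
SAMPLE = """<root xmlns="XYZ">
  <!-- adults -->
  <grandChildren name="John" age="22" alcohol="no"/>
  <!-- minors -->
  <grandChildren name="Paula" age="15" alcohol="no"/>
</root>
"""


def update_keep_comments(path):
    """Hypothetical helper: update ages > 21 in place, keeping comments."""
    # insert_comments=True (Python 3.8+) keeps comments found inside the root.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    tree = ET.parse(path, parser=parser)
    for node in tree.iter("{XYZ}grandChildren"):
        if int(node.get("age", "0")) > 21:
            node.set("alcohol", "yes")
    ET.register_namespace("", "XYZ")  # avoid "ns0:" prefixes when writing
    tree.write(path, encoding="utf-8", xml_declaration=True)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data")  # stand-in for the question's file
    with open(path, "w", encoding="utf-8") as f:
        f.write(SAMPLE)
    update_keep_comments(path)
    with open(path, encoding="utf-8") as f:
        result = f.read()

print(result)
```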

Note that if you install lxml, you could instead use:

import lxml.etree as ET
# remove_comments=False is already lxml's default; it is spelled out for clarity
parser = ET.XMLParser(remove_comments=False)
tree = ET.parse('data', parser=parser)

Comments

That works, unutbu, although it strips out any comments in the original file when I write the result back into it. How do I retain the comments?
unutbu, with the first approach, how do I avoid wrapping the XML inside an extra document tag? I tried commenting out the start and end tag code, but it didn't help.
Do you mean you want to print only the <grandParent> element?
I want to have the XML structure intact and have comments removed.
Doesn't Lundh's PIParser or the lxml solution already do that?
