updating XML attribute value in python

Question

I want to parse the XML below and update the value of "alcohol" to "yes" for every element where age > 21. My problem is that the target node is buried inside other nodes. Could someone help me understand how to handle this?

Here's the XML:

<root xmlns="XYZ" usingPalette="">

<grandParent hostName="XYZ">
<parent>
        <child name="JohnsDad">
            <grandChildren name="John" sex="male" age="22" alcohol="no"/>
        </child>
        <child name="PaulasDad">
            <grandChildren name="Paula" sex="female" age="15" alcoho="no"/>
        </child>
</parent>
</grandParent>     
</root>

I tried the findall and find methods described in this document (http://pymotw.com/2/xml/etree/ElementTree/parse.html), but they didn't find the node. For example, the following code returns no results:

for node in tree.findall('.//grandParent'):
    print(node)

unutbu · Accepted Answer · 2014-08-27 13:18:32Z

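import xml.etree.ElementTree as ET

tree = ET.parse('data')
# tree.iter() walks every element; it replaces getiterator(), removed in Python 3.9
for node in tree.iter():
    if int(node.attrib.get('age', 0)) > 21:
        node.attrib['alcohol'] = 'yes'
root = tree.getroot()
# register "XYZ" as the default namespace so the output has no ns0: prefixes
ET.register_namespace("", "XYZ")
# encoding='unicode' returns str rather than bytes on Python 3
print(ET.tostring(root, encoding='unicode'))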

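yields (Python 3.8+ keeps the original attribute order instead of sorting the attributes alphabetically as shown)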

<root xmlns="XYZ" usingPalette="">

<grandParent hostName="XYZ">
<parent>
        <child name="JohnsDad">
            <grandChildren age="22" alcohol="yes" name="John" sex="male" />
        </child>
        <child name="PaulasDad">
            <grandChildren age="15" alcoho="no" name="Paula" sex="female" />
        </child>
</parent>
</grandParent>     
</root>

By the way, since the XML uses the namespace "XYZ", you must specify the namespace in your XPath:

for node in tree.findall('.//{XYZ}grandParent'):
    print(node)

That will return the grandParent element, but since you want to inspect all subnodes, I think iterating over the whole tree with tree.iter() (called getiterator in older versions) is easier here.
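
If you would rather target the grandChildren elements directly, the namespace-qualified search can be sketched as below. This is only a minimal illustration: the XML is inlined as a string adapted from the question (with the "alcoho" typo corrected), and the prefix "ns" and the names XML, NS and updated are arbitrary choices of mine, not anything required by ElementTree.

```python
import xml.etree.ElementTree as ET

# Sample data adapted from the question, inlined so the sketch is self-contained.
XML = """<root xmlns="XYZ" usingPalette="">
<grandParent hostName="XYZ">
<parent>
        <child name="JohnsDad">
            <grandChildren name="John" sex="male" age="22" alcohol="no"/>
        </child>
        <child name="PaulasDad">
            <grandChildren name="Paula" sex="female" age="15" alcohol="no"/>
        </child>
</parent>
</grandParent>
</root>"""

# Map an arbitrary prefix to the document's default namespace "XYZ".
NS = {"ns": "XYZ"}

root = ET.fromstring(XML)

# Find only the namespaced grandChildren elements, at any depth.
for node in root.findall(".//ns:grandChildren", NS):
    if int(node.get("age", 0)) > 21:
        node.set("alcohol", "yes")

# Collect the result per person to check what changed.
updated = {n.get("name"): n.get("alcohol")
           for n in root.findall(".//ns:grandChildren", NS)}
print(updated)
```

Passing a prefix map keeps the path readable; writing './/{XYZ}grandChildren' should be equivalent.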

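To preserve comments while using xml.etree.ElementTree, you could use the custom parser Fredrik Lundh shows at the URL in the docstring below. Note that it relies on ElementTree 1.2.x internals (XMLTreeBuilder, _parser, _target), which current Python 3 releases no longer provide: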

import xml.etree.ElementTree as ET


class PIParser(ET.XMLTreeBuilder):
    """
    http://effbot.org/zone/element-pi.htm
    """
    def __init__(self):
        ET.XMLTreeBuilder.__init__(self)
        # assumes ElementTree 1.2.X
        self._parser.CommentHandler = self.handle_comment
        self._parser.ProcessingInstructionHandler = self.handle_pi
        self._target.start("document", {})

    def close(self):
        self._target.end("document")
        return ET.XMLTreeBuilder.close(self)

    def handle_comment(self, data):
        self._target.start(ET.Comment, {})
        self._target.data(data)
        self._target.end(ET.Comment)

    def handle_pi(self, target, data):
        self._target.start(ET.PI, {})
        self._target.data(target + " " + data)
        self._target.end(ET.PI)


tree = ET.parse('data', PIParser())
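
On Python 3.8 or newer, the standard library can keep comments without a custom subclass. The following is a minimal sketch under that assumption, not a drop-in replacement for the parser above: the inline XML and the names XML, parser and result are made up for illustration, and ET.fromstring stands in for ET.parse('data', parser).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with a comment inside the root element.
XML = """<root xmlns="XYZ">
  <!-- people old enough to drink -->
  <grandChildren name="John" age="22" alcohol="no"/>
</root>"""

# insert_comments=True (Python 3.8+) makes the TreeBuilder keep comments
# as ET.Comment nodes instead of dropping them.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(XML, parser=parser)

# Comment nodes have an empty attrib dict, so the age test skips them.
for node in root.iter():
    if int(node.attrib.get("age", 0)) > 21:
        node.set("alcohol", "yes")

# Serialize with "XYZ" as the default namespace (no ns0: prefixes).
ET.register_namespace("", "XYZ")
result = ET.tostring(root, encoding="unicode")
print(result)
```

Because no extra "document" element is started, the original structure is not wrapped in another tag.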

Note that if you install lxml, you could instead use:

import lxml.etree as ET
parser = ET.XMLParser(remove_comments=False)
tree = ET.parse('data', parser=parser)

That works, unutbu, although it strips out any comments in the original file if I write the result back into this file. How do I retain the comments?
unutbu - in the first approach, how do I avoid wrapping the XML inside another document tag? I tried commenting out the start and end tag code, but it didn't help.
Do you mean you want to print only the <grandParent> element?
I want to have the XML structure intact and have comments removed.
Doesn't Lundh's PIParser, or the lxml solution, already do that?
