I have an issue with ElementTree that I can't quite figure out. I've read all their documentation as well as all the information I could find on this forum. I have a couple elements/nodes that I am trying to remove using ElementTree. I don't get any errors with the following code, but when I look at the output file I wrote the changes to, the elements/nodes that I expected to be removed are still there. I have a document that looks like this:

<data>
  <config>
    <script filename="test1.txt"></script>
    <documentation filename="test2.txt"></script>
  </config>
</data>

My code looks as follows:

import os
import xml.etree.ElementTree as ElementTree

xmlTree = ElementTree.parse(os.path.join(sourcePath, "test.xml"))
xmlRoot = xmlTree.getroot()
for doc in xmlRoot.findall('documentation'):
    xmlRoot.remove(doc)

xmlTree.write(os.path.join(sourcePath, "testTWO.xml"))

The result is I get the following document:

<data>
  <config>
    <script filename="test1.txt" />
    <documentation filename="test2.txt" />
  </config>
</data>

I am not stuck using ElementTree. If there is a better solution with lxml or some other library, I am all ears. I know ElementTree can be a bit of a pain at times. What I need is something more like this:

<data>
  <config>
  </config>
</data>

1 Answer

xmlRoot.findall('documentation') in your code didn't find anything because <documentation> isn't a direct child of the root element <data>. It is actually a direct child of <config>:

"Element.findall() finds only elements with a tag which are direct children of the current element". [19.7.1.3. Finding interesting elements]
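
To see that behavior in isolation, here is a minimal sketch that parses a corrected copy of the sample XML from a string instead of a file. The SAMPLE name is only for illustration and is not part of the original code.

```python
import xml.etree.ElementTree as ElementTree

# Corrected copy of the sample XML (<documentation> is closed properly).
# SAMPLE is an illustrative name, not something from the original post.
SAMPLE = """<data>
  <config>
    <script filename="test1.txt"></script>
    <documentation filename="test2.txt"></documentation>
  </config>
</data>"""

root = ElementTree.fromstring(SAMPLE)

# A bare tag name only matches direct children of the element it is called on.
from_root = root.findall('documentation')
from_config = root.find('config').findall('documentation')

print(len(from_root))    # no matches: <documentation> is a grandchild of <data>
print(len(from_config))  # one match: <documentation> is a child of <config>
```

Because the first search comes back empty, the loop in the question never runs and nothing is removed.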

This is one possible way to remove all children of <config> using findall(), given the sample XML you posted (and assuming that the actual XML closes the <documentation> element with a proper closing tag rather than with </script>):

......
config = xmlRoot.find('config')

# find all children of config
for doc in config.findall('*'):
    config.remove(doc)
    # print just to make sure the element to be removed is correct
    print(ElementTree.tostring(doc))
......

6 Comments

Thank you for the example. I can see where I went wrong based on what you provided. However, in the initial XML example I just put a couple elements under <config>. I only want to remove the <script> and <documentation> elements and keep everything else. So I added the following code. The documentation node gets deleted but not the script. When I print ElementTree.tostring() I see that it properly finds the <script> and <documentation> elements.
Here is the code:

# Remove the 'documentation' and 'script' tags from test.xml
if documentation is None:
    pass
else:
    config = xmlRoot.find('config')
    for doc in config.findall('documentation'):
        config.remove(doc)
        print ElementTree.tostring(doc)

if script is None:
    pass
else:
    config = xmlRoot.find('config')
    for script in config.findall('script'):
        config.remove
        print ElementTree.tostring(script)

xmlTree.write(os.path.join(sourcePath, "driverTWO.xml"))

Never mind. I forgot to pass the variable to config.remove. I added config.remove(script) and it works fine.
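
For reference, a minimal sketch of removing only the <script> and <documentation> elements while keeping everything else under <config>. The extra <settings> element and the SAMPLE name are made up for illustration, since the real file was not posted.

```python
import xml.etree.ElementTree as ElementTree

# Hypothetical input: the sample XML plus one invented <settings> element
# that should survive the cleanup.
SAMPLE = """<data>
  <config>
    <script filename="test1.txt"></script>
    <documentation filename="test2.txt"></documentation>
    <settings mode="fast"></settings>
  </config>
</data>"""

root = ElementTree.fromstring(SAMPLE)
config = root.find('config')

# Copy the child list first so that removing elements does not disturb the loop.
for child in list(config):
    if child.tag in ('script', 'documentation'):
        config.remove(child)

remaining = [child.tag for child in config]
print(remaining)
```

Checking the tag in a single loop avoids repeating the find/remove block once per element name.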
I would like to point out that you can use findall to find elements anywhere in an XML tree (not just direct children). findall(".//documentation") is an example. See docs.python.org/2/library/….
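
A minimal sketch of that approach, again on a corrected copy of the sample XML (SAMPLE is an illustrative name). One caveat: ElementTree elements do not keep a reference to their parent, so finding an element at any depth is not enough to remove it. This sketch visits every element and removes matching direct children from it, which is only one possible way to handle that.

```python
import xml.etree.ElementTree as ElementTree

# Corrected copy of the sample XML; SAMPLE is an illustrative name.
SAMPLE = """<data>
  <config>
    <script filename="test1.txt"></script>
    <documentation filename="test2.txt"></documentation>
  </config>
</data>"""

root = ElementTree.fromstring(SAMPLE)

# ".//documentation" matches <documentation> at any depth below the root.
found = root.findall('.//documentation')
print(len(found))

# remove() must be called on the parent, so walk every element (snapshotted
# into a list before modifying the tree) and drop its matching children.
for parent in list(root.iter()):
    for doc in parent.findall('documentation'):
        parent.remove(doc)

left = root.findall('.//documentation')
print(len(left))
```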