3

I have the following XML file:

<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<rootnode>
  <TExportCarcass>
    <BodyNum>6168</BodyNum>
    <BodyWeight>331.40</BodyWeight>
    <UnitID>1</UnitID>
    <Plant>239</Plant>
    <pieces>
      <TExportCarcassPiece index="0">
        <Bruising>0</Bruising>
        <RFIDPlant></RFIDPlant>
      </TExportCarcassPiece>
      <TExportCarcassPiece index="1">
        <Bruising>0</Bruising>
        <RFIDPlant></RFIDPlant>
      </TExportCarcassPiece>
    </pieces>
  </TExportCarcass>
  <TExportCarcass>
    <BodyNum>6169</BodyNum>
    <BodyWeight>334.40</BodyWeight>
    <UnitID>1</UnitID>
    <Plant>278</Plant>
    <pieces>
      <TExportCarcassPiece index="0">
        <Bruising>0</Bruising>
        <RFIDPlant></RFIDPlant>
      </TExportCarcassPiece>
      <TExportCarcassPiece index="1">
        <Bruising>0</Bruising>
        <RFIDPlant></RFIDPlant>
      </TExportCarcassPiece>
    </pieces>
  </TExportCarcass>
</rootnode>

I am using Python's lxml module to read data from the XML file like this:

from lxml import etree

doc = etree.parse('file.xml')

memoryElem = doc.find('BodyNum')
print(memoryElem)        

But it only prints None instead of 6168. What am I doing wrong here?

5 Answers

2

You need to iterate over each TExportCarcass element and then use find to access its BodyNum child.

Ex:

from lxml import etree

doc = etree.parse('file.xml')
for elem in doc.findall('TExportCarcass'):
    print(elem.find("BodyNum").text) 

Output:

6168
6169

or

print([i.text for i in doc.findall('TExportCarcass/BodyNum')]) #-->['6168', '6169']
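The same per-element loop can also collect several fields at once. The following is a minimal sketch, assuming a trimmed inline copy of the question's XML (so it runs without file.xml) and a hypothetical `records` list; the int/float conversions are an illustrative choice, not something the question asks for.

```python
from lxml import etree

# Trimmed inline copy of the question's XML (assumption: used instead of file.xml).
XML = """<rootnode>
  <TExportCarcass>
    <BodyNum>6168</BodyNum>
    <BodyWeight>331.40</BodyWeight>
    <Plant>239</Plant>
  </TExportCarcass>
  <TExportCarcass>
    <BodyNum>6169</BodyNum>
    <BodyWeight>334.40</BodyWeight>
    <Plant>278</Plant>
  </TExportCarcass>
</rootnode>"""

root = etree.fromstring(XML)

# One dict per TExportCarcass; findtext returns the child's text directly.
records = []
for carcass in root.findall('TExportCarcass'):
    records.append({
        'BodyNum': int(carcass.findtext('BodyNum')),
        'BodyWeight': float(carcass.findtext('BodyWeight')),
        'Plant': carcass.findtext('Plant'),
    })

print(records)
```

Note that findtext returns None when the child is missing, so real data may need a check before converting.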


2

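When you pass a bare tag name to find, it only searches the direct children of the root element, and BodyNum is nested one level deeper. find also accepts a limited subset of XPath (ElementPath), so you can use a path such as .//BodyNum to search for the element at any depth in the document: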

  1. To get the first element only:
from lxml import etree
doc = etree.parse('file.xml')

memoryElem = doc.find('.//BodyNum')
print(memoryElem.text)
# 6168
  2. To get all elements:
[ b.text for b in doc.iterfind('.//BodyNum') ]
# ['6168', '6169']
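To make the difference between the path forms concrete, here is a small side-by-side sketch. It assumes a trimmed inline copy of the question's XML rather than file.xml, and the variable names are only illustrative.

```python
from lxml import etree

# Trimmed inline copy of the question's XML (assumption: used instead of file.xml).
XML = """<rootnode>
  <TExportCarcass>
    <BodyNum>6168</BodyNum>
  </TExportCarcass>
  <TExportCarcass>
    <BodyNum>6169</BodyNum>
  </TExportCarcass>
</rootnode>"""

root = etree.fromstring(XML)

direct = root.find('BodyNum')                    # direct children of rootnode only
explicit = root.find('TExportCarcass/BodyNum')   # full path spelled out
anywhere = root.find('.//BodyNum')               # any depth below rootnode

print(direct)          # None
print(explicit.text)   # 6168
print(anywhere.text)   # 6168
```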


2

1 - Use / to specify the tree level of the element you want to extract

2 - Use .text to extract the text content of the element

from lxml import etree

doc = etree.parse('file.xml')
memoryElem = doc.find("*/BodyNum")  # BodyNum is one level below the root's children
print(memoryElem.text)  # .text gives the text content of the element: 6168


0

Just use Python's built-in xml.etree.ElementTree module

https://docs.python.org/3/library/xml.etree.elementtree.html
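As a rough sketch of what that might look like: the standard-library module follows the same path rules, so a bare tag name still matches only direct children of the root. The inline XML is a trimmed copy of the question's data (an assumption, so the snippet runs without file.xml), and `body_nums` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Trimmed inline copy of the question's XML (assumption: used instead of file.xml).
XML = """<rootnode>
  <TExportCarcass>
    <BodyNum>6168</BodyNum>
  </TExportCarcass>
  <TExportCarcass>
    <BodyNum>6169</BodyNum>
  </TExportCarcass>
</rootnode>"""

root = ET.fromstring(XML)

# A bare tag name still looks only at direct children, so this is None here too.
print(root.find('BodyNum'))

# iter() walks the whole subtree and yields every BodyNum element.
body_nums = [elem.text for elem in root.iter('BodyNum')]
print(body_nums)  # ['6168', '6169']
```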


0

Your document contains multiple BodyNum elements.
You need to add an explicit limit to the query if you only need the first element.

Use the following flexible approach based on an XPath query:

from lxml import etree

doc = etree.parse('file.xml')
memoryElem = doc.xpath('(//BodyNum)[1]/text()')
print(memoryElem)   # ['6168']
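Because xpath evaluates full XPath 1.0 expressions, the same call can also count elements. The sketch below assumes a trimmed inline copy of the question's XML instead of file.xml; `carcass_count` and `all_nums` are illustrative names.

```python
from lxml import etree

# Trimmed inline copy of the question's XML (assumption: used instead of file.xml).
XML = """<rootnode>
  <TExportCarcass>
    <BodyNum>6168</BodyNum>
  </TExportCarcass>
  <TExportCarcass>
    <BodyNum>6169</BodyNum>
  </TExportCarcass>
</rootnode>"""

root = etree.fromstring(XML)

# Text of the first BodyNum in document order; the result is a list.
first = root.xpath('(//BodyNum)[1]/text()')

# XPath count() comes back as a float in lxml, so convert it to int.
carcass_count = int(root.xpath('count(//TExportCarcass)'))

# Without the [1] limit, every BodyNum text node is returned.
all_nums = root.xpath('//BodyNum/text()')

print(first)           # ['6168']
print(carcass_count)   # 2
print(all_nums)        # ['6168', '6169']
```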

4 Comments

Is it possible to get the number of TExportCarcass elements?
Sure, thanks. I thought we could use the comment section to ask for extra information.
@SAndrew, are you sure this approach deserves a downvote?
This is also a valid answer. Not sure why it was downvoted.
