1

I need to parse some XML to CSV. I am struggling getting the 'record' attribute to iterate. The code below can pull out the allocation text. How do I get the record product-id?

    import xml.etree.ElementTree as ET
    mytree = ET.parse('Salesforce_01_30_2023.xml')
    myroot = mytree.getroot()
    print(myroot)
    
    for x in myroot.findall('record'):
        product = myroot.attrib
        inventory = x.find('allocation').text
        print(product, inventory)

XML

<?xml version="1.0" encoding="UTF-8"?>
        <records>
            <record product-id="99124">
                <allocation>15</allocation>
                <allocation-timestamp>2023-01-30T15:03:39.598Z</allocation-timestamp>
                <perpetual>false</perpetual>
                <preorder-backorder-handling>none</preorder-backorder-handling>
                <ats>15</ats>
            </record>
            <record product-id="011443">
                <allocation>0</allocation>
                <allocation-timestamp>2023-01-30T15:03:39.598Z</allocation-timestamp>
                <perpetual>false</perpetual>
                <preorder-backorder-handling>none</preorder-backorder-handling>
                <ats>0</ats>
            </record>
1
  • 2
    Small note: call things what they are so that when you do need to ask others for help, your code "makes immediate sense". If you're iterating through record elements in a records top element, don't use x or product in your code. Call that root records, and iterate with for record in records.findAll('record'). Commented Feb 1, 2023 at 22:09

2 Answers 2

2

To get product-id number you can use .attrib["product-id"]:

import xml.etree.ElementTree as ET

mytree = ET.parse('Salesforce_01_30_2023.xml')
myroot = mytree.getroot()

for product in myroot.findall('record'):
    inventory = product.find('allocation').text
    print(product.attrib['product-id'], inventory)

Prints:

99124 15
011443 0
Sign up to request clarification or add additional context in comments.

2 Comments

this works as well. However, one thing I noticed when this exports to csv it removes the leading 0 in the numbers. Is there anyway to have the csv file not do this.
The zero should be there in the CSV file (try to open the CSV in text editor and check, if not, quote the column values). When you open the csv in excel or other sw, it tries to convert it to the number - so you need to open it without the converting step.
1

Option 1: You can use pandas DataFrame read_xml() and to_csv():

import pandas as pd

df = pd.read_xml("prod_id.xml", xpath=".//record")
df.to_csv('prod.csv')
print(df.to_string())

Output:

   product-id  allocation      allocation-timestamp  perpetual preorder-backorder-handling  ats
0       99124          15  2023-01-30T15:03:39.598Z      False                        none   15
1       11443           0  2023-01-30T15:03:39.598Z      False                        none    0

CSV:

,product-id,allocation,allocation-timestamp,perpetual,preorder-backorder-handling,ats
0,99124,15,2023-01-30T15:03:39.598Z,False,none,15
1,11443,0,2023-01-30T15:03:39.598Z,False,none,0

Option 2, if you prefere the xml.etree.ElementTree. xml attribute values can be searched by .get():

import xml.etree.ElementTree as ET
    
tree = ET.parse('prod_id.xml')
root = tree.getroot()
    
for elem in root.iter():
    # print(elem.tag, elem.attrib, elem.text)
    if elem.tag == "record":
        print("Product-id:",elem.get('product-id'))

Output:

Product-id: 99124
Product-id: 011443

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.