Parse XML to CSV with python

Question

I need to parse some XML to CSV. I am struggling getting the 'record' attribute to iterate. The code below can pull out the allocation text. How do I get the record product-id?

    import xml.etree.ElementTree as ET
    mytree = ET.parse('Salesforce_01_30_2023.xml')
    myroot = mytree.getroot()
    print(myroot)
    
    for x in myroot.findall('record'):
        product = myroot.attrib
        inventory = x.find('allocation').text
        print(product, inventory)

XML

<?xml version="1.0" encoding="UTF-8"?>
        <records>
            <record product-id="99124">
                <allocation>15</allocation>
                <allocation-timestamp>2023-01-30T15:03:39.598Z</allocation-timestamp>
                <perpetual>false</perpetual>
                <preorder-backorder-handling>none</preorder-backorder-handling>
                <ats>15</ats>
            </record>
            <record product-id="011443">
                <allocation>0</allocation>
                <allocation-timestamp>2023-01-30T15:03:39.598Z</allocation-timestamp>
                <perpetual>false</perpetual>
                <preorder-backorder-handling>none</preorder-backorder-handling>
                <ats>0</ats>
            </record>

Small note: call things what they are so that when you do need to ask others for help, your code "makes immediate sense". If you're iterating through record elements in a records top element, don't use x or product in your code. Call that root records, and iterate with for record in records.findAll('record'). — Mike 'Pomax' Kamermans
– Mike 'Pomax' Kamermans, Commented Feb 1, 2023 at 22:09

Andrej Kesely · Accepted Answer · 2023-02-01 22:02:47Z

2

To get product-id number you can use .attrib["product-id"]:

import xml.etree.ElementTree as ET

mytree = ET.parse('Salesforce_01_30_2023.xml')
myroot = mytree.getroot()

for product in myroot.findall('record'):
    inventory = product.find('allocation').text
    print(product.attrib['product-id'], inventory)

Prints:

99124 15
011443 0

answered Feb 1, 2023 at 22:02

Andrej Kesely

196k15 gold badges60 silver badges105 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Ross Over a year ago

this works as well. However, one thing I noticed when this exports to csv it removes the leading 0 in the numbers. Is there anyway to have the csv file not do this.

Andrej Kesely Over a year ago

The zero should be there in the CSV file (try to open the CSV in text editor and check, if not, quote the column values). When you open the csv in excel or other sw, it tries to convert it to the number - so you need to open it without the converting step.

Hermann12 · Accepted Answer · 2023-02-02 19:38:54Z

Option 1: You can use pandas DataFrame read_xml() and to_csv():

import pandas as pd

df = pd.read_xml("prod_id.xml", xpath=".//record")
df.to_csv('prod.csv')
print(df.to_string())

Output:

   product-id  allocation      allocation-timestamp  perpetual preorder-backorder-handling  ats
0       99124          15  2023-01-30T15:03:39.598Z      False                        none   15
1       11443           0  2023-01-30T15:03:39.598Z      False                        none    0

CSV:

,product-id,allocation,allocation-timestamp,perpetual,preorder-backorder-handling,ats
0,99124,15,2023-01-30T15:03:39.598Z,False,none,15
1,11443,0,2023-01-30T15:03:39.598Z,False,none,0

Option 2, if you prefere the xml.etree.ElementTree. xml attribute values can be searched by .get():

import xml.etree.ElementTree as ET
    
tree = ET.parse('prod_id.xml')
root = tree.getroot()
    
for elem in root.iter():
    # print(elem.tag, elem.attrib, elem.text)
    if elem.tag == "record":
        print("Product-id:",elem.get('product-id'))

Output:

Product-id: 99124
Product-id: 011443

Collectives™ on Stack Overflow

Parse XML to CSV with python

2 Answers 2

2 Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

Comments

Related