I have a large-ish XML file. I include a scrubbed snippet of it below. The result-size below is the number of elements in the XML file that the file calls <row>.
Question 1: How can only get a record in the output CSV file for the every <row> of type <COLLECTION-ITEM>, and not of all of the other types? I have no control over how the XML is structured. If you comment out all but the test values in my Python code, you'll see that file is outputting 5 records when I only want 1.
Question 2: What do I need to do differently in my Python to get name = i.find("name").text to return "name of Item 1"?
I think it's fair for the answer to just focus on those first two questions. I hope that getting those answers puts me on a path to solving the rest of my problem with this XML to CSV issue. Here is the big picture and a couple more issues I will need to solve, though. Point me to books, classes, whatever will help. I only have a week or so to make this work.
Desired output:
Collection item,ITEM-ID,ATTRIB-1,PERSON-TYPE-1-NAME,ATTRIB-2,PERSON-TYPE-2-NAME,RELATED-THING-1 id,RELATED-THING-2 IDs
name of Item 1,item_000001,Yes,name of person 1,Yes,name of person 2,thing_000745,"thing_000783, thing_000803"
I can read in the file and output a CSV file with my designated column names. But I can't get the name of even the first thing into the CSV.
Then there are the more complicated bits that will require functions, such as finding an ID attribute based on matching the type and name attributes, and returning more than one ID. See RELATED-THING-1-ID and RELATED-THING-2-IDs.
Here is my Python (based on https://www.geeksforgeeks.org/convert-xml-to-csv-in-python/):
# -*- coding: utf-8 -*-
# Importing the required libraries
import xml.etree.ElementTree as Xet
import pandas as pd
cols = ["Collection item", "ITEM-ID", "ATTRIB-1", "PERSON-TYPE-1-NAME" ,
        "ATTRIB-2", "PERSON-TYPE-2-NAME", "RELATED-THING-1 id",
        "RELATED-THING-2 IDs"]
rows = []
# Parsing the XML file
xmlparse = Xet.parse('sample.xml')
root = xmlparse.getroot()
for i in root:
    name = i.find("name").text
    item_id = i.find("ITEM-ID").text
    attrib_1 = i.find("ATTRIB-1").text
    p1_name = i.find("PERSON-TYPE-1-NAME.result.row.name").text
    attrib_2 = i.find("ATTRIB-2").text
    p2_name = i.find("PERSON-TYPE-2-NAME.result.row.name").text
    relat_thing1_id = i.find("country").text
    relat_thing2_ids = i.find("country").text
    rows.append({"Collection item": name,
                 "ITEM-ID": item_id,
                 "ATTRIB-1": attrib_1,
                 "PERSON-TYPE-1-NAME": p1_name,
                 "ATTRIB-2": attrib_2,
                 "PERSON-TYPE-2-NAME": p2_name,
                 "RELATED-THING-1 id": relat_thing1_id,
                 "RELATED-THING-2 IDs": relat_thing2_ids
    })
df = pd.DataFrame(rows, columns=cols)
# Writing dataframe to csv
df.to_csv('output.csv')
This is the XML:
<?xml version="1.0" encoding="UTF-8"?>
<result size="4321">
  <row>
    <id>3255183</id>
    <type>CONTEXT</type>
    <name>collections</name>
  </row>
  <row>
    <id>3652889</id>
    <type>COLLECTION-ITEM</type>
    <name>name of Item 1</name>
    <ITEM-ID>item_000001</ITEM-ID>
    <ATTRIB-1>Yes</ATTRIB-1>
    <PERSON-TYPE-1-NAME>
      <result size="1">
        <row>
          <id>3254728</id>
          <scopeId>3254388</scopeId>
          <type>PERSON</type>
          <name>name of person 1</name>
          <no>1</no>
        </row>
      </result>
    </PERSON-TYPE-1-NAME>
    <ATTRIB-2>Yes</ATTRIB-2>
    <PERSON-TYPE-2-NAME>
      <result size="1">
        <row>
          <id>3254403</id>
          <scopeId>3254388</scopeId>
          <type>PERSON</type>
          <name>name of person 2</name>
          <no>1</no>
        </row>
      </result>
    </PERSON-TYPE-2-NAME>
    <RELATED-THING-1>
      <result size="1">
        <row>
          <id>3391122</id>
          <scopeId>3255191</scopeId>
          <type>THING-TYPE-1</type>
          <name>thing type 1 name 1</name>
          <no>1</no>
        </row>
      </result>
    </RELATED-THING-1>
    <RELATED-THING-2>
      <result size="2">
        <row>
          <id>3255215</id>
          <scopeId>3255198</scopeId>
          <type>THING-TYPE-2</type>
          <name>thing type 2 name 1</name>
          <no>1</no>
        </row>
        <row>
          <id>3255227</id>
          <scopeId>3255198</scopeId>
          <type>THING-TYPE-2</type>
          <name>thing type 2 name 2</name>
          <no>1</no>
        </row>
      </result>
    </RELATED-THING-2>
  </row>
  <row>
    <id>3391122</id>
    <type>THING-TYPE-1</type>
    <name>thing type 1 name 1</name>
    <THING-ID>thing_000745</THING-ID>
  </row>
  <row>
    <id>3255215</id>
    <type>THING-TYPE-2</type>
    <name>thing type 2 name 1</name>
    <THING-ID>thing_000783</THING-ID>
  </row>
  <row>
    <id>3255227</id>
    <type>THING-TYPE-2</type>
    <name>thing type 2 name 2</name>
    <THING-ID>thing_000803</THING-ID>
  </row>
</result>

