I am having an issue parsing the XML data below in PySpark.
<item name="Cake" ppu="0.55">
<venue place="Bangalore" day="Friday">
<batters>
<batter name=Regular/>
<batter name=Chocolate/>
<batter name=Blueberry/>
</batters>
<topping id="5001">None</topping>
<topping id="5002">Glazed</topping>
<topping id="5005">Sugar</topping>
<topping id="5006">Sprinkles</topping>
<topping id="5003">Chocolate</topping>
<topping id="5004">Maple</topping>
</venue>
</item>
<item name="pizza" ppu="0.56"/>
<batters>
<batter place="Bangalore" name="Regular"/>
<batters>
</items>
I am able to parse the first item tag, but I am unable to parse the second one. Any suggestions would be helpful.
So far I have tried the following:
df = spark.read\
.format("com.databricks.spark.xml")\
.option("rowTag", "item")\
.option("valueTag", True)\
.load("File.xml")
This gives me only the schema of the first item tag. I am also unable to define a nested schema.
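One thing that may be worth checking before touching the schema: in the sample as posted, the second item is written as a self-closing tag, so the batters element that follows it would be a sibling of that item rather than a child. The sketch below is not the Spark job itself. It uses only Python's standard xml.etree.ElementTree on a cleaned-up copy of the sample (assumptions: an items root element is added, attribute values are quoted, the last batters element is closed, and the topping list is shortened) to show which children each item actually owns. The names SAMPLE, children_per_item and root_level_tags are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical cleaned-up copy of the posted sample. Assumptions: <items> root
# added, attribute values quoted, trailing <batters> closed, toppings trimmed.
# The self-closing second <item/> is kept exactly as posted.
SAMPLE = """<items>
<item name="Cake" ppu="0.55">
<venue place="Bangalore" day="Friday">
<batters>
<batter name="Regular"/>
</batters>
<topping id="5001">None</topping>
</venue>
</item>
<item name="pizza" ppu="0.56"/>
<batters>
<batter place="Bangalore" name="Regular"/>
</batters>
</items>"""


def children_per_item(xml_text):
    """Map each <item> name to the tags of its direct children."""
    root = ET.fromstring(xml_text)
    return {
        item.get("name"): [child.tag for child in item]
        for item in root.iter("item")
    }


def root_level_tags(xml_text):
    """List the tags sitting directly under the root element."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]


if __name__ == "__main__":
    print(children_per_item(SAMPLE))  # the second item has no children
    print(root_level_tags(SAMPLE))    # <batters> appears next to the items
```

If the real file looks like this, a reader using item as the row tag would see only the name and ppu attributes for the second row, which could explain why the resulting schema seems to reflect only the first item. If the real file instead has an opening item tag with a matching closing tag around batters, this check would show batters as a child and the cause lies elsewhere.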