I have a PySpark dataframe of students with schema as follows:
Id: string
|-- School: array
|-- element: struct
| |-- Subject: string
| |-- Classes: string
| |-- Score: array
| |-- element: struct
| |-- ScoreID: string
| |-- Value: string
I want to extract a few fields from the data frame and normalize it so that I can feed it in the database. The relational schema I expect consists of the fields Id, School, Subject, ScoreId, Value. How can I do it efficiently?