I have a large number of deeply nested JSON files, each with more than 200 keys, that I want to convert and store in a structured table. The schema looks like:
|-- ip_address: string (nullable = true)
|-- xs_latitude: double (nullable = true)
|-- Applications: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- b_als_o_isehp: string (nullable = true)
| | |-- b_als_p_isehp: string (nullable = true)
| | |-- b_als_s_isehp: string (nullable = true)
| | |-- l_als_o_eventid: string (nullable = true)
....
Reading the JSON gives one Applications array per ip_address, e.g.:
{"ip_address": 1512199720, "Applications": [{"s_pd": -1, "s_path": "NA", "p_pd": "temp0"}, {"s_pd": -1, "s_path": "root/hdfs", "p_pd": "temp1"}, {"s_pd": -1, "s_path": "root/hdfs", "p_pd": "temp2"}]}
val data = spark.read.json("file:///root/users/data/s_json.json")
var appDf = data.withColumn("data",explode($"Applications")).select($"Applications.s_pd", $"Applications.s_path", $"Applications.p_pd", $"ip_address")
appDf.printSchema
// gives:
root
|-- s_pd: array (nullable = true)
| |-- element: string (containsNull = true)
|-- s_path: array (nullable = true)
| |-- element: string (containsNull = true)
|-- p_pd: array (nullable = true)
| |-- element: string (containsNull = true)
|-- ip_address: string (nullable = true)
Each DataFrame record now contains arrays (with duplicated values) instead of one row per application. How do I get the records in flat table format?
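The array columns appear because the select still references the original `Applications` array rather than the exploded column. A minimal sketch of a corrected version (column names taken from the sample JSON above; this is an assumption about the intent, not the asker's code):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

val spark = SparkSession.builder.appName("flatten").master("local[*]").getOrCreate()
import spark.implicits._

val data = spark.read.json("file:///root/users/data/s_json.json")

// Select fields from the exploded struct column ("app"), not from the
// original "Applications" array, so each output row corresponds to one
// element of the array.
val appDf = data
  .withColumn("app", explode($"Applications"))
  .select($"ip_address", $"app.s_pd", $"app.s_path", $"app.p_pd")

appDf.printSchema  // s_pd, s_path, p_pd are now scalar columns, not arrays
```

With this change, each ip_address is repeated once per application entry, which is the usual tabular representation of a one-to-many nesting.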

Would appDf.select("ip_address", "xs_latitude", "Applications.*") flatten out such a structure? Or is it arbitrarily deeply nested?
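If the schema really is arbitrarily deep, a common pattern (a sketch under that assumption, not code from the question) is to walk the schema recursively: expand every StructType column into its leaf fields and explode every ArrayType column, repeating until only atomic columns remain:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, explode_outer}
import org.apache.spark.sql.types.{ArrayType, StructType}

// Repeatedly expand struct columns into their leaves and explode array
// columns until the schema contains only atomic types. Nested names are
// joined with '_' to keep the flattened column names unique.
def flatten(df: DataFrame): DataFrame = {
  df.schema.fields.collectFirst {
    case f if f.dataType.isInstanceOf[StructType] =>
      val expanded = f.dataType.asInstanceOf[StructType].fieldNames
        .map(n => col(s"${f.name}.$n").alias(s"${f.name}_$n"))
      val others = df.columns.filter(_ != f.name).map(col)
      flatten(df.select(others ++ expanded: _*))
    case f if f.dataType.isInstanceOf[ArrayType] =>
      // explode_outer keeps rows whose array is null or empty
      flatten(df.withColumn(f.name, explode_outer(col(f.name))))
  }.getOrElse(df) // no struct or array columns left: fully flat
}
```

Note that exploding several independent arrays this way produces a cross product of their elements, which may or may not be what you want for a table with 200+ keys.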