I have a raw PySpark DataFrame with a nested (encapsulated) column. I need to loop over all the columns to unwrap them. I don't know the column names, and they could change, so I need a generic algorithm. The problem is that I can't use a classic for loop, because I need the code to stay parallelized.
Example of data:

Timestamp | Layers
1456982   | [[1, 2], [3, 4]]
1486542   | [[3, 5], [5, 5]]
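
For clarity, here is a minimal sketch of how such an input could be built (the inner field names `a`, `b`, `label`, `number1`, `text`, and `value` are only assumptions for illustration; in my real data I don't know them in advance):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Layers is a struct whose fields are themselves structs,
# each with its own field names.
df = spark.createDataFrame(
    [
        (1456982, ((1, 2), (3, 4))),
        (1486542, ((3, 5), (5, 5))),
    ],
    "Timestamp long, Layers struct<a: struct<label: int, number1: int>, "
    "b: struct<text: int, value: int>>",
)
df.printSchema()
```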
As shown above, Layers is a column that contains other columns (each with its own column name). My goal is to end up with something like this:
Timestamp | label | number1 | text | value
1456982   | 1     | 2       | 3    | 4
1486542   | 3     | 5       | 5    | 5
How can I loop over the columns with PySpark functions?
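
For reference, this is the kind of classic loop I mean (only a sketch under the assumed schema above; the field names are read from `df.schema`, so nothing is hard-coded). My worry is that building the result this way is not parallelized:

```python
from pyspark.sql.functions import col
from pyspark.sql.types import StructType

# Walk the fields of the Layers struct and pull every leaf field
# up to the top level, whatever its name happens to be.
inner_cols = []
for layer_field in df.schema["Layers"].dataType.fields:
    if isinstance(layer_field.dataType, StructType):
        for leaf in layer_field.dataType.fields:
            inner_cols.append(col("Layers." + layer_field.name + "." + leaf.name))

flat_df = df.select(col("Timestamp"), *inner_cols)
flat_df.show()
```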
Thanks for any advice.