How can I convert a dataframe with a hierarchical structure and arbitrary shape (for example, the one below) into a new dataframe with just a parent column and a child column?
Edit: One constraint is that a child cannot be its own parent, so a pair like Denver/Denver in row 1 must not appear in the result.
import pandas as pd

data = {
    'level1': ['A', 'A', 'B', 'B', 'C'],
    'level2': ['James', 'Robert', 'Patricia', 'Patricia', 'John'],
    'level3': ['Stockholm', 'Denver', 'Moscow', 'Moscow', 'Palermo'],
    'level4': ['red', 'Denver', 'yellow', 'purple', 'blue'],
}
df = pd.DataFrame(data)
   level1    level2     level3  level4
0       A     James  Stockholm     red
1       A    Robert     Denver  Denver
2       B  Patricia     Moscow  yellow
3       B  Patricia     Moscow  purple
4       C      John    Palermo    blue
Desired output is something like this:
       parent      child
 0          A      James
 1          A     Robert
 2          B   Patricia
 3          C       John
 4      James  Stockholm
 5     Robert     Denver
 6   Patricia     Moscow
 7       John    Palermo
 8  Stockholm        red
 9     Moscow     yellow
10     Moscow     purple
11    Palermo       blue
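
For reference, here is a minimal sketch of one way this could be done. It assumes the columns are ordered from the top level to the bottom level, and that the same label always refers to the same node. The helper name `to_parent_child` is made up for illustration. The idea is to pair each column with the one to its right, stack the pairs, drop rows where parent equals child, and then drop duplicate pairs.

```python
import pandas as pd

def to_parent_child(df):
    """Stack every pair of adjacent columns into parent/child rows.

    Hypothetical helper: assumes columns run from top level to bottom level.
    """
    cols = list(df.columns)
    # One two-column frame per adjacent pair: (level1, level2), (level2, level3), ...
    pairs = [
        pd.DataFrame({"parent": df[parent], "child": df[child]})
        for parent, child in zip(cols, cols[1:])
    ]
    edges = pd.concat(pairs, ignore_index=True)
    # Constraint from the question: a child cannot be its own parent.
    edges = edges[edges["parent"] != edges["child"]]
    # Repeated pairs (e.g. B/Patricia) collapse to a single row.
    return edges.drop_duplicates().reset_index(drop=True)

data = {
    "level1": ["A", "A", "B", "B", "C"],
    "level2": ["James", "Robert", "Patricia", "Patricia", "John"],
    "level3": ["Stockholm", "Denver", "Moscow", "Moscow", "Palermo"],
    "level4": ["red", "Denver", "yellow", "purple", "blue"],
}
df = pd.DataFrame(data)
edges = to_parent_child(df)
print(edges)
```

On the sample data this gives the 12 rows of the desired output, in the same order. Because nothing is hard-coded about the number of columns or rows, the same function should work for other shapes, as long as the column order reflects the hierarchy.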