I have a pandas DataFrame, and for each row I'd like to get the names of the columns holding that row's three highest values. For example:
import numpy as np
import pandas as pd
a = np.array([[2., 1., 0., 5., 4.], [6., 10., 7., 1., 3.]])
df = pd.DataFrame(a, columns=['A', 'B', 'C', 'D', 'E'])
This gives:
   A   B  C  D  E
0  2   1  0  5  4
1  6  10  7  1  3
For each row, I want to add three new columns containing the names of the columns with the three highest values, in descending order:
   A   B  C  D  E First Second Third
0  2   1  0  5  4     D      E     A
1  6  10  7  1  3     B      C     A
I've gotten as far as using argpartition to get the indices for the top three columns in each row:
inx = np.argpartition(df.to_numpy(), -3, axis=1)[:, -3:]  # row-wise; .ix no longer exists in pandas
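A quick check of what argpartition actually returns on the example frame, calling it row-wise on the underlying NumPy array. With kth=-3, the last three positions of each row hold the indices of that row's three largest values, but argpartition does not promise any order among those three, which is why a further sort by value is needed.

```python
import numpy as np
import pandas as pd

a = np.array([[2., 1., 0., 5., 4.], [6., 10., 7., 1., 3.]])
df = pd.DataFrame(a, columns=['A', 'B', 'C', 'D', 'E'])

# Partition each row (axis=1) so that the element that would sit third
# from the end in sorted order is placed at position -3, and everything
# after it is >= it. The last three positions are the top-3 indices,
# in no guaranteed order.
inx = np.argpartition(df.to_numpy(), -3, axis=1)[:, -3:]

# Compare as sets, since the order within each row is unspecified.
top3_sets = [set(row) for row in inx.tolist()]
print(top3_sets)
```

Row 0 should give the positions of D, E and A ({0, 3, 4}) and row 1 those of B, C and A ({0, 1, 2}).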
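These three indices come back in no particular order, so they then need to be sorted by value, largest first (inx.sort() sorts in place, returns None, and would order the positions rather than the values):
top_vals = np.take_along_axis(df.to_numpy(), inx, axis=1)
sorted_inx = np.take_along_axis(inx, np.argsort(-top_vals, axis=1), axis=1)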
It isn't clear to me how to then take these column indices, look up the corresponding column names, and add them back to df as three new columns.
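A minimal sketch of one way to finish this, assuming the argpartition approach above: order the three positions by their values, use the resulting 2-D position array to index a NumPy array of the column names, and join the result back onto df. The column names First/Second/Third come from the desired output; the intermediate variable names are just illustrative. Indexing df.columns.to_numpy() rather than df.columns directly avoids multi-dimensional indexing on a pandas Index, which recent pandas versions reject.

```python
import numpy as np
import pandas as pd

a = np.array([[2., 1., 0., 5., 4.], [6., 10., 7., 1., 3.]])
df = pd.DataFrame(a, columns=['A', 'B', 'C', 'D', 'E'])

values = df.to_numpy()

# Positions of the three largest values in each row (unordered).
inx = np.argpartition(values, -3, axis=1)[:, -3:]

# Order those three positions by their values, largest first.
top_vals = np.take_along_axis(values, inx, axis=1)
sorted_inx = np.take_along_axis(inx, np.argsort(-top_vals, axis=1), axis=1)

# Map positions to names with NumPy fancy indexing: shape (n_rows, 3).
names = df.columns.to_numpy()[sorted_inx]

# Wrap the names in a DataFrame aligned on df's index and join it on.
top3 = pd.DataFrame(names, index=df.index, columns=['First', 'Second', 'Third'])
df = df.join(top3)
print(df)
```

With only a handful of columns, np.argsort(-values, axis=1)[:, :3] yields the same value-ordered positions in a single step; argpartition mainly pays off when there are many columns, because it avoids a full sort of each row.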