Suppose I have a DataFrame like this:
import pandas as pd
df = pd.DataFrame({
'Id' : [1,2,3,4,5,6,7,8,9],
'Group' : [1,1,2,2,2,2,3,3,3],
'Value_to_compare' : [2,1,5,8,2,3,10,23,17],
'Other_value' : [0,3,2,6,3,4,2,7,1]
})
I would like to create a new column, say Value_of_Highest, that holds, for each row, the Other_value of the element with the highest Value_to_compare in that row's Group. For example, here:
- Group 1 has 2 elements; its highest Value_to_compare is 2, for Id = 1, for which Other_value is 0
- Group 2 has 4 elements; its highest Value_to_compare is 8, for Id = 4, for which Other_value is 6
- Group 3 has 3 elements; its highest Value_to_compare is 23, for Id = 8, for which Other_value is 7
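So I would like to add a column so that df becomes:
   Id  Group  Value_to_compare  Other_value  Value_of_Highest
0   1      1                 2            0                 0
1   2      1                 1            3                 0
2   3      2                 5            2                 6
3   4      2                 8            6                 6
4   5      2                 2            3                 6
5   6      2                 3            4                 6
6   7      3                10            2                 7
7   8      3                23            7                 7
8   9      3                17            1                 7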
This is the best way I know to do this:
def my_func(x):
    x = x.sort_values('Value_to_compare', ascending=False)
    Value_of_Highest = x.head(1)['Other_value'].values[0]
    return pd.Series([Value_of_Highest], index=['Value_of_Highest'])
grouped = df.groupby('Group').apply(my_func).reset_index()
df = df.merge(grouped)
I am pretty sure there is a far more elegant and efficient way to do this in Python/Pandas.
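For reference, here is a minimal sketch of one shorter formulation, not necessarily the best one. It assumes the default integer index and uses only standard pandas calls (groupby, idxmax, loc, map); the intermediate names best_row and lookup are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'Group': [1, 1, 2, 2, 2, 2, 3, 3, 3],
    'Value_to_compare': [2, 1, 5, 8, 2, 3, 10, 23, 17],
    'Other_value': [0, 3, 2, 6, 3, 4, 2, 7, 1],
})

# For each Group, the row label where Value_to_compare is largest.
# The result is a Series indexed by Group.
best_row = df.groupby('Group')['Value_to_compare'].idxmax()

# Build a Group -> Other_value lookup from those rows.
# (best_row and lookup are illustrative names, not part of the question.)
lookup = pd.Series(
    df.loc[best_row, 'Other_value'].to_numpy(),
    index=best_row.index,
)

# Broadcast the per-group value back onto every row of the group.
df['Value_of_Highest'] = df['Group'].map(lookup)
```

This avoids both the per-group sort and the merge. If several rows in a group tie for the highest Value_to_compare, idxmax picks the first one, which may or may not be the behaviour wanted.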
Edit: after the first answer from @CameronRiddell, I realized my example was flawed. I corrected it, and @CameronRiddell edited his answer, which works well.
