Yesterday I asked a question, Multiple assignment in pandas based on the values in the same row, where I was wondering how to rank a row of data and assign the ranks to different columns in the same row. I figured out how to do it by following Ed Chum's advice from here: how to apply a function to multiple columns in a pandas dataframe at one time.
It actually worked, but then I noticed that I was creating incorrect columns along the way. Once I fixed that bug, it no longer worked.
I have recreated the issue on a toy example, and it does not work there either. Can someone point me to the error, please? Here is the code (Python 3):
    import pandas as pd
    import numpy as np
    import scipy.stats  # import the submodule explicitly so scipy.stats.rankdata resolves

    df = pd.DataFrame(data={'a':[1,2,3],'b':[2,1,3],'c':[3,1,2],
                            'rank_a':[np.nan]*3,'rank_b':[np.nan]*3,'rank_c':[np.nan]*3})

    def apply_rank(row):
        vals = [row['a'],row['b'],row['c']]
        ranked = scipy.stats.rankdata(vals)
        d = len(vals)+1
        ranked = [rank/d for rank in ranked]
        rank_cols = [col for col in row.index if col.startswith("rank_")]
        print("ranked: "+str(ranked))
        for idx,rank_col in enumerate(rank_cols):
            print("Before: "+str(row[rank_col]))
            row[rank_col] = ranked[idx]
            print("After: "+str(row[rank_col]))
Then run:
    df.apply(lambda row: apply_rank(row), axis=1)
to see that the assignments are done correctly. And then run:
    df
to see that nothing was assigned... facepalm
pandas has a rank() function for both DataFrame and Series, so you shouldn't need to implement this yourself.
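As a minimal sketch of that suggestion, the toy example could be ranked row-wise with DataFrame.rank(axis=1). The names value_cols, ranks and scaled are only illustrative, and the rank_ columns are created from the result here rather than pre-filled with NaN as in the question:

```python
import pandas as pd

# Same toy values as in the question, without the pre-created rank_ columns.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 1, 3], 'c': [3, 1, 2]})

value_cols = ['a', 'b', 'c']  # illustrative name, not from the question

# Rank within each row (axis=1). Ties get the average rank by default,
# which is also the default behaviour of scipy.stats.rankdata.
ranks = df[value_cols].rank(axis=1)

# Divide by (number of ranked columns + 1), as the question's apply_rank does.
scaled = ranks / (len(value_cols) + 1)

# Rename a/b/c to rank_a/rank_b/rank_c and attach them to the original frame.
df = df.join(scaled.add_prefix('rank_'))
print(df)
```

Note that the result is assigned back to df. In the question's code, apply_rank returns nothing and the result of df.apply is discarded, so changes made to each row inside the function are not expected to show up in df.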