I asked a question yesterday, "Multiple assignment in pandas based on the values in the same row", about how to rank a row of data and assign the ranks to different columns in the same row. I worked out how to do it by following Ed Chum's advice from "how to apply a function to multiple columns in a pandas dataframe at one time".

It actually worked, but then I noticed that I was creating incorrect columns along the way, and once I fixed that bug, it no longer worked.

I tried to recreate the issue on a toy example, and it does not work there either. Can someone point out the error, please? Here is the code (Python 3):

import pandas as pd
import numpy as np
import scipy.stats


df = pd.DataFrame(data={'a':[1,2,3],'b':[2,1,3],'c':[3,1,2],
                        'rank_a':[np.nan]*3,'rank_b':[np.nan]*3,'rank_c':[np.nan]*3})

def apply_rank(row):
    vals = [row['a'],row['b'],row['c']]
    ranked = scipy.stats.rankdata(vals)
    d = len(vals)+1
    ranked = [rank/d for rank in ranked]
    rank_cols = [col for col in row.index if col.startswith("rank_")]
    print("ranked: "+str(ranked))

    for idx,rank_col in enumerate(rank_cols): 
        print("Before: "+str(row[rank_col]))
        row[rank_col] = ranked[idx]
        print("After: "+str(row[rank_col]))

Then run df.apply(lambda row: apply_rank(row), axis=1) to see from the printed output that the assignments inside the function are done correctly.

Then inspect df to see that nothing was actually assigned... facepalm

2 Answers

2

You can return Series with index for values of new columns:

def apply_rank(row):
    vals = [row['a'],row['b'],row['c']]
    ranked = scipy.stats.rankdata(vals)
    d = len(vals)+1
    ranked = [rank/d for rank in ranked]
    rank_cols = [col for col in row.index if col.startswith("rank_")]

    return pd.Series(ranked, index=rank_cols)

df = df.apply(lambda row: apply_rank(row),axis=1)
print (df)
   rank_a  rank_b  rank_c
0   0.250   0.500   0.750
1   0.750   0.375   0.375
2   0.625   0.625   0.250
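
As a side note on why the version in the question left df unchanged: as far as I understand, apply only uses the value the function returns, and the pandas documentation describes mutating the passed row as unsupported. The sketch below shows the in-place write being lost. The name mutate_only and the cut-down frame are made up for illustration and are not part of the original code.

```python
import numpy as np
import pandas as pd

# Cut-down version of the toy frame: integer data plus an empty rank column.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 1, 3],
                   'rank_a': [np.nan] * 3})

def mutate_only(row):
    # Hypothetical helper: writes into the row object handed over by apply,
    # but returns nothing, like the function in the question.
    row['rank_a'] = 0.5

result = df.apply(mutate_only, axis=1)

# The write went to a temporary row, so the column in df is still all NaN.
print(df['rank_a'].isna().all())
```

Returning a Series (or the modified row) and assigning the result of apply back is what makes the values stick.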

EDIT: If the new columns already exist, it is possible to assign the values to them and return the whole row, which also keeps the original columns:

def apply_rank(row):
    vals = [row['a'],row['b'],row['c']]
    ranked = scipy.stats.rankdata(vals)
    d = len(vals)+1
    ranked = [rank/d for rank in ranked]
    rank_cols = [col for col in row.index if col.startswith("rank_")]

    row.loc[rank_cols] = ranked
    return row

df = df.apply(apply_rank,axis=1)
print (df)
     a    b    c  rank_a  rank_b  rank_c
0  1.0  2.0  3.0   0.250   0.500   0.750
1  2.0  1.0  1.0   0.750   0.375   0.375
2  3.0  3.0  2.0   0.625   0.625   0.250
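
If a per-row function is not strictly needed, the same numbers can probably be produced without apply. This is a swapped-in technique, not what the answer above does: it relies on DataFrame.rank with axis=1, whose default tie handling (average) matches the default of scipy.stats.rankdata. The names value_cols and out are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 1, 3], 'c': [3, 1, 2]})

value_cols = ['a', 'b', 'c']

# Rank across each row; ties receive the average rank (the default method).
# Dividing by len + 1 reproduces the scaling used in apply_rank.
ranks = df[value_cols].rank(axis=1) / (len(value_cols) + 1)

# Prefix the column names and attach the ranks next to the original data.
out = df.join(ranks.add_prefix('rank_'))
print(out)
```

A side effect of this variant is that a, b and c keep their integer dtype, because the rows are never passed through a float Series.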

4 Comments

is it possible to preserve the original columns in there as well?
PERFECT! LEGEND!
You are welcome! I feel the same joy when something finally works ;)
I have spent two hours on this... :)
0

df.iloc[[2, 3, 4], df.columns.get_loc(col)] = 2

In dataframe df, this sets the value 2 in the column named col at row positions 2, 3 and 4.
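
A runnable sketch of setting values at given row positions, assuming a made-up five-row frame with a column literally named 'col'. It indexes the frame in a single step with df.iloc and df.columns.get_loc rather than chaining df[col].iloc[...], since chained assignment may not write back to df in recent pandas versions.

```python
import pandas as pd

# Hypothetical frame used only for this illustration.
df = pd.DataFrame({'col': [10, 20, 30, 40, 50], 'other': list('abcde')})

# Row positions 2, 3 and 4 of column 'col', addressed in one indexing step.
df.iloc[[2, 3, 4], df.columns.get_loc('col')] = 2

print(df['col'].tolist())
```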
