Assigning to multiple columns at once (python pandas)

Question

I asked a question yesterday, "Multiple assignment in pandas based on the values in the same row", about how to rank a row of data and assign the ranks to different columns in the same row. I worked out how to do it by following Ed Chum's advice from here: "how to apply a function to multiple columns in a pandas dataframe at one time".

It worked, but then I noticed that I was creating incorrect columns along the way. Once I fixed that bug, it no longer worked.

I have tried to recreate the issue on a toy example, and it does not work there either. Can someone point out the error? Here is the code (Python 3):

import pandas as pd
import numpy as np
import scipy.stats  # a bare "import scipy" is not guaranteed to load the stats submodule


df = pd.DataFrame(data={'a':[1,2,3],'b':[2,1,3],'c':[3,1,2],
                        'rank_a':[np.nan]*3,'rank_b':[np.nan]*3,'rank_c':[np.nan]*3})

def apply_rank(row):
    vals = [row['a'],row['b'],row['c']]
    ranked = scipy.stats.rankdata(vals)
    d = len(vals)+1
    ranked = [rank/d for rank in ranked]
    rank_cols = [col for col in row.index if col.startswith("rank_")]
    print("ranked: "+str(ranked))

    for idx,rank_col in enumerate(rank_cols): 
        print("Before: "+str(row[rank_col]))
        row[rank_col] = ranked[idx]
        print("After: "+str(row[rank_col]))

Then run df.apply(lambda row: apply_rank(row), axis=1) to see that the assignments are made correctly inside the function.

Then display df to see that nothing was actually assigned.

Does this answer your question? Return multiple columns from apply pandas
– smci, Commented Apr 19, 2020 at 11:27
Anyway, pandas has its own native rank() function for both DataFrame and Series, so you shouldn't need to implement this yourself.
– smci, Commented Apr 19, 2020 at 11:48
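
As a rough illustration of smci's suggestion, the per-row ranking can be done without apply by calling DataFrame.rank along axis=1. This is only a sketch: it assumes the default method='average' tie handling is wanted (the same default as scipy.stats.rankdata), and the names value_cols, ranks and result are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 1, 3], 'c': [3, 1, 2]})
value_cols = ['a', 'b', 'c']  # hypothetical name for the columns to rank

# Rank within each row (axis=1), then scale by n + 1 as in the question.
ranks = df[value_cols].rank(axis=1) / (len(value_cols) + 1)

# Prefix the new columns with "rank_" and attach them next to the originals.
result = df.join(ranks.add_prefix('rank_'))
print(result)
```

Because the ranks are computed for the whole frame in one call, there is no row object to mutate and the original columns keep their integer dtype.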

jezrael · Accepted Answer · 2017-11-22 11:57:50Z

You can return a Series whose index holds the names of the new columns:

def apply_rank(row):
    vals = [row['a'],row['b'],row['c']]
    ranked = scipy.stats.rankdata(vals)
    d = len(vals)+1
    ranked = [rank/d for rank in ranked]
    rank_cols = [col for col in row.index if col.startswith("rank_")]

    return pd.Series(ranked, index=rank_cols)

df = df.apply(lambda row: apply_rank(row),axis=1)
print (df)
   rank_a  rank_b  rank_c
0   0.250   0.500   0.750
1   0.750   0.375   0.375
2   0.625   0.625   0.250
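
Note that df = df.apply(...) above replaces the frame with only the rank columns. One way to keep a, b and c as well, sketched here under the assumption that the rank columns already exist as in the question, is to assign the result of apply to just those columns. To keep the sketch dependent on pandas alone, it uses Series.rank in place of scipy.stats.rankdata; both default to average ranks for ties. value_cols and rank_cols are names chosen for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 1, 3], 'c': [3, 1, 2],
                   'rank_a': [np.nan] * 3, 'rank_b': [np.nan] * 3,
                   'rank_c': [np.nan] * 3})
value_cols = ['a', 'b', 'c']
rank_cols = ['rank_' + col for col in value_cols]

def apply_rank(row):
    # Series.rank stands in for scipy.stats.rankdata (assumption of this sketch).
    ranked = row[value_cols].rank() / (len(value_cols) + 1)
    # Relabel the ranks so they line up with the target columns.
    return pd.Series(ranked.to_numpy(), index=rank_cols)

# Assign only to the rank columns; a, b and c are left untouched.
df[rank_cols] = df.apply(apply_rank, axis=1)
print(df)
```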

EDIT: If the new columns already exist, you can assign the ranks to them inside the function and return the whole row:

def apply_rank(row):
    vals = [row['a'],row['b'],row['c']]
    ranked = scipy.stats.rankdata(vals)
    d = len(vals)+1
    ranked = [rank/d for rank in ranked]
    rank_cols = [col for col in row.index if col.startswith("rank_")]

    row.loc[rank_cols] = ranked
    return row

df = df.apply(apply_rank,axis=1)
print (df)
     a    b    c  rank_a  rank_b  rank_c
0  1.0  2.0  3.0   0.250   0.500   0.750
1  2.0  1.0  1.0   0.750   0.375   0.375
2  3.0  3.0  2.0   0.625   0.625   0.250
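
The difference from the version in the question is the return statement: DataFrame.apply builds its result from what the function returns and, in this example, does not write changes made to row back into df. A small sketch of that behaviour, using a deliberately reduced frame (one value column and one placeholder column, both invented for the illustration, as are the function names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'rank_a': [np.nan] * 3})

def mutate_only(row):
    # Changes the row object handed to the function, but returns None.
    row['rank_a'] = row['a'] / 4

def mutate_and_return(row):
    # Same change, but the modified row is handed back to apply.
    row['rank_a'] = row['a'] / 4
    return row

discarded = df.apply(mutate_only, axis=1)      # a Series of None values
print(df)                                      # rank_a is still all NaN here

kept = df.apply(mutate_and_return, axis=1)     # a new DataFrame with the ranks
print(kept)
```

In both cases df itself is untouched; the second call only helps because its result is captured (or assigned back with df = ...).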

Is it possible to preserve the original columns in there as well?
You are welcome! I feel the same joy when something works ;)

a_local_nobody · Accepted Answer · 2019-08-05 09:11:55Z

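df.iloc[[2, 3, 4], df.columns.get_loc(col)] = 2

In the DataFrame df, this sets the value 2 at row positions 2, 3 and 4 of the column named col. It is written as a single .iloc call on the frame because the chained form df[col].iloc[...] = 2 may assign to a temporary copy instead of df.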
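
A self-contained sketch of this kind of assignment on a made-up frame (the column names col, x and y and all values are invented for illustration). Each write goes through a single .loc or .iloc call on the frame rather than a chained df[col].iloc[...], since chained assignment may act on a temporary copy.

```python
import pandas as pd

df = pd.DataFrame({'col': [10, 11, 12, 13, 14],
                   'x': [0, 0, 0, 0, 0],
                   'y': [0, 0, 0, 0, 0]})

# Label-based: rows whose index labels are 2, 3 and 4, column 'col'.
df.loc[[2, 3, 4], 'col'] = 2

# Position-based: the first two rows, with the column located by name.
df.iloc[[0, 1], df.columns.get_loc('x')] = 7

# Several columns at once: one scalar broadcast over the selected block.
df.loc[[2, 3, 4], ['x', 'y']] = 5
print(df)
```

With the default RangeIndex used here, labels and positions coincide; on a frame with a different index, .loc and .iloc would select different rows.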

edited Aug 5, 2019 at 9:11 by a_local_nobody
answered Aug 5, 2019 at 8:51 by Ayyasamy
