Replace certain values depending on condition in pandas

Question

I'm trying to translate the following R code to Python and am stuck because of the row-indexing...

df$col3[index+1] <- df$col2[index]  # for each position in index, copy col2's value into col3 one row further down

Fictitious example

import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5],
                   'id_old': [1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5, 5],
                   'col1': np.random.normal(size=12),
                   'col2': np.random.randint(low=20, high=50, size=12),
                   'col3': np.repeat(20, 12)})
print(df)

# Row positions where id and id_old differ; np.where returns a tuple holding one array
myindex = np.where(df.id != df.id_old)
print(myindex)
print(np.add(myindex, 1))
replacement_values = df.iloc[myindex][['col2']]

Output

    id  id_old      col1  col2  col3
0    1       1  0.308380    23    20
1    1       1  1.185646    35    20
2    1       2 -0.432066    27    20
3    2       2  0.115055    32    20
4    2       3  0.683291    34    20
5    3       4 -1.916321    42    20
6    4       4  0.888327    34    20
7    4       4  1.312879    29    20
8    4       4  1.260612    27    20
9    4       5  0.062635    22    20
10   5       5  0.081149    23    20
11   5       5 -1.872873    32    20

(array([2, 4, 5, 9]),)
[[ 3  5  6 10]]

This is what I tried:

df.loc[np.add(myindex, 1), 'col3'] = replacement_values
df.loc[df.index.isin(np.add(myindex + 1)), 'col3'] = replacement_values

Desired result:

    id  id_old      col1  col2  col3
0    1       1  0.308380    23    20
1    1       1  1.185646    35    20
2    1       2 -0.432066    27    20
3    2       2  0.115055    32    27
4    2       3  0.683291    34    20
5    3       4 -1.916321    42    34
6    4       4  0.888327    34    42
7    4       4  1.312879    29    20
8    4       4  1.260612    27    20
9    4       5  0.062635    22    20
10   5       5  0.081149    23    22
11   5       5 -1.872873    32    20

I guess I'm overlooking something basic, or am I completely on the wrong path?

Thanks a lot for your help!

A lot of us don't know what R is or how it works. Please clearly explain what the code is trying to do.
– cs95, Commented May 30, 2018 at 15:28
df.loc[np.add(myindex, 1)[0],'col3']=df.iloc[myindex]['col2'].values
– BENY, Commented May 30, 2018 at 15:30
Is this what you want? df['col3'] = df.col2.where(df.id != df.id_old).shift().fillna(df.col3)
– cs95, Commented May 30, 2018 at 15:30
@coldspeed I think he just wants to shift certain values down
– BENY, Commented May 30, 2018 at 15:31
@Wen can you kindly edit the question with a little explanation? Since you know R.
– cs95, Commented May 30, 2018 at 15:32

BENY · Accepted Answer · 2018-05-30 15:40:01Z

2

Fix your code by adding .values. A data.frame in R is not index-sensitive, but in pandas the index does matter.

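df=pd.read_clipboard()  # load the question's table from the clipboard
# .values drops the index, so the assignment is positional instead of label-aligned
df.loc[np.add(myindex, 1)[0],'col3']=df.iloc[myindex]['col2'].values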
df
Out[399]: 
    id  id_old      col1  col2  col3
0    1       1  0.308380    23    20
1    1       1  1.185646    35    20
2    1       2 -0.432066    27    20
3    2       2  0.115055    32    27
4    2       3  0.683291    34    20
5    3       4 -1.916321    42    34
6    4       4  0.888327    34    42
7    4       4  1.312879    29    20
8    4       4  1.260612    27    20
9    4       5  0.062635    22    20
10   5       5  0.081149    23    22
11   5       5 -1.872873    32    20
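To see why the index matters here, a minimal sketch may help. It rebuilds only the relevant columns from the question's printed table; the names `src`, `dst`, `aligned` and `positional` are made up for this illustration. Assigning a Series lets pandas match rows by label, while assigning a plain array places values by position.

import numpy as np
import pandas as pd

# Columns copied from the question's printed output (col1 is not needed here)
df = pd.DataFrame({
    "id":     [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5],
    "id_old": [1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5, 5],
    "col2":   [23, 35, 27, 32, 34, 42, 34, 29, 27, 22, 23, 32],
    "col3":   [20] * 12,
})

src = np.flatnonzero(df["id"].to_numpy() != df["id_old"].to_numpy())  # rows 2, 4, 5, 9
dst = src + 1                                                         # rows 3, 5, 6, 10

# Series on the right-hand side: pandas aligns on index labels.
# Only label 5 appears in both src and dst, so rows 3, 6 and 10 become NaN.
aligned = df.copy()
aligned["col3"] = aligned["col3"].astype(float)  # make room for NaN explicitly
aligned.loc[dst, "col3"] = aligned["col2"].iloc[src]

# Plain array on the right-hand side: no labels, values are placed in order.
positional = df.copy()
positional.loc[dst, "col3"] = positional["col2"].to_numpy()[src]

print(aligned["col3"].tolist())
print(positional["col3"].tolist())

Only the positional version matches the desired result; the aligned version writes col2 from label 5 into row 5 and leaves the other target rows empty.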

answered May 30, 2018 at 15:40


2 Comments

BENY Over a year ago

@coldspeed I have plenty of votes, you will have my support man :-)

KaB Over a year ago

Thanks a lot, this is exactly what I was looking for (and I fixed my code, sorry for that!). Hopefully I'll be on top of the Python logic soon.

cs95 · Accepted Answer · 2018-05-30 15:37:44Z

Not sure why pandas needs such an involved operation for something that looks so simple with R, but here it is, with mask/where + shift + fillna:

df['col3'] = (
    df.col2.where(df.id != df.id_old).shift().fillna(df.col3).astype(int)
)

df
    id  id_old      col1  col2  col3
0    1       1  0.308380    23    20
1    1       1  1.185646    35    20
2    1       2 -0.432066    27    20
3    2       2  0.115055    32    27
4    2       3  0.683291    34    20
5    3       4 -1.916321    42    34
6    4       4  0.888327    34    42
7    4       4  1.312879    29    20
8    4       4  1.260612    27    20
9    4       5  0.062635    22    20
10   5       5  0.081149    23    22
11   5       5 -1.872873    32    20
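The one-liner above may be easier to follow when split into its steps. This is only a sketch on the question's printed data; the intermediate names `changed`, `kept` and `moved` are invented for the illustration.

import pandas as pd

# Columns copied from the question's printed output (col1 is not needed here)
df = pd.DataFrame({
    "id":     [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5],
    "id_old": [1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5, 5],
    "col2":   [23, 35, 27, 32, 34, 42, 34, 29, 27, 22, 23, 32],
    "col3":   [20] * 12,
})

changed = df["id"] != df["id_old"]   # True at rows 2, 4, 5, 9
kept = df["col2"].where(changed)     # col2 where changed, NaN everywhere else
moved = kept.shift()                 # push every value down by one row
# Rows that received nothing fall back to the old col3; NaN forced floats, so cast back
result = moved.fillna(df["col3"]).astype(int)

print(result.tolist())

The cast back to int is needed because `where` introduces NaN, which turns the integer column into floats along the way.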

In my little experience with R, I've found that there're lots of things that get really verbose in pandas which are done in one, maybe two funcs in R hehe. But that also makes R less generalist

rafaelc · Accepted Answer · 2018-05-30 15:42:09Z

1

If I understand correctly:

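mask = (df.id_old - df.id).shift().fillna(0).astype(bool)  # True on the row after each id mismatch
df.loc[mask, "col3"] = df.col2.shift()[mask].astype(int)  # take col2 from the previous row, as in the desired result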
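A quick check of what this mask selects, sketched on the question's printed data. The names `rows`, `alt` and `out` are hypothetical. It assumes, as the question's desired result suggests, that each marked row should receive col2 from the row just above it.

import pandas as pd

# Columns copied from the question's printed output (col1 is not needed here)
df = pd.DataFrame({
    "id":     [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5],
    "id_old": [1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5, 5],
    "col2":   [23, 35, 27, 32, 34, 42, 34, 29, 27, 22, 23, 32],
    "col3":   [20] * 12,
})

# A nonzero difference means the ids disagree; shifting marks the following row
mask = (df["id_old"] - df["id"]).shift().fillna(0).astype(bool)
rows = mask[mask].index.tolist()

# Same mask without arithmetic, which would also work for non-numeric ids
alt = (df["id"] != df["id_old"]).shift(fill_value=False)

# Fill the marked rows with col2 from the preceding row
out = df.copy()
out.loc[mask, "col3"] = out["col2"].shift()[mask].astype(int)

print(rows)
print(out["col3"].tolist())

The subtraction only works because the ids are numeric; the `!=` comparison is the more general way to build the same mask.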

answered May 30, 2018 at 15:42

