Replace values in dataframe column depending on another column with condition

Question

I need to replace values in the dataframe column x so that the result looks like x_new. In detail: the x values in the rows where y is 1 or 255 must be kept. In the rows between a 1 and the following 255, x must be replaced with the x value from the row where y is 1. The values between a 255 and the next 1 should stay the same. How can I get the column x_new?

I guess it could work with replace and some condition, but I do not know how to combine them. I look forward to any help and hints.

My dataframe looks like this, for example:

x        y    z    x_new
12.28   1    1     12.28
11.99   0    1     12.28
11.50   0    1     12.28
11.20   0    1     12.28
11.01   0    1     12.28
 9.74  255   0      9.74
13.80   0    0     13.80
15.2    0    0     15.2
17.8    0    0     17.8
12.1    1    1     12.1
11.9    0    1     12.1
11.7    0    1     12.1
11.2    0    1     12.1
10.3   255   0     10.3

Are your data sufficiently clean so that a row of 1 is always followed by a row of 255 before another row of 1? This complicates the logic a bit.
– ALollz, Commented Apr 11, 2019 at 16:57
Yes, the rows are clean; only the number of rows between 1 and 255 varies, so sometimes there are 4 rows with zeros and sometimes e.g. 50 or 100. Thanks for your great answer!
– Maik, Commented Apr 11, 2019 at 18:52
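
The cleanliness assumption discussed in these comments can be checked before applying any of the answers. The following is only a sketch: the helper name `markers_alternate` and its parameters are hypothetical, not part of the original thread. It treats the data as clean when the non-zero markers in y strictly alternate 1, 255, 1, 255, ... and it tolerates a trailing 1 that has no closing 255 yet.

```python
import pandas as pd


def markers_alternate(y: pd.Series, start: int = 1, stop: int = 255) -> bool:
    """Return True if the start/stop markers in y strictly alternate, beginning with start."""
    # Keep only the marker rows, in their original order
    markers = y[y.isin([start, stop])].tolist()
    # The sequence we would expect for clean data of the same length
    expected = [start if i % 2 == 0 else stop for i in range(len(markers))]
    return markers == expected


y_clean = pd.Series([1, 0, 0, 255, 0, 1, 0, 255])
y_dirty = pd.Series([1, 0, 1, 255])  # a second 1 before the closing 255

print(markers_alternate(y_clean))  # True
print(markers_alternate(y_dirty))  # False
```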

Vaishali · Accepted Answer · 2019-04-11 17:12:49Z

Multiple steps, but it works. Find the index of the rows from each 255 up to the next 1 and save it in idx. Then create x_new1 by assigning x at idx and at the rows that meet the other two conditions (y == 1 or y == 255). Forward-fill the rest.

import numpy as np

# Index of rows from each 255 up to (not including) the next 1 in column y
idx = df.loc[df['y'].replace(0, np.nan).ffill() == 255, 'y'].index

# Create x_new1 and assign the value of x where the index is in idx or y == 1 or y == 255
df.loc[idx, 'x_new1'] = df['x']
df.loc[(df['y'] == 1) | (df['y'] == 255), 'x_new1'] = df['x']

# ffill rest of the values in x_new1
df['x_new1'] = df['x_new1'].ffill()


    x       y   z   x_new   x_new1
0   12.28   1   1   12.28   12.28
1   11.99   0   1   12.28   12.28
2   11.50   0   1   12.28   12.28
3   11.20   0   1   12.28   12.28
4   11.01   0   1   12.28   12.28
5   9.74    255 0   9.74    9.74
6   13.80   0   0   13.80   13.80
7   15.20   0   0   15.20   15.20
8   17.80   0   0   17.80   17.80
9   12.10   1   1   12.10   12.10
10  11.90   0   1   12.10   12.10
11  11.70   0   1   12.10   12.10
12  11.20   0   1   12.10   12.10
13  10.30   255 0   10.30   10.30
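
The same idea can be condensed into a mask plus a forward fill. This is a sketch rather than the answer's own code: the names `last_marker` and `keep` are introduced here for illustration, and the frame is rebuilt from the question's sample so the block runs on its own. Like the steps above, it assumes the data starts with a 1; any zero rows before the first marker would stay NaN.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "x": [12.28, 11.99, 11.50, 11.20, 11.01, 9.74, 13.80,
          15.2, 17.8, 12.1, 11.9, 11.7, 11.2, 10.3],
    "y": [1, 0, 0, 0, 0, 255, 0, 0, 0, 1, 0, 0, 0, 255],
})

# Most recent non-zero marker (1 or 255) seen at or before each row
last_marker = df["y"].mask(df["y"].eq(0)).ffill()

# Rows whose x is kept: the marker rows themselves and the stretch after a 255
keep = df["y"].isin([1, 255]) | last_marker.eq(255)

# Blank out the other rows, then forward-fill from the preceding row with y == 1
df["x_new1"] = df["x"].where(keep).ffill()

print(df)
```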

Quang Hoang · Accepted Answer · 2019-04-11 16:58:38Z

Try:

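import numpy as np

# mark the occurrences of 1 and 255 (all other rows become NaN)
df['is_1_255'] = df.y[(df.y == 1) | (df.y == 255)]
df['x_n'] = np.nan

# copy x at the rows where y is 1
df.loc[df.is_1_255 == 1, 'x_n'] = df.loc[df.is_1_255 == 1, 'x']

# forward-fill is_1_255 with markers:
# 255 means between 255 and 1, 1 means between 1 and 255
df['is_1_255'] = df['is_1_255'].ffill()

# copy x at the rows in the 255 stretches
df.loc[df.is_1_255 == 255, 'x_n'] = df.loc[df.is_1_255 == 255, 'x']

# forward-fill the remaining rows with the value from the preceding 1
# (assign the result; an inplace ffill on a column selection may not update df)
df['x_n'] = df['x_n'].ffill()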

Output:

+-----+-------+-----+---+-------+----------+-------+
| idx |   x   |  y  | z | x_new | is_1_255 |  x_n  |
+-----+-------+-----+---+-------+----------+-------+
|   0 | 12.28 |   1 | 1 | 12.28 | 1.0      | 12.28 |
|   1 | 11.99 |   0 | 1 | 12.28 | 1.0      | 12.28 |
|   2 | 11.50 |   0 | 1 | 12.28 | 1.0      | 12.28 |
|   3 | 11.20 |   0 | 1 | 12.28 | 1.0      | 12.28 |
|   4 | 11.01 |   0 | 1 | 12.28 | 1.0      | 12.28 |
|   5 | 9.74  | 255 | 0 | 9.74  | 255.0    | 9.74  |
|   6 | 13.80 |   0 | 0 | 13.80 | 255.0    | 13.80 |
|   7 | 15.20 |   0 | 0 | 15.20 | 255.0    | 15.20 |
|   8 | 17.80 |   0 | 0 | 17.80 | 255.0    | 17.80 |
|   9 | 12.10 |   1 | 1 | 12.10 | 1.0      | 12.10 |
|  10 | 11.90 |   0 | 1 | 12.10 | 1.0      | 12.10 |
|  11 | 11.70 |   0 | 1 | 12.10 | 1.0      | 12.10 |
|  12 | 11.20 |   0 | 1 | 12.10 | 1.0      | 12.10 |
|  13 | 10.30 | 255 | 0 | 10.30 | 255.0    | 10.30 |
+-----+-------+-----+---+-------+----------+-------+

ALollz · Accepted Answer · 2019-04-11 17:02:27Z

Assuming clean data where 1 and 255 always occur in pairs, we can form groups of 1-255 and groupby to fill in the data.

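# True for the rows from each 1 up to (not including) the matching 255
s = (df.y.eq(1).cumsum() == df.y.eq(255).cumsum()+1)
# label each such run, take its first x, and keep the original x everywhere else
df['xnew'] = df.groupby(s.ne(s.shift()).cumsum().where(s)).x.transform('first').fillna(df.x)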

        x    y  z   xnew
0   12.28    1  1  12.28
1   11.99    0  1  12.28
2   11.50    0  1  12.28
3   11.20    0  1  12.28
4   11.01    0  1  12.28
5    9.74  255  0   9.74
6   13.80    0  0  13.80
7   15.20    0  0  15.20
8   17.80    0  0  17.80
9   12.10    1  1  12.10
10  11.90    0  1  12.10
11  11.70    0  1  12.10
12  11.20    0  1  12.10
13  10.30  255  0  10.30

Though for something like this, you should really write thorough unit tests, because this logic can get quite tricky and problematic for incorrect inputs.
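
As a starting point, such a test might look like the sketch below. The wrapper name `fill_between_markers` and both test functions are hypothetical; the expected values are the x_new column from the question. The tests run under pytest or as a plain script.

```python
import pandas as pd


def fill_between_markers(df: pd.DataFrame) -> pd.Series:
    """Hypothetical wrapper around the groupby approach shown above."""
    # True from each 1 up to (not including) the matching 255
    s = df["y"].eq(1).cumsum() == df["y"].eq(255).cumsum() + 1
    # Run labels; NaN outside the 1..255 stretches, so groupby skips those rows
    groups = s.ne(s.shift()).cumsum().where(s)
    return df.groupby(groups)["x"].transform("first").fillna(df["x"])


def test_sample_from_question() -> None:
    df = pd.DataFrame({
        "x": [12.28, 11.99, 11.50, 11.20, 11.01, 9.74, 13.80,
              15.2, 17.8, 12.1, 11.9, 11.7, 11.2, 10.3],
        "y": [1, 0, 0, 0, 0, 255, 0, 0, 0, 1, 0, 0, 0, 255],
    })
    expected = pd.Series([12.28] * 5 + [9.74, 13.80, 15.2, 17.8] + [12.1] * 4 + [10.3])
    pd.testing.assert_series_equal(fill_between_markers(df), expected, check_names=False)


def test_rows_outside_a_pair_are_unchanged() -> None:
    # Zero rows before the first 1 and after the last 255 must keep their x
    df = pd.DataFrame({"x": [5.0, 6.0, 7.0, 8.0, 9.0], "y": [0, 1, 0, 255, 0]})
    expected = pd.Series([5.0, 6.0, 6.0, 8.0, 9.0])
    pd.testing.assert_series_equal(fill_between_markers(df), expected, check_names=False)


if __name__ == "__main__":
    test_sample_from_question()
    test_rows_outside_a_pair_are_unchanged()
    print("all tests passed")
```

Further cases worth covering include unpaired markers (two 1s in a row, or a 255 with no preceding 1), since the answer explicitly assumes those do not occur.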
