5

I need to replace values in the dataframe column x. The result should look like x_new. So in detail I have to keep the values in the x column where y is 1 and 255. Between 1 and 255 I must replace the x values with the value where y is 1. The values between 255 and 1 should stay the same. So how can I get the column x_new?

I guess it could work with replace and some condition but I do not know how to combine it. I look forward for any help and hints.

My dataframe looks like e.g.:

x        y    z    x_new
12.28   1    1     12.28
11.99   0    1     12.28
11.50   0    1     12.28
11.20   0    1     12.28
11.01   0    1     12.28
 9.74  255   0      9.74
13.80   0    0     13.80
15.2    0    0     15.2
17.8    0    0     17.8
12.1    1    1     12.1
11.9    0    1     12.1
11.7    0    1     12.1
11.2    0    1     12.1
10.3   255   0     10.3
2
  • Are your data sufficiently clean so that a row of 1 is always followed by a row of 255 before another row of 1? This complicates the logic a bit Commented Apr 11, 2019 at 16:57
  • yes the rows are clean, only the number of rows between 1 and 255 vary, so that means sometime there are 4 rows with zeros and sometimes e.g. 50 or 100. thanks for your great answer! Commented Apr 11, 2019 at 18:52

3 Answers 3

3

Multiple steps but works. Find index of rows where y is 255 till you find the next 1. Save the values in idx. Now create new_x using the idx and the other two condition (y == 1 or y == 255). Ffill the rest.

# Index of rows between 255 and 1 in column y
idx = df.loc[df['y'].replace(0, np.nan).ffill() == 255, 'y'].index

# Create x_new1 and assign value of x where index is idx or y == 1 or y ==255
df.loc[idx, 'x_new1'] = df['x']
df.loc[(df['y'] == 1) | (df['y'] == 255) , 'x_new1'] = df['x']

# ffill rest of the values in x_new1
df['x_new1'] = df['x_new1'].ffill()


    x       y   z   x_new   x_new1
0   12.28   1   1   12.28   12.28
1   11.99   0   1   12.28   12.28
2   11.50   0   1   12.28   12.28
3   11.20   0   1   12.28   12.28
4   11.01   0   1   12.28   12.28
5   9.74    255 0   9.74    9.74
6   13.80   0   0   13.80   13.80
7   15.20   0   0   15.20   15.20
8   17.80   0   0   17.80   17.80
9   12.10   1   1   12.10   12.10
10  11.90   0   1   12.10   12.10
11  11.70   0   1   12.10   12.10
12  11.20   0   1   12.10   12.10
13  10.30   255 0   10.30   10.30
Sign up to request clarification or add additional context in comments.

Comments

3

Try:

# mark the occurrences of 1 and 255
df['is_1_255'] = df.y[(df.y==1)|(df.y==255)]
df['x_n'] = None

# copy the 1's 
df.loc[df.is_1_255==1,'x_n'] = df.loc[df.is_1_255==1,'x']

# fill is_1_255 with markers, 
#255 means between 255 and 1, 1 means between 1 and 255
df['is_1_255'] = df['is_1_255'].ffill()

# update the 255 values
df.loc[df.is_1_255==255, 'x_n'] = df.loc[df.is_1_255==255,'x']

# update the 1 values
df['x_n'].ffill(inplace=True)

Output:

+-----+-------+-----+---+-------+----------+-------+
| idx |   x   |  y  | z | x_new | is_1_255 |  x_n  |
+-----+-------+-----+---+-------+----------+-------+
|   0 | 12.28 |   1 | 1 | 12.28 | 1.0      | 12.28 |
|   1 | 11.99 |   0 | 1 | 12.28 | 1.0      | 12.28 |
|   2 | 11.50 |   0 | 1 | 12.28 | 1.0      | 12.28 |
|   3 | 11.20 |   0 | 1 | 12.28 | 1.0      | 12.28 |
|   4 | 11.01 |   0 | 1 | 12.28 | 1.0      | 12.28 |
|   5 | 9.74  | 255 | 0 | 9.74  | 255.0    | 9.74  |
|   6 | 13.80 |   0 | 0 | 13.80 | 255.0    | 13.80 |
|   7 | 15.20 |   0 | 0 | 15.20 | 255.0    | 15.20 |
|   8 | 17.80 |   0 | 0 | 17.80 | 255.0    | 17.80 |
|   9 | 12.10 |   1 | 1 | 12.10 | 1.0      | 12.10 |
|  10 | 11.90 |   0 | 1 | 12.10 | 1.0      | 12.10 |
|  11 | 11.70 |   0 | 1 | 12.10 | 1.0      | 12.10 |
|  12 | 11.20 |   0 | 1 | 12.10 | 1.0      | 12.10 |
|  13 | 10.30 | 255 | 0 | 10.30 | 255.0    | 10.30 |
+-----+-------+-----+---+-------+----------+-------+

Comments

3

Assuming clean data where 1 and 255 always occur in pairs, we can form groups of 1-255 and groupby to fill in the data.

s = (df.y.eq(1).cumsum() == df.y.eq(255).cumsum()+1)
df['xnew'] = df.groupby(s.ne(s.shift()).cumsum().where(s)).x.transform('first').fillna(df.x)

        x    y  z   xnew
0   12.28    1  1  12.28
1   11.99    0  1  12.28
2   11.50    0  1  12.28
3   11.20    0  1  12.28
4   11.01    0  1  12.28
5    9.74  255  0   9.74
6   13.80    0  0  13.80
7   15.20    0  0  15.20
8   17.80    0  0  17.80
9   12.10    1  1  12.10
10  11.90    0  1  12.10
11  11.70    0  1  12.10
12  11.20    0  1  12.10
13  10.30  255  0  10.30

Though for something like this, you should really form a thorough unit test, because this logic can get quite tricky and problematic for incorrect inputs.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.