0

I have the following dataframe:

+----+----+---------+----------+
|    |A   |   B     |   C      |
|----+----+---------+----------|
|  0 |S   |   nan   |   5464.5 |
|  1 |A   |  5464.5 |   5464.5 |
|  2 |A   |  5675.5 |    nan   |
|  3 |S   |  5670   |    nan   |
|  4 |A   |  5664   |    nan   |
|  5 |B   |  5635.5 |    nan   |
|  6 |D   |  5624   |    nan   |
|  7 |C   |  5624   |    nan   |
|  8 |X   |   nan   |   5464.5 |
|  9 |C   |  5715.5 |    nan   |
| 10 |D   |  4704   |   5000   |
+----+----+---------+----------+

I want to replace the nan values in B and values in B < 5000 with a condition: if column A is 'S' it should be replaced with Column C if column A is 'X' it should be replaced with column C+10 if column A is 'D' it should be replaced with column C-10

so the result should look like this:

+----+----+---------+----------+
|    |A   |   B     |   C      |
|----+----+---------+----------|
|  0 |S   |  5464.5 |   5464.5 |
|  1 |A   |  5464.5 |   5464.5 |
|  2 |A   |  5675.5 |    nan   |
|  3 |S   |  5670   |    nan   |
|  4 |A   |  5664   |    nan   |
|  5 |B   |  5635.5 |    nan   |
|  6 |D   |  5624   |    nan   |
|  7 |C   |  5624   |    nan   |
|  8 |X   |  5474.5 |   5464.5 |
|  9 |C   |  5715.5 |    nan   |
| 10 |D   |  4704   |   4990   |
+----+----+---------+----------+

What is the most elegant, simple and readable way to solve this. I tend to loop over the table and change it, but i can find a lot of warnings not to do this.

1

2 Answers 2

1

you can use & or | operator to combine multiple conditions, and then use .loc method to select the data filtered by the condition.

And two matters need attention: 1. use .values to get the values assigned to the dataframe. 2. do not forget the parentheses when using == operator.

index = df['B'].isna()|(df['B']<5000)

index_1 = index & (df['A']=='S')
df.loc[index_1, 'B'] = df.loc[index_1, 'C'].values

index_2 = index & (df['A']=='X')
df.loc[index_2, 'B'] = (df.loc[index_2, 'C']+10).values

index_3 = index & (df['A']=='D')
df.loc[index_3, 'B'] = (df.loc[index_3, 'C']-10).values
Sign up to request clarification or add additional context in comments.

1 Comment

I think in the third and fourth expressions df.loc[index_1, 'C'] should be replaced with df.loc[index_2, 'C'] and df.loc[index_3, 'C'], respectively.
0

I don't know how elegant it is, but it works:

df2 = df.copy()

df2.B[((df2.A=='S')&(df2.B.isna()))|((df2.A=='S')&(df2.B<5000))]=df2.C
df2.B[((df2.A=='X')&(df2.B.isna()))|((df2.A=='X')&(df2.B<5000))]=df2.C+10
df2.B[((df2.A=='D')&(df2.B.isna()))|((df2.A=='D')&(df2.B<5000))]=df2.C-10

Also, make sure you work on a copy of your initial DataFrame, just in case as shown in this answer.

1 Comment

That is not true - the answer works just fine and does exactly what Egirus has been asking for.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.