Create a new column in a dataframe consisting of values from existing columns

Question

I have a dataframe which looks like this:

      X       Y   Corr_Value
  0 51182   51389   1.00
  1 51182   50014   NaN
  2 51182   50001   0.85
  3 51182   50014   NaN

I want to create a new column built from the values of the X and Y columns. The idea is to loop through the rows and, where Corr_Value is not null, have the new column show:

Solving (X column value) will solve (Y column value) at (Corr_value column)% probability.

For example, for the first row the result should be:

Solving 51182 will solve 51389 with 100% probability.

This is the code I wrote:

dfs = []
for i in df1.iterrows():
    if ([df1['Corr_Value']] != np.nan):

        a = df1['X']
        b = df1['Y']
        c = df1['Corr_Value']*100
        df1['Remarks'] = (f'Solving {a} will solve {b} at {c}% probability')
        dfs.append(df1)

df1 is the dataframe which stores the X, Y and Corr_Value data.

But there seems to be a problem because the result I get looks like this:

But the result should look like this:

If you could help me get the desired result, that would be great.

jezrael · Accepted Answer · 2019-08-14 11:47:28Z

3

Use DataFrame.dropna to remove the rows with missing values, then build the custom output string with an f-string via DataFrame.apply:

f = lambda x: f'Solving {int(x["X"])} will solve {int(x["Y"])} at {int(x["Corr_Value"] * 100)}% probability.'
df['Remarks'] = df.dropna(subset=['Corr_Value']).apply(f,axis=1)
print(df)
       X      Y  Corr_Value                                            Remarks
0  51182  51389        1.00  Solving 51182 will solve 51389 at 100% probabi...
1  51182  50014         NaN                                                NaN
2  51182  50001        0.85  Solving 51182 will solve 50001 at 85% probabil...
3  51182  50014         NaN                                                NaN
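For reference, here is a self-contained sketch of the same dropna + apply approach that can be run as-is. The sample frame is rebuilt from the question, and the helper name `remark` is made up for illustration. It swaps int() for int(round(...)) on the percentage, on the assumption that rounding is what is wanted: int() truncates, so a product that lands slightly below a whole number in floating point (0.29 * 100 is a commonly cited case) would be printed one lower than expected.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "X": [51182, 51182, 51182, 51182],
    "Y": [51389, 50014, 50001, 50014],
    "Corr_Value": [1.00, np.nan, 0.85, np.nan],
})


def remark(row):
    # Hypothetical helper: apply(axis=1) passes each row as a Series.
    # Mixed int/float columns arrive as floats, hence the int() casts.
    pct = int(round(row["Corr_Value"] * 100))  # round first, then cast, to avoid truncation
    return f"Solving {int(row['X'])} will solve {int(row['Y'])} at {pct}% probability."


# Rows removed by dropna get NaN in the new column through index alignment.
df["Remarks"] = df.dropna(subset=["Corr_Value"]).apply(remark, axis=1)

print(df["Remarks"].tolist())
```

The assignment relies on index alignment: apply only returns values for rows 0 and 2, and pandas fills the remaining rows of Remarks with NaN.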

edited Aug 14, 2019 at 11:47

answered Aug 14, 2019 at 11:41

jezrael


1 Comment

vesuvius Over a year ago

Thanks @jezrael, this solves my problem. Accepting:)

Ankur Sinha · Accepted Answer · 2019-08-14 11:49:05Z

You can also use numpy.where:

import numpy as np

df['Remarks'] = np.where(df.Corr_Value.notnull(), 'Solving ' + df['X'].astype(str) + ' will solve ' + df['Y'].astype(str) + ' with ' + (df['Corr_Value'] * 100).astype(str) + '% probability', df['Corr_Value'])

Output:

       X      Y  Corr_Value                                            Remarks
0  51182  51389        1.00  Solving 51182 will solve 51389 with 100.0% pro...
1  51182  50014         NaN                                                NaN
2  51182  50001        0.85  Solving 51182 will solve 50001 with 85.0% prob...
3  51182  50014         NaN                                                NaN
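The output above prints the percentage as 100.0% and 85.0%, while the question asks for 100%. A possible variation, sketched here under the assumption that whole-number percentages are wanted, formats the value with no decimals before concatenating. The variable names `pct` and `text` are introduced only for this example.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "X": [51182, 51182, 51182, 51182],
    "Y": [51389, 50014, 50001, 50014],
    "Corr_Value": [1.00, np.nan, 0.85, np.nan],
})

# Format the percentage with zero decimals. Missing values become the
# text 'nan' here, but np.where below replaces those rows with NaN.
pct = (df["Corr_Value"] * 100).map("{:.0f}".format)

text = ("Solving " + df["X"].astype(str)
        + " will solve " + df["Y"].astype(str)
        + " with " + pct + "% probability")

# Keep the message where Corr_Value is present, NaN elsewhere.
df["Remarks"] = np.where(df["Corr_Value"].notna(), text, np.nan)

print(df["Remarks"].tolist())
```

Unlike the row-wise apply, this builds every message with vectorized string operations, which tends to matter only on larger frames.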

Wasi · Accepted Answer · 2019-08-14 13:56:27Z

1

Just try:

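import pandas as pd

for i, r in df1.iterrows():
    # NaN never compares equal to anything, so `!= np.nan` is always True; use pd.notna
    if pd.notna(r['Corr_Value']):
        # iterrows upcasts mixed int/float rows to float, so cast back for display
        a = int(r['X'])
        b = int(r['Y'])
        c = r['Corr_Value'] * 100
        df1.at[i, 'Remarks'] = "Solving " + str(a) + " will solve " + str(b) + " at " + str(c) + "% probability"

I think the problem in the question's code comes from using df1 (the whole column) instead of the current row, so each message embeds an entire Series rather than a single value.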
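One detail worth checking in any row-wise loop like this is how missing values are tested. A minimal sketch, using a stand-in variable `value` rather than the real dataframe, of why a comparison against np.nan cannot filter anything and what pd.notna does instead:

```python
import numpy as np
import pandas as pd

value = np.nan  # stand-in for a missing Corr_Value

# NaN is defined to be unequal to everything, including itself,
# so this comparison is True even though the value is missing.
print(value != np.nan)   # True

# pd.notna (and its counterpart pd.isna) test for missing values directly.
print(pd.notna(value))   # False
print(pd.notna(0.85))    # True
```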

edited Aug 14, 2019 at 13:56

Wasi


answered Aug 14, 2019 at 11:44

Andrea94c

