Create a new column from another column in Python

Question

I have a pandas DataFrame in Python; let's call it df.

In this DataFrame I create a new column based on an existing column as follows:

df.loc[:, 'new_col'] = df['col']

Then I do the following:

df[df['new_col']=='Above Average'] = 'Good'

However, I noticed that this operation also changes the values in df['col'].

What should I do so that the values in df['col'] are not affected by the operations I do on df['new_col']?

I tried it and it does not work. – msh855, commented May 14, 2019 at 9:01

jezrael · Accepted Answer · 2019-05-14 09:12:08Z

Use DataFrame.loc with boolean indexing:

df.loc[df['new_col']=='Above Average', 'new_col'] = 'Good'

If no column is specified, as in df[mask] = 'Good', every column of the matching rows is set to Good, which is why df['col'] changed as well.
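To make the difference concrete, here is a minimal sketch on a small made-up frame (the values are hypothetical): assigning through a row mask alone overwrites every column of the matching rows, while .loc with a column label touches only new_col.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({'col': ['Average', 'Above Average', 'Below Average']})
df['new_col'] = df['col']

# Row mask only: every column of the matching rows becomes 'Good'
broken = df.copy()
broken[broken['new_col'] == 'Above Average'] = 'Good'
print(broken)

# Row mask plus column label: only new_col is changed
fixed = df.copy()
fixed.loc[fixed['new_col'] == 'Above Average', 'new_col'] = 'Good'
print(fixed)
```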

Alternatively, both lines of code can be combined into one with numpy.where or Series.mask:

import numpy as np
df['new_col'] = np.where(df['col']=='Above Average', 'Good', df['col'])

df['new_col'] = df['col'].mask(df['col']=='Above Average', 'Good')
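A quick check, on hypothetical data, that the two one-liners agree and leave df['col'] untouched:

```python
import numpy as np
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({'col': ['Average', 'Above Average', 'Below Average']})
mask = df['col'] == 'Above Average'

# np.where picks 'Good' where mask is True, otherwise the original value
with_where = np.where(mask, 'Good', df['col'])

# Series.mask replaces the values where mask is True and keeps the rest
with_mask = df['col'].mask(mask, 'Good')

df['new_col'] = with_mask
print(df)
```

np.where returns a plain NumPy array, while Series.mask returns a Series that keeps the DataFrame's index; either can be assigned to a new column.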

EDIT: To change many values, use Series.replace or Series.map with a dictionary of the values to replace:

d = {'Good':['Above average','effective'], 'Very Good':['Really effective']}

#swap key values in dict
#http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
print(d1)
# {'Above average': 'Good', 'effective': 'Good', 'Really effective': 'Very Good'}

df['new_col'] = df['col'].replace(d1)
# on large data, map with fillna is usually faster
df['new_col'] = df['col'].map(d1).fillna(df['col'])
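A minimal sketch with hypothetical data that includes one value missing from the mapping ('poor'), showing why map needs fillna while replace does not. With the default regex=False, replace matches whole cell values only, so the key 'effective' does not alter part of 'Really effective'.

```python
import pandas as pd

# Hypothetical example data, including a value that is not in the mapping
df = pd.DataFrame({'col': ['Above average', 'effective', 'Really effective', 'poor']})

d = {'Good': ['Above average', 'effective'], 'Very Good': ['Really effective']}
# Invert: each old value becomes a key pointing at its new label
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}

# replace leaves values missing from d1 unchanged
via_replace = df['col'].replace(d1)

# map turns unmatched values into NaN, so fillna restores the originals
unmatched = df['col'].map(d1)
via_map = unmatched.fillna(df['col'])
```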

And what if I have multiple conditions, say changing both Above average and effective to Good? And what if I have other cases too, say Really effective should become 'Very Good'?

prosti · Accepted Answer · 2019-05-14 13:42:14Z

Another option is the Series.where method:

# assign the result back instead of calling where(..., inplace=True) on a column selection
df['new_col'] = df['col'].where(df['col'] != 'Above Average', other='Good')
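Series.where is the mirror image of Series.mask: where keeps the values for which the condition is True and substitutes other elsewhere. The sketch below (hypothetical data) assigns the result back rather than using inplace=True on a column selection, since in recent pandas versions with Copy-on-Write such chained in-place calls are not guaranteed to update the parent frame.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({'col': ['Average', 'Above Average', 'Below Average']})
cond = df['col'] != 'Above Average'

# where keeps values where cond is True and uses `other` elsewhere
kept_where = df['col'].where(cond, other='Good')

# mask does the opposite, so negating the condition gives the same result
kept_mask = df['col'].mask(~cond, 'Good')

df['new_col'] = kept_where
print(df)
```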

That said, np.where is usually the fastest option:

import numpy as np
m = df['col'] == 'Above Average'
df['new_column'] = np.where(m, 'Good', df['col'])

df['new_column'] is the name of the new column. Where mask m is True, 'Good' is assigned; otherwise the value from df['col'] is kept.

+----+---------------+
|    | col           |
|----+---------------|
|  0 | Nan           |
|  1 | Above Average |
|  2 | 1.0           |
+----+---------------+
+----+---------------+--------------+
|    | col           | new_column   |
|----+---------------+--------------|
|  0 | Nan           | Nan          |
|  1 | Above Average | Good         |
|  2 | 1.0           | 1.0          |
+----+---------------+--------------+
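The tables can be reproduced roughly as follows, assuming Nan in the input stands for a missing value (NaN) rather than the string 'Nan':

```python
import numpy as np
import pandas as pd

# Same values as the input table: a missing value, a label and a float
df = pd.DataFrame({'col': [np.nan, 'Above Average', 1.0]})

# NaN and 1.0 compare unequal to the label, so only row 1 matches
m = df['col'] == 'Above Average'
df['new_column'] = np.where(m, 'Good', df['col'])
print(df)
```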

Here are also some notes on masking with df.loc:

m = df['col']=='Above Average'
print(m)
df.loc[m, 'new_column'] = 'Good'

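The result matches np.where only when new_column already holds a copy of df['col']: df.loc writes 'Good' where m is True and leaves the rows where m is False untouched, so if new_column does not exist yet, those rows become NaN.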
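A small sketch (hypothetical data) of the two situations: with a brand-new column, .loc leaves NaN in the rows where m is False; with a column seeded from df['col'], it gives the same result as np.where.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({'col': ['Average', 'Above Average', 'Below Average']})
m = df['col'] == 'Above Average'

# new_column does not exist yet: rows where m is False are filled with NaN
fresh = df.copy()
fresh.loc[m, 'new_column'] = 'Good'

# Copy col first, then overwrite only the rows where m is True
seeded = df.copy()
seeded['new_column'] = seeded['col']
seeded.loc[m, 'new_column'] = 'Good'

print(fresh)
print(seeded)
```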
