Create new df from existing df (python - pandas)

Question

I have created a data frame df1 like below,

import pandas as pd

data = {'ID':[1,2,3,4,5,6,7,8,9,10],
        'date_1':['2021-03-01','2021-03-02','2021-04-03','2021-03-04','2021-03-05','2021-03-06','2021-03-07','2021-03-08','2021-03-09','2021-03-10'],
        'date_2': ['2021-03-06','2021-03-07','2021-03-08','2021-03-09','2021-03-10','2021-03-11','2021-03-12','2021-03-13','2021-03-14','2021-03-15']
       }
df1 = pd.DataFrame(data, columns = ['ID','date_1','date_2'])
df1

This is the df1 output

I am trying to create a new dataframe df2 with just one column, 'date_3', from df1. Ideally, 'date_3' should contain only the dates from the df1 rows where the condition below is True:

df1['date_1'] <= df1['date_2']

Below is my approach, but I am only getting the conditional output (True/False) and not the actual date values:

data = [df1['date_1'] <= df1['date_2']]
headers = ['date_3']
df2 = pd.concat(data, axis=1, keys=headers)
df2

This is the output of df2

Which date do you want to show in date_3? date_1 or date_2?
– Mayank Porwal, Commented Mar 25, 2021 at 16:47
I would want the date_1 values that meet the condition to show in date_3.
– Jude92, Commented Mar 25, 2021 at 16:49

Mayank Porwal · Accepted Answer · 2021-03-26 04:57:45Z

Use:

In [489]: df2 = df1[df1['date_1'] <= df1['date_2']]['date_1'].to_frame('date_3')

In [490]: df2
Out[490]: 
       date_3
0  2021-03-01
1  2021-03-02
3  2021-03-04
4  2021-03-05
5  2021-03-06
6  2021-03-07
7  2021-03-08
8  2021-03-09
9  2021-03-10

As advised by @ScottBoston, this version avoids chained indexing by using loc:

df2 = df1.loc[df1['date_1'] <= df1['date_2'], 'date_1'].to_frame('date_3')
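For reference, here is a minimal end-to-end sketch of the loc approach using the data from the question. One assumption not in the original answer: the columns are converted with pd.to_datetime before comparing. The question stores the dates as strings, which happen to compare correctly only because they are all in zero-padded YYYY-MM-DD form; the names dates and mask are just illustrative.

```python
import pandas as pd

data = {
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "date_1": ["2021-03-01", "2021-03-02", "2021-04-03", "2021-03-04", "2021-03-05",
               "2021-03-06", "2021-03-07", "2021-03-08", "2021-03-09", "2021-03-10"],
    "date_2": ["2021-03-06", "2021-03-07", "2021-03-08", "2021-03-09", "2021-03-10",
               "2021-03-11", "2021-03-12", "2021-03-13", "2021-03-14", "2021-03-15"],
}
df1 = pd.DataFrame(data)

# Assumption: compare real datetimes rather than strings.
dates = df1[["date_1", "date_2"]].apply(pd.to_datetime)

# Boolean mask: True for rows where date_1 is on or before date_2.
mask = dates["date_1"] <= dates["date_2"]

# Select the date_1 values for those rows in one loc call, then
# turn the resulting Series into a one-column DataFrame named date_3.
df2 = df1.loc[mask, "date_1"].to_frame("date_3")
print(df2)
```

The row with ID 3 (index 2) is dropped because its date_1 falls after its date_2; the original index labels of the remaining rows are preserved.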

edited Mar 26, 2021 at 4:57

answered Mar 25, 2021 at 16:51


2 Comments

Scott Boston Over a year ago

@MayankPorwal Can I make a suggestion? We should avoid chained indexing. Whenever you have '][' in your statements for pandas this is a hint of chained indexing and should be rewritten using loc. Please consider using df2 = df1.loc[df1['date_1'] <= df1['date_2'], ['date_1']] or df2 = df1.loc[df1['date_1'] <= df1['date_2'], 'date_1'].to_frame('date_3')

Mayank Porwal Over a year ago

@ScottBoston Thanks for your advice. I've updated my answer with the same.
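The two loc forms suggested in the comment above differ in what they return, which a small sketch can make concrete. It uses a two-row stand-in for df1 (an assumption for brevity); the variable names as_frame, as_series and renamed are illustrative only.

```python
import pandas as pd

# Two-row stand-in for df1: the second row fails the condition.
df1 = pd.DataFrame({
    "date_1": ["2021-03-01", "2021-04-03"],
    "date_2": ["2021-03-06", "2021-03-08"],
})
mask = df1["date_1"] <= df1["date_2"]

# A list of column labels returns a DataFrame; the column keeps its name date_1.
as_frame = df1.loc[mask, ["date_1"]]

# A single column label returns a Series ...
as_series = df1.loc[mask, "date_1"]

# ... which to_frame converts to a DataFrame with the requested column name.
renamed = as_series.to_frame("date_3")

print(type(as_frame).__name__, list(as_frame.columns))
print(type(as_series).__name__)
print(type(renamed).__name__, list(renamed.columns))
```

So the first form still needs a rename to get a column called date_3, while the second form names the column in the same step.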

Stryder · Accepted Answer · 2021-03-25 16:57:26Z

This:

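df2 = df1.loc[df1["date_1"] <= df1["date_2"], ["ID", "date_1"]].copy()

df2 = df2.rename(columns={"date_1": "date_3"})

This first subsets the rows based on your condition and keeps only the ID and date_1 columns, then renames date_1 to date_3. Note that rename returns a new DataFrame rather than modifying df2 in place, so the result is assigned back to df2. Drop "ID" from the column list if you want only the one column.

Calling .copy() also makes it explicit that df2 is a copy, which prevents a SettingWithCopyWarning if you modify it later.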
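A short sketch of the two points in this answer: rename hands back a new DataFrame, and the explicit copy is independent of df1. It uses a two-row stand-in for df1 and a hypothetical extra column named checked, both assumptions for illustration.

```python
import pandas as pd

# Two-row stand-in for df1: the second row fails the condition.
df1 = pd.DataFrame({
    "ID": [1, 3],
    "date_1": ["2021-03-01", "2021-04-03"],
    "date_2": ["2021-03-06", "2021-03-08"],
})

# Subset rows and columns in one loc call, then take an explicit copy.
df2 = df1.loc[df1["date_1"] <= df1["date_2"], ["ID", "date_1"]].copy()

# rename returns a new DataFrame; df2 itself keeps the old column name.
renamed = df2.rename(columns={"date_1": "date_3"})

# Adding a column to the renamed frame does not touch df1.
renamed["checked"] = True  # hypothetical column, for illustration only

print(list(df2.columns))
print(list(renamed.columns))
print(list(df1.columns))
```

Assigning the result back (df2 = df2.rename(...)) is the usual way to keep the renamed version under the same name.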

answered Mar 25, 2021 at 16:57
