
I have created a DataFrame df1 as follows:

import pandas as pd

data = {'ID':[1,2,3,4,5,6,7,8,9,10],
        'date_1':['2021-03-01','2021-03-02','2021-04-03','2021-03-04','2021-03-05','2021-03-06','2021-03-07','2021-03-08','2021-03-09','2021-03-10'],
        'date_2': ['2021-03-06','2021-03-07','2021-03-08','2021-03-09','2021-03-10','2021-03-11','2021-03-12','2021-03-13','2021-03-14','2021-03-15']
       }
df1 = pd.DataFrame(data, columns = ['ID','date_1','date_2'])
df1

This is the df1 output: (image not available)

I am trying to create a new DataFrame df2 with a single column, 'date_3', derived from df1. Ideally, 'date_3' in df2 should contain only the rows (dates) of df1 for which the following condition is True:

df1['date_1'] <= df1['date_2']

Below is my approach, but I only get the result of the condition (True/False) and not the actual date values:

data = [df1['date_1'] <= df1['date_2']]
headers = ['date_3']
df2 = pd.concat(data, axis=1, keys=headers)
df2

This is the output of df2: (image not available)
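The comparison by itself produces a boolean Series (a mask with one True/False per row), which is why concatenating it yields True/False values. Using that mask as the row selector in .loc keeps only the matching rows, so you get the dates themselves. A minimal sketch, rebuilding df1 from the question's data:

```python
import pandas as pd

data = {'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        'date_1': ['2021-03-01', '2021-03-02', '2021-04-03', '2021-03-04', '2021-03-05',
                   '2021-03-06', '2021-03-07', '2021-03-08', '2021-03-09', '2021-03-10'],
        'date_2': ['2021-03-06', '2021-03-07', '2021-03-08', '2021-03-09', '2021-03-10',
                   '2021-03-11', '2021-03-12', '2021-03-13', '2021-03-14', '2021-03-15']}
df1 = pd.DataFrame(data, columns=['ID', 'date_1', 'date_2'])

# The comparison alone is a boolean mask: one True/False per row.
mask = df1['date_1'] <= df1['date_2']
print(mask.dtype)  # bool

# Using the mask as the row selector keeps only the rows where it is True,
# so the result holds the date_1 values rather than booleans.
selected = df1.loc[mask, 'date_1']
print(selected.tolist())
```

Only row 2 ('2021-04-03' vs '2021-03-08') fails the condition, so it is the one row dropped.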

Comments:
  • Use: df[df['date_1'] <= df['date_2']] (Commented Mar 25, 2021 at 16:46)
  • Which date do you want to show in date_3, date_1 or date_2? (Commented Mar 25, 2021 at 16:47)
  • I want date_1 to show in date_3 for the rows that meet the condition. (Commented Mar 25, 2021 at 16:49)
  • df2 = df1.loc[df1['date_1'] <= df1['date_2'], ['date_1']] (Commented Mar 25, 2021 at 16:50)

2 Answers


Use:

In [489]: df2 = df[df['date_1'] <= df['date_2']]['date_1'].to_frame('date_3')

In [490]: df2
Out[490]: 
       date_3
0  2021-03-01
1  2021-03-02
3  2021-03-04
4  2021-03-05
5  2021-03-06
6  2021-03-07
7  2021-03-08
8  2021-03-09
9  2021-03-10
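Note that the comparison above is done on strings. It gives the right answer here only because 'YYYY-MM-DD' strings sort in the same order as the dates they represent; with a format such as 'M/D/YYYY' a plain string comparison goes wrong. A slightly more robust sketch converts both columns with pd.to_datetime before comparing. It uses a few rows of the question's data, and the format string '%Y-%m-%d' is an assumption matching that sample:

```python
import pandas as pd

# With a non-ISO format, lexicographic string comparison is not chronological:
print('3/10/2021' <= '3/9/2021')  # True, although 10 March is after 9 March

df = pd.DataFrame({
    'ID': [1, 2, 3],
    'date_1': ['2021-03-01', '2021-04-03', '2021-03-04'],
    'date_2': ['2021-03-06', '2021-03-08', '2021-03-09'],
})

# Parse the strings into datetime values so that <= compares dates,
# not characters.
for col in ['date_1', 'date_2']:
    df[col] = pd.to_datetime(df[col], format='%Y-%m-%d')

df2 = df.loc[df['date_1'] <= df['date_2'], 'date_1'].to_frame('date_3')
print(df2)
```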

As advised by @ScottBoston, avoiding chained indexing:

df2 = df.loc[df['date_1'] <= df['date_2'], 'date_1'].to_frame('date_3')
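Both forms suggested in the comments select rows with the boolean mask and a column in a single .loc call; they differ only in how the column is requested. A scalar label returns a Series, which to_frame turns into a one-column DataFrame named date_3; a list of labels returns a DataFrame directly, still named date_1, so it needs a rename. A small sketch on a few rows of the question's data, checking that the two give the same result:

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'date_1': ['2021-03-01', '2021-04-03', '2021-03-04'],
    'date_2': ['2021-03-06', '2021-03-08', '2021-03-09'],
})
mask = df1['date_1'] <= df1['date_2']

# Scalar column label -> Series; to_frame names the single column date_3.
via_to_frame = df1.loc[mask, 'date_1'].to_frame('date_3')

# List of column labels -> DataFrame with column date_1, renamed afterwards.
via_list = df1.loc[mask, ['date_1']].rename(columns={'date_1': 'date_3'})

print(via_to_frame.equals(via_list))  # True
```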

2 Comments

@MayankPorwal Can I make a suggestion? We should avoid chained indexing. Whenever you see '][' in a pandas statement, that is a hint of chained indexing, and it should be rewritten using loc. Please consider using df2 = df1.loc[df1['date_1'] <= df1['date_2'], ['date_1']] or df2 = df1.loc[df1['date_1'] <= df1['date_2'], 'date_1'].to_frame('date_3')
@ScottBoston Thanks for your advice. I've updated my answer with the same.

This:

df2 = df.loc[df["date_1"]<= df["date_2"], ["ID", "date_1"]].copy()

df2 = df2.rename(columns={"date_1": "date_3"})

This first subsets the rows that meet your condition, keeping only the ID and date_1 columns, and then renames date_1 to date_3. Note that rename returns a new DataFrame, so the result has to be assigned back to df2.

Calling .copy() also makes it explicit that df2 is an independent copy, which prevents a SettingWithCopyWarning if you modify df2 later.
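A short sketch of that last point, using a few rows of the question's data (the placeholder value 'n/a' is purely illustrative): because df2 is an explicit copy, overwriting its date_3 column leaves df1 unchanged.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'date_1': ['2021-03-01', '2021-04-03', '2021-03-04'],
    'date_2': ['2021-03-06', '2021-03-08', '2021-03-09'],
})

# Subset rows and columns in one .loc call, then take an explicit copy.
df2 = df1.loc[df1['date_1'] <= df1['date_2'], ['ID', 'date_1']].copy()
df2 = df2.rename(columns={'date_1': 'date_3'})

# Writing to the copy does not touch df1. In pandas versions that still
# have SettingWithCopyWarning, the explicit copy also avoids that warning.
df2['date_3'] = 'n/a'

print(df1['date_1'].tolist())  # original values are unchanged
print(df2)
```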
