How to add rows only if it doesn't exists in the df

Question

I have a list which looks like this:

[2,4,3,1,5]  # One to Five in a certain order

I have a df that looks like this

ID    Name    Address
1      A         ABC
3      B         DEF

I want to add which ever IDs are missing with the values being False for Name and Address (there are a lot of columns towards to right, I don't want to manually give false for it, so if there is a method that will fill False for rest of the columns then its better)

After adding it, I want the DF to be sorted based on the order of the list.

Expected result:

ID   Name    Address
2    False      False
4    False      False
3    B          DEF
1    A          ABC
5    False      False

Can you please show what you at least tried? There are multiple ways of doing this — roganjosh
– roganjosh, Commented Aug 20, 2022 at 19:02
I am pretty new to pandas, just learning things. I couldn't do much — derikS4M1
– derikS4M1, Commented Aug 20, 2022 at 19:03
What steps do you think might be relevant, though? If you don't try to think your way through the problem then the answer (which I predict will come within 2 minutes) doesn't make you consider anything and diminishes from your learning — roganjosh
– roganjosh, Commented Aug 20, 2022 at 19:04
I think pd.concat will work after converting the list to a df. But how to only take the needed rows and fill in False for others + sorting it. i have no clue — derikS4M1
– derikS4M1, Commented Aug 20, 2022 at 19:13

roganjosh · Accepted Answer · 2022-08-20 19:25:57Z

Getting your initial df:

import pandas as pd

df = pd.DataFrame({'ID': [1, 3],
                   'Name': ['A', 'B'],
                   'Address': ['ABC', 'DEF']})
print(df.head()

It's useful to note here that an integer index is made for you. The way you list your df doesn't show such an index. The required output suggests that 'ID' is the index but it's not totally clear how you want the output in that format.

The first thing to do is set the index as 'ID':

df = df.set_index('ID')

Then you can reindex() that to build a backbone of the resulting df, which will fill the remaining columns with NaN.

df = df.set_index('ID').reindex(index=lst)

Gives:

  Name Address
ID             
2   NaN     NaN
4   NaN     NaN
3     B     DEF
1     A     ABC
5   NaN     NaN

(Note that 'ID', being the index, is now on a different level to the other column names)

The final step is to .fillna() with False

Putting it together:

df = df.set_index('ID').reindex(index=lst).fillna(False).reset_index()

The last reset_index() is optional because, as I said, you don't appear to have an independent index in your expected output.

Nuri Taş · Accepted Answer · 2022-08-20 19:23:40Z

1

One way to do this:

your_list = [2,4,3,1,5]
df1 = pd.DataFrame({'Name':[False]*len(your_list), 'Address':[False]*len(your_list)}, index=your_list)

df1.loc[df.index] = df

where df is the sample dataframe with two rows.

answered Aug 20, 2022 at 19:23

Nuri Taş

3,8552 gold badges8 silver badges22 bronze badges

Collectives™ on Stack Overflow

How to add rows only if it doesn't exists in the df

2 Answers 2

Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Related