
I have the following DataFrame:

import pandas as pd

list_columns = ['A', 'B', 'C']
list_data = [
    [1, '2', 3],
    [4, '4', 5],
    [1, '2', 3],
    [4, '4', 6]
    ]
df = pd.DataFrame(columns=list_columns, data=list_data)

I want to check if multiple columns exist, and if not to create them.

For example, if any of B, C, D do not exist, create them (for the above df, only column D would be created). I know how to do this for a single column:

if 'D' not in df:
    df['D']=0

Is there a way to check whether all my columns exist and create only the ones that are missing, without writing a separate if for each column?

2 Answers


A loop is not necessary here; use DataFrame.reindex with Index.union:

cols = ['B','C','D']

df = df.reindex(df.columns.union(cols, sort=False), axis=1, fill_value=0)
print(df)
   A  B  C  D
0  1  2  3  0
1  4  4  5  0
2  1  2  3  0
3  4  4  6  0
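If you need this in several places, the reindex/union call can be wrapped in a small helper. This is a sketch; the name ensure_columns is hypothetical and not part of pandas. It relies on Index.union(..., sort=False) keeping the existing column order and appending the new labels after it.

```python
import pandas as pd

def ensure_columns(df: pd.DataFrame, cols, fill_value=0) -> pd.DataFrame:
    """Hypothetical helper: return a DataFrame that contains every column in cols.

    Existing columns keep their data and position; missing ones are
    appended after them and filled with fill_value.
    """
    # sort=False keeps df's own column order and appends labels not yet present
    new_columns = df.columns.union(cols, sort=False)
    return df.reindex(new_columns, axis=1, fill_value=fill_value)

df = pd.DataFrame(columns=['A', 'B', 'C'],
                  data=[[1, '2', 3], [4, '4', 5], [1, '2', 3], [4, '4', 6]])
df = ensure_columns(df, ['B', 'C', 'D'])
print(df.columns.tolist())  # ['A', 'B', 'C', 'D']
```

Because reindex returns a new object, the result has to be assigned back to df, just as in the answer.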

Comments:

How would you fill each newly added column with a different value, e.g. if you have a separate array or dictionary of defaults for each column?
@mike01010 - Then drop fill_value from reindex (so the new columns start as NaN) and use df.fillna({'A':5, 'D':4})
An alternative, perhaps simpler, approach: stackoverflow.com/a/70602986/4659442
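Expanding on the per-column defaults question: with fill_value=0 the new columns are already 0, so a later fillna has nothing to replace. A minimal sketch, assuming the defaults are kept in a dict (the name defaults and its values are illustrative): reindex without fill_value, then call fillna only for the columns that were actually missing, so any NaNs already present in existing columns are left alone.

```python
import pandas as pd

df = pd.DataFrame(columns=['A', 'B', 'C'],
                  data=[[1, '2', 3], [4, '4', 5], [1, '2', 3], [4, '4', 6]])

# Illustrative per-column defaults for the columns we require
defaults = {'B': -1, 'C': -1, 'D': 4, 'E': 10}

# Determine which required columns are missing before reindexing
missing = [c for c in defaults if c not in df.columns]

# Without fill_value, reindex creates the missing columns filled with NaN
df = df.reindex(df.columns.union(list(defaults), sort=False), axis=1)

# Fill only the newly created columns with their own default
df = df.fillna({c: defaults[c] for c in missing})
print(df)
```

Note that the new columns come back as float64 (4.0, 10.0) because they pass through NaN; cast them afterwards with astype if you need integers.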

Just to add, you can also take the set difference between your list and the existing columns and create the missing ones with assign and ** unpacking:

import numpy as np
cols = ['B','C','D','E']

# assign returns a new DataFrame, so bind the result back to df
df = df.assign(**{col: 0 for col in np.setdiff1d(cols, df.columns.values)})
print(df)

   A  B  C  D  E
0  1  2  3  0  0
1  4  4  5  0  0
2  1  2  3  0  0
3  4  4  6  0  0
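One detail of the assign approach: np.setdiff1d returns its result sorted, so the missing columns are added in alphabetical order rather than in the order of cols. A sketch that keeps the requested order with a plain list comprehension instead (the cols list here is illustrative):

```python
import pandas as pd

df = pd.DataFrame(columns=['A', 'B', 'C'],
                  data=[[1, '2', 3], [4, '4', 5], [1, '2', 3], [4, '4', 6]])
cols = ['E', 'B', 'C', 'D']

# Keep the requested order of the missing columns (np.setdiff1d would sort them)
missing = [c for c in cols if c not in df.columns]

# assign adds keyword arguments in order and returns a new DataFrame
df = df.assign(**{c: 0 for c in missing})
print(df.columns.tolist())  # ['A', 'B', 'C', 'E', 'D']
```

Because ** unpacking turns the dict into keyword arguments, this only works for string column names; for other labels the reindex approach is the safer choice.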
