
Here is a list of values I would like my dataframe to have as columns:

 cols=['USA','CAN','UK','DEN']

My current df:

| ID | USA | DEN | VEN | NOR |
|----|-----|-----|-----|-----|
| 98 | 1   | 0   | 1   | 1   |
| 99 | 0   | 1   | 0   | 0   |

I want to check whether my existing df has all the values in the list as columns; if not, create those columns and fill them with 0, like:

| ID | USA | DEN | VEN | NOR | CAN | UK |
|----|-----|-----|-----|-----|-----|----|
| 98 | 1   | 0   | 1   | 1   | 0   | 0  |
| 99 | 0   | 1   | 0   | 0   | 0   | 0  |

2 Answers


Use a for loop with an if check: for each value in the list, test whether the column already exists in df.columns and, if not, add it filled with 0.

from pyspark.sql.functions import lit

df = spark.createDataFrame([(98, 1, 0, 1, 1)], ['ID', 'USA', 'DEN', 'VEN', 'NOR'])
cols = ['USA', 'CAN', 'UK', 'DEN']

# append each requested column that is missing, filled with a literal 0
for c in cols:
    if c not in df.columns:
        df = df.withColumn(c, lit(0))

df.show()

#+---+---+---+---+---+---+---+
#| ID|USA|DEN|VEN|NOR|CAN| UK|
#+---+---+---+---+---+---+---+
#| 98|  1|  0|  1|  1|  0|  0|
#+---+---+---+---+---+---+---+

2 Comments

For some reason, it doesn't create columns for all the missing elements in cols. It created only one extra column.
Could you post a test case which fails? I added 2 more items to cols and the solution above worked.
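For what it's worth, here is a minimal sketch of such a test case (assuming a local SparkSession bound to spark; the extra 'MEX' entry is purely illustrative and not from the question), where three requested columns are missing and the loop appends all three:

from pyspark.sql.functions import lit

df = spark.createDataFrame([(98, 1, 0, 1, 1), (99, 0, 1, 0, 0)],
                           ['ID', 'USA', 'DEN', 'VEN', 'NOR'])
cols = ['USA', 'CAN', 'UK', 'DEN', 'MEX']   # 'MEX' is illustrative only

for c in cols:
    if c not in df.columns:           # only add columns the DataFrame lacks
        df = df.withColumn(c, lit(0))

df.show()
# Expected: CAN, UK and MEX are appended after NOR, with every new cell set to 0.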

You can use a simple select expression:

from pyspark.sql.functions import lit

# keep the existing columns and add a literal 0, aliased to each requested column that is missing
select_cols = df.columns + [lit(0).alias(c) for c in cols if c not in df.columns]

df.select(*select_cols).show()
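As a variation, the same select can be wrapped in a small reusable helper (the name add_missing_columns is just an illustration, not an existing API). A single select builds one projection instead of chaining a withColumn call per missing column:

from pyspark.sql import DataFrame
from pyspark.sql.functions import lit

def add_missing_columns(df: DataFrame, wanted: list) -> DataFrame:
    # keep the existing columns and append a literal-0 column for each missing name
    missing = [lit(0).alias(c) for c in wanted if c not in df.columns]
    return df.select(*df.columns, *missing)

add_missing_columns(df, cols).show()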

