Data:

Name1           Name2           Name3 (Expected)
RR Industries   null            RR Industries
RR Industries   RR Industries   RR IndustriesRR Industries
Code:
.withColumn("Name3",F.concat(F.trim(Name1), F.trim(Name2)))
Actual result: whenever one of the input values is null, the concatenated value is also null, so those rows lose their name entirely. I want the output to be as shown in the Name3 (Expected) column.
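For reference, F.concat returns null as soon as any of its inputs is null, which is what produces the empty result in the first row. A minimal standalone sketch reproducing this (assuming a local SparkSession and a throwaway DataFrame named sample):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Two rows mirroring the data above; Name2 is null in the first row.
sample = spark.createDataFrame(
    [("RR Industries", None), ("RR Industries", "RR Industries")],
    ["Name1", "Name2"],
)

# concat propagates nulls: Name3 is null for the first row instead of "RR Industries".
sample.withColumn(
    "Name3", F.concat(F.trim(F.col("Name1")), F.trim(F.col("Name2")))
).show(truncate=False)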
I think the issue occurs after joining the tables. The name column exists in both df2 and df3, and before the join neither of them contains null values.
Issue: after joining, since PySpark does not drop the common columns, we end up with two name1 columns from the two tables. I tried replacing the nulls with an empty string, but it did not work and throws an error.
How do I replace null values with an empty string after joining the tables?
from pyspark.sql import functions as F

# Attempt: replace the nulls in name1 with '' after the joins, then concatenate.
# Note: both withColumn('name1', ...) calls write to the same output column,
# so the second overwrites the first.
df = df1 \
    .join(df2, "code", how="left") \
    .join(df3, "id", how="left") \
    .join(df4, "id", how="left") \
    .withColumn("name1", F.when(df2["name1"].isNull(), "").otherwise(df2["name1"])) \
    .withColumn("name1", F.when(df3["name1"].isNull(), "").otherwise(df3["name1"])) \
    .withColumn("Name1", F.concat(F.trim(df2["name1"]), F.trim(df3["name1"])))
Use coalesce to replace the null values with an empty string, and use that for your concat.
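A sketch of that approach, reusing the join chain and column names from the question (df1..df4 are assumed to be the same DataFrames; df2["name1"] and df3["name1"] disambiguate the duplicated column):

from pyspark.sql import functions as F

df = df1 \
    .join(df2, "code", how="left") \
    .join(df3, "id", how="left") \
    .join(df4, "id", how="left") \
    .withColumn(
        "Name3",
        F.concat(
            F.trim(F.coalesce(df2["name1"], F.lit(""))),  # null -> "" before concatenating
            F.trim(F.coalesce(df3["name1"], F.lit(""))),
        ),
    )

Alternatively, F.concat_ws("", F.trim(df2["name1"]), F.trim(df3["name1"])) gives the same result, since concat_ws skips null inputs.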