
This should be quite simple, but I still haven't found a way. I need to compute a new column whose value is the maximum of columns col1 and col2, so if col1 is 2 and col2 is 4, new_col should be 4, and so on. It's a PySpark DataFrame. I tried df = df.withColumn("new_col", max("col1", "col2")), but got the error "_() takes 1 positional argument but 2 were given". What would be the correct way? Thanks in advance.


You can use greatest, which returns the greatest value among the given columns for each row, skipping null values:

from pyspark.sql import functions as F

# greatest() compares col1 and col2 within each row and keeps the larger value
output = df.withColumn("new_col", F.greatest("col1", "col2"))