In PySpark, suppose I have a DataFrame with columns named 'a1', 'a2', 'a3', ..., 'a99'. How do I apply an operation to each of them to create new columns with new names dynamically?

For example, to get new columns such as sum('a1') as 'total_a1', ..., sum('a99') as 'total_a99'.

1 Answer

You can build the aggregate expressions with a list comprehension and rename each one with alias().

To return only the new columns:

import pyspark.sql.functions as f

# one sum per column, renamed with a "total_" prefix
# (df.columns assumes every column should be summed; filter it if not)
df1 = df.select(*[f.sum(c).alias("total_" + c) for c in df.columns])
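
As a minimal runnable sketch (the three-column DataFrame below is a hypothetical stand-in for the real a1...a99 columns):

from pyspark.sql import SparkSession
import pyspark.sql.functions as f

spark = SparkSession.builder.getOrCreate()

# hypothetical stand-in for the real DataFrame
df = spark.createDataFrame([(1, 2, 3), (4, 5, 6)], ["a1", "a2", "a3"])

df1 = df.select(*[f.sum(c).alias("total_" + c) for c in df.columns])
df1.show()
# +--------+--------+--------+
# |total_a1|total_a2|total_a3|
# +--------+--------+--------+
# |       5|       7|       9|
# +--------+--------+--------+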

And if you want to keep the existing columns as well, the sums need to run over a window; mixing a plain aggregate with non-aggregated columns in a select raises an AnalysisException:

from pyspark.sql import Window

# an empty partitionBy() spans the whole DataFrame, so every row gets the
# overall total (note: this moves all data to a single partition)
df2 = df.select("*", *[f.sum(c).over(Window.partitionBy()).alias("total_" + c) for c in df.columns])
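
If the single-partition window is a concern for a large DataFrame, an alternative sketch (not from the original answer) is to aggregate once and cross-join the one-row result back onto every row:

df_totals = df.select(*[f.sum(c).alias("total_" + c) for c in df.columns])  # one row of totals
df2 = df.crossJoin(df_totals)  # appends each total_* column to every row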