
I want to create an array column from existing columns in PySpark.

--------------------------
col0 | col1 | col2 | col3
--------------------------
1    |a     |b     |c
--------------------------
2    |d     |e     |f
--------------------------

I want it like this:

-------------
col0 | col1 
-------------
1    |[a,b,c]
-------------
2    |[d,e,f]
-------------

I was trying the array() function like this:

>>> new = df.select("col0",array("col1","col2","col3").alias("col1"))

but I am getting this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'list' object is not callable

Does anyone have a solution for this?

  • It worked after I restarted my PySpark session. Commented Nov 12, 2020 at 6:10
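The fact that a restart fixed it is consistent with name shadowing: the error message suggests the name `array` had been bound to a plain Python list somewhere in the session, rather than to `pyspark.sql.functions.array`. A minimal sketch reproducing the same TypeError in plain Python (no DataFrame needed; the list contents are arbitrary):

```python
# Hypothetical reproduction: a list assigned to the name `array` shadows
# any previously imported pyspark.sql.functions.array in the session.
array = ["a", "b", "c"]

try:
    # Calling the shadowed name now calls the list, which is not callable.
    array("col1", "col2", "col3")
except TypeError as e:
    print(e)  # 'list' object is not callable
```

Restarting the shell discards the shadowing binding, which is why the call worked again afterwards.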

1 Answer


You can use withColumn() first to create the new column, and after that use select() to pick the columns you want:

from pyspark.sql.functions import array

df = df.withColumn("col1", array("col1", "col2", "col3"))
df = df.select("col0", "col1")

You are getting this error because of the .alias() call, which is what the interpreter is complaining about.


2 Comments

alias should work on array, because array returns a Column. I suspect array was not pyspark.sql.functions.array but something else; after restarting Spark, as described in the comment on the question, the name was rebound to the correct Spark array function.
Don't know. I also tried list() instead of array(), and it gave me the same error.
