
I defined a UDF that returns 3 values: the first is an integer, the second is a float, and the third is a list.

mylist = [8, 9.5, 10, 11, 12]

def calculate(mylist):
  x = mylist[0]        # first value: an integer
  y = mylist[1]        # second value: a float
  tail = mylist[-3:]   # third value: the last three elements as a list
  return x, y, tail
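
Called outside Spark on the sample list, this returns the three values as expected (just a quick local check on my side):

print(calculate(mylist))   # (8, 9.5, [10, 11, 12])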

Then I want to register it as a UDF whose return schema defines those 3 types:

import pyspark.sql.functions as F
import pyspark.sql.types as T

func = F.udf(lambda x: calculate(x), T.StructType(
        [T.StructField("val1", T.IntegerType(), True),
         T.StructField("val2", T.FloatType(), True),
         T.StructField("val3", T.ListType(), True)]))

But I get this error

AttributeError: module 'pyspark.sql.types' has no attribute 'ListType'

1 Answer

ListType is not available in PySpark. You will need to change it to ArrayType, which always requires the element type to be specified.

func = F.udf(lambda x: calculate(x), T.StructType([
    T.StructField("val0", T.IntegerType(), True),
    T.StructField("val1", T.FloatType(), True),
    T.StructField("val2", T.ArrayType(T.IntegerType()), True),
]))
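
As a quick way to double-check the resulting schema, here is a minimal sketch (the variable name schema is just for illustration); ArrayType takes the element type as its first argument and an optional containsNull flag:

import pyspark.sql.types as T

# ArrayType requires the element type; containsNull defaults to True
schema = T.StructType([
    T.StructField("val0", T.IntegerType(), True),
    T.StructField("val1", T.FloatType(), True),
    T.StructField("val2", T.ArrayType(T.IntegerType()), True),
])

print(schema.simpleString())
# struct<val0:int,val1:float,val2:array<int>>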

Also a small thought on my side: when developing UDFs I really like the udf decorator, because in my opinion it makes the code look much cleaner. Your code would look as follows:

returnType=T.StructType([
    T.StructField("val0", T.IntegerType(), True),
    T.StructField("val1", T.FloatType(), True),
    T.StructField("val2", T.ArrayType(T.IntegerType()), True),
])

@F.udf(returnType=returnType)
def calculate(mylist):
  x = mylist[0]        # first value: an integer
  y = mylist[1]        # second value: a float
  tail = mylist[-3:]   # third value: the last three elements as a list
  return x, y, tail
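
Applied to a DataFrame, the decorated function can then be used like any other column expression. Below is a minimal sketch; the SparkSession setup, the sample data and the column name values are assumptions on my side. Keep in mind that the Python values the UDF returns have to match the declared field types (e.g. an actual Python int for IntegerType), otherwise that field comes back as null.

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical input: a single array<double> column called "values"
df = spark.createDataFrame([([8.0, 9.5, 10.0, 11.0, 12.0],)], ["values"])

# Apply the decorated UDF and flatten the returned struct into columns
df.withColumn("res", calculate(F.col("values"))).select("res.*").printSchema()
# root
#  |-- val0: integer (nullable = true)
#  |-- val1: float (nullable = true)
#  |-- val2: array (nullable = true)
#  |    |-- element: integer (containsNull = true)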