
I defined a UDF that returns 3 values: the first is an integer, the second is a float, and the third is a list.

mylist = [8, 9.5, 10, 11, 12]

def calculate(mylist):
  x = mylist[0]        # first value: an integer
  y = mylist[1]        # second value: a float
  tail = mylist[-3:]   # third value: the last three elements as a list
  return x, y, tail
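
Called outside Spark on the sample list, this returns the three values as expected (just a quick local check on my side):

print(calculate(mylist))   # (8, 9.5, [10, 11, 12])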

Then I want to register it as a UDF whose return schema defines those 3 types:

import pyspark.sql.functions as F
import pyspark.sql.types as T

func = F.udf(lambda x: calculate(x), T.StructType(
        [T.StructField("val1", T.IntegerType(), True),
         T.StructField("val2", T.FloatType(), True),
         T.StructField("val3", T.ListType(), True)]))

But I get this error

AttributeError: module 'pyspark.sql.types' has no attribute 'ListType'

1 Answer

ListType is not available in PySpark. You will need to change it to ArrayType, which always requires the element type to be specified.

func = F.udf(lambda x: calculate(x), T.StructType([
    T.StructField("val0", T.IntegerType(), True),
    T.StructField("val1", T.FloatType(), True),
    T.StructField("val2", T.ArrayType(T.IntegerType()), True),
]))
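
As a quick way to double-check the resulting schema, here is a minimal sketch (the variable name schema is just for illustration); ArrayType takes the element type as its first argument and an optional containsNull flag:

import pyspark.sql.types as T

# ArrayType requires the element type; containsNull defaults to True
schema = T.StructType([
    T.StructField("val0", T.IntegerType(), True),
    T.StructField("val1", T.FloatType(), True),
    T.StructField("val2", T.ArrayType(T.IntegerType()), True),
])

print(schema.simpleString())
# struct<val0:int,val1:float,val2:array<int>>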

Also a small thought on my side: when developing UDFs I really like the udf decorator, because in my opinion it makes the code look much cleaner. Your code would look as follows:

returnType=T.StructType([
    T.StructField("val0", T.IntegerType(), True),
    T.StructField("val1", T.FloatType(), True),
    T.StructField("val2", T.ArrayType(T.IntegerType()), True),
])

@F.udf(returnType=returnType)
def calculate(mylist):
  x = mylist[0]        # first value: an integer
  y = mylist[1]        # second value: a float
  tail = mylist[-3:]   # third value: the last three elements as a list
  return x, y, tail
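
Applied to a DataFrame, the decorated function can then be used like any other column expression. Below is a minimal sketch; the SparkSession setup, the sample data and the column name values are assumptions on my side. Keep in mind that the Python values the UDF returns have to match the declared field types (e.g. an actual Python int for IntegerType), otherwise that field comes back as null.

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical input: a single array<double> column called "values"
df = spark.createDataFrame([([8.0, 9.5, 10.0, 11.0, 12.0],)], ["values"])

# Apply the decorated UDF and flatten the returned struct into columns
df.withColumn("res", calculate(F.col("values"))).select("res.*").printSchema()
# root
#  |-- val0: integer (nullable = true)
#  |-- val1: float (nullable = true)
#  |-- val2: array (nullable = true)
#  |    |-- element: integer (containsNull = true)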