I have a DataFrame like this:
data = [('ID1', "[apples, mangos, eggs, milk, oranges]"),
        ('ID1', "[eggs, milk, cereals, mangos, apples]")]
df = spark.createDataFrame(data, ['ID', "colval"])
df.show(truncate=False)
df.printSchema()
+---+-------------------------------------+
|ID |colval                               |
+---+-------------------------------------+
|ID1|[apples, mangos, eggs, milk, oranges]|
|ID1|[eggs, milk, cereals, mangos, apples]|
+---+-------------------------------------+
root
|-- ID: string (nullable = true)
|-- colval: string (nullable = true)
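I want to convert colval to an array of strings. My attempt with split is shown further down: the whole string ends up as a single array element, so taking the first element after the split just returns the original string. Any help?
This is the schema I want: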
root
|-- ID: string (nullable = true)
|-- colval: array (nullable = true)
| |-- element: string (containsNull = true)
I tried using split, but I end up with this result:
from pyspark.sql.functions import split
df = df.withColumn('colval', split('colval', "', ?'"))
df.show(truncate=False)
df.printSchema()
+---+---------------------------------------+
|ID |colval                                 |
+---+---------------------------------------+
|ID1|[[apples, mangos, eggs, milk, oranges]]|
|ID1|[[eggs, milk, cereals, mangos, apples]]|
+---+---------------------------------------+
root
|-- ID: string (nullable = true)
|-- colval: array (nullable = true)
| |-- element: string (containsNull = true)