I currently have a dataframe with an id and a column which is an array of structs:

 root
 |-- id: integer (nullable = true)
 |-- lists: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- text: string (nullable = true)
 |    |    |-- amount: double (nullable = true)

Here is an example table with data:

 id | lists
 -----------
 1  | [[a, 1.0], [b, 2.0]]
 2  | [[c, 3.0]]

How do I transform the above dataframe to the one below? I need to "explode" the array and append the id at the same time.

 id | col1  | col2
 -----------------
 1  | a     | 1.0
 1  | b     | 2.0
 2  | c     | 3.0
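
For reference, here is a minimal sketch that reproduces a dataframe with this shape in spark-shell (Item is an illustrative case class name; spark is the SparkSession; nullability flags may differ slightly when built from Scala primitives):

import spark.implicits._

// Illustrative case class for the struct elements (text, amount)
case class Item(text: String, amount: Double)

val df = Seq(
  (1, Seq(Item("a", 1.0), Item("b", 2.0))),
  (2, Seq(Item("c", 3.0)))
).toDF("id", "lists")

df.printSchema()
df.show(truncate = false)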

Edit:

Note that there is a difference between the two schemas below. The first contains an "array of structs", while the latter contains just an "array of elements".

 root
 |-- id: integer (nullable = true)
 |-- lists: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- text: string (nullable = true)
 |    |    |-- amount: double (nullable = true)


root
 |-- a: long (nullable = true)
 |-- b: array (nullable = true)
 |    |-- element: long (containsNull = true)
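
For the simpler schema, a single explode already yields the final column; the array-of-structs case (addressed in the answer below) needs an extra step to pull out the struct fields. A minimal sketch of the simple case, again assuming spark is the SparkSession:

import org.apache.spark.sql.functions._
import spark.implicits._

// Plain array of longs: explode alone turns each element into its own row
val simple = Seq((1L, Seq(10L, 20L)), (2L, Seq(30L))).toDF("a", "b")
simple.select($"a", explode($"b").as("b")).show()
// +---+---+
// |  a|  b|
// +---+---+
// |  1| 10|
// |  1| 20|
// |  2| 30|
// +---+---+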
Comments:

  • Possible duplicate of Flattening Rows in Spark
  • That question has a simpler dataframe where the second column is just an array. Mine differs because my second column is an "array of structs".

1 Answer

explode is exactly the function:

import org.apache.spark.sql.functions._
import spark.implicits._  // for the $"..." column syntax (spark is the SparkSession)

// explode creates one row per array element, exposing the struct as a column named "col";
// the struct fields are then selected alongside the id
df.select($"id", explode($"lists")).select($"id", $"col.text", $"col.amount")
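
If the output columns should be named col1 and col2 as in the question, aliases can be added. A sketch using the sample dataframe from the question:

import org.apache.spark.sql.functions._
import spark.implicits._

val result = df
  .select($"id", explode($"lists").as("elem"))
  .select($"id", $"elem.text".as("col1"), $"elem.amount".as("col2"))

result.show()
// +---+----+----+
// | id|col1|col2|
// +---+----+----+
// |  1|   a| 1.0|
// |  1|   b| 2.0|
// |  2|   c| 3.0|
// +---+----+----+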

1 Comment

I simply applied what was included in Steve's solution and it works just as expected. It separates the list into individual columns. Thanks!
