I have a schema:

root (original)
 |-- entries: array (nullable = true)
 |    |-- element: struct (containsNull = false)
 |    |    |-- col1: string (nullable = false)
 |    |    |-- col2: string (nullable = true) 

How can I flatten it?

root (derived)
 |-- col1: string (nullable = false)
 |-- col2: string (nullable = true)
 |-- col3: string (nullable = false)
 |-- col4: string (nullable = true)
 |-- ...

where each derived column is named after a col1 value from the original, and its value is the matching col2 value from the original.

Example:

+---------------------------------------------+
|entries                                      |
+---------------------------------------------+
|[[a1, 1], [a2, P], [a4, N]]                  |
|[[a1, 1], [a2, O], [a3, F], [a4, 1], [a5, 1]]|
+---------------------------------------------+

I want to produce the following dataset:

+----+----+------+----+------+
| a1 | a2 | a3   | a4 | a5   |
+----+----+------+----+------+
| 1  | P  | null | N  | null |
| 1  | O  | F    | 1  | 1    |
+----+----+------+----+------+

Comments

  • You can use explode: df.withColumn("entries", explode("entries")).select("entries.*").show (Mar 11, 2020 at 7:37)
  • explode alone will only create 2 columns (col1, col2). (Mar 11, 2020 at 7:49)

1 Answer

You can do this with a combination of explode and pivot. Because explode spreads each array over several rows, you first need a row_id to group the entries of one original row back together:

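import org.apache.spark.sql.functions.{explode, first, monotonically_increasing_id}
import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell

val df = Seq(
  Seq(("a1", "1"), ("a2", "P"), ("a4", "N")),
  Seq(("a1", "1"), ("a2", "O"), ("a3", "F"), ("a4", "1"), ("a5", "1"))
).toDF("arr")
  .select($"arr".cast("array<struct<col1:string,col2:string>>"))

df
  // tag each input row so its entries can be regrouped after explode
  .withColumn("row_id", monotonically_increasing_id())
  // one output row per array element; explode names the new column "col"
  .select($"row_id", explode($"arr"))
  .select($"row_id", $"col.*")
  // distinct col1 values become columns; pivot on a Column needs Spark 2.4+
  .groupBy($"row_id").pivot($"col1").agg(first($"col2"))
  .drop($"row_id")
  .show()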

gives:

+---+---+----+---+----+
| a1| a2|  a3| a4|  a5|
+---+---+----+---+----+
|  1|  P|null|  N|null|
|  1|  O|   F|  1|   1|
+---+---+----+---+----+
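
To see what the explode and pivot steps compute, and why the row_id is needed, here is a minimal sketch that mimics the same reshaping on plain Scala collections, without Spark. The names PivotSketch and flatten are made up for illustration, and this is only an approximation of the semantics under the assumption that pivot sorts the distinct keys and first picks the first value seen per key; it is not how Spark executes the query.

```scala
object PivotSketch {
  type Entry = (String, String) // (col1, col2), as in the question's struct

  // Hypothetical helper mimicking:
  //   add row_id -> explode -> groupBy(row_id).pivot(col1).agg(first(col2))
  def flatten(rows: Seq[Seq[Entry]]): (Seq[String], Seq[Seq[Option[String]]]) = {
    // "explode": one (rowId, key, value) triple per array element.
    // Without rowId there would be no way to tell which input row a triple came from.
    val exploded: Seq[(Int, String, String)] =
      rows.zipWithIndex.flatMap { case (entries, rowId) =>
        entries.map { case (k, v) => (rowId, k, v) }
      }

    // "pivot": the distinct keys become the column names, in sorted order.
    val columns: Seq[String] = exploded.map(_._2).distinct.sorted

    // "groupBy(row_id)": regroup the triples by the row they came from.
    // Rows with an empty array produce no triples and therefore no output row.
    val byRow: Map[Int, Seq[(Int, String, String)]] = exploded.groupBy(_._1)

    // "agg(first(col2))": first value per key in each group, None where the key is absent.
    val table: Seq[Seq[Option[String]]] =
      byRow.keys.toSeq.sorted.map { rowId =>
        val cells = byRow(rowId)
        columns.map(c => cells.collectFirst { case (_, k, v) if k == c => v })
      }

    (columns, table)
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Seq(("a1", "1"), ("a2", "P"), ("a4", "N")),
      Seq(("a1", "1"), ("a2", "O"), ("a3", "F"), ("a4", "1"), ("a5", "1"))
    )
    val (columns, table) = flatten(rows)
    println(columns.mkString(" | "))
    table.foreach(r => println(r.map(_.getOrElse("null")).mkString(" | ")))
  }
}
```

A key that is missing from a row comes out as None here, which corresponds to the null cells in the Spark output above.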