I'd like to explode an array of structs to columns (as defined by the struct fields). E.g.

root
 |-- arr: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- id: long (nullable = false)
 |    |    |-- name: string (nullable = true)

Should be transformed to

root
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)

I can achieve this with

df
  .select(explode($"arr").as("tmp"))
  .select($"tmp.*")

How can I do that in a single select statement?

I thought this could work, unfortunately it does not:

df.select(explode($"arr")(".*"))

Exception in thread "main" org.apache.spark.sql.AnalysisException: No such struct field .* in col;

1 Answer

A single-step solution is available only for MapType columns:

val df = Seq(Tuple1(Map((1L, "bar"), (2L, "foo")))).toDF

df.select(explode($"_1") as Seq("foo", "bar")).show

+---+---+
|foo|bar|
+---+---+
|  1|bar|
|  2|foo|
+---+---+

With arrays you can use flatMap:

val df = Seq(Tuple1(Array((1L, "bar"), (2L, "foo")))).toDF
df.as[Seq[(Long, String)]].flatMap(identity)
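
The flatMap result is a Dataset[(Long, String)] with the default column names _1 and _2; a final toDF can restore the names from the question's schema. A minimal sketch, assuming a local SparkSession (the session setup is not part of the original answer):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(Tuple1(Array((1L, "bar"), (2L, "foo")))).toDF

// flatMap(identity) flattens each array into its struct elements;
// toDF renames the resulting _1/_2 columns to id/name
val flat = df.as[Seq[(Long, String)]].flatMap(identity).toDF("id", "name")
flat.printSchema()  // root |-- id: long, |-- name: string
```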

A single SELECT statement can be written in SQL:

df.createOrReplaceTempView("df")

spark.sql("SELECT x._1, x._2 FROM df LATERAL VIEW explode(_1) t AS x")
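
For the array-of-structs case in the question, Spark SQL also has the inline generator, which explodes an array of structs into one column per struct field in a single select. Depending on your Spark version it may only be reachable via SQL or expr (a typed functions.inline wrapper was added later); a sketch assuming the df/arr schema from the question:

```scala
import org.apache.spark.sql.functions.expr

// inline(arr) emits one row per array element, with columns id and name
df.select(expr("inline(arr)"))

// or equivalently in SQL:
// spark.sql("SELECT inline(arr) FROM df")
```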

1 Comment

The first solution with Map doesn't match the OP's schema, and the second solution is essentially the same as the two selects the OP already has implemented. Isn't that so?
