1

I have the following data as per below:

enter image description here

I would like to expand this data to get the following data:

enter image description here

I have tried the following code:

df = (df
  .selectExpr("id", 
              "split(col1, ',') col1", 
              "split(col2, ',') col2",
              "split(col3, ',') col3")
  .withColumn("arr", F.expr("arrays_zip(col1, col2, col3)"))
  .selectExpr("id", "inline(arr)"))
  df.show()

This code result in 0 rows. So I have tried to use the command 'inline_outer':

    df = (df
  .selectExpr("id", 
              "split(col1, ',') col1", 
              "split(col2, ',') col2",
              "split(col3, ',') col3")
  .withColumn("arr", F.expr("arrays_zip(col1, col2, col3)"))
  .selectExpr("id", "inline_outer(arr)"))
  df.show()

This code brings back all the rows but without any value.

I am at this stage puzzled as to why this happens as if I expose the original column (e.g. col1 without splitting, etc.) the values are there.

Any suggestions are welcome.

1 Answer 1

2

using arrays_zip with null value returns null. For your case, we need empty array instead of null.

cnd = "arrays_zip(coalesce(col1,array()),coalesce(col2,array()),coalesce(col3,array()))"

df = df.selectExpr("id","split(col1, ',') col1","split(col2, ',') col2","split(col3, ',') col3").\
withColumn("arr", F.expr(cnd)).selectExpr("id", "inline_outer(arr)")

df.show()

+---+----+----+----+
|id |0   |1   |2   |
+---+----+----+----+
|x  |1   |a   |c   |
|x  |2   |b   |c   |
|x  |3   |c   |c   |
|y  |null|null|null|
|z  |12  |null|c   |
|z  |3   |null|null|
+---+----+----+----+
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.