2

I have a dataframe like this

data = [(("ID1", {'A': 1, 'B': 2}))]
df = spark.createDataFrame(data, ["ID", "Coll"])
df.show()

+---+----------------+
| ID|            Coll|
+---+----------------+
|ID1|[A -> 1, B -> 2]|
+---+----------------+

df.printSchema()
root
 |-- ID: string (nullable = true)
 |-- Coll: map (nullable = true)
 |    |-- key: string
 |    |-- value: long (valueContainsNull = true)

I want to explode the 'Coll' column such that

+---+-----------+
| ID| Key| Value|
+---+-----------+
|ID1|   A|     1|
|ID1|   B|     2| 
+---+-----------+

I am trying to do this in pyspark

I am successful if I use only one column, however I want the ID column as well

df.select(explode("Coll").alias("x", "y")).show()

+---+---+
|  x|  y|
+---+---+
|  A|  1|
|  B|  2|
+---+---+

1 Answer 1

6

Simply add the ID column to the select and it should work:

df.select("id", explode("Coll").alias("x", "y"))
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.