I have a dataframe df like the one below:
df=
+---+---+----+---+---+
| a| b| c| d| e|
+---+---+----+---+---+
| 1| a|foo1| 4| 5|
| 2| b| bar| 4| 6|
| 3| c| mnc| 4| 7|
| 4| c| mnc| 4| 7|
+---+---+----+---+---+
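For reproducibility, this is how the sample dataframe can be built (the column types are just my reading of the sample values):

from pyspark.sql import SparkSession

sparkSession = SparkSession.builder.getOrCreate()

# Rebuild the sample dataframe shown above
df = sparkSession.createDataFrame(
    [(1, "a", "foo1", 4, 5),
     (2, "b", "bar", 4, 6),
     (3, "c", "mnc", 4, 7),
     (4, "c", "mnc", 4, 7)],
    ["a", "b", "c", "d", "e"],
)
df.show()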
I want to achieve something like df1=
+---+---+-----------------------------------------------+
| a| b| c |
+---+---+-----------------------------------------------+
| 1| a|{'a': 1, 'b': 'a', 'c': 'foo1', 'd': 4, 'e': 5}|
| 2| b|{'a': 2, 'b': 'b', 'c': 'bar', 'd': 4, 'e': 6} |
| 3| c|{'a': 3, 'b': 'c', 'c': 'mnc', 'd': 4, 'e': 7} |
| 4| c|{'a': 4, 'b': 'c', 'c': 'mnc', 'd': 4, 'e': 7} |
+---+---+-----------------------------------------------+
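To pin down what I am after: every output row keeps a and b as-is, and c holds the whole original row as a dict. I am flexible on whether c ends up as a proper map column or a JSON string. Written out in plain Python, the first record would be:

# First record of df1, spelled out in plain Python
# (a map column or a JSON string for c would both be fine with me)
expected_first_record = (1, "a", {"a": 1, "b": "a", "c": "foo1", "d": 4, "e": 5})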
I really wanted to avoid a group by, so I thought I would first convert the dataframe to an rdd and then convert it back into a dataframe.
The piece of code I wrote was:
df2 = df.rdd.flatMap(lambda x: (x.a, x.b, x.asDict()))
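To sanity-check what the rdd holds, I print the first few elements on the driver:

# Peek at the first few elements of df2 locally
for rec in df2.take(6):
    print(rec)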
Both this and a foreach over df2 show the data is still just an rdd, so I tried to create a dataframe out of it:
df3 = df2.toDF()                         # 1st way
df3 = sparkSession.createDataFrame(df2)  # 2nd way
But I am getting an error both ways. Can someone explain what I am doing wrong here and how to achieve my requirement?