df = spark.sql("select key, name, subjects from table")

Input df from the select statement above:

key name    subjects
12  x,y,z   1,2,3
20  a,b     8,7

Desired output df:

12  x 1
12  y 2
12  z 3
20  a 8
20  b 7

I tried converting the columns to lists and using explode, but it still throws an error. What is an efficient way to achieve this?

2 Answers

One way using pandas.DataFrame.apply:

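import pandas as pd

# Assumes df is a pandas DataFrame, not a Spark one.
# If the columns are not already split into lists:
# df["name"] = df["name"].str.split(",")
# df["subjects"] = df["subjects"].str.split(",")

# Explode every list column at once.
new_df = df.apply(pd.Series.explode)
print(new_df)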

Output:

   key name subjects
0   12    x        1
0   12    y        2
0   12    z        3
1   20    a        8
1   20    b        7
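
On pandas 1.3 or later, the same result should also be reachable without apply, because DataFrame.explode accepts a list of columns there. A minimal sketch, assuming the data is already a pandas DataFrame (for a Spark DataFrame that would mean calling toPandas() first) and that name and subjects hold the same number of comma-separated items in every row. The sample frame is rebuilt by hand from the question:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "key": [12, 20],
    "name": ["x,y,z", "a,b"],
    "subjects": ["1,2,3", "8,7"],
})

# Turn the comma-separated strings into lists.
df["name"] = df["name"].str.split(",")
df["subjects"] = df["subjects"].str.split(",")

# Multi-column explode (pandas >= 1.3).
# ignore_index=True numbers the resulting rows 0..n-1.
new_df = df.explode(["name", "subjects"], ignore_index=True)
print(new_df)
```

Since ignore_index=True renumbers the rows, the index of new_df is unique, unlike the repeated 0 and 1 labels in the output above.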
Comments

Thanks, Chris. The columns are getting exploded, but I am still facing the error "Cannot reindex from a duplicate axis". Concat with ignore_index is not working either. Is it possible to generate temporary unique indexes, since key is duplicated during explode? pandas version: 1.0.5

df["name"] = df["name"].str.split(",")
df["subjects"] = df["subjects"].str.split(",")
new_df = df.apply(pd.Series.explode).reindex()
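
Exploded rows keep the index label of the row they came from, so the index ends up duplicated, and calling reindex() with no arguments does not renumber it. One possible workaround, which should also be usable on pandas 1.0.5, is to move key into the index before exploding so that every exploded column shares the same index, then call reset_index() to get key back as a column together with a fresh unique index. A sketch using the question's sample data, not a confirmed fix for the exact error above:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "key": [12, 20],
    "name": ["x,y,z", "a,b"],
    "subjects": ["1,2,3", "8,7"],
})

df["name"] = df["name"].str.split(",")
df["subjects"] = df["subjects"].str.split(",")

# With key as the index, both list columns explode to the same
# (duplicated) index, so apply can combine them without realigning.
exploded = df.set_index("key").apply(pd.Series.explode)

# reset_index() turns key back into a column and numbers the rows 0..n-1.
new_df = exploded.reset_index()
print(new_df)
```

This relies on name and subjects having the same number of items in each row; if the lengths differ, the exploded columns no longer line up.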
