I have a table as below:
ID String
1 a,b,c
2 b,c,a
3 c,a,b
I want to sort the values inside String so that each row reads a,b,c. Then I can group the IDs by String, and IDs 1, 2 and 3 will end up in the same group.
Is there any way to sort the multiple values inside one string, like below?
ID String String2
1 a,b,c a,b,c
2 b,c,a a,b,c
3 c,a,b a,b,c
I tried the following, but it raises an error. Where did it go wrong?
df2 = df.withColumn('String2', ','.join(sorted(df.String.split(','))))
Thanks to everyone who contributed to this post. The correct code is posted below:
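import pyspark.sql.functions as F

# Wrap Python's built-in sorted() as a UDF that returns an array of strings
array_sort_udf = F.udf(sorted, 'array<string>')

# Split on commas, sort the tokens per row, then join them back into one string
df2 = df.withColumn("String2", F.concat_ws(",", array_sort_udf(F.split("String", ","))))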
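For anyone wondering why my first attempt failed: as far as I understand it, df.String is a Spark Column rather than a Python str, so split, sorted and join cannot be applied to it directly. They have to run once per row, which is what the UDF does. Below is a minimal plain-Python sketch of that per-row logic, with no Spark involved. The names sort_tokens, rows and groups are made up for illustration and are not part of any PySpark API.

```python
from collections import defaultdict


def sort_tokens(value, sep=","):
    """Sort the separator-delimited tokens of one string, e.g. 'b,c,a' -> 'a,b,c'."""
    return sep.join(sorted(value.split(sep)))


# Sample rows copied from the table in the question: (ID, String).
rows = [(1, "a,b,c"), (2, "b,c,a"), (3, "c,a,b")]

# Group the IDs by the normalized string, mirroring a groupBy on String2.
groups = defaultdict(list)
for row_id, value in rows:
    groups[sort_tokens(value)].append(row_id)

print(dict(groups))  # {'a,b,c': [1, 2, 3]}
```

All three IDs land under the single key a,b,c, which is the grouping I was after.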