
I am a beginner in Spark and I am looking for a solution to my issue. I'm trying to sort a dataframe's columns according to the number of null values each column contains, in ascending order.

For example: data:

Column1    Column2     Column3
a          d           h
b          null        null
null       e           i
null       f           h
null       null        k
c          g           l
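
For reference, the example data can be recreated like this (a minimal sketch; the Spark session and column names are assumptions):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# None values become nulls in the resulting Spark dataframe
data = spark.createDataFrame(
    [("a", "d", "h"),
     ("b", None, None),
     (None, "e", "i"),
     (None, "f", "h"),
     (None, None, "k"),
     ("c", "g", "l")],
    ["Column1", "Column2", "Column3"],
)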

After sorting, the dataframe should be:

Column3     Column2     Column1

All I could do was count each column's null values:

from pyspark.sql.functions import col, count, when

data.select([count(when(col(c).isNull(), c)).alias(c) for c in data.columns])
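
With the example data above, showing the result of this select gives a single row of null counts per column, something like:

+-------+-------+-------+
|Column1|Column2|Column3|
+-------+-------+-------+
|      3|      2|      1|
+-------+-------+-------+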

Now I have no idea how to continue. I hope you can help me.


1 Answer


Here is my solution; it works as you want:

from pyspark.sql.functions import col, count, when

# Based on your code: count the nulls per column (kept in a separate
# variable so the original dataframe is not overwritten)
null_counts_df = df.select([count(when(col(c).isNull(), c)).alias(c) for c in df.columns])

# Convert the single-row result to a dictionary (Python 3.x)
null_counts = null_counts_df.collect()[0].asDict()

# Create a dictionary sorted by its values (the null counts), ascending
sorted_dict = {k: v for k, v in sorted(null_counts.items(), key=lambda item: item[1])}

# The keys now give the column names in the desired order
sorted_cols = list(sorted_dict.keys())

# With the .select() method we re-order the original dataframe
df.select(sorted_cols).show()
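
As a side note, the collect-and-sort step can be written more compactly by sorting the column names directly (same logic, just a sketch):

from pyspark.sql.functions import col, count, when

# Null counts per column, pulled from the single result row as a dict
null_counts = df.select(
    [count(when(col(c).isNull(), c)).alias(c) for c in df.columns]
).collect()[0].asDict()

# Sort the column names by their null count, ascending, and re-order
df.select(sorted(df.columns, key=lambda c: null_counts[c])).show()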

Comments

Thanks very much for replying. However, it showed an error at the dictionary line: 'Unsupported class file major version 55'. I'll try to fix it. Thank you so much
@Mus Are you using Python 2.x? My implementation is for Python 3.x.
For Python 2.x, take a look at this post: stackoverflow.com/questions/9001509/…
If my answer is OK for you, you can accept it if you want :-)
Yes, I'm using Python 2.7. I tried Python 3 and your answer works 100%. Thanks again
