Imagine I have the following column names for a pyspark dataframe:

[screenshot of the DataFrame's column names]

Naturally pyspark is ordering them by 0, 1, 2, etc. However, I wanted the following: 0_0; 0_1; 1_0; 1_1; 2_0; 2_1 OR INSTEAD 0_0; 1_0; 2_0; 3_0; 4_0; (...); 0_1; 1_1; 2_1; 3_1; 4_1 (both solutions would be fine by me).

Can anyone help me with this?

1 Answer

You can sort the column names numerically by the numbers before and after the underscore:

df2 = df.select(
    'id',  # keep the id column first
    *sorted(
        df.columns[1:],
        # split each name once and compare the two parts as integers:
        # first the number before the underscore, then the one after it
        key=lambda c: tuple(int(part) for part in c.split('_'))
    )
)

To get the other desired output, reverse the sort key so that the number after the underscore is compared first.
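Since the screenshot is unavailable, here is a minimal sketch of both orderings using plain Python's `sorted` on a hypothetical column list (the `cols` names are assumptions; no Spark session is needed to see the effect):

```python
# Hypothetical column names as pyspark would list them (lexicographic order)
cols = ['id', '0_0', '0_1', '1_0', '1_1', '2_0', '2_1']

def before_then_after(c):
    # compare the number before the underscore first, then the one after
    a, b = c.split('_')
    return (int(a), int(b))

def after_then_before(c):
    # compare the number after the underscore first, then the one before
    a, b = c.split('_')
    return (int(b), int(a))

print(['id'] + sorted(cols[1:], key=before_then_after))
# → ['id', '0_0', '0_1', '1_0', '1_1', '2_0', '2_1']
print(['id'] + sorted(cols[1:], key=after_then_before))
# → ['id', '0_0', '1_0', '2_0', '0_1', '1_1', '2_1']
```

Note that a plain string sort would misplace multi-digit prefixes (e.g. `10_0` before `2_0`), which is why the parts are converted to `int` before comparing.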

1 Comment

How do I deal with the first column (id)? It is supposed to remain at the beginning.
