Imagine I have the following column names for a pyspark dataframe:

[screenshot of the DataFrame's column names]

Naturally pyspark is ordering them by 0, 1, 2, etc. However, I wanted the following: 0_0; 0_1; 1_0; 1_1; 2_0; 2_1 OR INSTEAD 0_0; 1_0; 2_0; 3_0; 4_0; (...); 0_1; 1_1; 2_1; 3_1; 4_1 (both solutions would be fine by me).

Can anyone help me with this?

1 Answer

You can sort the column names numerically by the numbers before and after the underscore:

df2 = df.select(
    'id',  # keep the id column first
    *sorted(
        df.columns[1:],
        # split each name once and compare the two parts as integers:
        # first the number before the underscore, then the one after it
        key=lambda c: tuple(int(part) for part in c.split('_'))
    )
)

To get the other desired output, reverse the sort key so that the number after the underscore is compared first.
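Since the screenshot is unavailable, here is a minimal sketch of both orderings using plain Python's `sorted` on a hypothetical column list (the `cols` names are assumptions; no Spark session is needed to see the effect):

```python
# Hypothetical column names as pyspark would list them (lexicographic order)
cols = ['id', '0_0', '0_1', '1_0', '1_1', '2_0', '2_1']

def before_then_after(c):
    # compare the number before the underscore first, then the one after
    a, b = c.split('_')
    return (int(a), int(b))

def after_then_before(c):
    # compare the number after the underscore first, then the one before
    a, b = c.split('_')
    return (int(b), int(a))

print(['id'] + sorted(cols[1:], key=before_then_after))
# → ['id', '0_0', '0_1', '1_0', '1_1', '2_0', '2_1']
print(['id'] + sorted(cols[1:], key=after_then_before))
# → ['id', '0_0', '1_0', '2_0', '0_1', '1_1', '2_1']
```

Note that a plain string sort would misplace multi-digit prefixes (e.g. `10_0` before `2_0`), which is why the parts are converted to `int` before comparing.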

1 Comment

How do I deal with the first column (id)? It is supposed to remain at the beginning.
