1

I want to sort rows using two columns. Given the two columns x and y, I expect output like the following:

Expected output

      x     y
0   2.0   NaN
1   3.0   NaN
2   4.0   4.1
3   NaN   5.0
4  10.0   NaN
5  24.0  24.7
6  31.0  31.4

However, running the code below

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'x': [2, 3, 4, 24, 31, '', 10],
                    'y': ['', '', 4.1, 24.7, 31.4, 5, '']})
# Turn empty or whitespace-only strings into NaN
df1.replace(r'^\s*$', np.nan, regex=True, inplace=True)
rslt_df = df1.sort_values(by=['x', 'y'], ascending=(True, True))

print(rslt_df)

produces the following output:

      x     y
0   2.0   NaN
1   3.0   NaN
2   4.0   4.1
6  10.0   NaN
3  24.0  24.7
4  31.0  31.4
5   NaN   5.0

Notice that the row with 5.0 in column y ends up at the bottom instead of between 4.0 and 10.0.

How should I modify the code to obtain the intended output?

1
  • It's sorting by x first (NaN goes to the bottom), then by y. Commented Jun 17, 2021 at 16:43
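
To illustrate the comment above, here is a minimal sketch of where `sort_values` puts a missing x. The frame is built with `np.nan` directly rather than through the `replace` call in the question, which is an assumption made only to keep the example self-contained. Neither setting of `na_position` gives the desired order, because the row with the missing x is never compared against its y value.

```python
import numpy as np
import pandas as pd

# Same data as the question, with the blanks written as NaN up front
df = pd.DataFrame({
    "x": [2, 3, 4, 24, 31, np.nan, 10],
    "y": [np.nan, np.nan, 4.1, 24.7, 31.4, 5, np.nan],
})

# Default: rows whose x is NaN are placed last
nan_last = df.sort_values(by=["x", "y"])

# na_position="first" only moves those rows to the top instead
nan_first = df.sort_values(by=["x", "y"], na_position="first")

print(nan_last.index.tolist())   # original row labels in sorted order
print(nan_first.index.tolist())
```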

2 Answers

3

Try sorting by x with its NaN values filled in from y, then reindex the frame using the index of those sorted values:

df1.reindex(df1['x'].fillna(df1['y']).sort_values().index).reset_index(drop=True)

To update the df1 variable:

df1 = (
    df1.reindex(df1['x'].fillna(df1['y']).sort_values().index)
        .reset_index(drop=True)
)

df1:

      x     y
0   2.0   NaN
1   3.0   NaN
2   4.0   4.1
3   NaN   5.0
4  10.0   NaN
5  24.0  24.7
6  31.0  31.4
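
The one-liner above can be read as three steps. The sketch below spells them out. The names `key`, `order`, and `result` are introduced here for illustration only, and the frame is built with `np.nan` directly so the example does not depend on the `replace` step from the question.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "x": [2, 3, 4, 24, 31, np.nan, 10],
    "y": [np.nan, np.nan, 4.1, 24.7, 31.4, 5, np.nan],
})

# Step 1: build a single sort key, using y wherever x is missing
key = df1["x"].fillna(df1["y"])

# Step 2: sort the key and keep the resulting order of row labels
order = key.sort_values().index

# Step 3: rearrange the original frame in that order and renumber the rows
result = df1.reindex(order).reset_index(drop=True)

print(result)
```

The original columns are left untouched. Only the temporary key has its gaps filled, so the NaN in x is still present in the sorted frame.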

2

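With np.sort and argsort (np.sort places NaN last within each row, so the first column of the row-sorted array holds each row's smallest non-NaN value):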

df1.iloc[np.sort(df1[['x','y']],axis=1)[:,0].argsort()]

      x     y
0   2.0   NaN
1   3.0   NaN
2   4.0   4.1
5   NaN   5.0
6  10.0   NaN
3  24.0  24.7
4  31.0  31.4
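
As a rough breakdown of the expression above, the sketch below separates it into steps. The intermediate names are hypothetical, and the frame is built with `np.nan` directly as an assumption to keep it self-contained. Note that this key is the row-wise minimum of x and y, which matches "x, falling back to y" for this data only because x is never larger than y when both are present.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "x": [2, 3, 4, 24, 31, np.nan, 10],
    "y": [np.nan, np.nan, 4.1, 24.7, 31.4, 5, np.nan],
})

# Sort each row independently; np.sort moves NaN to the end of the row
row_sorted = np.sort(df1[["x", "y"]].to_numpy(dtype=float), axis=1)

# The first column now holds the smallest non-NaN value of each row
key = row_sorted[:, 0]

# argsort gives the row positions that would sort the key
order = key.argsort()

# Select rows by position; the original index labels are kept
result = df1.iloc[order]

print(result)
```

Appending `.reset_index(drop=True)` would renumber the rows from 0 if the original labels are not needed.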

2 Comments

This does exactly what the OP intended, with the advantage of being more compact.
@HenryEcker, I think you should keep your post. I learned something from it.
