
I have a dataframe as shown below:

    Category    1   2   3   4   5   6   7   8   9   10  11  12  13
    A   424 377 161 133 2   81  141 169 297 153 53  50  197
    B   231 121 111 106 4   79  68  70  92  93  71  65  66
    C   480 379 159 139 2   116 148 175 308 150 98  82  195
    D   88  56  38  40  0   25  24  55  84  36  24  26  36
    E   1084    1002    478 299 7   256 342 342 695 378 175 132 465
    F   497 246 283 206 4   142 151 168 297 224 194 198 148
    H   8   5   4   3   0   2   3   2   7   5   3   2   0
    G   3191    2119    1656    856 50  826 955 739 1447    1342    975 628 1277
    K   58  26  27  51  1   18  22  42  47  35  19  20  14
    S   363 254 131 105 6   82  86  121 196 98  81  57  125
    T   54  59  20  4   0   9   12  7   36  23  5   4   20
    O   554 304 207 155 3   130 260 183 287 204 98  106 195
    P   756 497 325 230 5   212 300 280 448 270 201 140 313
    PP  64  43  26  17  1   15  35  17  32  28  18  9   27
    R   265 157 109 89  1   68  68  104 154 96  63  55  90
    S   377 204 201 114 5   112 267 136 209 172 147 90  157
    St  770 443 405 234 5   172 464 232 367 270 290 136 294
    Qs  47  33  11  14  0   18  14  19  26  17  5   6   13
    Y   1806    626 1102    1177    14  625 619 1079    1273    981 845 891 455
    W   123 177 27  28  0   18  62  34  64  27  14  4   51
    Z   2770    1375    1579    1082    17  900 1630    1137    1465    1383    861 755 1201

I want to sort the values within each row independently. Once that is done, I also want to sort the index.

For example, the values in the first row, for category A, should appear as: 2 50 53 81 133 141 153 161 169 197 297 377 424

I tried df.sort_values(by=df.index.tolist(), ascending=False, axis=1), but it doesn't work: the values don't come out in sorted order at all.

  • Edited the question. Commented Nov 21, 2018 at 16:57
  • Possibly you could use df.sort_values(['c1','c2'], ascending=False), but in your case there are many columns, which makes it a little tricky. Commented Nov 21, 2018 at 17:18

2 Answers


np.sort + sort_index

You can use np.sort along axis=1, then sort_index:

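import numpy as np
import pandas as pd

cols, idx = df.columns[1:], df.iloc[:, 0]

# Sort the values within each row, then sort the rows by Category
res = pd.DataFrame(np.sort(df.iloc[:, 1:].values, axis=1), columns=cols, index=idx)\
        .sort_index()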

print(res)

           1    2    3    4    5     6     7     8     9    10    11    12  \
Category                                                                     
A          2   50   53   81  133   141   153   161   169   197   297   377   
B          4   65   66   68   70    71    79    92    93   106   111   121   
C          2   82   98  116  139   148   150   159   175   195   308   379   
D          0   24   24   25   26    36    36    38    40    55    56    84   
E          7  132  175  256  299   342   342   378   465   478   695  1002   
F          4  142  148  151  168   194   198   206   224   246   283   297   
G         50  628  739  826  856   955   975  1277  1342  1447  1656  2119   
H          0    0    2    2    2     3     3     3     4     5     5     7   
K          1   14   18   19   20    22    26    27    35    42    47    51   
O          3   98  106  130  155   183   195   204   207   260   287   304   
P          5  140  201  212  230   270   280   300   313   325   448   497   
PP         1    9   15   17   17    18    26    27    28    32    35    43   
Qs         0    5    6   11   13    14    14    17    18    19    26    33   
R          1   55   63   68   68    89    90    96   104   109   154   157   
S          6   57   81   82   86    98   105   121   125   131   196   254   
S          5   90  112  114  136   147   157   172   201   204   209   267   
St         5  136  172  232  234   270   290   294   367   405   443   464   
T          0    4    4    5    7     9    12    20    20    23    36    54   
W          0    4   14   18   27    27    28    34    51    62    64   123   
Y         14  455  619  625  626   845   891   981  1079  1102  1177  1273   
Z         17  755  861  900 1082  1137  1201  1375  1383  1465  1579  1630   
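
For a quick self-contained check of the approach above, here is a minimal sketch on a three-row, four-column subset of the question's data. The small frame and the names `values` and `res` are only for illustration, and it assumes `Category` is an ordinary first column, as in the answer.

```python
import numpy as np
import pandas as pd

# Illustrative subset of the question's data (first four value columns only).
df = pd.DataFrame({
    "Category": ["B", "A", "D"],
    "1": [231, 424, 88],
    "2": [121, 377, 56],
    "3": [111, 161, 38],
    "4": [106, 133, 40],
})

# Sort each row's values independently; np.sort returns a sorted copy.
values = np.sort(df.iloc[:, 1:].to_numpy(), axis=1)

# Rebuild the frame with the original column labels, then order rows by Category.
res = pd.DataFrame(values, columns=df.columns[1:], index=df["Category"]).sort_index()
print(res)
```

Note that after sorting, the column labels only mark position within each row (smallest to largest); they no longer identify the original columns.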

5 Comments

sort_values doesn't do this because it is designed to move whole rows up and down (or, with axis=1, whole columns left and right), not to sort the values within each row independently. This is expected with pandas, where data is stored column-wise as Series.
@mlRocks, Yup, probably. The docs are open source, you can make a commit on github if you wish.
The values are sorted but I am getting the column names in wrong order
@mlRocks, What do you expect for column names? The order will be different for each row, right?
Yep, my bad! Thanks.
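
To illustrate the point made in the comments about why `sort_values` with `axis=1` does not help here, the following is a small sketch on made-up numbers (the two-row frame and the name `by_row_a` are hypothetical). It shows that the call reorders whole columns using one row as the key, so the other rows are not sorted.

```python
import pandas as pd

# Hypothetical two-row frame with Category labels as the index.
df = pd.DataFrame(
    {"1": [424, 231], "2": [377, 121], "3": [2, 400]},
    index=["A", "B"],
)

# With axis=1, whole columns are reordered using the values in row "A" as the key.
by_row_a = df.sort_values(by="A", axis=1)
print(by_row_a)

# Row "A" is now ascending, but row "B" simply followed the column moves.
print(by_row_a.loc["B"].tolist())
```

Because every column moves as a unit, at most the key row ends up ordered, which is why a per-row sort such as `np.sort(..., axis=1)` is needed instead.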

One way is to apply sorted with axis=1, then apply pd.Series to get a dataframe instead of a Series of lists, and finally set Category as the index and sort by it:

# Parentheses let the method chain continue onto the next line
(df.loc[:, '1':].apply(sorted, axis=1).apply(pd.Series)
   .set_index(df.Category).sort_index())



       Category   0    1    2    3     4     5     6     7     8     9    10  ...
0         A   2   50   53   81   133   141   153   161   169   197   297   ...
1         B   4   65   66   68    70    71    79    92    93   106   111  ...
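
As a self-contained variant of the idea above, the sketch below swaps the second `apply(pd.Series)` step for `result_type="expand"`, a documented option of `DataFrame.apply` that spreads each returned list across columns. This is a substitution for illustration, not the answer's exact code; the small frame and the names `sorted_rows` and `res` are made up, and the column labels are assumed to be strings as in the answer.

```python
import pandas as pd

# Illustrative subset of the question's data, deliberately out of Category order.
df = pd.DataFrame({
    "Category": ["B", "A"],
    "1": [231, 424],
    "2": [121, 377],
    "3": [111, 161],
})

# Sort each row with the built-in sorted; result_type="expand" turns each
# returned list into columns, which are relabelled 0, 1, 2, ...
sorted_rows = df.loc[:, "1":].apply(sorted, axis=1, result_type="expand")

# Attach Category as the index and order the rows by it.
res = sorted_rows.set_index(df["Category"]).sort_index()
print(res)
```

Either form still calls a Python function once per row, so the performance caveat for large dataframes applies to both.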

2 Comments

This works, but apply(pd.Series) is a Python-level loop and will be slow for larger dataframes.
Yes, it is not ideal for big dataframes, as you say. First solution that came across my mind :)
