2

I've been struggling to sort every column of my df. However, my code seems to sort only the first column ('Name') and rearranges the remaining columns to follow it, as shown here:

Index Name Age Education Country 
0    W    2    BS         C
1    V    1    PhD        F 
2    R    9    MA         A
3    A    8    MA         A
4    D    7    PhD        B
5    C    4    BS         C

df.sort_values(by=['Name', 'Age', 'Education', 'Country'], ascending=[True, True, True, True])

Here's what I'm hoping to get:

Index Name Age Education Country 
0     A    1    BS         A
1     C    2    BS         A 
2     D    4    MA         B
3     R    7    MA         C
4     V    8    PhD        C
5     W    9    PhD        F

Instead, I'm getting the following:

Index Name Age Education Country 
3     A    8    MA         A
5     C    4    BS         C
4     D    7    PhD        B
2     R    9    MA         A
1     V    1    PhD        F 
0     W    2    BS         C

Could you please shed some light on this issue? Many thanks in advance. Cheers, R.

  • Your desired result changes data. Pandas or any reputable data science tool will never allow such a result. Commented Feb 18, 2019 at 5:10
  • @Parfait Thanks for your answer. Actually, my current dataframe doesn't really look like the one I posted. The purpose of my post was to learn how to sort each and every column independently. My actual dataframe contains columns listing organizations of several continents. So the columns correspond to continents like "Asia", "Europe", "America", and under each of these headers there's a list of organizations. Now I am hoping to get each column sorted alphabetically. So in my case, these columns are totally independent and there will be no mismatching. Commented Feb 19, 2019 at 1:16
  • @Parfait Hi Parfait. I was actually wondering why Pandas would not allow such a thing. What if one wants to look at each column, or have them sorted, separately? Commented Feb 19, 2019 at 1:31
  • @Riccardo...With data sets, rows mean something. They serve as distinct observations with values across diverse columns. By sorting each column independently you change the values within observations. For example, in your desired result, person A now has a different Age and Education! Commented Feb 19, 2019 at 15:09

2 Answers

2

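Your code sorts the rows by Name, then Age, then Education, then Country. Each later column is only used to break ties in the earlier ones, and whole rows are moved together.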
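As a rough illustration of how a multi-key sort treats the later keys, here is a minimal sketch on made-up data (the three-row frame and the name `sorted_df` are assumptions for the example, not the OP's data). Two rows share the same Name, so Age only decides the order between those two:

```python
import pandas as pd

# Hypothetical toy data: two rows share the same Name.
df = pd.DataFrame({"Name": ["B", "A", "A"], "Age": [1, 3, 2]})

# Rows are ordered by Name first; Age is consulted only to order the two "A" rows.
# Each row moves as a unit, so a Name always keeps its own Age.
sorted_df = df.sort_values(by=["Name", "Age"])
print(sorted_df)
```

If every Name is unique, as in the question, the remaining keys never come into play, which is why the output looks as if only the first column were sorted.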

To get what you want, you can sort the columns one at a time. For example,

for col in df.columns:
    df[col]=sorted(df[col])

But are you sure that’s what you want to do? A DataFrame is designed so that each row corresponds to a single entry, e.g. a person, and the columns correspond to attributes like ‘name’ and ‘age’. So you normally don’t want to sort the names and ages separately, because each person’s name and age would get mismatched.


8 Comments

This doesn't do what OP asked.
@Tim. Thanks for your answer. Actually, my current dataframe doesn't really look like the one I posted. The purpose of my post was to learn how to sort each and every column independently. My actual dataframe contains columns listing organizations of several continents. So the columns correspond to continents like "Asia", "Europe", "America", and under each of these headers there's a list of organizations. Now I am hoping to get each column sorted alphabetically. So in my case, these columns are totally independent and there will be no mismatching.
I did try your code; however, I'm getting the error below. TypeError: '<' not supported between instances of 'float' and 'str'. When I run sorted(df['col1']) on the first column alone, it works fine, but it fails for the second column. Is that because I have a bunch of NaN in the 2nd column?
Yes, you can do sorted(df[col].astype(str)) for columns with strings.
To check if a column has strings, you can add a condition checking if df[col].dtype is object.
1

You can use np.sort along the 0th axis:

import numpy as np

df[:] = np.sort(df.values, axis=0)
df

   Index Name  Age Education Country
0      0    A    1        BS       A
1      1    C    2        BS       A
2      2    D    4        MA       B
3      3    R    7        MA       C
4      4    V    8       PhD       C
5      5    W    9       PhD       F

Of course, you should be aware that sorting columns independently breaks the alignment of values across each row, which renders the data meaningless if the rows represent observations.

11 Comments

A huge thanks to you. Could you let me know what your line of code would look like when dealing solely with string entries? :)
@Riccardo Or did you mean you only want to sort the string columns (leaving the numeric columns as-is)?
Sure I will. When I run your code, here's what I'm getting: TypeError: '<' not supported between instances of 'float' and 'str', so I assume it may not be working for strings.
@Riccardo There is a problem, your data has NaNs. Can you do something like df = df.dropna() before running this code? Or do you need to keep those rows with NaN?
Yeah, right, there are a bunch of NaNs in there; however, I don't need them. Column 1 contains 333 rows (no NaN at all in this column), column 2 has NaN from row 44 all the way down to the end of the df, i.e. row 333, and column 3 contains NaN from row 27 all the way down to the end, which is row 333. I did try the dropna method, but it discarded all rows after row 27 for all 3 columns. This is not what should be happening: it discarded all entries below row 27 for columns 1 and 2 as well.
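With its default arguments, df.dropna() removes every row that contains at least one NaN, which matches the behaviour described in the last comment. One possible way around it, sketched here on made-up data, is to drop NaN and sort within each column separately and then reassemble the columns. The column names, the sample values, and the name `result` are assumptions for illustration only:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: columns of unequal "true" length,
# padded with NaN.
df = pd.DataFrame({
    "Asia": ["C", "A", "B"],
    "Europe": ["Z", np.nan, "X"],
})

# For each column: drop its own NaN values, sort what is left, and renumber
# from 0. Building a DataFrame from Series of different lengths aligns them
# on the index, so shorter columns are padded with NaN at the bottom.
result = pd.DataFrame({
    col: df[col].dropna().sort_values().reset_index(drop=True)
    for col in df.columns
})
print(result)
```

Because NaN is removed before sorting, strings are never compared with floats, so the TypeError mentioned above should not arise for columns that otherwise contain only strings.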