Sorting columns in pandas dataframe based on column name

Question

I have a dataframe with over 200 columns. Because of how they were generated, the columns are in this order:

['Q1.3','Q6.1','Q1.2','Q1.1',......]

I need to sort the columns as follows:

['Q1.1','Q1.2','Q1.3',.....'Q6.1',......]

Is there some way for me to do this within Python?

The question has a banner at the top "This question already has answers here: How to change the order of DataFrame columns? (34 answers) Closed last year." The question that it is saying is the same is a totally different question and this banner and link should therefore be removed.
– Joey, Commented Aug 20, 2020 at 13:20
I am voting to reopen this question, I believe it has been erroneously marked as duplicate: the supplied duplicate asks how to reorder columns whereas this question asks how to sort by column name. Strictly speaking answers to the latter are a subset of the former, but users seeking an answer to the latter are unlikely to find it in the answers to the duplicate (the highest-voted answer which mentions sorting is currently 5th in vote total).
– William Miller, Commented Feb 1, 2022 at 0:59
I'm in complete agreement, the linked question is completely different. Why nobody will agree to reopen it is beyond me.
– fantabolous, Commented Jun 4, 2023 at 13:06

gcamargo · Accepted Answer · 2018-12-18 22:57:40Z

668

df = df.reindex(sorted(df.columns), axis=1)

This assumes that sorting the column names will give the order you want. If your column names won't sort lexicographically (e.g., if you want column Q10.3 to appear after Q9.1), you'll need to sort differently, but that has nothing to do with pandas.
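As a minimal sketch of the non-lexicographic case, assuming every column name has the form Q<major>.<minor>, the names can be sorted with a numeric key and then passed to reindex. The helper name question_key and the sample frame are made up for illustration:

```python
import pandas as pd


def question_key(name):
    """Turn a label like 'Q10.3' into (10, 3) so both parts compare as integers."""
    major, minor = name[1:].split(".")
    return int(major), int(minor)


# Hypothetical frame whose columns are out of order.
df = pd.DataFrame([[1, 2, 3, 4]], columns=["Q10.3", "Q9.1", "Q1.2", "Q1.10"])

# Plain sorted() compares strings, so 'Q10.3' lands before 'Q9.1'.
lexicographic = df.reindex(sorted(df.columns), axis=1)

# Sorting with the key gives numeric order instead.
natural = df.reindex(sorted(df.columns, key=question_key), axis=1)

print(list(lexicographic.columns))
print(list(natural.columns))
```

The key function is ordinary Python, which is why the sorting logic is independent of pandas: only the final reindex call touches the dataframe.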

edited Dec 18, 2018 at 22:57

gcamargo

answered Jun 16, 2012 at 21:12

BrenBarn

10 Comments

Nate Anderson Over a year ago

I like this because the same method can be used to sort rows (I needed to sort rows and columns). While it's the same method, you can omit the axis argument (or provide its default value, 0), like df.reindex_axis(sorted(non_sorted_row_index)) which is equivalent to df.reindex(sorted(non_sorted_row_index))

WhoIsJack Over a year ago

Note that re-indexing is not done in-place, so to actually apply the sort to the df you have to use df = df.reindex_axis(...). Also, note that non-lexicographical sorts are easy with this approach, since the list of column names can be sorted separately into an arbitrary order and then passed to reindex_axis. This is not possible with the alternative approach suggested by @Wes McKinney (df = df.sort_index(axis=1)), which is however cleaner for pure lexicographical sorts.

CodingMatters Over a year ago

not sure when '.reindex_axis' was deprecated, see message below. FutureWarning: '.reindex_axis' is deprecated and will be removed in a future version. Use '.reindex' instead. This is separate from the ipykernel package so we can avoid doing imports until

Logan Over a year ago

reindex_axis is deprecated and results in FutureWarning. However, .reindex works fine. For the above example, use df.reindex(columns=sorted(df.columns))

Hedge92 Over a year ago

This is a good solution, but does not work if you have duplicate column names. The answer of @Wes McKinney works in that case. Hence, I think df.sort_index(axis=1) is the most appropriate solution.

cs95 · Accepted Answer · 2019-01-21 21:02:53Z

499

You can also do more succinctly:

df.sort_index(axis=1)

Make sure you assign the result back:

df = df.sort_index(axis=1)

Or, do it in-place:

df.sort_index(axis=1, inplace=True)
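For orderings that are not purely lexicographic, sort_index also accepts a key argument. The sketch below assumes pandas 1.1 or later, where that argument is available, and assumes column names of the form Q<number>; the sample frame is made up:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=["Q10.1", "Q9.1", "Q1.2"])

# sort_index returns a new frame; df itself keeps its original column order.
by_name = df.sort_index(axis=1)

# key receives the whole column Index and must return something of equal length.
# Here the leading 'Q' is dropped and the rest is compared as a float.
by_number = df.sort_index(axis=1, key=lambda idx: idx.map(lambda s: float(s[1:])))

print(list(by_name.columns))
print(list(by_number.columns))
```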

edited Jan 21, 2019 at 21:02

cs95

answered Jul 8, 2012 at 18:56

Wes McKinney

3 Comments

GoJian Over a year ago

remember to do df = df.sort_index(axis=1), per @multigoodverse

jkr Over a year ago

or modify df in-place with df.sort_index(axis=1, inplace=True)

ExtractTable.com Over a year ago

also, sort_index is faster than reindex, in case devs worry about it

cs95 · Accepted Answer · 2019-01-27 06:19:22Z

77

You can just do:

df[sorted(df.columns)]

Edit: Shorter is

df[sorted(df)]
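The shorter form works because iterating over a DataFrame yields its column labels. A small sketch with a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"b": [1], "c": [2], "a": [3]})

# Iterating over a DataFrame yields its column labels, not its rows.
labels = list(df)

# sorted() therefore sees the same labels as sorted(df.columns).
reordered = df[sorted(df)]

print(labels)
print(list(reordered.columns))
```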

edited Jan 27, 2019 at 6:19

cs95

answered Jun 24, 2014 at 21:22

Ivelin

3 Comments

multigoodverse Over a year ago

I get "'DataFrame' object is not callable" for this. Version: pandas 0.14.

zyxue Over a year ago

@lvelin, do you know why sorted(df) works, is it documented somewhere?

Ivelin Over a year ago

@zyxue, sorted will be looking for the iterative class magic methods to figure out what to sort. Take a look at this question stackoverflow.com/questions/48868228/…

Myeongsik Joo · Accepted Answer · 2016-03-11 05:54:35Z

For a small number of columns, you can list them in the order you want:

# ['A', 'B', 'C'] <- the current column order
df = df[['C', 'B', 'A']]

This example reorders the columns and selects a subset of them at the same time:

import pandas

d = {'col1':[1, 2, 3], 'col2':[4, 5, 6], 'col3':[7, 8, 9], 'col4':[17, 18, 19]}
df = pandas.DataFrame(d)

You get:

col1  col2  col3  col4
 1     4     7    17
 2     5     8    18
 3     6     9    19

Then do:

df = df[['col3', 'col2', 'col1']]

Resulting in:

col3  col2  col1
7     4     1
8     5     2
9     6     3

Community · Accepted Answer · 2017-05-23 12:34:45Z

Tweet's answer can be passed to BrenBarn's answer above with

data.reindex_axis(sorted(data.columns, key=lambda x: float(x[1:])), axis=1)

So for your example, say:

from numpy.random import randint
from pandas import DataFrame

vals = randint(low=16, high=80, size=25).reshape(5,5)
cols = ['Q1.3', 'Q6.1', 'Q1.2', 'Q9.1', 'Q10.2']
data = DataFrame(vals, columns = cols)

You get:

data

    Q1.3    Q6.1    Q1.2    Q9.1    Q10.2
0   73      29      63      51      72
1   61      29      32      68      57
2   36      49      76      18      37
3   63      61      51      30      31
4   36      66      71      24      77

Then do:

data.reindex_axis(sorted(data.columns, key=lambda x: float(x[1:])), axis=1)

resulting in:

    Q1.2    Q1.3    Q6.1    Q9.1    Q10.2
0   63      73      29      51      72
1   32      61      29      68      57
2   76      36      49      18      37
3   51      63      61      30      31
4   71      36      66      24      77

This should be the accepted answer. Replace reindex_axis with reindex for pandas >= 1.0
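Following that note, here is a sketch that should run on current pandas. It uses reindex in place of reindex_axis and, so that the result is reproducible, the sample values from the first table above rather than random ones:

```python
import pandas as pd

cols = ["Q1.3", "Q6.1", "Q1.2", "Q9.1", "Q10.2"]
vals = [
    [73, 29, 63, 51, 72],
    [61, 29, 32, 68, 57],
    [36, 49, 76, 18, 37],
    [63, 61, 51, 30, 31],
    [36, 66, 71, 24, 77],
]
data = pd.DataFrame(vals, columns=cols)

# reindex stands in for the removed reindex_axis; assign the result to keep it.
data = data.reindex(sorted(data.columns, key=lambda x: float(x[1:])), axis=1)
print(data)
```

Note that the float key reads a label such as Q1.10 as 1.1, so it only suits names whose part after the dot is a single digit.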

approxiblue · Accepted Answer · 2015-11-05 22:08:35Z

21

If you need an arbitrary sequence instead of sorted sequence, you could do:

sequence = ['Q1.1','Q1.2','Q1.3',.....'Q6.1',......]
your_dataframe = your_dataframe.reindex(columns=sequence)

I tested this in Python 2.7.10 and it worked for me.
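One thing to keep in mind with an explicit sequence: reindex keeps exactly the labels it is given. A small sketch with a made-up frame and a hypothetical target order:

```python
import pandas as pd

df = pd.DataFrame({"Q1.2": [1], "Q1.1": [2], "Q6.1": [3]})

# Hypothetical target order: 'Q6.1' is left out and 'Q2.1' does not exist in df.
sequence = ["Q1.1", "Q1.2", "Q2.1"]
result = df.reindex(columns=sequence)

# Columns missing from the sequence are dropped; unknown labels become NaN columns.
print(result)
```

If silently dropped or all-NaN columns would be a problem, it may be worth comparing set(sequence) with set(df.columns) before reindexing.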

edited Nov 5, 2015 at 22:08

approxiblue

answered Nov 5, 2015 at 21:48

M.Z

burkesquires · Accepted Answer · 2014-12-08 15:33:31Z

16

Don't forget to add "inplace=True" to Wes' answer, or assign the result to a variable.

df.sort_index(axis=1, inplace=True)

answered Dec 8, 2014 at 15:33

burkesquires

multigoodverse · Accepted Answer · 2015-01-29 12:37:26Z

4

The quickest method is:

df.sort_index(axis=1)

Be aware that this creates a new instance. Therefore you need to store the result in a new variable:

sortedDf=df.sort_index(axis=1)

answered Jan 29, 2015 at 12:37

multigoodverse

tweet · Accepted Answer · 2012-06-16 21:14:20Z

1

The sort method and sorted function allow you to provide a custom function to extract the key used for comparison:

>>> ls = ['Q1.3', 'Q6.1', 'Q1.2']
>>> sorted(ls, key=lambda x: float(x[1:]))
['Q1.2', 'Q1.3', 'Q6.1']

answered Jun 16, 2012 at 21:14

2 Comments

pythOnometrist Over a year ago

This works for lists in general and I am familiar with it. How do I apply it to a pandas DataFrame?

tweet Over a year ago

Not sure, I admit my answer was not specific to this library.

Roko Mijic · Accepted Answer · 2017-07-24 10:04:15Z

One use-case is that you have named (some of) your columns with some prefix, and you want the columns sorted with those prefixes all together and in some particular order (not alphabetical).

For example, you might start all of your features with Ft_, labels with Lbl_, etc, and you want all unprefixed columns first, then all features, then the label. You can do this with the following function (I will note a possible efficiency problem using sum to reduce lists, but this isn't an issue unless you have a LOT of columns, which I do not):

import re

def sortedcols(df, groups=('Ft_', 'Lbl_')):
    # One pattern for the unprefixed columns, then one per prefix, in order.
    patterns = ['^(?!(%s))' % '|'.join(groups)] + ['^%s' % g for g in groups]
    matches = [[c for c in df.columns if re.search(p, c)] for p in patterns]
    return df[sum(matches, [])]
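For comparison, here is a sketch of the same grouping idea written with str.startswith instead of regular expressions. The helper name group_columns_by_prefix and the sample frame are made up, and the sketch assumes string column names and prefixes that do not overlap:

```python
import pandas as pd


def group_columns_by_prefix(df, prefixes=("Ft_", "Lbl_")):
    """Return df with unprefixed columns first, then one block per prefix."""
    prefixes = tuple(prefixes)
    # str.startswith accepts a tuple, so this checks every known prefix at once.
    ordered = [c for c in df.columns if not c.startswith(prefixes)]
    for prefix in prefixes:
        ordered += [c for c in df.columns if c.startswith(prefix)]
    return df[ordered]


df = pd.DataFrame([[0, 1, 2, 3, 4]], columns=["Lbl_y", "Ft_b", "id", "Ft_a", "date"])
print(list(group_columns_by_prefix(df).columns))
```

Within each group the columns keep their original relative order; sorting each group as well would be a separate step.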

Aravind Krishnakumar · Accepted Answer · 2015-06-20 19:58:40Z

-3

print(df.sort_values(by='Frequency', ascending=False))

where by is the name of the column, if you want to sort the rows of the dataset by the values in that column.

answered Jun 20, 2015 at 19:58

Aravind Krishnakumar
