How to select all columns except one in pandas?

Question

I have a dataframe that looks like this:

          a         b         c         d
0  0.418762  0.042369  0.869203  0.972314
1  0.991058  0.510228  0.594784  0.534366
2  0.407472  0.259811  0.396664  0.894202
3  0.726168  0.139531  0.324932  0.906575

How can I get all columns except b?

@cs95 -- The currently listed duplicate target isn't a duplicate. Despite the original title, the linked question is "Why doesn't this specific syntax work", whereas this question is a more general "What is the best way to do this". -- Add to this the difference between deleting a column from an existing DataFrame versus creating a new DataFrame with all-but-one of the columns of another.
– R.M., Commented May 21, 2019 at 19:30
@R.M. I'm sorry but I don't agree with the edit you've made to the title on that post, so I've rolled it back. It's true that the intent of the OP was to question the syntax, but the post has grown to address the broader question of how to delete a column. The answers in this post are carbon copies of the highest upvoted post there. The dupe stays.
– cs95, Commented May 21, 2019 at 19:46

Will · Accepted Answer · 2019-11-20 19:15:49Z

803

When the columns are not a MultiIndex, df.columns is just an array of column names so you can do:

df.loc[:, df.columns != 'b']

          a         c         d
0  0.561196  0.013768  0.772827
1  0.882641  0.615396  0.075381
2  0.368824  0.651378  0.397203
3  0.788730  0.568099  0.869127
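
Several comments ask how this extends to more than one column. The following is a minimal sketch, assuming a small made-up frame (the values and the names `mask_two` and `result_two` are illustrative, not from the answer). It shows that the comparison produces one boolean per column, and that two comparisons can be combined with `&`:

```python
import pandas as pd

# Hypothetical sample frame; the values are illustrative only.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})

# Comparing the column Index with a label gives one boolean per column.
mask = df.columns != "b"

# .loc uses the booleans to keep or skip each column.
result = df.loc[:, mask]

# For more than one column, combine comparisons with & (element-wise AND).
mask_two = (df.columns != "b") & (df.columns != "d")
result_two = df.loc[:, mask_two]

print(list(result.columns))      # columns kept when excluding "b"
print(list(result_two.columns))  # columns kept when excluding "b" and "d"
```

For longer exclusion lists, chaining comparisons gets unwieldy, and a membership test on the column index is usually easier to read.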

edited Nov 20, 2019 at 19:15

Will


answered Apr 21, 2015 at 5:27

Marius


6 Comments

travc Over a year ago

Not bad, but @mike's solution using drop is better IMO. A bit more readable and handles multiindexes

Marius Over a year ago

I actually agree that @mike's solution using drop is better - I do think it's useful to discover that (single-level) columns are arrays you can work with, but specifically for dropping a column, drop is very readable and works well with complex indexes.

FabioSpaghetti Over a year ago

Thank you for this great answer. What if I don't have a header? How do I address the column?

Bruno Ambrozio Over a year ago

What about when you have more than 1 column to be ignored?

MasayoMusic Over a year ago

@Marius Does this work with multiple columns (say two)?


qwr · Accepted Answer · 2024-08-28 14:31:39Z

460

Don't use ix. It's deprecated (and was removed entirely in pandas 1.0). The most readable and idiomatic way of doing this is df.drop():

>>> df.drop('b', axis=1)
          a         c         d
0  0.418762  0.869203  0.972314
1  0.991058  0.594784  0.534366
2  0.407472  0.396664  0.894202
3  0.726168  0.324932  0.906575

Note that by default, .drop() does not operate inplace; despite the ominous name, df is unharmed by this process. If you want to permanently remove b from df, do df.drop('b', axis=1, inplace=True).

df.drop() also accepts a list of labels, e.g. df.drop(['a', 'b'], axis=1) will drop columns a and b. You can use columns too, as in df.drop(columns='a') or df.drop(columns=['a', 'b']) (thanks @BallpointBen in the comments).
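
As a quick check of the points above, here is a minimal sketch on a made-up frame (the values and variable names are illustrative). It also uses the `errors="ignore"` option of `drop`, which the answer does not mention, to skip labels that are not present:

```python
import pandas as pd

# Hypothetical sample frame; the values are illustrative only.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# drop returns a new DataFrame; df itself still has all three columns.
without_b = df.drop(columns="b")
original_columns = list(df.columns)

# A list of labels drops several columns at once.
only_c = df.drop(columns=["a", "b"])

# errors="ignore" skips missing labels instead of raising KeyError.
safe = df.drop(columns=["b", "not_a_column"], errors="ignore")

# Reassigning the result is a common alternative to inplace=True.
df = df.drop(columns="b")
```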

edited Aug 28, 2024 at 14:31

qwr


answered Jun 9, 2016 at 5:38

mike


6 Comments

travc Over a year ago

Also works on a multiindex just like you'd expect it to. df.drop([('l1name', 'l2name'), 'anotherl1name'], axis=1). Seems to use list vs tuple to determine if you want multiple columns (list) or referring to a multiindex (tuple).

BallpointBen Over a year ago

More readable: df.drop(columns='a') or df.drop(columns=['a', 'b']). Can also replace columns= with index=.

yeliabsalohcin Over a year ago

However this is not useful if you happen not to know the names of all the columns you want to drop.

Jan Christoph Terasa Over a year ago

Since this creates a copy and not a view/reference, you cannot modify the original dataframe by using this on the LHS of an assignment.

MasayoMusic Over a year ago

@JanChristophTerasa Do you happen to know how to modify these selected columns within the original df (such as multiplying all these columns by the values of another column)? If I modify these values I would need to tack the dropped column back on the end, which doesn't seem to be the best way.


user2285236 · Accepted Answer · 2016-08-28 14:05:30Z

233

df[df.columns.difference(['b'])]

Out: 
          a         c         d
0  0.427809  0.459807  0.333869
1  0.678031  0.668346  0.645951
2  0.996573  0.673730  0.314911
3  0.786942  0.719665  0.330833
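
The comments on this answer note that `difference` can rearrange columns. A minimal sketch, assuming a made-up frame whose columns are deliberately out of alphabetical order, shows the default sorted result next to `sort=False`:

```python
import pandas as pd

# Hypothetical frame with columns that are not in alphabetical order.
df = pd.DataFrame([[1, 2, 3, 4]], columns=["d", "b", "a", "c"])

# Default behaviour: the remaining labels come back sorted.
sorted_cols = df.columns.difference(["b"])

# sort=False keeps the labels in their original order.
kept_order = df.columns.difference(["b"], sort=False)

print(list(df[sorted_cols].columns))
print(list(df[kept_order].columns))
```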

answered Aug 28, 2016 at 14:05

user2285236

6 Comments

Nischal Hp Over a year ago

I like this approach as it can be used to omit more than one column.

JACKY88 Over a year ago

@NischalHp df.drop can also omit more than one column df.drop(['a', 'b'], axis=1)

ocean800 Over a year ago

I think it's worth noting that this can re-arrange your columns

user2285236 Over a year ago

@ocean800 Yes that's true. You can pass sort=False if you want to avoid that behaviour (df.columns.difference(['b'], sort=False))

wjandrea Over a year ago

This is the top one that works on a DataFrameGroupBy, which is what I was looking for, thanks! I used grouped[df.columns.difference(['b'])]...


William · Accepted Answer · 2018-10-31 01:47:21Z

150

You can use df.columns.isin()

df.loc[:, ~df.columns.isin(['b'])]

To drop multiple columns, it is as simple as:

df.loc[:, ~df.columns.isin(['col1', 'col2'])]
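
Because `.loc` with a boolean column mask can also appear on the left-hand side of an assignment, the same mask can be used to modify the selected columns in the original frame. A minimal sketch, assuming a made-up frame where every column except b is multiplied row-wise by b:

```python
import pandas as pd

# Hypothetical sample frame; floats avoid any dtype change on assignment.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [10.0, 20.0], "c": [3.0, 4.0]})

# True for every column except "b".
mask = ~df.columns.isin(["b"])

# Multiply the selected columns row-wise by column "b" and write them back.
df.loc[:, mask] = df.loc[:, mask].mul(df["b"], axis=0)

print(df)
```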

answered Oct 31, 2018 at 1:47

William


1 Comment

Derek O Over a year ago

This method was helpful to modify the selected columns!

Mykola Zotko · Accepted Answer · 2023-09-18 19:06:43Z

41

You can drop the column from the column index:

df[df.columns.drop('b')]

or

df.loc[:, df.columns.drop('b')]

If you need to drop multiple columns, use a list of labels instead of a single label.
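
A minimal sketch of the multi-column case, assuming a made-up single-row frame. Note that `Index.drop` raises a `KeyError` when a label is missing, so every label in the list has to exist:

```python
import pandas as pd

# Hypothetical sample frame; the values are illustrative only.
df = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4]})

# Index.drop accepts a list of labels and returns a new Index.
cols = df.columns.drop(["b", "d"])
subset = df[cols]

# A missing label raises KeyError rather than being ignored.
try:
    df.columns.drop(["not_a_column"])
    raised = False
except KeyError:
    raised = True
```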

edited Sep 18, 2023 at 19:06

answered Jan 17, 2021 at 13:21

Mykola Zotko


Salvador Dali · Accepted Answer · 2016-08-18 03:22:42Z

15

Here is another way:

df[[i for i in list(df.columns) if i != '<your column>']]

You just pass all the columns to be shown, except the one you do not want.

answered Aug 18, 2016 at 3:22

Salvador Dali


Grant Shannon · Accepted Answer · 2019-09-02 14:57:14Z

Here is a one line lambda:

df.loc[:, list(map(lambda x: x not in ['b'], df.columns))]

before:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(4, 4), columns=list('abcd'))
df

       a           b           c           d
0   0.774951    0.079351    0.118437    0.735799
1   0.615547    0.203062    0.437672    0.912781
2   0.804140    0.708514    0.156943    0.104416
3   0.226051    0.641862    0.739839    0.434230

after:

df.loc[:, list(map(lambda x: x not in ['b'], df.columns))]

        a          c          d
0   0.774951    0.118437    0.735799
1   0.615547    0.437672    0.912781
2   0.804140    0.156943    0.104416
3   0.226051    0.739839    0.434230

MRizwan33 · Accepted Answer · 2018-04-21 14:39:41Z

7

I think the best way to do this is the one mentioned by @Salvador Dali. Not that the others are wrong.

Sometimes you want to put one column into one variable and the rest of the columns into another, for comparison or computation. In that case, dropping the column from the data set might not help. Of course, there are use cases for dropping as well.

x_cols = [x for x in data.columns if x != 'name of column to be excluded']

You can then use the column names collected in x_cols to select those columns into another variable, such as x_cols1, for further computation.

ex: x_cols1 = data[x_cols]

edited Apr 21, 2018 at 14:39

MRizwan33


answered Apr 21, 2018 at 13:19

Sudhi


1 Comment

user6839822 Over a year ago

Can you explain why this is a separate answer instead of a comment / extension to Salvador's answer?

user1718097 · Accepted Answer · 2018-09-23 14:29:18Z

7

Another slight modification to @Salvador Dali's answer enables a list of columns to exclude:

df[[i for i in list(df.columns) if i not in list_of_columns_to_exclude]]

or

df.loc[:, [i for i in list(df.columns) if i not in list_of_columns_to_exclude]]
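
The same idea can be wrapped in a small helper. This is only a sketch: `select_except` is a hypothetical name, not a pandas function, and the frame is made up. Converting the exclusion list to a set keeps each membership test cheap and lets labels that are not in the frame pass silently:

```python
import pandas as pd


def select_except(frame, exclude):
    """Return the columns of `frame` whose labels are not in `exclude`.

    Hypothetical helper for illustration; column order is preserved.
    """
    excluded = set(exclude)
    keep = [col for col in frame.columns if col not in excluded]
    return frame[keep]


# Hypothetical sample frame; the values are illustrative only.
df = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4]})
result = select_except(df, ["b", "d", "not_a_column"])
print(list(result.columns))
```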

answered Sep 23, 2018 at 14:29

user1718097


DataBach · Accepted Answer · 2021-11-08 07:47:46Z

4

Similar to @Toms answer, it is also possible to select all columns except "b" without using .loc, like so:

df[df.columns[~df.columns.isin(['b'])]]

answered Nov 8, 2021 at 7:47

DataBach


1 Comment

Johan Over a year ago

why, why not, would you use .loc or simply square brackets?

dimay · Accepted Answer · 2022-10-08 19:34:59Z

I've tested speed and found that for me the .loc solution was the fastest, although the differences between the methods were small:

df_working_1.loc[:, df_working_1.columns != "market_id"] 
# 7.19 ms ± 201 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

df_working_1.drop("market_id", axis=1)
# 7.65 ms ± 136 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

df_working_1[df_working_1.columns.difference(['market_id'])]
# 7.58 ms ± 116 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

df_working_1[[i for i in list(df_working_1.columns) if i != 'market_id']]
# 7.57 ms ± 144 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
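
The timings above come from the author's own data and will vary with frame size and pandas version. A minimal sketch for repeating the comparison with the standard `timeit` module follows; the frame shape, the column name `c3`, and the loop counts are arbitrary assumptions:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test frame: 1000 rows, 20 float columns named c0..c19.
df = pd.DataFrame(np.random.rand(1000, 20), columns=[f"c{i}" for i in range(20)])

# The four approaches from the answer, each excluding column "c3".
candidates = {
    "loc_mask": lambda: df.loc[:, df.columns != "c3"],
    "drop": lambda: df.drop("c3", axis=1),
    "difference": lambda: df[df.columns.difference(["c3"])],
    "comprehension": lambda: df[[c for c in df.columns if c != "c3"]],
}

# Best of 3 repeats of 100 calls each, in seconds.
timings = {
    name: min(timeit.repeat(fn, number=100, repeat=3))
    for name, fn in candidates.items()
}

for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds / 100 * 1e6:.1f} µs per call")
```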

Billy Bonaros · Accepted Answer · 2020-09-10 13:23:17Z

1

I think a nice solution is with the function filter of pandas and regex (match everything except "b"):

df.filter(regex="^(?!b$)")
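
To see what the pattern does and does not exclude, here is a minimal sketch on a made-up frame with look-alike column names. `filter` searches the pattern in each label, so the anchored negative lookahead rejects only the label that is exactly b, while a plain character class such as `[^b]` behaves differently:

```python
import pandas as pd

# Hypothetical frame with column names that resemble "b".
df = pd.DataFrame({"a": [1], "b": [2], "bb": [3], "cb": [4]})

# Anchored negative lookahead: drops only the column named exactly "b".
exact = df.filter(regex=r"^(?!b$)")

# Character class: keeps labels containing any character other than "b",
# so "bb" is dropped as well.
char_class = df.filter(regex=r"[^b]")

print(list(exact.columns))
print(list(char_class.columns))
```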

answered Sep 10, 2020 at 13:23

Billy Bonaros


1 Comment

russhoppa Over a year ago

df.filter(regex='[^b]') shaves off a little more. But even then, this solution isn't very readable...

cottontail · Accepted Answer · 2023-04-12 04:11:37Z

1

You can also pop() a column. It removes the column from the dataframe in place and returns it as a Series, which you can assign to a variable (y below). If you don't assign it, it's just thrown away. One case where this is quite useful is separating the target variable from the feature set in ML. For example:

X = pd.DataFrame({'feature1': range(5), 'feature2': range(6,11), 'target': [0,0,0,1,1]})
y = X.pop('target')

After the call, X holds only the feature columns and y holds the target column as a Series.
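
The effect of pop() can be checked with a short sketch that reuses the frame from the example. The non-mutating variant at the end, with the hypothetical names `data`, `features`, and `target`, is an addition for contrast and not part of the answer:

```python
import pandas as pd

X = pd.DataFrame(
    {"feature1": range(5), "feature2": range(6, 11), "target": [0, 0, 0, 1, 1]}
)

# pop removes "target" from X in place and returns it as a Series.
y = X.pop("target")

print(list(X.columns))  # remaining feature columns
print(y.tolist())       # values of the removed column

# Non-mutating alternative: leave the source frame untouched.
data = pd.DataFrame({"feature1": [1, 2], "target": [0, 1]})
features = data.drop(columns="target")
target = data["target"]
```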

edited Apr 12, 2023 at 4:11

answered Apr 12, 2023 at 3:55

cottontail


fantabolous · Accepted Answer · 2023-05-12 06:23:28Z

0

This allows you to drop multiple columns even if you aren't sure they exist, and works for MultiIndex columns too.

df.drop(columns=[x for x in ('abc', ('foo', 'bar')) if x in df.columns])

In this example (assuming a 2-level MultiIndex) it will drop all columns with abc in the first level, and it will also drop the single column ('foo', 'bar')
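
A minimal sketch of the behaviour described above, assuming a made-up two-level MultiIndex and an extra label, `missing`, that is not present. The membership test filters out absent labels before `drop` is called:

```python
import pandas as pd

# Hypothetical two-level column MultiIndex.
cols = pd.MultiIndex.from_tuples(
    [("abc", "x"), ("abc", "y"), ("foo", "bar"), ("foo", "baz")]
)
df = pd.DataFrame([[1, 2, 3, 4]], columns=cols)

# Keep only the labels that actually exist in the columns.
# "abc" matches on the first level; ("foo", "bar") is a full key.
to_drop = [x for x in ("abc", ("foo", "bar"), "missing") if x in df.columns]

result = df.drop(columns=to_drop)
print(list(result.columns))
```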

I've added this answer as this is the first question that appears even when searching for MultiIndex.

answered May 12, 2023 at 6:23

fantabolous
