586

I have a dataframe that looks like this:

          a         b         c         d
0  0.418762  0.042369  0.869203  0.972314
1  0.991058  0.510228  0.594784  0.534366
2  0.407472  0.259811  0.396664  0.894202
3  0.726168  0.139531  0.324932  0.906575

How can I get all columns except b?

3
  • @cs95 -- The currently listed duplicate target isn't a duplicate. Despite the original title, the linked question is "Why doesn't this specific syntax work", whereas this question is a more general "What is the best way to do this". -- Add to this the difference between deleting a column from an existing DataFrame versus creating a new DataFrame with all-but-one of the columns of another. Commented May 21, 2019 at 19:30
  • @R.M. I'm sorry but I don't agree with the edit you've made to the title on that post, so I've rolled it back. It's true that the intent of the OP was to question the syntax, but the post has grown to address the more broad question of how to delete a column. The answers in this post are carbon copies of the highest upvoted post there. The dupe stays. Commented May 21, 2019 at 19:46
  • Note this question is being discussed on Meta. Commented May 21, 2019 at 21:24

14 Answers

803

When the columns are not a MultiIndex, df.columns is just an array of column names so you can do:

df.loc[:, df.columns != 'b']

          a         c         d
0  0.561196  0.013768  0.772827
1  0.882641  0.615396  0.075381
2  0.368824  0.651378  0.397203
3  0.788730  0.568099  0.869127
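To see why this works, it can help to look at the mask itself. Below is a minimal sketch, assuming a small random frame shaped like the one in the question (the names mask and result are only illustrative):

```python
import numpy as np
import pandas as pd

# Stand-in for the frame in the question; the values are random.
df = pd.DataFrame(np.random.rand(4, 4), columns=list("abcd"))

# Comparing the column Index to a label yields a boolean NumPy array,
# one entry per column.
mask = df.columns != "b"
print(mask)  # [ True False  True  True]

# .loc accepts that array as a column selector and returns a new DataFrame.
result = df.loc[:, mask]
print(list(result.columns))  # ['a', 'c', 'd']
```

The original df keeps all four columns; only result lacks b.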

6 Comments

Not bad, but @mike's solution using drop is better IMO. A bit more readable and handles multiindexes
I actually agree that @mike's solution using drop is better - I do think it's useful to discover that (single-level) columns are arrays you can work with, but specifically for dropping a column, drop is very readable and works well with complex indexes.
Thank you for this great answer. What if I don't have a header? How do I address the column?
What about when you have more than 1 column to be ignored?
@Marius Does this work with multiple columns (say two)?
460

Don't use ix. It's deprecated. The most readable and idiomatic way of doing this is df.drop():

>>> df.drop('b', axis=1)
          a         c         d
0  0.418762  0.869203  0.972314
1  0.991058  0.594784  0.534366
2  0.407472  0.396664  0.894202
3  0.726168  0.324932  0.906575

Note that by default, .drop() does not operate in place; despite the ominous name, df is unharmed by this process. If you want to permanently remove b from df, do df.drop('b', axis=1, inplace=True).

df.drop() also accepts a list of labels, e.g. df.drop(['a', 'b'], axis=1) will drop columns a and b. You can use the columns keyword too, as in df.drop(columns='a') or df.drop(columns=['a', 'b']) (thanks @BallpointBen in the comments).
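A quick way to check these points is to run them on a throwaway frame. This is only a sketch; the frame and the variable names (without_b, without_ab, scratch) are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4, 4), columns=list("abcd"))

# drop() returns a new DataFrame; df itself still has all four columns.
without_b = df.drop("b", axis=1)
print(list(without_b.columns))  # ['a', 'c', 'd']
print(list(df.columns))         # ['a', 'b', 'c', 'd']

# The columns= keyword is equivalent to axis=1 and accepts a list.
without_ab = df.drop(columns=["a", "b"])
print(list(without_ab.columns))  # ['c', 'd']

# inplace=True mutates the frame and returns None, so use a copy here.
scratch = df.copy()
returned = scratch.drop(columns="b", inplace=True)
print(returned)               # None
print(list(scratch.columns))  # ['a', 'c', 'd']
```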

6 Comments

Also works on a multiindex just like you'd expect it to. df.drop([('l1name', 'l2name'), 'anotherl1name'], axis=1). Seems to use list vs tuple to determine if you want multiple columns (list) or referring to a multiindex (tuple).
More readable: df.drop(columns='a') or df.drop(columns=['a', 'b']). Can also replace columns= with index=.
However this is not useful if you happen not to know the names of all the columns you want to drop.
Since this creates a copy and not a view/reference, you cannot modify the original dataframe by using this on the LHS of an assignment.
@JanChristophTerasa Do you happen to know how to modify these selected columns within the original df (such as multiplying all these columns by the values of another column)? If I modify these values, I would need to tack the dropped column back on the end, which doesn't seem to be the best way.
233
df[df.columns.difference(['b'])]

Out: 
          a         c         d
0  0.427809  0.459807  0.333869
1  0.678031  0.668346  0.645951
2  0.996573  0.673730  0.314911
3  0.786942  0.719665  0.330833

6 Comments

I like this approach as it can be used to omit more than one column.
@NischalHp df.drop can also omit more than one column df.drop(['a', 'b'], axis=1)
I think it's worth noting that this can re-arrange your columns
@ocean800 Yes that's true. You can pass sort=False if you want to avoid that behaviour (df.columns.difference(['b'], sort=False))
This is the top one that works on a DataFrameGroupBy, which is what I was looking for, thanks! I used grouped[df.columns.difference(['b'])]...
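The reordering mentioned in the comments is easy to miss when the columns happen to be alphabetical already. Here is a small sketch with columns deliberately out of order (the frame is made up; the sort parameter needs a reasonably recent pandas, 0.24 or later as far as I know):

```python
import pandas as pd

# Columns deliberately not in alphabetical order.
df = pd.DataFrame([[1, 2, 3, 4]], columns=["d", "a", "b", "c"])

# By default, Index.difference returns a sorted result.
sorted_cols = df.columns.difference(["b"])
print(list(sorted_cols))  # ['a', 'c', 'd']

# sort=False keeps the remaining labels in their original order.
original_order = df.columns.difference(["b"], sort=False)
print(list(original_order))  # ['d', 'a', 'c']

kept = df[original_order]
```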
150

You can use df.columns.isin()

df.loc[:, ~df.columns.isin(['b'])]

When you want to drop multiple columns, as simple as:

df.loc[:, ~df.columns.isin(['col1', 'col2'])]
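Because the mask goes through .loc, it can also be used on the left-hand side of an assignment to change every column except the excluded one. A minimal sketch, with a made-up frame and an illustrative name keep for the mask:

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]})

# Boolean mask over the columns: True for everything except 'b'.
keep = ~df.columns.isin(["b"])

# Reading through .loc gives a new frame without 'b'.
subset = df.loc[:, keep]

# Assigning through .loc writes back into df and leaves 'b' untouched.
df.loc[:, keep] = df.loc[:, keep] * 10
print(df)
```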

1 Comment

This method was helpful to modify the selected columns!
41

You can drop labels from the column Index and select with what remains:

df[df.columns.drop('b')]

or

df.loc[:, df.columns.drop('b')]

If you need to drop multiple columns, use a list of labels instead of a single label.
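For example, with a list of labels (a sketch on a made-up frame; the label 'zzz' is just a placeholder for a column that does not exist):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]], columns=list("abcd"))

# Index.drop returns a new Index without the given labels;
# df.columns itself is not changed.
remaining = df.columns.drop(["b", "d"])
print(list(remaining))  # ['a', 'c']

result = df[remaining]

# A missing label raises KeyError unless errors="ignore" is passed.
tolerant = df.columns.drop(["b", "zzz"], errors="ignore")
print(list(tolerant))  # ['a', 'c', 'd']
```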

15

Here is another way:

df[[i for i in list(df.columns) if i != '<your column>']]

You just pass all the columns you want to keep, leaving out the one you do not want.

9

Here is a one-line lambda:

df.loc[:, list(map(lambda x: x not in ['b'], df.columns))]

before:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(4,4), columns = list('abcd'))
df

       a           b           c           d
0   0.774951    0.079351    0.118437    0.735799
1   0.615547    0.203062    0.437672    0.912781
2   0.804140    0.708514    0.156943    0.104416
3   0.226051    0.641862    0.739839    0.434230

after:

df.loc[:, list(map(lambda x: x not in ['b'], df.columns))]

        a          c          d
0   0.774951    0.118437    0.735799
1   0.615547    0.437672    0.912781
2   0.804140    0.156943    0.104416
3   0.226051    0.739839    0.434230

7

I think the best way to do this is the one mentioned by @Salvador Dali. Not that the others are wrong.

Sometimes you have a data set where you want to put one column into one variable and the rest of the columns into another, for comparison or computation. In that case, dropping the column from the data set might not help. Of course, there are use cases for dropping as well.

x_cols = [x for x in data.columns if x != 'name of column to be excluded']

Then you can use the list in x_cols to select those columns into another variable, such as x_cols1, for further computation.

e.g. x_cols1 = data[x_cols]
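Put together, the split into one column and "everything else" might look like the sketch below. The frame and the names target_col, features and target are hypothetical, chosen only to make the example runnable:

```python
import pandas as pd

# Made-up data: two measurement columns and one column to set aside.
data = pd.DataFrame(
    {"height": [1.6, 1.8], "weight": [60, 80], "label": [0, 1]}
)

target_col = "label"  # hypothetical name of the column to exclude

# Every column name except the excluded one.
x_cols = [c for c in data.columns if c != target_col]

features = data[x_cols]    # DataFrame with the remaining columns
target = data[target_col]  # Series with the excluded column

print(list(features.columns))  # ['height', 'weight']
```

Nothing is dropped from data itself, so both pieces stay available for later comparison.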

1 Comment

Can you explain why this is a separate answer instead of a comment / extension to Salvador's answer?
7

Another slight modification to @Salvador Dali enables a list of columns to exclude:

df[[i for i in df.columns if i not in list_of_columns_to_exclude]]

or

df.loc[:, [i for i in df.columns if i not in list_of_columns_to_exclude]]
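One pitfall with this pattern: the membership test has to run against the exclusion list itself, not against that list wrapped in another pair of brackets. A short sketch on a made-up frame (to_exclude is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]], columns=list("abcd"))
to_exclude = ["b", "d"]

# Correct: test membership in the list itself.
kept = df[[c for c in df.columns if c not in to_exclude]]
print(list(kept.columns))  # ['a', 'c']

# Pitfall: [to_exclude] is a list containing one list, so no column
# label ever equals its single element and nothing is filtered out.
not_filtered = df[[c for c in df.columns if c not in [to_exclude]]]
print(list(not_filtered.columns))  # ['a', 'b', 'c', 'd']
```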

4

Similar to @Tom's answer, it is also possible to select all columns except "b" without using .loc, like so:

df[df.columns[~df.columns.isin(['b'])]]

1 Comment

Why would you use .loc rather than plain square brackets, or the other way around?
4

I tested the speed and found that, for me, the .loc solution was the fastest:

df_working_1.loc[:, df_working_1.columns != "market_id"] 
# 7.19 ms ± 201 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
df_working_1.drop("market_id", axis=1)
# 7.65 ms ± 136 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
df_working_1[df_working_1.columns.difference(['market_id'])]
# 7.58 ms ± 116 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
df_working_1[[i for i in list(df_working_1.columns) if i != 'market_id']]
# 7.57 ms ± 144 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
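The timings above are close to each other relative to their reported deviations, so the ranking may well differ on other data. If you want to repeat the comparison on your own frame, a rough sketch with the standard timeit module could look like this (the synthetic frame, its size and the extra column names are assumptions, not the data used above):

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in; the size and column names are assumptions.
df = pd.DataFrame(
    np.random.rand(1000, 5), columns=["market_id", "w", "x", "y", "z"]
)

# The four approaches compared above, wrapped so they can be timed.
candidates = {
    "loc mask": lambda: df.loc[:, df.columns != "market_id"],
    "drop": lambda: df.drop("market_id", axis=1),
    "difference": lambda: df[df.columns.difference(["market_id"])],
    "comprehension": lambda: df[[c for c in df.columns if c != "market_id"]],
}

runs = 200
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=runs)
    print(f"{name}: {seconds / runs * 1e6:.1f} µs per call")
```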

1

I think a nice solution is with the function filter of pandas and regex (match everything except "b"):

df.filter(regex="^(?!b$)")

1 Comment

df.filter(regex='[^b]') shaves off a little more. But even then, this solution isn't very readable...
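Note that the two patterns are not equivalent once column names are longer than one character. A small sketch with made-up column names shows the difference (as far as I understand it, filter keeps a label when the regex finds a match anywhere in it):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=["a", "b", "c", "bb", "ab"])

# Negative lookahead: keep every label that is not exactly "b".
exact = df.filter(regex="^(?!b$)")
print(list(exact.columns))  # ['a', 'c', 'bb', 'ab']

# "[^b]" keeps any label containing at least one character other than
# "b", so it also drops "bb".
loose = df.filter(regex="[^b]")
print(list(loose.columns))  # ['a', 'c', 'ab']
```

For single-letter column names like those in the question, both give the same result.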
1

You can also pop() a column. It removes a column from a dataframe but returns it as a Series, which you assign to a value (y below). If you don't assign, it's just thrown away. One case where this is quite useful is to separate the target variable from the feature set in ML. For example:

import pandas as pd

X = pd.DataFrame({'feature1': range(5), 'feature2': range(6,11), 'target': [0,0,0,1,1]})
y = X.pop('target')

After the call, X holds only feature1 and feature2, and y is a Series holding the former target column.

0

This allows you to drop multiple columns even if you aren't sure they exist, and works for MultiIndex columns too.

df.drop(columns=[x for x in ('abc', ('foo', 'bar')) if x in df.columns])

In this example (assuming a 2-level MultiIndex), it will drop all columns with abc in the first level, and it will also drop the single column ('foo', 'bar').

I've added this answer as this is the first question that appears even when searching for MultiIndex.
