Suppose I have a DataFrame like this:

>>> df = pd.DataFrame([[1,2,3], [4,5,6], [7,8,9]], columns=['a','b','b'])
>>> df
   a  b  b
0  1  2  3
1  4  5  6
2  7  8  9

I want to remove only the second 'b' column. If I just use the del statement, it deletes both 'b' columns:

>>> del df['b']
>>> df
   a
0  1
1  4
2  7

I can select columns by position with .iloc[] and reassign the DataFrame, but how can I delete only the second 'b' column, for example by its position?

  • That's interesting. Reassigning sounds like the appropriate move. On second thought, you want to delete the second 'b' based on its position rather than its name, since the names are duplicated, so your algorithm already uses that index somehow. Why not rename the columns to index-based names in that case? Commented Nov 14, 2013 at 9:51
  • @Boud Good suggestion. I could rename all the columns I want to delete and then delete them by name. I'll try it when I get home. Commented Nov 14, 2013 at 10:02
  • AFAIK, del df['b'] translates to a block manager command that removes the matching items from all blocks, so it is roughly equivalent to a reassignment such as df = df.iloc[:,:2]. Commented Nov 14, 2013 at 10:23

1 Answer

df = df.drop(['b'], axis=1).join(df['b'].iloc[:, 0:1])

>>> df
   a  b
0  1  2
1  4  5
2  7  8

Or, just for this case:

df = df.iloc[:, 0:2]

But I think there are better ways.
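
One possible alternative, offered as a sketch rather than a definitive method, is to build a boolean mask over column positions and pass it to .iloc. The helper name drop_column_at is hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['a', 'b', 'b'])


def drop_column_at(frame, position):
    """Return frame without the column at the given integer position.

    Hypothetical helper: it selects by position, so duplicate
    column names do not matter.
    """
    keep = np.ones(frame.shape[1], dtype=bool)  # keep every column by default
    keep[position] = False                      # except the one to drop
    return frame.iloc[:, keep]


result = drop_column_at(df, 2)  # drop the second 'b' (position 2)
print(result)
```

Because the selection is purely positional, the same call works for any column, whether or not its name is duplicated.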

1 Comment

This is the best way of retaining the first instance of a duplicate column that I have yet found!
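
For the general case of keeping only the first instance of each duplicated column name, a commonly used idiom relies on Index.duplicated. This is a minimal sketch on the question's example data, not part of the original answer; the variable name deduped is an assumption.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['a', 'b', 'b'])

# duplicated() marks every repeat of a column name after its first occurrence;
# negating the mask keeps only the first instance of each name.
deduped = df.loc[:, ~df.columns.duplicated()]
print(deduped)
```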
