Remove column from multi index dataframe

Question

Consider the following DataFrame:

import numpy as np
import pandas as pd

arrays = [['foo', 'bar', 'bar', 'bar'],
          ['A', 'B', 'C', 'D']]
tuples = list(zip(*arrays))
columnValues = pd.MultiIndex.from_tuples(tuples)
df = pd.DataFrame(np.random.rand(4, 4), columns=columnValues)
print(df)
        foo       bar                    
          A         B         C         D
0  0.859664  0.671857  0.685368  0.939156
1  0.155301  0.495899  0.733943  0.585682
2  0.124663  0.467614  0.622972  0.567858
3  0.789442  0.048050  0.630039  0.722298

Say I want to remove the first column, like so:

df.drop(df.columns[[0]], axis = 1, inplace = True)
print(df)
        bar                    
          B         C         D
0  0.671857  0.685368  0.939156
1  0.495899  0.733943  0.585682
2  0.467614  0.622972  0.567858
3  0.048050  0.630039  0.722298

This produces the expected result, but the column labels foo and A are still retained in the index levels:

print(df.columns.levels)
[['bar', 'foo'], ['A', 'B', 'C', 'D']]

Is there a way to completely drop a column, including its labels, from a MultiIndex DataFrame?

EDIT: As suggested by John, I had a look at https://github.com/pydata/pandas/issues/12822. My understanding is that this is not a bug. However, the suggested solution (https://github.com/pydata/pandas/issues/2770#issuecomment-76500001) does not work for me. Am I missing something here?

df2 = df.drop(df.columns[[0]], axis = 1)
print(df2)
        bar                    
          B         C         D
0  0.969674  0.068575  0.688838
1  0.650791  0.122194  0.289639
2  0.373423  0.470032  0.749777
3  0.707488  0.734461  0.252820

print(df2.columns[[0]])

MultiIndex(levels=[['bar', 'foo'], ['A', 'B', 'C', 'D']],
       labels=[[0], [1]])

df2.set_index(pd.MultiIndex.from_tuples(df2.columns.values))

ValueError: Length mismatch: Expected axis has 4 elements, new values have 3 elements

What if you just reassign the columns like df2.columns = pd.MultiIndex.from_tuples(df2.columns.values)? Or use df2.reindex(columns=pd.MultiIndex.from_tuples(df2.columns.values))
– fernandezcuesta, Commented Apr 20, 2016 at 15:37
That works! It feels very unnatural though... Could you perhaps post your reply as an answer so that I can accept it?
– BdB, Commented Apr 20, 2016 at 15:46

piRSquared · Accepted Answer · 2017-11-07 04:36:14Z

New Answer

As of pandas 0.20, pd.MultiIndex has a remove_unused_levels method that drops levels no longer referenced by any column:

df.columns = df.columns.remove_unused_levels()
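As a minimal sketch of the one-liner above, the following rebuilds the DataFrame from the question, drops the first column, and then prunes the levels. The variable names levels_before and levels_after are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Same column layout as in the question.
columns = pd.MultiIndex.from_tuples(
    [("foo", "A"), ("bar", "B"), ("bar", "C"), ("bar", "D")]
)
df = pd.DataFrame(np.random.rand(4, 4), columns=columns)

# Drop the first column; the data is gone, but the levels may still list it.
df = df.drop(df.columns[[0]], axis=1)
levels_before = [list(level) for level in df.columns.levels]

# Prune every level value that no remaining column refers to.
df.columns = df.columns.remove_unused_levels()
levels_after = [list(level) for level in df.columns.levels]

print(levels_before)
print(levels_after)
```

remove_unused_levels returns a new MultiIndex rather than modifying the existing one, which is why the result is assigned back to df.columns.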

Old Answer

Our savior is pd.MultiIndex.to_series().

It returns a Series of tuples restricted to what is actually in the DataFrame, so rebuilding the MultiIndex from it leaves the stale labels behind:

df.columns = pd.MultiIndex.from_tuples(df.columns.to_series())
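For versions without remove_unused_levels, the rebuild approach can be sketched roughly as follows. This variant is an assumption on my part rather than the exact code above: it uses df.columns.tolist() instead of to_series() to get the list of tuples, and passes names= so that any level names survive the rebuild.

```python
import numpy as np
import pandas as pd

# Same column layout as in the question.
columns = pd.MultiIndex.from_tuples(
    [("foo", "A"), ("bar", "B"), ("bar", "C"), ("bar", "D")]
)
df = pd.DataFrame(np.random.rand(4, 4), columns=columns)
df = df.drop(df.columns[[0]], axis=1)

# Rebuild the MultiIndex from the tuples that are actually present.
# Levels are recomputed from these tuples, so 'foo' and 'A' do not come back.
remaining = df.columns.tolist()
df.columns = pd.MultiIndex.from_tuples(remaining, names=df.columns.names)

rebuilt_levels = [list(level) for level in df.columns.levels]
print(rebuilt_levels)
```

Because the new index has the same length and order as the old one, assigning it to df.columns only relabels the columns and leaves the data untouched.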

edited Nov 7, 2017 at 4:36

answered Apr 21, 2016 at 22:49