Consider the following DataFrame:

import numpy as np
import pandas as pd

arrays = [['foo', 'bar', 'bar', 'bar'],
          ['A', 'B', 'C', 'D']]
tuples = list(zip(*arrays))
columnValues = pd.MultiIndex.from_tuples(tuples)
df = pd.DataFrame(np.random.rand(4,4), columns = columnValues)
print(df)
        foo       bar                    
          A         B         C         D
0  0.859664  0.671857  0.685368  0.939156
1  0.155301  0.495899  0.733943  0.585682
2  0.124663  0.467614  0.622972  0.567858
3  0.789442  0.048050  0.630039  0.722298

Say I want to remove the first column, like so:

df.drop(df.columns[[0]], axis = 1, inplace = True)
print(df)
        bar                    
          B         C         D
0  0.671857  0.685368  0.939156
1  0.495899  0.733943  0.585682
2  0.467614  0.622972  0.567858
3  0.048050  0.630039  0.722298

This produces the expected result; however, the column labels foo and A are still retained in the column levels:

print(df.columns.levels)
[['bar', 'foo'], ['A', 'B', 'C', 'D']]

Is there a way to completely drop a column, including its labels, from a MultiIndex DataFrame?

EDIT: As suggested by John, I had a look at https://github.com/pydata/pandas/issues/12822. What I got from it is that this is not a bug. However, the suggested solution (https://github.com/pydata/pandas/issues/2770#issuecomment-76500001) does not seem to work for me. Am I missing something here?

df2 = df.drop(df.columns[[0]], axis = 1)
print(df2)
        bar                    
          B         C         D
0  0.969674  0.068575  0.688838
1  0.650791  0.122194  0.289639
2  0.373423  0.470032  0.749777
3  0.707488  0.734461  0.252820

print(df2.columns[[0]])

MultiIndex(levels=[['bar', 'foo'], ['A', 'B', 'C', 'D']],
       labels=[[0], [1]])

df2.set_index(pd.MultiIndex.from_tuples(df2.columns.values))

ValueError: Length mismatch: Expected axis has 4 elements, new values have 3 elements
  • Check this: github.com/pydata/pandas/issues/12822 Commented Apr 20, 2016 at 15:08
  • What if you just reassign the columns like df2.columns = pd.MultiIndex.from_tuples(df2.columns.values)? Or use df2.reindex(columns=pd.MultiIndex.from_tuples(df2.columns.values)) Commented Apr 20, 2016 at 15:37
  • That works! It feels very unnatural though... Could you perhaps post your reply as an answer so that I can accept it? Commented Apr 20, 2016 at 15:46

1 Answer

New Answer

As of pandas 0.20, pd.MultiIndex has a method, pd.MultiIndex.remove_unused_levels, which drops level values that no longer appear in the index:

df.columns = df.columns.remove_unused_levels()
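
To see the effect, here is a minimal sketch that rebuilds the frame from the question and compares the levels before and after the call. The fixed seed and the variable names levels_before and levels_after are my own assumptions for illustration, so the random values differ from those in the question. It assumes pandas 0.20 or later.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame; the seed is an assumption added for reproducibility.
rng = np.random.RandomState(0)
columns = pd.MultiIndex.from_tuples(
    [("foo", "A"), ("bar", "B"), ("bar", "C"), ("bar", "D")]
)
df = pd.DataFrame(rng.rand(4, 4), columns=columns)

# Drop the first column, as in the question.
df = df.drop(df.columns[[0]], axis=1)

# The levels still list 'foo' and 'A' even though no column uses them.
levels_before = [list(level) for level in df.columns.levels]

# Prune the level values that are no longer referenced.
df.columns = df.columns.remove_unused_levels()
levels_after = [list(level) for level in df.columns.levels]

print(levels_before)  # [['bar', 'foo'], ['A', 'B', 'C', 'D']]
print(levels_after)   # [['bar'], ['B', 'C', 'D']]
```

Note that remove_unused_levels returns a new MultiIndex rather than modifying the existing one, which is why the result is assigned back to df.columns.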

Old Answer

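Our savior is pd.MultiIndex.to_series().

It returns a Series of tuples restricted to the columns that are actually present in the DataFrame, so rebuilding the MultiIndex from it discards the unused level values: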

df.columns = pd.MultiIndex.from_tuples(df.columns.to_series())
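
As a rough sketch of why this works, the example below rebuilds the question's df2 and inspects the intermediate tuples. The seed and the names remaining and rebuilt_levels are illustrative assumptions, not part of the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame and drop its first column (seed is an assumption).
rng = np.random.RandomState(0)
columns = pd.MultiIndex.from_tuples(
    [("foo", "A"), ("bar", "B"), ("bar", "C"), ("bar", "D")]
)
df2 = pd.DataFrame(rng.rand(4, 4), columns=columns).drop(columns[[0]], axis=1)

# to_series() yields one tuple per column that is still present.
remaining = list(df2.columns.to_series())

# Assign the rebuilt index to .columns; set_index would target the
# row index instead, which is what caused the length mismatch in the question.
df2.columns = pd.MultiIndex.from_tuples(remaining)
rebuilt_levels = [list(level) for level in df2.columns.levels]

print(remaining)       # [('bar', 'B'), ('bar', 'C'), ('bar', 'D')]
print(rebuilt_levels)  # [['bar'], ['B', 'C', 'D']]
```

Because the new MultiIndex is built only from the surviving tuples, its levels contain nothing else.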