Consider the following DataFrame:
import numpy as np
import pandas as pd

arrays = [['foo', 'bar', 'bar', 'bar'],
          ['A', 'B', 'C', 'D']]
tuples = list(zip(*arrays))
columnValues = pd.MultiIndex.from_tuples(tuples)
df = pd.DataFrame(np.random.rand(4, 4), columns=columnValues)
print(df)
        foo       bar
          A         B         C         D
0  0.859664  0.671857  0.685368  0.939156
1  0.155301  0.495899  0.733943  0.585682
2  0.124663  0.467614  0.622972  0.567858
3  0.789442  0.048050  0.630039  0.722298
Say I want to remove the first column, like so:
df.drop(df.columns[[0]], axis = 1, inplace = True)
print(df)
        bar
          B         C         D
0  0.671857  0.685368  0.939156
1  0.495899  0.733943  0.585682
2  0.467614  0.622972  0.567858
3  0.048050  0.630039  0.722298
This produces the expected result; however, the column labels foo and A are retained in the index levels:
print(df.columns.levels)
[['bar', 'foo'], ['A', 'B', 'C', 'D']]
Is there a way to completely drop a column, including its labels, from a MultiIndex DataFrame?
EDIT: As suggested by John, I had a look at https://github.com/pydata/pandas/issues/12822. What I got from it is that this is not a bug. However, the suggested solution (https://github.com/pydata/pandas/issues/2770#issuecomment-76500001) does not seem to work for me. Am I missing something here?
df2 = df.drop(df.columns[[0]], axis = 1)
print(df2)
        bar
          B         C         D
0  0.969674  0.068575  0.688838
1  0.650791  0.122194  0.289639
2  0.373423  0.470032  0.749777
3  0.707488  0.734461  0.252820
print(df2.columns[[0]])
MultiIndex(levels=[['bar', 'foo'], ['A', 'B', 'C', 'D']],
           labels=[[0], [1]])
df2.set_index(pd.MultiIndex.from_tuples(df2.columns.values))
ValueError: Length mismatch: Expected axis has 4 elements, new values have 3 elements
df2.columns = pd.MultiIndex.from_tuples(df2.columns.values)? Or use df2.reindex(columns=pd.MultiIndex.from_tuples(df2.columns.values))
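A minimal sketch of the suggestion above, assuming a pandas version that provides MultiIndex.remove_unused_levels (0.20 or later). The set_index call fails because it replaces the row index (4 rows) rather than the columns (3 remaining), which would explain the length mismatch. Assigning to df2.columns instead rebuilds the column index from the remaining tuples. The names rebuilt and pruned are illustrative only and do not appear in the original question.

```python
import numpy as np
import pandas as pd

arrays = [['foo', 'bar', 'bar', 'bar'],
          ['A', 'B', 'C', 'D']]
columns = pd.MultiIndex.from_tuples(list(zip(*arrays)))
df = pd.DataFrame(np.random.rand(4, 4), columns=columns)

# Drop the first column; the result may still carry the unused labels in its levels.
df2 = df.drop(df.columns[[0]], axis=1)

# Option 1: rebuild the column index from the tuples that are actually left.
rebuilt = df2.copy()
rebuilt.columns = pd.MultiIndex.from_tuples(df2.columns.values)

# Option 2 (assumes pandas >= 0.20): prune levels that no column uses.
pruned = df2.copy()
pruned.columns = df2.columns.remove_unused_levels()

# Plain lists are easier to inspect than the level objects themselves.
levels_rebuilt = [list(level) for level in rebuilt.columns.levels]
levels_pruned = [list(level) for level in pruned.columns.levels]
print(levels_rebuilt)
print(levels_pruned)
```

Both options leave the data untouched and only change the column index, so either can be applied right after the drop.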