I want to access a column/index in a DataFrame that is the concatenation of two DataFrames, one that has a MultiIndex and another that doesn't. The questions are in the code comments.

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((2, 2)), columns=["a", "b"])

df2_cols = pd.MultiIndex.from_tuples([("c", 1), ("d", 2)])
df2 = pd.DataFrame(data=np.ones((2, 2)), columns=df2_cols)

df = pd.concat([df1, df2], axis=1)
print(df)

Output:

     a    b  (c, 1)  (d, 2)
0  1.0  1.0     1.0     1.0
1  1.0  1.0     1.0     1.0

Now accessing different parts of the new DataFrame:

df1.loc[:, "a"]  # works
df.loc[:, "a"]   # works
df2.loc[:, ("c", 1)] # works
df.loc[:, ("c", 1)] # crashes -> is it possible to access this column using loc?

# even this crashes, where I am directly using the name provided by the dataframe column:
df.loc[:, df.columns[2]]

Error:

KeyError: "None of [Index(['c', 1], dtype='object')] are in the [columns]"

df[("c", 1)] # interestingly works

df = df.T
df.loc[("c", 1)] # crashes -> is it possible to access this index using loc?

I know I can use iloc, or the approach described in "join multiindex dataframe with single-index dataframe breaks multiindex format", which makes sure the MultiIndex format is preserved in the new DataFrame. But I am wondering whether it is possible without that.
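For reference, here is a minimal sketch of that alternative: give df1 a second column level before concatenating, so the result keeps MultiIndex columns. The names df1_two_level and df_multi and the placeholder level value 0 are arbitrary choices made for this illustration, not something taken from the linked answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((2, 2)), columns=["a", "b"])
df2 = pd.DataFrame(
    np.ones((2, 2)),
    columns=pd.MultiIndex.from_tuples([("c", 1), ("d", 2)]),
)

# Promote df1's flat columns to two levels; 0 is an arbitrary placeholder.
df1_two_level = df1.copy()
df1_two_level.columns = pd.MultiIndex.from_product([df1.columns, [0]])

# Both inputs now have two-level columns, so the result does too.
df_multi = pd.concat([df1_two_level, df2], axis=1)

col = df_multi.loc[:, ("c", 1)]  # the tuple is a full MultiIndex key
row = df_multi.T.loc[("c", 1)]   # same key on the transposed frame
```

The cost is that the formerly flat columns must now be addressed as ("a", 0) and ("b", 0).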

1 Answer

Seems like a bug to me.
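A quick check, assuming the setup from the question, hints at where the behaviour comes from: the concatenated columns appear to be a flat, single-level Index whose labels happen to be tuples, and the KeyError message shows loc looking up 'c' and 1 as two separate labels. This is only a diagnostic sketch, not a statement about pandas internals:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((2, 2)), columns=["a", "b"])
df2 = pd.DataFrame(
    np.ones((2, 2)),
    columns=pd.MultiIndex.from_tuples([("c", 1), ("d", 2)]),
)
df = pd.concat([df1, df2], axis=1)

# Inspect what kind of column index the concatenation produced.
is_multi = isinstance(df.columns, pd.MultiIndex)
n_levels = df.columns.nlevels
third_label = df.columns[2]  # the label itself is a tuple

print(is_multi, n_levels, third_label)
```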

You can cheat and use 2D slicing:

df.loc[:, [('c', 1)]]

Output:

   (c, 1)
0     1.0
1     1.0

Assignment works with the same form:

df.loc[:, [('c', 1)]] = [8,9]

Updated DataFrame:

     a    b  (c, 1)  (d, 2)
0  1.0  1.0       8     1.0
1  1.0  1.0       9     1.0

If you need a Series:

df.loc[:, [('c', 1)]].squeeze()

Output:

0    1.0
1    1.0
Name: (c, 1), dtype: float64
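
The same list-wrapping trick appears to carry over to the transposed frame from the question, where the tuple is a row label. A minimal sketch, assuming the question's setup; squeeze is given an explicit axis here so that only the length-one dimension is collapsed:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((2, 2)), columns=["a", "b"])
df2 = pd.DataFrame(
    np.ones((2, 2)),
    columns=pd.MultiIndex.from_tuples([("c", 1), ("d", 2)]),
)
df = pd.concat([df1, df2], axis=1)

# Column access: wrap the tuple in a list, then collapse the single column.
col = df.loc[:, [("c", 1)]].squeeze("columns")

# Row access on the transposed frame: wrap the tuple in a list again,
# then collapse the single row.
row = df.T.loc[[("c", 1)]].squeeze("index")
```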