I want to divide all columns of a dataframe with a MultiIndex by a column of another dataframe whose MultiIndex has one level fewer. The first two levels of both indices are identical, and the values should be broadcast across the third level.

import pandas as pd

df_0 = pd.DataFrame( {
    "col0": [ 1, 2, 3, 4, 5 ],
    "col1": [ 3, 6, 9, 12, 15 ],
} )
df_0.index = pd.MultiIndex.from_tuples(
    [ ( "A", "a", 0 ), ( "A", "a", 1 ), ( "A", "b", 0 ), ( "A", "b", 1 ), ( "B", "b", 0 ) ]
)
df_0.index.names = [ "foo", "bar", "baz" ]

df_1 = pd.DataFrame( {
    "stuff": [ 100, 110, 120, 130, ],
} )
df_1.index = pd.MultiIndex.from_tuples(
    [ ( "A", "a" ), ( "A", "b" ), ( "B", "a" ), ( "B", "b" ) ]
)
df_1.index.names = [ "foo", "bar" ]
print( df_0 )
print( df_1 )

This gives me the following two dataframes, df_0:

             col0  col1
foo bar baz            
A   a   0       1     3
        1       2     6
    b   0       3     9
        1       4    12
B   b   0       5    15

and df_1

         stuff
foo bar       
A   a      100
    b      110
B   a      120
    b      130

If I try to divide each column by the corresponding value in the stuff column, I get an error message:

print( df_0.div( df_1 ) )
Join on level between two MultiIndex objects is ambiguous

What I want to achieve is the following result:

              col0    col1
foo bar baz            
A   a   0    1/100   3/100
        1    2/100   6/100
    b   0    3/110   9/110
        1    4/110  12/110
B   b   0    5/130  15/130

2 Answers

Use DataFrame.reindex to align df_1 to the index of df_0, so that both have the same three levels. Then divide by the Series selected by stuff:

df = df_0.div(df_1.reindex(df_0.index)['stuff'], axis=0)
print (df)
                 col0      col1
foo bar baz                    
A   a   0    0.010000  0.030000
        1    0.020000  0.060000
    b   0    0.027273  0.081818
        1    0.036364  0.109091
B   b   0    0.038462  0.115385

Details:

print( df_1.reindex(df_0.index)['stuff'] )
foo  bar  baz
A    a    0      100
          1      100
     b    0      110
          1      110
B    b    0      130
Name: stuff, dtype: int64
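
The broadcast can also be spelled out explicitly. The following is only a minimal sketch of an alternative, not the method used in this answer: drop the baz level from the index of df_0 to get one (foo, bar) key per row, look those keys up in df_1, and divide row-wise. The names keys, divisor and result are illustrative.

```python
import pandas as pd

# Same example data as in the question
df_0 = pd.DataFrame(
    {"col0": [1, 2, 3, 4, 5], "col1": [3, 6, 9, 12, 15]},
    index=pd.MultiIndex.from_tuples(
        [("A", "a", 0), ("A", "a", 1), ("A", "b", 0), ("A", "b", 1), ("B", "b", 0)],
        names=["foo", "bar", "baz"],
    ),
)
df_1 = pd.DataFrame(
    {"stuff": [100, 110, 120, 130]},
    index=pd.MultiIndex.from_tuples(
        [("A", "a"), ("A", "b"), ("B", "a"), ("B", "b")],
        names=["foo", "bar"],
    ),
)

# One (foo, bar) key per row of df_0; keys repeat where baz varies
keys = df_0.index.droplevel("baz")

# Look up the divisor for each key, then re-attach the 3-level index of df_0
divisor = pd.Series(df_1["stuff"].reindex(keys).to_numpy(), index=df_0.index)

# Divide every column of df_0 row-wise by the matching divisor
result = df_0.div(divisor, axis=0)
print(result)
```

Any (foo, bar) pair in df_0 that is missing from df_1 would get a NaN divisor here, so the corresponding rows of the result would be NaN.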

2 Comments

With Pandas 0.22.0 df_1.reindex(df_0.index)['stuff'] resulted in a dataframe filled only with NaNs. After updating to Pandas 1.0.3 it behaves as expected and shown in the answer.
@leviathan - Thank you for the info. I guess it does not work in 0.22.0 because of a bug.

You first need to join the dataframes so the indexes are aligned and then perform the operations:

df = df_0.join(df_1)
df['col0'] = df.col0/df.stuff
df['col1'] = df.col1/df.pop('stuff')

                 col0      col1
foo bar baz                    
A   a   0    0.010000  0.030000
        1    0.020000  0.060000
    b   0    0.027273  0.081818
        1    0.036364  0.109091
B   b   0    0.038462  0.115385

You might get a NotImplementedError if your pandas version is outdated. In that case, an alternative is to reset_index and merge:

df = df_0.reset_index().merge(df_1, on=['foo', 'bar']).set_index(['foo', 'bar','baz'])
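
The merge above only attaches stuff to each row, so the division still has to follow. Below is a minimal sketch of the complete fallback on the question's data. Two choices here are assumptions rather than part of the original answer: the index of df_1 is reset as well, so that on refers to ordinary columns in both frames, and how="left" is used to keep every row of df_0. The names merged and result are illustrative.

```python
import pandas as pd

# Same example data as in the question
df_0 = pd.DataFrame(
    {"col0": [1, 2, 3, 4, 5], "col1": [3, 6, 9, 12, 15]},
    index=pd.MultiIndex.from_tuples(
        [("A", "a", 0), ("A", "a", 1), ("A", "b", 0), ("A", "b", 1), ("B", "b", 0)],
        names=["foo", "bar", "baz"],
    ),
)
df_1 = pd.DataFrame(
    {"stuff": [100, 110, 120, 130]},
    index=pd.MultiIndex.from_tuples(
        [("A", "a"), ("A", "b"), ("B", "a"), ("B", "b")],
        names=["foo", "bar"],
    ),
)

# Turn both indexes into columns, merge on the shared keys,
# then restore the original 3-level index
merged = (
    df_0.reset_index()
    .merge(df_1.reset_index(), on=["foo", "bar"], how="left")
    .set_index(["foo", "bar", "baz"])
)

# Divide all original columns by stuff without naming them one by one
result = merged[df_0.columns].div(merged["stuff"], axis=0)
print(result)
```

Selecting merged[df_0.columns] avoids hard-coding col0 and col1, so the same lines work for any number of value columns.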

3 Comments

If I try df_0.join(df_1), I get the error "NotImplementedError: merging with more than one level overlap on a multi-index is not implemented". I use Pandas version 0.22.0.
What version of pandas do you have, @leviathan? It works for me on pandas 1.
I updated Pandas to Version 1.0.3 and now it works.
