Adding a column to a Multiindex Dataframe

Question

I would like to add a column SUM to the df1 below. It's a Datetime MultiIndex and the new column SUM should return the sum of the price row.

multex = pd.MultiIndex.from_product([['price', 'weight', 'quantity', 'portfolio'], df1.index], names=['Date', 'Stats'])

new_df = pd.DataFrame(index=multex, columns= df1.columns.values.tolist())

Subsequently, I would like to add a row SUM as well that returns the same value. I've tried the following so far:

df1['SUM']= df1.ix['price'].sum(axis=1) 


                           A      B      C        D     E

2006-04-28 00:00:00                                    
            price        69.62  69.62  6.518   65.09  69.62
            weight
            std
2006-05-01 00:00:00  
            price         71.5   71.5  6.522   65.16   71.5
            weight
            std
2006-05-02 00:00:00   
            price        72.34  72.34  6.669   66.55  72.34
            weight
            std

I would update your question so people can easily reproduce your dataframe. Also, what was the result of df1['SUM']= df1.ix['price'].sum(axis=1)?
– Bob Haffner, Commented Sep 19, 2016 at 3:20
Hi, thanks. That's the error that I get: ValueError: shape mismatch: value array of shape (2939,) could not be broadcast to indexing result of shape (2939,2)
– uniXVanXcel, Commented Sep 19, 2016 at 3:46

jezrael · Accepted Answer · 2016-09-19 05:36:29Z

First sort the index with DataFrame.sort_index, because slicing an unsorted MultiIndex raises:

KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (1)'
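Whether an index needs sorting can be checked before slicing. A minimal sketch, assuming a small stand-in frame (the name df, the two dates and the column values are hypothetical, not the asker's data):

```python
import pandas as pd

# Hypothetical stand-in for df1: the inner level order (price, weight, std)
# is deliberately not alphabetical, so the index is not lexsorted.
dates = pd.to_datetime(["2006-04-28", "2006-05-01"])
index = pd.MultiIndex.from_product(
    [dates, ["price", "weight", "std"]], names=["Date", "Stats"]
)
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=index)

before = df.index.is_monotonic_increasing  # False: 'weight' precedes 'std'
df = df.sort_index()                       # sorts on all levels
after = df.index.is_monotonic_increasing   # True: safe to slice on inner levels
```

Sorting once up front is usually cheaper than working around the error at each slicing call.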

Then use slicers:

df1 = df1.sort_index()

idx = pd.IndexSlice
df1['SUM'] = df1.loc[idx[:,'price'],:].sum(axis=1)
print (df1)
                                A      B      C      D      E      SUM
Date                Stats                                             
2006-04-28 00:00:00 price   69.62  69.62  6.518  65.09  69.62  280.468
                    std       NaN    NaN    NaN    NaN    NaN      NaN
                    weight    NaN    NaN    NaN    NaN    NaN      NaN
2006-05-01 00:00:00 price   71.50  71.50  6.522  65.16  71.50  286.182
                    std       NaN    NaN    NaN    NaN    NaN      NaN
                    weight    NaN    NaN    NaN    NaN    NaN      NaN
2006-05-02 00:00:00 price   72.34  72.34  6.669  66.55  72.34  290.239
                    std       NaN    NaN    NaN    NaN    NaN      NaN
                    weight    NaN    NaN    NaN    NaN    NaN      NaN
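For readers who want to reproduce the result above, here is a self-contained sketch. The frame is rebuilt from the values printed in the question; the construction via reindex and the names dates, price and full_index are assumptions for illustration, not how the asker created df1.

```python
import pandas as pd

dates = pd.to_datetime(["2006-04-28", "2006-05-01", "2006-05-02"])
cols = list("ABCDE")

# Price rows as shown in the question.
price = pd.DataFrame(
    [[69.62, 69.62, 6.518, 65.09, 69.62],
     [71.50, 71.50, 6.522, 65.16, 71.50],
     [72.34, 72.34, 6.669, 66.55, 72.34]],
    index=pd.MultiIndex.from_product([dates, ["price"]], names=["Date", "Stats"]),
    columns=cols,
)

# Expand to every (Date, Stats) pair; the missing rows are filled with NaN.
full_index = pd.MultiIndex.from_product(
    [dates, ["price", "weight", "std"]], names=["Date", "Stats"]
)
df1 = price.reindex(full_index)

# Sort first, then select the price rows on the inner level and sum across columns.
df1 = df1.sort_index()
idx = pd.IndexSlice
df1["SUM"] = df1.loc[idx[:, "price"], :].sum(axis=1)
print(df1)
```

The assignment aligns on the index, so only the price rows receive a value and the std and weight rows stay NaN.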

The same selection with explicit slice objects instead of pd.IndexSlice:

df1['SUM'] = df1.loc[(slice(None), slice('price')),:].sum(axis=1)
print (df1)
                                A      B      C      D      E      SUM
Date                Stats                                             
2006-04-28 00:00:00 price   69.62  69.62  6.518  65.09  69.62  280.468
                    std       NaN    NaN    NaN    NaN    NaN      NaN
                    weight    NaN    NaN    NaN    NaN    NaN      NaN
2006-05-01 00:00:00 price   71.50  71.50  6.522  65.16  71.50  286.182
                    std       NaN    NaN    NaN    NaN    NaN      NaN
                    weight    NaN    NaN    NaN    NaN    NaN      NaN
2006-05-02 00:00:00 price   72.34  72.34  6.669  66.55  72.34  290.239
                    std       NaN    NaN    NaN    NaN    NaN      NaN
                    weight    NaN    NaN    NaN    NaN    NaN      NaN
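One caveat about the tuple form: slice('price') is slice(None, 'price'), a label range that runs from the start of the level up to and including 'price'. It matches only the price rows here because 'price' sorts before 'std' and 'weight'. The sketch below uses a hypothetical frame with an extra 'portfolio' label to show the difference from passing the bare label:

```python
import pandas as pd

# Hypothetical frame: 'portfolio' sorts before 'price' on the inner level.
index = pd.MultiIndex.from_product(
    [["2006-04-28", "2006-05-01"], ["portfolio", "price", "weight"]],
    names=["Date", "Stats"],
)
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=index).sort_index()

# Range slice: every inner label up to and including 'price'.
by_range = df.loc[(slice(None), slice("price")), :]

# Bare label: only the rows labelled 'price'.
by_label = df.loc[(slice(None), "price"), :]

print(len(by_range), len(by_label))
```

If the Stats level may contain labels that sort before 'price', the bare label (or idx[:, 'price']) is the safer choice.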

Do you need df1.loc[('', 'SUM'),:] = df1.loc[(slice(None), slice('price')),:].sum() ?
And what about df1.loc[(None, 'SUM'),:] = df1.loc[(slice(None), slice('price')),:].sum() ?
Or:

df2 = df1.loc[(slice(None), slice('price')),:].sum().to_frame().T
df2.index = pd.MultiIndex.from_arrays([[None], ['Sum']])
print (pd.concat([df1,df2]))
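The SUM row discussed in these comments can be sketched as follows. This is one possible approach, not a confirmed solution from the thread: the two-column frame is hypothetical, and using pd.NaT as the Date label of the total row is an assumption made so that the Date level stays datetime-typed.

```python
import pandas as pd

nan = float("nan")
dates = pd.to_datetime(["2006-04-28", "2006-05-01"])
index = pd.MultiIndex.from_product(
    [dates, ["price", "std", "weight"]], names=["Date", "Stats"]
)
# Hypothetical two-column stand-in for df1 (already sorted).
df = pd.DataFrame(
    {"A": [69.62, nan, nan, 71.50, nan, nan],
     "B": [6.518, nan, nan, 6.522, nan, nan]},
    index=index,
)

idx = pd.IndexSlice
# Column-wise totals over the price rows only.
totals = df.loc[idx[:, "price"], :].sum()

# Turn the totals into a one-row frame keyed (NaT, 'SUM'); NaT is a placeholder date.
total_row = totals.to_frame().T
total_row.index = pd.MultiIndex.from_arrays(
    [pd.DatetimeIndex([pd.NaT]), ["SUM"]], names=["Date", "Stats"]
)

out = pd.concat([df, total_row])
print(out)
```

Building the row separately and concatenating avoids assigning to a key that does not yet exist in the MultiIndex.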
