2

Let's say we have dataframe like this

df = pd.DataFrame({ 
                "metric": ["1","2","1" ,"1","2"],
                "group1":["o", "x", "x" , "o", "x"],
                "group2":['a', 'b', 'a', 'a', 'b'] ,
                "value": range(5),
                "value2": np.array(range(5))* 2})

df

    metric  group1  group2  value   value2
0   1         o      a         0    0
1   2         x      b         1    2
2   1         x      a         2    4
3   1         o      a         3    6
4   2         x      b         4    8

then I want to have pivot format

df['g'] = df.groupby(['group1','group2'])['group2'].cumcount()
df1 = df.pivot(index=['g','metric'], columns=['group1','group2'], values=['value','value2']).sort_index(axis=1).rename_axis(columns={'g':None})


            value       value2
   group1   o   x       o   x
   group2   a   a   b   a   a   b
g  metric                       
0   1       0.0 2.0 NaN 0.0 4.0 NaN
    2       NaN NaN 1.0 NaN NaN 2.0
1   1       3.0 NaN NaN 6.0 NaN NaN
    2       NaN NaN 4.0 NaN NaN 8.0

From here we can see that ("value","o","b") and ("value2","o","b") not exist after making pivot

but I need to have those columns with values NA So I tried;

cols = [('value','x','a'), ('value','o','a'),('value','o','b')]

df1.assign(**{col : "NA" for col in np.setdiff1d(cols, df1.columns.values)})

which gives

enter image description here

Expected output

            value           value2
   group1   o       x       o       x
   group2   a   b   a   b   a   b   a   b
g  metric                       
0   1       0.0 NaN 2.0 NaN 0.0 NaN 4.0 NaN
    2       NaN NaN NaN 1.0 NaN NaN NaN 2.0
1   1       3.0 NaN NaN NaN 6.0 NaN NaN NaN
    2       NaN NaN NaN 4.0 NaN NaN NaN 8.0

one corner case with this is that if b does not exist how to create that column ?

           value     value2
   group1   o   x    o  x
   group2   a   a    a  a   
g  metric                       
0   1       0.0 2.0  0.0 4.0    
    2       NaN NaN  NaN NaN    
1   1       3.0 NaN  6.0 NaN    
    2       NaN NaN  NaN NaN    

Multiple insert columns if not exist pandas

Pandas: Check if column exists in df from a list of columns

Pandas - How to check if multi index column exists

1 Answer 1

2

Use DataFrame.stack with DataFrame.unstack:

df1 = df1.stack([1,2],dropna=False).unstack([2,3])
print (df1)
         value               value2              
group1       o        x           o        x     
group2       a   b    a    b      a   b    a    b
g metric                                         
0 1        0.0 NaN  2.0  NaN    0.0 NaN  4.0  NaN
  2        NaN NaN  NaN  1.0    NaN NaN  NaN  2.0
1 1        3.0 NaN  NaN  NaN    6.0 NaN  NaN  NaN
  2        NaN NaN  NaN  4.0    NaN NaN  NaN  8.0

Or with selecting last and last previous levels:

df1 = df1.stack([-2,-1],dropna=False).unstack([-2,-1])

Another idea:

df1 = df1.reindex(pd.MultiIndex.from_product(df1.columns.levels), axis=1)
print (df1)
         value               value2              
group1       o        x           o        x     
group2       a   b    a    b      a   b    a    b
g metric                                         
0 1        0.0 NaN  2.0  NaN    0.0 NaN  4.0  NaN
  2        NaN NaN  NaN  1.0    NaN NaN  NaN  2.0
1 1        3.0 NaN  NaN  NaN    6.0 NaN  NaN  NaN
  2        NaN NaN  NaN  4.0    NaN NaN  NaN  8.0

EDIT:

If need set new columns by list of tuples:

cols = [('value','x','a'), ('value','o','a'),('value','o','b')]

df = df1.reindex(pd.MultiIndex.from_tuples(cols).union(df1.columns), axis=1)
print (df)
         value               value2          
             o        x           o    x     
             a   b    a    b      a    a    b
g metric                                     
0 1        0.0 NaN  2.0  NaN    0.0  4.0  NaN
  2        NaN NaN  NaN  1.0    NaN  NaN  2.0
1 1        3.0 NaN  NaN  NaN    6.0  NaN  NaN
  2        NaN NaN  NaN  4.0    NaN  NaN  8.0
Sign up to request clarification or add additional context in comments.

4 Comments

Thanks for quick post. I want to generalize the solution like using for loop. Assume we have 10-20 columns to check if the column exist or not in the pivot table. and if the column combination is not exist create that.
one other corner case I having with the issue is that ifb not exist at all in group2 how can we create that sub column filled with NAs.
@Alexander - So need last EDIT? Input is list of tuples for columns names?
yes, please that's actually one of the issue I'm also facing with this type of dataset. The column not exist in the long format dataframe but I need to show it in the pivot format.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.