Skip to main content
deleted 2 characters in body
Source Link
Reinderien
  • 71.1k
  • 5
  • 76
  • 256

Almost certainly an X/Y problem: you're trying hard to pad-interleave NaN values when you shouldn't be doing that at all, and theThe premise for your problem assumes a structure for your dataframe that you shouldn't use, at least for data processing. It seems you have no influence over the format of the Excel file to which you need to export.

Almost certainly an X/Y problem: you're trying hard to pad-interleave NaN values when you shouldn't be doing that at all, and the premise for your problem assumes a structure for your dataframe that you shouldn't use.

The premise for your problem assumes a structure for your dataframe that you shouldn't use, at least for data processing. It seems you have no influence over the format of the Excel file to which you need to export.

Source Link
Reinderien
  • 71.1k
  • 5
  • 76
  • 256

Almost certainly an X/Y problem: you're trying hard to pad-interleave NaN values when you shouldn't be doing that at all, and the premise for your problem assumes a structure for your dataframe that you shouldn't use.

Think about the "per"s in your frame description: per time, per KPI, per SKU, you start off with some unidentified floating-point value (maybe a price; who knows). Whenever you write "per", that should become a level in a multi-level index. This reduces your dataframe to one column, "Price".

Thinking about your "pad" operation, you're really just adding a second column (of even more unknown purpose):

import pandas as pd

# Our df for this example
dict_df = {
    "Time": [1,1,1,1,2,2,2,2,3,3,3,3],
    "KPI":  ["A","B","C","D","A","B","C","D","A","B","C","D"],
    "SKU1": [10,1,0.1,0.01,40,4,0.4,0.04,90,9,0.9,0.09],
    "SKU2": [20,2,0.2,0.02,50,5,0.5,0.05,100,10,1,0.1],
    "SKU3": [30,3,0.3,0.03,60,6,0.6,0.06,110,11,1.1,0.11],
    "SKU4": [70,7,0.7,0.07,80,8,0.8,0.08,120,12,1.2,0.12]
}
df_test = pd.DataFrame(dict_df)

# Reshape data to be in stacked, multi-index form
df_test.set_index(['Time', 'KPI'], inplace=True)
df_test.columns = df_test.columns.str.slice(3).astype(int)
df_test = df_test.stack()
df_test.index.set_names('SKU', level=-1, inplace=True)
df_test = pd.DataFrame({'Price': df_test})

filter_kpi = {"A", "B", "C"}
df_test = df_test[df_test.index.get_level_values('KPI').isin(filter_kpi)]

df_test['Mystery'] = pd.NA
print(df_test)
              Price Mystery
Time KPI SKU               
1    A   1     10.0    <NA>
         2     20.0    <NA>
         3     30.0    <NA>
         4     70.0    <NA>
     B   1      1.0    <NA>
         2      2.0    <NA>
         3      3.0    <NA>
         4      7.0    <NA>
     C   1      0.1    <NA>
         2      0.2    <NA>
         3      0.3    <NA>
         4      0.7    <NA>
2    A   1     40.0    <NA>
         2     50.0    <NA>
         3     60.0    <NA>
         4     80.0    <NA>
     B   1      4.0    <NA>
         2      5.0    <NA>
         3      6.0    <NA>
         4      8.0    <NA>
     C   1      0.4    <NA>
         2      0.5    <NA>
         3      0.6    <NA>
         4      0.8    <NA>
3    A   1     90.0    <NA>
         2    100.0    <NA>
         3    110.0    <NA>
         4    120.0    <NA>
     B   1      9.0    <NA>
         2     10.0    <NA>
         3     11.0    <NA>
         4     12.0    <NA>
     C   1      0.9    <NA>
         2      1.0    <NA>
         3      1.1    <NA>
         4      1.2    <NA>