I have the following situation:

COD Level
UF  11
ME  1101
MI  11001
MU  1100452
MU  1100700
MI  11002
MU  1100080
MU  1100106
MU  1101492
ME  1102
MI  11003
MU  1100403
MU  1100023
UF  12
ME  1201
MI  12001
MU  1100122

.... (7000 rows)

----------- Interpretation

UF - 2 digits (level 3, the highest)
ME - 4 digits (level 2)
MI - 5 digits (level 1)
MU - 7 digits (level 0, the lowest)

I am trying to reorganize this structure so that each level gets its own column:

Expected output:

COD  Level_0  Level_1  Level_2  Level_3
MU   1100452    11001     1101       11
MU   1100700    11001     1101       11
MU   1100080    11002     1101       11
MU   1100106    11002     1101       11
MU   1101492    11002     1101       11
MU   1100403    11003     1102       11
MU   1100023    11003     1102       11
MU   1100122    12001     1201       12

So the value of each level stays the same until another row of that same level appears.

1 Answer

Something like this?

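In [48]: (pd.pivot(df, columns='COD', values='Level')
    ...:    .ffill()  # fillna(method='ffill') is deprecated in recent pandas
    ...:    .drop_duplicates('MU')  # keep the first row for each MU code
    ...:    .dropna()  # drop leading rows that have no MU yet
    ...:    .astype(int)
    ...:    .rename(columns={'UF': 'level_3', 'ME': 'level_2', 'MI': 'level_1', 'MU': 'level_0'}))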
Out[48]: 
COD  level_2  level_1  level_0  level_3
3       1101    11001  1100452       11
4       1101    11001  1100700       11
6       1101    11002  1100080       11
7       1101    11002  1100106       11
8       1101    11002  1101492       11
11      1102    11003  1100403       11
12      1102    11003  1100023       11
16      1201    12001  1100122       12

pd.pivot creates a separate column for each value in 'COD'. At this point every row holds a single value and all other columns are NA. A forward fill then propagates the values of the upper levels down the rows, and dropna removes the first few rows that still contain NA because no level_0 value has been seen yet. The rest (astype and rename) is only there to conform to your expected output.
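The steps described above can be followed one at a time with the sample rows from the question. This is a minimal sketch, assuming the data sits in a DataFrame named df with the columns COD and Level; the intermediate names wide and filled are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question; the real frame has about 7000 rows.
rows = [
    ("UF", 11), ("ME", 1101), ("MI", 11001), ("MU", 1100452), ("MU", 1100700),
    ("MI", 11002), ("MU", 1100080), ("MU", 1100106), ("MU", 1101492),
    ("ME", 1102), ("MI", 11003), ("MU", 1100403), ("MU", 1100023),
    ("UF", 12), ("ME", 1201), ("MI", 12001), ("MU", 1100122),
]
df = pd.DataFrame(rows, columns=["COD", "Level"])

# Step 1: one column per COD value; each row holds a single non-NA cell.
wide = df.pivot(columns="COD", values="Level")

# Step 2: forward fill so every row sees the most recent UF, ME, MI and MU.
filled = wide.ffill()

# Step 3: keep the first row per MU code, drop the leading rows that still
# have no MU, restore the integer dtype and rename to the requested headers.
result = (
    filled.drop_duplicates("MU")
    .dropna()
    .astype(int)
    .rename(columns={"UF": "level_3", "ME": "level_2", "MI": "level_1", "MU": "level_0"})
)
print(result)
```

Printing wide and filled between the steps makes it easy to see where each NA comes from and which rows the last step discards.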

EDIT: Added drop_duplicates so that an old level_0 value, carried forward by the fill, does not show up again in extra rows when a higher level (for example level_3) changes.
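One caveat with drop_duplicates: if the same MU code could legitimately occur under two different parents, the second occurrence would be discarded as well. A possible alternative, which is not part of the answer above, is to select the rows that were MU rows in the input. The sketch below uses made-up data in which the code 1100452 repeats under a second UF.

```python
import pandas as pd

# Hypothetical data: the MU code 1100452 appears under UF 11 and under UF 12.
df = pd.DataFrame(
    {
        "COD": ["UF", "ME", "MI", "MU", "UF", "ME", "MI", "MU"],
        "Level": [11, 1101, 11001, 1100452, 12, 1201, 12001, 1100452],
    }
)

filled = df.pivot(columns="COD", values="Level").ffill()
names = {"UF": "level_3", "ME": "level_2", "MI": "level_1", "MU": "level_0"}

# Approach from the answer: the repeated MU code survives only once.
by_dedup = filled.drop_duplicates("MU").dropna().astype(int).rename(columns=names)

# Alternative: keep exactly the rows whose COD was MU in the input.
is_mu = df["COD"].eq("MU")
by_mask = filled.loc[is_mu].astype(int).rename(columns=names)

print(len(by_dedup), len(by_mask))
```

The mask relies on the pivoted frame sharing its index with df, which holds here because pivot is called without an index argument.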

3 Comments

Hi @maow, thanks for your answer. The 1100023 row, for example, should have level_3 equal to 12. One solution would be to remove duplicated level_0 values, keeping only the last row.
You're right, I updated the answer. I think you should keep the first row though.
You are correct, keep='first' will generate the right output.
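The difference between the two keep options discussed in these comments can be checked on a shortened version of the question's sample. This is only a sketch; the helper name flatten is hypothetical and not taken from the answer.

```python
import pandas as pd

# Tail of the question's sample, with the UF 11 row restored on top.
df = pd.DataFrame(
    {
        "COD": ["UF", "ME", "MI", "MU", "MU", "UF", "ME", "MI", "MU"],
        "Level": [11, 1102, 11003, 1100403, 1100023, 12, 1201, 12001, 1100122],
    }
)


def flatten(frame, keep):
    """Pivot, forward fill, then keep one row per MU code (hypothetical helper)."""
    names = {"UF": "level_3", "ME": "level_2", "MI": "level_1", "MU": "level_0"}
    return (
        frame.pivot(columns="COD", values="Level")
        .ffill()
        .drop_duplicates("MU", keep=keep)
        .dropna()
        .astype(int)
        .rename(columns=names)
    )


first = flatten(df, "first")
last = flatten(df, "last")

# With keep="first" the surviving row is the MU row itself, so its parents
# are the ones above it. With keep="last" the surviving row is the last
# parent row before the next MU, so 1100023 picks up the following UF.
print(first[first["level_0"] == 1100023])
print(last[last["level_0"] == 1100023])
```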
