
I have a pandas dataframe that looks like this:

     id             name  total  cubierto  no_cubierto  escuela_id  nivel_id 
0   1        direccion      1         1            0   420000707         1   
1   2  frente_a_alunos      4         4            0   420000707         1   
2   3            apoyo      2         2            0   420000707         1   
3   4        direccion      2         2            0   840477414         2   
4   5  frente_a_alunos      8         8            0   840477414         2   
5   6            apoyo      4         3            1   840477414         2   
6   7        direccion      7         7            0   918751515         3   
7   8            apoyo     37        37            0   918751515         3   
8   9        direccion      1         1            0   993683216         1   
9  10  frente_a_alunos      7         7            0   993683216         1   

The column "name" has 3 unique values:

 - direccion
 - frente_a_alunos
 - apoyo

and I need to get a new dataframe, grouped by "escuela_id" and "nivel_id" that has the columns:

 - direccion_total
 - direccion_cubierto
 - frente_a_alunos_total
 - frente_a_alunos_cubierto
 - apoyo_total
 - apoyo_cubierto
 - escuela_id
 - nivel_id

with the values taken from the "total" and "cubierto" columns respectively. I don't need the "no_cubierto" column. Is it possible to do this with pandas functions? I am stuck and couldn't find a solution.

The output for the example should look like this:

   escuela_id  nivel_id  apoyo_cubierto  apoyo_total  direccion_total  direccion_cubierto  frente_a_alunos_total  frente_a_alunos_cubierto
0   420000707         1               2            2                1                   1                      4                         4
1   840477414         2               3            4                2                   2                      8                         8
2   918751515         3              37           37                7                   7                     ..                        ..
3   993683216         1              ..           ..                1                   1                      7                         7
  • Pandas has a groupby() function for this. Commented Jun 8, 2020 at 21:58
  • Show us your code and we can help you out with some feedback and guidance Commented Jun 8, 2020 at 21:59

1 Answer

You need to use pivot_table here:

# pivot_table averages duplicates by default; each (escuela_id, nivel_id, name) is unique here
df = df.pivot_table(index=['escuela_id', 'nivel_id'], columns='name', values=['total', 'cubierto']).reset_index()
# Flatten the two-level column labels; strip('_') drops the trailing underscore on the index columns
df.columns = ['_'.join(col).strip('_') for col in df.columns.values]
print(df)

Output:

    escuela_id   nivel_id  cubierto_apoyo  cubierto_direccion  cubierto_frente_a_alunos  total_apoyo  total_direccion  total_frente_a_alunos
0    420000707          1             2.0                 1.0                       4.0          2.0              1.0                    4.0
1    840477414          2             3.0                 2.0                       8.0          4.0              2.0                    8.0
2    918751515          3            37.0                 7.0                       NaN         37.0              7.0                    NaN
3    993683216          1             NaN                 1.0                       7.0          NaN              1.0                    7.0
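
The question asks for column names in name_metric order (for example apoyo_cubierto), whereas joining the pivoted labels directly gives metric_name order (cubierto_apoyo). Below is a minimal, self-contained sketch of one way to get the requested names. It rebuilds the sample frame from the question; the variable names wide and result are illustrative, and aggfunc="sum" is an assumption that only matters if a school/level/name combination appears more than once.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": range(1, 11),
    "name": ["direccion", "frente_a_alunos", "apoyo",
             "direccion", "frente_a_alunos", "apoyo",
             "direccion", "apoyo",
             "direccion", "frente_a_alunos"],
    "total": [1, 4, 2, 2, 8, 4, 7, 37, 1, 7],
    "cubierto": [1, 4, 2, 2, 8, 3, 7, 37, 1, 7],
    "no_cubierto": [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    "escuela_id": [420000707] * 3 + [840477414] * 3
                  + [918751515] * 2 + [993683216] * 2,
    "nivel_id": [1] * 3 + [2] * 3 + [3] * 2 + [1] * 2,
})

# One row per (escuela_id, nivel_id), one column per (metric, name) pair.
wide = df.pivot_table(
    index=["escuela_id", "nivel_id"],
    columns="name",
    values=["total", "cubierto"],
    aggfunc="sum",  # assumption: sum duplicates instead of averaging them
)

# Column labels are (metric, name) tuples; flip them to "name_metric".
wide.columns = [f"{name}_{metric}" for metric, name in wide.columns]

# Turn the grouping keys back into ordinary columns.
result = wide.reset_index()
print(result)
```

Combinations that do not exist in the input, such as frente_a_alunos for school 918751515, come out as NaN, which also makes the value columns float. If zeros and integer columns are preferable, passing fill_value=0 to pivot_table is one option.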

1 Comment

I just posted an example showing what the output should look like.
