How can I create a new data frame based on the existing columns?

Question

How can I create a new data frame based on the existing columns? It should calculate the average of column 'a' for each distinct value of x. For example, for x=1, a_new is the sum of the 'a' values in the rows where x=1, divided by 3 (the number of such rows). The same goes for x=2, x=3, and so on.

import pandas as pd
data = {'x': [ 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4], 'a': [0.4, 0.88, 0.2, 0.1, 0.75, 0.98, 0.33, 0.22, 0.15, 0.14, 0.73, 0.25], 'year': [2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2002, 2002, 2002, 2002]}   
df = pd.DataFrame(data)
df

    x    a      year
0   1   0.40    2000
1   2   0.88    2000
2   3   0.20    2000
3   4   0.10    2000
4   1   0.75    2001
5   2   0.98    2001
6   3   0.33    2001
7   4   0.22    2001
8   1   0.15    2002
9   2   0.14    2002
10  3   0.73    2002
11  4   0.25    2002

Expected Output:

    x   a_new
0   1   0.43
1   2   0.66
2   3   0.42
3   4   0.19
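The rule stated in the question (add up the 'a' values for each x, then divide by how many rows there are for that x) can be written out step by step. This is a minimal sketch that reuses the question's sample data. It divides by the actual row count per group instead of a hard-coded 3, so it would still be correct if some x value appeared a different number of times.

```python
import pandas as pd

# Sample data copied from the question
data = {
    'x': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
    'a': [0.4, 0.88, 0.2, 0.1, 0.75, 0.98, 0.33, 0.22, 0.15, 0.14, 0.73, 0.25],
    'year': [2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001,
             2002, 2002, 2002, 2002],
}
df = pd.DataFrame(data)

# Total of 'a' for each distinct x
sums = df.groupby('x')['a'].sum()
# Number of rows contributing to each total (3 per x in this data)
counts = df.groupby('x')['a'].count()

# Average = total / count; name the result 'a_new' and move x back into a column
a_new = (sums / counts).rename('a_new').reset_index()
print(a_new.round(2))
```

For this data the averages come out to roughly 0.43, 0.67, 0.42 and 0.19 for x = 1, 2, 3, 4.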

Look into pandas DataFrame.groupby(); it will do what you need.

– Emi OB, Commented Apr 13, 2022 at 14:17

optical_anathema · Accepted Answer · 2022-04-13 14:16:38Z

This might be what you're after.

df.groupby('x')['a'].mean()

x
1    0.433333
2    0.666667
3    0.420000
4    0.190000
Name: a, dtype: float64
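The result above is a Series indexed by x. To get the two-column data frame shown in the question's expected output (columns x and a_new with a fresh 0..n index), one option is to chain Series.reset_index with its name argument, which moves the index into a regular column and names the values column. A short sketch, again reusing the question's data:

```python
import pandas as pd

# Sample data copied from the question
data = {
    'x': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
    'a': [0.4, 0.88, 0.2, 0.1, 0.75, 0.98, 0.33, 0.22, 0.15, 0.14, 0.73, 0.25],
    'year': [2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001,
             2002, 2002, 2002, 2002],
}
df = pd.DataFrame(data)

# Mean of 'a' per x (a Series indexed by x), then convert it to a DataFrame:
# reset_index moves x into a regular column and names the values 'a_new'
result = df.groupby('x')['a'].mean().reset_index(name='a_new')
print(result.round(2))
```

If you instead want to keep every original row and put the group average next to it, df.groupby('x')['a'].transform('mean') returns a Series aligned with df's rows that can be assigned as a new column.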
