Create new column that is the sum of two rows, but repeat every two rows [duplicate]

Question

I am working on building an additional column in a data frame which is the sum of two rows for one Time Period. A picture is attached here:

I want to create a new column which is the Sum of Lives for 'IN' and 'SA' in the 'BillType' column for each TimePeriodId. That way I will have one 'Lives Total' entry for a single TimePeriodId. I have gone through a lot of documentation and cant figure out how I would do that in this case.

code sample:

sa = pd.read_sql(sa_q1, sql_conn)

#convert TimePeriodId to string values

sa['TimePeriodId'] = sa['TimePeriodId'].astype(str)

sa = sa.loc[(sa['BillType'] =='SA') | (sa['BillType']=='IN')]#.drop(['BillType'], axis = 1)

sa.head(10).to_dict()

#the last line returns the following:

{'TimePeriodId': {1: '201811',
  2: '201811',
  4: '201812',
  5: '201812',
  9: '201901',
  11: '201901',
  13: '201902',
  14: '201902',
  17: '201903',
  18: '201903'},
 'BillType': {1: 'IN',
  2: 'SA',
  4: 'IN',
  5: 'SA',
  9: 'SA',
  11: 'IN',
  13: 'IN',
  14: 'SA',
  17: 'IN',
  18: 'SA'},
 'Lives': {1: 1067,
  2: 288028,
  4: 1058,
  5: 287501,
  9: 293560,
  11: 1068,
  13: 1089,
  14: 278850,
  17: 1076,
  18: 276961}}

Any help would be appreciated!

Please include the input as text in the question. Also, please include te expected output. — Roy2012
– Roy2012, Commented Jul 21, 2020 at 12:57
I might try to post an answer, if you gave the data as copyable text instead of an image... — Serge Ballesta
– Serge Ballesta, Commented Jul 21, 2020 at 12:57
Sorry, im new to python and am unsure about what you mean by "input as text"? — KeithRoberts
– KeithRoberts, Commented Jul 21, 2020 at 12:58
@KeithRoberts Post df.to_dict() to the question, if df is large post df.head(10).to_dict() and expected output. This makes reproducing your data locally easy. — Ch3steR
– Ch3steR, Commented Jul 21, 2020 at 12:59

Jaroslav Bezděk · Accepted Answer · 2020-07-21 13:02:41Z

1

You can try to use pandas.DataFrame.groupby() method to compute sum of lives for every time period. After that you can enrich sa dataframe by the computed column using pandas.DataFrame.transform() method.

>>> sa['LivesTotal'] = sa.groupby('TimePeriodId').Lives.transform('sum')

answered Jul 21, 2020 at 13:02

Jaroslav Bezděk

7,7156 gold badges33 silver badges59 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

KeithRoberts Over a year ago

This works well but I end up with duplicates. I use.drop_duplicates() and get nan values. This is okay for me so long as I can build a graph from the df which doesn't take in the nan values

jezrael Over a year ago

@JaroslavBezděk - hmmm, if not dupe it is OK ask for accepting, for dupes the best remve answer...

Jaroslav Bezděk Over a year ago

Sorry, did not notice.

Collectives™ on Stack Overflow

Create new column that is the sum of two rows, but repeat every two rows [duplicate]

1 Answer 1

3 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

3 Comments

Linked

Related