0

I am working on building an additional column in a data frame which is the sum of two rows for one Time Period. A picture is attached here:

enter image description here

I want to create a new column which is the Sum of Lives for 'IN' and 'SA' in the 'BillType' column for each TimePeriodId. That way I will have one 'Lives Total' entry for a single TimePeriodId. I have gone through a lot of documentation and cant figure out how I would do that in this case.

code sample:

sa = pd.read_sql(sa_q1, sql_conn)

#convert TimePeriodId to string values

sa['TimePeriodId'] = sa['TimePeriodId'].astype(str)

sa = sa.loc[(sa['BillType'] =='SA') | (sa['BillType']=='IN')]#.drop(['BillType'], axis = 1)

sa.head(10).to_dict()

#the last line returns the following:

{'TimePeriodId': {1: '201811',
  2: '201811',
  4: '201812',
  5: '201812',
  9: '201901',
  11: '201901',
  13: '201902',
  14: '201902',
  17: '201903',
  18: '201903'},
 'BillType': {1: 'IN',
  2: 'SA',
  4: 'IN',
  5: 'SA',
  9: 'SA',
  11: 'IN',
  13: 'IN',
  14: 'SA',
  17: 'IN',
  18: 'SA'},
 'Lives': {1: 1067,
  2: 288028,
  4: 1058,
  5: 287501,
  9: 293560,
  11: 1068,
  13: 1089,
  14: 278850,
  17: 1076,
  18: 276961}}

Any help would be appreciated!

4
  • 3
    Please include the input as text in the question. Also, please include te expected output. Commented Jul 21, 2020 at 12:57
  • 1
    I might try to post an answer, if you gave the data as copyable text instead of an image... Commented Jul 21, 2020 at 12:57
  • Sorry, im new to python and am unsure about what you mean by "input as text"? Commented Jul 21, 2020 at 12:58
  • 1
    @KeithRoberts Post df.to_dict() to the question, if df is large post df.head(10).to_dict() and expected output. This makes reproducing your data locally easy. Commented Jul 21, 2020 at 12:59

1 Answer 1

1

You can try to use pandas.DataFrame.groupby() method to compute sum of lives for every time period. After that you can enrich sa dataframe by the computed column using pandas.DataFrame.transform() method.

>>> sa['LivesTotal'] = sa.groupby('TimePeriodId').Lives.transform('sum')
Sign up to request clarification or add additional context in comments.

3 Comments

This works well but I end up with duplicates. I use.drop_duplicates() and get nan values. This is okay for me so long as I can build a graph from the df which doesn't take in the nan values
@JaroslavBezděk - hmmm, if not dupe it is OK ask for accepting, for dupes the best remve answer...
Sorry, did not notice.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.