    yearCount = df[['antibiotic', 'order_date', 'antiYearCount']]

    yearGroups = yearCount.groupby('order_date')

    for year in yearGroups:
        yearCount['antiYearCount'] = year.groupby('antibiotic')['antibiotic'].transform(pd.Series.value_counts)

In this case, yearCount is a dataframe containing 'order_date', 'antibiotic' and 'antiYearCount'. I have cleaned 'order_date' so that it contains only the year of the order. I want to group yearCount by the years in 'order_date', count the number of times each 'antibiotic' appears in each "year group", then assign that value to yearCount's 'antiYearCount' column.

1 Answer


I think you need to add the order_date column to the groupby. It is then also possible to use size instead of pd.Series.value_counts for the same output:

df = pd.DataFrame({'antibiotic':list('accbbb'),
                   'antiYearCount':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'order_date': pd.to_datetime(['2012-01-01']*3+['2012-01-02']*3)})

print (df)
   C  D  E  antiYearCount antibiotic order_date
0  7  1  5              4          a 2012-01-01
1  8  3  3              5          c 2012-01-01
2  9  5  6              4          c 2012-01-01
3  4  7  9              5          b 2012-01-02
4  2  1  2              5          b 2012-01-02
5  3  0  4              4          b 2012-01-02

# copy to avoid the SettingWithCopyWarning
#https://stackoverflow.com/a/45035966/2901002
yearCount = df[['antibiotic', 'order_date', 'antiYearCount']].copy()
yearCount['antiYearCount'] = yearCount.groupby(['order_date','antibiotic'])['antibiotic'] \
                                      .transform('size')
print (yearCount)
  antibiotic order_date  antiYearCount
0          a 2012-01-01              1
1          c 2012-01-01              2
2          c 2012-01-01              2
3          b 2012-01-02              3
4          b 2012-01-02              3
5          b 2012-01-02              3

yearCount['antiYearCount'] = yearCount.groupby(['order_date','antibiotic'])['antibiotic'] \
                                      .transform(pd.Series.value_counts)
print (yearCount)
  antibiotic order_date  antiYearCount
0          a 2012-01-01              1
1          c 2012-01-01              2
2          c 2012-01-01              2
3          b 2012-01-02              3
4          b 2012-01-02              3
5          b 2012-01-02              3
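
The example above groups on full dates, whereas the question groups on years. As a minimal sketch, assuming 'order_date' still holds full timestamps (the orders frame and its values below are hypothetical, not from the question), the year can be derived with .dt.year and passed to groupby alongside the column name. The same keys also give a one-row-per-group summary via size():

```python
import pandas as pd

# Hypothetical data with full timestamps spanning two years
orders = pd.DataFrame({
    'antibiotic': ['a', 'c', 'c', 'b', 'b', 'c'],
    'order_date': pd.to_datetime([
        '2012-01-01', '2012-03-15', '2012-11-30',
        '2013-02-01', '2013-06-10', '2013-07-04',
    ]),
})

# Derive the year; groupby accepts a Series as a grouping key
year = orders['order_date'].dt.year.rename('year')

# Per-row count: each row gets the size of its (year, antibiotic) group
orders['antiYearCount'] = (
    orders.groupby([year, 'antibiotic'])['antibiotic'].transform('size')
)

# One row per (year, antibiotic) pair instead of one value per original row
summary = orders.groupby([year, 'antibiotic']).size().reset_index(name='count')

print(orders)
print(summary)
```

If 'order_date' already contains only the year, as in the question, the column name can be used directly in place of the derived year Series.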

3 Comments

jezrael has a great answer; I was able to fix it in the meantime. Iterating over a groupby yields tuples: in this case year[0] is the year being iterated on and year[1] is the df whose order_date values equal year[0]. By doing the same groupby function on year[1]
Hmmm, I think the loop is not necessary, so your solution can be changed to mine?
Right! It was running in under a second on 1.7M rows, so I didn't consider it. Thanks so much!
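
To illustrate the point discussed in these comments, here is a rough sketch comparing a loop over the groups with the single two-column groupby. The sample frame and the names loop_counts and vector_counts are made up for illustration, and the loop is only a guess at what the asker's fixed version might look like, not their actual code:

```python
import pandas as pd

# Hypothetical sample data; 'order_date' already holds only the year
df = pd.DataFrame({
    'antibiotic': list('accbbb'),
    'order_date': [2012, 2012, 2012, 2013, 2013, 2013],
})

# Loop version: each iteration yields a (group key, sub-DataFrame) tuple
loop_counts = pd.Series(0, index=df.index)
for year, group in df.groupby('order_date'):
    # Count antibiotics within this year's rows and write the result
    # back to the matching positions of the full-length Series
    loop_counts.loc[group.index] = (
        group.groupby('antibiotic')['antibiotic'].transform('size')
    )

# Vectorised version: group on both columns at once, no Python loop
vector_counts = (
    df.groupby(['order_date', 'antibiotic'])['antibiotic'].transform('size')
)

print(loop_counts.tolist())
print(vector_counts.tolist())
```

Both approaches produce the same counts here; the two-column groupby avoids the explicit loop and the manual write-back by index.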
