I have a df that looks like this:

event    response.   duration
0           5          1.1
0           4          0.5
1           5          3.2
0           6          1.2
0           5          2.1
0           5          3.2
1           5          0.9
0           4          1.1
0           4          1.2
0           4          3.1
0           5          0.4
0           5          0.9 
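
For anyone who wants to reproduce this, here is a minimal sketch that builds the frame above. It assumes the second column is literally named `response.` (with the trailing dot), as in the printed header.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Assumption: the column name keeps its trailing dot ("response.").
df = pd.DataFrame({
    "event":     [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    "response.": [5, 4, 5, 6, 5, 5, 5, 4, 4, 4, 5, 5],
    "duration":  [1.1, 0.5, 3.2, 1.2, 2.1, 3.2, 0.9, 1.1, 1.2, 3.1, 0.4, 0.9],
})

print(df)
```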

If df.event is 1, the event of interest has occurred. When the event occurs, I want to look at the responses in the next 2 rows and select the one with the greatest duration. I want to store this in a new column, responseType, which holds that response on the row where it occurs and NaN everywhere else.

It should look like this:

event    response.   duration   responseType
0           5          1.1         NaN
0           4          0.5         NaN
1           5          3.2         NaN
0           6          2.2         6
0           5          1.1         NaN
0           5          3.2         NaN
1           5          0.9         NaN
0           4          1.1         NaN
0           4          1.2         4
0           4          3.1         NaN
0           5          0.4         NaN
0           5          0.9         NaN

1 Answer

You can build a couple of boolean conditions and then assign the result in a number of ways. Here I'll use .idxmax with a .groupby and assign using .loc:

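import pandas as pd

# Number each block: every event row starts a new block (0 = rows before the first event).
con1 = df['event'].eq(1).cumsum()
# Position of each row within its block; the event row itself is position 0.
con2 = df.groupby(con1).cumcount()
# Keep the event row plus the two rows after it; 'ky' records the block each row belongs to.
s = df.assign(ky=con1).loc[(con1 > 0) & (con2 <= 2)]

# Drop the event rows, find the index of the longest duration in each block,
# and copy the response from those rows into the new column.
df.loc[s[s['event'].ne(1)].groupby('ky')['duration'].idxmax(), 'responseType'] = df['response.']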


print(df)

    event  response.  duration  responseType
0       0          5       1.1           NaN
1       0          4       0.5           NaN
2       1          5       3.2           NaN
3       0          6       1.2           NaN
4       0          5       2.1           5.0
5       0          5       3.2           NaN
6       1          5       0.9           NaN
7       0          4       1.1           NaN
8       0          4       1.2           4.0
9       0          4       3.1           NaN
10      0          5       0.4           NaN
11      0          5       0.9           NaN

print(s)

   event  response.  duration  ky
2      1          5       3.2   1
3      0          6       1.2   1
4      0          5       2.1   1
6      1          5       0.9   2
7      0          4       1.1   2
8      0          4       1.2   2
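
To make the role of `ky` and `duration` easier to see, the same idea can be wrapped in a small function. This is only a sketch: the name `add_response_type` and the `window` parameter are made up for illustration and are not part of pandas, and it assumes every event has at least one following row with a non-missing duration.

```python
import pandas as pd

def add_response_type(df, window=2):
    """Hypothetical helper: mark the longest-duration response after each event."""
    out = df.copy()
    # Block number per row: increases by 1 at every event row (this is 'ky' above).
    block = out["event"].eq(1).cumsum()
    # Position inside the block; 0 is the event row itself.
    pos = out.groupby(block).cumcount()
    # Rows that fall in the `window` rows following an event.
    follow = (block > 0) & (pos >= 1) & (pos <= window)
    # Index label of the longest duration in each block's follow-up rows.
    idx = out.loc[follow].groupby(block[follow])["duration"].idxmax().to_numpy()
    out["responseType"] = float("nan")
    out.loc[idx, "responseType"] = out.loc[idx, "response."].to_numpy()
    return out

# Sample data from the question (column name "response." assumed to keep its dot).
df = pd.DataFrame({
    "event":     [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    "response.": [5, 4, 5, 6, 5, 5, 5, 4, 4, 4, 5, 5],
    "duration":  [1.1, 0.5, 3.2, 1.2, 2.1, 3.2, 0.9, 1.1, 1.2, 3.1, 0.4, 0.9],
})

result = add_response_type(df)
print(result)
```

On the sample data this should fill rows 4 and 8, matching the printed output above. Working on a copy keeps the original frame untouched, which makes it easier to try different `window` values.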

3 Comments

what does 'ky' do?
I am also confused as to how the duration column is factored into this response. Thanks!
@connor449 sorry, I missed the greatest-duration part; see the edit. I tested it in a few scenarios and it works well. I also think your intended output is wrong; compare it with the output in my answer, which uses your input df.
