The dataset looks something like this (the original data contains duplicate rows):
Code:
import pandas as pd
df_in = pd.DataFrame({'email_ID': {0: 'mail_1',
1: 'mail_1',
2: 'mail_1',
3: 'mail_1',
4: 'mail_1',
5: 'mail_1',
6: 'mail_2',
7: 'mail_2',
8: 'mail_2',
9: 'mail_2',
10: 'mail_2',
11: 'mail_2'},
'time_stamp': {0: '2021-09-10 09:01:56.340259',
1: '2021-09-10 09:01:56.672814',
2: '2021-09-10 09:01:57.471423',
3: '2021-09-10 09:01:57.480891',
4: '2021-09-10 09:01:57.484644',
5: '2021-09-10 09:01:57.984644',
6: '2021-09-10 09:01:56.340259',
7: '2021-09-10 09:01:56.672814',
8: '2021-09-10 09:01:57.471423',
9: '2021-09-10 09:01:57.480891',
10: '2021-09-10 09:01:57.484644',
11: '2021-09-10 09:01:57.984644'},
'screen': {0: 'a',
1: 'b',
2: 'c',
3: 'd',
4: 'c',
5: 'b',
6: 'a',
7: 'b',
8: 'c',
9: 'b',
10: 'c',
11: 'd'}})
df_in['time_stamp'] = df_in['time_stamp'].astype('datetime64[ns]')
df_in
The expected output is:
Code:
import pandas as pd
df_out = pd.DataFrame({'email_ID': {0: 'mail_1',
1: 'mail_1',
2: 'mail_1',
3: 'mail_1',
4: 'mail_1',
5: 'mail_1',
6: 'mail_2',
7: 'mail_2',
8: 'mail_2',
9: 'mail_2',
10: 'mail_2',
11: 'mail_2'},
'time_stamp': {0: '2021-09-10 09:01:56.340259',
1: '2021-09-10 09:01:56.672814',
2: '2021-09-10 09:01:57.471423',
3: '2021-09-10 09:01:57.480891',
4: '2021-09-10 09:01:57.484644',
5: '2021-09-10 09:01:57.984644',
6: '2021-09-10 09:01:56.340259',
7: '2021-09-10 09:01:56.672814',
8: '2021-09-10 09:01:57.471423',
9: '2021-09-10 09:01:57.480891',
10: '2021-09-10 09:01:57.484644',
11: '2021-09-10 09:01:57.984644'},
'screen': {0: 'a',
1: 'b',
2: 'c',
3: 'd',
4: 'c',
5: 'b',
6: 'a',
7: 'b',
8: 'c',
9: 'b',
10: 'c',
11: 'd'},
'series1': {0: 0,
1: 1,
2: 2,
3: 3,
4: 0,
5: 1,
6: 0,
7: 1,
8: 2,
9: 3,
10: 4,
11: 5},
'series2': {0: 0,
1: 0,
2: 0,
3: 0,
4: 1,
5: 1,
6: 2,
7: 2,
8: 2,
9: 2,
10: 2,
11: 2}})
df_out['time_stamp'] = df_out['time_stamp'].astype('datetime64[ns]')
df_out
'series1' counts up row by row (0, 1, 2, and so on) but resets to 0 when:
- the 'email_ID' value changes, or
- the previous row's 'screen' value is 'd' (in mail_1 the 'd' row itself gets 3 and the next row restarts at 0).
'series2' starts at 0 and increments by 1 whenever 'series1' resets.
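The two rules can also be expressed without an explicit loop. This is a minimal sketch, assuming the rows are already in the intended order (grouped by 'email_ID' and sorted by 'time_stamp', which the loops below also rely on): flag every row that starts a new block, take a running count of those flags for 'series2', and number the rows inside each block with `groupby(...).cumcount()` for 'series1'. The helper name `add_series` is made up for illustration. Duplicate rows are counted like any other row, so they would need a `drop_duplicates()` call first if they are not supposed to count.

```python
import pandas as pd

def add_series(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with 'series1' and 'series2' added (vectorized sketch)."""
    out = df.copy()
    # A row starts a new block when its email differs from the previous row's.
    # Row 0 is compared against NaN from shift(), so it always starts a block.
    email_changed = out['email_ID'].ne(out['email_ID'].shift())
    # ...or when the previous row's screen was 'd'.
    after_d = out['screen'].shift().eq('d')
    new_block = email_changed | after_d
    # Running block number starting at 0: this is 'series2'.
    block = new_block.cumsum() - 1
    # Position of each row inside its block: this is 'series1'.
    out['series1'] = out.groupby(block).cumcount()
    out['series2'] = block
    return out

# Same email_ID/screen values as df_in; time_stamp is omitted because
# neither rule uses it.
df_in = pd.DataFrame({
    'email_ID': ['mail_1'] * 6 + ['mail_2'] * 6,
    'screen': list('abcdcb') + list('abcbcd'),
})
df_result = add_series(df_in)
print(df_result)
```

On the sample data this gives the same 'series1' and 'series2' values as df_out.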
My progress:
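df = df_in.copy()  # the loops below work on a copy of df_in
series1 = [0]
x = 0
# index - 1 assumes a default RangeIndex (0, 1, 2, ...)
for index in df.index[1:]:
    # keep counting while the email is unchanged and the previous screen was not 'd'
    if (df.at[index - 1, 'email_ID'] == df.at[index, 'email_ID']) and (df.at[index - 1, 'screen'] != 'd'):
        x += 1
    else:
        x = 0  # reset: new email, or the previous row was a 'd' screen
    series1.append(x)
df['series1'] = series1
df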
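The `index - 1` lookups in the loop assume a default RangeIndex. After `drop_duplicates()` the index has gaps, and `df.at[index - 1, ...]` raises a KeyError for any row whose predecessor label was dropped. A sketch of the same counter that walks the column values in order instead of looking rows up by label (the function name `series1_from_columns` is hypothetical):

```python
import pandas as pd

def series1_from_columns(emails, screens):
    """Counter that resets on an email change or after a 'd' screen."""
    result = []
    prev_email = prev_screen = None
    x = 0
    for email, screen in zip(emails, screens):
        # For the first row prev_email is None, so the comparison fails and x starts at 0.
        if email == prev_email and prev_screen != 'd':
            x += 1
        else:
            x = 0
        result.append(x)
        prev_email, prev_screen = email, screen
    return result

# An index with gaps, like the one drop_duplicates() can leave behind.
df = pd.DataFrame(
    {'email_ID': ['mail_1', 'mail_1', 'mail_1', 'mail_2'],
     'screen': ['a', 'd', 'b', 'a']},
    index=[0, 2, 5, 7],
)
df['series1'] = series1_from_columns(df['email_ID'], df['screen'])
print(df['series1'].tolist())
```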
series2 = [0]
x = 0
for index in df.index[1:]:
    # a step of exactly +1 means series1 kept counting; anything else is a reset
    if df.at[index, 'series1'] - df.at[index - 1, 'series1'] != 1:
        x += 1  # series1 reset, so start a new block number
    series2.append(x)
df['series2'] = series2
df
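Because every reset puts 'series1' back to 0, and 'series1' is 0 nowhere else, 'series2' is simply the number of zeros seen so far minus one. A sketch of that shortcut, using the expected 'series1' values from df_out:

```python
import pandas as pd

series1 = pd.Series([0, 1, 2, 3, 0, 1, 0, 1, 2, 3, 4, 5])
# Each 0 marks the start of a block, so the running count of zeros is the
# block number; subtract 1 so the first block is numbered 0.
series2 = series1.eq(0).cumsum() - 1
print(series2.tolist())
```

This also avoids the `index - 1` lookups, so it does not depend on the index being a RangeIndex.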
I think the code above works. I'll test the answers and accept the best one in a few hours. Thank you.