I'd like to add a new column to my dataset that, for each row, joins all preceding actions that share the same identifier with the action in the current row.
So far I've tried looping through the DataFrame, but this only captures the immediately preceding row, not all earlier rows in the group.
Starting with data like this:
requestTime  identifier  aggregation
38:00.5      123         abc
38:02.2      123         def
38:03.9      123         ghi
38:04.9      456         abc
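To make the example reproducible, here is a minimal sketch that builds this sample as a pandas DataFrame named `trial`, matching the name used in the code that follows. Keeping `requestTime` as plain strings is an assumption, since the original column type isn't shown.

```python
import pandas as pd

# Sample data from the question; requestTime is assumed to be a plain string column
trial = pd.DataFrame({
    "requestTime": ["38:00.5", "38:02.2", "38:03.9", "38:04.9"],
    "identifier": [123, 123, 123, 456],
    "aggregation": ["abc", "def", "ghi", "abc"],
})
print(trial)
```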
This is the code I've tried so far:
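# Copy the previous row's action and identifier onto the current row
trial["newAgg"] = trial["aggregation"].shift(1)
trial["newId"] = trial["identifier"].shift(1)
for index, row in trial.iterrows():
    if row.identifier == row.newId:
        # Same identifier as the previous row: join previous and current action
        trial.at[index, "newAgg"] = row.newAgg + " - " + row.aggregation
    else:
        # New identifier: start again from the current action
        trial.at[index, "newAgg"] = row.aggregation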
which outputs:
requestTime  identifier  aggregation  newAgg     newId
38:00.5      123         abc          abc
38:02.2      123         def          abc - def  123
38:03.9      123         ghi          def - ghi  123
38:04.9      456         abc          abc        456
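The same one-row lookback can be written without a loop, which makes the limitation easier to see: `shift(1)` only ever reaches back a single row, so the third row gets "def - ghi" instead of the full history. This is a sketch that swaps in `groupby(...).shift(1)` for the manual identifier comparison; it assumes the rows for each identifier are contiguous, as in the sample.

```python
import pandas as pd

trial = pd.DataFrame({
    "requestTime": ["38:00.5", "38:02.2", "38:03.9", "38:04.9"],
    "identifier": [123, 123, 123, 456],
    "aggregation": ["abc", "def", "ghi", "abc"],
})

# Previous action within the same identifier; NaN for the first row of each group
prev = trial.groupby("identifier")["aggregation"].shift(1)

# Join previous and current action; where there is no previous action, keep the current one
trial["newAgg"] = (prev + " - " + trial["aggregation"]).fillna(trial["aggregation"])
print(trial)
```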
But I'd like the output to be as follows:
requestTime  identifier  aggregation  newAgg           newId
38:00.5      123         abc          abc
38:02.2      123         def          abc - def        123
38:03.9      123         ghi          abc - def - ghi  123
38:04.9      456         abc          abc              456
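One possible way to get the desired `newAgg` column is to group by `identifier` and build a running join of the actions within each group, using `itertools.accumulate` inside `transform`. This is a sketch, not a definitive answer: `running_join` is a hypothetical helper, and it assumes the rows are already in chronological order within each identifier (if not, sort by `requestTime` first). The helper `newId` column isn't needed for this approach, so it is omitted.

```python
from itertools import accumulate

import pandas as pd

trial = pd.DataFrame({
    "requestTime": ["38:00.5", "38:02.2", "38:03.9", "38:04.9"],
    "identifier": [123, 123, 123, 456],
    "aggregation": ["abc", "def", "ghi", "abc"],
})


def running_join(actions: pd.Series, sep: str = " - ") -> pd.Series:
    """Hypothetical helper: return 'a', 'a - b', 'a - b - c', ... for one group, in row order."""
    joined = accumulate(actions, lambda so_far, action: so_far + sep + action)
    # Keep the group's original index so transform can align results back to each row
    return pd.Series(list(joined), index=actions.index)


# transform calls running_join once per identifier and returns a column aligned with trial
trial["newAgg"] = trial.groupby("identifier")["aggregation"].transform(running_join)
print(trial)
```

Because `accumulate` carries the joined string forward, each row sees every earlier action in its group rather than just the previous one.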