0

I have a df where there are duplicate rows in aggregate but in this form:

timestamp   animal_1  animal_2  
2020-06-28  14:28:57  dog fox    
2020-06-28  14:28:57  fox dog   
2020-06-29  18:28:57  dog fox   
2020-06-29  18:28:57  fox dog   
2020-06-30  17:35:57  dog fox   
2020-06-30  17:35:57  fox dog  

I only want to keep the rows that have a unique timestamp followed by a single combination of both animals. From the above df I would only want to return the following:

timestamp   animal_1  animal_2   
2020-06-28  14:28:57  dog fox    
2020-06-29  18:28:57  fox dog  
2020-06-30  17:35:57  dog fox  

What matters is that I return the number of times these 2 animals have interacted.

I have tried multiple sorting, grouping options using pandas but have had no luck.

1

1 Answer 1

1

First we need sort the column animals , the drop_duplicates

df[['animal_1', 'animal_2']]=np.sort(df[['animal_1', 'animal_2']].values, axis=1)
df=df.drop_duplicates()
Sign up to request clarification or add additional context in comments.

1 Comment

thanks for the code, I tested it and it works perfectly. Appreciate the help.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.