Pandas dataframe - duplicates in data but dups don't reside in same columns

Question

I have a df whose rows are duplicates in aggregate, but the duplicated values sit in swapped columns, like this:

timestamp            animal_1  animal_2
2020-06-28 14:28:57  dog       fox
2020-06-28 14:28:57  fox       dog
2020-06-29 18:28:57  dog       fox
2020-06-29 18:28:57  fox       dog
2020-06-30 17:35:57  dog       fox
2020-06-30 17:35:57  fox       dog

I only want to keep one row per timestamp and pair of animals, regardless of which animal appears in which column. From the above df, I would want to return only the following:

timestamp            animal_1  animal_2
2020-06-28 14:28:57  dog       fox
2020-06-29 18:28:57  fox       dog
2020-06-30 17:35:57  dog       fox

What matters is that I return the number of times these 2 animals have interacted.

I have tried multiple sorting and grouping options in pandas but have had no luck.

Does this answer your question? Drop all duplicate rows in Python Pandas
– sushanth, Commented Jul 12, 2020 at 15:57

BENY · Accepted Answer · 2020-07-12 15:57:04Z


First, sort the two animal columns within each row, then call drop_duplicates:

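import numpy as np
# Sort each row's animal pair so that (fox, dog) and (dog, fox) become identical
df[['animal_1', 'animal_2']] = np.sort(df[['animal_1', 'animal_2']].values, axis=1)
df = df.drop_duplicates()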
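
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same sort-then-drop approach. The variable names `pair_cols`, `deduped`, and `interaction_counts` are illustrative and not from the original post, and the final `groupby` count is one possible way to get the number of interactions the question asks about, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question: each interaction is logged
# twice, once per animal ordering.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-06-28 14:28:57", "2020-06-28 14:28:57",
        "2020-06-29 18:28:57", "2020-06-29 18:28:57",
        "2020-06-30 17:35:57", "2020-06-30 17:35:57",
    ]),
    "animal_1": ["dog", "fox", "dog", "fox", "dog", "fox"],
    "animal_2": ["fox", "dog", "fox", "dog", "fox", "dog"],
})

# Sort the two animal columns within each row so every pair has one
# canonical (alphabetical) order.
pair_cols = ["animal_1", "animal_2"]
df[pair_cols] = np.sort(df[pair_cols].to_numpy(), axis=1)

# Rows that differed only in animal order are now exact duplicates.
deduped = df.drop_duplicates().reset_index(drop=True)

# Assumption: "number of interactions" means rows per animal pair
# after deduplication.
interaction_counts = deduped.groupby(pair_cols).size()

print(deduped)
print(interaction_counts)
```

With the sample data this leaves three rows, one per timestamp, and a count of 3 for the dog/fox pair. Note that the row-wise sort rewrites the column values, so every kept row reads dog, fox even where the original first occurrence was fox, dog.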


1 Comment

Mike Over a year ago

thanks for the code, I tested it and it works perfectly. Appreciate the help.
