0

I am using DataFrame from pandas in Python.

The DataFrames I will be referring to were created as follows:

from pandas import DataFrame, merge

search = DataFrame([[262, 'ny', '20'], [515, 'paris', '19'], [669, 'ldn', '10'], [669, 'ldn', 10], [669, 'ldn', 5]], columns=['subscriber_id', 'location', 'radius'])

title = DataFrame([[262,'director'],[515,'artist'],[669,'scientist']],columns = ['subscriber_id','title' ])

Both the title and search DataFrames are then merged.

mergedTable = merge(title, search, on='subscriber_id', how='outer')

This forms the dataframe:

   subscriber_id      title location radius
0            262   director       ny     20
1            515     artist    paris     19
2            669  scientist      ldn     10
3            669  scientist      ldn     10
4            669  scientist      ldn      5

As we can see, it has been merged correctly, so a subscriber now has one row per search.

I do not want to remove subscribers who have multiple rows with different values, but I do want to remove rows that are exact duplicates.

This is the desired final result:

   subscriber_id      title location radius
0            262   director       ny     20
1            515     artist    paris     19
2            669  scientist      ldn     10
4            669  scientist      ldn      5

Row 3, a duplicate of row 2, is removed.

I have been researching this and it seems that drop_duplicates() should work, i.e.

mergedTable.drop_duplicates()

But this doesn't work: no rows are removed. Any tips or solutions?

1
  • I can't see why this was downvoted; I have reached my daily vote limit, so I can't upvote. The question, despite being the consequence of some inattentiveness, seems good to me: it has a valid test case, which sadly is not the most common thing on SO. Commented Dec 2, 2013 at 18:53

1 Answer

3

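Your radius column has dtype object because it holds a mix of strings and integers: [669,'ldn','10'] next to [669,'ldn',10]. Since '10' != 10, those two rows are not considered duplicates. Converting the column to integer will do the trick: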

>>> mergedTable.radius = mergedTable.radius.astype(int)
>>> mergedTable.drop_duplicates()
   subscriber_id      title location  radius
0            262   director       ny      20
1            515     artist    paris      19
2            669  scientist      ldn      10
4            669  scientist      ldn       5
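
To confirm the diagnosis before converting, it can help to count the Python types stored in the column. The sketch below rebuilds the question's data and assumes every radius value is either an int or a numeric string. The names merged, type_counts, rows_before and deduped are illustrative, and pd.to_numeric is used here as an alternative to astype(int), not as the method from the answer above.

```python
import pandas as pd

# Rebuild the two frames from the question; note the mixed '10' / 10 values.
search = pd.DataFrame(
    [[262, 'ny', '20'], [515, 'paris', '19'], [669, 'ldn', '10'],
     [669, 'ldn', 10], [669, 'ldn', 5]],
    columns=['subscriber_id', 'location', 'radius'])
title = pd.DataFrame(
    [[262, 'director'], [515, 'artist'], [669, 'scientist']],
    columns=['subscriber_id', 'title'])
merged = pd.merge(title, search, on='subscriber_id', how='outer')

# Count how many values of each Python type the column holds.
type_counts = merged['radius'].map(lambda v: type(v).__name__).value_counts()

# With mixed types, '10' and 10 differ, so nothing is dropped yet.
rows_before = len(merged.drop_duplicates())

# Parse numeric strings into numbers; existing ints are left unchanged.
merged['radius'] = pd.to_numeric(merged['radius'])
deduped = merged.drop_duplicates()
```

If the column could contain values that are not numeric at all, pd.to_numeric raises by default; passing errors='coerce' turns such values into NaN instead, which may or may not be what you want.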