I have a dataframe:

import pandas as pd
df = pd.DataFrame({'src':['A','B','C'],'trg':['A','C','B'],'wgt':[1,3,7]})

I want to drop the duplicates from this dataframe for columns src and trg

df = df.drop_duplicates(subset=['src','trg'],keep='first',inplace=False)

I expected this to drop the first row, where src='A' and trg='A'.

But this is not happening. There is no change in the dataframe. What am I doing wrong?

  • df[df['src'] != df['trg']]? Commented Mar 1, 2020 at 19:39
  • That worked and removed all the duplicates without keeping at least one pair. But can you suggest why drop_duplicates is not working? Commented Mar 1, 2020 at 19:40
  • That's because, in both rows, the values of src and trg are not the same. When you use the subset, it looks for duplicates in the entire subset. Commented Mar 1, 2020 at 19:41
  • drop_duplicates compares rows with each other across the given columns. That is, if you had another row with B and C as source and target, that row would be dropped. Commented Mar 1, 2020 at 19:42
  • OK. That was a very subtle point I missed. Thank you again. Commented Mar 1, 2020 at 19:43

1 Answer

To remove the duplicates, you can refer to the following example, which I worked through in a Python notebook: [image: notebook screenshot]

Or use df = df[df['src'] != df['trg']]
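For illustration, here is a minimal sketch of the difference between the two approaches, using the dataframe from the question. The extra row with wgt=9 is a hypothetical addition, not part of the original data; it is there only to show a case where drop_duplicates does remove something.

```python
import pandas as pd

df = pd.DataFrame({'src': ['A', 'B', 'C'],
                   'trg': ['A', 'C', 'B'],
                   'wgt': [1, 3, 7]})

# drop_duplicates compares rows with each other. The (src, trg) pairs
# (A, A), (B, C) and (C, B) are all distinct, so no row is removed.
deduped = df.drop_duplicates(subset=['src', 'trg'], keep='first')

# Hypothetical extra row that repeats the pair (B, C).
extra = pd.DataFrame({'src': ['B'], 'trg': ['C'], 'wgt': [9]})
with_repeat = pd.concat([df, extra], ignore_index=True)

# Now the second (B, C) row is a duplicate of an earlier row and is dropped.
deduped_repeat = with_repeat.drop_duplicates(subset=['src', 'trg'], keep='first')

# The boolean mask compares the two columns within each row instead,
# so it removes the row where src equals trg.
no_self = df[df['src'] != df['trg']]
```

So drop_duplicates answers "has this combination of values appeared in an earlier row?", while the mask answers "are these two columns equal in this row?". The question needs the second one.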
