pandas.DataFrame.duplicated to allow take_all

When working with external data, I often see rows that violate a primary key. Currently, there is no easy way to select all of the violating rows. For example, suppose I have a massive file with some inconsistent data:

datecol,valuecol
...
2014-01-01,12
2014-01-01,13
2014-01-02,10
...

In this use case, it would be good if we could do df[df.duplicated('datecol', take_all=True)] to get all of the bad rows directly, not just the second and later occurrences:

2014-01-01,12
2014-01-01,13
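
Until something like take_all exists, a workaround along the following lines seems to give the same result. This is only a sketch: the helper name all_duplicates is made up for illustration, and the sample frame contains just the three rows shown above rather than the real file. It counts how often each key occurs and keeps every row whose key appears more than once.

```python
import io

import pandas as pd

# Only the three rows visible in the example above; the real file is much larger.
csv_text = """datecol,valuecol
2014-01-01,12
2014-01-01,13
2014-01-02,10
"""
df = pd.read_csv(io.StringIO(csv_text))


def all_duplicates(frame, key):
    """Hypothetical helper: return every row whose `key` value occurs more than once."""
    # transform("size") gives each row the size of the group it belongs to,
    # aligned with the original index, so it can be used as a boolean mask.
    counts = frame.groupby(key)[key].transform("size")
    return frame[counts > 1]


bad_rows = all_duplicates(df, "datecol")
print(bad_rows)
```

In pandas versions where duplicated accepts a keep parameter, df[df.duplicated('datecol', keep=False)] should select the same rows, which is essentially the behaviour requested here.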

pandas.DataFrame.duplicated to allow take_all #6511
