Closed
Labels
Algos: Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff
Indexing: Related to indexing on series/frames, not to indexes themselves
Numeric Operations: Arithmetic, Comparison, and Logical operations
Description
When working with external data, I often see rows that violate a primary key constraint. Currently, there is no easy way to select all of the violating rows. For example, suppose I have a massive file with some inconsistent data:
datecol,valuecol
...
2014-01-01,12
2014-01-01,13
2014-01-02,10
...
In this use case, it would be good to be able to write df[df.duplicated('datecol', take_all=True)] and get all of the bad rows directly, including the first occurrence of each duplicated key:
2014-01-01,12
2014-01-01,13
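
For illustration, here is a minimal sketch of how the requested selection can be expressed. Note that `take_all` is only the name proposed in this issue, not an existing parameter. The sketch assumes a pandas version in which `duplicated` accepts `keep=False`, and it also shows a `groupby`-based mask that does not depend on that option. The three-row frame is a stand-in for the sample data above.

```python
import pandas as pd

# Stand-in for the sample rows shown in the issue
df = pd.DataFrame(
    {
        "datecol": ["2014-01-01", "2014-01-01", "2014-01-02"],
        "valuecol": [12, 13, 10],
    }
)

# keep=False flags every row whose key occurs more than once,
# including the first occurrence (assumes a pandas version with `keep`)
mask_keep = df.duplicated("datecol", keep=False)

# Alternative without `keep`: count the rows per key and flag keys seen more than once
mask_group = df.groupby("datecol")["datecol"].transform("size") > 1

bad_rows = df[mask_keep]
print(bad_rows)
```

Both masks select the two 2014-01-01 rows here. The `groupby` variant does more work per call, so it is mainly useful as a fallback where `keep=False` is unavailable.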