-
-
Couldn't load subscription status.
- Fork 19.2k
Closed
Labels
Description
This was fun to debug.
In [1]: import pandas as pd
In [2]: 0 in pd.Int64Index([0, 0, 1])
Out[2]: True
In [3]: 0 in pd.Int64Index([0, 1, 0])
Out[3]: True
In [4]: 0 in pd.Int64Index([0, 0, -1])
Out[4]: True
In [5]: pd.Timestamp(0) in pd.DatetimeIndex([0, 1, -1])
Out[5]: True
In [6]: pd.Timestamp(0) in pd.DatetimeIndex([0, 1, 0])
Out[6]: False # BAD
In [7]: pd.Timestamp(0) in pd.DatetimeIndex([0, 0, 1])
Out[7]: True
In [8]: pd.Timestamp(0) in pd.DatetimeIndex([0, 0, -1])
Out[8]: False # BADTimedeltaIndex is also broken.
The problem is in DatetimeIndexOpsMixin.__contains__, which checks the type of idx.get_loc(key) to determine whether the key was found in the index. If the index contains duplicate entries and is not monotonic increasing (for some reason, monotonic decreasing doesn't cut it), get_loc eventually falls back to Int64Engine._maybe_get_bool_indexer, which returns an ndarray of bools if the key is duplicated. Since the original __contains__ method is looking for scalars or slices, it reports that the duplicated entry is not present.