
To select rows of a DataFrame I use:

dataframe[dataframe['column_name'] == 'column_value']

I'm trying to understand the performance of this code, but I cannot find where the API defines the == operator when it is used to select rows.

How can I find the source of the == operator, so that I can judge the performance of the line above?

I assume it's documented somewhere under https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html, but I'm unsure how to search for it.

1 Answer


Firstly, note that the == operator calls __eq__ behind the scenes. In your case dataframe['column_name'] is a pd.Series, so the comparison is handled by pd.Series rather than pd.DataFrame. These two lines are therefore equivalent:

import pandas as pd

res = pd.Series([1, 2, 3]) == 1        # operator form
res = pd.Series([1, 2, 3]).__eq__(1)   # explicit dunder call, same result
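The line in the question actually does two things: the comparison builds a boolean Series (a mask), and indexing the DataFrame with that mask selects the rows. A minimal sketch with a made-up DataFrame (the column name and values are placeholders):

```python
import pandas as pd

# Hypothetical example data; column names and values are placeholders.
df = pd.DataFrame({"column_name": ["a", "b", "a"], "other": [1, 2, 3]})

# Selecting one column returns a Series, so == dispatches to Series.__eq__.
col = df["column_name"]
print(type(col).__name__)  # Series

mask_op = col == "a"
mask_dunder = col.__eq__("a")

# Both spellings produce the same boolean Series (the mask).
print(mask_op.equals(mask_dunder))  # True
print(mask_op.tolist())  # [True, False, True]

# Indexing the DataFrame with the mask is a separate step that selects rows.
rows = df[mask_op]
print(rows.index.tolist())  # [0, 2]
```

When profiling, these two stages (building the mask and applying it) can be measured separately.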

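You can then inspect this method. The output below comes from an older pandas release, where it is a closure created in pandas.core.ops; newer releases define it elsewhere (for example on pandas.core.arraylike.OpsMixin), so what you see depends on your version: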

pd.Series.__eq__
<function pandas.core.ops._comp_method_SERIES.<locals>.wrapper>
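Rather than searching the online docs, Python's standard inspect module can locate the source for whichever pandas version is installed. A minimal sketch; the module, file and line number it reports will differ between pandas releases, so treat the printed path as version-specific:

```python
import inspect

import pandas as pd

# The method that `Series == value` dispatches to.
method = pd.Series.__eq__

# Where the method object reports it was defined (varies by pandas version).
print(method.__module__, method.__qualname__)

# Follow any functools.wraps decorators back to the underlying function,
# then ask which source file it lives in.
target = inspect.unwrap(method)
path = inspect.getsourcefile(target)
print(path)

# Show the first few lines of its source code.
source_lines, start_line = inspect.getsourcelines(target)
print(f"defined at line {start_line}:")
print("".join(source_lines[:10]))
```

From the printed file you can keep following the calls it makes until you reach the actual comparison.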

From there you can read _comp_method_SERIES in the ops.py module (pandas.core.ops):

def _comp_method_SERIES(op, name, str_rep, masker=False):
    """
    Wrapper function for Series arithmetic operations, to avoid
    code duplication.
    """
    ....

This should get you started. The module also defines utility functions, e.g. for handling null values, which show the extra work pandas does on top of a plain element-wise comparison. Optimized code paths may end up calling compiled routines (NumPy or Cython), which can make performance difficult to trace in pure Python.
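Because reading the source only goes so far once execution reaches compiled code, measuring directly is often the more reliable way to judge the cost of the line. A rough sketch using the standard timeit module with made-up data (the row count, values and column name are placeholders), timing mask construction and row selection separately:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 200,000 rows drawn from 100 distinct string values.
n = 200_000
rng = np.random.default_rng(0)
df = pd.DataFrame({"column_name": rng.integers(0, 100, n).astype(str)})
value = "42"
repeats = 10

# Stage 1: build the boolean mask (Series.__eq__).
t_mask = timeit.timeit(lambda: df["column_name"] == value, number=repeats) / repeats

# Stage 2: apply a precomputed mask to select rows.
mask = df["column_name"] == value
t_index = timeit.timeit(lambda: df[mask], number=repeats) / repeats

# Both stages together, exactly as written in the question.
t_total = timeit.timeit(lambda: df[df["column_name"] == value], number=repeats) / repeats

print(f"build mask: {t_mask * 1e3:.2f} ms")
print(f"apply mask: {t_index * 1e3:.2f} ms")
print(f"full line:  {t_total * 1e3:.2f} ms")
```

The numbers depend on hardware, pandas version and the column's dtype, so it is worth running this against data shaped like your own.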
