Filtering a pandas df with any of the list values [duplicate]

Question

I have a pandas dataframe:

df
0       PL
1       PL
2       PL
3       IT
4       IT
        ..
4670    DE
4671    NO
4672    MT
4673    FI
4674    XX
Name: country_code, Length: 4675, dtype: object

I am filtering this by the Germany country code 'DE' via:

df = df[df.apply(lambda x: 'DE' in x)]

If I want to filter on more countries, then I have to add them manually via: .apply(lambda x: 'DE' in x or 'GB' in x). However, I would like to create a countries list and generate this statement automatically.

Something like this:

countries = ['DE', 'GB', 'IT']
df = df[df.apply(lambda x: any_item_in_countries_list in x)]

I think I could filter df three times and then merge the pieces back together via concat(). However, is there a more generic function to achieve this?

Andreas (edited by marc_s) · Accepted Answer · 2021-12-18 20:11:51Z

2

You can use .isin():

df[df['country_code'].isin(['DE', 'GB', 'IT'])]
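The question's output shows a Series (note the "Name: country_code" line) rather than a DataFrame with a country_code column. In that case, a minimal sketch would call .isin() on the Series directly. The sample values and the variable names codes, mask and filtered are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical sample that mirrors the shape of the Series in the question
codes = pd.Series(['PL', 'IT', 'DE', 'NO', 'GB', 'XX'], name='country_code')
countries = ['DE', 'GB', 'IT']

# isin() returns a boolean mask that is True where the value equals any list entry
mask = codes.isin(countries)
filtered = codes[mask]

print(filtered.tolist())  # ['IT', 'DE', 'GB']
```

Note that .isin() tests for exact equality, whereas the lambda in the question ('DE' in x) tests whether 'DE' is a substring of each value. For two-letter country codes the two give the same result.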

Performance comparison:

import timeit
import pandas as pd
df = pd.DataFrame({'country_code': ['DE', 'GB', 'IT', 'MT', 'FI', 'XX'] * 1000})

%timeit df[df['country_code'].isin(['DE', 'GB', 'IT'])]
409 µs ± 19 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df['country_code'].apply(lambda x: x in ['DE', 'AT', 'GB'])
1.35 ms ± 474 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
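The %timeit lines above only run inside IPython or Jupyter. As a rough sketch, the same comparison can be made in a plain Python script with the standard timeit module. The helper names with_isin and with_apply are assumptions introduced here, and both helpers use the same list and perform the full filter so that the timings are comparable. Absolute numbers will differ between machines:

```python
import timeit
import pandas as pd

df = pd.DataFrame({'country_code': ['DE', 'GB', 'IT', 'MT', 'FI', 'XX'] * 1000})
countries = ['DE', 'GB', 'IT']

def with_isin():
    # Vectorized membership test
    return df[df['country_code'].isin(countries)]

def with_apply():
    # Python-level membership test, called once per row
    return df[df['country_code'].apply(lambda x: x in countries)]

# Both approaches should select exactly the same rows
assert with_isin().equals(with_apply())

runs = 100
t_isin = timeit.timeit(with_isin, number=runs) / runs
t_apply = timeit.timeit(with_apply, number=runs) / runs
print(f"isin:  {t_isin * 1e6:.0f} µs per call")
print(f"apply: {t_apply * 1e6:.0f} µs per call")
```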

edited Dec 18, 2021 at 20:11

marc_s


answered Sep 5, 2021 at 14:01

Andreas


4 Comments

oakca Over a year ago

.apply(lambda x: x in ['DE', 'AT', 'GB']) also works. Do you want to do a benchmark?

Andreas Over a year ago

@oakca, apply is considered a bottleneck operation. I can make a comparison, but I don't think apply can beat many of the standard pandas methods.

oakca Over a year ago

I will accept, but for documentation purposes it would be good if you made a simple benchmark and showed it in your answer.

Andreas Over a year ago

@oakca added performance check.

Sabil · Accepted Answer · 2021-09-05 14:05:06Z

1

If you have column names, then you can try this:

countries = ['DE', 'GB', 'IT']
df[df['country_code'].isin(countries)]
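If the substring behaviour of the question's lambda ('DE' in x) is actually wanted, .isin() is not a drop-in replacement because it only matches whole values. The following is a sketch of two ways to build the check from a list, using any() inside apply or a regular expression with .str.contains(). The sample values (such as 'DE-BY') are hypothetical and chosen only to make the difference visible:

```python
import re
import pandas as pd

# Hypothetical values; some contain a country code as a substring only
codes = pd.Series(['DE', 'DE-BY', 'GB', 'FR', 'IT-25'], name='country_code')
countries = ['DE', 'GB', 'IT']

# Direct generalization of the lambda in the question
any_mask = codes.apply(lambda x: any(c in x for c in countries))

# Same test via a regex alternation; re.escape guards against special characters
pattern = '|'.join(re.escape(c) for c in countries)
substring_mask = codes.str.contains(pattern, regex=True)

# Exact matching, as in the answers
exact_mask = codes.isin(countries)

print(codes[substring_mask].tolist())  # ['DE', 'DE-BY', 'GB', 'IT-25']
print(codes[exact_mask].tolist())      # ['DE', 'GB']
```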

edited Sep 5, 2021 at 14:05

answered Sep 5, 2021 at 14:01

Sabil


3 Comments

Andreas Over a year ago

He showed a pd.Series, you can see the name of the series (or column name) at the bottom of his example.

oakca Over a year ago

.apply(lambda x: x in ['DE', 'AT', 'GB']) also works. Do you want to do a benchmark?

Sabil Over a year ago

Oakca, you can do that in Colab (Python) by calling %timeit on the filter line.
