I have a pandas dataframe:

df
0       PL
1       PL
2       PL
3       IT
4       IT
        ..
4670    DE
4671    NO
4672    MT
4673    FI
4674    XX
Name: country_code, Length: 4675, dtype: object

I am filtering this by the German country code 'DE' via:

df = df[df.apply(lambda x: 'DE' in x)]

If I want to filter by more countries, then I have to add them manually via: .apply(lambda x: 'DE' in x or 'GB' in x). However, I would like to create a list of countries and generate this condition automatically.

Something like this:

countries = ['DE', 'GB', 'IT']
df = df[df.apply(lambda x: any_item_in_countries_list in x)]

I think I could filter df three times and then merge the pieces back together via concat(). However, is there a more generic function to achieve this?

2 Answers

You can use .isin():

df[df['country_code'].isin(['DE', 'GB', 'IT'])]
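Since the question shows a Series rather than a DataFrame, here is a minimal sketch of the same idea applied to both. The sample values and the names `s`, `frame`, `mask`, and `filtered` are made up for illustration. Note that `.isin()` tests for exact equality, whereas `'DE' in x` on a string is a substring test; for two-letter codes the two should agree.

```python
import pandas as pd

# Hypothetical sample data mirroring the Series in the question
s = pd.Series(['PL', 'PL', 'IT', 'DE', 'NO', 'GB', 'XX'], name='country_code')
countries = ['DE', 'GB', 'IT']

# Boolean mask: True where the value exactly equals one of the listed codes
mask = s.isin(countries)
filtered = s[mask]

# Same idea when the codes live in a DataFrame column
frame = s.to_frame()
filtered_frame = frame[frame['country_code'].isin(countries)]

print(filtered)
```

The original index labels are preserved, so no concat() step is needed to put the pieces back together.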

Performance comparison:

import timeit
import pandas as pd
df = pd.DataFrame({'country_code': ['DE', 'GB', 'IT', 'MT', 'FI', 'XX'] * 1000})

%timeit df[df['country_code'].isin(['DE', 'GB', 'IT'])]
409 µs ± 19 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df['country_code'].apply(lambda x: x in ['DE', 'AT', 'GB'])
1.35 ms ± 474 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
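The `%timeit` lines above only run in IPython or Jupyter, and the two timed statements are not quite equivalent: the first also indexes df, while the second only builds the mask and uses a slightly different list. A rough sketch of a like-for-like comparison with the standard `timeit` module might look like this; the helper names `filter_isin` and `filter_apply` are made up for illustration, and the timings will vary by machine and pandas version.

```python
import timeit
import pandas as pd

df = pd.DataFrame({'country_code': ['DE', 'GB', 'IT', 'MT', 'FI', 'XX'] * 1000})
countries = ['DE', 'GB', 'IT']

def filter_isin():
    # Vectorized membership test, then boolean indexing
    return df[df['country_code'].isin(countries)]

def filter_apply():
    # Python-level membership test per element, then boolean indexing
    return df[df['country_code'].apply(lambda x: x in countries)]

# Both approaches should select the same rows
assert filter_isin().equals(filter_apply())

loops = 100
t_isin = timeit.timeit(filter_isin, number=loops)
t_apply = timeit.timeit(filter_apply, number=loops)
print(f"isin:  {t_isin / loops * 1e6:.0f} µs per loop")
print(f"apply: {t_apply / loops * 1e6:.0f} µs per loop")
```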

4 Comments

.apply(lambda x: x in ['DE', 'AT', 'GB']) also works. Do you want to do a benchmark?
@oakca, apply is considered a bottleneck operation. I can make a comparison, but I don't think apply can beat many of the standard pandas methods.
I will accept, but for documentation purposes it would be good if you added a simple benchmark to your answer.
@oakca, added a performance check.

If you have column names, then you can try this:

countries = ['DE', 'GB', 'IT']
df[df['country_code'].isin(countries)]

3 Comments

He showed a pd.Series; you can see the name of the Series (or column name) at the bottom of his example.
.apply(lambda x: x in ['DE', 'AT', 'GB']) also works. Do you want to do a benchmark?
Oakca, you can do that yourself using Colab and Python, calling %timeit on the filter line.
