Find and Assign Values to columns in Pandas

Question

I have the following DataFrame of movie genres (str type):

    genre
0   8.3/10Family Action & Adventure ...More Genres...
1   8.6/10Fantasy Anime ...More GenresFantasyAnime...
2   8.7/10Science-Fiction Action & Adventure Rated...
3   8.1/10Family Action & Adventure ...More Genres...
4   8.4/10Science-Fiction Family ...More GenresSci..

and I'd like to extract the genres listed in genres = ['Family', 'Action & Adventure', 'Fantasy'] into a new column:

    genre
0   Family, Action & Adventure
1   Fantasy, Anime
2   Science-Fiction, Action & Adventure
3   Family, Action & Adventure
4   Science-Fiction, Family

Please advise.

Instead of ...More Genres..., please provide a reproducible dataset so that people can run it and test their code.
– Epsi95, Commented Feb 25, 2021 at 9:50

jezrael · Accepted Answer · 2021-02-25 10:01:14Z

Use Series.str.findall with the list values joined by | (regex OR), then Series.str.join. To remove duplicates, convert the lists to sets with .apply(set):

    df['genre'] = df['genre'].str.findall('|'.join(genres)).apply(set).str.join(',')
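Since the question's data is truncated, here is a minimal sketch of the line above on made-up rows. The sample strings are assumptions for illustration only, not the asker's real data. Note that a set does not keep the order of the matches, so the joined order may differ between runs.

```python
import pandas as pd

# Hypothetical sample rows: the original strings are truncated in the
# question, so these values are invented for illustration.
df = pd.DataFrame({
    "genre": [
        "8.3/10Family Action & Adventure More GenresFamilyAction & Adventure",
        "8.6/10Fantasy Anime More GenresFantasyAnime",
        "8.7/10Science-Fiction Action & Adventure Rated",
    ]
})
genres = ["Family", "Action & Adventure", "Fantasy"]

# Join the genre names with | so the regex matches any one of them,
# collect all matches per row, drop duplicates via set, then join.
df["genre"] = df["genre"].str.findall("|".join(genres)).apply(set).str.join(",")

print(df)
```

Genres that are not in the list (Anime and Science-Fiction in this sample) are simply not matched, so they do not appear in the result.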

edited Feb 25, 2021 at 10:01

answered Feb 25, 2021 at 9:50


3 Comments

The Singularity Over a year ago

Gets most of the job done, but for some reason it returns the same value twice

The Singularity Over a year ago

For example if Family is found it returns Family,Family

jezrael Over a year ago

@Luke - yes, the reason is possible duplicates in the string; using .apply(set) removes them.
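To illustrate where the duplicates come from, here is a small sketch on a made-up row (the string is an assumption, since the real data is truncated). It also shows one possible variant that removes duplicates while keeping first-seen order, using dict.fromkeys instead of set. The re.escape call and the genre_clean column name are additions for illustration and are not part of the answer above.

```python
import re
import pandas as pd

# Hypothetical row in which each genre appears twice, as in scraped text.
df = pd.DataFrame({
    "genre": ["8.3/10Family Action & Adventure More GenresFamilyAction & Adventure"]
})
genres = ["Family", "Action & Adventure", "Fantasy"]

# re.escape is an added precaution in case a genre name ever contains
# regex metacharacters; it is not needed for these three names.
pattern = "|".join(re.escape(g) for g in genres)

# findall returns every occurrence, which is why duplicates show up.
matches = df["genre"].str.findall(pattern)
print(matches[0])

# dict.fromkeys keeps the first occurrence of each match, in order.
df["genre_clean"] = matches.apply(lambda found: ", ".join(dict.fromkeys(found)))
print(df["genre_clean"][0])
```

Compared with .apply(set), this keeps the genres in the order they first appear in the string, which may matter if the output order should be stable.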
