
I have the following DataFrame of movie genres stored as strings:

    genre
0   8.3/10Family Action & Adventure ...More Genres...
1   8.6/10Fantasy Anime ...More GenresFantasyAnime...
2   8.7/10Science-Fiction Action & Adventure Rated...
3   8.1/10Family Action & Adventure ...More Genres...
4   8.4/10Science-Fiction Family ...More GenresSci..

I'd like to extract the genres in the list genres = ['Family', 'Action & Adventure', 'Fantasy'] into a new column, like this:

    genre
0   Family, Action & Adventure
1   Fantasy, Anime
2   Science-Fiction, Action & Adventure
3   Family, Action & Adventure
4   Science-Fiction, Family

Please advise.

  • Instead of ...More Genres..., please provide a reproducible dataset so that people can run and test their code. (Commented Feb 25, 2021 at 9:50)
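For reference, one way to share a reproducible sample is to paste the output of DataFrame.to_dict('list'), which others can pass straight back to pd.DataFrame. The rows below are hypothetical placeholders, because the real strings are truncated in the question.

```python
import pandas as pd

# Hypothetical stand-in rows; the real strings are truncated in the question
df = pd.DataFrame({
    "genre": [
        "8.3/10Family Action & Adventure More Genres",
        "8.6/10Fantasy Anime More GenresFantasyAnime",
    ]
})

# A plain dict literal that others can paste into pd.DataFrame(...)
sample = df.head().to_dict("list")
print(sample)

# Rebuilding the frame from the dict gives back the same data
rebuilt = pd.DataFrame(sample)
```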

1 Answer

Use Series.str.findall with the list values joined by | (regex alternation, i.e. "or"), then Series.str.join. To remove duplicates, convert each list to a set with .apply(set):

df['genre'] = df['genre'].str.findall('|'.join(genres)).apply(set).str.join(',')
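Below is a self-contained sketch of the one-liner above. It rests on a few assumptions. The sample rows are hypothetical stand-ins modeled on the truncated prefixes in the question. Each genre name is also passed through re.escape so that characters such as & or - cannot be read as regex syntax; this is an addition and not part of the original answer. Only names listed in genres are extracted, so Anime and Science-Fiction from the expected output would have to be added to the list.

```python
import re

import pandas as pd

# Hypothetical sample rows modeled on the visible (truncated) prefixes in the question
df = pd.DataFrame({
    "genre": [
        "8.3/10Family Action & Adventure More GenresFamily",
        "8.6/10Fantasy Anime More GenresFantasyAnime",
        "8.7/10Science-Fiction Action & Adventure Rated",
    ]
})

genres = ["Family", "Action & Adventure", "Fantasy"]

# One alternation pattern; re.escape (an added safeguard) neutralizes regex metacharacters
pattern = "|".join(map(re.escape, genres))

# findall -> list of every match per row (duplicates included)
# apply(set) -> drop duplicates (a set has no guaranteed order)
# str.join -> collapse each collection into one comma-separated string
df["genre"] = df["genre"].str.findall(pattern).apply(set).str.join(",")
print(df)
```

A set has no defined order, so the joined string may list the genres in a different order from one run to the next.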

Comments

This gets most of the job done, but for some reason it returns the same value twice. For example, if Family is found, it returns Family,Family.
@Luke - Yes, the reason is duplicate matches in the text; adding .apply(set) removes them.
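This example shows where the duplicates come from. str.findall returns every occurrence, so a genre that appears again later in the text (as in ...More GenresFantasyAnime...) is returned twice. The sketch replaces .apply(set) with dict.fromkeys, a plain-Python alternative that also drops duplicates but keeps first-seen order (dicts preserve insertion order in Python 3.7+). It joins with ", " to match the expected output. The sample row is hypothetical.

```python
import re

import pandas as pd

# Hypothetical row in which both genres are repeated after "More Genres"
s = pd.Series(["8.3/10Family Action & Adventure More GenresFamilyAction & Adventure"])
genres = ["Family", "Action & Adventure", "Fantasy"]
pattern = "|".join(map(re.escape, genres))

matches = s.str.findall(pattern)
# Every occurrence is returned, so duplicates appear
print(matches[0])  # → ['Family', 'Action & Adventure', 'Family', 'Action & Adventure']

# dict.fromkeys drops duplicates but keeps first-seen order, unlike set
deduped = matches.apply(lambda found: list(dict.fromkeys(found))).str.join(", ")
print(deduped[0])  # → Family, Action & Adventure
```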
