3

I want to find the name of the column in a dataframe ("categories") that contains a given string.

categories

    Groceries   Electricity Fastfood    Parking 
0   SHOP        ELCOMPANY   MCDONALDS   park
1   MARKET      ELECT       Subway      car
2   market      electr      Restauran   247 

Say I want to search this entire dataframe for the string "MCDO". The answer should be "Fastfood". I tried using str.contains, but it doesn't seem to work on dataframes.

How can I achieve this? Thank you.

4 Answers

6

If you can match on the entire string, it is easier:

(df == 'MCDONALDS').any().idxmax()

Otherwise, use apply:

df.apply(lambda x: x.str.startswith('MCDO').any()).idxmax()
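One caveat with both snippets: idxmax() on an all-False boolean Series returns the first label, so a search with no match would report "Groceries" rather than nothing. A minimal sketch that guards against this, assuming the question's data is stored as strings (the helper name `find_column` is hypothetical, not part of pandas):

```python
import pandas as pd

# The question's dataframe, with every cell stored as a string (an assumption).
df = pd.DataFrame({
    "Groceries": ["SHOP", "MARKET", "market"],
    "Electricity": ["ELCOMPANY", "ELECT", "electr"],
    "Fastfood": ["MCDONALDS", "Subway", "Restauran"],
    "Parking": ["park", "car", "247"],
})

def find_column(frame, prefix):
    """Return the first column with a value starting with `prefix`, else None."""
    # One boolean per column: does any cell start with the prefix?
    hits = frame.apply(lambda col: col.str.startswith(prefix).any())
    # Only call idxmax() when at least one column actually matched.
    return hits.idxmax() if hits.any() else None

print(find_column(df, "MCDO"))
print(find_column(df, "XYZ"))
```

The first call should report the Fastfood column; the second returns None instead of silently naming the first column.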

3 Comments

I like the first one a lot - it's short and works perfectly, showing just "Fastfood" instead of the "Index(['Fastfood'], dtype='object')" from my earlier lambda attempt.
What was the first function? You started with a "("
@christfan868, the best way to understand the code is to break it down. df == 'MCDONALDS' returns a boolean df; (df == 'MCDONALDS').any() reduces along axis 0, returning True for each column in which any value matches (hence the parentheses); idxmax() then returns the label of the first True.
2

You can combine str.contains with any:

df.apply(lambda x : x.str.contains('MCDO')).any().loc[lambda x : x].index
Index(['Fastfood'], dtype='object')
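The same idea extends to case-insensitive, literal (non-regex) matching. A minimal sketch, assuming every column is object/string dtype (the .str accessor raises on purely numeric columns); the mixed-type Parking column is an illustrative assumption, not taken from the question:

```python
import pandas as pd

# Same layout as the question, but Parking holds a real integer (assumption)
# to show how na=False treats cells that are not strings.
df = pd.DataFrame({
    "Groceries": ["SHOP", "MARKET", "market"],
    "Electricity": ["ELCOMPANY", "ELECT", "electr"],
    "Fastfood": ["MCDONALDS", "Subway", "Restauran"],
    "Parking": ["park", "car", 247],
})

# case=False ignores case, regex=False treats the pattern as plain text,
# na=False counts missing or non-string cells as "no match".
matches = df.apply(
    lambda col: col.str.contains("mcdo", case=False, regex=False, na=False)
)

# Reduce to one boolean per column, then keep the labels that are True.
cols = matches.any()
print(cols[cols].index.tolist())
```

Converting the index with tolist() yields a plain Python list, which also works when zero or several columns match.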

2 Comments

Lol, :-), :D, YW
Note that df.apply(lambda x : x.str.contains('MCDO')).any().loc[lambda x : x].index.item() returns only the column name. It raises an error unless exactly one column matches.
2

Or use:

import numpy as np

print(df.apply(lambda x: x.str.contains('MCDO')).replace(False,np.nan).dropna(axis=1,how='all').columns.item())

Output:

Fastfood

2

One can also use a for loop for this:

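def strfinder(df, mystr):
    # Return the first column with a cell containing mystr, or None if no cell matches.
    for col in df:
        for item in df[col]:
            # str() guards against non-string cells such as numbers or NaN.
            if mystr in str(item):
                return col
    return None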

print(strfinder(df, 'MCDO'))

To get all columns that contain the string, e.g. in the modified dataframe below:

    Groceries   Electricity  Fastfood    Parking 
0   SHOP        ELCOMPANY   MCDONALDS   park
1   MARKET      MCDON       Subway      car
2   market      electr      Restauran   247 

one can use a list comprehension:

mystr = 'MCDO'
outlist = [ col 
            for col in df 
            for item in df[col]
            if mystr in item    ]
print(outlist)

Output:

['Electricity', 'Fastfood']
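Note that the comprehension above emits a column once per matching cell, so a column with two matching values would appear twice in the list. A sketch that lists each column at most once, assuming the modified dataframe shown earlier in this answer; str() is added here as a guard for non-string cells:

```python
import pandas as pd

# The modified dataframe from this answer, rebuilt so the example is self-contained.
df = pd.DataFrame({
    "Groceries": ["SHOP", "MARKET", "market"],
    "Electricity": ["ELCOMPANY", "MCDON", "electr"],
    "Fastfood": ["MCDONALDS", "Subway", "Restauran"],
    "Parking": ["park", "car", "247"],
})

mystr = "MCDO"

# any() stops at the first matching cell, so each column is listed at most once.
outlist = [
    col
    for col in df
    if any(mystr in str(item) for item in df[col])
]
print(outlist)
```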

2 Comments

Thank you, looks quite nice although people tend to say using for loops with Pandas is always wrong. But if it works, it works.
They are easy to understand and fast enough for most purposes. However, if your data is large, more optimized approaches should be used.
