Finding a string in a pandas DataFrame column and cell

Question

I have a dataframe as below, and I want to find how many times the value from column Jan occurs in column URL, both in the corresponding cell of URL and across the whole URL column.

I want to create three columns: 'found in cell', 'found in column' and 'distinct finds'. For example, when we search for the value 'try' from the first cell of column Jan, it should return 2 in 'found in cell', 3 in 'found in column' and 2 in 'distinct finds', because the word was found in 2 rows. When we search for the value 'why' from the second cell of column Jan, it should return 0 in 'found in cell', 2 in 'found in column' and 2 in 'distinct finds', because the word was found in 2 rows.

I know how to search within a string, but how could I search within a cell and within a column?

s = "ea2017-104.pdf bb cc for why"
s.lower().count("why")  # find text within a string

import pandas as pd

sales = [{'account': '3', 'Jan': 'try', 'Feb': '200 .jones', 'URL': 'ea2018-001.pdf try bbbbb why try'},
         {'account': '1', 'Jan': 'why', 'Feb': '210', 'URL': 'try '},
         {'account': '2', 'Jan': 'bbbbb', 'Feb': '90', 'URL': 'ea2017-104.pdf bb cc for why'}]
df = pd.DataFrame(sales)
df

df['column_find'] = df['URL'].str.lower().str.count('why')  # occurrences of 'why' in each URL cell
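For a fixed search word, the single-string count above extends to a whole column through the vectorized `.str` accessor. A minimal sketch, using the question's `sales` data and the word `why`; the `re.escape` call is an added assumption so that characters such as `.` are matched literally, since `Series.str.count` interprets its pattern as a regular expression.

```python
import re

import pandas as pd

sales = [{'account': '3', 'Jan': 'try', 'Feb': '200 .jones', 'URL': 'ea2018-001.pdf try bbbbb why try'},
         {'account': '1', 'Jan': 'why', 'Feb': '210', 'URL': 'try '},
         {'account': '2', 'Jan': 'bbbbb', 'Feb': '90', 'URL': 'ea2017-104.pdf bb cc for why'}]
df = pd.DataFrame(sales)

word = 'why'
urls = df['URL'].str.lower()

# Occurrences of the word in each URL cell (str.count takes a regex, so escape the word)
per_cell = urls.str.count(re.escape(word))
# Occurrences across the whole URL column
in_column = int(per_cell.sum())
# Number of URL cells containing the word at least once (plain-text match)
rows_with_word = int(urls.str.contains(word, regex=False).sum())

print(per_cell.tolist(), in_column, rows_with_word)  # [1, 0, 1] 2 2
```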

The final output should have three additional columns:

found_inCell  found_in_column  distinct_finds
           2                3               2
           0                2               2
           0                1               1
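The three expected columns can be reproduced with plain string methods applied to each (Jan, URL) pair. This is a sketch under the assumption that matching is case-sensitive and counts non-overlapping occurrences (the behaviour of Python's `str.count`); `count_matches` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

sales = [{'account': '3', 'Jan': 'try', 'Feb': '200 .jones', 'URL': 'ea2018-001.pdf try bbbbb why try'},
         {'account': '1', 'Jan': 'why', 'Feb': '210', 'URL': 'try '},
         {'account': '2', 'Jan': 'bbbbb', 'Feb': '90', 'URL': 'ea2017-104.pdf bb cc for why'}]
df = pd.DataFrame(sales)

urls = df['URL'].tolist()


def count_matches(word, urls):
    """Hypothetical helper: total occurrences of word, and number of cells containing it."""
    total = sum(url.count(word) for url in urls)   # counted cell by cell
    distinct = sum(word in url for url in urls)    # True counts as 1
    return total, distinct


found_in_cell, found_in_column, distinct_finds = [], [], []
for word, url in zip(df['Jan'], df['URL']):
    found_in_cell.append(url.count(word))          # only this row's URL cell
    total, distinct = count_matches(word, urls)    # the whole URL column
    found_in_column.append(total)
    distinct_finds.append(distinct)

df['found_inCell'] = found_in_cell
df['found_in_column'] = found_in_column
df['distinct_finds'] = distinct_finds
print(df[['Jan', 'found_inCell', 'found_in_column', 'distinct_finds']])
```

Looping over `zip` keeps each definition explicit; for a large frame, the per-row column scan makes this quadratic in the number of rows.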

Update

I get an error when I run the code below and one of the cells in column Jan is empty (np.nan):

import numpy as np
import pandas as pd

sales = [{'account': '3', 'Jan': np.nan, 'Feb': '200 .jones', 'URL': 'ea2018-001.pdf try bbbbb why try'},
         {'account': '1', 'Jan': 'try', 'Feb': '210', 'URL': 'try '},
         {'account': '2', 'Jan': 'bbbbb', 'Feb': '90', 'URL': 'ea2017-104.pdf bb cc for why'}]
df = pd.DataFrame(sales)
df

df['found_inCell'] = df.apply(lambda row: row['URL'].count(row['Jan']), axis=1)
df['found_in_column'] = df['Jan'].apply(lambda x: ''.join(df['URL'].tolist()).count(x))
df['distinct_finds'] = df['Jan'].apply(lambda x: sum(df['URL'].str.contains(x)))
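One way to avoid the error is to skip rows whose Jan value is missing and leave NaN in the new columns for those rows. A minimal sketch, assuming that a missing search value should produce NaN rather than 0; `pd.isna` is true for both `np.nan` and `None`, and missing URL cells are assumed to behave like empty strings.

```python
import numpy as np
import pandas as pd

sales = [{'account': '3', 'Jan': np.nan, 'Feb': '200 .jones', 'URL': 'ea2018-001.pdf try bbbbb why try'},
         {'account': '1', 'Jan': 'try', 'Feb': '210', 'URL': 'try '},
         {'account': '2', 'Jan': 'bbbbb', 'Feb': '90', 'URL': 'ea2017-104.pdf bb cc for why'}]
df = pd.DataFrame(sales)

# Missing URL cells become '' so str.count and `in` always receive a string
urls = df['URL'].fillna('').tolist()

# Skip rows with no search word: NaN instead of calling str.count(nan)
df['found_inCell'] = [np.nan if pd.isna(word) else url.count(word)
                      for word, url in zip(df['Jan'], urls)]
df['found_in_column'] = df['Jan'].apply(
    lambda word: np.nan if pd.isna(word) else sum(url.count(word) for url in urls))
df['distinct_finds'] = df['Jan'].apply(
    lambda word: np.nan if pd.isna(word) else sum(word in url for url in urls))

print(df[['Jan', 'found_inCell', 'found_in_column', 'distinct_finds']])
```

Because the skipped rows hold NaN, pandas stores the three new columns as floats; if 0 is preferred for missing values, returning 0 instead of `np.nan` keeps them integer.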

jpp · Accepted Answer · 2018-02-21 23:08:47Z


Here is one way.

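# Occurrences of this row's Jan value inside this row's URL cell
df['found_inCell'] = df.apply(lambda row: row['URL'].count(row['Jan']), axis=1)
# Total occurrences across the URL column, summed cell by cell so a match cannot span two cells
df['found_in_column'] = df['Jan'].apply(lambda x: sum(url.count(x) for url in df['URL']))
# Number of URL cells containing the value (regex=False treats x as plain text)
df['distinct_finds'] = df['Jan'].apply(lambda x: df['URL'].str.contains(x, regex=False).sum())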

#           Feb    Jan                           URL account  found_inCell  found_in_column  distinct_finds
# 0  200 .jones    try  ea2018-001.pdf try bbbbb why       3             1                2               2
# 1         210    why                          try        1             0                2               2
# 2          90  bbbbb  ea2017-104.pdf bb cc for why       2             0                1               1
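Two details are worth checking when counting across a column. Joining the cells with `''.join` before counting can create a match that spans the boundary between two cells, and `Series.str.contains` treats its argument as a regular expression by default, so a search value such as `.pdf` also matches text like `mypdf`. The small sketch below uses made-up example strings (not the question's data) to show both effects and the plain-text alternatives: counting cell by cell, and `regex=False`.

```python
import pandas as pd

# Made-up example cells, chosen only to trigger the two pitfalls
urls = pd.Series(['report try', 'why notes.pdf', 'mypdf'])

# Joining with '' glues 'try' and 'why' into 'trywhy', creating a spurious 'ywh'
joined_count = ''.join(urls.tolist()).count('ywh')
# Counting cell by cell cannot match across cell boundaries
per_cell_count = sum(url.count('ywh') for url in urls)

# Default regex=True: '.' matches any character, so 'mypdf' also matches '.pdf'
regex_hits = int(urls.str.contains('.pdf').sum())
# regex=False: the value is matched literally
literal_hits = int(urls.str.contains('.pdf', regex=False).sum())

print(joined_count, per_cell_count, regex_hits, literal_hits)  # 1 0 2 1
```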



2 Comments

Ni_Tempe Over a year ago

Thanks, it is working. Let me do a bit more testing.

Ni_Tempe Over a year ago

I get the error 'must be str, not NoneType' if a cell in column Jan is empty. How could I modify the apply function so that it just skips the row when the value in the cell is empty?
