I have a dataframe as below and I want to find how many times value from the column Jan occurs in the column URL and corresponding cell of the column URL.
I want to create 3 columns - found in cell and found in column and distinct finds
For example when we search for value try from the first cell of the column Jan, it should return 1 in found in cell and 2 in 'found in columnand 2 indistinct findsbecause the word was found in 2 rows
when we search for valuewhyfrom the second cell of the columnJan, it should return 0 infound in celland 2 in 'found in column and 2 in distinct finds because the word was found in 2 rows
i know how to search within a string. But how could i search within a cell and within a column?
s="ea2017-104.pdf bb cc for why"
s.lower().count("why")#to find text within string
sales = [{'account': '3', 'Jan': 'try', 'Feb': '200 .jones', 'URL': 'ea2018-001.pdf try bbbbb why try'},
{'account': '1', 'Jan': 'why', 'Feb': '210', 'URL': 'try '},
{'account': '2', 'Jan': 'bbbbb', 'Feb': '90', 'URL': 'ea2017-104.pdf bb cc for why' }]
df = pd.DataFrame(sales)
df
df['column_find']=df['URL'].str.lower().count('why')
final output will have 3 additional columns as below
found_inCell found_in_column distinct_finds
2 3 2
0 2 2
0 1 1
update
I get an error when i try to run code when one of the cells in empty/np.nan
sales = [{'account': '3', 'Jan': np.nan, 'Feb': '200 .jones', 'URL': 'ea2018-001.pdf try bbbbb why try'},
{'account': '1', 'Jan': 'try', 'Feb': '210', 'URL': 'try '},
{'account': '2', 'Jan': 'bbbbb', 'Feb': '90', 'URL': 'ea2017-104.pdf bb cc for why' }]
df = pd.DataFrame(sales)
df
df['found_inCell'] = df.apply(lambda row: row['URL'].count(row['Jan']), axis=1)
df['found_in_column'] = df['Jan'].apply(lambda x: ''.join(df['URL'].tolist()).count(x))
df['distinct_finds'] = df['Jan'].apply(lambda x: sum(df['URL'].str.contains(x)))