0

I have a dataframe as below and I want to find how many times value from the column Jan occurs in the column URL and corresponding cell of the column URL.

I want to create 3 columns - found in cell and found in column and distinct finds For example when we search for value try from the first cell of the column Jan, it should return 1 in found in cell and 2 in 'found in columnand 2 indistinct findsbecause the word was found in 2 rows when we search for valuewhyfrom the second cell of the columnJan, it should return 0 infound in celland 2 in 'found in column and 2 in distinct finds because the word was found in 2 rows

i know how to search within a string. But how could i search within a cell and within a column?

s="ea2017-104.pdf bb cc for why"
s.lower().count("why")#to find text within string

sales = [{'account': '3', 'Jan': 'try', 'Feb': '200 .jones', 'URL': 'ea2018-001.pdf try bbbbb why try'},
             {'account': '1',  'Jan': 'why', 'Feb': '210', 'URL': 'try '},
             {'account': '2',  'Jan': 'bbbbb',  'Feb': '90',  'URL': 'ea2017-104.pdf bb cc for why' }]
df = pd.DataFrame(sales)
df

df['column_find']=df['URL'].str.lower().count('why')

final output will have 3 additional columns as below

found_inCell    found_in_column           distinct_finds
2                3                   2
0                2                   2
0                1                   1

update

I get an error when i try to run code when one of the cells in empty/np.nan

sales = [{'account': '3', 'Jan': np.nan, 'Feb': '200 .jones', 'URL': 'ea2018-001.pdf try bbbbb why try'},
             {'account': '1',  'Jan': 'try', 'Feb': '210', 'URL': 'try '},
             {'account': '2',  'Jan': 'bbbbb',  'Feb': '90',  'URL': 'ea2017-104.pdf bb cc for why' }]
df = pd.DataFrame(sales)
df

df['found_inCell'] = df.apply(lambda row: row['URL'].count(row['Jan']), axis=1)
df['found_in_column'] = df['Jan'].apply(lambda x: ''.join(df['URL'].tolist()).count(x))
df['distinct_finds'] = df['Jan'].apply(lambda x: sum(df['URL'].str.contains(x)))

1 Answer 1

2

Here is one way.

df['found_inCell'] = df.apply(lambda row: row['URL'].count(row['Jan']), axis=1)
df['found_in_column'] = df['Jan'].apply(lambda x: ''.join(df['URL'].tolist()).count(x))
df['distinct_finds'] = df['Jan'].apply(lambda x: sum(df['URL'].str.contains(x)))

#           Feb    Jan                           URL account  found_inCell  \
# 0  200 .jones    try  ea2018-001.pdf try bbbbb why       3             1   
# 1         210    why                          try        1             0   
# 2          90  bbbbb  ea2017-104.pdf bb cc for why       2             0   

#    found_in_column  distinct_finds  
# 0                2               2  
# 1                2               2  
# 2                1               1  
Sign up to request clarification or add additional context in comments.

2 Comments

thanks...it is working...let me do a bit more testing
i get an error 'must be str, not NoneType' if a cell in column Jan is empty...how could i modify the apply function such that it just skips row when the value in the cell is empty?

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.