Python pandas dataframe: Looping through each row, if condition is true, update column

Question

I have a CSV with a list of URLs, and I need to check whether each one appears in another column. The following code loops through each row of the "URL" column and checks whether the value exists in another specific column. If it does, I need to add a string to a specific column for that row. The check itself works, but I'm not sure how to update the column for the row. I've been reading through the docs and think I might be overthinking this.

import pandas as pd

# Import CSV
data = pd.read_csv(r'URL_export.csv')

# Columns to loop over: the URLs and the flag column to update
df = pd.DataFrame(data, columns = ['URL', 'Exists'])

# Column to check the URLs against
v = pd.DataFrame(data, columns = ['Check'])

for row in df.itertuples():
    if row.URL in v.Check.values:
        print(row)
        # Add string "Yes" under column name "Exists" for this row

Please provide a minimal reproducible example, as well as the current and expected output.
– AMC, Commented Oct 18, 2020 at 2:02
Sorry, I'm not sure I'm following. Let me try explaining it this way: the posted code works, I'm just not sure how to modify a column for these rows. For example, the printed row returns: Pandas(Index=11, URL='name_of_url_page.html', Check=nan). For each of these, I'd like to change the data inside the "Check" column, but I'm not sure which method to use.
– jedd117, Commented Oct 18, 2020 at 2:14
I made a typo and meant to write if row.URL in v.Check.values: instead of if row.URL in v.CheckLight.values:. I fixed that and tried what you posted, but it didn't seem to do anything. I checked the documentation and it looks like it might work; I might need to refactor my code a bit. I'll work with it and see what I can come up with.
– jedd117, Commented Oct 18, 2020 at 2:51
I've looked at the documentation and saw that df.loc might work. I ran df.loc[[row.Index], ["Exists"]] = "Yes" in my if statement, but it's not updating the column for these rows either. However, when I print df.loc[[row.Index]], it returns the index and columns, which makes me think this should be working according to the documentation.
– jedd117, Commented Oct 18, 2020 at 3:36
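
For reference, the loop-based update discussed in these comments can be sketched as follows. This is only a minimal illustration, not the asker's actual script: the inline data stands in for URL_export.csv, and the name check_values is made up for the example. The key point is that the assignment targets df itself through .loc, because the row yielded by itertuples() is a namedtuple copy and cannot be used to write back.

```python
import pandas as pd

# Hypothetical stand-in for the contents of URL_export.csv
data = pd.DataFrame({
    'URL': ['a.html', 'b.html', 'c.html'],
    'Check': ['c.html', 'a.html', None],
})

# Work on an explicit copy so assignments clearly target this frame
df = data[['URL']].copy()
df['Exists'] = ''

# Build the lookup once; a set makes each membership test cheap
check_values = set(data['Check'].dropna())

for row in df.itertuples():
    if row.URL in check_values:
        # row is a read-only namedtuple, so write to df via its index label
        df.loc[row.Index, 'Exists'] = 'Yes'

print(df)
```

If an assignment like this appears to have no effect, one thing worth checking is whether the frame being printed afterwards is the same object that was assigned to.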

Alexandra Dudkina · Accepted Answer · 2020-10-18 08:40:55Z

import pandas as pd

df = pd.DataFrame({
    'URL': ['a', 'b', 'c', 'd', 'e', 'f'],
    'Exists': ['', '', '', '', '', '']
})

v = pd.DataFrame({
    'Check': ['a', 'c', 'e']
})

df['Exists'] = df['URL'].apply(lambda x: 'Yes' if x in v['Check'].values else 'No')

Output (from print(df)):

  URL Exists
0   a    Yes
1   b     No
2   c    Yes
3   d     No
4   e    Yes
5   f     No

If you only need to assign "Yes" (without "No"):

df['Exists'] = df['URL'].apply(lambda x: 'Yes' if x in v['Check'].values else '')

If column "Exists" already contains a value and you need to append "Yes" to it:

df['Exists'] = df['Exists'] + ' ' + df['URL'].apply(lambda x: 'Yes' if x in v['Check'].values else '')
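
As an alternative to apply with a lambda, the same conditional update can be written with a boolean mask and .loc. This is a sketch using the same toy data as above; the variable name mask is just illustrative. Only the matching rows are touched, so whatever the other rows already hold in "Exists" is left unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    'URL': ['a', 'b', 'c', 'd', 'e', 'f'],
    'Exists': ['', '', '', '', '', '']
})

v = pd.DataFrame({
    'Check': ['a', 'c', 'e']
})

# True for every row whose URL appears anywhere in v['Check']
mask = df['URL'].isin(v['Check'])

# Update only the rows where the condition holds
df.loc[mask, 'Exists'] = 'Yes'

print(df)
```

This avoids calling a Python function once per row, which tends to matter as the CSV grows.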

AMC · Accepted Answer · 2020-10-20 01:38:39Z

It's probably better to use booleans, instead of the strings 'Yes' and 'No'.

This also helps simplify the code:

import pandas as pd

df_1 = pd.DataFrame({'URL': ['a', 'b', 'd', 'c', 'e', 'f']})
print(df_1, end='\n\n')

df_2 = pd.DataFrame({'Check': ['a', 'c', 'e']})
print(df_2, end='\n\n')

df_1['Exists'] = df_1['URL'].isin(df_2['Check'])
print(df_1)

Output:

  URL
0   a
1   b
2   d
3   c
4   e
5   f

  Check
0     a
1     c
2     e

  URL  Exists
0   a    True
1   b   False
2   d   False
3   c    True
4   e    True
5   f   False
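
If the exported file still needs 'Yes'/'No' text, the boolean column can be converted at the end. A minimal sketch, assuming a separate label column is acceptable (the name Exists_label is made up for this example):

```python
import pandas as pd

df_1 = pd.DataFrame({'URL': ['a', 'b', 'd', 'c', 'e', 'f']})
df_2 = pd.DataFrame({'Check': ['a', 'c', 'e']})

# Boolean membership test, as in the answer above
df_1['Exists'] = df_1['URL'].isin(df_2['Check'])

# Translate the booleans to text only where text is actually required
df_1['Exists_label'] = df_1['Exists'].map({True: 'Yes', False: 'No'})

print(df_1)
```

Keeping the boolean column around means it can still be used directly for filtering, for example df_1[df_1['Exists']].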
