I have a CSV with a column of URLs, and I need to check whether each one appears in another column. The code below loops through each row of the "URL" column and checks whether that value exists in another specific column. If it does, I need to write a string to a specific column for that row. The check itself works, but I'm not sure how to update the column for the matching row. I've been reading the docs and suspect I'm overthinking this.

import pandas as pd

# Import CSV
data = pd.read_csv(r'URL_export.csv')

# Columns holding the URLs to look up and the flag to fill in
df = pd.DataFrame(data, columns = ['URL', 'Exists'])

# Column to search for each URL
v = pd.DataFrame(data, columns = ['Check'])

for row in df.itertuples():
    if row.URL in v.Check.values:
        print(row)
        # Add string "Yes" under column name "Exists" for this row
  • Please provide a minimal reproducible example, as well as the current and expected output. Commented Oct 18, 2020 at 2:02
  • Sorry, I'm not sure I follow. Let me try explaining it this way: the posted code works, I'm just not sure how to modify a column for these rows. For example, the printed row returns: Pandas(Index=11, URL='name_of_url_page.html', Check=nan). For each of these, I'd like to change the data inside the "Check" column, but I'm not sure which method to use. Commented Oct 18, 2020 at 2:14
  • Try df['URL'].isin(v['CheckLight']) ? Commented Oct 18, 2020 at 2:32
  • I made a typo and meant to call if row.URL in v.Check.values: instead of if row.URL in v.CheckLight.values:. I fixed that and tried what you posted, but it didn't seem to do anything. I checked the documentation and it looks like it might work; I might need to refactor my code a bit. I'll work with it and see what I can come up with. Commented Oct 18, 2020 at 2:51
  • I've looked at the documentation and saw that df.loc might work. I ran df.loc[[row.Index], ["Exists"]] = "Yes" in my if statement, but it's not updating the column for these rows either. However, when I print df.loc[[row.Index]], it returns the index and columns, which makes me think this should be working according to the documentation. Commented Oct 18, 2020 at 3:36

2 Answers
import pandas as pd

df = pd.DataFrame({
    'URL': ['a', 'b', 'c', 'd', 'e', 'f'],
    'Exists': ['','','', '', '', '']
})

v = pd.DataFrame({
    'Check': ['a', 'c', 'e']
})

df['Exists'] = df['URL'].apply(lambda x: 'Yes' if x in v['Check'].values else 'No')

Result: rows a, c, and e get "Yes" in the "Exists" column; rows b, d, and f get "No".

If you only need to assign "Yes" (and leave non-matching rows empty instead of "No"):

df['Exists'] = df['URL'].apply(lambda x: 'Yes' if x in v['Check'].values else '')

If column "Exists" already contains a value and you need to append "Yes" to it:

df['Exists'] = df['Exists'] + ' ' + df['URL'].apply(lambda x: 'Yes' if x in v['Check'].values else '')
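
As a rough alternative to the row-by-row lambda, the same update can be sketched with a boolean mask and df.loc, which writes only into the matching rows. This assumes the same 'URL', 'Exists', and 'Check' column names as the sample frames above; the name mask is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'URL': ['a', 'b', 'c', 'd', 'e', 'f'],
    'Exists': ['', '', '', '', '', ''],
})
v = pd.DataFrame({'Check': ['a', 'c', 'e']})

# Boolean mask: True where the URL appears anywhere in v['Check']
mask = df['URL'].isin(v['Check'])

# Write "Yes" only into the matching rows; other rows keep their value
df.loc[mask, 'Exists'] = 'Yes'

print(df)
```

Because the assignment is applied to the whole column at once, there is no need to loop with itertuples() for this case.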

It's probably better to use booleans, instead of the strings 'Yes' and 'No'.

This also helps simplify the code:

import pandas as pd

df_1 = pd.DataFrame({'URL': ['a', 'b', 'd', 'c', 'e', 'f']})
print(df_1, end='\n\n')

df_2 = pd.DataFrame({'Check': ['a', 'c', 'e']})
print(df_2, end='\n\n')

df_1['Exists'] = df_1['URL'].isin(df_2['Check'])
print(df_1)

Output:

  URL
0   a
1   b
2   d
3   c
4   e
5   f

  Check
0     a
1     c
2     e

  URL  Exists
0   a    True
1   b   False
2   d   False
3   c    True
4   e    True
5   f   False
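
If the exported CSV still needs the "Yes"/"No" strings from the question, one possible sketch is to compute the booleans first and translate them afterwards. The intermediate name exists is an assumption for illustration; the frames mirror the ones in the answer above.

```python
import pandas as pd

df_1 = pd.DataFrame({'URL': ['a', 'b', 'd', 'c', 'e', 'f']})
df_2 = pd.DataFrame({'Check': ['a', 'c', 'e']})

# Boolean membership test, as in the answer above
exists = df_1['URL'].isin(df_2['Check'])

# Translate True/False into the string labels the question asked for
df_1['Exists'] = exists.map({True: 'Yes', False: 'No'})

print(df_1)
```

Keeping the boolean Series around separately makes it easy to filter with it later (for example df_1[exists]) while the stored column stays human-readable.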
