
I've been following this guide to web scraping LinkedIn and Google searches. There have been some changes in the HTML of Google's search results since the guide was written, so I've had to tinker with the code a bit. I'm at the point where I need to grab the links from the search results, but have run into an issue where the program doesn't return anything, even after implementing a code fix from this post to get past an earlier error. I'm not sure what I'm doing wrong here.

import Parameters
from time import sleep
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from parsel import Selector
import csv

# create a csv writer for the output file named in Parameters
writer = csv.writer(open(Parameters.file_name, 'w'))

# writerow() writes the header row to the file object
writer.writerow(['Name', 'Job Title', 'Company', 'College', 'Location', 'URL'])

# specifies the path to the chromedriver executable
driver = webdriver.Chrome('/Users/.../Python Scripts/chromedriver')
driver.get('https://www.linkedin.com')
sleep(0.5)

# locate the email field by id, then send_keys() to simulate keystrokes
username = driver.find_element_by_id('session_key')
username.send_keys(Parameters.linkedin_username)
sleep(0.5)

password = driver.find_element_by_id('session_password')
password.send_keys(Parameters.linkedin_password)
sleep(0.5)

sign_in_button = driver.find_element_by_class_name('sign-in-form__submit-button')
sign_in_button.click()
sleep(3)

driver.get('https://www.google.com')
sleep(3)

search_query = driver.find_element_by_name('q')
search_query.send_keys(Parameters.search_query)
sleep(0.5)

search_query.send_keys(Keys.RETURN)
sleep(3)

################# HERE IS WHERE THE ISSUE LIES ######################
#linkedin_urls = driver.find_elements_by_class_name('iUh30')
linkedin_urls = driver.find_elements_by_css_selector("yuRUbf > a")
for url_prep in linkedin_urls:
    url_prep.get_attribute('href')
#linkedin_urls = [url.text for url in linkedin_urls]
sleep(0.5)

print('Supposed to be URLs')
print(linkedin_urls)

The search parameter is:

search_query = 'site:linkedin.com/in/ AND "python developer" AND "London"'

Running this results in an empty list.

Here is a snippet of the HTML section I want to grab (screenshot of the Google search result markup).

EDIT: This is the output if I go by .find_elements_by_class_name or by Sector97's first edit: the titles and descriptive info, but no URLs.

2 Answers


Found an alternative solution that might make it a bit easier to achieve what you're after. Credit to A.Pond at https://stackoverflow.com/a/62050505

Use the googlesearch module (from the google package) to get the links from the results. You may need to install the library first:

pip install google

You can then use it to quickly extract an arbitrary number of links:

from googlesearch import search

links = []
query = 'site:linkedin.com/in AND "python developer" AND "London"'
# collect the first 100 result URLs; pause spaces out requests to avoid being blocked
for j in search(query, tld='com', start=0, stop=100, pause=4):
    links.append(j)

I got the first 100 results, but you can play around with the parameters to get more or fewer as you need.

You can read more about it here: https://www.geeksforgeeks.org/performing-google-search-using-python-code/
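
If it helps tie this back to the original script, here's a minimal sketch (not tested against your setup) of feeding the collected links into the Selenium/parsel flow from the question. It assumes the driver, writer, and links objects defined above, and leaves the actual field extraction as placeholder comments since those XPaths come from the guide.

from time import sleep
from parsel import Selector

for link in links:
    driver.get(link)   # reuse the logged-in driver from the question
    sleep(2)
    sel = Selector(text=driver.page_source)
    # pull Name, Job Title, Company, etc. out of sel with sel.xpath(...) as in the guide,
    # then write the row to the CSV opened in the question, e.g.:
    # writer.writerow([name, job_title, company, college, location, link])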


4 Comments

Good work!!! Wow. Your update to the last answer worked but this looks like the better option. Much faster and cleaner.
I have a new question that I posted and tried to tag you on but it didn't seem to go through. Would appreciate your help if you have time! I think this is the last or 2nd to last hurdle before I finish the project. stackoverflow.com/questions/66535833/…
No problem, I'll take a look at it a little later today.
Sorry, I didn't see this till now; it looks like you've already got an answer on your post. If it doesn't resolve what you're after, let me know and I'll take a look.

I think I found the error in your code. Instead of using

linkedin_urls = driver.find_elements_by_css_selector("yuRUbf > a")

Try this instead:

web_elements = driver.find_elements_by_class_name("yuRUbf")

That gets you the parent elements. You can then extract the URL from each result's anchor with a simple list comprehension:

linkedin_urls = [elem.find_element_by_css_selector('a').get_attribute('href') for elem in web_elements]
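
Equivalently, you can grab the anchors in one pass with a CSS selector. This is just a sketch, assuming the current Google result markup where each result's anchor sits directly inside a div with class yuRUbf; note the div. prefix, which is what the selector in the question was missing:

# select each result's anchor directly, then read its href attribute
anchors = driver.find_elements_by_css_selector('div.yuRUbf > a')
linkedin_urls = [a.get_attribute('href') for a in anchors]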

2 Comments

Thanks for taking a shot at it. Unfortunately, the output I got was similar to using .find_elements_by_class_name: it printed the titles and descriptive info but not the direct LinkedIn URL found in the href. You can check my edit for a screenshot.
I think I've fixed the above code; let me know if it works for you now.
