
I've been following this guide to web scraping LinkedIn and Google searches. There have been some changes in the HTML of Google's search results since the guide was written, so I've had to tinker with the code a bit. I'm at the point where I need to grab the links from the search results, but have run into an issue where the program doesn't return anything, even after implementing a code fix from this post to get past an earlier error. I'm not sure what I'm doing wrong here.

import Parameters
from time import sleep
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from parsel import Selector
import csv

# create a csv writer for the output file named in Parameters
writer = csv.writer(open(Parameters.file_name, 'w'))

# writerow() writes the header row to the file object
writer.writerow(['Name', 'Job Title', 'Company', 'College', 'Location', 'URL'])

# specifies the path to the chromedriver executable
driver = webdriver.Chrome('/Users/.../Python Scripts/chromedriver')
driver.get('https://www.linkedin.com')
sleep(0.5)

# locate the email field by id, then send_keys() to simulate keystrokes
username = driver.find_element_by_id('session_key')
username.send_keys(Parameters.linkedin_username)
sleep(0.5)

password = driver.find_element_by_id('session_password')
password.send_keys(Parameters.linkedin_password)
sleep(0.5)

sign_in_button = driver.find_element_by_class_name('sign-in-form__submit-button')
sign_in_button.click()
sleep(3)

driver.get('https://www.google.com')
sleep(3)

search_query = driver.find_element_by_name('q')
search_query.send_keys(Parameters.search_query)
sleep(0.5)

search_query.send_keys(Keys.RETURN)
sleep(3)

################# HERE IS WHERE THE ISSUE LIES ######################
#linkedin_urls = driver.find_elements_by_class_name('iUh30')
linkedin_urls = driver.find_elements_by_css_selector("yuRUbf > a")
for url_prep in linkedin_urls:
    url_prep.get_attribute('href')
#linkedin_urls = [url.text for url in linkedin_urls]
sleep(0.5)

print('Supposed to be URLs')
print(linkedin_urls)

The search parameter is:

search_query = 'site:linkedin.com/in/ AND "python developer" AND "London"'

Running this results in an empty list.

Here is a snippet of the HTML section I want to grab (screenshot of the Google search result markup).

EDIT: This is the output if I go by .find_elements_by_class_name or by Sector97's first edit: the titles and descriptive info, but no URLs.

2 Answers


Found an alternative solution that might make it a bit easier to achieve what you're after. Credit to A.Pond at https://stackoverflow.com/a/62050505

Use the googlesearch module (from the google package) to get the links from the results. You may need to install the library first:

pip install google

You can then use it to quickly extract an arbitrary number of links:

from googlesearch import search

links = []
query = 'site:linkedin.com/in AND "python developer" AND "London"'
# collect the first 100 result URLs; pause spaces out requests to avoid being blocked
for j in search(query, tld='com', start=0, stop=100, pause=4):
    links.append(j)

I got the first 100 results, but you can play around with the parameters to get more or fewer as you need.

You can read more about it here: https://www.geeksforgeeks.org/performing-google-search-using-python-code/
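
If it helps tie this back to the original script, here's a minimal sketch (not tested against your setup) of feeding the collected links into the Selenium/parsel flow from the question. It assumes the driver, writer, and links objects defined above, and leaves the actual field extraction as placeholder comments since those XPaths come from the guide.

from time import sleep
from parsel import Selector

for link in links:
    driver.get(link)   # reuse the logged-in driver from the question
    sleep(2)
    sel = Selector(text=driver.page_source)
    # pull Name, Job Title, Company, etc. out of sel with sel.xpath(...) as in the guide,
    # then write the row to the CSV opened in the question, e.g.:
    # writer.writerow([name, job_title, company, college, location, link])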


4 Comments

Good work!!! Wow. Your update to the last answer worked but this looks like the better option. Much faster and cleaner.
I have a new question that I posted and tried to tag you on but it didn't seem to go through. Would appreciate your help if you have time! I think this is the last or 2nd to last hurdle before I finish the project. stackoverflow.com/questions/66535833/…
No problem, I'll take a look at it a little later today.
Sorry, I didn't see this till now; it looks like you've already got an answer on your post. If it doesn't resolve what you're after, let me know and I'll take a look.

I think I found the error in your code. Instead of using

linkedin_urls = driver.find_elements_by_css_selector("yuRUbf > a")

Try this instead:

web_elements = driver.find_elements_by_class_name("yuRUbf")

That gets you the parent elements. You can then extract the URL from each result's anchor with a simple list comprehension:

linkedin_urls = [elem.find_element_by_css_selector('a').get_attribute('href') for elem in web_elements]
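
Equivalently, you can grab the anchors in one pass with a CSS selector. This is just a sketch, assuming the current Google result markup where each result's anchor sits directly inside a div with class yuRUbf; note the div. prefix, which is what the selector in the question was missing:

# select each result's anchor directly, then read its href attribute
anchors = driver.find_elements_by_css_selector('div.yuRUbf > a')
linkedin_urls = [a.get_attribute('href') for a in anchors]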

2 Comments

Thanks for taking a shot at it. Unfortunately, the output I got was similar to using .find_elements_by_class_name: it printed the titles and descriptive info but not the direct LinkedIn URL found in the href. You can check my edit for a screenshot.
I think I've fixed the above code; let me know if it works for you now.
