I am trying to write a simple scraper for Sales Navigator in LinkedIn, and this is the link I am trying to scrape. It contains search results for specific filter options selected for account results.

The goal I am trying to achieve is to retrieve every company name among the search results. Upon inspecting the link elements carrying the company names (e.g. Facile.it, AGT International), I see the following HTML, showing the dt class name:

    <dt class="result-lockup__name">
    <a id="ember208" href="/sales/company/2429831?_ntb=zreYu57eQo%2BSZiFskdWJqg%3D%3D" class="ember-view">  Facile.it

    </a>    </dt>

I basically want to retrieve those names and open the URL in each href.
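For instance, parsing just that snippet with BeautifulSoup pulls out exactly the two pieces I am after (a minimal sketch; the href value is the one from the inspected element above):

    from bs4 import BeautifulSoup

    snippet = '''<dt class="result-lockup__name">
    <a id="ember208" href="/sales/company/2429831?_ntb=zreYu57eQo%2BSZiFskdWJqg%3D%3D" class="ember-view">  Facile.it
    </a>    </dt>'''

    soup = BeautifulSoup(snippet, 'html.parser')
    link = soup.select_one('dt.result-lockup__name a')
    print(link.text.strip())  # Facile.it
    print(link['href'])       # /sales/company/2429831?_ntb=...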

Note that all of the company name links have the same dt class, result-lockup__name. The following script is my attempt to collect the list of all company names displayed in the search results, along with their link elements.

    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    from bs4 import BeautifulSoup
    import re
    import pandas as pd
    import os

    def scrape_accounts(url):

        url = "https://www.linkedin.com/sales/search/companycompanySize=E&geoIncluded=emea%3A0%2Ceurope%3A0&industryIncluded=6&keywords=AI&page=1&searchSessionId=zreYu57eQo%2BSZiFskdWJqg%3D%3D"
        driver = webdriver.PhantomJS(executable_path='C:\\phantomjs\\bin\\phantomjs.exe')
        #driver = webdriver.Firefox()
        #driver.implicitly_wait(30)
        driver.get(url)

        search_results = []
        search_results = driver.find_elements_by_class_name("result-lockup__name")
        print(search_results)

    if __name__ == "__main__":

        scrape_accounts("lol")

However, the result prints an empty list. I am trying to learn how to scrape different parts of a web page and different elements, so I am not sure whether I have this right. What would be the correct way?

1 Answer

I'm afraid I can't get to the page that you're after, but I notice that you're importing Beautiful Soup without using it.

Try:

    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    from bs4 import BeautifulSoup
    import re
    import pandas as pd
    import os

    url = "https://www.linkedin.com/sales/search/company?companySize=E&geoIncluded=emea%3A0%2Ceurope%3A0&industryIncluded=6&keywords=AI&page=1&searchSessionId=zreYu57eQo%2BSZiFskdWJqg%3D%3D"

    def scrape_accounts(url=url):
        driver = webdriver.PhantomJS(executable_path='C:\\phantomjs\\bin\\phantomjs.exe')
        #driver = webdriver.Firefox()
        #driver.implicitly_wait(30)
        driver.get(url)

        # grab the rendered HTML from the browser session
        html = driver.find_element_by_tag_name('html').get_attribute('innerHTML')

        # parse it with Beautiful Soup and select every company-name link
        soup = BeautifulSoup(html, 'html.parser')
        search_results = soup.select('dt.result-lockup__name a')
        for link in search_results:
            print(link.text.strip(), link['href'])
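
If the results are rendered by JavaScript after the page loads, the browser may be handing back the HTML too early. A minimal sketch of an explicit wait, placed right after driver.get(url); the class name comes from your snippet, and the 10-second timeout is an arbitrary choice:

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # wait up to 10 seconds for at least one company-name element to appear
    WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.CLASS_NAME, 'result-lockup__name'))
    )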

Comments

My apologies. I notice that I logged in to that account using my ID and password.
I believe the first error lies in the fact that the browser session that is started doesn't log in.
I had issues with signing in as well, as posted in this question: stackoverflow.com/questions/55128400/…
Ah yes, answer adjusted, apologies.
Shall we continue this in the chat you started?
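
Following up on the sign-in discussion above: the search page sits behind LinkedIn authentication, so the browser session has to log in before requesting the search URL. A minimal sketch, assuming the login form still exposes the element IDs used below (those IDs are an assumption and may have changed):

    from selenium import webdriver

    driver = webdriver.Firefox()
    driver.get('https://www.linkedin.com/login')

    # 'username' and 'password' are assumed element IDs on the login form
    driver.find_element_by_id('username').send_keys('you@example.com')
    driver.find_element_by_id('password').send_keys('your-password')
    driver.find_element_by_id('password').submit()

    # only after the login completes, navigate to the Sales Navigator search URL
    driver.get(url)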