Web scraping using Selenium and chromedriver in Python

Question

I am looking at this page. I am trying to use Selenium and chromedriver to scrape this data (shown by the red marker):

Here is my Python code:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from time import sleep

chrome_options = Options()
chrome_options.add_argument("--disable-infobars")
chrome_options.add_argument("disable-infobars")
driver = webdriver.Chrome(executable_path="/ABC/chromedriver", chrome_options=chrome_options)

driver.get("https://finance.yahoo.com/quote/IBM")
sleep(10)
estimated = driver.find_element_by_class_name("IbBox Ta(start) C($tertiaryColor)")

But the code does not get the Est. Return and after a long wait it returns this error message:

selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified

What am I doing wrong, and what is the best and fastest way to get the Est. Return value from the page?

UPDATE: Here is what I see if I use inspect element in Chrome:

SIM · Accepted Answer · 2020-03-21 23:45:35Z

1

The request header plays an important role in fetching the value you are after, so make sure you send one (a User-Agent here). With that in place, this is how you get the desired content:

import requests
from bs4 import BeautifulSoup

link = "https://finance.yahoo.com/quote/IBM"

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36'}

r = requests.get(link,headers=headers)
soup = BeautifulSoup(r.text,"lxml")
est_return = soup.select_one(r"[class='Mb\(8px\)']").get_text()
print(est_return)


9 Comments

TJ1 Over a year ago

Thanks, it works nicely. How did you find the class? How do you know it should be Mb\(8px\)?

SIM Over a year ago

Class names containing parentheses should be escaped. The backslash ( \ ) character is used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character.

TJ1 Over a year ago

Can you please elaborate? I am not too familiar with this.

TJ1 Over a year ago

For example, how can you extract Near Fair Value?

SIM Over a year ago

Try this: soup.select_one(r"[class='Mb\(8px\)']").find_previous_sibling().get_text()
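
To elaborate on the escaping discussed in these comments: two layers are involved, Python string literals and CSS selector syntax. The sketch below uses only the standard library. `css_unescape_simple` is a hypothetical helper written for illustration (it is not part of BeautifulSoup and it ignores CSS hex escapes); it only approximates what the selector engine ends up comparing against.

```python
import re

# Layer 1: Python. A raw string keeps the backslash as-is, which is the same
# text as doubling the backslash in a normal string literal.
selector_raw = r"[class='Mb\(8px\)']"
selector_doubled = "[class='Mb\\(8px\\)']"
same_selector = selector_raw == selector_doubled

# Layer 2: CSS. A backslash before a punctuation character means
# "treat the next character literally".
def css_unescape_simple(text):
    """Hypothetical, simplified CSS unescape: drops a backslash that precedes
    a non-hex character. Hex escapes such as \\31 are deliberately ignored."""
    return re.sub(r"\\([^0-9a-fA-F\n])", r"\1", text)

class_seen_by_css = css_unescape_simple(r"Mb\(8px\)")

print(same_selector)      # both spellings produce the same selector text
print(class_seen_by_css)  # the class value the selector is matched against
```

Writing the selector as a raw string also avoids the warning that recent Python versions emit for the unrecognized `\(` escape in a normal string literal.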


EnriqueBet · Accepted Answer · 2020-03-21 23:43:01Z

0

Can you try with XPath instead? It should look like this:

estimated = driver.find_element_by_xpath("//div[@class='IbBox Ta(start) C($tertiaryColor)']").text

Let me know how it goes! :D


undetected Selenium · Accepted Answer · 2020-03-22 20:14:59Z

This error message...

selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified

...implies that the Locator Strategy you have used wasn't a valid expression.
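
One likely reason, not spelled out above: `find_element_by_class_name` expects a single class name, whereas `IbBox Ta(start) C($tertiaryColor)` is three space-separated classes that also contain `(`, `)` and `$`. A minimal sketch of turning such a class attribute into a CSS selector follows. `escape_css_identifier` and `class_selector` are hypothetical helper names, and the escaping is simplified (for instance, it does not handle class names that start with a digit).

```python
import re

def escape_css_identifier(name: str) -> str:
    """Backslash-escape every character that is not a letter, digit,
    underscore or hyphen (simplified; hypothetical helper)."""
    return re.sub(r"([^\w-])", r"\\\1", name)

def class_selector(class_attr: str) -> str:
    """Build a compound selector such as .a.b.c from a class attribute value."""
    return "".join("." + escape_css_identifier(c) for c in class_attr.split())

selector = class_selector("IbBox Ta(start) C($tertiaryColor)")
print(selector)
```

The resulting string could then be handed to Selenium's CSS strategy, for example `driver.find_element(By.CSS_SELECTOR, selector)`. Utility class names like these may change when the page is restyled, so a text-based locator such as the XPath that follows may prove more stable.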

To scrape the text -6% Est. Return, you need to use WebDriverWait with visibility_of_element_located() so that the element is visible before its text is read. You can use the following Locator Strategy:

Using XPATH:

driver.get('https://finance.yahoo.com/quote/IBM')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[text()='Near Fair Value']//following::div[1]/div"))).text)

Console Output:
```
-6% Est. Return
```

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
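
If a numeric value is wanted rather than the label text, the string read from the page can be parsed afterwards. The following is a small sketch under the assumption that the text always looks like the console output shown earlier (`-6% Est. Return`); `parse_percent` is a hypothetical helper, not part of Selenium.

```python
import re
from typing import Optional

def parse_percent(text: str) -> Optional[float]:
    """Return the first signed percentage found in text, or None if absent."""
    match = re.search(r"([+-]?\d+(?:\.\d+)?)\s*%", text)
    return float(match.group(1)) if match else None

est_return = parse_percent("-6% Est. Return")
print(est_return)
```

Returning `None` when no percentage is present lets the caller tell a changed page layout apart from a genuine value of zero.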
