0

I am looking at this page. I am trying to use Selenium and chromedriver to scrape this data (shown by the red marker):

[Screenshot: the value to scrape, marked in red]

Here is my Python code:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from time import sleep

chrome_options = Options()
chrome_options.add_argument("--disable-infobars")
chrome_options.add_argument("disable-infobars")
driver = webdriver.Chrome(executable_path="/ABC/chromedriver", chrome_options=chrome_options)

driver.get("https://finance.yahoo.com/quote/IBM")
sleep(10)
estimated = driver.find_element_by_class_name("IbBox Ta(start) C($tertiaryColor)")

But the code does not get the Est. Return value, and after a long wait it raises this error:

selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified

What am I doing wrong, and what is the best and fastest way to get the Est. Return value from the page?

UPDATE: Here is what I see if I use inspect element in Chrome:

[Screenshot: the element in Chrome's inspector]

3 Answers

1

The User-Agent header plays an important role in fetching the value you are after, so make sure you send one. With that in place, this is how you can get the desired content:

import requests
from bs4 import BeautifulSoup

link = "https://finance.yahoo.com/quote/IBM"

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36'}

r = requests.get(link,headers=headers)
soup = BeautifulSoup(r.text,"lxml")
# Raw string, so the backslashes reach the CSS selector unchanged
est_return = soup.select_one(r"[class='Mb\(8px\)']").get_text()
print(est_return)

9 Comments

Thanks, it works nicely. How did you find the class? How do you know it should be Mb\(8px\)?
A class name containing parentheses should be escaped in a CSS selector. The backslash (\) is used to escape characters that would otherwise have a special meaning.
Can you please elaborate, I am not too familiar with this.
For example, how can you extract Near Fair Value?
Try this: soup.select_one(r"[class='Mb\(8px\)']").find_previous_sibling().get_text()
0

Can you try XPath instead? It should look like this:

estimated = driver.find_element_by_xpath("//div[@class='IbBox Ta(start) C($tertiaryColor)']").text

Let me know how it goes! :D


0

This error message...

selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified

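...implies that the locator you used is not a valid expression. find_element_by_class_name() accepts a single class name, but IbBox Ta(start) C($tertiaryColor) is three space-separated class names, and they contain characters such as ( ) and $ that are not valid in an unescaped CSS class selector.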
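
To make the cause concrete, here is a minimal sketch (not part of Selenium) of how the compound class attribute from the question could be turned into a valid CSS selector. The helper names escape_css_class and css_selector_for_classes are hypothetical, and the escaping rule is a simplifying assumption: every character other than ASCII letters, digits, hyphen and underscore gets a backslash in front of it.

```python
import re


def escape_css_class(name: str) -> str:
    """Backslash-escape characters that are not valid in a bare CSS identifier.

    Simplifying assumption: only ASCII letters, digits, '-' and '_' are left
    unescaped, and the class name does not start with a digit.
    """
    return re.sub(r"([^A-Za-z0-9_-])", r"\\\1", name)


def css_selector_for_classes(class_attr: str) -> str:
    """Build a selector matching elements that carry every class in class_attr."""
    return "".join("." + escape_css_class(c) for c in class_attr.split())


# The class attribute from the question, split into three escaped classes.
selector = css_selector_for_classes("IbBox Ta(start) C($tertiaryColor)")
print(selector)
```

The resulting string could then be passed to a CSS locator, for example driver.find_element_by_css_selector(selector) in the Selenium version used in the question. Whether those utility classes identify the element uniquely on the page is a separate matter and is not checked here.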


To scrape the text -6% Est. Return, wait for the element with WebDriverWait and visibility_of_element_located(). You can use the following locator strategy:

  • Using XPATH:

    driver.get('https://finance.yahoo.com/quote/IBM')
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[text()='Near Fair Value']//following::div[1]/div"))).text)
    
  • Console Output:

    -6% Est. Return
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
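
The XPath above anchors on the visible label text rather than on the generated class names. As a rough sketch of how that pattern might be reused for another label, the helpers below build the same expression from an arbitrary label. The names xpath_literal and value_after_label_xpath are hypothetical, and the sketch assumes the value sits in the same position relative to its label as in the XPath above, which the page markup may not guarantee. XPath 1.0 string literals have no escape character, so a label containing both quote types is assembled with concat().

```python
def xpath_literal(text: str) -> str:
    """Quote text as an XPath 1.0 string literal."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote types present: join the single-quote-free pieces with concat().
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def value_after_label_xpath(label: str) -> str:
    """Build the label-anchored XPath used above for the given label text."""
    return f"//div[text()={xpath_literal(label)}]//following::div[1]/div"


print(value_after_label_xpath("Near Fair Value"))
```

The returned string can be passed as the second item of the (By.XPATH, ...) tuple given to visibility_of_element_located().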
    
