
I am trying to scrape the year-selection dropdown menu on https://www.atptour.com/en/rankings/singles with Selenium.

The menu is made up of li elements, and I need the ranking dates from either the text content or the data-value attribute of each li element.

When I run the code below, I get an empty result:

import requests
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url ="https://www.atptour.com/en/rankings/singles"
driver = webdriver.Chrome('/Users/snvplayer/Downloads/chromedriver')
# driver = webdriver.Chrome()

driver.implicitly_wait(30)
driver.get(url)
# result = driver.find_element_by_id('header')
result = driver.find_element_by_css_selector('ul.dropdown li')
print(result.text)
print(type(result))

I also tried clicking the menu to make the list visible and then waiting, but it still returns an empty element.

driver.find_element_by_css_selector("div.dropdown-label").click()
driver.implicitly_wait(10)
result = driver.find_elements_by_css_selector("ul.dropdown li")
# result = driver.find_element_by_css_selector("ul.dropdown").click()
print(result[0].text)
print(type(result))

Could anyone explain how I can scrape this page? I attached the source code of the page as an image.

  • By "it returns empty element", do you mean that an error was thrown saying that it wasn't found, or that the text is empty? What is the result of your print statements? Also the first element - that you selected - has display: none, maybe it plays a role. Commented Jan 3, 2021 at 13:44

1 Answer


You need to read the innerText attribute and, if needed, trim the surrounding whitespace.

This should do the trick:

# assumes the imports and the driver object from the question's code
url = "https://www.atptour.com/en/rankings/singles"
driver.get(url)
wait = WebDriverWait(driver, 10)
# wait for the date range element to be present on the page, so that
# we don't look for the child elements before their parent exists
date_range = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '[data-value="rankDate"]')))
# find_elements(By..., ...) works in both Selenium 3 and Selenium 4
list_items = date_range.find_elements(By.TAG_NAME, 'li')
for res in list_items:
    # innerText is readable even when the element is not displayed
    date_value = str(res.get_attribute('innerText')).strip()
    print(date_value)

Behind the scenes, the .text property of a WebElement returns only the text that is rendered and visible to the user. The li items in this dropdown are not displayed while the menu is closed (the one you selected has display: none), so .text returns an empty string (the behaviour you were seeing).

As a note, the page keeps the date in a data-value attribute because a plain value attribute is not valid on an li inside a ul. Custom data-* attributes exist for exactly this purpose.

On the other hand, innerText read through get_attribute returns all the text contained in an element and its child elements, even while the element is hidden.
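Since the question mentions that either the li text or the data-value attribute would do, here is a minimal sketch that collects both. The helper name extract_rank_dates and the FakeItem class are hypothetical and not part of Selenium, and the sample dates are made up for the demo. The helper only relies on get_attribute, which a Selenium WebElement provides.

```python
def extract_rank_dates(list_items):
    """Collect (data_value, label) pairs from li-like elements.

    Each item only needs a get_attribute(name) method, which is what a
    Selenium WebElement provides.
    """
    pairs = []
    for item in list_items:
        # data-value holds the machine-readable value, if the attribute exists
        data_value = item.get_attribute("data-value")
        # innerText can be read even when the element is hidden
        label = (item.get_attribute("innerText") or "").strip()
        if data_value or label:
            pairs.append((data_value, label))
    return pairs


class FakeItem:
    """Hypothetical stand-in for a WebElement, used only to demo the helper."""

    def __init__(self, attrs):
        self._attrs = attrs

    def get_attribute(self, name):
        # Like WebElement.get_attribute, return None for a missing attribute
        return self._attrs.get(name)


# Made-up sample values, not taken from the real page
demo_items = [
    FakeItem({"data-value": "2021-01-04", "innerText": "\n  2021.01.04  \n"}),
    FakeItem({"data-value": "2020-12-28", "innerText": " 2020.12.28 "}),
    FakeItem({}),  # an item with neither value is skipped
]
print(extract_rank_dates(demo_items))
# [('2021-01-04', '2021.01.04'), ('2020-12-28', '2020.12.28')]
```

With a real driver, you would pass the list_items list from the code above instead of demo_items.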


3 Comments

Thank you so much! Would you mind explaining how this works? Is it because those elements are loaded via javascript, and we have to wait until it's loaded?
@user3562812 updated the answer to provide more details (as comments on the code, although that wasn't the issue in your case) as to why .text doesn't work in this case.
Thank you so much, I really appreciate your help. So the issue was due to the specific property of li element. I will have to keep this in mind. There is so much to learn.
