
I am trying to scrape the site https://www.mdoffice.com.ua/ with Selenium (Python). The site requires a login and password to access some information, which is why Selenium is the only tool I can use for scraping. After loading the home page, I click a link on it and check the current URL, but the program prints the home page's URL. As a result, I cannot scrape any information from the new page; only the home page can be scraped. This happens only on this site, and everything works on other sites. Code examples are below. How can I solve this problem? Thank you!

Example 1

```
from selenium import webdriver
import time

# Raw string so the backslashes in the Windows path are not treated as escapes
browser = webdriver.Chrome(r"D:\Programs\Chrome dr Selenium\chromedriver_90")
url = "https://www.mdoffice.com.ua/ru/amain.html"
browser.get(url)
time.sleep(3)
elem = browser.find_element_by_link_text("Инструкции MDOffice")
# Alternatively (the result is the same):
# elem = browser.find_element_by_xpath("/html/body/div[3]/div[2]/div[2]/nav/ul[1]/li/a")
time.sleep(3)
elem.click()
print(browser.current_url)
# Result:   https://www.mdoffice.com.ua/ru/amain.html
# Expected: https://www.mdoffice.com.ua/ru/aMDOFAQ.decl
```

Example 2 (Here everything is fine)

```
from selenium import webdriver
import time

# Raw string so the backslashes in the Windows path are not treated as escapes
browser = webdriver.Chrome(r"D:\Programs\Chrome dr Selenium\chromedriver_90")
url = "https://www.bbc.com/news"
browser.get(url)
time.sleep(3)
link_1 = browser.find_element_by_link_text("Business")
time.sleep(3)
link_1.click()
page_url = browser.current_url
print(page_url)
# Result: https://www.bbc.com/news/business
```

2 Answers

In your Example 1, add a sleep statement after the click, so that the new page has time to load before you read current_url:

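from selenium import webdriver
import time

# Raw string so the backslashes in the Windows path are not treated as escapes
browser = webdriver.Chrome(r"D:\Programs\Chrome dr Selenium\chromedriver_90")
url = "https://www.mdoffice.com.ua/ru/amain.html"
browser.get(url)
time.sleep(3)
elem = browser.find_element_by_link_text("Инструкции MDOffice")
elem.click()
# Sleep AFTER the click, so the new page has time to load before reading the URL
time.sleep(3)
print(browser.current_url)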

1 Comment

Thank you very much, after adding sleep time everything works fine.

You need to wait between the moment you click the link and the moment the new page has loaded, because the page may take time to load for various reasons. To wait for the page to load, you can use expected_conditions and WebDriverWait:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec

# Raw string so the backslashes in the Windows path are not treated as escapes
browser = webdriver.Chrome(r"D:\Programs\Chrome dr Selenium\chromedriver_90")
url = "https://www.mdoffice.com.ua/ru/amain.html"
browser.get(url)

# Wait (up to 10 s) until the link is clickable; until() returns the element
link = WebDriverWait(browser, 10).until(
    ec.element_to_be_clickable((By.LINK_TEXT, "Инструкции MDOffice"))
)
link.click()

# Wait (up to 10 s) until the browser has navigated to the expected page
page_loaded = ec.url_to_be("https://www.mdoffice.com.ua/ru/aMDOFAQ.decl")
WebDriverWait(browser, 10).until(page_loaded)
print(browser.current_url)

This waits up to 10 seconds for the link to become clickable before clicking it, and then up to 10 seconds for the browser to reach the new page's URL. Each wait returns as soon as its condition is met and raises a TimeoutException if the condition is not met in time. This is generally recommended over time.sleep because it makes the code more robust. It is also faster when the page or its elements load in less than the 3 seconds used in the question.

2 Comments

Thank you very much for the answer. Your code works well. Adding time.sleep after the click also helped.
Happy to help! :)
