31

How do I make Selenium click on elements and scrape data before the page has fully loaded? My internet connection is quite terrible, so it sometimes takes forever to load the page entirely. Is there any way around this?

1

4 Answers

59

Update (7 July 2023)

page_load_strategy

page_load_strategy is now an attribute of the options object. The minimal code block to configure page_load_strategy with Selenium v4.6 and above is as follows:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
# options.page_load_strategy = 'none'
options.page_load_strategy = 'eager'
# options.page_load_strategy = 'normal'
driver = webdriver.Chrome(options=options)
driver.get("https://google.com")

ChromeDriver 77.0 (which supports Chrome version 77) now supports eager as pageLoadStrategy.

Resolved issue 1902: Support eager page load strategy [Pri-2]


Since your question is about clicking on elements and scraping data before the page has fully loaded, the pageLoadStrategy capability can help. By default, Selenium loads a page/URL with pageLoadStrategy set to normal. Selenium can start executing the next line of code at different document readiness states. It currently supports three of them, which we can configure through pageLoadStrategy as follows:

  1. none (undefined)
  2. eager (page becomes interactive)
  3. normal (complete page load)
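
To make the three modes concrete, here is a small, browser-free sketch of how each strategy relates to the page's document.readyState value. It is only a simplified model of the list above, not Selenium's own code; the names READY_STATE_FOR_STRATEGY and navigation_returns are made up for illustration.

```python
# Simplified model of the list above; all names here are illustrative,
# not part of Selenium.
READY_STATE_FOR_STRATEGY = {
    "none": None,            # do not wait for any readiness state
    "eager": "interactive",  # page becomes interactive
    "normal": "complete",    # complete page load
}

# document.readyState moves through these values in this order.
_READY_STATE_ORDER = ["loading", "interactive", "complete"]


def navigation_returns(strategy, ready_state):
    """Return True if, under this model, driver.get() would stop waiting
    once the document has reached `ready_state`."""
    if strategy not in READY_STATE_FOR_STRATEGY:
        raise ValueError(f"unknown page load strategy: {strategy!r}")
    required = READY_STATE_FOR_STRATEGY[strategy]
    if required is None:
        return True
    reached = _READY_STATE_ORDER.index(ready_state)
    return reached >= _READY_STATE_ORDER.index(required)


if __name__ == "__main__":
    for strategy in READY_STATE_FOR_STRATEGY:
        for state in _READY_STATE_ORDER:
            print(strategy, state, navigation_returns(strategy, state))
```

With eager, the call returns as soon as the document is interactive, which is usually enough to locate and click elements even while images and stylesheets are still downloading.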

Here is the code block to configure the pageLoadStrategy using the older DesiredCapabilities API:

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

binary = r'C:\Program Files\Mozilla Firefox\firefox.exe'
caps = DesiredCapabilities().FIREFOX
# caps["pageLoadStrategy"] = "normal"  #  complete
caps["pageLoadStrategy"] = "eager"  #  interactive
# caps["pageLoadStrategy"] = "none"   #  undefined
driver = webdriver.Firefox(capabilities=caps, firefox_binary=binary, executable_path="C:\\Utility\\BrowserDrivers\\geckodriver.exe")
driver.get("https://google.com")

5 Comments

Awesome! Is it possible to implement using Chrome as a browser?
@nonein Using the DesiredCapabilities you can implement it using any browser Chrome, IE, Safari, Edge etc. Please Accept the Answer if it catered to your Question.
Do I simply just add capabilities=caps to chrome as well? or do I use the argument function?
is it also possible to use a strategy where it fully waits for the page to load?
What's the difference between using eager vs getting driver.page_source after the timeout exception ?
15

Update 2022: This behavior is now supported in Chromedriver. Please see @undetected's answer. The answer below is still relevant if you have to use the 'none' pageLoadStrategy (e.g. if you don't want to wait for the page to become interactive).

Old answer:

We can use the 'none' pageLoadStrategy and implement a custom wait function that waits until our specific element is interactable.

Add the pageLoadStrategy to your desired capabilities when initializing the chromedriver:

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

caps = DesiredCapabilities().CHROME
# caps["pageLoadStrategy"] = "normal"  #  Waits for full page load
# caps["pageLoadStrategy"] = "eager"  #  Waits for page to be interactive
caps["pageLoadStrategy"] = "none"   # Do not wait for full page load
driver = webdriver.Chrome(desired_capabilities=caps, executable_path="path/to/chromedriver.exe")

Note that when using the 'none' strategy you most likely have to implement your own wait method to check if the element you need is loaded.

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec

WebDriverWait(driver, timeout=10).until(
    ec.visibility_of_element_located((By.ID, "your_element_id"))
)

Now you can start interacting with your element before the page is fully loaded!
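
If you want to wait on the page state itself rather than on one element, a hand-rolled poll over document.readyState is one option. The sketch below is only an illustration: wait_for_ready_state is a hypothetical helper, not part of Selenium, and the only thing it assumes about driver is that it has the standard execute_script method.

```python
import time


def wait_for_ready_state(driver, accepted=("interactive", "complete"),
                         timeout=10.0, poll=0.2):
    """Poll document.readyState until it is one of `accepted`.

    Hypothetical helper for use with the 'none' strategy. `driver` is any
    object with an execute_script(script) method, e.g. a Selenium WebDriver.
    Returns the state that was reached, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = driver.execute_script("return document.readyState")
        if state in accepted:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"document.readyState was still {state!r} after {timeout}s"
            )
        time.sleep(poll)
```

After driver.get(url) returns immediately under 'none', calling wait_for_ready_state(driver) blocks until the DOM is at least interactive. Waiting for a specific element with WebDriverWait, as shown above, is still the more targeted check when you only need one element.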

2 Comments

it still works!
Now, it's supported. use @undetected Selenium answer
1

Same as above, for those who use Chrome: I used "eager" in the capabilities. It works perfectly and sped up my run time greatly.

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

caps = DesiredCapabilities().CHROME
# caps["pageLoadStrategy"] = "normal"  #  Waits for full page load
caps["pageLoadStrategy"] = "eager"   # Waits only until the page is interactive
driver = webdriver.Chrome(desired_capabilities=caps, executable_path="path/to/chromedriver.exe")


0

UPDATE (since Selenium v4.10)

pageLoadStrategy can now be set through the options object. There are 3 modes:

  1. none (undefined: you have to add your own wait and timeout for each command)
  2. eager (page becomes interactive, but some resources may not be loaded yet)
  3. normal (default: complete page load)

In your case you should use eager or none.

from selenium import webdriver

options = webdriver.ChromeOptions()
options.set_capability('pageLoadStrategy', "eager")
driver = webdriver.Chrome(options=options)
driver.get('https://example.com/')
