I'm trying to scrape data from a site with multiple pages linked via a NEXT button.

The URL of each successive page does not follow from the previous page's URL, as one might assume (if it did, modifying the path would have solved the problem).

This is what I plan to do:

1. Start with an initial URL.
2. Extract information.
3. Click NEXT.

Then repeat steps 2 and 3 n times.
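The plan above can be sketched as a short loop. This is a minimal, hedged sketch: `crawl` and the `extract` callback are hypothetical names, and `driver` is assumed to be a Selenium WebDriver (or anything with the same `get(url)`, `current_url` and `find_element(by, value)` members). The string `"xpath"` is the value of Selenium's `By.XPATH` constant.

```python
def crawl(driver, start_url, next_xpath, extract, n):
    """Open start_url, then n times: extract data and click NEXT.

    `driver` needs only get(url), the current_url attribute and
    find_element(by, value); "xpath" is the value of Selenium's By.XPATH.
    `extract` is a hypothetical callback that pulls data from the page.
    Returns a list of (url, extracted_data) pairs, one per page visited.
    """
    results = []
    driver.get(start_url)  # 1. start with the initial URL
    for _ in range(n):
        # 2. extract information from the page that is currently loaded
        results.append((driver.current_url, extract(driver)))
        # 3. click NEXT; once the next page has loaded, its URL is
        #    available from driver.current_url
        driver.find_element("xpath", next_xpath).click()
    return results
```

With a real driver you would pass the XPath of the NEXT control and your own extraction function. The sketch has no explicit wait, so in practice you may still need one after each click before reading the next page.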

Specifically, I want to know how to get the URL of the new page after clicking NEXT.

This is what I've come up with so far:

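from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time

def startWebDriver():
    options = Options()
    options.add_argument("--disable-extensions")
    # Selenium 3 signature; Selenium 4 takes the driver path via Service(...) instead of executable_path
    return webdriver.Chrome(executable_path='/path/to/driver/chromedriver_linux64/chromedriver', options=options)

driver = startWebDriver()

#URL of the initial page
driver.get('https://openi.nlm.nih.gov/detailedresult.php?img=CXR1_1_IM-0001-3001&query=&coll=cxr&req=4&npos=1')

time.sleep(4)  # crude wait for the page to load

#XPATH of the "NEXT" button; click() returns None, so there is nothing useful to assign
driver.find_element_by_xpath('//*[@id="imageClassM"]/div/a[2]/img').click()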

Any help would be appreciated.

  • I'm a bit unclear about what you're trying to achieve here. Would this be the correct synopsis: you've opened the URL, located the "NEXT" button on it, and clicked it, and now you'd like to know which URL the page has redirected to? Commented Feb 21, 2019 at 17:47
  • Going by your XPath, it should be a > button. However, I can't see any > button on the web page you provided. Is that the right URL you are navigating to? Commented Feb 21, 2019 at 18:47
  • The URL I've provided is the right one, and the XPath is also right, but when you visit that page (even manually), that element is not visible for some reason. @Anuj Khandelwal Commented Feb 22, 2019 at 5:29
  • Yes, that's because its CSS style is set to "display: none". When we remove that style property from the console, the button appears, but clicking it does not lead to any new page. Are you sure that button is functional? Commented Feb 22, 2019 at 5:32

3 Answers

If you would like to get the URL of the page you are on after clicking NEXT, try this:

print(driver.current_url)

(or print(browser.current_url), if your WebDriver instance is named browser)

Perhaps you could try something like this:

from selenium import webdriver
from selenium.webdriver import ChromeOptions
import time

if __name__ == "__main__":
    options = ChromeOptions()
    options.add_argument("--disable-extensions")
    #start driver
    driver = webdriver.Chrome(options=options)
    #load first page
    driver.get('https://openi.nlm.nih.gov/detailedresult.php?img=CXR1_1_IM-0001-3001&query=&coll=cxr&req=4&npos=1')
    for i in range(3): #However many of these links to click
        time.sleep(4) # let each page load
        driver.find_element_by_xpath('//*[@id="imageClassM"]/div/a[2]/img').click()
        print(driver.current_url)

This loads the page for me (I removed the ChromeDriver path because my driver is in the same folder). It does raise an error, though, on driver.find_element_by_xpath('//*[@id="imageClassM"]/div/a[2]/img').click():

selenium.common.exceptions.ElementNotVisibleException: Message: element not visible

I'm not sure how to fix that, because I don't see a "NEXT" button on the web page. I'm sure you can figure it out, though!
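Since the comments found that the NEXT image is styled `display: none`, one hedged way to avoid the crash is to ask Selenium whether the element is displayed (`WebElement.is_displayed()`) before clicking, and to stop paginating when it is not. This does not make a hidden button work; it only lets a scraping loop end cleanly instead of raising the exception above. The helper name `click_next_if_visible` is hypothetical.

```python
def click_next_if_visible(driver, xpath):
    """Click the element at `xpath` only if Selenium reports it as displayed.

    Returns True if a click was issued, False if the element is hidden
    (for example styled display: none), in which case clicking it would
    raise an "element not visible" error instead.
    "xpath" is the string value of Selenium's By.XPATH constant.
    """
    element = driver.find_element("xpath", xpath)
    if not element.is_displayed():
        return False  # hidden: let the caller stop paginating
    element.click()
    return True
```

In a loop you could write `while click_next_if_visible(driver, next_xpath): ...`, so the loop ends as soon as no visible NEXT control remains.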

driver.current_url

Note that current_url is a property, not a method, so it takes no parentheses. You may need to wait for the page to load first.
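Instead of a fixed `time.sleep`, you can wait until the URL actually changes after the click. In recent Selenium versions, `WebDriverWait(driver, 10).until(EC.url_changes(old_url))` (with `EC` being `selenium.webdriver.support.expected_conditions`) performs this check. The sketch below is a plain-Python version of the same polling idea, so the logic is visible. `wait_for_url_change` is a hypothetical helper, and it raises the built-in `TimeoutError` rather than Selenium's own timeout exception.

```python
import time

def wait_for_url_change(driver, old_url, timeout=10.0, poll=0.5):
    """Poll driver.current_url until it differs from old_url.

    Returns the new URL, or raises TimeoutError if the URL is still
    old_url after `timeout` seconds (e.g. the click did not navigate).
    """
    deadline = time.monotonic() + timeout
    while True:
        url = driver.current_url
        if url != old_url:
            return url
        if time.monotonic() >= deadline:
            raise TimeoutError(f"URL still {old_url!r} after {timeout}s")
        time.sleep(poll)
```

Typical use: store `old = driver.current_url`, click NEXT, then call `new = wait_for_url_change(driver, old)`. A timeout here is also a useful signal that the click did not lead to a new page, which is what one of the comments observed.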
