I want to extract part of a web page's source. I can already fetch the full HTML and print it correctly, but I only want one part of it.

Below is the page's HTML. I want to crawl only the range marked in red:

[image: screenshot of the page's HTML with the target range marked in red]

Here is my Python code:

    from datetime import date,datetime
    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    from bs4 import BeautifulSoup
    from selenium.webdriver.support.ui import Select
    from selenium.common.exceptions import NoSuchElementException
    import numpy as np
    import xlrd
    import csv
    import codecs
    import time
    import os

    driver_blank = webdriver.Chrome('./chromedriver')
    driver_blank.get('https://forumd.hkgolden.com/view.aspx?type=CA&message=7223327')
    time.sleep(1)
    try_value = 1
    # Keep refreshing until the target table is present in the page
    while try_value:
        try:
            # Raises NoSuchElementException if the table is not there yet
            driver_blank.find_element_by_xpath('/html/body/form/div[5]/div/div/div[2]/div[1]/div[5]/table[2]')
            print('OK')
            try_value = 0
        except NoSuchElementException as e:
            print('Refresh now')
            driver_blank.refresh()
            time.sleep(10)
    # page_source returns the HTML of the whole page, not just the table
    html_code = driver_blank.page_source
    print(html_code)

Can I use a full XPath to locate this range?

  • Why don't you use the class to get the grid? Commented Apr 23, 2020 at 13:52
  • Because I need to crawl many elements with the same class name on the same page. Commented Apr 23, 2020 at 14:40

1 Answer

If you want the grid's HTML, you need to identify the grid element first and then call get_attribute("outerHTML") on it.

Use WebDriverWait() with visibility_of_element_located() so the element is visible before you read it.

Code:

    driver.get("https://forumd.hkgolden.com/view.aspx?type=CA&message=7223327")
    WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.XPATH,"(//div[@class='ContentGrid'])[1]")))
    print(driver.find_element_by_xpath("(//div[@class='ContentGrid'])[1]").get_attribute("outerHTML"))

You need to add the following imports.

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By

3 Comments

outerHTML also includes the grid element's own tags. innerHTML is a DOM property rather than an HTML attribute, but browsers support it and it returns only the content of the element.
@Justlearnedit: Correct. I guess the OP wants the element's HTML, not just the content inside the tag.
Actually, I must use the full XPath. I changed it to the full path and it still crawled the information successfully. Thank you!
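
The outerHTML versus innerHTML distinction discussed in these comments can be illustrated without a browser. The sketch below uses Python's standard `xml.etree.ElementTree` on a made-up, well-formed snippet. The markup and the helper names `outer_html` and `inner_html` are assumptions for illustration only; they are not Selenium APIs and only mimic what the two browser properties return.

```python
import xml.etree.ElementTree as ET

# Made-up markup standing in for one grid element on the page.
SNIPPET = '<div class="ContentGrid"><span>hello</span> world</div>'


def outer_html(element):
    """Serialize the element including its own opening and closing tags."""
    return ET.tostring(element, encoding="unicode")


def inner_html(element):
    """Serialize only what sits between the element's tags."""
    # element.text is the text before the first child; each child's
    # serialization already includes the text that follows it (its tail).
    return (element.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in element
    )


grid = ET.fromstring(SNIPPET)
print(outer_html(grid))
print(inner_html(grid))
```

The first line printed keeps the surrounding `div` tags, while the second contains only the `span` and the trailing text, which is the difference between asking Selenium for "outerHTML" and "innerHTML".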
