I have multiple websites and I want to get the "Contact Us" URL for each of them. The link is not necessarily in the same class on every website. However, the innerHTML of the link always contains the word "contact".
Is there a way to extract a URL from a webpage if the element's innerHTML contains a specific word? For example, given the HTML below, I want to extract the URL if the innerHTML contains the word "contact" (case-insensitive).
HTML = {
<a class="" style="COLOR: #000000; TEXT-DECORATION: none" href="http://www.candp.com/bin/index.asp?id=565B626C6C6A79504B575A4D626E" target="_parent">
<font size="2">
<strong>Contact Us</strong>
</font>
</a>
}
Required output:
'http://www.candp.com/bin/index.asp?id=565B626C6C6A79504B575A4D626E'
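To make the requirement concrete, the behaviour described above can be sketched against the raw HTML using Python's standard `html.parser` instead of Selenium element queries. This is only a rough illustration: the names `ContactLinkParser` and `find_contact_links` are made up for this example, and it assumes the link is an `<a>` tag carrying the `href`.

```python
from html.parser import HTMLParser


class ContactLinkParser(HTMLParser):
    """Collect the href of every <a> whose text contains a keyword (hypothetical helper)."""

    def __init__(self, keyword="contact"):
        super().__init__()
        self.keyword = keyword.casefold()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased from HTMLParser.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Text inside nested tags (<font>, <strong>, ...) is collected too.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Case-fold the link text so "Contact", "CONTACT", etc. all match.
            if self.keyword in "".join(self._text).casefold():
                self.links.append(self._href)
            self._href = None


def find_contact_links(html, keyword="contact"):
    """Return the hrefs of all matching <a> tags in an HTML string."""
    parser = ContactLinkParser(keyword)
    parser.feed(html)
    parser.close()
    return parser.links
```

With Selenium, the HTML string could come from `driver.page_source` after `driver.get(main_url)`, for example `find_contact_links(driver.page_source)`.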
This is the code I have so far, but it doesn't work:
link = []
driver.get(main_url)
elements = driver.find_elements_by_xpath("//a").get_attribute('href')  # the href is not always contained in a tag
for el in elements:
    if 'contact'.casefold() in str(el.text):
        link.append(el.get_attribute('href'))
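Two things stand out in the attempt above: `find_elements_by_xpath` returns a list, so `.get_attribute('href')` cannot be called on it directly, and only the literal `'contact'` is case-folded, not `el.text`. One possible direction, as an untested sketch, is to let XPath do the case-insensitive match. XPath 1.0 has no lower-case function, so `translate()` is used to fold the text first. The names `CONTACT_XPATH` and `contact_hrefs` are hypothetical, and `driver` is assumed to be an already started Selenium WebDriver.

```python
# XPath 1.0 has no lower-case(), so translate() maps A-Z to a-z before matching.
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"

# Any element that has an href (not only <a>) and whose text contains "contact".
CONTACT_XPATH = (
    "//*[@href]"
    f"[contains(translate(normalize-space(.), '{UPPER}', '{LOWER}'), 'contact')]"
)


def contact_hrefs(driver):
    """Return the distinct hrefs of elements whose text contains 'contact'."""
    # "xpath" is the string value of selenium's By.XPATH locator strategy.
    elements = driver.find_elements("xpath", CONTACT_XPATH)
    hrefs = []
    for el in elements:
        href = el.get_attribute("href")
        # Skip elements without a usable href and avoid duplicates.
        if href and href not in hrefs:
            hrefs.append(href)
    return hrefs
```

Usage would be along the lines of `driver.get(main_url)` followed by `link = contact_hrefs(driver)`. Matching on `//*[@href]` rather than `//a` is a design choice to cover pages where the link is not a plain `<a>` tag.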
Any help is greatly appreciated.