I have multiple websites and I want to get the "Contact Us" URL for each of them. The URLs are not necessarily contained in the same class on every website. However, the innerHTML of the relevant link on each website contains the word "contact".

Is there a way to extract a URL from a webpage if the link's innerHTML contains a specific word? For example, given the HTML below, I want to extract the URL if the innerHTML contains the word "contact" (case-insensitive).

HTML = {
<a class="" style="COLOR: #000000; TEXT-DECORATION: none" href="http://www.candp.com/bin/index.asp?id=565B626C6C6A79504B575A4D626E" target="_parent">
   <font size="2">
      <strong>Contact Us</strong>
   </font>
</a>
}

Required output:

'http://www.candp.com/bin/index.asp?id=565B626C6C6A79504B575A4D626E'

I got as far as the code below, but it doesn't work:

link=[]
driver.get(main_url)
elements = driver.find_elements_by_xpath("//a").get_attribute('href')   #  the href is not always contained in a tag
for el in elements:
    if 'contact'.casefold() in str(el.text):
         link.append(el.get_attribute('href'))

Any help is greatly appreciated.

2 Answers

Try this:

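import requests
from bs4 import BeautifulSoup

r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'lxml')
contact_links = []
for a in soup.find_all("a"):
    # Match on the anchor's text (including nested tags), case-insensitively
    if 'contact' in a.text.lower():
        contact_links.append(a.get('href'))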

The output for the URL you mentioned is:

<a href="http://www.candp.com/bin/index.asp?id=565B626C686E79504B575A4D626E" target="_blank"><font face="Verdana" size="1">Get more details</font></a>
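
To check the matching logic against the HTML from the question without installing anything, here is a minimal sketch that uses only Python's standard-library html.parser. The names AnchorTextParser, find_links_by_text and SAMPLE_HTML are hypothetical and chosen for illustration; the sketch assumes anchors are not nested inside one another.

```python
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Collect (href, text) pairs for every <a> element, including nested text."""

    def __init__(self):
        super().__init__()
        self.anchors = []     # finished (href, text) pairs
        self._inside = False  # True while an <a> element is open
        self._href = None     # href of the currently open <a>, if any
        self._chunks = []     # text pieces seen inside the open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._inside = True
            self._href = dict(attrs).get("href")
            self._chunks = []

    def handle_data(self, data):
        # Text inside <font>, <strong>, etc. still arrives here
        if self._inside:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._inside:
            self.anchors.append((self._href, "".join(self._chunks)))
            self._inside = False


def find_links_by_text(html, word):
    """Return hrefs of anchors whose text contains `word`, ignoring case."""
    parser = AnchorTextParser()
    parser.feed(html)
    parser.close()
    needle = word.casefold()
    return [href for href, text in parser.anchors
            if href and needle in text.casefold()]


# The anchor from the question
SAMPLE_HTML = """
<a class="" style="COLOR: #000000; TEXT-DECORATION: none" href="http://www.candp.com/bin/index.asp?id=565B626C6C6A79504B575A4D626E" target="_parent">
   <font size="2">
      <strong>Contact Us</strong>
   </font>
</a>
"""

print(find_links_by_text(SAMPLE_HTML, "contact"))
```

Note that the comparison lowercases both sides: the keyword and the anchor text. Lowercasing only the keyword would miss "Contact Us".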

Try the following code:

link=[]
elements = driver.find_elements_by_xpath("//a[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz') , 'contact')]")
for el in elements:
    link.append(el.get_attribute("href"))
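
XPath 1.0 has no lower-case function, which is why the expression above uses translate() to fold the text to lower case before calling contains(). A small helper can build the same expression for any keyword. This is only a sketch: ci_contains_xpath is a hypothetical name, and it assumes the keyword is ASCII and contains no single quotes.

```python
import string


def ci_contains_xpath(tag, word):
    """Build an XPath 1.0 expression matching <tag> elements whose text
    contains `word`, ignoring ASCII case."""
    upper = string.ascii_uppercase
    lower = string.ascii_lowercase
    # translate() maps A-Z to a-z in the element's string value, so the
    # keyword must be lower-cased as well for the comparison to match.
    return (f"//{tag}[contains(translate(., '{upper}', '{lower}'), "
            f"'{word.lower()}')]")


xpath = ci_contains_xpath("a", "Contact")
print(xpath)
```

In Selenium 4, where the find_elements_by_xpath helpers were removed, the resulting string would be passed as driver.find_elements(By.XPATH, xpath), with By imported from selenium.webdriver.common.by.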

1 Comment

It gives an empty list.
