Web URL: https://www.ipsos.com/en-us/knowledge/society/covid19-research-in-uncertain-times

I want to parse the HTML as below:

[Screenshot of the page's HTML, not preserved]

I want to get all hrefs within the <li> elements, along with the highlighted text. I tried this code:

elementList = driver.find_element_by_class_name('block-wysiwyg').find_elements_by_tag_name("li")
for i in range(len(elementList)):
    driver.find_element_by_class_name('blcokwysiwyg').find_elements_by_tag_name("li").get_attribute("href")

But the block returned none.

Can anyone please help me with the above code?

2 Answers

The following should fetch the required content:

import requests
from bs4 import BeautifulSoup

link = 'https://www.ipsos.com/en-us/knowledge/society/covid19-research-in-uncertain-times'

r = requests.get(link)
soup = BeautifulSoup(r.text,"html.parser")
for item in soup.select(".block-wysiwyg li"):
    item_text = item.get_text(strip=True)
    link_tag = item.select_one("a[href]")
    item_link = link_tag.get("href") if link_tag else None  # an <li> may have no link
    print(item_text,item_link)
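
For checking the selector logic without touching the network, here is a minimal sketch that runs the same `.block-wysiwyg li` selection against an inline HTML string. The `SAMPLE_HTML` markup, the `BASE_URL` constant, and the `extract_items` helper are assumptions made up for illustration; they are not the real page content. The sketch also handles an `<li>` that has no link and resolves relative hrefs with `urljoin`.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumed base URL, used only to resolve relative hrefs.
BASE_URL = "https://www.ipsos.com"

# Hypothetical stand-in markup; the real page will differ.
SAMPLE_HTML = """
<div class="block-wysiwyg">
  <ul>
    <li>First item <a href="/en-us/america-under-coronavirus">read more</a></li>
    <li>Item without a link</li>
  </ul>
</div>
"""


def extract_items(html, base_url=BASE_URL):
    """Return (text, absolute_href_or_None) for each <li> in .block-wysiwyg."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for li in soup.select(".block-wysiwyg li"):
        anchor = li.select_one("a[href]")
        # Skip the href lookup when the <li> contains no link.
        href = urljoin(base_url, anchor["href"]) if anchor else None
        items.append((li.get_text(" ", strip=True), href))
    return items


for text, href in extract_items(SAMPLE_HTML):
    print(text, href)
```

To use it on the live page, pass `r.text` from the request above instead of `SAMPLE_HTML`.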
4 Comments

Hi @SIM, the bs4 version does not work on my end; the block raises SSLError: HTTPSConnectionPool(host='www.ipsos.com', port=443): Max retries exceeded with url:
I just ran it again and it worked. If you still get the same error, try passing a headers argument, e.g. requests.get(link, headers={"User-Agent": "Mozilla/5.0"}).
I figured out the bs4 issue: with requests.get(link, verify=False) it works.
Thanks for all the help!

Try it this way:

coronas = driver.find_element_by_xpath("//div[@class='block-wysiwyg']/ul/li")
hr = coronas.find_element_by_xpath('./a')
print(coronas.text)
print(hr.get_attribute('href'))

Output:

The coronavirus is touching the lives of all Americans, but race, age, and income play a big role in the exact ways the virus — and the stalled economy — are affecting people. Here's what that means.
https://www.ipsos.com/en-us/america-under-coronavirus
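
The snippet above prints only the first matching `<li>`. A minimal sketch for collecting every item follows, written against the generic `find_elements(by, value)` call, since newer Selenium 4 releases no longer ship the `find_element_by_*` helpers. The `collect_items` name is hypothetical, and the plain locator strings are used on the assumption that they match Selenium's `By.CSS_SELECTOR` and `By.TAG_NAME` constants.

```python
# Locator strategy strings; assumed equal to By.CSS_SELECTOR and By.TAG_NAME
# from selenium.webdriver.common.by.
CSS_SELECTOR = "css selector"
TAG_NAME = "tag name"


def collect_items(driver, selector=".block-wysiwyg li"):
    """Return (text, href_or_None) for every element matched by `selector`."""
    results = []
    for li in driver.find_elements(CSS_SELECTOR, selector):
        # find_elements returns an empty list instead of raising
        # NoSuchElementException when the <li> has no <a>.
        anchors = li.find_elements(TAG_NAME, "a")
        href = anchors[0].get_attribute("href") if anchors else None
        results.append((li.text, href))
    return results
```

With a live session this would be called as `collect_items(driver)` after `driver.get(...)`. If the list is rendered late, an explicit wait before the call may be needed.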

2 Comments

The block raises NoSuchElementException: Message: no such element: Unable to locate element... I suspect something is wrong with the path?
@Bangbangbang No, the xpath works (just tried it again). Can't explain why it doesn't work on your side.
