How do I extract text under a specific header which starts with a certain set of words

Question

I am trying to scrape the text of an H2 tag whose text starts with "Benefits of", so it could be "Benefits of Toys", "Benefits of Cups", etc.

The HTML is:

<h2 class="DrugOverview__title___1OwgG">Benefits of Toys</h2>

The code I've tried so far is:

        benefit = soup.find('h2', text='Benefits of')
        q = benefit.get_text(strip=True)

How do I solve this? Note that I can't use the h2 class to select the element in this situation (due to other issues).
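Why the code above finds nothing: when string= (text= in older code) is given a plain str, BeautifulSoup only matches a tag whose text is exactly that string, so "Benefits of Toys" is not a match and find() returns None. A compiled regular expression is instead tested with re.search, so a "^" anchor turns it into a "starts with" check. A minimal sketch on the same markup, assuming Beautiful Soup 4.4.0 or later (where the string= argument is available):

```python
import re
from bs4 import BeautifulSoup

html = '<h2 class="DrugOverview__title___1OwgG">Benefits of Toys</h2>'
soup = BeautifulSoup(html, 'html.parser')

# A plain string must equal the tag's entire text, so this finds nothing.
exact = soup.find('h2', string='Benefits of')
print(exact)

# A compiled regex is tested with re.search; '^' pins the match to the start.
prefix = soup.find('h2', string=re.compile(r'^Benefits of'))
print(prefix.get_text(strip=True))
```

The first print shows None and the second shows "Benefits of Toys"; calling get_text() on the None result is what makes the original code fail.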

Subbu VidyaSekar · Accepted Answer · 2021-01-21 08:24:28Z


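We can pass a regular expression instead of a plain string, so that BeautifulSoup matches any text containing "Benefits of" rather than only text exactly equal to it.

Here strs holds the input HTML content.

Use the code below: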

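import re
from bs4 import BeautifulSoup

strs = '<h2 class="DrugOverview__title___1OwgG">Benefits of Toys</h2><h2 class="DrugOverview__title___1OwgG">Benefits of kids</h2>'
soup = BeautifulSoup(strs, 'html.parser')
# '^' anchors the pattern so only text that starts with "Benefits of" matches
pattern = re.compile(r'^Benefits of')
# string= is the current name of the older text= argument; find_all replaces findAll
benefit = soup.find_all(string=pattern)
print(benefit)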

Output:

['Benefits of Toys', 'Benefits of kids']
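The answer above returns every matching text node in the page, whichever tag it sits in. If you only want h2 headers, a minimal sketch without regular expressions is to loop over the h2 tags and compare with str.startswith. The helper name headers_starting_with is hypothetical (it is not part of BeautifulSoup), and ignoring case is an assumption in case the site's capitalization varies; the class attribute is never used.

```python
from bs4 import BeautifulSoup


def headers_starting_with(html, prefix, tag='h2'):
    """Hypothetical helper: return the text of every <tag> whose text starts with prefix.

    The comparison ignores case and surrounding whitespace; no class attribute is used.
    """
    soup = BeautifulSoup(html, 'html.parser')
    matches = []
    for header in soup.find_all(tag):
        # get_text(' ', strip=True) also handles headers with nested tags,
        # e.g. <h2>Benefits of <b>kids</b></h2> gives 'Benefits of kids'
        text = header.get_text(' ', strip=True)
        if text.lower().startswith(prefix.lower()):
            matches.append(text)
    return matches


html = (
    '<h2 class="DrugOverview__title___1OwgG">Benefits of Toys</h2>'
    '<h2 class="DrugOverview__title___1OwgG">Benefits of <b>kids</b></h2>'
    '<h2 class="DrugOverview__title___1OwgG">Side effects</h2>'
)
print(headers_starting_with(html, 'benefits of'))
# ['Benefits of Toys', 'Benefits of kids']
```

The design choice here is to compare the full header text in Python rather than via string=: when string= is combined with a tag name, BeautifulSoup compares against the tag's .string, which is None for a header containing nested tags such as <b>, so such headers would be skipped.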

answered Jan 21, 2021 at 8:14
