HTML parsing text in Python 3

Question

I'm trying to get the text from a webpage with Python 3.3 and then search through that text for certain strings. When I find a matching string, I need to save the text that follows it. For example, given this page: http://gatherer.wizards.com/Pages/Card/Details.aspx?name=Dark%20Prophecy I need to save the text after each category (card text, rarity, etc.) in the card info. Currently I'm using Beautiful Soup, but get_text causes a UnicodeEncodeError and doesn't return an iterable object. Here is the relevant code:

                urlStr = urllib.request.urlopen('http://gatherer.wizards.com/Pages/Card/Details.aspx?name=' + cardName).read()
                htmlRaw = BeautifulSoup(urlStr)
                htmlText = htmlRaw.get_text

                for line in htmlText:
                    line = line.strip()
                    if "Converted Mana Cost:" in line:
                        cmc = line.next()
                        message += "*Converted Mana Cost: " + cmc +"* \n\n"
                    elif "Types:" in line:
                        type = line.next()
                        message += "*Type: " + type +"* \n\n"
                    elif "Card Text:" in line:
                        rulesText = line.next()
                        message += "*Rules Text: " + rulesText +"* \n\n"
                    elif "Flavor Text:" in line:
                        flavor = line.next()
                        message += "*Flavor Text: " + flavor +"* \n\n"
                    elif "Rarity:" in line:
                        rarity = line.next()
                        message += "*Rarity: " + rarity +"* \n\n"

Guy Gavriely · Accepted Answer · 2014-01-27 19:01:21Z


Consider using lxml and XPath instead. You will then be able to do things like:

>>> from lxml import html
>>> root = html.parse("http://gatherer.wizards.com/Pages/Card/Details.aspx?name=Dark%20Prophecy")
>>> root.xpath('//div[contains(text(), "Flavor Text")]/following-sibling::div/div/i/text()')
['When the bog ran short on small animals, Ekri turned to the surrounding farmlands.']
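
The same idea could be extended to the other fields from the question. The following is only a rough sketch: it assumes each label div is immediately followed by a sibling div holding the value, and both the `SAMPLE` markup and the `extract_fields` helper are illustrative names, not taken from the real page, so the XPath may need adjusting against the live HTML.

```python
from lxml import html

# Hypothetical, simplified markup standing in for the card page. The real
# page layout may differ, so treat the XPath below as a starting point.
SAMPLE = """
<html><body>
  <div class="row">
    <div class="label">Converted Mana Cost:</div>
    <div class="value">3</div>
  </div>
  <div class="row">
    <div class="label">Types:</div>
    <div class="value">Enchantment</div>
  </div>
  <div class="row">
    <div class="label">Rarity:</div>
    <div class="value"><span>Rare</span></div>
  </div>
</body></html>
"""

LABELS = ["Converted Mana Cost", "Types", "Card Text", "Flavor Text", "Rarity"]


def extract_fields(root, labels=LABELS):
    """Return {label: text} for each label whose div is found in the tree."""
    fields = {}
    for label in labels:
        # Select the first sibling div after the div whose text contains the label.
        nodes = root.xpath(
            '//div[contains(text(), $label)]/following-sibling::div[1]',
            label=label,
        )
        if nodes:
            # text_content() gathers nested text; split/join collapses whitespace.
            fields[label] = " ".join(nodes[0].text_content().split())
    return fields


fields = extract_fields(html.fromstring(SAMPLE))

# Build the message in the same "*Label: value*" style as the question.
message = "".join("*{}: {}* \n\n".format(k, v) for k, v in fields.items())
print(message)
```

Labels that are missing from the page (here "Card Text" and "Flavor Text") are simply skipped. To run this against the live page, `html.fromstring(SAMPLE)` could be swapped for the `html.parse(...)` call shown above.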


1 Comment

CrazyBurrito Over a year ago

How do I install this on Windows? The instructions on the website seem to say to just download it, but that doesn't work.
