added 37 characters in body

edited Jan 27, 2014 at 18:47

I'm trying to get the text from a webpage with Python 3.3 and then search that text for certain strings. When I find a matching string, I need to save the text that follows it. For example, take this page: http://gatherer.wizards.com/Pages/Card/Details.aspx?name=Dark%20Prophecy. I need to save the text after each category (card text, rarity, etc.) in the card info. Currently I'm using Beautiful Soup, but get_text causes a UnicodeEncodeError and doesn't return an iterable object. Here is the relevant code:

asked Jan 27, 2014 at 18:39

CrazyBurrito

HTML parsing text in Python 3

    urlStr = urllib.request.urlopen('http://gatherer.wizards.com/Pages/Card/Details.aspx?name=' + cardName).read()
    htmlRaw = BeautifulSoup(urlStr)
    htmlText = htmlRaw.get_text

    for line in htmlText:
        line = line.strip()
        if "Converted Mana Cost:" in line:
            cmc = line.next()
            message += "*Converted Mana Cost: " + cmc + "* \n\n"
        elif "Types:" in line:
            type = line.next()
            message += "*Type: " + type + "* \n\n"
        elif "Card Text:" in line:
            rulesText = line.next()
            message += "*Rules Text: " + rulesText + "* \n\n"
        elif "Flavor Text:" in line:
            flavor = line.next()
            message += "*Flavor Text: " + flavor + "* \n\n"
        elif "Rarity:" in line:
            rarity = line.next()
            message += "*Rarity: " + rarity + "* \n\n"
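
For reference, here is a minimal sketch of the "find a label, keep the next line" idea that the loop above is reaching for. It is not from the original post: extract_fields and LABELS are hypothetical names, and the sketch assumes the page text places each value on the line after its label, which may not match the real page. The text argument is assumed to be the string produced by calling get_text() with parentheses; without them, htmlText is the method itself and cannot be iterated. Strings also have no next() method, so the sketch calls the built-in next() on a shared iterator instead.

```python
# Hypothetical helper, not part of the original question.
# `text` is assumed to be the result of soup.get_text() (called, with parentheses).
LABELS = {
    "Converted Mana Cost:": "Converted Mana Cost",
    "Types:": "Type",
    "Card Text:": "Rules Text",
    "Flavor Text:": "Flavor Text",
    "Rarity:": "Rarity",
}


def extract_fields(text):
    """Build a message from the non-empty line that follows each known label."""
    # Drop blank lines so "the next line" means the next line with content.
    lines = iter([ln.strip() for ln in text.splitlines() if ln.strip()])
    message = ""
    for line in lines:
        for label, title in LABELS.items():
            if label in line:
                # next() advances the same iterator the for loop consumes;
                # the "" default avoids StopIteration at the end of the text.
                value = next(lines, "")
                message += "*" + title + ": " + value + "* \n\n"
                break
    return message


# Made-up sample text standing in for the page content.
sample = "Converted Mana Cost:\n\n 3 \nTypes:\nEnchantment\nRarity:\nRare\n"
print(extract_fields(sample))
```

Keeping the label-to-title mapping in one dictionary replaces the chain of elif branches, so adding another category is a one-line change.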
