I'm trying to pass big strings of random HTML through regular expressions, and my Python 2.6 script is choking on this:

UnicodeEncodeError: 'ascii' codec can't encode character

I traced it back to the trademark superscript at the end of this word: Protection™. I do not need to capture the non-ASCII characters, but they are a nuisance and I expect to encounter more of them in the future.

Is there a module to process non-ASCII characters? Or what is the best way to handle or escape non-ASCII characters in Python?

Thanks! Full error:

E
======================================================================
ERROR: test_untitled (__main__.Untitled)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python26\Test2.py", line 26, in test_untitled
    ofile.write(Test + '\n')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 1005: ordinal not in range(128)

Full Script:

from selenium import selenium
import unittest, time, re, csv, logging

class Untitled(unittest.TestCase):
    def setUp(self):
        self.verificationErrors = []
        self.selenium = selenium("localhost", 4444, "*firefox", "http://www.BaseDomain.com/")
        self.selenium.start()
        self.selenium.set_timeout("90000")

    def test_untitled(self):
        sel = self.selenium
        spamReader = csv.reader(open('SubDomainList.csv', 'rb'))
        for row in spamReader:
            sel.open(row[0])
            time.sleep(10)
            Test = sel.get_text("//html/body/div/table/tbody/tr/td/form/div/table/tbody/tr[7]/td")
            Test = Test.replace(",","")
            Test = Test.replace("\n", "")
            ofile = open('TestOut.csv', 'ab')
            ofile.write(Test + '\n')
            ofile.close()

    def tearDown(self):
        self.selenium.stop()
        self.assertEqual([], self.verificationErrors)

if __name__ == "__main__":
    unittest.main()

1 Answer

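Total repetition of your other question here (though here you finally deign to show us CODE from the start, wow!-). The answer is still identical: Test is a unicode object, and writing it to a file makes Python 2 encode it implicitly with the default 'ascii' codec, which cannot represent u'\u2122'. So encode it explicitly. Instead of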

        ofile.write(Test + '\n')

do

        ofile.write(Test.encode('utf8') + '\n')

Why do you keep repeating this Q, BTW?!
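
For anyone who wants to try the fix outside of Selenium, here is a minimal sketch of the same idea. It is not the asker's script: the helper names append_line and strip_non_ascii are made up for illustration, the sample string stands in for what sel.get_text() returned, and the output goes to a temporary directory. It assumes io.open (available from Python 2.6 onward), which does the UTF-8 encoding on write, so it is equivalent to calling .encode('utf8') by hand. The second helper covers the "I do not need the non-ASCII stuff" case by simply dropping those characters.

```python
import io
import os
import tempfile


def append_line(path, text):
    """Append one line of unicode text to path, stored as UTF-8 (hypothetical helper)."""
    # io.open encodes on write, so no explicit .encode('utf8') is needed.
    # newline='' turns off newline translation, like the 'ab' mode in the question.
    with io.open(path, 'a', encoding='utf-8', newline='') as ofile:
        ofile.write(text + u'\n')


def strip_non_ascii(text):
    """Return text with every non-ASCII character dropped (hypothetical helper)."""
    # The 'ignore' error handler skips characters the ascii codec cannot encode.
    return text.encode('ascii', 'ignore').decode('ascii')


# Stand-in for the unicode string returned by sel.get_text() in the question.
sample = u'Protection\u2122'

# Write to a throwaway directory instead of the real TestOut.csv.
out_path = os.path.join(tempfile.mkdtemp(), 'TestOut.csv')
append_line(out_path, sample)

# Read the raw bytes back to see what actually landed on disk.
with io.open(out_path, 'rb') as infile:
    raw = infile.read()

ascii_only = strip_non_ascii(sample)
```

Here raw holds the UTF-8 bytes of the line, with the trademark sign stored as three bytes, and ascii_only is the same word with the trademark sign removed. Which of the two you want depends on whether the character matters downstream.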

1 Comment

I figured this is how I should have asked the question in the first place. Much easier for you to answer clearly. Which you did. Thanks Alex!
