I'm using Selenium to extract data from a web page and write it to a file. When I write special characters such as 'é', they show up as unreadable characters in the file (Ã©). The site I'm fetching the page from is encoded in ISO-8859-1, and I'm using Python 2.7.

from selenium import webdriver

browser = webdriver.Firefox()
browser.get(URL_SITE_ENCODED_IN_ISO_8859_1)  # placeholder for the site URL
html = browser.page_source.decode('iso-8859-1')  # raises UnicodeEncodeError

From what I understand, I have to decode the page from ISO-8859-1 and then encode it as UTF-8, but when I try, an error is raised: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 246: ordinal not in range(128)

  • Please provide the code that generates the error, as described at stackoverflow.com/help/mcve. Commented Dec 17, 2015 at 17:37
  • Sorry, I was busy last night; I will edit that ^^ Commented Dec 18, 2015 at 8:34

1 Answer

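It's probably because browser.page_source is already a decoded Unicode string. In Python 2, calling .decode() on a unicode object first encodes it implicitly with the default ASCII codec, and that hidden step is what fails on 'é'. Check with:

>>> type(browser.page_source)
<type 'unicode'>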
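
To see where the 'ascii' codec in the error message comes from, here is a minimal sketch of the hidden step. The string `text` is a made-up stand-in for browser.page_source, and the explicit .encode("ascii") spells out what Python 2 does implicitly, so the snippet behaves the same on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical stand-in for browser.page_source (already-decoded text).
text = u"caf\xe9"

try:
    # Python 2 performs this ASCII encode implicitly when .decode()
    # is called on a unicode object; it is written out explicitly here.
    text.encode("ascii").decode("iso-8859-1")
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

print(error_message)
```

The failure happens in the encode step, before any ISO-8859-1 decoding takes place, which is why dropping the .decode() call avoids it.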

When you write this to a file, you need to encode it with an appropriate encoding. In Python 2.x, use the io module to open a file that encodes the text automatically on write. Try:

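import io
from selenium import webdriver

browser = webdriver.Firefox()
browser.get(anysite)  # anysite: placeholder for the page URL

# io.open encodes the unicode text as UTF-8 on write
with io.open("myoutfile.txt", "w", encoding="utf-8") as my_file:
    my_file.write(browser.page_source)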
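
A quick way to check the result without a browser is a round trip through a temporary file. This is only a sketch: `page_source` is a hypothetical stand-in for browser.page_source, and the temporary directory is an assumption made so the example does not touch real files. It also reads the same bytes back as ISO-8859-1 to show how the garbled "Ã©" from the question can arise.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical stand-in for browser.page_source.
page_source = u"<p>caf\xe9</p>"

# Temporary location used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "myoutfile.txt")

# Write the text; io.open encodes it as UTF-8.
with io.open(path, "w", encoding="utf-8") as my_file:
    my_file.write(page_source)

# Reading with the matching encoding returns the original text.
with io.open(path, "r", encoding="utf-8") as my_file:
    read_back = my_file.read()

# Reading the same bytes as ISO-8859-1 turns each UTF-8 byte of
# the accented letter into its own character.
with io.open(path, "r", encoding="iso-8859-1") as my_file:
    misread = my_file.read()

print(read_back == page_source)
print(repr(misread))
```

If the file still looks garbled after this change, one possible cause is that the viewer or editor is interpreting the UTF-8 file as ISO-8859-1, so it is worth checking which encoding it assumes.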