UTF Encoding in selenium webdriver

Question

I currently have the following:

from selenium import webdriver
d = webdriver.Chrome()
# request the url and get the page contents
# (result holds the parsed page; that step is not shown)
title = result.find("span", {"class": "episode"}).find("a").text

However, the 'text' that is returned to me is:

# Note that "envolée" is truncated to "envol"
<td class="title"><a href="/title/tt1844708/">La grande envol</a></td>

But when I download the page source, it shows the full title:

<td class="title"><a href="/title/tt1844708/">La grande envolée</a>
    <span class="year_type">(1927)</span><br />
</td>

Why is the text truncated in the webdriver response? How would I ensure it gives me the full utf-8 encoded text?

alecxe · Accepted Answer · 2015-02-08 04:00:50Z

As far as I understand, you are passing the page_source contents to BeautifulSoup for further parsing.

I would not do that, since Selenium itself can locate elements and read their text pretty well. For example, you can use CSS selectors:

from selenium.webdriver.common.by import By
driver.find_element(By.CSS_SELECTOR, 'span.episode a').text

Example (using the IMDb page for this title):

>>> from selenium import webdriver
>>> from selenium.webdriver.common.by import By
>>> driver = webdriver.Chrome()
>>> driver.get('http://www.imdb.com/title/tt1844708/')
>>> print(driver.find_element(By.XPATH, '//span[@itemprop="name"]').text)
La grande envolée
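
If you do keep parsing the page source yourself, the accented characters only survive if the bytes were decoded as UTF-8 before parsing. Selenium's page_source is already a decoded str, so this matters mainly when you work with raw bytes, such as a saved file. The sketch below is an illustration under assumptions, not a diagnosis: it uses Python's standard html.parser in place of BeautifulSoup so it runs without extra packages, TitleLinkText and link_titles are hypothetical names, and the input is the snippet from the question. It does not reproduce the exact "envol" truncation (the answer does not identify its cause); it only contrasts a correct UTF-8 decode with a wrong Latin-1 one.

```python
from html.parser import HTMLParser


class TitleLinkText(HTMLParser):
    """Hypothetical helper: collect the text of <a> links inside <td class="title">."""

    def __init__(self):
        super().__init__()
        self.in_title_cell = False
        self.in_link = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # Exact match on class="title"; multi-class attributes are not handled.
        if tag == "td" and ("class", "title") in attrs:
            self.in_title_cell = True
        elif tag == "a" and self.in_title_cell:
            self.in_link = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False
        elif tag == "td":
            self.in_title_cell = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self.in_link:
            self.titles[-1] += data


def link_titles(raw, encoding):
    """Decode raw page bytes with the given encoding, then extract the title links."""
    parser = TitleLinkText()
    parser.feed(raw.decode(encoding))
    parser.close()
    return parser.titles


# The snippet from the question, as UTF-8 bytes (assumed to be the page's encoding).
raw_source = (
    '<td class="title"><a href="/title/tt1844708/">La grande envolée</a>\n'
    '    <span class="year_type">(1927)</span><br />\n'
    '</td>'
).encode("utf-8")

print(link_titles(raw_source, "utf-8"))    # ['La grande envolée']
print(link_titles(raw_source, "latin-1"))  # ['La grande envolÃ©e'] (wrong codec)
```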

answered Feb 8, 2015 at 3:54 · edited Feb 8, 2015 at 4:00
